Slurm script not writing to stdout in computing node
Manuel Rodríguez Pascual
2014-10-20 11:39:55 UTC
Good morning all,

This is my first time as a Slurm sysadmin, so please excuse me if the
problem has a trivial solution. I have configured a small cluster based on
virtual machines (CentOS 7 minimal, NFS, no iptables). Everything is
working OK, but I keep having an error that I'm not able to solve.

The problem is that commands run from a Slurm batch script do not write to stdout.

If I, for example, execute the following script:

---
$more myScript.sh
#!/bin/sh
set -x
echo "execute hostname"
hostname
echo "now create file with hostname as content"
hostname > /home/slurm/host

$sbatch myScript.sh

$more slurm-49.out
+ echo 'execute hostname'
execute hostname
+ hostname
+ echo 'now create file with hostname as content'
now create file with hostname as content
+ hostname
$more /home/slurm/host
(created but empty)
---

If I execute the script manually, either on the master or on the compute
node, it runs successfully and /home/slurm/host contains the name of
the host.

I notice two things that don't make much sense:
- apparently, "stdout" is broken and nothing is written to it
- however, "print" commands (echo) work OK.
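
A minimal sketch for narrowing this down, assuming the difference is between shell builtins (echo) and external programs (hostname); that split is only a guess from the output above, not something it proves. check_cmd is a hypothetical helper, not a Slurm command, and the temporary file path is an arbitrary choice. It runs each command with stdout redirected to a file, as in my script, and reports whether anything was written:

```shell
#!/bin/sh
# check_cmd (hypothetical helper): run a command with stdout redirected
# to a scratch file and report whether the file ended up non-empty.
check_cmd() {
    label=$1
    shift
    # Fixed scratch path so no external command is needed to create it.
    tmp="${TMPDIR:-/tmp}/check_cmd.$$"
    "$@" > "$tmp" 2>/dev/null
    status=$?
    if [ -s "$tmp" ]; then
        result="$label: wrote output (exit $status)"
    else
        result="$label: EMPTY output (exit $status)"
    fi
    rm -f "$tmp"
    echo "$result"
}

# Same text, once through the shell builtin and once through /bin/echo.
check_cmd "builtin echo" echo hello
check_cmd "external echo" /bin/echo hello
# The external command that produced nothing in the job above.
check_cmd "external hostname" hostname
```

Submitting this with sbatch and also running it by hand on the same node gives two reports that can be compared line by line.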

I have tried to submit with different sbatch options, but all my attempts
have been unsuccessful.

Any hints on the problem?

Thanks for your help,

Manuel


PS: the output of scontrol show config
---
[***@slurm-master slurm]# scontrol show config
Configuration data as of 2014-10-20T12:36:47
AccountingStorageBackupHost = (null)
AccountingStorageEnforce = none
AccountingStorageHost = localhost
AccountingStorageLoc = /var/log/slurm_jobacct.log
AccountingStoragePort = 0
AccountingStorageType = accounting_storage/none
AccountingStorageUser = root
AccountingStoreJobComment = YES
AcctGatherEnergyType = acct_gather_energy/none
AcctGatherFilesystemType = acct_gather_filesystem/none
AcctGatherInfinibandType = acct_gather_infiniband/none
AcctGatherNodeFreq = 0 sec
AcctGatherProfileType = acct_gather_profile/none
AuthInfo = (null)
AuthType = auth/munge
BackupAddr = (null)
BackupController = (null)
BatchStartTimeout = 10 sec
BOOT_TIME = 2014-10-17T16:45:18
CacheGroups = 0
CheckpointType = checkpoint/none
ClusterName = cluster
CompleteWait = 0 sec
ControlAddr = slurm-master
ControlMachine = slurm-master
CoreSpecPlugin = core_spec/none
CryptoType = crypto/munge
DebugFlags = (null)
DefMemPerNode = UNLIMITED
DisableRootJobs = NO
DynAllocPort = 0
EnforcePartLimits = NO
Epilog = (null)
EpilogMsgTime = 2000 usec
EpilogSlurmctld = (null)
ExtSensorsType = ext_sensors/none
ExtSensorsFreq = 0 sec
FastSchedule = 1
FirstJobId = 1
GetEnvTimeout = 2 sec
GresTypes = (null)
GroupUpdateForce = 0
GroupUpdateTime = 600 sec
HASH_VAL = Match
HealthCheckInterval = 0 sec
HealthCheckNodeState = ANY
HealthCheckProgram = (null)
InactiveLimit = 0 sec
JobAcctGatherFrequency = 30
JobAcctGatherType = jobacct_gather/none
JobAcctGatherParams = (null)
JobCheckpointDir = /var/slurm/checkpoint
JobCompHost = localhost
JobCompLoc = /var/log/slurm_jobcomp.log
JobCompPort = 0
JobCompType = jobcomp/none
JobCompUser = root
JobContainerType = job_container/none
JobCredentialPrivateKey = (null)
JobCredentialPublicCertificate = (null)
JobFileAppend = 0
JobRequeue = 1
JobSubmitPlugins = (null)
KeepAliveTime = SYSTEM_DEFAULT
KillOnBadExit = 0
KillWait = 30 sec
LaunchType = launch/slurm
Licenses = (null)
LicensesUsed = (null)
MailProg = /bin/mail
MaxArraySize = 1001
MaxJobCount = 10000
MaxJobId = 4294901760
MaxMemPerNode = UNLIMITED
MaxStepCount = 40000
MaxTasksPerNode = 128
MessageTimeout = 10 sec
MinJobAge = 300 sec
MpiDefault = none
MpiParams = (null)
NEXT_JOB_ID = 50
OverTimeLimit = 0 min
PluginDir = /usr/local/lib/slurm
PlugStackConfig = /usr/local/etc/plugstack.conf
PreemptMode = OFF
PreemptType = preempt/none
PriorityType = priority/basic
PrivateData = none
ProctrackType = proctrack/pgid
Prolog = (null)
PrologSlurmctld = (null)
PrologFlags = (null)
PropagatePrioProcess = 0
PropagateResourceLimits = ALL
PropagateResourceLimitsExcept = (null)
RebootProgram = (null)
ReconfigFlags = (null)
ResumeProgram = (null)
ResumeRate = 300 nodes/min
ResumeTimeout = 60 sec
ResvEpilog = (null)
ResvOverRun = 0 min
ResvProlog = (null)
ReturnToService = 1
SallocDefaultCommand = (null)
SchedulerParameters = (null)
SchedulerPort = 7321
SchedulerRootFilter = 1
SchedulerTimeSlice = 30 sec
SchedulerType = sched/backfill
SelectType = select/linear
SlurmUser = slurm(500)
SlurmctldDebug = info
SlurmctldLogFile = (null)
SlurmSchedLogFile = (null)
SlurmctldPort = 6817
SlurmctldTimeout = 120 sec
SlurmdDebug = info
SlurmdLogFile = (null)
SlurmdPidFile = /var/run/slurmd.pid
SlurmdPlugstack = (null)
SlurmdPort = 6818
SlurmdSpoolDir = /var/spool/slurmd
SlurmdTimeout = 300 sec
SlurmdUser = root(0)
SlurmSchedLogLevel = 0
SlurmctldPidFile = /var/run/slurmctld.pid
SlurmctldPlugstack = (null)
SLURM_CONF = /usr/local/etc/slurm.conf
SLURM_VERSION = 14.03.8
SrunEpilog = (null)
SrunProlog = (null)
StateSaveLocation = /var/spool/slurmState
SuspendExcNodes = (null)
SuspendExcParts = (null)
SuspendProgram = (null)
SuspendRate = 60 nodes/min
SuspendTime = NONE
SuspendTimeout = 30 sec
SwitchType = switch/none
TaskEpilog = (null)
TaskPlugin = task/none
TaskPluginParam = (null type)
TaskProlog = (null)
TmpFS = /tmp
TopologyPlugin = topology/none
TrackWCKey = 0
TreeWidth = 50
UsePam = 0
UnkillableStepProgram = (null)
UnkillableStepTimeout = 60 sec
VSizeFactor = 0 percent
WaitTime = 0 sec
---
--
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN