Discussion:
SLURM fails to recognize job step limit
Jeroen Meijer
2014-07-09 07:46:24 UTC
We have had trouble getting SLURM to cope with 25000 job steps, so we
configured the SLURM control daemon to accept at most 600 job steps per
job and split the job steps over multiple jobs. I have verified that no
job has more than 600 job steps, but SLURM still outputs:

srun: error: Unable to create job step: Step limit reached for this job
srun: error: Unable to create job step: Step limit reached for this job
srun: error: Unable to create job step: Step limit reached for this job

How could this be happening?
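
For reference, the splitting described above amounts to roughly the
following. This is only a sketch: split_steps is a hypothetical helper
name, and the step indices are illustrative rather than taken from the
real submission scripts.

```python
def split_steps(total_steps, max_steps_per_job):
    """Split step indices 0..total_steps-1 into per-job half-open ranges.

    Hypothetical helper: each returned (start, end) pair covers at most
    max_steps_per_job steps and would be submitted as one job.
    """
    if max_steps_per_job < 1:
        raise ValueError("max_steps_per_job must be at least 1")
    return [
        (start, min(start + max_steps_per_job, total_steps))
        for start in range(0, total_steps, max_steps_per_job)
    ]


if __name__ == "__main__":
    chunks = split_steps(25000, 600)
    print(len(chunks), "jobs; largest has",
          max(end - start for start, end in chunks), "steps")
```

With 25000 steps and a limit of 600 this gives 42 jobs, the last one
holding the remaining 400 steps.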
j***@public.gmane.org
2014-07-09 14:58:32 UTC
That error is reported when a job reaches the configured value of
MaxStepCount. What problems are you seeing with larger step counts?

From "man slurm.conf":
MaxStepCount
The maximum number of steps that any job can initiate. This parameter
is intended to limit the effect of bad batch scripts. The default
value is 40000 steps.
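
To double-check which limit is actually set, a minimal sketch like the
one below reads MaxStepCount out of the text of a slurm.conf file. The
function name max_step_count is hypothetical, and the parsing assumes
plain Key=Value entries with "#" comments and case-insensitive keys. It
falls back to the documented default of 40000 when the parameter is
absent.

```python
def max_step_count(conf_text, default=40000):
    """Return the MaxStepCount value found in slurm.conf text.

    Hypothetical helper: assumes Key=Value tokens, '#' starts a comment,
    and keys are matched case-insensitively. The last occurrence wins.
    """
    value = default
    for line in conf_text.splitlines():
        line = line.split("#", 1)[0]          # drop trailing comments
        for token in line.split():            # allow several pairs per line
            key, sep, val = token.partition("=")
            if sep and key.lower() == "maxstepcount":
                value = int(val)
    return value


if __name__ == "__main__":
    sample = "# example fragment, not a real config\nMaxStepCount=600\n"
    print(max_step_count(sample))
```

The file only shows what is configured on disk. It may also be worth
comparing it with the value the running controller reports in
"scontrol show config".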