Jeroen Meijer
2014-07-09 07:46:24 UTC
We have had trouble configuring SLURM to cope with 25000 job steps, so we
configured the SLURM control daemon to accept at most 600 job steps per
job and split the job steps over multiple jobs. I have verified that no
job has more than 600 job steps, but SLURM still outputs:
srun: error: Unable to create job step: Step limit reached for this job
srun: error: Unable to create job step: Step limit reached for this job
srun: error: Unable to create job step: Step limit reached for this job
How could this be happening?
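
To make the splitting concrete, here is a minimal sketch of batching the steps so that no job holds more than 600. It is an illustration only, not our actual submission script: the `srun ./task N` commands and the `split_into_jobs` helper are hypothetical, and I am assuming the 600 limit corresponds to the `MaxStepCount` setting in slurm.conf.

```python
from typing import Iterator, List, Sequence

# Per-job step limit from the description above
# (assumption: this is what MaxStepCount in slurm.conf is set to).
STEP_LIMIT = 600


def split_into_jobs(steps: Sequence[str], limit: int = STEP_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `limit` steps, one batch per job."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    for start in range(0, len(steps), limit):
        yield list(steps[start:start + limit])


# Hypothetical step commands; the real workload is not shown here.
steps = [f"srun ./task {i}" for i in range(25000)]
jobs = list(split_into_jobs(steps))

# Number of jobs and the largest number of steps in any one job.
print(len(jobs), max(len(job) for job in jobs))  # prints "42 600"
```

With 25000 steps and a limit of 600 this gives 42 jobs, the last one holding 400 steps.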