Paul Mezzanini
2012-03-23 16:04:02 UTC
Are there known issues with large dependency lists? We have a user who is
running a fairly large number of generational jobs. The basic shape of her
workflow is that she spawns N workers, all of which must complete before
the next generation of N workers can start.
Her current set uses 16 workers per generation; I have no idea how many
generations in total. She can submit up to around generation 7 before
things really go south. We start to see the effects around generation 4
(submits slow down slightly), and the moment generation 7 begins
submitting, the speed drops off sharply. slurmctld's CPU usage goes to
100% and I start getting processing-time warnings in the slurmctld logs
(slurmctld: Warning: Note very large processing time from
_slurm_rpc_submit_batch_job: usec=2735283). Turning the verbosity up
revealed nothing obvious. Eventually sbatch fails with timeouts, which
kills the rest of the submits.
As a test, we slowed her submit script down with a few sleep calls to see
whether we were simply overwhelming slurmctld. The same slowdown still
occurred at generation 7.
I have created a very simplified version of her submit scripts for
testing. It shows the same issues.
Important info:
Slurm 2.3.1
Controller is a KVM VM with 2 processors (AMD, 2.8 GHz) and 14 GB of RAM
No memory or disk limits appear to be the issue
Generation G's jobs list only generation G-1's jobs as dependencies
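To be concrete about what that dependency list looks like: every job in
generation G carries all 16 job IDs from generation G-1 in a single afterok
clause, so the generated submit lines end up looking roughly like this (job
IDs invented, list trimmed to four workers to keep it readable):

    sbatch --qos=rc-normal -o /dev/null \
        --dependency=afterok:10001:10002:10003:10004 \
        -J dep-2-1 slurm-payload.sh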
My submit scripts for testing:
####BEGIN CONSOLE DUMP####
[***@tropos submitotron []]# cat submit-many-jobs.sh
#!/bin/bash
# Just a constant variable used throughout the script to name our jobs
# in a meaningful way.
BASEJOBNAME="dep"
# Another constant variable used to name the slurm submission file that
# this script is going to submit to slurm.
JOBFILE="slurm-payload.sh"
# Generations requested
NUMBEROFGENERATIONS=16
# Workers per generation
NUMBEROFWORKERS=16
# The first generation has no dependency so it has its own loop.
#
# We capture the job number slurm spits out and put it into an array with
# the index being the generation.
# Future jobs can then reference $GENERATION - 1 to set the dependency.
for GENERATION in $(seq 1 ${NUMBEROFGENERATIONS}) ; do
    if [ ${GENERATION} -eq 1 ] ; then
        for WORKER in $(seq 1 ${NUMBEROFWORKERS}) ; do
            echo GENERATION/WORKER: ${GENERATION}/${WORKER}
            WORKERLIST[${GENERATION}]=$(sbatch --qos=rc-normal -o /dev/null \
                -J ${BASEJOBNAME}-${GENERATION}-${WORKER} ${JOBFILE} \
                | awk '{ print $4 }'):${WORKERLIST[${GENERATION}]}
        done
    else
        for WORKER in $(seq 1 ${NUMBEROFWORKERS}) ; do
            echo GENERATION/WORKER: ${GENERATION}/${WORKER}
            WORKERLIST[${GENERATION}]=$(sbatch --qos=rc-normal -o /dev/null \
                --dependency=afterok:${WORKERLIST[$(expr ${GENERATION} - 1)]%\:} \
                -J ${BASEJOBNAME}-${GENERATION}-${WORKER} ${JOBFILE} \
                | awk '{ print $4 }'):${WORKERLIST[${GENERATION}]}
        done
    fi
done
[***@tropos submitotron []]# cat slurm-payload.sh
#!/bin/bash -l
# NOTE the -l flag!
#
# Where to send mail...
#SBATCH --mail-user pfmeec-***@public.gmane.org
# notify on state change: BEGIN, END, FAIL or ALL
#SBATCH --mail-type=FAIL
# Requested max run time H:M:S; anything over will be KILLED
#SBATCH -t 0:1:30
# Valid partitions are "work" and "debug"
#SBATCH -p work -n 1
# Job memory requirements in MB
#SBATCH --mem=30
#Just a quick sleep.
sleep 60
[***@tropos submitotron []]#
####END CONSOLE DUMP####
In case the mail client mangles the indentation, there is a GitHub copy here:
https://github.com/paulmezz/SlurmThings
I know there are ways I could clean up the loops but for this test I just
don't care :)
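For reference, the two branches could be collapsed into a single loop along
these lines (untested sketch, intended to behave the same as the version
above):

    for GENERATION in $(seq 1 ${NUMBEROFGENERATIONS}) ; do
        # Only generations after the first carry a dependency.
        DEPFLAG=""
        if [ ${GENERATION} -gt 1 ] ; then
            DEPFLAG="--dependency=afterok:${WORKERLIST[$((GENERATION - 1))]%:}"
        fi
        for WORKER in $(seq 1 ${NUMBEROFWORKERS}) ; do
            echo GENERATION/WORKER: ${GENERATION}/${WORKER}
            WORKERLIST[${GENERATION}]=$(sbatch --qos=rc-normal -o /dev/null ${DEPFLAG} \
                -J ${BASEJOBNAME}-${GENERATION}-${WORKER} ${JOBFILE} \
                | awk '{ print $4 }'):${WORKERLIST[${GENERATION}]}
        done
    done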
Any ideas? (and thanks!)
-paul
--
Paul.Mezzanini-***@public.gmane.org
Sr Systems Administrator/Engineer
Research Computing at RIT
585.475.3245