Gordon Wells
2014-07-30 06:53:35 UTC
Hi
I get all CPUs from a node allocated to a job, even when I request fewer.
This is on a relatively new Slurm setup, but it uses basically the same
configuration as an older setup which worked correctly.
My slurm.conf looks as follows:
ClusterName=eslab
ControlMachine=riddley
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/tmp
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
CacheGroups=0
ReturnToService=0
GresTypes=gpu
SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/none
NodeName=riddley CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:1
PartitionName=debug Nodes=riddley Default=YES MaxTime=INFINITE State=UP
and in the batch file:
#SBATCH -J NAG_int_tip3p_rep2
#SBATCH -o NAG_int_tip3p_rep2.out
#SBATCH -e NAG_int_tip3p_rep2.err
#SBATCH -n 2
#SBATCH -p debug
#SBATCH -D /home/gordon/cpgh89/autodock/NAG_DNAP
#SBATCH -w riddley
Can anyone explain what I'm doing wrong in this setup?
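One thing that may be worth checking against the older setup: as far as I understand, SelectType=select/linear allocates whole nodes to a job, so "-n 2" would still get all 4 CPUs on riddley. Below is a minimal sketch of the consumable-resource settings that should allow per-core allocation instead. It is an assumption that this is the difference between the two setups, and the values are illustrative, not taken from the working configuration.

```
# Sketch only: replaces the SelectType=select/linear line above.
# Assumption: per-core allocation is what the older setup provided.
SelectType=select/cons_res
# Treat individual cores as the consumable resource.
SelectTypeParameters=CR_Core
```

Changing SelectType normally needs a restart of slurmctld, not just a reconfigure. Newer Slurm releases use select/cons_tres in place of select/cons_res.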
-- max(∫(εὐδαιμονία)dt)