Discussion:
All cores being allocated / -n ignored
Gordon Wells
2014-07-30 06:53:35 UTC
Permalink
Hi

I get all the CPUs on a node allocated to a job, even when I request fewer.
This is a relatively new Slurm setup, but it uses basically the same
configuration as an older setup that worked correctly.

My slurm.conf looks as follows:
ClusterName=eslab
ControlMachine=riddley
SlurmUser=slurm
SlurmctldPort=6817
SlurmdPort=6818
AuthType=auth/munge
StateSaveLocation=/tmp
SlurmdSpoolDir=/tmp/slurmd
SwitchType=switch/none
MpiDefault=none
SlurmctldPidFile=/var/run/slurmctld.pid
SlurmdPidFile=/var/run/slurmd.pid
ProctrackType=proctrack/pgid
CacheGroups=0
ReturnToService=0

GresTypes=gpu

SlurmctldTimeout=300
SlurmdTimeout=300
InactiveLimit=0
MinJobAge=300
KillWait=30
Waittime=0
SchedulerType=sched/backfill
SelectType=select/linear
FastSchedule=1
SlurmctldDebug=3
SlurmdDebug=3
JobCompType=jobcomp/none
NodeName=riddley CPUs=4 Sockets=1 CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN Gres=gpu:1
PartitionName=debug Nodes=riddley Default=YES MaxTime=INFINITE State=UP

and in the batch file:
#SBATCH -J NAG_int_tip3p_rep2
#SBATCH -o NAG_int_tip3p_rep2.out
#SBATCH -e NAG_int_tip3p_rep2.err
#SBATCH -n 2
#SBATCH -p debug
#SBATCH -D /home/gordon/cpgh89/autodock/NAG_DNAP
#SBATCH -w riddley

Can anyone explain what I'm doing wrong in this setup?
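
(For reference, I'm checking the allocation with something like the command
below, where 1234 stands in for the actual job ID; NumCPUs comes back as 4
even though the job only asked for -n 2.)

scontrol show job 1234 | grep -E 'NumNodes|NumCPUs'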




-- max(∫(εὐδαιμονία)dt)
Ryan Cox
2014-07-30 14:18:34 UTC
Permalink
Shared=No is the default for a partition (slurm.conf manpage). That
might have something to do with it.
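
If that turns out to be the cause, the Shared option goes on the partition
line; a rough sketch, untested on my end (and note that with select/linear
it only controls whole-node sharing, not per-CPU allocation):

PartitionName=debug Nodes=riddley Default=YES MaxTime=INFINITE State=UP Shared=YES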

Ryan
--
Ryan Cox
Operations Director
Fulton Supercomputing Lab
Brigham Young University
j***@public.gmane.org
2014-07-30 14:56:32 UTC
Permalink
This configuration will always allocate all CPUs on a node to jobs:
SelectType=select/linear
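
If you want CPUs allocated individually, the consumable resources plugin is
the usual alternative. A minimal sketch; CR_Core is just one of the possible
SelectTypeParameters values, so adjust to taste:

SelectType=select/cons_res
SelectTypeParameters=CR_Core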
--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
Gordon Wells
2014-07-31 12:04:37 UTC
Permalink
Thanks, I missed that setting. I'm using SelectType=select/cons_res now
instead, and that has fixed the problem.
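
(Checked with something along these lines, the job ID being a placeholder;
the job now shows only the two requested CPUs:)

squeue -j 1234 -o '%.10i %.6C %.10N'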


-- max(∫(εὐδαιμονία)dt)