Discussion: job scheduling on a node with 40 procs
Satrajit Ghosh
2014-08-31 20:40:48 UTC
Hi,

Our nodes have 20 cores with hyperthreading enabled, resulting in 40 procs
per node. When I schedule 40 single-CPU jobs, the scheduler immediately puts
20 of them into the suspended state and appears to engage time-slicing (which
this partition is set up for).
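
(For context, the time-slicing on this partition comes from gang scheduling;
the slurm.conf lines below show roughly how that is enabled -- quoted from
memory, not verbatim from our config:)

# gang scheduling: overcommitted jobs are suspended and resumed in time slices
PreemptMode=GANG
SchedulerTimeSlice=30   # seconds per time slice
PartitionName=om_all_nodes Nodes=... Shared=FORCE:1 Default=YES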

My primary questions are:

Shouldn't the scheduler respect the fact that the node has 40 processors and
run all 40 jobs? If not, is this the intended behavior? Is the scheduler
hyperthreading-aware, then?
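
(For reference, this is how I check what Slurm thinks the node topology is,
plus the slurm.conf knobs that, as far as I understand the docs, decide
whether each hardware thread counts as an allocatable CPU -- the SelectType
lines are illustrative, not copied from our config:)

# what slurmctld reports for the node
scontrol show node node020 | grep -Ei 'CPUTot|CoresPerSocket|ThreadsPerCore'

# node definition consistent with the sinfo output in the details below
NodeName=node020 CPUs=40 Sockets=2 CoresPerSocket=10 ThreadsPerCore=2 RealMemory=258424

# CR_CPU treats each logical CPU (thread) as schedulable; CR_Core allocates whole cores
SelectType=select/cons_res
SelectTypeParameters=CR_Core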

As a workaround, I can make each job launch two internal cpu_load processes;
then all 20 run and the process utilization matches the number of hardware
threads (rough sketch below).
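
(Roughly, that workaround looks like this -- the script name and the -c2
request are my sketch, not exactly what I submitted:)

#!/bin/bash
# two busy loops per job, so 20 running jobs keep all 40 hardware threads busy
cat /dev/zero > /dev/null &
cat /dev/zero > /dev/null &
wait

# submitted along the lines of:
# sbatch --array=1-20 -N1 -c2 --nodelist node020 cpu_load_x2.sh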

cheers,

satra


Details:

submission:

for i in `seq 1 40`; do sbatch -N1 -c1 --nodelist node020 cpu_load.sh; done

or

sbatch --array=1-40 -N1 -c1 --nodelist node020 cpu_load.sh

cpu_load.sh: cat /dev/zero > /dev/null
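
(cpu_load.sh itself is just that one-liner wrapped in a batch script, roughly:)

#!/bin/bash
# busy loop: keeps one CPU pegged at 100% until the job ends or is suspended
cat /dev/zero > /dev/null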


$ sinfo -lNe
Sun Aug 31 16:27:11 2014
NODELIST NODES PARTITION STATE CPUS S:C:T MEMORY TMP_DISK WEIGHT FEATURES REASON
node020 1 om_all_nodes* allocated 40 2:10:2 258424 1923 1 (null) none

$ squeue
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
134934 om_all_no cpu_load satra R 0:10 1 node020
134935 om_all_no cpu_load satra R 0:10 1 node020
134936 om_all_no cpu_load satra R 0:10 1 node020
134937 om_all_no cpu_load satra R 0:10 1 node020
134938 om_all_no cpu_load satra R 0:10 1 node020
134939 om_all_no cpu_load satra R 0:10 1 node020
134920 om_all_no cpu_load satra R 0:10 1 node020
134921 om_all_no cpu_load satra R 0:10 1 node020
134922 om_all_no cpu_load satra R 0:10 1 node020
134923 om_all_no cpu_load satra R 0:10 1 node020
134924 om_all_no cpu_load satra R 0:10 1 node020
134925 om_all_no cpu_load satra R 0:10 1 node020
134926 om_all_no cpu_load satra R 0:10 1 node020
134927 om_all_no cpu_load satra R 0:10 1 node020
134928 om_all_no cpu_load satra R 0:10 1 node020
134929 om_all_no cpu_load satra R 0:10 1 node020
134930 om_all_no cpu_load satra R 0:10 1 node020
134931 om_all_no cpu_load satra R 0:10 1 node020
134932 om_all_no cpu_load satra R 0:10 1 node020
134933 om_all_no cpu_load satra R 0:10 1 node020
134900 om_all_no cpu_load satra S 0:56 1 node020
134901 om_all_no cpu_load satra S 0:56 1 node020
134902 om_all_no cpu_load satra S 0:56 1 node020
134903 om_all_no cpu_load satra S 0:56 1 node020
134904 om_all_no cpu_load satra S 0:56 1 node020
134905 om_all_no cpu_load satra S 0:56 1 node020
134906 om_all_no cpu_load satra S 0:56 1 node020
134907 om_all_no cpu_load satra S 0:56 1 node020
134908 om_all_no cpu_load satra S 0:56 1 node020
134909 om_all_no cpu_load satra S 0:56 1 node020
134910 om_all_no cpu_load satra S 0:56 1 node020
134911 om_all_no cpu_load satra S 0:56 1 node020
134912 om_all_no cpu_load satra S 0:56 1 node020
134913 om_all_no cpu_load satra S 0:56 1 node020
134914 om_all_no cpu_load satra S 0:56 1 node020
134915 om_all_no cpu_load satra S 0:56 1 node020
134916 om_all_no cpu_load satra S 0:56 1 node020
134917 om_all_no cpu_load satra S 0:56 1 node020
134918 om_all_no cpu_load satra S 0:56 1 node020
134919 om_all_no cpu_load satra S 0:56 1 node020
