core allocation clarity

j***@public.gmane.org

2014-07-11 18:09:34 UTC

The biggest problem is that there is a multitude of job options for
specifying the layout of tasks, and they can conflict. For example,
-N 4 --ntasks-per-node=16
will always give you 16 tasks per node (or an error if your nodes
don't have 16 CPUs each), but
-N 4 --ntasks-per-node=16 -n 4
will give you 1 task per node (a total of 4 tasks) due to the
conflicting task count specification.
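
As a rough illustration of the arithmetic in the two examples above, here is a minimal shell sketch. It is only a toy model of the behavior described in this message, not Slurm's actual selection logic; the function name tasks_per_node and its arguments are hypothetical.

```shell
#!/bin/sh
# Toy model of the task-count arithmetic described above.
# NOT Slurm's real algorithm; tasks_per_node is a hypothetical helper.
# Usage: tasks_per_node NODES NTASKS_PER_NODE [NTASKS]
tasks_per_node() {
    nodes=$1
    per_node=$2
    ntasks=${3:-}

    if [ -z "$ntasks" ]; then
        # No -n given: --ntasks-per-node is taken as the per-node count.
        total=$((nodes * per_node))
    else
        # -n given: the explicit total wins and is spread across the nodes.
        # (The --ntasks-per-node cap is not modelled in this sketch.)
        total=$ntasks
        per_node=$(( (ntasks + nodes - 1) / nodes ))   # ceiling division
    fi
    echo "total=$total per_node=$per_node"
}

tasks_per_node 4 16      # models: -N 4 --ntasks-per-node=16
tasks_per_node 4 16 4    # models: -N 4 --ntasks-per-node=16 -n 4
```

The first call models the request without -n (64 tasks, 16 per node); the second models the conflicting request, where the explicit total of 4 leaves 1 task per node.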

For what it's worth, there used to be both a minimum and maximum task
count per node, socket and core, but the resulting code was so complex
as to be virtually impossible to support.

I'll take a look at the documentation and see if it can be made clearer.

Moe

Post by Bill Wichser
This morning, one of our users questioned another's allocation,
mainly asking how to do the same thing. The job script contained
#SBATCH --ntasks=256
#SBATCH --ntasks-per-socket=16
Now, our nodes have 16 cores each (dual socket, with an 8-core CPU in
each socket), so this raised an eyebrow. The actual allocation is all
over the place; here are a few lines from scontrol show job:
Socks/Node=* NtasksPerN:B:S:C=0:0:16:* CoreSpec=0
Nodes=tiger-r1c1n16 CPU_IDs=0-7 Mem=24000
Nodes=tiger-r1c2n11 CPU_IDs=8-15 Mem=24000
Nodes=tiger-r1c3n1 CPU_IDs=15 Mem=3000
Nodes=tiger-r1c3n2 CPU_IDs=12 Mem=3000
Nodes=tiger-r1c3n[6,10] CPU_IDs=8-15 Mem=24000
Nodes=tiger-r1c4n2 CPU_IDs=4,15 Mem=6000
Nodes=tiger-r1c4n3 CPU_IDs=13-14 Mem=6000
Nodes=tiger-r2c1n2 CPU_IDs=8-15 Mem=24000
Nodes=tiger-r2c1n3 CPU_IDs=3 Mem=3000
and on and on and on, using a total of 43 different nodes.
Off to the man pages. What I find is that --ntasks-per-socket
specifies the maximum number of tasks per socket. Okay, this is
interesting, and now I understand why this worked.
But this isn't my question.
We tell users to allocate using
#SBATCH -N 4
#SBATCH --ntasks-per-node=16
and this gets exactly that -- 64 cores. Why? When I look at the
man page for --ntasks-per-node, I find that this is also described as
a maximum value. So I'm not sure why this request is honored exactly
(thankfully) while --ntasks-per-socket is treated only as a maximum.
Off to the source code, where I find that when -N is set, there is a
MAX() call which treats this value as absolute and allocates
the correct values.
I have no clue how to word this correctly in the documentation,
but the current description of --ntasks-per-node doesn't spell
this out clearly at all, at least to me.
Bill
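
For anyone wanting to check what a job actually received, alongside scontrol show job, a minimal sketch along these lines could be dropped into a batch script. It assumes the standard environment variables that sbatch exports inside a job; the helper name report_allocation is hypothetical, and some variables (for example SLURM_NTASKS_PER_NODE) are only set when the matching option was given.

```shell
#!/bin/sh
# Print what a job was given, using environment variables that sbatch
# sets inside the job. Any variable that is not set prints "unset",
# so outside a job every field normally reads "unset".
# report_allocation is a hypothetical helper name.
report_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unset}"
    echo "ntasks=${SLURM_NTASKS:-unset}"
    echo "ntasks_per_node=${SLURM_NTASKS_PER_NODE:-unset}"
    echo "cpus_per_node=${SLURM_JOB_CPUS_PER_NODE:-unset}"
    echo "nodelist=${SLURM_JOB_NODELIST:-unset}"
}

report_allocation
```

Comparing the reported node and task counts against the #SBATCH lines in the script makes a conflicting or loosely specified request easier to spot.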