Discussion:
core allocation clarity
Bill Wichser
2014-07-11 15:14:34 UTC
This morning, one of our users questioned another's allocation, mainly
asking for a how-to in order to do the same thing. The request looks
like this:

#SBATCH --ntasks=256
#SBATCH --ntasks-per-socket=16


Now we have 16-core nodes, dual socket with an 8-core CPU in each, so this
raised an eyebrow. The actual allocation is all over the place, and I
offer a few lines from scontrol show job:

Socks/Node=* NtasksPerN:B:S:C=0:0:16:* CoreSpec=0
Nodes=tiger-r1c1n16 CPU_IDs=0-7 Mem=24000
Nodes=tiger-r1c2n11 CPU_IDs=8-15 Mem=24000
Nodes=tiger-r1c3n1 CPU_IDs=15 Mem=3000
Nodes=tiger-r1c3n2 CPU_IDs=12 Mem=3000
Nodes=tiger-r1c3n[6,10] CPU_IDs=8-15 Mem=24000
Nodes=tiger-r1c4n2 CPU_IDs=4,15 Mem=6000
Nodes=tiger-r1c4n3 CPU_IDs=13-14 Mem=6000
Nodes=tiger-r2c1n2 CPU_IDs=8-15 Mem=24000
Nodes=tiger-r2c1n3 CPU_IDs=3 Mem=3000

and on and on and on, using a total of 43 different nodes.

Off to the man pages. What I find is that --ntasks-per-socket specifies
the maximum number of tasks per socket. Okay, this is interesting, and
now I understand why this worked.
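
To spell out the arithmetic behind that, here is a minimal sketch. It assumes one CPU per task and the hardware described above (2 sockets per node, 8 cores per socket). The function names are made up for illustration, and this only models the reasoning, not Slurm's scheduler.

```shell
#!/bin/sh
# Illustrative arithmetic only; hypothetical helper names, not Slurm code.

# Smaller of two integers.
min() { if [ "$1" -lt "$2" ]; then echo "$1"; else echo "$2"; fi; }

# Most tasks one socket can actually hold, assuming one CPU per task.
# $1 = --ntasks-per-socket value, $2 = physical cores per socket
effective_tasks_per_socket() { min "$1" "$2"; }

# Fewest nodes that could hold the job if every node were packed full.
# $1 = total tasks, $2 = effective tasks per socket, $3 = sockets per node
min_nodes() {
    per_node=$(( $2 * $3 ))
    echo $(( ($1 + per_node - 1) / per_node ))
}

cap=$(effective_tasks_per_socket 16 8)
echo "effective cap per socket: $cap"
echo "fewest nodes for 256 tasks: $(min_nodes 256 "$cap" 2)"
```

With a limit of 16 on an 8-core socket, the limit never binds, so the request behaves like a plain --ntasks=256 and the scheduler is free to pick up cores wherever they happen to be idle, anywhere from 16 full nodes up to the 43 partial ones seen here.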

But this isn't my question.

We tell users to allocate using
#SBATCH -N 4
#SBATCH --ntasks-per-node=16

and this gets exactly that -- 64 cores. Why? When I look at the man
page for --ntasks-per-node I also find this to be a maximum value.

So I'm not sure why this works correctly (thankfully) while
--ntasks-per-socket treats its value as a maximum. Off to the source
code, where I find that when -N is set there is a MAX() call which
effectively takes this value as absolute and allocates the correct
number of tasks.

I have no clue how to get this written correctly in the documentation
but the current description of --ntasks-per-node doesn't spell this out
very clearly at all to me.

Bill
j***@public.gmane.org
2014-07-11 18:09:34 UTC
The biggest problem is that there are a multitude of job options to specify
the layout of tasks, and they can conflict. For example:
-N 4 --ntasks-per-node=16
will always give you 16 tasks per node (or an error if your nodes
don't have 16 CPUs per node), but
-N 4 --ntasks-per-node=16 -n 4
will give you 1 task per node (a total of 4 tasks) due to the
conflicting task count specification.
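
As a rough model of those two cases, the sketch below reproduces just the task counts quoted above. It assumes an explicit -n overrides the product of -N and --ntasks-per-node and that tasks are spread evenly over the nodes. The function names are hypothetical and this is not Slurm's actual logic.

```shell
#!/bin/sh
# Toy model of the two command lines above. Hypothetical helper names;
# this is not how Slurm is implemented.

# Total tasks: an explicit -n wins; otherwise nodes * tasks-per-node.
# $1 = -N, $2 = --ntasks-per-node, $3 = -n (optional)
total_tasks() {
    if [ -n "${3:-}" ]; then
        echo "$3"
    else
        echo $(( $1 * $2 ))
    fi
}

# Tasks landing on each node when spread evenly over the -N nodes.
tasks_per_node() {
    total=$(total_tasks "$@")
    echo $(( total / $1 ))
}

echo "-N 4 --ntasks-per-node=16      : $(total_tasks 4 16) tasks, $(tasks_per_node 4 16) per node"
echo "-N 4 --ntasks-per-node=16 -n 4 : $(total_tasks 4 16 4) tasks, $(tasks_per_node 4 16 4) per node"
```

The first line reports 64 tasks at 16 per node, the second 4 tasks at 1 per node, matching the behaviour described above.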

For what it's worth, there used to be both a minimum and maximum task
count per node, socket and core, but the resulting code was so complex
as to be virtually impossible to support.

I'll take a look at the documentation and see if it can be made more clear.

Moe