Discussion:
Sbatch nodes / cores allocation
Pf Busson
2014-06-25 08:35:38 UTC
Hello,

After reading a whole bunch of web pages about node and core
allocation in Slurm, I still don't get the logic of how it is done, or
what is wrong with my script.

We have 5 nodes, with 64 cores each. The test script I am trying to run is
very simple: print the identifier of the current job to a text file. Here
is the script:

#########
#!/bin/bash
#SBATCH --job-name=test_sbatch
#SBATCH --output=res_test.stdout
#SBATCH --ntasks=5
#SBATCH --ntasks-per-node=5
#SBATCH --array=1-5
#SBATCH -p shortq

sleep 10
echo "$SLURM_ARRAY_TASK_ID" > "test${SLURM_ARRAY_TASK_ID}.txt"
#########
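(For context on how these flags interact: each index of a job array is submitted as an independent job, and the resource flags apply to every element separately, so --ntasks=5 combined with --array=1-5 actually requests 5 tasks for each of the 5 array jobs. A one-core-per-element version might look like the sketch below; it assumes the same shortq partition, and %a is Slurm's standard filename pattern for the array index.)

```shell
#!/bin/bash
#SBATCH --job-name=test_sbatch
#SBATCH --output=res_test_%a.stdout   # %a = array index, one log per element
#SBATCH --ntasks=1                    # one single-core task per array element
#SBATCH --array=1-5
#SBATCH -p shortq

# SLURM_ARRAY_TASK_ID is set by Slurm inside an array job; default to 0
# so the script can also run standalone.
idx="${SLURM_ARRAY_TASK_ID:-0}"
echo "$idx" > "test${idx}.txt"
```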

This version of the script runs perfectly well (5 cores used on a single
node), but here is what happens if I change the parameter values (--array is
always set equal to --ntasks):
--ntasks   --ntasks-per-node   nodes used   simultaneous tasks on one node
    5              5               1                      5
   10             10               2                     6/4
   20             20               5                      3
   64             64               5                      1
   70             64               2                      1

Do you know why my tasks are spread across several nodes (even if I add the
--nodes=1 parameter) instead of using all the resources of a single node
before moving on to the next one?
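One way to see what Slurm actually granted is to print the allocation environment variables from inside the job; SLURM_JOB_NODELIST, SLURM_JOB_NUM_NODES, and SLURM_TASKS_PER_NODE are all standard Slurm variables. The script below is only a sketch and falls back to "unset" when run outside a job:

```shell
#!/bin/bash
#SBATCH --job-name=alloc_check
#SBATCH --ntasks=5
#SBATCH --nodes=1          # the constraint being tested: exactly one node
#SBATCH -p shortq

# These variables are set by Slurm inside a job; outside a job they are
# empty, hence the "unset" fallbacks.
nodelist="${SLURM_JOB_NODELIST:-unset}"
numnodes="${SLURM_JOB_NUM_NODES:-unset}"
per_node="${SLURM_TASKS_PER_NODE:-unset}"

echo "nodelist:       $nodelist"
echo "nodes granted:  $numnodes"
echo "tasks per node: $per_node"
```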


One other thing I can't understand is the squeue output. In the --ntasks=20
--ntasks-per-node=20 case, the output looks like this:
NODES NODELIST
1 node001
1 node001
1 node002
...

Whereas in the --ntasks=20 --ntasks-per-node=20 case, it looks like this:
NODES NODELIST
2 node[001-002]
2 node[003-004]

Could you explain this difference?
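For what it's worth, the bracketed form is just squeue's compressed node-list notation: node[001-002] means node001 and node002 on one line for a job spanning two nodes, while separate single-node lines are separate jobs each on one node. On a Slurm system, `scontrol show hostnames 'node[001-002]'` expands the notation; the same expansion for the simple single-range case can be sketched in plain bash (expand_nodelist is a hypothetical helper, not part of Slurm):

```shell
#!/bin/bash
# expand_nodelist: expand a simple Slurm bracket range like "node[001-004]"
# into one hostname per line, preserving zero-padding. Only the single-range
# case is handled here, as a sketch.
expand_nodelist() {
    local spec="$1"
    local re='^([^[]+)\[([0-9]+)-([0-9]+)\]$'
    if [[ "$spec" =~ $re ]]; then
        local prefix="${BASH_REMATCH[1]}"
        local lo="${BASH_REMATCH[2]}" hi="${BASH_REMATCH[3]}"
        local width="${#lo}" i
        for ((i = 10#$lo; i <= 10#$hi; i++)); do
            printf "%s%0*d\n" "$prefix" "$width" "$i"
        done
    else
        printf "%s\n" "$spec"   # plain hostname, no range to expand
    fi
}

expand_nodelist "node[001-002]"   # prints node001 and node002, one per line
```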


Thanks in advance,

Pierre-François
