Discussion:
sbatch option to constrain one task per core
Edrisse Chermak
2014-10-15 12:26:36 UTC
Permalink
Dear Slurm Developers and Users,

I would like to constrain an 8-CPU job to run on a single 16-CPU socket,
with one task per core.
Unfortunately, when using this script:
---
sbatch -J $JOB -N 1 -B '1:8:1' --ntasks-per-socket=8 --ntasks-per-core=1
<< eof
...
mpirun -np 8 nwchem_64to32 $JOB.nwc >& $JOB.out
...
eof
---
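For reference, the terse -B '1:8:1' allocation spec can also be written with the equivalent long options, which are easier to audit (a sketch; -B is --extra-node-info=S[:C[:T]] per the sbatch man page, and job_script.sh is a placeholder for the batch script):

```shell
# Equivalent to -B '1:8:1': request 1 socket, 8 cores per socket,
# 1 thread per core on a single node, with one task bound per core.
sbatch -J "$JOB" -N 1 \
      --sockets-per-node=1 \
      --cores-per-socket=8 \
      --threads-per-core=1 \
      --ntasks-per-socket=8 \
      --ntasks-per-core=1 \
      job_script.sh
```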
the top command on the compute node shows two tasks running on the same core:
---
$ top
11838 11846 51 edrisse 20 0 12.3g 9452 95m R 46.7 0.0 0:01.43 nwchem_64to32
11838 11845 59 edrisse 20 0 12.3g 9600 96m R 46.4 0.0 0:01.42 nwchem_64to32
11838 11844 47 edrisse 20 0 12.3g 9592 95m R 46.4 0.0 0:01.42 nwchem_64to32
11838 11843 43 edrisse 20 0 12.3g 9844 96m R 46.4 0.0 0:01.42 nwchem_64to32
11838 11842 3 edrisse 20 0 12.3g 9.8m 96m R 46.4 0.0 0:01.43 nwchem_64to32
11838 11841 35 edrisse 20 0 12.3g 9.8m 92m R 45.7 0.0 0:01.41 nwchem_64to32
11838 11840 39 edrisse 20 0 12.3g 10m 96m R 46.1 0.0 0:01.42 nwchem_64to32
11838 11839 55 edrisse 20 0 12.3g 10m 109m R 46.4 0.0 0:01.42 nwchem_64to32
---
Unfortunately, cpu 55 and cpu 51 belong to the same core in our node's
architecture (see the lscpu output):
---
$ lscpu
CPU(s): 64
On-line CPU(s) list: 0-63
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 4
NUMA node(s): 8
...
NUMA node0 CPU(s): 0,4,8,12,16,20,24,28
...
NUMA node7 CPU(s): 3,7,11,15,19,23,27,31
---
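Which logical CPUs share a physical core can also be confirmed directly from sysfs, independently of lscpu: each CPU's thread_siblings_list names the logical CPUs on the same core (a quick check using standard Linux sysfs paths; on the node above, cpu51 and cpu55 would be expected to report each other as siblings if they truly share a core):

```shell
# For each online logical CPU, print the sibling CPUs on the same physical core.
for cpu in /sys/devices/system/cpu/cpu[0-9]*; do
    printf '%s: %s\n' "${cpu##*/cpu}" "$(cat "$cpu/topology/thread_siblings_list")"
done | sort -n
```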
Perhaps I missed something; if you could guide me to the right option, it
would be great.
I also attached my slurm.conf file.

Best Regards,
Edrisse

--
Edrisse Chermak
Post-Doctoral Fellow
Catalysis center - KAUST, Thuwal, Saudi Arabia
kcc.kaust.edu.sa


________________________________

This message and its contents including attachments are intended solely for the original recipient. If you are not the intended recipient or have received this message in error, please notify me immediately and delete this message from your computer system. Any unauthorized use or distribution is prohibited. Please consider the environment before printing this email.
Edrisse Chermak
2014-10-15 12:31:52 UTC
Permalink
My mistake, I forgot to include the full lscpu NUMA output:

NUMA node0 CPU(s): 0,4,8,12,16,20,24,28
NUMA node1 CPU(s): 32,36,40,44,48,52,56,60
NUMA node2 CPU(s): 1,5,9,13,17,21,25,29
NUMA node3 CPU(s): 33,37,41,45,49,53,57,61
NUMA node4 CPU(s): 2,6,10,14,18,22,26,30
NUMA node5 CPU(s): 34,38,42,46,50,54,58,62
NUMA node6 CPU(s): 35,39,43,47,51,55,59,63
NUMA node7 CPU(s): 3,7,11,15,19,23,27,31

Thanks in advance,
Edrisse
Post by Edrisse Chermak
[...]
Danny Auble
2014-10-15 12:46:52 UTC
Permalink
What happens if you use srun instead of mpirun?
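A minimal sketch of what that could look like, with srun launching the tasks and binding them to cores itself. The --cpu_bind flag requires the task/affinity plugin, and the MPI launch mode depends on the site's Slurm/MPI setup, so treat this as an assumption to adapt; input/output file names are placeholders:

```shell
#!/bin/bash
#SBATCH -J myjob              # placeholder job name
#SBATCH -N 1
#SBATCH --ntasks=8
#SBATCH --ntasks-per-socket=8
#SBATCH --ntasks-per-core=1

# srun inherits the job's task layout and pins one task per core;
# "verbose" reports the actual binding mask chosen for each task.
srun --cpu_bind=verbose,cores nwchem_64to32 input.nwc > output.out 2>&1
```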
Post by Edrisse Chermak
[...]
Edrisse Chermak
2014-10-16 04:39:47 UTC
Permalink
Hi Danny,
Thanks for your kind advice. I'll try to get this done using the srun
command and test its options.
Best Regards,
Edrisse
Post by Danny Auble
What happens if you use srun instead of mpirun?
On October 15, 2014 5:31:42 AM PDT, Edrisse Chermak
[...]