Riccardo Murri
2014-09-19 16:01:32 UTC
Hello,
we are having an issue with SLURM killing jobs because of virtual
memory limits::
slurmstepd[46530]: error: Job 784 exceeded virtual memory limit
(416329820 > 211812352), being killed
The problem is that the job above actually has negligible heap use,
*but* it allocates a SysV shared memory segment of about 100GB. It
seems that the size of this shared memory segment is counted once for
*each* of the 4 processes in the job, instead of just once per job.
Is this expected, or did we misconfigure something?
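A rough check of the arithmetic is consistent with this reading. The sketch below assumes that both figures in the log line are in KiB (the message itself does not state a unit) and that the limit is the job's memory request scaled by VSizeFactor. The variable names are illustrative only.

```python
# Figures copied from the slurmstepd log line; the KiB unit is an assumption.
reported_vsize_kib = 416329820
limit_kib = 211812352

# From the report: 4 processes in the job, VSizeFactor=101 (percent).
num_processes = 4
vsize_factor_percent = 101

KIB_PER_GIB = 1024 * 1024

# Virtual size attributed to each process if the total is a plain sum.
per_process_gib = reported_vsize_kib / num_processes / KIB_PER_GIB

# Limit in GiB, and the memory request it implies under VSizeFactor.
limit_gib = limit_kib / KIB_PER_GIB
implied_request_gib = limit_gib * 100 / vsize_factor_percent

# Usage if the shared segment were counted only once.
counted_once_kib = reported_vsize_kib // num_processes

print(f"per-process share: {per_process_gib:.2f} GiB")
print(f"limit:             {limit_gib:.2f} GiB")
print(f"implied request:   {implied_request_gib:.2f} GiB")
print(f"over limit if summed:       {reported_vsize_kib > limit_kib}")
print(f"over limit if counted once: {counted_once_kib > limit_kib}")
```

Under those assumptions, the limit works out to 202 GiB (a 200 GiB request times 1.01), and the reported usage divided by 4 is roughly 99 GiB, which is about one copy of the segment per process. Counted once, the job would stay well under the limit.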
We are running SLURM 14.03.2. Possibly relevant configuration items::
# slurm.conf
JobAcctGatherType=jobacct_gather/linux
JobCompType=jobcomp/none
MpiDefault=none
ProctrackType=proctrack/pgid
PropagateResourceLimitsExcept=CPU
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
TaskPlugin=task/cgroup
VSizeFactor=101
# cgroup.conf
ConstrainCores=yes
Thanks for any suggestions!
Kind regards,
Riccardo
--
Riccardo Murri
http://www.s3it.uzh.ch/about/team/
S3IT: Services and Support for Science IT
University of Zurich
Winterthurerstrasse 190, CH-8057 Zürich (Switzerland)
Tel: +41 44 635 4222
Fax: +41 44 635 6888