Bjørn-Helge Mevik
2012-09-04 10:42:05 UTC
We've switched to use the task/cgroup plugin to constrain the memory
usage on our cluster. (Slurm 2.4.1, Rocks 6.0 based on CentOS 6.2)
We have the following cgroup.conf:
-------------
###
### General settings
###
CgroupMountpoint=/dev/cgroup
CgroupAutomount=yes
#default: CgroupReleaseAgentDir=/etc/slurm/cgroup
###
### Task/cgroup plugin
###
#default: ConstrainCores=no
#default: TaskAffinity=no
ConstrainRAMSpace=yes
#default: ConstrainDevices=no
#default: AllowedDevicesFile=/etc/slurm/cgroup_allowed_devices_file.conf
-------------
I'm quite new to cgroups, so please forgive me if these are silly
questions:
******
1) Does ConstrainRAMSpace kill a process when the job uses too much
resident RAM, or too much resident RAM + swap?
I did a test on a node with 64530 MB RAM (according to "free -m"). The
node is configured in slurm to have 64530 MB RAM, and I ran a job with
--ntasks=1 --mem-per-cpu=64530.
The job started a C program that allocated a 65536 MB vector and then
started to fill it (i.e., actually use it); a minimal sketch of such a
program is included at the end of this question. The program was killed
by the oom-killer on the node, and /var/log/messages contained the
following:
Sep 4 11:43:53 compute-1-1 kernel: Task in /slurm/uid_10231/job_344/step_4294967294 killed as a result of limit of /slurm/uid_10231/job_344/step_4294967294
Sep 4 11:43:53 compute-1-1 kernel: memory: usage 64447376kB, limit 66078720kB, failcnt 0
Sep 4 11:43:53 compute-1-1 kernel: memory+swap: usage 66078720kB, limit 66078720kB, failcnt 49
To me, this looks like the job used 64447376 kB / 1024 ≈ 62937 MB of
resident memory, which is less than the limit (64530 MB), but it used
66078720 kB / 1024 = 64530 MB of memory + swap, which exactly equals the
limit, so it was killed.
Am I correct in this interpretation?
(It is not a problem if this is correct; we just want to be sure what
actually happens. If the cgroup only constrained resident memory, one
would expect this program not to be killed, because it would never be
able to get 64530 MB resident; experiments have shown that the limit is
about 62894 MB on these nodes.)
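For completeness, here is a minimal sketch of the kind of test program
used above (not the exact code that was run, just an illustration of
allocating a 65536 MB buffer and then filling it):
-------------
/* memeater.c -- allocate a 65536 MB buffer and then fill it, so that
 * the pages actually become resident.
 * Build with: gcc -O2 -o memeater memeater.c
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MEGABYTE (1024UL * 1024UL)
#define TOTAL_MB 65536UL          /* more than the node's 64530 MB of RAM */

int main(void)
{
    size_t size = TOTAL_MB * MEGABYTE;
    size_t filled;

    /* The allocation itself succeeded in our test; it is filling the
     * pages below that actually consumes RAM. */
    char *buf = malloc(size);

    if (buf == NULL) {
        perror("malloc");
        return 1;
    }

    /* Fill the buffer 1 GB at a time, so we can see how far we get
     * before the oom-killer steps in. */
    for (filled = 0; filled < size; filled += 1024 * MEGABYTE) {
        memset(buf + filled, 1, 1024 * MEGABYTE);
        printf("filled %zu MB\n", (filled + 1024 * MEGABYTE) / MEGABYTE);
        fflush(stdout);
    }

    free(buf);
    return 0;
}
-------------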
*******
2) Is it possible to get slurm to write a message to the job's stderr
(i.e., slurm-xxx.out) when a process is killed due to a task/cgroup
limit?
*******
3) The oom-killer is very talkative: killing the process above resulted
in about 200 lines in /var/log/messages. Is there a way to reduce the
"chatter" a bit (but not turn off logging altogether)?
(Any other comments and suggestions about the task/cgroup use are
also welcome!)
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Research Computing Services, University of Oslo