Christopher Samuel
2013-01-18 06:05:02 UTC
Hi folks,
I'm playing with Slurm 2.5.1 on a RHEL 6.3 box and trying to get it to
limit the amount of memory a job can use. To demonstrate the issue I've
got a program that just loops, allocating RAM in 1 GB chunks.
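For reference, the test program is essentially the following (a minimal
sketch rather than the exact source; it just malloc()s and touches memory
in 1 GB chunks until allocation fails, then exits non-zero):

/* memtest.c - allocate and touch RAM in 1 GB chunks until malloc() fails */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define GB (1024UL * 1024UL * 1024UL)

int main(void)
{
    unsigned long n = 0;
    for (;;) {
        char *p = malloc(GB);
        if (p == NULL) {
            perror("Malloc failed");   /* prints "Malloc failed: Cannot allocate memory" */
            return 1;
        }
        memset(p, 1, GB);              /* touch the pages so they are actually used */
        printf("Allocated %lu GB\n", ++n);
        fflush(stdout);
        sleep(1);                      /* pause so the cgroup can be inspected */
    }
}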
My /etc/slurm/cgroup.conf has:
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm/cgroup"
ConstrainCores=yes
ConstrainRAMSpace=yes
ConstrainSwapSpace=yes
Slurm has all the cgroup plugins enabled:
ProctrackType=proctrack/cgroup
TaskPlugin=task/cgroup
JobAcctGatherType=jobacct_gather/cgroup
I've also confirmed that cgroups are being created and destroyed for
these jobs.
By default (with overcommit disabled) I get:
[***@qan02 Run1]# srun -n1 --mem=4G ./memtest
Malloc failed: Cannot allocate memory
Allocated 1 GB
Allocated 2 GB
Allocated 3 GB
Allocated 4 GB
Allocated 5 GB
Allocated 6 GB
Allocated 7 GB
Allocated 8 GB
Allocated 9 GB
Allocated 10 GB
Allocated 11 GB
Allocated 12 GB
Allocated 13 GB
Allocated 14 GB
Allocated 15 GB
Allocated 16 GB
Allocated 17 GB
Allocated 18 GB
Allocated 19 GB
Allocated 20 GB
Allocated 21 GB
Allocated 22 GB
Allocated 23 GB
Allocated 24 GB
Allocated 25 GB
Allocated 26 GB
Allocated 27 GB
Allocated 28 GB
Allocated 29 GB
Allocated 30 GB
Allocated 31 GB
Allocated 32 GB
Allocated 33 GB
Allocated 34 GB
Allocated 35 GB
srun: error: qan02: task 0: Exited with exit code 1
Which shows that the job can allocate as much memory as the machine has,
despite Slurm being told it only wants 4 GB of RAM.
Poking at the cgroup when I make the program sleep between allocations,
I see that the memory cgroup doesn't appear to have any limits set on
its usage:
memory.limit_in_bytes          9223372036854775807
memory.memsw.limit_in_bytes    9223372036854775807
memory.soft_limit_in_bytes     9223372036854775807
Which would explain a lot: those values are all 2^63-1, the kernel's
default for "unlimited", so it looks like no limit is being written at
all; with the 4 GB request applied I'd expect memory.limit_in_bytes to
be around 4294967296.
Any ideas?
cheers!
Chris
--
Christopher Samuel        Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci