Using control groups to restrict resource usage on BG/Q launch node?
Christopher Samuel
2014-09-22 13:07:33 UTC
Hi folks,

We have had a number of occasions where users run front-end code in
their Slurm jobs on our launch nodes for our BlueGene/Q system and use
up a large amount of memory/CPU by mistake.

Now on a BG/Q system, all that jobs on the launch node are meant to be
doing is starting back-end applications via srun, which then run
on the compute nodes. So if a user manages to run the system out of
memory and triggers the OOM killer, and that (say) kills GPFS, then we
lose everyone's jobs that are running on that launch node.

I'd like to be able to use control groups to limit the amount of memory
any single job can use on the launch node, and I'm wondering if anyone
else is doing this?

I can see you can set cores to be unconstrained (which is important as
our front-end node doesn't have 65,535 cores to match our 4-rack system)
so it would appear possible, but I'd love to hear from anyone who is...

Of course there may be occasions where only part of the application runs
on the back end and other parts the users need will have to run on
the front end (yes GROMACS, I'm looking at you), but if we permit users
to request memory then we are likely to be able to handle this gracefully.

Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Carlos Fenoy
2014-09-23 16:00:34 UTC
Hi Chris,
You can create a cgroup that restricts only memory, with the limit set to
the front-end node's total memory minus the amount needed by the system,
and attach the Slurm processes to that cgroup. This way, if the OOM killer
is invoked in the cgroup, it will only kill tasks belonging to the cgroup.
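
A rough sketch of that idea in Python, in case it helps. It assumes the
cgroup v1 memory controller (files memory.limit_in_bytes and tasks); the
mount point /sys/fs/cgroup/memory, the group name "slurm_frontend" and the
8 GiB system reserve are all made-up values for illustration, not anything
from a real site configuration.

```python
import os


def parse_mem_total_bytes(meminfo_text):
    """Return MemTotal from /proc/meminfo-style text, in bytes."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like: "MemTotal:       16384256 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def job_memory_limit(total_bytes, system_reserve_bytes):
    """Total memory minus what is set aside for the system (GPFS etc.)."""
    limit = total_bytes - system_reserve_bytes
    if limit <= 0:
        raise ValueError("system reserve exceeds total memory")
    return limit


def configure_cgroup(cgroup_dir, limit_bytes, pids):
    """Create the cgroup, set its memory limit and attach the given PIDs."""
    os.makedirs(cgroup_dir, exist_ok=True)
    with open(os.path.join(cgroup_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(limit_bytes))
    for pid in pids:
        # cgroup v1 expects one PID per write to the tasks file
        with open(os.path.join(cgroup_dir, "tasks"), "a") as f:
            f.write(f"{pid}\n")


if __name__ == "__main__":
    # Hypothetical values: adjust the reserve and path for your node.
    SYSTEM_RESERVE = 8 * 2**30
    CGROUP_DIR = "/sys/fs/cgroup/memory/slurm_frontend"
    with open("/proc/meminfo") as f:
        total = parse_mem_total_bytes(f.read())
    limit = job_memory_limit(total, SYSTEM_RESERVE)
    print(f"would set {CGROUP_DIR} limit to {limit} bytes")
    # Needs root; pass the PIDs of the Slurm daemon/job processes:
    # configure_cgroup(CGROUP_DIR, limit, pids=[...])
```

Child processes inherit the cgroup of their parent, so attaching the Slurm
daemon on the launch node should be enough for the job steps it spawns to
be covered by the same limit.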

Carles Fenoy