Discussion:
Warning: RLIMIT_MEMLOCK is 32768 bytes suddenly appears
Всеволод Никоноров
2014-10-13 12:10:29 UTC
Hi,

I am seeing the mentioned warning now and then. As I discovered, there is a configuration parameter in slurm.conf, PropagateResourceLimitsExcept, which determines which resource limits are not propagated to submitted jobs. I have this parameter set to "NOFILE", meaning that the system value of RLIMIT_MEMLOCK should be propagated (my /etc/security/limits.conf contains the lines "* soft memlock unlimited" and "* hard memlock unlimited", which, as I understand it, set RLIMIT_MEMLOCK to unlimited). Nevertheless, my users sometimes complain about the mentioned warning appearing in their jobs' output files. I use the following command to check whether the node where a user's job failed is configured properly:

srun -w <node name> bash -c "ulimit -l"

Sometimes I get "32" as the result. Restarting slurmd on the node fixes the issue. Is this a known problem in slurm-14.03.7?
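
To avoid probing nodes one at a time, I could imagine sweeping the whole cluster with something like the sketch below (the sinfo format string and flags are my assumptions, and the expected value would need adjusting to the site's policy):

```shell
#!/bin/bash
# Sketch: report every node whose locked-memory limit, as seen by a
# job step, differs from the expected value. Assumes srun and sinfo
# are on PATH; "%N" is the stock sinfo node-name format field.

expected="unlimited"

# True when the reported "ulimit -l" value matches the expected one.
limit_ok() {
    [ "$1" = "$expected" ]
}

# One node name per line; sort -u collapses per-partition duplicates.
nodes=$(sinfo -hN -o "%N" 2>/dev/null | sort -u) || nodes=""

for node in $nodes; do
    actual=$(srun -w "$node" -N1 bash -c 'ulimit -l' 2>/dev/null) || actual="?"
    if ! limit_ok "$actual"; then
        echo "$node: ulimit -l is '$actual' (expected $expected)"
    fi
done
```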

Thanks in advance!
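
P.S. For reference, the configuration described above amounts to roughly this (paraphrased, not a verbatim copy of my files):

```
# slurm.conf: do not propagate RLIMIT_NOFILE; everything else,
# including RLIMIT_MEMLOCK, is propagated from the submit host
PropagateResourceLimitsExcept=NOFILE

# /etc/security/limits.conf: no locked-memory cap for any user
*   soft   memlock   unlimited
*   hard   memlock   unlimited
```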
j***@public.gmane.org
2014-10-13 19:47:41 UTC
perhaps this will help:
http://slurm.schedmd.com/faq.html#rlimit
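
In general the question is what limit the job step actually sees versus what was in effect at submit time. `ulimit -l` shows only the soft limit; checking the hard limit too tells you whether the soft limit could simply be raised. A generic shell check, not Slurm-specific:

```shell
# Show both soft and hard locked-memory limits in the current shell.
# "unlimited" corresponds to RLIM_INFINITY; numeric values are in KB.
soft=$(ulimit -Sl)
hard=$(ulimit -Hl)
echo "soft memlock limit: $soft"
echo "hard memlock limit: $hard"
```

Running the same two lines under `srun -w <node name>` would show what a job step on that node inherits.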
--
Morris "Moe" Jette
CTO, SchedMD LLC
Всеволод Никоноров
2014-10-15 11:35:33 UTC
"Can't propagate RLIMIT_...".

I don't see such messages in my slurm.log. Thanks anyway.

Maybe there is an error in that article:

---citation---
When the srun command executes, it captures the resource limits in effect at submit time. These limits are propagated to the allocated _nodes_ before initiating the user's job. The SLURM daemon running on _that node_ then tries to establish identical resource limits for the job being initiated.
---end of citation---

The second sentence says _nodes_, plural, but the third sentence refers to them as "that node", singular.
--
Vsevolod Nikonorov, JSC NIKIET