Всеволод Никоноров
2014-10-13 12:10:29 UTC
Hi,
I am seeing the mentioned warning now and then. As I discovered, there is a configuration parameter in slurm.conf, PropagateResourceLimitsExcept, which determines whether the system rlimit_memlock value is propagated to a submitted job or not. I have this parameter set to "NOFILE", which should mean that the system value of rlimit_memlock is propagated (my /etc/security/limits.conf contains the lines "* soft memlock unlimited" and "* hard memlock unlimited", which, as I understand it, set rlimit_memlock to unlimited). Nevertheless, my users sometimes complain that the warning shows up in their jobs' output files. I use the following command to check whether the node where a user's job failed is configured properly:
srun -w <node name> bash -c "ulimit -l"
Sometimes I get "32" as the result instead of "unlimited". Restarting slurmd on the node fixes the issue. Is this a known problem in slurm-14.03.7?
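In case it helps to reproduce this, the same check can be looped over several nodes. This is only a minimal sketch: the function names query_nodes and report_bad_memlock and the node names in the usage comment are placeholders I made up, and it assumes srun is allowed to run a one-off command on each node.

```shell
#!/usr/bin/env bash

# Hypothetical helper: reads "<node name> <ulimit -l output>" lines on stdin
# and prints only the nodes whose memlock limit is not "unlimited".
report_bad_memlock() {
    local node limit
    while read -r node limit; do
        if [ "$limit" != "unlimited" ]; then
            echo "$node: memlock limit is ${limit:-unknown}"
        fi
    done
}

# Hypothetical helper: runs the same srun check as above on every node
# passed as an argument and prints "<node name> <limit>" for each one.
query_nodes() {
    local node
    for node in "$@"; do
        # stdin is redirected so srun does not consume the caller's input;
        # if srun fails, the limit field is left empty.
        echo "$node $(srun -w "$node" bash -c 'ulimit -l' </dev/null 2>/dev/null)"
    done
}

# Usage (node names are placeholders):
#   query_nodes node01 node02 | report_bad_memlock
```

A node that prints nothing is reporting "unlimited"; a node listed with "unknown" is one where srun itself did not return a value.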
Thanks in advance!