cgroup freezer throwing "Device or resource busy" upon job cancel or kill - 14.03.6
Trey Dockendorf
2014-08-13 01:56:35 UTC
The cgroup functionality is working great, with one slight exception: when I cancel jobs or they are killed by SLURM, an error is printed to the job's stderr and to the slurmd logs:

[2014-08-12T20:32:09.256] [6254] sending task exit msg for 1 tasks status 15
[2014-08-12T20:32:09.257] [6254] parameter 'freezer.state' set to 'THAWED' for '/cgroup/freezer/slurm/uid_1380/job_6254/step_batch'
[2014-08-12T20:32:09.257] [6254] killing process 20661 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20662 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20663 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20664 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20665 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20666 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] _slurm_cgroup_destroy: problem deleting step cgroup path /cgroup/freezer/slurm/uid_1380/job_6254/step_batch: Device or resource busy
[2014-08-12T20:32:09.257] [6254] parameter 'freezer.state' set to 'THAWED' for '/cgroup/freezer/slurm/uid_1380/job_6254/step_batch'
[2014-08-12T20:32:09.257] [6254] killing process 20663 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] Entering _handle_accept (new thread)
[2014-08-12T20:32:09.257] [6254] killing process 20665 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20666 (inherited_task) with signal 9

This does not seem to affect functionality, but I am curious whether there is something I can do to remedy it or whether this is a bug.

This is slurm-14.03.6 running CentOS 6.5 kernel 2.6.32-431.23.3.el6.x86_64

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org
Kilian Cavalotti
2014-08-13 06:05:30 UTC
Post by Trey Dockendorf
This is slurm-14.03.6 running CentOS 6.5 kernel 2.6.32-431.23.3.el6.x86_64
Exact same behavior here, same Slurm version and same kernel.

Cheers,
--
Kilian
Trey Dockendorf
2014-08-13 15:22:32 UTC
Kilian,

Thanks for confirming that others are seeing this.

- Trey

David Bigagli
2014-08-13 17:00:35 UTC
For some reason at the first attempt rmdir(2) returns EBUSY.
Thanks,
/David/Bigagli

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
www.schedmd.com
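
The first-attempt EBUSY described above can be tolerated with a short retry loop. The following is a minimal Python sketch of that idea, not Slurm's actual code: the function name `rmdir_with_retry`, the `attempts` and `delay` defaults, and the injectable `rmdir` parameter are all assumptions made for illustration.

```python
import errno
import os
import time


def rmdir_with_retry(path, attempts=5, delay=0.1, rmdir=os.rmdir):
    """Remove `path`, retrying while the kernel reports EBUSY.

    `attempts`, `delay`, and the injectable `rmdir` are illustrative
    choices, not values taken from Slurm.
    """
    for attempt in range(1, attempts + 1):
        try:
            rmdir(path)
            return attempt  # number of tries it took
        except OSError as exc:
            # Only EBUSY is treated as transient; anything else is re-raised,
            # and so is EBUSY once the last attempt has been used.
            if exc.errno != errno.EBUSY or attempt == attempts:
                raise
            time.sleep(delay)
```

Called as `rmdir_with_retry("/cgroup/freezer/slurm/uid_1380/job_6254/step_batch")`, it would only surface an error if the directory were still busy after the last attempt.
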
Kilian Cavalotti
2014-08-13 23:16:37 UTC
Post by David Bigagli
For some reason at the first attempt rmdir(2) returns EBUSY.
Would writing to memory.force_empty before calling rmdir() help?
See http://lxr.free-electrons.com/source/Documentation/cgroups/memory.txt?v=2.6.32#L269

Cheers,
--
Kilian
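
The suggestion above could be sketched roughly as follows. This is a hypothetical Python illustration, not anything taken from Slurm: the helper names `force_empty` and `remove_memory_cgroup` are invented here. The `memory.force_empty` file exists only under the cgroup v1 memory controller, so the write is skipped when the file is absent, as it would be under a freezer hierarchy.

```python
import os


def force_empty(cgroup_dir):
    """Write to memory.force_empty if the cgroup has one; return True if written."""
    knob = os.path.join(cgroup_dir, "memory.force_empty")
    if not os.path.exists(knob):
        return False  # not a memory-controller cgroup, nothing to do
    with open(knob, "w") as fh:
        fh.write("0")  # the kernel documentation's example writes 0
    return True


def remove_memory_cgroup(cgroup_dir):
    """Ask the memory controller to reclaim pages, then remove the cgroup."""
    force_empty(cgroup_dir)
    os.rmdir(cgroup_dir)
```

The kernel documentation notes that `memory.force_empty` can only be used once the cgroup has no tasks left, so the write belongs after the job's processes have exited.
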
David Bigagli
2014-08-13 23:21:32 UTC
Interesting indeed. Let me have a look at it and experiment with it a bit.
Thanks,
/David/Bigagli

David Bigagli
2014-08-18 18:18:49 UTC
Unfortunately, the article refers to the memory subsystem, whose cgroup
is removed without a problem. The issue happens in the freezer subsystem;
however, it is just an error message with no consequences.
Thanks,
/David/Bigagli
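
The thread does not establish why the freezer cgroup is busy. One plausible explanation, stated here as an assumption and not confirmed by anyone above, is that the processes that were just sent signal 9 have not finished exiting when rmdir(2) runs; a cgroup v1 directory cannot be removed while its `tasks` file still lists members. A small Python sketch for checking this by hand follows; the helper names `read_tasks` and `wait_until_empty` and the timing defaults are hypothetical.

```python
import os
import time


def read_tasks(cgroup_dir):
    """Return the task IDs listed in a cgroup v1 `tasks` file."""
    with open(os.path.join(cgroup_dir, "tasks")) as fh:
        return [int(line) for line in fh if line.strip()]


def wait_until_empty(cgroup_dir, timeout=2.0, interval=0.05):
    """Poll until no tasks remain or `timeout` seconds pass; True if empty."""
    deadline = time.monotonic() + timeout
    while True:
        if not read_tasks(cgroup_dir):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

If `wait_until_empty` returns True and a subsequent rmdir still fails with EBUSY, the assumption above would not explain the message.
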
