Trey Dockendorf
2014-08-13 01:56:35 UTC
The cgroup functionality is working great, with the slight exception that when I cancel jobs or they are killed by SLURM, an error is printed to the job's stderr and to the slurmd logs:
[2014-08-12T20:32:09.256] [6254] sending task exit msg for 1 tasks status 15
[2014-08-12T20:32:09.257] [6254] parameter 'freezer.state' set to 'THAWED' for '/cgroup/freezer/slurm/uid_1380/job_6254/step_batch'
[2014-08-12T20:32:09.257] [6254] killing process 20661 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20662 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20663 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20664 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20665 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20666 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] _slurm_cgroup_destroy: problem deleting step cgroup path /cgroup/freezer/slurm/uid_1380/job_6254/step_batch: Device or resource busy
[2014-08-12T20:32:09.257] [6254] parameter 'freezer.state' set to 'THAWED' for '/cgroup/freezer/slurm/uid_1380/job_6254/step_batch'
[2014-08-12T20:32:09.257] [6254] killing process 20663 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] Entering _handle_accept (new thread)
[2014-08-12T20:32:09.257] [6254] killing process 20665 (inherited_task) with signal 9
[2014-08-12T20:32:09.257] [6254] killing process 20666 (inherited_task) with signal 9
This does not seem to affect functionality, but I am curious whether there is something I can do to remedy this or whether this is a bug.
This is slurm-14.03.6 running on CentOS 6.5, kernel 2.6.32-431.23.3.el6.x86_64
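In case it helps anyone reproduce this: on cgroup v1, removing a cgroup directory fails with "Device or resource busy" while it still has tasks attached or child cgroups. Below is a minimal sketch for checking what is left in the step cgroup at the moment the error appears. The helper names (remaining_pids, child_cgroups) are my own and not part of SLURM, and the path is simply the one from the log above, so treat it as a diagnostic aid under those assumptions rather than a fix.

```python
import os


def remaining_pids(cgroup_dir):
    """Return the PIDs still listed in a cgroup v1 directory's 'tasks' file.

    Hypothetical helper: an empty list means no tasks are attached
    (or the directory has already been removed).
    """
    tasks_file = os.path.join(cgroup_dir, "tasks")
    try:
        with open(tasks_file) as fh:
            return [int(line) for line in fh if line.strip()]
    except FileNotFoundError:
        return []


def child_cgroups(cgroup_dir):
    """Return the names of subdirectories (child cgroups) of cgroup_dir."""
    try:
        return sorted(
            name
            for name in os.listdir(cgroup_dir)
            if os.path.isdir(os.path.join(cgroup_dir, name))
        )
    except FileNotFoundError:
        return []


if __name__ == "__main__":
    # Path taken from the slurmd log above; adjust for the job being inspected.
    step_dir = "/cgroup/freezer/slurm/uid_1380/job_6254/step_batch"
    print("tasks still attached:", remaining_pids(step_dir))
    print("child cgroups:", child_cgroups(step_dir))
```

If the task list is non-empty right after the kill signals are sent, that would suggest the processes simply had not exited yet when the cgroup removal was attempted, but I have not confirmed that this is what happens here.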
Thanks,
- Trey
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org