Discussion:
Scheduled node reboot
Uwe Sauter
2014-08-07 15:51:34 UTC
Hi all,

The "scontrol reboot_nodes" feature is a nice one, but I was wondering
whether I can see somewhere which nodes have already been rebooted and
which are still scheduled to reboot. I know I could script something
that gathers the uptime of all nodes, but it would be handy if "sinfo"
or "scontrol" simply showed whether a node will be rebooted after its
current job.
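
As a rough illustration of the scripted workaround mentioned above, here is a minimal Python sketch. It assumes the node records come from "scontrol --oneliner show node" (one node per line) and that each record carries a BootTime=YYYY-MM-DDTHH:MM:SS field; the function name, the sample records, and the requested_at timestamp are made up for the example and are not part of Slurm.

```python
import re
from datetime import datetime

# Hypothetical sample, shaped like "scontrol --oneliner show node" output.
SAMPLE = """\
NodeName=node01 Arch=x86_64 BootTime=2014-08-07T16:10:00 SlurmdStartTime=2014-08-07T16:11:02
NodeName=node02 Arch=x86_64 BootTime=2014-07-01T09:00:00 SlurmdStartTime=2014-07-01T09:01:12
NodeName=node03 Arch=x86_64 BootTime=None SlurmdStartTime=None
"""

NODE_RE = re.compile(r"NodeName=(\S+)")
BOOT_RE = re.compile(r"BootTime=(\S+)")


def pending_reboots(scontrol_output, requested_at):
    """Return the nodes whose last boot is older than requested_at.

    requested_at is the time the reboot was requested; the caller has to
    record it, because this sketch does not ask Slurm for it.
    """
    pending = []
    for line in scontrol_output.splitlines():
        name = NODE_RE.search(line)
        boot = BOOT_RE.search(line)
        if not name or not boot:
            continue  # not a node record
        try:
            booted_at = datetime.strptime(boot.group(1), "%Y-%m-%dT%H:%M:%S")
        except ValueError:
            continue  # boot time unknown, so nothing can be said about this node
        if booted_at < requested_at:
            pending.append(name.group(1))
    return pending


if __name__ == "__main__":
    print(pending_reboots(SAMPLE, datetime(2014, 8, 7, 15, 0, 0)))
```

On a real cluster the text could be fetched with subprocess.check_output(["scontrol", "--oneliner", "show", "node"]) instead of the sample string. Nodes with an unparseable boot time are skipped here rather than guessed at.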

Another useful thing would be if one could cancel a planned reboot.


Regards,

Uwe
j***@public.gmane.org
2014-08-07 17:57:37 UTC
Post by Uwe Sauter
Hi all,
The "scontrol reboot_nodes" feature is a nice one, but I was wondering
whether I can see somewhere which nodes have already been rebooted and
which are still scheduled to reboot. I know I could script something
that gathers the uptime of all nodes, but it would be handy if "sinfo"
or "scontrol" simply showed whether a node will be rebooted after its
current job.
Another useful thing would be if one could cancel a planned reboot.

I've been working in this area recently. This patch should do what you
want. It applies to Slurm version 14.11, which is currently under
development, but will probably work with 14.03 as well.

https://github.com/SchedMD/slurm/commit/c358dcd4d32088623ad7fe66d67e425193c0fa58.patch
--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
Uwe Sauter
2014-08-07 20:22:31 UTC
Hi Moe,

Good to hear. I'll check whether it compiles.

Regards,

Uwe