Franco Broi
2014-08-29 02:38:32 UTC
Hi
Seen this a few times now: we have jobs queued that should be able to
run, but they won't start unless I restart the controller daemon. Other
jobs submitted more recently seem to be working fine.
I can see from the slurmctld log file with debug=9 that they are not
being tested to see if they are runnable. Does this mean that the daemon
has somehow forgotten about them?
I just restarted the daemon and they started immediately.
Any ideas how I can debug this if it happens again?
Cheers,