Alessandro Italiano
2012-11-19 14:44:03 UTC
Hi,
we are going to evaluate Slurm as the batch system for our computing
farm (14k computing slots).
I've done some tests with the prolog scripts and noticed that:
1. when the "Prolog" script fails, the host where it failed is flagged
as DOWN and the job stays stuck in PENDING status;
2. when the "PrologSlurmctld" script fails, the job is CANCELLED.
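For context, the two scripts in the observations above are set through the slurm.conf parameters of the same names. A minimal sketch of such a setup, assuming hypothetical script paths (the paths are placeholders for illustration, not the ones used in these tests):

```
# slurm.conf fragment (paths are hypothetical)
# Run by slurmd on the compute node before the job starts
Prolog=/etc/slurm/prolog.sh
# Run by slurmctld on the controller host before the job is launched
PrologSlurmctld=/etc/slurm/prolog_slurmctld.sh
```

With a setup like this, a non-zero exit code from either script is what counts as a failure in the two cases listed.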
First of all, can someone confirm that this is the expected behavior?
Is there a way to configure Slurm so that a job is automatically
dispatched to a new host when the "Prolog" script fails?
Unfortunately, I didn't find an answer to these questions in the "Prolog
and Epilog Scripts" section of the slurm.conf man page.
Thanks in advance,
Alessandro