Discussion:
Failure detection / transaction when scheduling job steps
Jeroen Meijer
2014-06-19 11:32:06 UTC
When we submit a job with sbatch that launches ~10000 job steps, errors
often occur while the steps are being scheduled. This happens even though
we have consulted
https://computing.llnl.gov/linux/slurm/high_throughput.html. Is it possible
to cancel the whole job when scheduling a job step fails? Or is it possible
to schedule the job steps in a transaction, so that either none or all of
them get executed?