Mikael Johansson
2014-10-20 17:51:38 UTC
Hello All,
I've been scratching my head for a while now trying to figure out
what I would think is a rather common setup.
I would need to set up a partition (or whatever, maybe a partition is
actually not the way to go) with the following properties:
1. If there are any unused cores on the cluster, jobs submitted to this
partition would use them, and have access to them immediately.
2. The jobs should only use these resources until _any_ other job in
another partition needs them. In this case, the jobs should be
preempted and requeued.
So this should be some sort of "shadow" queue/partition, that shouldn't
affect the scheduling of other jobs on the cluster, but just use up any
free resources that momentarily happen to be available. So SLURM should
just continue scheduling everything else normally, and treat the cores
used by this shadow queue as free resources, and then just immediately
cancel and requeue any jobs there, when a "real" job starts.
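For concreteness, the behaviour described above might map onto
partition-priority preemption roughly as follows. This is an untested
sketch, not a known-good config: the partition names "normal" and
"shadow" and the node list node[01-10] are hypothetical, and older
SLURM releases spell the partition option Priority= rather than
PriorityTier=.

```
# slurm.conf fragment (untested sketch; names and node list are made up)

# Let jobs in a higher-priority partition preempt jobs in a lower one,
# and requeue the preempted jobs instead of just killing them.
PreemptType=preempt/partition_prio
PreemptMode=REQUEUE
JobRequeue=1

# "Real" partition: higher tier, never preempted itself.
PartitionName=normal Nodes=node[01-10] Default=YES PriorityTier=10 PreemptMode=OFF

# "Shadow" partition: same nodes, lowest tier, requeued on preemption.
PartitionName=shadow Nodes=node[01-10] PriorityTier=1 PreemptMode=REQUEUE GraceTime=0
```

Whether the scheduler would then really treat the cores used by the
shadow partition as free when planning the other jobs is exactly the
part I am unsure about.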
If anyone has something like this set up, example configs would be very
welcome, as would, of course, any other suggestions and ideas.
Cheers,
Mikael J.
http://www.iki.fi/~mpjohans/