Uwe Sauter
2014-08-14 09:11:31 UTC
Hi all,
I have a question about a configuration detail: "dynamic partitions".
Situation:
I operate a Linux cluster of currently 54 nodes for a collaboration between
two institutes at the university. To reflect the ratio of money the
institutes invested, I configured SLURM with two partitions, one for each
institute. Each partition has a fixed number of nodes hard-assigned to it,
e.g.
PartitionName=InstA Nodes=n[01-20]
PartitionName=InstB Nodes=n[21-54]
To improve availability when nodes break (and perhaps to save some power),
I'd like to configure SLURM so that jobs can be assigned nodes from the
whole pool while still respecting the number of nodes each institute bought.
Research so far:
There is a partition configuration option called "MaxNodes", but the man
pages state that it restricts the maximum number of nodes PER JOB.
It is probably possible to get something similar working using limit
enforcement through accounting, but I haven't understood that part of
SLURM 100% yet.
BlueGene systems seem to have a similar capability, but that is available
on IBM systems only.
Question:
Is it possible to configure SLURM so that both partitions can utilize all
nodes but respect a maximum number of nodes that may be in use at the same
time? Something like:
PartitionName=InstA Nodes=n[01-54] MaxPartNodes=20
PartitionName=InstB Nodes=n[01-54] MaxPartNodes=34
So, is there a way to achieve this using the config file? Do I have to use
accounting to enforce the limits? Or is there another way that I don't see?
Best regards,
Uwe Sauter