Discussion:
Resource allocation
Mike Johnson
2014-06-26 21:49:34 UTC
Hi

I have a question regarding splitting resources on a single node
across two partitions.

We have a number of nodes that contain 24 cores and 4 GPUs. What I'd
like is one of the following two options:

1. One partition. Jobs that need CPU cores only will only be
scheduled up to the point that 16 cores are used on a GPU node. This
leaves space for jobs that need a GPU.

2. We can have a CPU and a GPU partition. 16 cores per GPU node are
allocated to the CPU queue and the remaining 8 cores and 4 GPUs are
allocated to the GPU queue.

It gets more complex, though. Some nodes have 32, 48 or 160 cores and
no GPUs; obviously these would be in the CPU partition only. A blanket
per-partition CPU limit would cause problems on these nodes: if
everything is capped at 16 cores, most of the cores on the larger
nodes are wasted. Is it possible to have this level of control?
We'd effectively be defining limits on a per-node basis.
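[A minimal slurm.conf sketch of option 2, assuming Slurm's MaxCPUsPerNode partition parameter; the node and partition names are made up for illustration. Because MaxCPUsPerNode is a single partition-wide value rather than a per-node one, the large CPU-only nodes are kept out of the capped partition here:]

```
# Hypothetical node definitions (names are illustrative only).
GresTypes=gpu
NodeName=gpu[01-04]  CPUs=24  Gres=gpu:4
NodeName=big[01-04]  CPUs=48           # also 32- and 160-core variants

# CPU-only jobs on the GPU nodes: capped at 16 of the 24 cores,
# leaving 8 cores free for GPU jobs.
PartitionName=cpu     Nodes=gpu[01-04]  MaxCPUsPerNode=16
# CPU-only nodes, uncapped; CPU jobs can target both queues at once,
# e.g. sbatch --partition=cpu,bigcpu ...
PartitionName=bigcpu  Nodes=big[01-04]
# GPU jobs see the whole GPU node: the remaining cores plus 4 GPUs.
PartitionName=gpu     Nodes=gpu[01-04]
```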

Anyone got any tips? I think it should be possible; I just don't know
how I'd go about defining it.

Thanks!
Mike
Trey Dockendorf
2014-06-27 02:52:31 UTC
I can't offer any advice on the GPU allocation, as I haven't had to configure such a setup, but the mixed CPU count per node is something I've dealt with. In our case we set both memory and CPUs (cores) as consumable resources, and our limits are set in QOS. With per-core scheduling, mixing servers with varying CPU counts is not an issue.

For us, we have a mix of 8- and 32-core systems with 16, 32, 64 or 128 GB of memory. Half are on old GigE and half on IB. We've split our partitions based on our desired preemption behaviour, and then lock accounts (which correspond to Unix groups) and QOS to specific partitions. Per-core scheduling made this possible.
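[The consumable-resource setup described above might look roughly like the following; this is a sketch under assumed names and limit values, not the actual site configuration. GrpCPUs/GrpMem were the QOS-level limit names in Slurm of that era:]

```
# slurm.conf: schedule per core and per MB of memory rather than
# per whole node, so nodes with different CPU counts mix freely.
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
# Default memory charged per allocated core if a job doesn't ask.
DefMemPerCPU=2048

# Shell (sacctmgr): create a QOS carrying the limits and attach it
# to an account corresponding to a Unix group (names hypothetical).
#   sacctmgr add qos cpu_normal GrpCPUs=256 GrpMem=512000
#   sacctmgr modify account chemistry set QOS=cpu_normal
```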

Not sure if that helps, or if I misunderstood the issue.

- Trey

