Discussion: Slurm GPU accounting
Jens Svalgaard Kohrt
2014-06-27 14:17:28 UTC
Hi,

We are trying to set up GPU accounting in a mixed environment with 12 CPU-only nodes and 12 nodes with two GPUs each. All nodes have 20 CPU cores.

Jobs are submitted to a partition containing all nodes and are allocated as follows:
* if a GPU is needed: on the GPU nodes
* if no GPU is needed: on any node (but only on GPU nodes if all CPU-only nodes are in use)

Everything seems to work, except that the GPUs are "free to use" as far as Slurm's fair-share accounting etc. is concerned.

Is it somehow possible to set this up so that, accounting-wise, getting a GPU corresponds to getting e.g. 10 extra CPU cores?
Using Google, I've only been able to find GPU accounting mentioned as future work.

In an ideal world it would be nice to be able to write a job submission/completion script that, given information about the requested/allocated
* number of CPU cores
* number of GPUs
* memory
* QOS
* maximum/actual running time
calculates the cost of running the job and updates the SlurmDBD database.
In my particular context, only something like this is needed:

cost_of_job = time_used * (total_cpus + 10*total_gpus)
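For example, a 12-hour job using 4 CPU cores and 2 GPUs would then cost 12 * (4 + 10*2) = 288, the same as a 12-hour job using 24 plain CPU cores.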

Can somebody give a hint on how to do this (if possible)?
If not, maybe point me to where in the Slurm source code I should start digging?

Thanks!

Jens
Paul Edmon
2014-06-27 14:27:27 UTC
Actually, a broader question would be how GPU use is charged back to fairshare. Do GPUs actually count? How much? This is an interesting question.

-Paul Edmon-
Post by Jens Svalgaard Kohrt
[...]
Bill Wichser
2014-06-27 15:01:32 UTC
We too would love to have a way of accounting for GPU usage. The hope was that there might be a record of the requested GRES resources in the accounting database, so that we could determine GPU usage externally. While we keep our GPU nodes in a separate partition, the best we can tell from accounting is that a job was a GPU job, not how many of the 1 to 4 GPUs it actually used.

Bill
Post by Paul Edmon
[...]
Trey Dockendorf
2014-06-27 15:23:29 UTC
Jens,

If I understand you correctly, you're wishing to update SlurmDBD after a job completes with a modified "usage" based on some criteria?

My guess is that it could be done with a plugin, but I'm unsure how.

If you wish to modify the job BEFORE it runs, so that the correct accounting data is uploaded upon completion, you could try using the "JobSubmitPlugins" parameter in slurm.conf.

For example:

JobSubmitPlugins=lua

Then in /etc/slurm/job_submit.lua you add some logic that says "if a GPU is needed, increase the CPU request by 10 per GPU".
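Something like this minimal, untested sketch might work (the "gpu:N" GRES string format, the job_desc field names, and the 10-cores-per-GPU weight are assumptions you would want to verify against your Slurm version; it raises min_cpus so that the extra CPUs are actually allocated and hence charged):

function slurm_job_submit(job_desc, part_list, submit_uid)
    -- job_desc.gres is nil when no generic resources were requested
    if job_desc.gres == nil then
        return slurm.SUCCESS
    end
    -- extract the GPU count from a GRES string such as "gpu:2";
    -- a bare "gpu" with no explicit count is treated as one GPU
    local ngpus = tonumber(string.match(job_desc.gres, "gpu:(%d+)"))
    if ngpus == nil and string.find(job_desc.gres, "gpu") then
        ngpus = 1
    end
    if ngpus ~= nil and ngpus > 0 then
        -- charge each GPU as 10 extra CPU cores; note that unset
        -- numeric fields may arrive as Slurm's NO_VAL sentinel rather
        -- than nil, which a production script should guard against
        job_desc.min_cpus = (job_desc.min_cpus or 1) + 10 * ngpus
        slurm.log_info("job_submit/lua: +%d CPUs for %d GPUs",
                       10 * ngpus, ngpus)
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end

One caveat: this actually allocates the extra CPUs rather than merely charging for them, so it also changes how jobs pack onto the nodes.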

The SLURM source contains an example job_submit.lua [1]. I also found another good example [2] using Google.

Might not be the approach you're looking for, but hopefully it sparks an idea :)

- Trey

[1] - https://github.com/SchedMD/slurm/blob/master/contribs/lua/job_submit.lua
[2] - https://github.com/edf-hpc/slurm-llnl-misc-plugins/blob/master/job_submit.lua

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org

----- Original Message -----
Sent: Friday, June 27, 2014 9:18:06 AM
Subject: [slurm-dev] Slurm GPU accounting
[...]
Jens Svalgaard Kohrt
2014-06-27 20:17:33 UTC
Trey,

This sounds like a good starting point. I’ll have a look next week and see whether this gets me anywhere.

The reason for this complicated setup is that the cluster was bought by several research groups pooling their grants.
The GPU nodes were approximately twice as expensive as the CPU-only nodes.
To be fair to all groups, a job using 20 CPU cores (one full node) should cost about the same as a job using one CPU core + two GPUs; with a weight of 10 CPUs per GPU, the latter would be charged 1 + 2*10 = 21 core-equivalents. Currently the GPU job only counts as 1/20 in the fairshare.

Cheers,

Jens
Post by Trey Dockendorf
[...]