Jens Svalgaard Kohrt
2014-06-27 14:17:28 UTC
Hi,
We are trying to set up GPU accounting in a mixed environment with 12 CPU-only nodes and 12 nodes that each have 2 GPUs. All nodes have 20 CPU cores.
Jobs are submitted to a partition containing all nodes and are allocated as follows:
* if a GPU is needed: on the GPU nodes
* if no GPU is needed: on any node (but only on GPU nodes if all CPU-only nodes are in use)
Everything seems to work, except that the GPUs are "free to use" with respect to Slurm's fair-share accounting etc.
Is it somehow possible to set this up so that, accounting-wise, getting a GPU corresponds to getting, e.g., 10 extra CPU cores?
Using Google I’ve only been able to find something about GPU accounting as future work.
In an ideal world it would be nice to be able to write a job submit/completion script that, given information about the requested/allocated
* # CPU cores
* # GPUs
* amount of memory
* QOS
* maximum/actual running time
calculates the cost of running the job and updates the SlurmDBD database.
In my particular context, only something like this is needed:
cost_of_job = time_used * (total_cpus + 10*total_gpus)
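Just to make that concrete, here is a rough Python sketch of the calculation I have in mind; the 10x weighting and all names are placeholders I made up, not anything Slurm provides:

    #!/usr/bin/env python
    # Hypothetical sketch only: charge each job in "core-seconds", counting
    # every allocated GPU as GPU_CPU_EQUIVALENT extra CPU cores.
    # The weighting factor and the function name are placeholders.

    GPU_CPU_EQUIVALENT = 10

    def job_cost(elapsed_seconds, alloc_cpus, alloc_gpus):
        """Cost of one finished job in core-seconds."""
        return elapsed_seconds * (alloc_cpus + GPU_CPU_EQUIVALENT * alloc_gpus)

    if __name__ == "__main__":
        # Example: a 2-hour job that used 4 cores and 2 GPUs on a GPU node
        print(job_cost(2 * 3600, 4, 2))  # -> 172800 core-seconds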
Can somebody give a hint on how to do this (if possible)?
If not, maybe point me to where in the Slurm source code I should start digging?
Thanks!
Jens