Discussion:
Oversubscription of GPU resources
Ulf Markwardt
2013-11-05 20:42:29 UTC
Dear list,

How can I oversubscribe a few of our GPU cards (configured as generic
resources) so that several users can share both the node and the card for
development purposes?

Thanks,
Ulf
--
___________________________________________________________________
Dr. Ulf Markwardt

Dresden University of Technology
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
Moe Jette
2013-11-05 21:08:27 UTC
You would need to configure the GPU(s) multiple times in slurm.conf
and gres.conf, repeating the same device name in the gres.conf "File"
option, like this:

# Configure GPU zero to be allocated twice
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia0
Ulf Markwardt
2013-11-11 14:58:29 UTC
Dear Moe,
I have changed gres.conf on one GPU node to

Name=gpu File=/dev/nvidia0 CPUs=0-7
Name=gpu File=/dev/nvidia0 CPUs=0-7
Name=gpu File=/dev/nvidia0 CPUs=0-7
Name=gpu File=/dev/nvidia0 CPUs=0-7
Name=gpu File=/dev/nvidia1 CPUs=8-15
Name=gpu File=/dev/nvidia1 CPUs=8-15
Name=gpu File=/dev/nvidia1 CPUs=8-15
Name=gpu File=/dev/nvidia1 CPUs=8-15

But after restarting slurmd (and slurmctld on the admin node), I still
cannot oversubscribe the GPUs. I can run no more than 2 of these
jobs at the same time:
srun -n 1 --gres=gpu:1 -p test-gpu bash

Thank you,
Ulf
Morris Jette
2013-11-11 15:02:26 UTC
Did you also change the count in slurm.conf?
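
One way to spot such a mismatch is to compare the two files directly. The sketch below is only an illustration: the function names are made up here, it assumes one GPU entry per gres.conf line (no File= ranges or Count= options), that the node appears alone on its NodeName line (no bracketed ranges), and that Gres is written as plain gpu:N without a type.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Slurm.

# count_gres_gpus FILE
# Print the number of uncommented "Name=gpu" lines in a gres.conf-style file.
count_gres_gpus() {
    grep -c -E '^[[:space:]]*Name=gpu([[:space:]]|$)' "$1"
}

# slurm_gpu_count FILE NODE
# Print the N from "Gres=gpu:N" on the NodeName line for NODE in a
# slurm.conf-style file. Prints nothing if no such line is found.
slurm_gpu_count() {
    sed -n -E "s/^[[:space:]]*NodeName=$2[[:space:]].*Gres=gpu:([0-9]+).*/\1/p" "$1" | head -n 1
}
```

For example, `count_gres_gpus /etc/slurm/gres.conf` should print the same number as `slurm_gpu_count /etc/slurm/slurm.conf gpunode01`; both paths and the node name are placeholders for your site's values.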
Ulf Markwardt
2013-11-11 15:19:29 UTC
Post by Morris Jette
Did you also change the count in slurm.conf?
No :-)

But now that I have changed it, I can oversubscribe.
Thank you!
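
For reference, a sketch of what the matching node entry in slurm.conf might look like for a gres.conf with eight gpu lines as above. The node name and State value are placeholders, not taken from the actual cluster; the CPU count follows the CPUs=0-15 range used in gres.conf.

```text
# slurm.conf (fragment, illustrative only)
GresTypes=gpu
# Gres count must equal the number of Name=gpu lines in the node's gres.conf
NodeName=gpunode01 CPUs=16 Gres=gpu:8 State=UNKNOWN
```

With the count raised from 2 to 8, up to eight jobs requesting --gres=gpu:1 can be placed on the node, four per physical card.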

Ulf