Discussion:
Sharing GPU memory (gpu_mem)
Sergio Iserte Agut
2012-04-23 13:53:06 UTC
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.

These are my configurations:

*slurm.conf*

SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*

*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.

Thank you.

Regards!
Sergio Iserte.
Moe Jette
2012-04-23 15:15:05 UTC
The current logic allows a GRES to be allocated to one job at a time;
however, you could develop a new plugin to do what you want. You would
not use gres/gpu, but write a gres/gpu_mem plugin that duplicates a
lot of the code from gres/gpu to set GPU environment variables for
CUDA (the code in src/common/gres.c will already avoid
over-subscribing the gpu_mem count/size).
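As a rough sketch (not taken from this thread), such a plugin would start as a
copy of src/plugins/gres/gpu/gres_gpu.c with the plugin identification strings
changed and the environment-variable logic adjusted. The exact function
signatures and the plugin_version value should be copied from that file for
the Slurm 2.3 API; SLURM_GPU_MEM below is only an illustrative variable name.

/* Sketch of a gres/gpu_mem plugin, modeled on src/plugins/gres/gpu/gres_gpu.c.
 * Copy the exact signatures and plugin_version from that file; the
 * SLURM_GPU_MEM variable and the hard-coded value are placeholders. */
#include <slurm/slurm.h>
#include <slurm/slurm_errno.h>
#include "src/common/env.h"
#include "src/common/list.h"
#include "src/common/log.h"

const char plugin_name[] = "Gres GPU memory plugin";
const char plugin_type[] = "gres/gpu_mem";   /* must match GresTypes in slurm.conf */
const uint32_t plugin_version = 100;         /* use the value from gres_gpu.c */

/* Called by slurmd when it parses gres.conf on the compute node. */
extern int node_config_load(List gres_conf_list)
{
        debug("%s: node_config_load called", plugin_type);
        return SLURM_SUCCESS;
}

/* gres_gpu.c uses this hook to build CUDA_VISIBLE_DEVICES; a gpu_mem
 * plugin could instead export the allocated amount to the job. */
extern void job_set_env(char ***job_env_ptr, void *gres_ptr)
{
        /* Placeholder: the real code would derive the value from gres_ptr. */
        env_array_overwrite(job_env_ptr, "SLURM_GPU_MEM", "250");
        debug("%s: job_set_env called", plugin_type);
}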
Post by Sergio Iserte Agut
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.
*slurm.conf*
SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.
Thank you.
Regards!
Sergio Iserte.
Sergio Iserte Agut
2012-04-23 19:26:05 UTC
Thank you for your quick answer, I will get on with it!

Regards!
Post by Moe Jette
The current logic allows a GRES to be allocated to one job at a time,
however you could develop a new plugin to do what you want. You would
not use gres/gpu, but write a gres/gpu_mem plugin that duplicates a
lot of the code from gres/gpu to set GPU environment variables for
CUDA (the code in src/common/gres.c will already avoid
over-subscribing the gpu_mem count/size).
Post by Sergio Iserte Agut
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.
*slurm.conf*
SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.
Thank you.
Regards!
Sergio Iserte.
Sergio Iserte Agut
2012-04-25 08:24:03 UTC
Hello,

I have already created the plugin gres/gpu_mem, whose code is almost the
same as that of gres/gpu. But I have been doing tests and I have seen that
the program never enters the functions job_set_env or step_set_env.

And when I look at /var/log/slurm/slurmctld.log, I don't understand these
messages:

slurmctld: debug: Configuration for job 188 complete
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.0 gres_bit_alloc is NULL
slurmctld: debug3: step_layout cpus = 4 pos = 0
slurmctld: debug: laying out the 1 tasks on 1 hosts compute0 dist 1
slurmctld: debug: gres/gpu_mem: step_alloc gres_bit_alloc for 188.0 is NULL
slurmctld: sched: _slurm_rpc_job_step_create: StepId=188.0 compute0 usec=1174
I hope somebody can help me.

Kind regards!

Sergio Iserte.
Thank you for your quick answer, I will get on with it!
Regards!
Post by Moe Jette
The current logic allows a GRES to be allocated to one job at a time,
however you could develop a new plugin to do what you want. You would
not use gres/gpu, but write a gres/gpu_mem plugin that duplicates a
lot of the code from gres/gpu to set GPU environment variables for
CUDA (the code in src/common/gres.c will already avoid
over-subscribing the gpu_mem count/size).
Post by Sergio Iserte Agut
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.
*slurm.conf*
SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.
Thank you.
Regards!
Sergio Iserte.
Moe Jette
2012-04-25 22:05:04 UTC
Did you install the plugin file on the head and compute nodes and
restart the slurmctld and slurmd daemons?

What does your code look like?
Post by Sergio Iserte Agut
Hello,
I have already created the plugin gres/gpu_mem, whose code is almost the
same as that of gres/gpu. But I have been doing tests and I have seen that
the program never enters the functions job_set_env or step_set_env.
And when I look at /var/log/slurm/slurmctld.log, I don't understand these
messages:
slurmctld: debug: Configuration for job 188 complete
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.0 gres_bit_alloc is NULL
slurmctld: debug3: step_layout cpus = 4 pos = 0
slurmctld: debug: laying out the 1 tasks on 1 hosts compute0 dist 1
slurmctld: debug: gres/gpu_mem: step_alloc gres_bit_alloc for 188.0 is NULL
slurmctld: sched: _slurm_rpc_job_step_create: StepId=188.0 compute0 usec=1174
I hope somebody can help me.
Kind regards!
Sergio Iserte.
Thank you for your quick answer, I will get on with it!
Regards!
Post by Moe Jette
The current logic allows a GRES to be allocated to one job at a time,
however you could develop a new plugin to do what you want. You would
not use gres/gpu, but write a gres/gpu_mem plugin that duplicates a
lot of the code from gres/gpu to set GPU environment variables for
CUDA (the code in src/common/gres.c will already avoid
over-subscribing the gpu_mem count/size).
Post by Sergio Iserte Agut
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.
*slurm.conf*
SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.
Thank you.
Regards!
Sergio Iserte.
Sergio Iserte Agut
2012-04-26 10:27:04 UTC
Yes, I've installed the plugin on every node, and then I've restarted the
daemons.

The code of gres/gpu_mem is the same as that of gres/gpu, though I've put a
debug line in each function to see the call flow. That's why I know that when
I submit a job requesting gpu_mem, the functions node_config_load and
step_set_env are called.
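A trace line of this kind might look like the following sketch (using Slurm's
debug() from src/common/log.h; the actual lines added are not shown in the
thread, and the rest of each function body stays as copied from gres_gpu.c):

extern void step_set_env(char ***job_env_ptr, void *gres_ptr)
{
        /* Trace the call flow; appears in the slurmd log at debug level. */
        debug("gres/gpu_mem: %s called", __func__);
        /* ... remainder copied from gres_gpu.c ... */
}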

And if I run:

$ srun --gres=gpu:1,gpu_mem:100 -w"compute0" sleep 10
SLURM_NODELIST=compute0
CUDA_VISIBLE_DEVICES=0

I get the result in the files attached.

Thank you for your interest.

Kind regards!
Post by Moe Jette
Did you install the plugin file on the head and compute nodes and
restart the slurmctld and slurmd daemons?
What does your code look like?
Post by Sergio Iserte Agut
Hello,
I have already created the plugin gres/gpu_mem, whose code is almost the
same as that of gres/gpu. But I have been doing tests and I have seen that
the program never enters the functions job_set_env or step_set_env.
And when I look at /var/log/slurm/slurmctld.log, I don't understand these
messages:
slurmctld: debug: Configuration for job 188 complete
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.4294967294 gres_bit_alloc is NULL
slurmctld: debug: gres/gpu_mem: step_test 188.0 gres_bit_alloc is NULL
slurmctld: debug3: step_layout cpus = 4 pos = 0
slurmctld: debug: laying out the 1 tasks on 1 hosts compute0 dist 1
slurmctld: debug: gres/gpu_mem: step_alloc gres_bit_alloc for 188.0 is NULL
slurmctld: sched: _slurm_rpc_job_step_create: StepId=188.0 compute0 usec=1174
I hope somebody can help me.
Kind regards!
Sergio Iserte.
Thank you for your quick answer, I will get on with it!
Regards!
Post by Moe Jette
The current logic allows a GRES to be allocated to one job at a time,
however you could develop a new plugin to do what you want. You would
not use gres/gpu, but write a gres/gpu_mem plugin that duplicates a
lot of the code from gres/gpu to set GPU environment variables for
CUDA (the code in src/common/gres.c will already avoid
over-subscribing the gpu_mem count/size).
Post by Sergio Iserte Agut
Hello,
I'm trying to configure my Slurm-2.3.2 in order to allow me to run multiple
jobs in the same GPU.
*slurm.conf*
SchedulerType=sched/backfill
SelectType=select/linear
GresTypes=gpu,gpu_mem
NodeName=enersis CPUs=1 Sockets=1 CoresPerSocket=1 ThreadsPerCore=1
RealMemory=1006 State=UNKNOWN
NodeName=compute0 NodeAddr=10.0.0.2 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
NodeName=compute1 NodeAddr=10.0.0.3 CPUs=4 RealMemory=7982 Sockets=1
CoresPerSocket=4 ThreadsPerCore=1 State=UNKNOWN gres=gpu:1,gpu_mem:512
PartitionName=debug Nodes=compute[0-1] Default=YES MaxTime=INFINITE
State=UP
*gres.conf*
Name=gpu File=/dev/nvidia0
Name=gpu_mem Count=512
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ srun -w"compute0" --gres=gpu:1,gpu_mem:250 sleep 100 &*
*$ squeue*
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
68 debug sleep root PD 0:00 1 (Resources)
67 debug sleep root R 0:04 1 compute0
I wonder whether it is possible to run both jobs sharing the GPU memory.
Thank you.
Regards!
Sergio Iserte.