Krishna Teja
2014-08-08 14:28:34 UTC
I have been trying to configure SLURM so that it can use the GPUs
available in some of the nodes in our cluster (compute-0-4, compute-0-5 and
compute-0-6, to be precise). I have followed the instructions given on the
SLURM website:
http://slurm.schedmd.com/gres.html
But that doesn't seem to work. I still get the same error as if the GPUs
weren't configured:
srun: error: Unable to allocate resources: Requested node configuration is not available
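For reference, here is a minimal sketch of the kind of configuration gres.html describes. The CPU count, the GPU count (gpu:2) and the /dev/nvidia* device paths are hypothetical placeholders, not values taken from the attached files. Each node should appear in only one NodeName line of slurm.conf. Defining compute-0-4..6 a second time just to add Gres is likely to produce "Duplicated NodeHostName" errors like the ones below.

```
# slurm.conf (same file on all nodes); CPU and GPU counts are hypothetical
GresTypes=gpu
NodeName=compute-0-[0-3] CPUs=8 State=UNKNOWN
NodeName=compute-0-[4-6] CPUs=8 Gres=gpu:2 State=UNKNOWN

# gres.conf on each of compute-0-4..6 (assumes two GPUs per node)
Name=gpu File=/dev/nvidia0
Name=gpu File=/dev/nvidia1
```

A job would then request a GPU explicitly, e.g. srun --gres=gpu:1 -N1 nvidia-smi (assuming nvidia-smi is installed on the GPU nodes).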
Furthermore, I ran a simple command to test whether everything is fine with
SLURM, printing the hostnames of all the nodes with
srun -N7 -l /bin/hostname
and I get the following output:
srun: error: Duplicated NodeHostName compute-0-4 in the config file
srun: error: Duplicated NodeHostName compute-0-5 in the config file
srun: error: Duplicated NodeHostName compute-0-6 in the config file
4: compute-0-4.local
5: compute-0-5.local
6: compute-0-6.local
3: compute-0-3.local
1: compute-0-1.local
0: compute-0-0.local
2: compute-0-2.local
I have attached the slurm.conf and gres.conf files. Can someone please
point out what I am doing wrong? Any help is appreciated!