Discussion:
HOSTNAME is the same for every node
Mads Boye
2014-09-16 11:13:37 UTC
Hi.
I am writing some getting-started scripts for my Slurm users and have run
into this problem. I cannot figure out whether it is a bug or whether I am
simply holding it wrong ;-)

My script uses sbcast to copy a file to all allocated nodes.

After that, I try to rename each file to the hostname of its node, and then
cp all the files back to my home folder.

It appears that the environment variable HOSTNAME is always the hostname of
the SLURM_BATCHNODE.

Here is my slurm script:

#!/bin/bash
#SBATCH -N 5 ## Number of nodes allocated
#SBATCH --job-name=sbcast

echo "create file"
touch /tmp/Hello
echo "copy Hello to every nodes /scratch"
sbcast -f /tmp/Hello /scratch/Hello
echo "see if Hello is on every node"
srun ls -l /scratch/Hello
echo "rename Hello to node hostname"
srun mv /scratch/Hello /scratch/$HOSTNAME
srun echo $HOSTNAME
srun ls -l /scratch/eagle*

and here is the slurm-%jobid.out:

***@birdnest:~$ cat slurm-2139.out
create file
copy Hello to every nodes /scratch
see if Hello is on every node
-rw-rw-r-- 1 mb mb 13 Sep 16 12:52 /scratch/Hello
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/Hello
rename Hello to node hostname
eagle1
eagle1
eagle1
eagle1
eagle1
-rw-rw-r-- 1 mb mb 13 Sep 16 12:52 /scratch/eagle1
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1
-rw-rw-r-- 1 mb mb 13 Sep 16 12:53 /scratch/eagle1

Am I doing something wrong, or using the function in an unintended way?


Best Regards,
Mads.
Bill Barth
2014-09-16 11:55:30 UTC
The environment variable $HOSTNAME got expanded on the master compute node
(eagle1) by the shell before the srun command was executed. Also, if you
try to mv /scratch/Hello from a common filesystem, there is a race
condition as to which host will execute this first in parallel. You should
probably use cp. I haven't tested this, but you should probably escape the
$ and use cp:

srun cp /scratch/Hello /scratch/\$HOSTNAME

might work.
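
The expansion-timing point can be tried without a cluster. The following is a minimal sketch, not a tested Slurm recipe: a child `bash -c` stands in for the process started on each node, and FAKE_NODE is a made-up variable that plays the role of HOSTNAME. The guarded srun line at the end is an assumption about how the same idea would carry over; as far as I know srun starts the program directly rather than through a shell, so a backslash-escaped $HOSTNAME alone would reach cp as literal text, and wrapping the command in `bash -c` with single quotes is one way to get the expansion to happen on the node.

```shell
#!/bin/bash
# FAKE_NODE is hypothetical; it stands in for HOSTNAME in this demo.
export FAKE_NODE=eagle1   # value seen by the submitting (batch) shell

# Double quotes: the outer shell expands $FAKE_NODE before the child
# starts, so the child only ever sees the literal string "eagle1".
early=$(FAKE_NODE=eagle3 bash -c "echo $FAKE_NODE")

# Single quotes: the text $FAKE_NODE reaches the child unexpanded, and
# the child expands it from its own environment.
late=$(FAKE_NODE=eagle3 bash -c 'echo $FAKE_NODE')

echo "early=$early late=$late"

# Assumed Slurm equivalent, run only inside a job where srun exists.
# $(hostname) is evaluated by the shell started on each node.
if command -v srun >/dev/null 2>&1 && [ -n "${SLURM_JOB_ID:-}" ]; then
    srun bash -c 'cp /scratch/Hello "/scratch/$(hostname)"'
fi
```

Using $(hostname) rather than $HOSTNAME in the srun line is a deliberate choice: if HOSTNAME is exported on the batch node, it may be passed along in the job environment and keep the batch node's value on the other nodes.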

Bill.
--
Bill Barth, Ph.D., Director, HPC
bbarth-***@public.gmane.org | Phone: (512) 232-7069
Office: ROC 1.435 | Fax: (512) 475-9445