Discussion:
Bug in sgather
Bjørn-Helge Mevik
2014-08-26 15:43:31 UTC
(Since sgather is in contrib, and I found no contact address in it, I
post the report here.)

sgather in slurm 14.04.1 has a bug that is triggered when nodes are set
up with different Nodename and Nodehostname (and hostname(1) returns the
Nodehostname). Changing

nodelist=$($SRUN --ntasks=$SLURM_NNODES --ntasks-per-node=1 -l hostname) || exit $?
nodelist=$(echo "$nodelist" | cut -d ' ' -f 2 | sort)

into

nodelist=$($SCONTROL show hostnames $SLURM_NODELIST | sort)

should fix it (I am not sure if sort is even needed). It should also be
slightly more efficient.
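
To illustrate the failure mode, here is a minimal sketch. The host names and the sample srun output are made up for illustration, and the script calls scontrol directly instead of going through sgather's $SCONTROL variable. The old code keeps whatever hostname(1) prints on each node, which is the NodeHostname, while the replacement asks Slurm to expand the job's node list into NodeNames.

```shell
# Hypothetical sample of what `srun -l hostname` prints: "<taskid>: <hostname>".
# In this made-up setup the hosts report their NodeHostname (c1-1.example, ...),
# which differs from the NodeName that Slurm uses for them.
srun_output='1: c1-2.example
0: c1-1.example'

# Old approach: drop the task label and keep the second field.
# The result is a sorted list of NodeHostnames, not NodeNames.
old_nodelist=$(echo "$srun_output" | cut -d ' ' -f 2 | sort)
echo "$old_nodelist"

# Proposed approach: let Slurm expand the job's node list.
# Guarded so the sketch is a no-op outside a Slurm job.
if command -v scontrol >/dev/null 2>&1 && [ -n "${SLURM_NODELIST:-}" ]; then
    nodelist=$(scontrol show hostnames "$SLURM_NODELIST" | sort)
    echo "$nodelist"
fi
```

Inside a job on such a cluster, the two lists would differ, which is presumably what breaks the later steps of the script that expect Slurm node names.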

It would also be nice if the node-global destinations could be
configurable, instead of being hard-coded in the script (or at least be
set at the top of the script). For instance, on our system, the
node-global file systems are /work and /cluster, not /scratch and /home.
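
A rough sketch of what I have in mind follows. The variable name SGATHER_GLOBAL_DIRS and the helper is_node_global are hypothetical and are not taken from the sgather script; the defaults are the two paths currently hard-coded.

```shell
# Hypothetical setting near the top of the script: a space-separated list of
# node-global prefixes, overridable from the environment.
: "${SGATHER_GLOBAL_DIRS:=/scratch /home}"

# Hypothetical helper: succeed if path $1 is one of the node-global
# prefixes or lies below one of them.
is_node_global() {
    for prefix in $SGATHER_GLOBAL_DIRS; do
        case "$1" in
            "$prefix"|"$prefix"/*) return 0 ;;
        esac
    done
    return 1
}
```

A site like ours could then set SGATHER_GLOBAL_DIRS="/work /cluster" before running the script, without patching it.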
--
Regards,
Bjørn-Helge Mevik, dr. scient,
Department for Research Computing, University of Oslo
j***@public.gmane.org
2014-08-26 18:04:34 UTC
Responses inline below
Post by Bjørn-Helge Mevik
(Since sgather is in contrib, and I found no contact address in it, I
post the report here.)
sgather in slurm 14.04.1 has a bug that is triggered when nodes are set
up with different Nodename and Nodehostname (and hostname(1) returns the
Nodehostname). Changing
nodelist=$($SRUN --ntasks=$SLURM_NNODES --ntasks-per-node=1 -l hostname) || exit $?
nodelist=$(echo "$nodelist" | cut -d ' ' -f 2 | sort)
into
nodelist=$($SCONTROL show hostnames $SLURM_NODELIST | sort)
should fix it (I am not sure if sort is even needed). It should also be
slightly more efficient.
Thanks for the patch. This will be addressed in version 14.03.8 when
released, probably in October:
https://github.com/SchedMD/slurm/commit/9ac88dc62c4b0e48f3b405610d27d62192a38845
Post by Bjørn-Helge Mevik
It would also be nice if the node-global destinations could be
configurable, instead of being hard-coded in the script (or at least be
set at the top of the script). For instance, on our system, the
node-global file systems are /work and /cluster, not /scratch and /home.
This will be addressed in version 14.11 when released. I don't want to
change this in version 14.03, which might break local patches.
https://github.com/SchedMD/slurm/commit/b4a735ffa59cbbde435af641190ff6fa34fb48a5
--
Morris "Moe" Jette
CTO, SchedMD LLC

Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html