srun: error: Not a valid slurm_step_ctx_t!
Michal Mazurek
2014-07-29 10:01:52 UTC
Yesterday I upgraded slurm to version 14.03.6. This caused an error that
I'm told didn't occur before:

$ srun -p phi -w phi --pty bash -i
$ mpirun -np 2 echo a
srun: error: Not a valid slurm_step_ctx_t!
srun: error: Application launch failed: Invalid argument
$ slurmd -V
slurm 14.03.6

What can I do to fix this?
--
Michal Mazurek

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Michal Mazurek
2014-07-30 08:38:29 UTC
Here is some additional info I can provide:

$ mpirun -np 2 echo a
srun: error: Not a valid slurm_step_ctx_t!
srun: error: Application launch failed: Invalid argument
^CCtrl-C caught... cleaning up processes
[mpiexec-***@public.gmane.org] HYD_pmcd_pmiserv_send_signal (./pm/pmiserv/pmiserv_cb.c:239): assert (!closed) failed
[mpiexec-***@public.gmane.org] ui_cmd_cb (./pm/pmiserv/pmiserv_pmci.c:127): unable to send SIGUSR1 downstream
[mpiexec-***@public.gmane.org] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec-***@public.gmane.org] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:435): error waiting for event
[mpiexec-***@public.gmane.org] main (./ui/mpich/mpiexec.c:901): process manager error waiting for completion
$ unset SLURM_JOBID
$ mpirun -np 2 echo a
a
a
$
--
Michal Mazurek

Michal Mazurek
2014-07-30 08:42:31 UTC
I now see this is likely a problem with mpirun, which is a script
containing references to SLURM_JOBID. Sorry for the noise.
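
For anyone hitting the same error, here is a minimal sketch of the workaround shown earlier in the thread. It removes SLURM_JOBID for a single command only, so the rest of the interactive session keeps it. The helper names `without_slurm_jobid` and `show_slurm_refs` are hypothetical, not part of Slurm or MPICH, and the sketch assumes that mpirun on your PATH is a readable wrapper script, as it was here.

```shell
# Hypothetical helper: run a command with SLURM_JOBID removed from that
# command's environment only. The subshell keeps the unset from leaking
# into the calling shell.
without_slurm_jobid() {
    ( unset SLURM_JOBID; "$@" )
}

# Hypothetical helper: print the lines of the launcher found on PATH
# (default: mpirun) that mention SLURM_JOBID. Only meaningful when the
# launcher is a text script rather than a compiled binary.
show_slurm_refs() {
    launcher=$(command -v "${1:-mpirun}") || return 1
    grep -n 'SLURM_JOBID' "$launcher"
}

# Example usage inside the srun --pty session:
#   show_slurm_refs mpirun
#   without_slurm_jobid mpirun -np 2 echo a
```

This only sidesteps the wrapper's Slurm-specific branch. Whether that is acceptable depends on what the script does with SLURM_JOBID at your site.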
--
Michal Mazurek
