Discussion:
slurm_receive_msg: Zero Bytes were transmitted or received
SLIM H.A.
2014-10-23 17:37:46 UTC
I wish to use Slurm on our cluster. It was recently installed and configured with SelectType=select/linear, and this is working.
I want to change the configuration so that serial jobs can share a node, so I modified the conf file like this:

# diff slurm.conf slurm.conf.20141022
65,66c65
< SelectType=select/cons_res
< SelectTypeParameters=CR_CPU
---
> SelectType=select/linear
The modified file slurm.conf(.select) is attached.

On submitting a job I now get this error message:

# sbatch -p thin -N 2 slurm_gcc.csh
sbatch: error: slurm_receive_msg: Zero Bytes were transmitted or received
sbatch: error: Batch job submission failed: Zero Bytes were transmitted or received

Am I missing something in the configuration here?

Thanks

Henk
Mike Johnson
2014-10-23 18:11:42 UTC
Hi Henk.

Two things to check:

Date and time are synced between client and server.

The correct number of resources is set in slurm.conf. Running slurmd -C on the node
should tell you the correct info.
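
As a rough sketch of the second check, the CPU count that slurmd -C reports can be compared with the NodeName line in slurm.conf. The helper name node_field, the node name node001 and the sample values below are hypothetical, and the exact fields printed by slurmd -C may differ between Slurm versions.

```shell
# node_field KEY LINE: print the value of KEY=VALUE found in LINE.
# Hypothetical helper, not part of Slurm.
node_field() {
    key=$1
    for tok in $2; do
        case $tok in
            "$key"=*) printf '%s\n' "${tok#*=}"; return 0 ;;
        esac
    done
    return 1
}

# Illustrative sample lines. On a real node they might come from
#   detected=$(slurmd -C | head -n 1)
#   configured=$(grep '^NodeName=node001' /path/to/slurm.conf)
detected='NodeName=node001 CPUs=16 RealMemory=64000'
configured='NodeName=node001 CPUs=12 RealMemory=64000'

if [ "$(node_field CPUs "$detected")" != "$(node_field CPUs "$configured")" ]; then
    echo "CPUs mismatch: slurm.conf says $(node_field CPUs "$configured"), node reports $(node_field CPUs "$detected")"
fi
```

The same helper can be pointed at other fields, such as RealMemory, if those are also suspected of being wrong.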

Mike
Mike Johnson
2014-10-23 18:15:39 UTC
... and it seems my phone cut off the top of your message, so it may be
more complex than I thought. I've had the 'zero bytes' problem
recently, and it has been caused by incorrect specification of consumable
resources, by date/time being out of sync between client and server, and
sometimes by a version mismatch between slurmd and slurmctld.
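
A minimal sketch of how the clock and version causes could be checked by hand. The helper skew_seconds, the host name node001 and the timestamps are hypothetical. How much skew is tolerated depends on the site's setup, so the number is only printed, not judged.

```shell
# skew_seconds A B: absolute difference between two epoch timestamps.
# Hypothetical helper, not part of Slurm.
skew_seconds() {
    d=$(( $1 - $2 ))
    if [ "$d" -lt 0 ]; then
        d=$(( 0 - d ))
    fi
    printf '%s\n' "$d"
}

# Illustrative values. On a real cluster they might come from
#   local_ts=$(date +%s)
#   remote_ts=$(ssh node001 date +%s)   # node001 is a placeholder host
local_ts=1414085866
remote_ts=1414085800
echo "clock skew: $(skew_seconds "$local_ts" "$remote_ts") s"

# Versions can be compared by eye with the daemons' own -V option:
#   slurmd -V      (on a compute node)
#   slurmctld -V   (on the controller)
```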

Do email back if that doesn't fix it
Mike