Jaya Srivastava
2014-07-24 12:01:48 UTC
Hi,
[This is my first mail to the slurm-dev mailing list.]
I am getting the following error when submitting a job to Slurm.
Error (observed on both the master and the slave):
--------------------------------------------------------
mybin: bind: resource busy (Address already in use)
srun: error: machine1: tasks 0-1,3: Exited with exit code 1
----------------------------------------------------
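For reading the srun line above: `tasks 0-1,3` is a compact list of task ranks. The sketch below expands such a list and counts the entries. The helper name `expand_tasks` is hypothetical and not part of Slurm; it only handles simple comma-separated ranks and `a-b` ranges.

```shell
#!/bin/sh
# expand_tasks is a hypothetical helper, not a Slurm tool.
# It expands a compact task list such as "0-1,3" into "0 1 3".
expand_tasks() {
    out=""
    # Turn the comma-separated list into space-separated parts.
    for part in $(echo "$1" | tr ',' ' '); do
        case $part in
            *-*)
                # A range "a-b": emit every rank from a to b inclusive.
                start=${part%-*}
                end=${part#*-}
                i=$start
                while [ "$i" -le "$end" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)
                # A single rank.
                out="$out $part"
                ;;
        esac
    done
    # Drop the leading space before printing.
    echo "${out# }"
}

failed=$(expand_tasks "0-1,3")
echo "failed task ranks on machine1: $failed"
# Count the ranks by loading them into the positional parameters.
set -- $failed
echo "count: $#"
```

Applied to the message above, this gives three ranks (0, 1 and 3), so at least three copies of mybin ran on machine1 and exited with code 1. If those copies were all given the same port, that could explain the bind error, but this is an assumption and not something the log confirms.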
Parameters of job submitted to SLURM -
--------------------------------------
#SBATCH --time=00:02:00
#SBATCH --exclusive
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=1
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=128
srun -p partition1 --nodelist=machine1 --exclusive ./mybin slave <IP address> <port no.> &
srun -p partition1 --nodelist=machine2 --exclusive ./mybin master <IP address> <port no.>
-----------------------------------
Here, mybin is a Haskell binary.
The program is written with Cloud Haskell, where communication between nodes
follows a master-slave model (other models are also available).
Of the 4 allocated nodes, one acts as the master and the rest act as
slaves, and they carry out the further computation.
Please let me know if you need further details.
Thanks in advance,
Jaya