Trey Dockendorf
2014-09-19 21:44:35 UTC
I've been documenting for my users how to move from Torque to SLURM and what that means for running MPI jobs. Based on the SLURM documentation I've come up with the following:
slurm.conf:
MpiDefault=none
MpiParams=ports=30000-39999
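(As a sanity check, I'm assuming the values SLURM is actually running with can be confirmed with something like:
$ scontrol show config | grep -i Mpi
which should echo back the MpiDefault and MpiParams lines above.)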
Then users run...
OpenMPI:
srun --mpi=openmpi --resv-ports /path/to/executable
MVAPICH2:
srun --mpi=none /path/to/executable
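(Side note: my understanding is that the MPI plugin types a given SLURM build supports can be listed with:
$ srun --mpi=list
which should include none, openmpi and pmi2 if those plugins are available.)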
To test this and make sure I'm not giving bad instructions, I've been running small 2-node HPL tests (which also exercise IB functionality), and that's where things go bad:
$ salloc -N2 --ntasks-per-node=32 --cpus-per-task=1 --mem-per-cpu=1900 -p mpi-core32
$ module load gcc openmpi openblas
$ srun --mpi=openmpi --resv-ports $HOME/hpl/bin/openblas_openmpi/xhpl
<LOTS of errors, including "Need at least 64 processes for these tests">
$ srun --mpi=pmi2 --resv-ports $HOME/hpl/bin/openblas_openmpi/xhpl
< no errors >
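If it's relevant, I assume one way to confirm that the OpenMPI build actually picked up PMI support would be something like:
$ ompi_info | grep -i pmi
which should list PMI-related MCA components if --with-pmi took effect.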
Our install of OpenMPI was compiled like so:
../openmpi-1.8.2/configure --prefix=/apps/gcc-4.8.2/openmpi/1.8.2 \
--libdir=/apps/gcc-4.8.2/openmpi/1.8.2/lib64 \
--with-slurm --with-pmi --with-verbs \
--enable-shared --enable-static \
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64
The SLURM documentation [1] seems to indicate that the --mpi type should be openmpi. I'm finding, though, that if I set MpiDefault=pmi2 then I'm able to run both OpenMPI and MVAPICH2 without the "--mpi" argument or the "--resv-ports" argument.
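In other words, the only slurm.conf change in that test was (keeping the same port range as above):
MpiDefault=pmi2
MpiParams=ports=30000-39999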
MVAPICH2 was compiled using "--with-pm=no --with-pmi=slurm".
Is it the case that if OpenMPI is compiled with "--with-pmi" and "--with-slurm" then the pmi2 MPI plugin should be used?
Is "--resv-ports" necessary given how OpenMPI was compiled?
Thanks,
- Trey
[1] http://slurm.schedmd.com/mpi_guide.html#open_mpi
=============================
Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org