Hi Eva!
As Sergio said, you have to specify the compute nodes with "NodeName=..." and then define partitions that include those compute nodes with "PartitionName=... Nodes=...", without including the head nodes or the login nodes. You could also set the "AllocNodes=..." parameter in slurm.conf, where we usually list only the login nodes, in order to disable submission from any node other than the login nodes.
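For example, a minimal slurm.conf sketch along these lines (the CPU and memory values and the login node name "login-0-1" are placeholders; adjust them to your cluster):
  # compute nodes only; the head/admin node gets no NodeName line
  NodeName=hpc-0-4 CPUs=16 RealMemory=64000 State=UNKNOWN
  NodeName=hpc-0-6 CPUs=16 RealMemory=64000 State=UNKNOWN
  # the partition contains only compute nodes; AllocNodes restricts
  # which nodes jobs may be submitted from
  PartitionName=active Nodes=hpc-0-4,hpc-0-6 AllocNodes=login-0-1 MaxTime=2:00:00 State=UP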
So my question now is: is the node hpcdev-005.sdsc.edu a login node or a master/admin node? In other words, did you do the submission from that node or from a different one? If it is the login node, then there is no error at all; this is the default behaviour of salloc.
By default (you can change this), salloc gives you a shell on the node where the submission took place, and then with srun commands you can execute programs on the compute nodes.
So if you want an interactive shell on a compute node, you should execute:
"salloc -N1 -p active srun -N1 --pty sh"
or directly an srun command (without salloc involved):
"srun -N1 -p active --pty sh"
Best Regards,
Chrysovalantis Paschoulas
On 09/12/2014 09:19 AM, Sergio Iserte wrote:
Hello Eva,
you must remove the management nodes from the "Nodes" field of the "PartitionName" parameter.
With the slurm.conf file it would be easier to write an example, but anyway this should work!
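Something like this sketch, assuming hpc-0-5 from your sinfo output is the management node:
  # before (management node hpc-0-5 included):
  #   PartitionName=active Nodes=hpc-0-[4-6] MaxTime=2:00:00 State=UP
  # after (management node removed):
  PartitionName=active Nodes=hpc-0-4,hpc-0-6 MaxTime=2:00:00 State=UP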
Regards,
Sergio.
2014-09-12 9:06 GMT+02:00 Uwe Sauter <uwe.sauter.de-***@public.gmane.org>:
Hi Eva,
if you don't want to use the controller node for jobs, the easiest way
is not to configure it as a node at all. That means you don't need a line like
NodeName=hpc-0-5 RealMemory=....
for the controller.
A program/user can find out which nodes are allocated by looking into
the environment variables. Try running salloc and then
$ env | grep SLURM
Here is an example output:
SLURM_NODELIST=n523601
SLURM_NODE_ALIASES=(null)
SLURM_NNODES=1
SLURM_JOBID=6437
SLURM_TASKS_PER_NODE=40
SLURM_JOB_ID=6437
SLURM_SUBMIT_DIR=/nfs/admins/adm17
SLURM_JOB_NODELIST=n523601
SLURM_JOB_CPUS_PER_NODE=40
SLURM_SUBMIT_HOST=frontend
SLURM_JOB_PARTITION=foo
SLURM_JOB_NUM_NODES=1
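So inside an allocation, a quick way to see the allocated nodes without scontrol is simply to echo one of these variables, e.g.:
  $ echo $SLURM_JOB_NODELIST
  n523601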
Regards,
Uwe
Post by Eva Hocks
I am trying to configure the latest slurm 14.03 and am running into a
problem: I cannot prevent slurm from running jobs on the control node.
active up 2:00:00 1 down* hpc-0-5
active up 2:00:00 1 mix hpc-0-4
active up 2:00:00 1 idle hpc-0-6
but when I use salloc I end up on the head node
$ salloc -N 1 -p active sh
salloc: Granted job allocation 16
sh-4.1$ hostname
hpcdev-005.sdsc.edu
That node is not part of the "active" partition, but slurm still uses it.
How? The allocation, by the way, is for NodeList=hpc-0-4,
and the user can log in to that node without a problem, but slurm doesn't
run the sh on that node for the user.
Also how can a user find out what nodes are allocated without having to
run the scontrol command? Is there an option in salloc to return the
host names?
Thanks
Eva
--
Sergio Iserte Agut, research assistant,
High Performance Computing & Architecture
Jaume I University (Castellón, Spain)