Discussion: how to alter priorities with interactive jobs
Satrajit Ghosh
2014-07-16 16:38:35 UTC
hi folks,

we are trying to set up a cluster in a mixed usage scenario. thus far we
have had two slurm partitions (all_nodes, interactive). at present,
interactive contains a single node that is also part of all_nodes.

---

PartitionName=all_nodes Default=YES MinNodes=1 AllowGroups=ALL Priority=1
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0
ReqResv=NO PreemptMode=GANG State=UP Nodes=node[001-030]

PartitionName=interactive Default=NO MinNodes=1 MaxNodes=1
DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0
MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG State=UP Nodes=node017

---
what we are trying to achieve is a balance between overall cluster
utilization and responsiveness for interactive jobs.

are there ways in which we can balance these two goals effectively?

this would be our list of constraints:

1. compute resources are time-sliced across jobs. (this is already the
case, but doesn't appear to be compatible with constraint #2)
2. an interactive job request should get priority and exclusive access
within at most the time-slicing window (we are using the default 30s),
independently of the number of jobs running on the node.
3. we would like to control the maximum number of slots an interactive job
can ask for.
4. we would like these partitions to overlap. i.e. we don't want to carve
out compute resources for the interactive partition.

any guidance would be much appreciated. also, these nodes have a 1:12
core-to-memory ratio, so many jobs can be launched and suspended on any node.

cheers,

satra
j***@public.gmane.org
2014-07-16 18:46:52 UTC
Post by Satrajit Ghosh
hi folks,
we are trying to set up a cluster in a mixed usage scenario. thus far we
have had two slurm partitions (all_nodes, interactive). at present,
interactive contains a single node that is also part of all_nodes.
---
PartitionName=all_nodes Default=YES MinNodes=1 AllowGroups=ALL Priority=1
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0
ReqResv=NO PreemptMode=GANG State=UP Nodes=node[001-030]
PartitionName=interactive Default=NO MinNodes=1 MaxNodes=1
DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=NO GraceTime=0
MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG State=UP Nodes=node017
---
what we are trying to achieve is a balance between cluster utilization and
interactive jobs.
are there ways in which we can balance these two options effectively?
1. compute resources are time sliced across jobs. (this is already the
case, but doesn't appear to be compatible with constraint #2)
You'll want to set SelectTypeParameters to manage memory in order to
avoid overcommitting it (e.g. CR_CPU_Memory, plus DefMemPerCPU,
MaxMemPerCPU, etc.).
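
For example, a rough slurm.conf sketch along those lines (the memory
values are placeholders you'd size to the actual RAM on your 1:12
core-to-memory nodes):

---
SelectType=select/cons_res
SelectTypeParameters=CR_CPU_Memory
# placeholder defaults; tune to your nodes' real memory
DefMemPerCPU=4096
MaxMemPerCPU=8192
---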

I would remove the PreemptMode=GANG on each partition and instead set
PreemptMode=GANG,SUSPEND on a separate line to apply globally.
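
For example (a sketch; preempt/partition_prio is assumed here, since the
partition Priority values are what drive the preemption):

---
# global lines in slurm.conf, not attached to any partition
PreemptType=preempt/partition_prio
PreemptMode=GANG,SUSPEND
---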

You'll also need to configure Shared=FORCE:1 in the "interactive"
partition if you want it to preempt jobs running in the "all_nodes"
partition.
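
Putting those two changes together, the "interactive" partition line from
your posted config would become roughly the following (Shared=FORCE:1, and
the per-partition PreemptMode dropped in favor of the global setting):

---
PartitionName=interactive Default=NO MinNodes=1 MaxNodes=1
DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10
DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:1 GraceTime=0
MaxCPUsPerNode=32 ReqResv=NO State=UP Nodes=node017
---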

For more information, see:
http://slurm.schedmd.com/slurm.conf.html
http://slurm.schedmd.com/gang_scheduling.html
http://slurm.schedmd.com/preempt.html
Post by Satrajit Ghosh
2. an interactive job request should get priority and exclusive access
within at most the time-slicing window (we are using the default 30s),
independently of the number of jobs running on the node.
There isn't a fundamental difference in prioritization or scheduling
between interactive jobs and batch jobs. You might use a job_submit plugin
to detect batch jobs (anything submitted with a script), set a nice value
on them, and thereby lower their scheduling priority. Also, at some point
you can exhaust memory and/or CPUs, so jobs may need to queue and wait for
resources.
http://slurm.schedmd.com/job_submit_plugins.html
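
As a rough, untested sketch of that idea using the job_submit/lua plugin
(job_submit.lua); the nice value of 100 is just an illustrative assumption:

-- job_submit.lua: lower the scheduling priority of batch jobs so that
-- interactive (script-less) submissions tend to sort ahead of them.
function slurm_job_submit(job_desc, part_list, submit_uid)
    -- batch submissions carry a script; srun/salloc interactive requests do not
    if job_desc.script ~= nil and job_desc.script ~= "" then
        -- illustrative value; a larger nice means lower scheduling priority
        job_desc.nice = 100
    end
    return slurm.SUCCESS
end

function slurm_job_modify(job_desc, job_rec, part_list, modify_uid)
    return slurm.SUCCESS
end
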
Post by Satrajit Ghosh
3. we would like to control the max number of slots an interactive job
could ask for.
Slurm supports a number of per-job, per-user, and per-account limits. See:
http://slurm.schedmd.com/resource_limits.html
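
As one possible sketch, assuming you run slurmdbd and enforce limits
(AccountingStorageEnforce includes "limits,qos"), a QOS could cap how large
a single interactive job can be; the QOS name and the 32-CPU limit below
are placeholders:

# one-time setup in the accounting database
sacctmgr add qos interactive MaxCpusPerJob=32 MaxWallDurationPerJob=01:00:00

# users then request it explicitly, e.g.
srun --partition=interactive --qos=interactive --pty bash

Note that the MaxNodes=1 and MaxCPUsPerNode=32 already on your
"interactive" partition also cap part of this.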
Post by Satrajit Ghosh
4. we would like these partitions to overlap. i.e. we don't want to carve
out compute resources for the interactive partition.
No problem.
Post by Satrajit Ghosh
any guidance would be much appreciated. also, these nodes have a 1:12
core-to-memory ratio, so many jobs can be launched and suspended on any node.
cheers,
satra