Discussion:
Using Constraints
Dennis Zheleznyak
2014-07-13 08:54:38 UTC
Hi All,

Recently we've upgraded our storage system with more network cards so all
the nodes in the cluster can see it. Since then, we tried running the
command:
sbatch -n <no_of_cores> -C[rack1|rack2|rack3] -c<no_of_cores> <script>
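
For reference, here is a minimal sketch of how that command line could be assembled in a shell. The values (8 tasks, 1 core per task, a script called job.sh) are placeholders for illustration only and are not taken from our real jobs. The bracketed feature list is quoted because an unquoted | would be read by the shell as a pipe; with the brackets, Slurm is asked to take all nodes of the job from a single one of the listed racks.

```shell
# Placeholder values, for illustration only (assumptions, not real job settings).
ntasks=8
cpus_per_task=1
script=job.sh

# Keep the constraint in single quotes so the shell does not treat | as a pipe.
constraint='[rack1|rack2|rack3]'

# Assemble the command as text; the constraint stays quoted in the result.
submit_cmd="sbatch -n $ntasks -C '$constraint' -c $cpus_per_task $script"

# Print the command instead of running it, so it can be checked first.
echo "$submit_cmd"
```

Printing the command first makes it easy to confirm that the constraint reaches sbatch as one argument before anything is actually submitted.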

However, while that job is queued, the other jobs that I submit (to other
nodes and features) with a normal sbatch command without a constraint are
held in the queue as well, even though there are free resources. When I
cancel the job with the -C option, those jobs are queued and executed
properly; the problem only occurs when the job with -C is submitted first.

Why is this happening and how can I resolve it?

Part of slurm.conf:
*Slurm Configuration:*
# SCHEDULING
FastSchedule=0
SchedulerType=sched/backfill
SchedulerPort=7321
SelectType=select/cons_res
SelectTypeParameters=CR_Core_Memory
#
#
# JOB PRIORITY
PriorityType=priority/basic
#
# JOB PREEMPTION (optional)
PreemptMode=requeue
PreemptType=preempt/partition_prio

*Node Configuration:*
#rack1
NodeName=hnmp[106-164] NodeAddr=X.X.X.[106-164] Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=1 State=UNKNOWN Feature="par,rack1"
NodeName=hnmp[101-105] NodeAddr=X.X.X.[101-105] Sockets=2 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=1 State=UNKNOWN Feature="pls"
#
#rack2
NodeName=hnmp[27-80] NodeAddr=X.X.X.[27-80] Procs=12 RealMemory=1 State=UNKNOWN Feature="par,rack2"
#
#rack3
NodeName=hnmp[5001-5056] NodeAddr=X.X.X.[1-56] Procs=16 RealMemory=1 State=UNKNOWN Feature="par,rack3"
#rack4
NodeName=hnmp[5057-5100] NodeAddr=X.X.X.[57-100] Procs=16 RealMemory=1 State=UNKNOWN Feature="par,rack4"

*Partition Properties:*
#Partitions
#
# priority partitions
#
PartitionName=low Nodes=hnmp[106-164] Default=NO MaxTime=INFINITE State=UP Shared=NO Priority=10 PreemptMode=requeue
PartitionName=hi Nodes=hnmp[106-164] Default=NO MaxTime=INFINITE State=UP Shared=NO Priority=30 PreemptMode=off
PartitionName=med Nodes=hnmp[106-164] Default=NO MaxTime=INFINITE State=UP Shared=NO Priority=20 PreemptMode=off
# lsdyna partition
PartitionName=lsall Nodes=hnmp[05-07,09-16] Default=NO MaxTime=INFINITE State=UP Shared=NO Priority=10 PreemptMode=off
# Default partition
#
PartitionName=hnm Nodes=hnmp[01-16,18-26,101-164,27-80,165-176,181-196,5001-5100] Default=YES MaxTime=INFINITE State=UP Shared=NO Priority=20 PreemptMode=off
# Backfill partition
#
PartitionName=hpc Nodes=hnmp[101-164,27-80,5001-5100] MaxTime=7-0 State=UP Shared=NO Priority=20 PreemptMode=off
