Satrajit Ghosh
2014-08-05 21:46:53 UTC
Hi,
Our cluster is set up with the configuration below, yet we have been having
a lot of jobs cancelled when preempted:
slurmd[node004]: *** JOB 79188 CANCELLED AT 2014-08-05T15:31:41 DUE TO PREEMPTION ***
I thought these settings would simply suspend the job instead of cancelling it.
cheers,
satra
Partial configuration
---------------------------
PreemptMode=GANG,SUSPEND
PreemptType=preempt/partition_prio
# default
SchedulerTimeSlice=30
DefMemPerCPU=2048
DefMemPerNode=2048
PartitionName=DEFAULT MaxTime=7-0 DefaultTime=24:00:00
# Partitions
PartitionName=defq Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=YES Shared=NO GraceTime=0 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP
PartitionName=om_all_nodes Default=YES MinNodes=1 DefaultTime=1-00:00:00 MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node[001-030]
PartitionName=om_interactive Default=NO MinNodes=1 MaxNodes=1 DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:1 GraceTime=0 MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node017
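Since preempt/partition_prio decides preemption from partition Priority, it may help to check which of the partitions above actually cover the node where the job was cancelled. Below is a minimal, hypothetical Python helper (not part of Slurm) that does this for the excerpt. It assumes each PartitionName definition sits on a single line and only expands simple hostlists such as node[001-030]; for real hostlists, `scontrol show hostnames` is the reliable tool. defq has no Nodes= entry in this partial configuration, so it is reported separately rather than guessed.

```python
import re

# Partition lines copied from the partial slurm.conf above, one definition per line.
PARTITIONS = """\
PartitionName=defq Default=NO MinNodes=1 DefaultTime=1-00:00:00 MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=YES Shared=NO GraceTime=0 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP
PartitionName=om_all_nodes Default=YES MinNodes=1 DefaultTime=1-00:00:00 MaxTime=7-00:00:00 AllowGroups=ALL Priority=1 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:4 GraceTime=0 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node[001-030]
PartitionName=om_interactive Default=NO MinNodes=1 MaxNodes=1 DefaultTime=01:00:00 MaxTime=01:00:00 AllowGroups=ALL Priority=10 DisableRootJobs=NO RootOnly=NO Hidden=NO Shared=FORCE:1 GraceTime=0 MaxCPUsPerNode=32 ReqResv=NO PreemptMode=GANG,SUSPEND State=UP Nodes=node017
"""


def parse_line(line):
    """Split a 'Key=Value Key=Value ...' line into a dict (values may contain ',' or ':')."""
    return dict(tok.split("=", 1) for tok in line.split())


def expand_nodes(spec):
    """Expand a simple hostlist like 'node[001-030]'; plain names are returned as-is.
    Assumption: only one bracketed numeric range is supported."""
    m = re.fullmatch(r"(\w+)\[(\d+)-(\d+)\]", spec)
    if not m:
        return [spec]
    prefix, lo, hi = m.group(1), m.group(2), m.group(3)
    width = len(lo)  # keep zero padding, e.g. 001
    return [f"{prefix}{i:0{width}d}" for i in range(int(lo), int(hi) + 1)]


def partitions_on(node, text=PARTITIONS):
    """Return ([(name, Priority, PreemptMode), ...] for partitions listing `node`,
    [names of partitions with no Nodes= entry in the excerpt])."""
    hits, unknown = [], []
    for line in text.splitlines():
        if not line.strip():
            continue
        p = parse_line(line)
        if "Nodes" not in p:
            unknown.append(p["PartitionName"])
        elif node in expand_nodes(p["Nodes"]):
            hits.append((p["PartitionName"], int(p["Priority"]), p["PreemptMode"]))
    return hits, unknown


if __name__ == "__main__":
    for node in ("node004", "node017"):
        hits, unknown = partitions_on(node)
        print(node, hits, "| no Nodes= in excerpt:", unknown)
```

On this excerpt, node004 is listed only in om_all_nodes (Priority=1), while the Priority=10 partition om_interactive covers only node017; whether defq shares node004 depends on the part of the configuration not shown here.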