Marcin Sliwowski
2014-09-23 19:03:34 UTC
I'm running version 2.6.9 and wondering if the preemption algorithm
takes into account the topology, as defined in topology.conf, when it
selects which jobs to preempt to make room for a new higher priority MPI
job.
Based on what I have seen, it appears that it doesn't.
The reason I ask is that we define our InfiniBand topology as 8
individual fabrics: we have 8 bladecenters, each with its own fabric,
and they are not interconnected. One partition includes all 8
bladecenters, with 32 nodes per bladecenter.
Eventually enough jobs are preempted and the MPI job is scheduled into a
bladecenter, but it comes at the cost of many jobs. The main problem is
that it preempts jobs on bladecenters where the MPI job does not
ultimately land.
If it took into consideration our defined topology and focused on
preempting jobs that reside in a single bladecenter, it could make room
for the MPI job while preempting far fewer jobs.
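To illustrate the behaviour described above, here is a minimal Python sketch of topology-aware victim selection. It is not Slurm code and does not reflect how the scheduler is implemented. The function name, the job tuples, and the bladecenter labels are all hypothetical, and it assumes every running job is preemptable, holds whole nodes, and sits entirely inside one bladecenter.

```python
NODES_PER_BLADECENTER = 32  # from the description above: 32 nodes per bladecenter


def pick_bladecenter(bladecenters, running, nodes_needed):
    """Return (bladecenter, job_ids_to_preempt) needing the fewest preemptions.

    bladecenters: list of bladecenter names (hypothetical labels).
    running: list of (job_id, bladecenter, node_count) tuples; every job is
             assumed preemptable and confined to a single bladecenter.
    Returns None if no single bladecenter can hold the new job.
    """
    best = None
    for bc in bladecenters:
        jobs = [(n, job_id) for job_id, where, n in running if where == bc]
        free = NODES_PER_BLADECENTER - sum(n for n, _ in jobs)
        victims = []
        # Preempting the largest jobs first minimises the number of jobs
        # that must be stopped to free enough nodes in this bladecenter.
        for n, job_id in sorted(jobs, reverse=True):
            if free >= nodes_needed:
                break
            victims.append(job_id)
            free += n
        if free >= nodes_needed and (best is None or len(victims) < len(best[1])):
            best = (bc, victims)
    return best


# Example: a 16-node MPI job arriving on two fully loaded bladecenters.
running_jobs = [
    (1, "bc1", 16), (2, "bc1", 16),
    (3, "bc2", 8), (4, "bc2", 8), (5, "bc2", 8), (6, "bc2", 8),
]
choice = pick_bladecenter(["bc1", "bc2"], running_jobs, 16)
```

In this example only one job in bc1 has to be preempted, and nothing in bc2 is touched, which is the outcome we are hoping for.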
We have been scratching our heads on this one for a while.
SelectType=select/cons_res
PreemptType=preempt/partition_prio
TopologyPlugin=topology/tree
Thanks
--
Marcin Sliwowski | ***@RENCI | 919-445-0479