Checkpoint support using BLCR - Steps and needed packages

Marcin Stolarek

2014-08-05 22:49:37 UTC

Post by Trey Dockendorf
I have found that in order to support SUSPEND preemption we cannot use
CR_Memory or Memory as a consumable resource. I've seen that if a job in a
preemptable partition has requested 15900MB of RAM on a 16GB node then the
job will not be preempted, and understandably so. Now I'm looking at how to
implement preemption using checkpointing. However, I'm unable to find any
documentation on the exact behavior, configuration and necessary packages.

A job can be preempted by suspension only if it and the preempting job fit
in RAM together. For example, if a 512GB-memory job were preempted, swapping
out its whole memory would take a lot of time. It's better to check this at
the queueing system level rather than assume that you can use swap (I'm not
sure how that would work on, for instance, a Blue Gene system).
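
To make the memory argument concrete, here is a rough sketch of the check a
scheduler would have to pass before suspending a job. It is not Slurm code:
the function name and parameters are made up for illustration, and it assumes
a suspended job keeps its whole allocation resident in RAM.

```python
def fits_with_suspended_job(node_ram_mb, suspended_job_mb, preempting_job_mb):
    """Return True if both jobs' memory fits in physical RAM at once.

    Hypothetical helper, not part of Slurm. Assumes the suspended job
    stays fully resident and that swap is not used.
    """
    return suspended_job_mb + preempting_job_mb <= node_ram_mb


# The 16GB node from the example above: a 15900MB job leaves almost no
# room, so even a 1GB preempting job cannot start without swapping.
print(fits_with_suspended_job(16 * 1024, 15900, 1024))  # False

# Two 8000MB jobs fit side by side on the same node.
print(fits_with_suspended_job(16 * 1024, 8000, 8000))  # True
```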

Post by Trey Dockendorf
I have rebuilt the BLCR SRPM for my cluster, and am unsure which packages
are necessary for the various systems. I have the SLURM controller, SLURM
compute nodes and SLURM submit hosts (login nodes) that do not run the
slurm daemon but only submit jobs.
I'm also unsure what the expected behavior is when a job is preempted and
checkpointed. Will the job's state be saved? The documentation mentions
ImageDir but does not mention how it's set outside of interactive scontrol
commands. If I enable PreemptMode=CHECKPOINT, I'm just not clear on what
the expected behavior will be for a user's job.
Any guidance on how other sites have implemented BLCR checkpointing, and
your experiences would be useful.

It's quite difficult stuff, and it depends much more on the MPI and BLCR
side than on Slurm's.

cheers,
marcin