Discussion:
Checkpoint support using BLCR - Steps and needed packages
Trey Dockendorf
2014-08-05 17:10:32 UTC
I have found that in order to support SUSPEND preemption we cannot use CR_Memory or Memory as a consumable resource. I've seen that if a job in a preemptable partition has requested 15900MB of RAM on a 16GB node then the job will not be preempted, and understandably so. Now I'm looking at how to implement preemption using checkpointing. However, I'm unable to find any documentation on the exact behavior, configuration and necessary packages.

I have rebuilt the BLCR SRPM for my cluster, and am unsure which packages are necessary for the various systems. I have the SLURM controller, SLURM compute nodes and SLURM submit hosts (login nodes) that do not run the slurm daemon but only submit jobs.

I'm also unsure of the expected behavior when a job is preempted and checkpointed. Will the job's state be saved? The documentation mentions ImageDir but does not mention how it's set outside of interactive scontrol commands. If I enable PreemptMode=CHECKPOINT, I'm just not clear on what the expected behavior will be for a user's job.

Any guidance on how other sites have implemented BLCR checkpointing, and your experiences would be useful.

Thanks,
- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org
Marcin Stolarek
2014-08-05 22:49:37 UTC
Post by Trey Dockendorf
I have found that in order to support SUSPEND preemption we cannot use
CR_Memory or Memory as a consumable resource. I've seen that if a job in a
preemptable partition has requested 15900MB of RAM on a 16GB node then the
job will not be preempted, and understandably so. Now I'm looking at how to
implement Preemption using Checkpoint. However I'm unable to find any
documentation on the exact behavior, configuration and necessary packages.

The job can be preempted only if it can fit in RAM. For example, if a 512GB
memory job were preempted, it would take a long time to swap out its whole
memory. It's better to check this at the queueing system level rather than
assume that you can use swap (I'm not sure how it would work, for instance,
on a Blue Gene system).
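
To illustrate the memory arithmetic above, here is a minimal sketch. It is not Slurm's actual scheduling logic; the function name and the sizes of the incoming jobs are assumptions made up for the example. It only shows why a suspended job that has requested 15900MB on a 16GB node leaves no room for a preempting job when memory is tracked as a consumable resource.

```python
def fits_with_suspended_job(node_mem_mb, suspended_job_mem_mb, incoming_job_mem_mb):
    """Return True if the incoming job fits in RAM next to a suspended job.

    Hypothetical helper: a suspended job keeps its memory allocation,
    so both requests must fit on the node at the same time.
    """
    return suspended_job_mem_mb + incoming_job_mem_mb <= node_mem_mb


NODE_MEM_MB = 16 * 1024  # 16GB node, expressed in MB

# Case from the thread: the preemptable job requested 15900MB.
# The 2000MB incoming job is an assumed value for illustration.
print(fits_with_suspended_job(NODE_MEM_MB, 15900, 2000))

# Assumed counter-example: two smaller jobs that both fit in RAM.
print(fits_with_suspended_job(NODE_MEM_MB, 4000, 8000))
```

In the first case the two requests add up to more than the node's memory, so suspension alone cannot free enough RAM without relying on swap.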
Post by Trey Dockendorf
I have rebuilt the BLCR SRPM for my cluster, and am unsure which packages
are necessary for the various systems. I have the SLURM controller, SLURM
compute nodes and SLURM submit hosts (login nodes) that do not run the
slurm daemon but only submit jobs.
I'm also unsure of the expected behavior when a job is preempted and
checkpointed. Will the job's state be saved? The documentation mentions
ImageDir but does not mention how it's set outside of interactive scontrol
commands. If I enable PreemptMode=CHECKPOINT, I'm just not clear on what
the expected behavior will be for a user's job.
Any guidance on how other sites have implemented BLCR checkpointing, and
your experiences would be useful.

It's quite difficult stuff, and it depends much more on the MPI and BLCR side
than on Slurm's.

cheers,
marcin
