Discussion:
Upgrading and not losing jobs
Dennis Zheleznyak
2014-08-24 08:57:31 UTC
Hi,

I'm upgrading Slurm from 2.4.4 to the latest 14.X version. When I tried to
simulate the upgrade in a virtual environment, the running jobs were deleted
every single time.

Is it possible to upgrade the version without losing the running jobs?

Thank you,
Dennis.
Uwe Sauter
2014-08-24 09:04:31 UTC
Hi Dennis,

I started using SLURM only a few weeks ago but I suspect that an update
from 2.4.x to 14.03.x in a single step is not possible because of too
many changes in internal structures (both job state information and
database).

There is an entry in the FAQ

http://slurm.schedmd.com/faq.html#state_preserve

which indicates that you will probably need to schedule maintenance downtime
on your cluster for a major version update.


Regards,

Uwe
Chris Samuel
2014-08-24 09:22:28 UTC
Post by Dennis Zheleznyak
I'm upgrading Slurm from 2.4.4 to the latest 14.X version, when I tried to
simulate it in a virtual environment the running jobs were deleted every
single time.
As Uwe said, I suspect that's too large a jump to be supported. You might want
to test 2.4.4 -> 2.6.9 first to see if that works.

Also - do you mean keeping running jobs or just queued jobs?

Best of luck,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Dennis Zheleznyak
2014-08-25 05:27:35 UTC
Both running and queued
Marcin Stolarek
2014-08-25 06:39:36 UTC
Post by Dennis Zheleznyak
Both running and queued
In theory, it's possible to upgrade from 2.4.4 to 14.11, but you will
certainly have to do it step by step, since the Slurm protocol is compatible
only across three consecutive versions.
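
A minimal sketch of what "step by step" could look like when planning the hops. It assumes the ordered list of release series below and a maximum jump of two series per step (one reading of "three consecutive versions"). The list, the function name and the `max_hop` parameter are illustrative and not part of Slurm.

```python
# Ordered Slurm release series relevant to this thread.
# Assumption: the list is maintained by hand and is complete for this range.
RELEASES = ["2.4", "2.5", "2.6", "14.03", "14.11"]


def plan_upgrade_path(current, target, releases=RELEASES, max_hop=2):
    """Return the release series to install, in order, from current to target.

    max_hop is the largest number of release series crossed in one upgrade
    (hypothetical parameter; 2 means "at most two series newer").
    """
    start = releases.index(current)
    end = releases.index(target)
    if end < start:
        raise ValueError("target release is older than the current one")

    path = [releases[start]]
    i = start
    while i < end:
        # Jump as far as allowed without overshooting the target.
        i = min(i + max_hop, end)
        path.append(releases[i])
    return path


if __name__ == "__main__":
    print(" -> ".join(plan_upgrade_path("2.4", "14.11")))
```

Under these assumptions, going from 2.4 to 14.11 needs one intermediate stop at 2.6, which matches the suggestion earlier in the thread to test 2.4.4 -> 2.6.9 first.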

However, overall success also depends on your skills and experience. For
example, I'd suggest increasing SlurmdTimeout to a few hours. You should also
follow the procedure of upgrading slurmdbd first, and check that all plugins
(spank, job_submit) used at your site (and present in the configuration) will
still be available. If you have written your own plugin, check that it
compiles against the new Slurm version (for example, the job_submit plugin
API has changed).
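
The configuration checks mentioned above can be roughed out as a small pre-upgrade script. This is only a sketch: it assumes slurm.conf uses plain `Key=Value` tokens, it treats 7200 seconds as "a few hours" (an arbitrary threshold), and the function names are made up for illustration. It only reads `SlurmdTimeout` and `JobSubmitPlugins`; spank plugins and the slurmdbd upgrade order still need to be checked by hand.

```python
def parse_slurm_conf(text):
    """Collect Key=Value tokens from slurm.conf text (keys lowercased).

    Simplification: repeated keys (e.g. on NodeName lines) overwrite
    each other, which is fine for the two global settings used here.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        for token in line.split():
            if "=" in token:
                key, value = token.split("=", 1)
                settings[key.lower()] = value
    return settings


def pre_upgrade_check(text, min_timeout=7200):
    """Return (warnings, job_submit_plugins) for the given slurm.conf text.

    min_timeout is a hypothetical threshold in seconds, not a Slurm default.
    """
    settings = parse_slurm_conf(text)
    warnings = []

    timeout = settings.get("slurmdtimeout")
    if timeout is None:
        warnings.append("SlurmdTimeout is not set explicitly")
    elif int(timeout) < min_timeout:
        warnings.append(
            "SlurmdTimeout=%s is below %d seconds" % (timeout, min_timeout)
        )

    # Plugins listed here must still exist (and build) in the new version.
    plugins = [p for p in settings.get("jobsubmitplugins", "").split(",") if p]
    return warnings, plugins


if __name__ == "__main__":
    sample = "SlurmdTimeout=300  # example value\nJobSubmitPlugins=lua\n"
    for warning in pre_upgrade_check(sample)[0]:
        print("WARNING:", warning)
```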

cheers,
marcin