Discussion:
Slurm versions 14.03.5 and 14.11.0-pre2 are now available
j***@public.gmane.org
2014-07-10 21:24:39 UTC
Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version
14.03.5 includes about 40 relatively minor bug fixes and enhancements
as described below. Version 14.11.0-pre2 is the second pre-release of
the next major release of Slurm scheduled for November 2014. This is
very much a work in progress and not intended for production use.

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of changes in Slurm version 14.03.5 include:
-- If srun runs in an exclusive allocation, does not use the entire
allocation, and CR_PACK_NODES is set, lay out tasks appropriately.
-- Correct Shared field in job state information seen by scontrol,
sview, etc.
-- Print Slurm error string in scontrol update job and reset the Slurm errno
before each call to the API.
-- Fix task/cgroup to handle -m block:fcyclic correctly.
-- Fix for core-based advanced reservations where the distribution of cores
across nodes is not even.
-- Fix issue where association maxnodes would not be evaluated correctly
if a QOS had GrpNodes set.
-- GRES fix with multiple files defined per line in gres.conf.
-- When a job is requeued, make sure accounting marks it as such.
-- Print the state of a requeued job as REQUEUED.
-- If a job's partition was taken away from it, do not allow a requeue.
-- Make sure we lock on the conf when sending slurmd's conf to the
slurmstepd.
-- Fix issue with sacctmgr 'load' not being able to gracefully handle a
badly formatted file.
-- sched/backfill: Correct job start time estimate with advanced
reservations.
-- Add a debug error message when proctrack/cgroup is unable to destroy
the step freezer path.
-- Added extra indexes into the database for better performance when
deleting users.
-- Fix issue where, when tracking wckeys but not enforcing them, you
could get multiple '*' wckeys.
-- Fix bug which could report to squeue the wrong partition for a running job
that is submitted to multiple partitions.
-- Report the correct CPU count allocated to a job when allocated a
whole node, even if not using all CPUs.
-- If a job's constraints cannot be satisfied, put it in pending state
with reason BadConstraints and do not remove it.
-- sched/backfill - If a job started with an infinite time limit, set
its end_time one year in the future.
-- Clear record of a job's gres when requeued.
-- Clear QOS GrpUsedCPUs when resetting raw usage if the QOS is not
using any CPUs.
-- Remove log message left over from debugging.
-- When using CR_PACK_NODES, make --ntasks-per-node work correctly.
-- Report correct partition associated with a step if the job is submitted to
multiple partitions.
-- Fix to allow removing preemption from a QOS.
-- If the proctrack plugin fails to destroy the job container, print an
error message and avoid looping forever; give up after 120 seconds.
-- Make srun obey the POSIX convention and increase the exit code by 128
when the process is terminated by a signal (see the sketch after this
list).
-- Sanity check for acct_gather_energy/rapl.
-- If the sbatch command specifies the option --signal=B:signum, send
the signal to the batch script only (also in the sketch after this
list).
-- If we cancel a task and have no other exit code, send the signal and
exit code.
-- Added note about InnoDB storage engine being used with MySQL.
-- Set the job exit code when the job is signaled and set the log level to
debug2() when processing an already completed job.
-- Reset diagnostics time stamp when "sdiag --reset" is called.
-- Make squeue and scontrol report a job's "shared" value based upon
partition options rather than reporting "unknown" if the job submission
does not use the --exclusive or --shared option.
-- task/cgroup - Fix cpuset binding for batch script.
-- sched/backfill - Fix anomaly that could result in jobs being scheduled out
of order.
-- Expand pseudo-terminal size data structure field sizes from 8 to 16 bits.
-- Distinguish between two identical error messages.
-- If using accounting_storage/mysql directly without a DBD, fix issue
with the start of requeued jobs.
-- If a job fails because of a batch node failure, is requeued, and an
epilog complete message then arrives from that node, do not process the
batch step information, since the job has already been requeued and the
epilog script is not guaranteed to run in this situation.
-- Change message to note that a NO_VAL return code could have come from
node failure as well as from an interactive user.
-- Modify test4.5 to only look at one partition instead of all of them.
-- Fix sh5util -u to accept a username different from the user that runs
the command.
-- Corrections to man pages: salloc.1, sbatch.1, srun.1, nonstop.conf.5,
slurm.conf.5.
-- Restore srun --pty resize ability.
-- Have 'sacctmgr dump cluster' handle situations where users or other
entities have special characters like ':' in their names.
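
For the two signal-related fixes above, here is a minimal shell sketch
of the expected behavior (job.sh is a placeholder batch script; exact
output may vary by shell):

    # srun now follows the POSIX convention: a task killed by a signal
    # exits with code 128 + the signal number.
    srun -n1 sh -c 'kill -TERM $$'
    echo $?              # expected 143 = 128 + 15 (SIGTERM)

    # With the B: prefix, sbatch delivers the signal to the batch script
    # only (here SIGUSR1, 120 seconds before the time limit), not to
    # other steps.
    sbatch --time=10 --signal=B:USR1@120 job.sh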

Highlights of changes in Slurm version 14.11.0-pre2 (pre-release) include:
-- Added AllowSpecResourcesUsage configuration parameter in slurm.conf. This
allows jobs to use specialized resources on nodes allocated to them if the
job designates --core-spec=0.
-- Add new SchedulerParameters option of build_queue_timeout to throttle how
much time can be consumed building the job queue for scheduling.
-- Added HealthCheckNodeState option of "cycle" to cycle through the compute
nodes over the course of HealthCheckInterval rather than running all at
the same time.
-- Add job "reboot" option for Linux clusters. This invokes the configured
RebootProgram to reboot nodes allocated to a job before it begins
execution.
-- Added squeue -O/--Format option that makes all job and step fields
available for printing (see the command-line sketch after this list).
-- Improve database slurmctld entry speed dramatically.
-- Add "CPUs" count to output of "scontrol show step".
-- Add support for lua5.2.
-- scancel -b signals only the batch step, not any other step nor any
children of the shell script.
-- MySQL - enforce NO_ENGINE_SUBSTITUTION.
-- Added CpuFreqDef configuration parameter in slurm.conf to specify the
default CPU frequency and governor to be set at job end.
-- Added support for job email triggers: TIME_LIMIT, TIME_LIMIT_90 (reached
90% of time limit), TIME_LIMIT_80 (reached 80% of time limit), and
TIME_LIMIT_50 (reached 50% of time limit). Applies to salloc, sbatch and
srun commands.
-- In slurm.conf add the parameter SrunPortRange=min-max. If this is
configured, srun will use its dynamic ports only from the configured
range (see the slurm.conf sketch after this list).
-- Make debug_flags 64-bit to handle more flags.
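
To illustrate several of the new slurm.conf parameters above, a
hypothetical configuration fragment (all values are made up for
illustration, not recommendations):

    # Let jobs use specialized resources when they request --core-spec=0
    AllowSpecResourcesUsage=1
    # Throttle how much time each cycle may spend building the job queue
    SchedulerParameters=build_queue_timeout=2000000
    # Cycle the health check across nodes over HealthCheckInterval
    HealthCheckNodeState=CYCLE
    # Default CPU frequency/governor to set at job end
    CpuFreqDef=Performance
    # Restrict srun's dynamic ports to this range
    SrunPortRange=60001-63000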
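
And a corresponding command-line sketch for the new user-facing options
(the field list, mail types, job ID 12345 and job.sh are illustrative):

    # Print arbitrary job and step fields with the new -O/--Format option
    squeue -O jobid,partition,numcpus,state
    # Request mail at 80% and 90% of the job's time limit
    sbatch --mail-type=TIME_LIMIT_80,TIME_LIMIT_90 job.sh
    # Reboot the allocated nodes (via RebootProgram) before the job runs
    sbatch --reboot job.sh
    # Signal only the batch step, not other steps or the script's children
    scancel -b 12345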
Janne Blomqvist
2014-07-15 07:04:32 UTC
Post by j***@public.gmane.org
Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version
14.03.5 includes about 40 relatively minor bug fixes and enhancements
as described below.
-- Added extra indexes into the database for better performance when
deleting users.
It would have been nice if this change had been mentioned in big,
blinking, colorful letters or something like that.

As it is, we routinely updated from 14.03.4 to 14.03.5 without any
maintenance break or such; after all, it was just a bugfix release,
what could go wrong? Well, what did go wrong was that slurmdbd was
offline for 40 minutes while it added those extra indexes... :(

Other than that, 14.03.5 seems to be running fine here.
--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || janne.blomqvist-***@public.gmane.org
Uwe Sauter
2014-07-15 07:17:29 UTC
Post by Janne Blomqvist
Post by j***@public.gmane.org
Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version
14.03.5 includes about 40 relatively minor bug fixes and enhancements
as described below.
-- Added extra indexes into the database for better performance when
deleting users.
It would have been nice if this change had been mentioned in big,
blinking, colorful letters or something like that.
As it is, we routinely updated from 14.03.4 to 14.03.5 without any
maintenance break or such; after all, it was just a bugfix release,
what could go wrong? Well, what did go wrong was that slurmdbd was
offline for 40 minutes while it added those extra indexes... :(
Other than that, 14.03.5 seems to be running fine here.
Hi Janne,

Did you experience any problems, or was the database just unavailable
while slurmctld cached all new information until the DB was online again?

Regards,

Uwe
Janne Blomqvist
2014-07-15 08:26:29 UTC
Post by Uwe Sauter
Post by Janne Blomqvist
Post by j***@public.gmane.org
Slurm versions 14.03.5 and 14.11.0-pre2 are now available. Version
14.03.5 includes about 40 relatively minor bug fixes and enhancements
as described below.
-- Added extra indexes into the database for better performance when
deleting users.
It would have been nice if this change had been mentioned in big,
blinking, colorful letters or something like that.
As it is, we routinely updated from 14.03.4 to 14.03.5 without any
maintenance break or such; after all, it was just a bugfix release,
what could go wrong? Well, what did go wrong was that slurmdbd was
offline for 40 minutes while it added those extra indexes... :(
Other than that, 14.03.5 seems to be running fine here.
Hi Janne,
Did you experience any problems, or was the database just unavailable
while slurmctld cached all new information until the DB was online again?
Well, things that require the DB would either fail (e.g. sacct, sshare,
...), or slurmctld would queue them (e.g. job completion records).
AFAICT we didn't lose any jobs, so in that sense no disaster happened.
Essentially, the situation was equivalent to slurmdbd being down.

It seems the slurmdbd startup procedure is to first do any DB updates
and then start listening for new connections. I vaguely recall that
MySQL locks tables for updates, so doing the update concurrently with
other work might not help.
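
For illustration only (the table and index names here are made up; the
real schema changes live in slurmdbd's upgrade code): on MySQL versions
without online DDL, adding an index rebuilds the table and blocks other
access until it finishes, which would match the downtime we saw:

    -- Hypothetical example, not the actual slurmdbd statement:
    ALTER TABLE example_job_table ADD INDEX idx_user (id_user);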

--
Janne Blomqvist, D.Sc. (Tech.), Scientific Computing Specialist
Aalto University School of Science, PHYS & BECS
+358503841576 || janne.blomqvist-***@public.gmane.org
