j***@public.gmane.org
2014-08-19 22:11:45 UTC
The 2014 Slurm User Group Meeting will be held on September 23 and 24 in
Lugano, Switzerland. The meeting will include an assortment of tutorials,
technical presentations, and site reports. Prof. Felix Schürmann of the
European Human Brain Project will be our keynote speaker. Early registration
ends this week. For more information, see
http://slurm.schedmd.com/slurm_ug_agenda.html
Slurm versions 14.03.7 and 14.11.0-pre4 are now available.
Version 14.03.7 includes quite a few relatively minor bug fixes.
Version 14.11.0-pre4 includes a new job array data structure and APIs for
managing job arrays. These changes provide vastly improved scalability with
respect to job arrays. Version 14.11.0 is under active development and its
release is planned for November 2014.
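For example, with the 64k restriction removed, a job array well beyond 65536
tasks can be submitted in a single request. A minimal sketch, assuming the
site's configured maximum array size allows it (the script name is
hypothetical):

   $ sbatch --array=0-99999 my_array_job.sh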
Slurm downloads are available from
http://www.schedmd.com/#repos
Highlights of changes in Slurm version 14.03.7 include:
-- Correct typos in man pages.
-- Add note to MaxNodesPerUser documentation that multiple jobs running on
   the same node count as multiple nodes.
-- PerlAPI - fix renamed call from slurm_api_set_conf_file to
slurm_conf_reinit.
-- Fix gres race condition that could result in a job deallocation
   error message.
-- Correct NumCPUs count for jobs with --exclusive option.
-- When creating a reservation with CoreCnt, check that Slurm uses
   SelectType=select/cons_res; otherwise don't send the request to slurmctld
   and return an error (see the example after this list).
-- Save the state of scheduled node reboots so they will not be lost should
   the slurmctld restart.
-- In select/cons_res plugin - Ensure the node count does not exceed the task
   count.
-- switch/nrt - Unload tables rather than windows at job end, to release CAU.
-- When HealthCheckNodeState is configured as IDLE, don't run the
   HealthCheckProgram for nodes in any state other than IDLE.
-- Add a minor sanity check to verify that the string passed to bit_unfmt
   isn't NULL.
-- CRAY NATIVE - Fix issue on heavily loaded systems so the NHC runs only
   once per job/step completion.
-- Remove unneeded step cleanup for pending steps.
-- Fix issue where, if a batch job was manually requeued, the batch step
   information wasn't stored in accounting.
-- When a job is released from a requeue hold state, clean up its previous
   exit code.
-- Correct the srun man page about how the output from the user application
is sent to srun.
-- Increase the timeout of the main thread while waiting for the i/o thread.
Allow up to 180 seconds for the i/o thread to complete.
-- When using sacct -c to read the job completion data, compute the correct
   job elapsed time.
-- Perl package: Define some missing node states.
-- When using AccountingStorageType=accounting_storage/mysql, zero out the
   database index for the array elements, avoiding duplicate database values.
-- Reword the explanation of cputime and cputimeraw in the sacct man page.
-- JobCompType allows "jobcomp/mysql" as a valid name, but the code used
   "job_comp/mysql", setting an incorrect default database.
-- Try to load libslurm.so only when necessary.
-- When nodes are scheduled for reboot, set their state to DOWN rather than
   FUTURE so they are still visible to sinfo. The state is set to IDLE after
   the reboot completes.
-- Apply BatchStartTimeout configuration to task launch and avoid aborting
srun commands due to long running Prolog scripts.
-- Fix minor memory leaks when freeing node_info_t structure.
-- Fix various memory leaks in sview.
-- If a batch script is requeued, running steps now get the correct exit
   code/signal; previously it was always -2.
-- If the step exit code hasn't been set, have sacct display -2 rather than
   treating it as a signal and exit code.
-- Send calculated step_rc for batch step instead of raw status as
done for normal steps.
-- If a job times out, set the exit code in accounting to 1 instead of the
signal 1.
-- Update the acct_gather.conf.5 man page removing the reference to
InfinibandOFEDFrequency.
-- Fix gang scheduling for jobs submitted to multiple partitions.
-- Enable srun to submit a job to multiple partitions (see the example after
   this list).
-- Update slurm.conf man page: when Epilog or Prolog fail, the node state
   is set to DRAIN.
-- Start a job in the highest priority partition possible, even if it
   requires preempting other jobs and delaying initiation, rather than using
   a lower priority partition. Previous logic would preempt lower priority
   jobs, but then might start the job in a lower priority partition and not
   use the resources released by the preempted jobs.
-- Fix SelectTypeParameters=CR_PACK_NODES for srun when making both the job
   and step resource allocations.
-- BGQ - Make it possible to pack multiple tasks on a core when not using
the entire cnode.
-- MYSQL - If unable to connect to mysqld, close the connection that was
   initialized.
-- DBD - When connecting, wait MessageTimeout + 5 seconds, since the timeout
   when talking to the database is the same; otherwise a race condition could
   occur in the requesting client when receiving the response if the database
   is unresponsive.
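Two of the 14.03.7 changes above lend themselves to quick command-line
sketches. The partition, node, and user names here are hypothetical:

   # srun can now submit to multiple partitions; the job runs in whichever
   # listed partition can start it first:
   $ srun --partition=debug,batch -N2 hostname

   # Creating a core-based reservation is now rejected client-side unless
   # the cluster runs SelectType=select/cons_res:
   $ scontrol create reservation ReservationName=res1 StartTime=now \
       Duration=60 Users=alice Nodes=node01 CoreCnt=4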
Highlights of changes in Slurm version 14.11.0-pre4 include:
-- Added job array data structure and removed 64k array size restriction.
-- Added SchedulerParameters option bf_max_job_array_resv to control how
   many tasks of a job array should have resources reserved for them.
-- Added more validity checking of incoming job submit requests.
-- Added srun --export option to set/export specific environment variables.
-- Scontrol modified to print separate error messages for job arrays with
different exit codes on the different tasks of the job array. Applies to
job suspend and resume operations.
-- Fix race condition in CPU frequency set with job preemption.
-- Always call select plugin on step termination, even if the job is also
complete.
-- Srun executable names beginning with "." will be resolved based upon the
working directory and path on the compute node rather than the
submit node.
-- Add node state string suffix of "$" to identify nodes in maintenance
reservation or scheduled for reboot. This applies to scontrol, sinfo,
and sview commands.
-- Enable scontrol to clear a node's scheduled reboot by setting its state
to "RESUME".
-- As per the sbatch and srun documentation, when the --signal option is
   used, signal only the job steps unless, in the case of a batch job, B is
   specified, in which case signal only the batch script.
-- Modify AuthInfo configuration parameter to accept credential lifetime
option.
-- Modify crypto/munge plugin to use socket and timeout specified in
AuthInfo.
-- If we have a state for a step on completion, put that in the database
   instead of guessing from the exit_code.
-- Added squeue -P/--priority option that can be used to display pending
   jobs in the same order as used by the Slurm scheduler, even if jobs are
   submitted to multiple partitions (a job is reported once per usable
   partition).
-- Improve the pending reason description for various QOS limits. For each
   QOS limit that causes a job to be pending, print its specific reason.
   For example, if a job pends because of GrpCpus, the squeue command will
   print QOSGrpCpuLimit as the pending reason.
-- sched/backfill - Set expected start time of job submitted to multiple
partitions to the earliest start time on any of the partitions.
-- Introduce a MAX_BATCH_REQUEUE define that indicates how many times a job
   can be requeued. When that number is reached, the job is put on hold with
   reason JobHoldMaxRequeue.
-- Add sbatch job array option to limit the number of simultaneously running
   tasks from a job array, e.g. "--array=0-15%4" (see the examples after
   this list).
-- Implemented a new QOS limit, MinCPUs. Users running under such a QOS must
   request at least MinCPUs CPUs, otherwise their job will pend.
-- Introduced a new pending reason WAIT_QOS_MIN_CPUS to reflect the new QOS
limit.
-- Job array dependency based upon state is now dependent upon the state of
   the array as a whole (e.g. afterok requires ALL tasks to complete
   successfully, afternotok is true if ANY task does not complete
   successfully, and after requires all tasks to at least be started).
-- The srun -u/--unbuffered option sets the stdout of the task launched by
   srun to be line buffered.
-- The srun options -l/--label and -u/--unbuffered can now be specified
   together; this limitation has been removed.
-- Provide sacct display of gres accounting information per job.
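A few of the new 14.11 options above, sketched as hypothetical command lines
(the script names and job ID are made up):

   # Run a 16-task job array with at most 4 tasks active at once:
   $ sbatch --array=0-15%4 my_array_job.sh

   # Show pending jobs in scheduler priority order, once per usable partition:
   $ squeue -P

   # Clear a node's scheduled reboot:
   $ scontrol update NodeName=node01 State=RESUME

   # Depend on ALL tasks of array job 1234 completing successfully:
   $ sbatch --dependency=afterok:1234 my_followup_job.sh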
--
Morris "Moe" Jette
CTO, SchedMD LLC
Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html
Morris "Moe" Jette
CTO, SchedMD LLC
Slurm User Group Meeting
September 23-24, Lugano, Switzerland
Find out more http://slurm.schedmd.com/slurm_ug_agenda.html