Slurm versions 14.03.8 and 14.11.0-pre5 are now available
Danny Auble
2014-09-17 20:58:39 UTC
Slurm versions 14.03.8 and 14.11.0-pre5 are now available. Version
14.03.8 includes quite a few relatively minor bug fixes.

Version 14.11.0 is under active development and its release is planned
for November 2014. Many of its features and performance enhancements
will be discussed next week at SLUG 2014 in Lugano, Switzerland.

Note to all developers: the code freeze for new features in 14.11 will be
at the end of this month (September).

Slurm downloads are available from http://www.schedmd.com/#repos.

Highlights of the two versions:

* Changes in Slurm 14.03.8
==========================
-- Fix minor memory leak when Job doesn't have nodes on it (Meaning the job
has finished)
-- Fix sinfo/sview to be able to query against nodes in reserved and other
states.
-- Make sbatch/salloc read in (SLURM|(SBATCH|SALLOC))_HINT in order to
handle sruns in the script that will use it.
-- srun properly interprets a leading "." in the executable name based upon
the working directory of the compute node rather than the submit host.
-- Fix Lustre misspellings in hdf5 guide
-- Fix wrong reference in slurm.conf man page to what --profile option should
be used for AcctGatherFilesystemType.
-- Update HDF5 document to point out the SlurmdUser is who creates the
ProfileHDF5Dir directory as well as all its sub-directories and files.
-- CRAY NATIVE - Remove error message for sruns run inside an salloc that
had --network= specified.
-- Defer job step initiation if required GRES are in use by other steps
rather than immediately returning an error.
-- Deprecate --cpu_bind from sbatch and salloc. These never worked correctly
and only caused confusion. Since the cpu_bind options mostly refer to a
step, we opted to only allow srun to set them in future versions.
-- Modify sgather to work if Nodename and NodeHostname differ.
-- Changed use of JobContainerPlugin where it should be JobContainerType.
-- Fix for possible error if job has GRES, but the step explicitly requests
a GRES count of zero.
-- Make "srun --gres=none ..." work when executed without a job allocation.
-- Change the global eio_shutdown_time to a field in eio handle.
-- Advanced reservation fixes for heterogeneous systems, especially when
reserving cores.
-- If --hint=nomultithread is used in a job allocation make sure any sruns
run inside the allocation can read the environment correctly.
-- If batchdir can't be made set errno correctly so the slurmctld is notified
correctly.
-- Remove repeated batch complete if batch directory isn't able to be made
since the slurmd will send the same message.
-- sacctmgr fix default format for list transactions.
-- BLUEGENE - Fix backfill issue with backfilling jobs on blocks already
reserved for higher priority jobs.
-- When creating job arrays the job specification files for each element
are hard links to the first element's specification files. If the
controller fails to make the links the files are copied instead.
-- Fix error handling for job array create failure due to inability to copy
job files (script and environment).
-- Added patch in the contribs directory for integrating make version 4.0
with Slurm and renamed the previous patch "make-3.81.slurm.patch".
-- Don't wait for an update message from the DBD to finish before sending rc
message back. In slow systems with many associations this could speed
responsiveness in sacctmgr after adding associations.
-- Eliminate race condition in enforcement of MaxJobCount limit for job
arrays.
-- Fix anomaly allocating cores for GRES with specific device/CPU mapping.
-- cons_res - When requesting exclusive access make sure we set the number
of cpus in the job_resources_t structure so as nodes finish the correct
cpu count is displayed in the user tools.
-- If the job_submit plugin calls take longer than 1 second to run, print a
warning.
-- Make sure transfer_s_p_options transfers all the portions of the
s_p_options_t struct.
-- Correct the srun man page: the SLURM_CPU_BIND_VERBOSE,
SLURM_CPU_BIND_TYPE, and SLURM_CPU_BIND_LIST environment variables are set
only when the task/affinity plugin is configured.
-- sacct - Initialize variables correctly to avoid incorrect structure
reference.
-- Performance adjustment to avoid calling a function multiple times when it
only needs to be called once.
-- Give more correct waiting reason if job is waiting on association/QOS
MaxNode limit.
-- DB - When sending lft updates to the slurmctld only send non-deleted
lfts.
-- BLUEGENE - Fix documentation on how to build a reservation less than a
midplane.
-- If Slurmctld fails to read the job environment consider it an error
and abort the job.
-- Add the name of the node a job is running on to the message printed by
slurmstepd when terminating a job.
-- Remove unsupported options from sacctmgr help and the dump function.
-- Update sacctmgr man page removing reference to obsolete parameter
MaxProcSecondsPerJob.
-- Added more validity checking of incoming job submit requests.
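
The job array entry above (specification files are hard links to the first
element's files, with a copy as the fallback) follows a common pattern. Here
is a minimal Python sketch of that pattern, for illustration only. It is not
Slurm's implementation (slurmctld is written in C), and the function names
and file names are hypothetical.

```python
import os
import shutil
import tempfile


def link_or_copy(src, dst):
    """Hard-link dst to src; fall back to copying if linking fails.

    Returns "link" or "copy" to report which path was taken.
    """
    try:
        os.link(src, dst)
        return "link"
    except OSError:
        # Linking can fail, e.g. across file systems or where hard links
        # are unsupported. Copy the file contents and metadata instead.
        shutil.copy2(src, dst)
        return "copy"


def demo():
    """Create a fake job script and a second array element that shares it."""
    with tempfile.TemporaryDirectory() as tmp:
        first = os.path.join(tmp, "job.100.script")   # hypothetical name
        second = os.path.join(tmp, "job.101.script")  # hypothetical name
        with open(first, "w") as fh:
            fh.write("#!/bin/sh\nhostname\n")
        how = link_or_copy(first, second)
        with open(second) as fh:
            same_content = fh.read() == "#!/bin/sh\nhostname\n"
        return how, same_content
```

Either way the second element ends up with the same script contents; the
hard link simply avoids storing the data twice.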

* Changes in Slurm 14.11.0pre5
==============================
-- Fix sbatch --export=ALL, it was treated by srun as a request to explicitly
export only the environment variable named "ALL".
-- Improve scheduling of jobs in reservations that overlap other
reservations.
-- Modify sgather to make global file systems easier to configure.
-- Added sacctmgr reconfig to reread the slurmdbd.conf in the slurmdbd.
-- Modify scontrol job operations to accept comma delimited list of job IDs.
Applies to job update, hold, release, suspend, resume, requeue, and
requeuehold operations.
-- Refactor job_submit/lua interface. LUA FUNCTIONS NEED TO CHANGE! The
lua script no longer needs to explicitly load meta-tables, but information
is available directly using names slurm.reservations, slurm.jobs,
slurm.log_info, etc. Also, the job_submit.lua script is reloaded when
updated without restarting the slurmctld daemon.
-- Allow users to specify --resv_ports to have value 0.
-- Cray MPMD (Multiple-Program Multiple-Data) support completed.
-- Added ability for "scontrol update" to reference jobs by JobName (and
filtered optionally by UserID).
-- Add support for an advanced reservation start time that remains constant
relative to the current time. This can be used to prevent the starting of
longer running jobs on select nodes for maintenance purposes. See the
reservation flag "TIME_FLOAT" for more information.
-- Enlarge the jobid field to 18 characters in squeue output.
-- Added "scontrol write config" option to save a copy of the current
configuration in a file containing a time stamp.
-- Eliminate native Cray specific port management. Native Cray systems must
now use the MpiParams configuration parameter to specify ports to be used
for communications. When upgrading Native Cray systems from version 14.03,
all running jobs should be killed and the switch_cray_state file (in
SaveStateLocation of the nodes where the slurmctld daemon runs) must be
explicitly deleted.
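
To illustrate the "sbatch --export=ALL" entry above, here is a simplified
Python model of the two behaviors. It is a sketch under stated assumptions,
not Slurm's code: both function names are hypothetical, and the only case
modeled is a comma-separated list of variable names plus the special value
"ALL".

```python
def export_env_literal(export_value, submit_env):
    """Model of the old behavior: every token is a variable name.

    With export_value == "ALL" this looks for a variable literally named
    "ALL", so almost nothing is exported.
    """
    names = [n for n in export_value.split(",") if n]
    return {n: submit_env[n] for n in names if n in submit_env}


def export_env_with_all(export_value, submit_env):
    """Model of the fixed behavior: "ALL" means the whole environment."""
    if export_value == "ALL":
        return dict(submit_env)
    return export_env_literal(export_value, submit_env)


# Hypothetical submit-host environment used only for this illustration.
SUBMIT_ENV = {"PATH": "/usr/bin", "HOME": "/home/user"}

literal_result = export_env_literal("ALL", SUBMIT_ENV)
fixed_result = export_env_with_all("ALL", SUBMIT_ENV)
```

In this model literal_result is empty because no variable is named "ALL",
while fixed_result carries the full submit environment.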
