Discussion:
Enforcing qos limits without associations limits
Marcin Stolarek
2014-07-31 07:52:33 UTC
Hi guys,

In our installation we have a separate job_submit plugin which checks the
account validity directly in LDAP. We would like to disable association
enforcement, but in our current configuration we are using qos limits which
restrict the number of jobs and cores per user (a user can choose qos
normal, with longer jobs and a lower number of jobs/cores running/allocated,
or qos short, which allows more resources with a lower walltime limit).

I haven't checked the code yet. Do you think there is an easy way to remove
the accounting enforcement dependencies?
cheers,
marcin
Trey Dockendorf
2014-07-31 22:21:35 UTC
I don't have a solution regarding the removal of accounting enforcement, but what is it you're storing in LDAP that is checked by your plugin?

We are still migrating our Torque/Maui cluster to SLURM and part of the migration includes moving from /etc/passwd based user management to LDAP. I've gone to considerable trouble to script the importing of our LDAP into slurmdbd. Right now my script only queries LDAP and converts various attributes into a sacctmgr import file. The code is a proof-of-concept and eventually will be changed to perform regular checks that slurmdbd matches LDAP.
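
A minimal sketch of the kind of conversion described above, not Trey's actual script. It assumes the LDAP query has already been reduced to a mapping of account name to user names (the hard-coded `ldap_accounts` dictionary stands in for that result), and it writes lines modelled on the flat-file layout that `sacctmgr dump` produces. The function name and the sample data are hypothetical, and the output should be compared against a dump from your own cluster before being loaded with `sacctmgr load`.

```python
from typing import Dict, List


def render_sacctmgr_dump(cluster: str, accounts: Dict[str, List[str]]) -> str:
    """Render a flat-file cluster description for sacctmgr.

    `accounts` maps an account name to the user names that belong to it.
    All accounts are placed directly under 'root' (an assumption made to
    keep the sketch short; real hierarchies may be deeper).
    """
    lines = [
        f"Cluster - '{cluster}'",
        "Parent - 'root'",
    ]
    # Declare every account directly under root.
    for account in sorted(accounts):
        lines.append(
            f"Account - '{account}':Description='{account}':Organization='{account}'"
        )
    # Then attach the users to their accounts.
    for account in sorted(accounts):
        lines.append(f"Parent - '{account}'")
        for user in sorted(accounts[account]):
            lines.append(f"User - '{user}':DefaultAccount='{account}'")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical data standing in for the result of an LDAP query.
    ldap_accounts = {"physics": ["alice", "bob"], "chemistry": ["carol"]}
    print(render_sacctmgr_dump("testcluster", ldap_accounts), end="")
```

Sorting accounts and users keeps the generated file stable between runs, which makes it easier to diff one export against the next.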

Something similar to what I'm doing could possibly be easier than having the SLURM code changed.

- Trey

=============================

Trey Dockendorf
Systems Analyst I
Texas A&M University
Academy for Advanced Telecommunications and Learning Technologies
Phone: (979)458-2396
Email: treydock-mRW4Vj+***@public.gmane.org
Jabber: treydock-mRW4Vj+***@public.gmane.org

Marcin Stolarek
2014-08-01 11:04:32 UTC
Post by Trey Dockendorf
I don't have a solution regarding the removal of accounting enforcement,
but what is it you're storing in LDAP that is checked by your plugin?
We prefer using a job_submit plugin which connects to the LDAP database,
because this guarantees that closing an account in the web interface (which
is not managed by cluster administrators) affects the possibility of job
submission immediately.
Post by Trey Dockendorf
We are still migrating our Torque/Maui cluster to SLURM and part of the
migration includes moving from /etc/passwd based user management to LDAP.
I've gone to considerable trouble to script the importing of our LDAP into
slurmdbd. Right now my script only queries LDAP and converts various
attributes into a sacctmgr import file. The code is a proof-of-concept and
eventually will be changed to perform regular checks that slurmdbd matches
LDAP.
We have our own code which works the same way (however, we have our own
LDAP schema for accounts; honestly, we are merging 2 different schemas
from different sources). Our backup file has more than 4k7 lines and
currently we are unable to load it. The error message is very helpful :)
"Unspecified error".
Post by Trey Dockendorf
Something similar to what I'm doing could possibly be easier than having
the SLURM code changed.
An easier solution often isn't a better one ;-)

We only want to use the QoS values allowed for every user, but we are forced
to generate the dump file and load it from cron. Doesn't that look ugly? :)


cheers,
marcin
Trey Dockendorf
2014-08-01 17:23:59 UTC
Post by Marcin Stolarek
We prefer using a job_submit plugin which connects to the LDAP
database, because this guarantees that closing an account in the web
interface (which is not managed by cluster administrators) affects
the possibility of job submission immediately.
I can see why connecting to LDAP at submit time is necessary.

On a separate note, how are you connecting to LDAP? I'm trying to implement a job_submit Lua plugin and have found that executing shell commands via "io.popen" breaks job array submissions. The shell command in question is "squeue". I started a separate thread for that, but I'm always interested in how others implement similar functionality.
Post by Marcin Stolarek
We have our own code which works the same way (however, we have
our own LDAP schema for accounts; honestly, we are merging 2
different schemas from different sources). Our backup file has more
than 4k7 lines and currently we are unable to load it. The error
message is very helpful :) "Unspecified error".
Ah, I have not run into that yet, but I only have ~500 users in LDAP and a separate file is generated which contains OSG grid accounts (~4000).
Post by Marcin Stolarek
An easier solution often isn't a better one ;-)
We only want to use the QoS values allowed for every user, but we are
forced to generate the dump file and load it from cron. Doesn't that
look ugly? :)
I couldn't agree more. My solution is simply what I've had time to implement.

I'm not sure if this helps, but right now our users are assigned their primary GID (gidNumber) based on their department. The projects and additional groups they belong to are set by adding the user to the appropriate group's "uniqueMember" attribute. We use 389ds and the "memberOf" plugin to help see which groups a user belongs to.

Our slurm database has an account for each posixGroup in LDAP that has a "slurmAccount" objectClass (a custom schema I made, which does not really follow true schema conventions).

The QOS mapping is done to accounts (groups in LDAP) and if a group needs to have specific QOS that's different from the parent then it has a "slurmAssociationQOS" (multi-value) attribute.
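
The override rule above can be sketched as a small lookup. This is an illustration only: it assumes each group records at most one parent (the `parent_of` mapping is hypothetical, since the post does not say how the hierarchy is stored) and that `qos_override` holds the values read from each group's "slurmAssociationQOS" attribute. It makes no claim about how Slurm itself resolves inherited QOS.

```python
from typing import Dict, List, Optional


def effective_qos(
    group: str,
    parent_of: Dict[str, Optional[str]],
    qos_override: Dict[str, List[str]],
) -> List[str]:
    """Return the QOS list for `group`, walking up to the nearest override.

    A group with its own override wins; otherwise the closest ancestor
    that has one is used. An empty list means nothing was found.
    """
    seen = set()  # guards against accidental cycles in the parent mapping
    current: Optional[str] = group
    while current is not None and current not in seen:
        seen.add(current)
        if current in qos_override:
            return qos_override[current]
        current = parent_of.get(current)
    return []


if __name__ == "__main__":
    # Hypothetical hierarchy: astro is a child of physics, which is a child of root.
    parents = {"root": None, "physics": "root", "astro": "physics"}
    overrides = {"root": ["normal"], "astro": ["normal", "short"]}
    for name in ("physics", "astro"):
        print(name, effective_qos(name, parents, overrides))
```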

I don't put QOS into LDAP as I wrote a Puppet module to handle that, https://github.com/treydock/puppet-slurm_providers.

I do not know at what level the association enforcement is handled, but maybe you could change the "job_desc.account" value in the Lua script (referring to the job_submit.lua plugin) to something generic that does exist in the database. If I understand the problem correctly, you don't want to have to store users/accounts in slurmdbd but are required to because you are using QOS, which enforces associations.
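
The decision logic behind that idea can be modelled as follows. It is written in Python purely for illustration (the real plugin would be Lua or C), with the job description as a plain dictionary; `GENERIC_ACCOUNT`, the `user` key, and the `is_valid_in_ldap` callback are all hypothetical. Whether a rewritten account actually satisfies association enforcement is untested here, as noted above.

```python
from typing import Callable, Dict

# Hypothetical catch-all account assumed to exist in slurmdbd.
GENERIC_ACCOUNT = "generic"


def rewrite_account(
    job_desc: Dict[str, str],
    is_valid_in_ldap: Callable[[str, str], bool],
) -> bool:
    """Validate the requested account against LDAP, then swap it out.

    Returns False (reject the job) when LDAP does not accept the
    user/account pair. Otherwise the account is replaced in place with
    GENERIC_ACCOUNT and True is returned; other fields, such as the
    requested QOS, are left untouched.
    """
    requested = job_desc.get("account", "")
    if not is_valid_in_ldap(job_desc["user"], requested):
        return False
    job_desc["account"] = GENERIC_ACCOUNT
    return True


if __name__ == "__main__":
    # Stand-in for a real LDAP lookup: only alice/physics is open.
    open_accounts = {("alice", "physics")}
    job = {"user": "alice", "account": "physics", "qos": "short"}
    accepted = rewrite_account(job, lambda u, a: (u, a) in open_accounts)
    print(accepted, job)
```

The LDAP check runs before the rewrite so that a closed account is still rejected at submit time, which is the property Marcin wants to keep.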

- Trey