Discussion:
Clusters API
Nate Coraor
2014-10-08 14:26:29 UTC
Hi all,

For the last few days I've been working on adding support for the
-M/--clusters option to slurm-drmaa. I have it working but it's taken
a few hacks:

1. I cannot see any public way to actually use the multi-cluster
functionality. It's possible to query slurmdbd for all the cluster
info you can get with sacctmgr via the public API, but I had to expose
working_cluster_rec in libslurmdb to be able to use those cluster
records for submit and status requests. Of course this will not work
with any standard installations (and may not be entirely safe).

2. I'm using the non-published slurmdb_get_info_clusters() function to
get cluster records. It's possible to get them with the public
slurmdb_clusters_get() function, but for some reason the
plugin_id_select returned by the latter (and indeed, as found in
slurmdbd's database) is incorrect for both clusters (both are 101 in
slurmdbd, should be 2).

Could anyone provide guidance on how to fix these? Am I going at this all wrong?

Thanks,
--nate
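
Editorial sketch of the pattern described in point 1, assuming (as the post implies) that the library decides which cluster to contact from a global pointer. Everything here is a stand-in: cluster_rec_t, current_target(), find_cluster() and with_cluster() are hypothetical names for illustration, not Slurm API, and the real slurmdb_cluster_rec_t has many more fields. It only shows the save/set/restore discipline around the global, not an actual submit or status call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for slurmdb_cluster_rec_t (hypothetical, heavily reduced). */
typedef struct {
    const char *name;          /* cluster name as known to slurmdbd */
    const char *control_host;  /* controller address for that cluster */
} cluster_rec_t;

/* Stand-in for the library global; NULL is taken to mean "local cluster". */
cluster_rec_t *working_cluster_rec = NULL;

/* Hypothetical request: reports which cluster would be contacted. */
const char *current_target(void)
{
    return working_cluster_rec ? working_cluster_rec->name : "(local)";
}

/* Look up a record by name in an array; returns NULL when absent. */
cluster_rec_t *find_cluster(cluster_rec_t *recs, size_t n, const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < n; i++)
        if (strcmp(recs[i].name, name) == 0)
            return &recs[i];
    return NULL;
}

/* Run one request against rec, then restore the previous global value. */
const char *with_cluster(cluster_rec_t *rec, const char *(*request)(void))
{
    cluster_rec_t *saved = working_cluster_rec;
    working_cluster_rec = rec;
    const char *result = request();
    working_cluster_rec = saved;
    return result;
}
```

Restoring the saved pointer keeps later calls pointed at whatever cluster was selected before. A global like this is not thread-safe without extra locking, which may be part of why the post calls the approach "not entirely safe".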
Nate Coraor
2014-10-09 17:22:39 UTC
Here's the fairly complete form of it, in case anyone else finds this
useful. As the documentation mentions, all I have to do is compile a
standalone libslurmdb.so for the application with working_cluster_rec
made public, and everything works great:

https://github.com/natefoo/slurm-drmaa

Thanks,
--nate
Danny Auble
2014-10-09 18:03:55 UTC
Nate,

Does

diff --git a/src/common/slurmdb_defs.c b/src/common/slurmdb_defs.c
index d094b91..01673fc 100644
--- a/src/common/slurmdb_defs.c
+++ b/src/common/slurmdb_defs.c
@@ -49,6 +49,8 @@
 #include "src/common/slurm_auth.h"
 #include "src/slurmdbd/read_config.h"

+strong_alias(working_cluster_rec, slurm_working_cluster_rec);
+
 slurmdb_cluster_rec_t *working_cluster_rec = NULL;

 static void _free_res_cond_members(slurmdb_res_cond_t *res_cond);

Fix your situation? You would probably have to change your references
to slurm_working_cluster_rec, but that is probably safer.

Danny
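
Editorial sketch of what a symbol alias like the one in the patch does, assuming strong_alias expands to the GCC/Clang alias attribute (that is my understanding of the Slurm macro, not something stated in this thread). The type and the two helper functions are hypothetical stand-ins. This needs GCC or Clang on an ELF platform such as Linux; the alias attribute is not supported on macOS.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for slurmdb_cluster_rec_t (hypothetical, heavily reduced). */
typedef struct {
    const char *name;
} cluster_rec_t;

/* Internal name, as used throughout the library. */
cluster_rec_t *working_cluster_rec = NULL;

/* Second symbol name for the same storage. The alias target must be
 * defined in this translation unit. */
extern __typeof__(working_cluster_rec) slurm_working_cluster_rec
    __attribute__((alias("working_cluster_rec")));

/* Hypothetical caller that only knows the exported, prefixed name. */
void set_via_alias(cluster_rec_t *rec)
{
    slurm_working_cluster_rec = rec;
}

/* Hypothetical library code that reads the internal name. */
cluster_rec_t *get_via_internal(void)
{
    return working_cluster_rec;
}
```

Because both names refer to one object, a write through the prefixed symbol is what the library's internal code later reads. That lets an application link against the prefixed name without the library changing its internal references.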
Nate Coraor
2014-10-09 19:13:40 UTC
Hi Danny,

I had thought about proposing that change; it works just fine. Should
it be slurmdb_working_cluster_rec? Could you add it to slurmdb.h as
well? I'm also using a few other missing symbols, in case you're
going to add anything to the public includes:

https://github.com/natefoo/slurm-drmaa/blob/master/slurm_drmaa/slurm_missing.h

Thanks,
--nate
Danny Auble
2014-10-09 21:42:31 UTC
Nate, check out commit e49fcbeb48647460fda59895c22afc4b080efe89

It should give you what you want. Since working_cluster_rec was already
in slurmdb.h I ended up sticking with that instead of changing the
name. This is also only in 14.11, but the patch should apply cleanly to
14.03.

Danny
Nate Coraor
2014-10-10 17:06:45 UTC
Great, thank you! And thanks to Moe also for the link in the html docs.

--nate