Discussion:
sinfo -R doesn't print one node once (version 14.03.4)
Didier GAZEN
2014-06-17 17:09:34 UTC
Permalink
Hi,

The fix that is supposed to print each down/drained node once rather
than once per partition (commit object b5ace9a) when running "sinfo -R"
is not sufficient.

When enabling the -R option (list_reasons), all spawned threads
(_build_part_info) that work on a separate partition receive as argument
an empty sinfo_list. In this "list_reasons" case, threads may update the
same partition data during the _insert_node_ptr function call. The
problem is that the _insert_node_ptr function and all the functions it
calls are not thread safe WHEN different threads are updating the SAME
partition member of sinfo_list.

The simplest workaround I found was to disable thread spawning when
list_reasons is on.

static int _build_sinfo_data(List sinfo_list,
partition_info_msg_t *partition_msg,
node_info_msg_t *node_msg)
Didier GAZEN
2014-06-18 07:16:29 UTC
Permalink
Sorry, my previous post was truncated.

diff --git a/src/sinfo/sinfo.c b/src/sinfo/sinfo.c
index 15a17dc..0643f62 100644
--- a/src/sinfo/sinfo.c
+++ b/src/sinfo/sinfo.c
@@ -573,6 +573,9 @@ static int _build_sinfo_data(List sinfo_list,
sinfo_cnt++;
slurm_mutex_unlock(&sinfo_cnt_mutex);

+ if (params.list_reasons) {
+ _build_part_info(build_struct_ptr);
+ } else {
slurm_attr_init(&attr_sinfo);
if (pthread_attr_setdetachstate
(&attr_sinfo, PTHREAD_CREATE_DETACHED))
@@ -585,6 +588,7 @@ static int _build_sinfo_data(List sinfo_list,
}
slurm_attr_destroy(&attr_sinfo);
}
+ }


Regards,

Didier Gazen
Laboratoire d'Aérologie
Post by Didier GAZEN
Hi,
The fix that is supposed to print each down/drained node once rather
than once per partition (commit object b5ace9a) when running "sinfo
-R" is not sufficient.
When enabling the -R option (list_reasons), all spawned threads
(_build_part_info) that work on a separate partition receive as
argument an empty sinfo_list. In this "list_reasons" case, threads may
update the same partition data during the _insert_node_ptr function
call. The problem is that the _insert_node_ptr function and all the
functions it calls are not thread safe WHEN different threads are
updating the SAME partition member of sinfo_list.
The simplest workaround I found was to disable thread spawning when
list_reasons is on.
static int _build_sinfo_data(List sinfo_list,
partition_info_msg_t *partition_msg,
node_info_msg_t *node_msg)
{
j***@public.gmane.org
2014-06-18 15:39:32 UTC
Permalink
Thank you for the patch. A somewhat more general solution to this
problem will be in version 14.03.5 and can be found here:

https://github.com/SchedMD/slurm/commit/5351d393a67809371f14a40ca8bb4e85f0c64018
Post by Didier GAZEN
Sorry, my previous post was truncated.
diff --git a/src/sinfo/sinfo.c b/src/sinfo/sinfo.c
index 15a17dc..0643f62 100644
--- a/src/sinfo/sinfo.c
+++ b/src/sinfo/sinfo.c
@@ -573,6 +573,9 @@ static int _build_sinfo_data(List sinfo_list,
sinfo_cnt++;
slurm_mutex_unlock(&sinfo_cnt_mutex);
+ if (params.list_reasons) {
+ _build_part_info(build_struct_ptr);
+ } else {
slurm_attr_init(&attr_sinfo);
if (pthread_attr_setdetachstate
(&attr_sinfo, PTHREAD_CREATE_DETACHED))
@@ -585,6 +588,7 @@ static int _build_sinfo_data(List sinfo_list,
}
slurm_attr_destroy(&attr_sinfo);
}
+ }
Regards,
Didier Gazen
Laboratoire d'Aérologie
Post by Didier GAZEN
Hi,
The fix that is supposed to print each down/drained node once
rather than once per partition (commit object b5ace9a) when running
"sinfo -R" is not sufficient.
When enabling the -R option (list_reasons), all spawned threads
(_build_part_info) that work on a separate partition receive as
argument an empty sinfo_list. In this "list_reasons" case, threads
may update the same partition data during the _insert_node_ptr
function call. The problem is that the _insert_node_ptr function
and all the functions it calls are not thread safe WHEN different
threads are updating the SAME partition member of sinfo_list.
The simplest workaround I found was to disable thread spawning when
list_reasons is on.
static int _build_sinfo_data(List sinfo_list,
partition_info_msg_t *partition_msg,
node_info_msg_t *node_msg)
{
Didier GAZEN
2014-06-19 09:41:51 UTC
Permalink
Thanks, that's definitely a more general solution.
Post by j***@public.gmane.org
Thank you for the patch. A somewhat more general solution to this
https://github.com/SchedMD/slurm/commit/5351d393a67809371f14a40ca8bb4e85f0c64018
Post by Didier GAZEN
Sorry, my previous post was truncated.
diff --git a/src/sinfo/sinfo.c b/src/sinfo/sinfo.c
index 15a17dc..0643f62 100644
--- a/src/sinfo/sinfo.c
+++ b/src/sinfo/sinfo.c
@@ -573,6 +573,9 @@ static int _build_sinfo_data(List sinfo_list,
sinfo_cnt++;
slurm_mutex_unlock(&sinfo_cnt_mutex);
+ if (params.list_reasons) {
+ _build_part_info(build_struct_ptr);
+ } else {
slurm_attr_init(&attr_sinfo);
if (pthread_attr_setdetachstate
(&attr_sinfo, PTHREAD_CREATE_DETACHED))
@@ -585,6 +588,7 @@ static int _build_sinfo_data(List sinfo_list,
}
slurm_attr_destroy(&attr_sinfo);
}
+ }
Regards,
Didier Gazen
Laboratoire d'Aérologie
Post by Didier GAZEN
Hi,
The fix that is supposed to print each down/drained node once rather
than once per partition (commit object b5ace9a) when running "sinfo
-R" is not sufficient.
When enabling the -R option (list_reasons), all spawned threads
(_build_part_info) that work on a separate partition receive as
argument an empty sinfo_list. In this "list_reasons" case, threads
may update the same partition data during the _insert_node_ptr
function call. The problem is that the _insert_node_ptr function and
all the functions it calls are not thread safe WHEN different
threads are updating the SAME partition member of sinfo_list.
The simplest workaround I found was to disable thread spawning when
list_reasons is on.
static int _build_sinfo_data(List sinfo_list,
partition_info_msg_t *partition_msg,
node_info_msg_t *node_msg)
{
Loading...