Discussion:
srun: spank_fini() called twice
Dorian Krause
2014-10-09 08:28:34 UTC
Dear all,

While experimenting with the SPANK API, I noticed that srun calls the
slurm_spank_exit() function twice in local context, because the list of
functions registered with atexit() is inherited by the shepard process.

Is this intended behavior? I pasted a patch below that ensures that
spank_fini() is not called in the shepard process. Since the documentation
says nothing about slurm_spank_exit() being called twice, IMHO this is the
way to go, but I can see how some people may consider this as introducing
a regression.
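
The effect can be reproduced outside of Slurm. The following is only a
minimal sketch, not srun code: fake_fini(), helper_fd, run_fake_srun() and
count_fini_calls() are made-up names for illustration. fake_fini() stands
in for spank_fini(), and helper_fd plays the role of shepard_fd under the
assumption that it is still -1 in the forked helper and set only in the
parent. Each handler run writes one byte to a pipe so the caller can count
the runs.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;	/* write end of the pipe used to count handler runs */
static int helper_fd = -1;	/* hypothetical stand-in for shepard_fd */
static int use_guard = 0;	/* 1 = apply the guard, 0 = original behavior */

/* Stand-in for spank_fini(): records one run by writing one byte. */
static void fake_fini(void)
{
	ssize_t n;

	if (use_guard && helper_fd == -1)
		return;		/* forked helper: skip the cleanup */
	n = write(report_fd, "x", 1);
	(void) n;
}

/* Stand-in for srun: register the handler, fork a helper, exit normally. */
static void run_fake_srun(void)
{
	pid_t helper;

	atexit(fake_fini);
	helper = fork();
	if (helper == 0)
		exit(0);	/* the helper inherited the atexit list */
	if (helper > 0) {
		helper_fd = 0;	/* any value other than -1 marks the parent */
		waitpid(helper, NULL, 0);
	}
	exit(0);		/* the handler runs (again) in the parent */
}

/* Returns how often fake_fini() did its work, or -1 on error. */
int count_fini_calls(int guard)
{
	int fds[2], calls = 0;
	char c;
	pid_t pid;

	assert(guard == 0 || guard == 1);
	if (pipe(fds) != 0)
		return -1;
	fflush(NULL);		/* avoid duplicated stdio output after fork */
	pid = fork();
	if (pid < 0) {
		close(fds[0]);
		close(fds[1]);
		return -1;
	}
	if (pid == 0) {
		close(fds[0]);
		report_fd = fds[1];
		use_guard = guard;
		run_fake_srun();	/* does not return */
	}
	close(fds[1]);
	/* Read until every process holding the write end has exited. */
	while (read(fds[0], &c, 1) == 1)
		calls++;
	close(fds[0]);
	waitpid(pid, NULL, 0);
	return calls;
}
```

With these assumptions, count_fini_calls(0) reports two runs of the handler
and count_fini_calls(1) reports one, since the guard turns the handler into
a no-op in the helper process.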

Thanks,
Dorian Krause

---

Date: Thu, 9 Oct 2014 08:39:49 +0200
Subject: [PATCH] srun: Prevent shepard proc from calling spank_fini()

The spank_fini() function is registered with atexit() to be called
after termination of the srun main() function. The registered
functions are inherited by the forked shepard process and thus
spank_fini() is called twice.
This commit fixes this problem by introducing a wrapper function
_call_spank_fini() that is a no-op in the context of the shepard
process.
---
src/srun/libsrun/srun_job.c | 9 ++++++++-
1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/src/srun/libsrun/srun_job.c b/src/srun/libsrun/srun_job.c
index 39f1ec6..84b2b6e 100644
--- a/src/srun/libsrun/srun_job.c
+++ b/src/srun/libsrun/srun_job.c
@@ -130,6 +130,7 @@ static int _shepard_spawn(srun_job_t *job, bool got_alloc);
 static void *_srun_signal_mgr(void *no_data);
 static void _step_opt_exclusive(void);
 static int _validate_relative(resource_allocation_response_msg_t *resp);
+static void _call_spank_fini(void);


 /*
@@ -431,7 +432,7 @@ extern void init_srun(int ac, char **av,

 	/* Be sure to call spank_fini when srun exits.
 	 */
-	if (atexit((void (*) (void)) spank_fini) < 0)
+	if (atexit(_call_spank_fini) < 0)
 		error("Failed to register atexit handler for plugins: %m");

 	/* set default options, process commandline arguments, and
@@ -1447,3 +1448,9 @@ static int _validate_relative(resource_allocation_response_msg_t *resp)
 	return 0;
 }

+static void _call_spank_fini(void)
+{
+	if (-1 != shepard_fd)
+		spank_fini(NULL);
+}
+
--
1.9.3





------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
Forschungszentrum Juelich GmbH
52425 Juelich
Sitz der Gesellschaft: Juelich
Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
Vorsitzender des Aufsichtsrats: MinDir Dr. Karl Eugen Huthmacher
Geschaeftsfuehrung: Prof. Dr.-Ing. Wolfgang Marquardt (Vorsitzender),
Karsten Beneke (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
Prof. Dr. Sebastian M. Schmidt
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
j***@public.gmane.org
2014-10-09 17:31:59 UTC
I have applied your patch to the version 14.11 code base. The commit is here:
https://github.com/SchedMD/slurm/commit/d522f721de66f7fad72f6831675329ef6bd3c0c8

Thanks!
--
Morris "Moe" Jette
CTO, SchedMD LLC