Discussion:
Power save config option: BatchStartTimeout
Uwe Sauter
2014-09-02 14:29:32 UTC
Hi all,

I'm a bit confused by the explanation of the "BatchStartTimeout" option.
It states:

"Specifies how long to wait after a batch job start request is issued
before we expect the batch job to be running on the compute node.
Depending upon how nodes are returned to service, this value may need to
be increased above its default value of 10 seconds."

It is unclear from which point in time this timeout is measured. Some
possibilities:

- when the batch job is submitted
- when SLURM executes the ResumeProgram command
- when the node's slurm daemon contacts the controller daemon

Can someone reword the explanation or give details about this option?

Are there recommendations, e.g. linked to ResumeTimeout?
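
For reference, this is roughly where the options in question sit in
slurm.conf. It is only a sketch: the ResumeProgram path is a made-up
placeholder, and the timeout values are simply the documented defaults as I
understand them, not recommendations.

```
# Power save sketch; the path below is hypothetical
ResumeProgram=/usr/local/sbin/node_resume.sh

# Seconds allowed for a resumed node to come back into service
# (assumed here to be the default of 60)
ResumeTimeout=60

# Seconds to wait after a batch job start request is issued before the
# job is expected to be running on the node (documented default: 10)
BatchStartTimeout=10
```

What I cannot tell from the documentation is whether the BatchStartTimeout
window runs alongside the ResumeTimeout window or only begins once the node
is back in service.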


Thanks,

Uwe
Uwe Sauter
2014-09-05 13:49:30 UTC
Hi all,

*bump*

I can't believe no one has an explanation for this parameter...


Regards,

Uwe