John Desantis
2014-09-26 17:19:35 UTC
Hello all,
First and foremost, since this is my first post to the list, I'd like
to thank the Slurm developers for a great and gratis product!
Anyway, to the point.
We have users submitting array jobs via sbatch using
"-a/--array=n-n" without any issues. While these jobs are running, we
can use 'squeue' to see the tasks in the form "jobnumber_task".
When we try to query these jobs via the accounting database (checking
job_table, step_table, and jobcomp_table) and via sacct -j
"jobnumber", we don't get the complete set of information
associated with the job (batch and exec hosts, etc.). While the job is
running, we can use scontrol to see the job, its steps,
and the full set of information we're looking for.
When I used scontrol to view an array job, I saw that the "JobId"
incremented with each array task, e.g.:
JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3
When I tried to query any of the successive JobIds via sacct or the
DB itself, I got no information. Only the real JobId, "23383",
returned a result in sacct and the DB. I was able to glean node
information from the scheduler and control daemon logs by searching for
the JobIds listed above.
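Since the per-task JobIds are what actually show up in the logs, a small helper that maps each "jobnumber_task" label to its underlying JobId can make that search less manual. Below is a minimal Python sketch, assuming the scontrol output has been saved as text with JobId, ArrayJobId and ArrayTaskId on the same line, as in the excerpt above. The function name task_to_jobid is hypothetical and not part of Slurm.

```python
import re

# Sample copied from the scontrol excerpt above; real output has more fields.
SAMPLE = """\
JobId=23383 ArrayJobId=23383 ArrayTaskId=1
JobId=23384 ArrayJobId=23383 ArrayTaskId=2
JobId=23385 ArrayJobId=23383 ArrayTaskId=3
"""

# Matches key=value pairs such as "JobId=23384".
FIELD = re.compile(r"(\w+)=(\S+)")


def task_to_jobid(text):
    """Map "ArrayJobId_ArrayTaskId" labels (as squeue shows them) to JobIds.

    Assumes JobId, ArrayJobId and ArrayTaskId appear on the same line.
    """
    mapping = {}
    for line in text.splitlines():
        fields = dict(FIELD.findall(line))
        if "JobId" in fields and "ArrayJobId" in fields and "ArrayTaskId" in fields:
            label = f"{fields['ArrayJobId']}_{fields['ArrayTaskId']}"
            mapping[label] = int(fields["JobId"])
    return mapping


if __name__ == "__main__":
    # Prints one "label -> JobId" line per array task.
    for label, jobid in task_to_jobid(SAMPLE).items():
        print(label, "->", jobid)
```

Each printed JobId (e.g. 23384 for task 23383_2) is the value to grep for in the scheduler and slurmctld logs.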
I did find a previous post,
https://www.mail-archive.com/slurm-dev-***@public.gmane.org/msg03344.html, which
seems to ask the same question.
Thanks for any insight you can provide,
John DeSantis