Discussion:
slurmdbd errors
Brian B
2014-09-15 20:11:39 UTC
Greetings,

I recently changed from simple text-file accounting to MySQL. The setup appeared to have worked and I wasn’t seeing any errors in the logs, but sshare, sacct, sprio, and others aren’t giving any output aside from their column headers. The slurmctld log shows jobs completing without any issues, but now I am seeing the following in the slurmdbd log:

[2014-09-15T16:09:46.660] error: Problem getting jobs for cluster blackbox

This seems to occur after each job completes. I couldn’t find any documentation on this error or any discussion of it via Google. Does anyone have an idea?

Regards,
Brian
Jeff Tan
2014-09-16 00:22:31 UTC
Hi Brian

I get this error in the log when I query a nonexistent cluster, e.g.,

gold:~# sacct --cluster=abcd

(I get an empty result table)

and

[2014-09-16T10:16:41.349] error: Problem getting jobs for cluster abcd

So is there indeed a cluster "blackbox" in your system? Perhaps it is
misspelt in a script or (as I did above) just on the command line? Also,
cluster names are defined in two places: in slurm.conf (ClusterName) and
in the accounting database via sacctmgr. Perhaps one or the other has the
spelling wrong?
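A minimal sketch of the check described above, written as POSIX shell functions. It assumes `sacctmgr` is on the PATH and that you pass the path to your slurm.conf, whose location varies by installation. The function names are hypothetical. The script reads ClusterName from slurm.conf and looks for that name in the output of `sacctmgr -n -P list cluster format=Cluster`, where `-n` suppresses the header and `-P` gives parsable output.

```shell
#!/bin/sh
# Sketch: confirm that the ClusterName in slurm.conf is also registered in
# the accounting database. Function names are hypothetical.

# conf_cluster SLURM_CONF
# Print the ClusterName value. Keys in slurm.conf are case-insensitive;
# a trailing "# comment" and surrounding whitespace are stripped.
conf_cluster() {
  awk -F= '
    tolower($1) ~ /^[ \t]*clustername[ \t]*$/ {
      v = $2
      sub(/#.*/, "", v)
      gsub(/[ \t\r]/, "", v)
      name = v
    }
    END { print name }' "$1"
}

# db_has_cluster NAME
# Succeed if NAME is listed by sacctmgr (no header, parsable output).
# The comparison is case-insensitive in case the database stores the
# name in a different case.
db_has_cluster() {
  sacctmgr -n -P list cluster format=Cluster | grep -qixF -- "$1"
}

# check_cluster SLURM_CONF
# Report whether the slurm.conf cluster name is known to slurmdbd.
check_cluster() {
  name=$(conf_cluster "$1")
  if [ -z "$name" ]; then
    echo "no ClusterName in $1"
    return 1
  fi
  if db_has_cluster "$name"; then
    echo "cluster '$name' is registered"
  else
    echo "cluster '$name' missing; try: sacctmgr add cluster $name"
    return 1
  fi
}
```

For example, run `check_cluster /etc/slurm/slurm.conf`, adjusting the path to your installation. If the name is reported as missing, registering it with `sacctmgr add cluster <name>` is the usual fix.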

Regards
Jeff

Jeff Tan
High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences, Melbourne, Australia
Brian B
2014-09-16 14:10:37 UTC
Hello Jeff,

We do have a cluster named blackbox, and it is now defined both in slurm.conf and via sacctmgr; the sacctmgr entry was what I had been missing. There is still nothing being presented by sshare, etc. Any ideas on why that might be the case? At least now I don’t have errors in my logs.

Regards,
Brian
Jeff Tan
2014-09-17 07:26:35 UTC
Hi Brian

Glad to have helped with the errors, but I'm not sure what you mean
regarding sshare. What does the output look like when you run the command?

Regards
Jeff

Brian B
2014-09-17 15:09:38 UTC
Hello Jeff,

After restarting slurm and slurmdbd, sshare started reporting data. My next question: is there a way to have an entry per user in sshare? Do I need to add all Linux users via sacctmgr? Is there documentation on this?
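sshare lists accounting associations. A user therefore shows up only after being added to the accounting database with sacctmgr, under an account that already exists (for example, one created with `sacctmgr add account <name>`). The sketch below does not change anything. It only prints `sacctmgr` commands for local users that slurmdbd does not know yet, so they can be reviewed before running. These assumptions are not from the thread: regular users have UID >= 1000 in the passwd file, all users go into a single account, and the function names and the "research" account in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch: print "sacctmgr add user" commands for local users that have no
# Slurm accounting entry yet. Assumptions: regular users have UID >= MIN_UID,
# and every user goes into one account that already exists.

# list_local_users PASSWD_FILE MIN_UID
# Print login names whose UID is at least MIN_UID, skipping "nobody".
list_local_users() {
  awk -F: -v min="$2" '($3 + 0) >= (min + 0) && $1 != "nobody" { print $1 }' "$1"
}

# list_slurm_users
# Print user names already known to slurmdbd, one per line, no header.
list_slurm_users() {
  sacctmgr -n -P list user format=User
}

# missing_user_cmds PASSWD_FILE MIN_UID ACCOUNT
# Print one command per local user missing from the accounting database.
# In the printed command, -i (--immediate) makes sacctmgr commit without
# asking for confirmation.
missing_user_cmds() {
  known=$(list_slurm_users)
  list_local_users "$1" "$2" | while read -r u; do
    if ! printf '%s\n' "$known" | grep -qxF -- "$u"; then
      printf 'sacctmgr -i add user %s account=%s\n' "$u" "$3"
    fi
  done
}
```

For example, `missing_user_cmds /etc/passwd 1000 research` prints the commands for review. Printing instead of executing them keeps the choice of account and UID cutoff in the administrator's hands.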

Regards,
Brian