Discussion:
slurmdbd: more time than is possible
Jeff Tan
2014-09-16 07:49:33 UTC
Hi folks

I might have found the problem. With slurmdbd DebugLevel=7 turned on, I
found three SQL queries within the hourly rollup. One of them turned up
these rows from the events table:

mysql> select node_name,cpu_count,time_end,state,reason from
    -> avoca_event_table where !(state & 32768) && time_end=0;
+-----------+-----------+----------+-------+---------------------------------+
| node_name | cpu_count | time_end | state | reason                          |
+-----------+-----------+----------+-------+---------------------------------+
|           |     65536 |        0 |     0 | Cluster processor count         |
| bgq0010   |      4096 |        0 |     4 | R00-M0_R01-M0_board_swap_RT9458 |
+-----------+-----------+----------+-------+---------------------------------+

The first row, with a state of 0, is probably not a concern, but the
second should not have time_end=0, since that event (via reservation)
definitely completed. The d_cpu value I saw in slurmdbd.log was exactly
4096 CPUs (as above) x 3600 seconds (one hour of the rollup).
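
For reference, the arithmetic (illustrative, just multiplying the
figures above):

mysql> select 4096 * 3600 as phantom_cpu_seconds;
+---------------------+
| phantom_cpu_seconds |
+---------------------+
|            14745600 |
+---------------------+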

So I've updated the time_end for that event using the timestamp from the
_slurm_rpc_delete_reservation "complete" entry in slurmctld.log. We had a
couple of highly allocated hours today, at 98% allocated, but no more
complaints about having more time than is possible. Hopefully this is
indeed solved.
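
For anyone else hitting this, the fix was essentially an UPDATE along
these lines (the epoch value below is made up; the real one came from
slurmctld.log):

mysql> -- illustrative timestamp only; use the one from slurmctld.log
mysql> update avoca_event_table set time_end=1410840000
    -> where node_name='bgq0010' and time_end=0
    -> and reason='R00-M0_R01-M0_board_swap_RT9458';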

It's only now that I realize I could (maybe) have worked this out sooner
if I'd used `sreport cluster utilization` for the hours when the rollup
complained, which would have shown the reservation that wasn't actually
there. The four hours earlier today where the rollup did complain now
show allocated percentages of 93.75% or more, leaving no room for the
6.25% (4096 of 65536 CPUs) that the non-existent reservation in the
above event would claim. Checking the last few days, I seem to have
received the complaint in the rollup only when allocation was at 93.75%
or higher, and never for hours when allocation was lower.
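
Checking an hour is then a one-liner, e.g. (times illustrative, and
assuming I have the sreport options right):

$ sreport cluster utilization Clusters=avoca Start=2014-09-16T10:00:00 \
    End=2014-09-16T11:00:00 -t percent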

Regards
Jeff

Jeff Tan

High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences, Melbourne, Australia
From: Jeff Tan/Australia/IBM
Date: 09/09/2014 17:16
Subject: slurmdbd: more time than is possible
Hello, folks!
Although this topic was considered resolved last year, and although I
tried what was suggested in those posts, we still get this in
slurmdbd.log on Slurm 2.6.5.
Following Don's suggestion, I resolved errors in job records in
<cluster>_job_table, i.e., jobs where time_start was 0 although the
job actually did start running and ended one way or another. This
worked for two of our x86 clusters, or so it seems: one of them was
only resolved a few weeks back, but the other one hasn't had such
complaints about having "more time than is possible" since July.
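The query for those was along these lines (assuming I have the
schema right; table name is for our cluster):
mysql> -- jobs that ended (time_end > 0) but never recorded a start
mysql> select id_job,time_submit,time_start,time_end,state
    -> from avoca_job_table
    -> where time_start=0 and time_end>0;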
Our Blue Gene/Q is another matter: the rollup still makes this
complaint sporadically, as recently as yesterday, in fact. Are there
other Slurm users out there with a Blue Gene who see these slurmdbd
complaints? I was wondering if it had to do with reservations and/or
node failures. The problem with overlapping reservations is
mentioned in a 2012 post here as well as in the source code. Looking
at the source code in as_mysql_rollup.c, it occurred to me that
perhaps outages mess up the CPU counts that feed c_usage->d_cpu. I
have logs where the reported d_cpu matches the total number of
CPU-seconds for an hour during the hourly rollup, but sometimes the
number is higher and sometimes lower.
Has anyone ever noticed these complaints? I've already resolved (1)
jobs that ran and ended but time_start was 0, and (2) jobs that were
marked running despite having long been terminated (a sketch of that
second check is below). I'm not sure what else we are missing. Any
suggestions would be appreciated.
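For the record, the check for (2) was roughly the following (state=1
being RUNNING, if I read the job state codes right; the 30-day cutoff
is just an arbitrary threshold for "long been terminated"):
mysql> -- old state=1 records are candidates for stuck jobs
mysql> select id_job,time_start,time_end,state
    -> from avoca_job_table
    -> where state=1 and time_start < unix_timestamp() - 30*86400;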
Regards
Jeff
--
Jeff Tan
High Performance Computing Specialist
IBM Research Collaboratory for Life Sciences, Melbourne, Australia