Yair Yarom
2014-07-09 07:51:29 UTC
Hi,
Using slurm 14.03.1-2, with mysql and slurmdbd storage, we've
encountered the error "We have more allocated time than is
possible". Following the suggestions on this thread:
https://groups.google.com/forum/#!topic/slurm-devel/3f1SOGHXwSY
we modified the database and set the state of the phantom jobs and steps
to FAILED with an appropriate time_end. However, sreport still shows wrong
usage for the periods in which those jobs were active. The problem appears
to be in the cluster usage tables.
Is there any way to reset or force slurmdbd to recalculate them (or any
other fix for the false usage data in sreport)? Will deleting the data in
the usage tables work in this case (will it be rebuilt)?
Is there any utility to check the integrity of the database? e.g. that
there are no "running" steps for non-running jobs, that there are no
dead jobs still running, or that the usage tables are accurate?
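As an illustration of the first check (steps still marked running under a job that is not), here is a minimal sketch. It runs against an in-memory SQLite database with a simplified, hypothetical schema: the table names, column names, and text state values are assumptions made for the example and do not match the real slurmdbd MySQL tables, which would need their own query.

```python
import sqlite3

# Hypothetical, simplified schema for illustration only.
# The real slurmdbd tables are per-cluster MySQL tables with different columns.
SCHEMA = """
CREATE TABLE job_table (
    job_db_inx INTEGER PRIMARY KEY,
    state      TEXT,
    time_end   INTEGER
);
CREATE TABLE step_table (
    job_db_inx INTEGER,
    id_step    INTEGER,
    state      TEXT,
    time_end   INTEGER
);
"""


def find_orphan_running_steps(conn):
    """Return (job_db_inx, id_step) for RUNNING steps whose job is not RUNNING."""
    query = """
        SELECT s.job_db_inx, s.id_step
        FROM step_table AS s
        JOIN job_table AS j ON j.job_db_inx = s.job_db_inx
        WHERE s.state = 'RUNNING' AND j.state != 'RUNNING'
        ORDER BY s.job_db_inx, s.id_step
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a small in-memory database containing one inconsistent step."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO job_table VALUES (?, ?, ?)",
        [(1, "RUNNING", 0), (2, "FAILED", 1404890000)],
    )
    conn.executemany(
        "INSERT INTO step_table VALUES (?, ?, ?, ?)",
        [
            (1, 0, "RUNNING", 0),          # consistent: job 1 is running
            (2, 0, "FAILED", 1404890000),  # consistent: step ended with job 2
            (2, 1, "RUNNING", 0),          # inconsistent: job 2 already FAILED
        ],
    )
    return conn


if __name__ == "__main__":
    for job_db_inx, id_step in find_orphan_running_steps(build_demo_db()):
        print(f"job {job_db_inx} step {id_step}: running step under a finished job")
```

The same join pattern would extend to the other checks, for example comparing job end times against the period covered by the usage tables, but that depends on the actual schema.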
Thanks,
Yair.