Discussion:
Prune database before migration to 14.11 ?
Ulf Markwardt
2014-10-10 09:04:07 UTC
Permalink
Dear all,

the update procedure 2.6->14.11 of the slurmdbd adds a couple of indexes to the database. For >> 10 Mio job entries this takes too long.

Do you know a "best practice" how to quickly prune the database to keep only the jobs from within the last 2 months?

I can do this at the mysql-level, like
delete from `taurus_job_table` where time_end>unix_timestamp('2013-08-01')
but am not sure if this is the correct/best way to do it.

Thank you,
Ulf
--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
Ulf Markwardt
2014-10-10 10:23:36 UTC
Permalink
Post by Ulf Markwardt
delete from `taurus_job_table` where time_end>unix_timestamp('2013-08-01')
I know... better like "where time_end<unix_timestamp('2014-08-01')" :-)
but the original question remains.

U
--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
Ulf Markwardt
2014-10-10 13:05:35 UTC
Permalink
Update: A sandbox test leads to these problems:

mysql> delete from taurus_job_table where time_end<unix_timestamp('2013-07-01');

ERROR 1451 (23000): Cannot delete or update a parent row: a foreign key constraint fails (`slurm_aux_acct_db`.`job_node_cpu_allocations`, CONSTRAINT `fk_job_node_cpu_allocations_job_db_inx` FOREIGN KEY (`job_db_inx`) REFERENCES `slurm_acct_db`.`taurus_job_table` (`job_db_inx`) ON D)

U
--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
j***@public.gmane.org
2014-10-10 15:27:44 UTC
Permalink
Please see the slurmdbd.conf man page:
http://slurm.schedmd.com/slurmdbd.conf.html

There are five separate configurable purge values (for jobs, steps,
reservations, events, and suspend records). None are purged by
default. Setting appropriate purge values is strongly recommended and
may reduce your database size a great deal.
Post by Ulf Markwardt
Dear all,
the update procedure 2.6->14.11 of the slurmdbd adds a couple of
indexes to the database. For >> 10 Mio job entries this takes too
long.
Do you know a "best practice" how to quickly prune the database to
keep only the jobs from within the last 2 months?
I can do this at the mysql-level, like
delete from `taurus_job_table` where time_end>unix_timestamp('2013-08-01')
but am not sure if this is the correct/best way to do it.
Thank you,
Ulf
--
___________________________________________________________________
Dr. Ulf Markwardt
Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany
Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
--
Morris "Moe" Jette
CTO, SchedMD LLC
Christopher Samuel
2014-10-14 00:03:38 UTC
Permalink
Hiya Ulf,
Post by Ulf Markwardt
the update procedure 2.6->14.11 of the slurmdbd adds a couple of
indexes to the database. For >> 10 Mio job entries this takes too
long.
Could you define too long, and what problems it caused please?

We're looking at upgrading Slurm to from 2.6 to 14.03 shortly and I
believe that's when those extra indexes appear. We've got about 14
million rows across the various *_job_table and *_step_table.

However, on our Percona Cluster (i.e. MySQL Galera) DB servers adding a
multi-column index to an 8+ million row table in another database took
just a few seconds.

All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Ulf Markwardt
2014-10-15 13:39:36 UTC
Permalink
Hi Chris,
Post by Christopher Samuel
Could you define too long, and what problems it caused please?
we have tested this migration in a VM sandbox. After >30 min we decided
to interrupt the procedure and look for a faster solution.

The "foreign constraints" came from another database a colleague of mine
has installed, we could fix this :-)

What we did was
* mysqldump the whole databas
* delete from ..._job_table where time_end>unix_timestamp('2013-08-01')
* follow the SchedMD update protocol.

And now we look into automatic pruning...

Thank you all
Ulf
--
___________________________________________________________________
Dr. Ulf Markwardt

Technische Universität Dresden
Center for Information Services and High Performance Computing (ZIH)
01062 Dresden, Germany

Phone: (+49) 351/463-33640 WWW: http://www.tu-dresden.de/zih
Christopher Samuel
2014-10-16 05:01:31 UTC
Permalink
Post by Ulf Markwardt
we have tested this migration in a VM sandbox. After >30 min we decided
to interrupt the procedure and look for a faster solution.
No worries, we're going to test out ours in a sandbox as well, so we'll
be able to compare it to our (pretty beefy) DB servers.
Post by Ulf Markwardt
The "foreign constraints" came from another database a colleague of mine
has installed, we could fix this :-)
:-)

cheers,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Christopher Samuel
2014-10-22 03:44:33 UTC
Permalink
Post by Christopher Samuel
No worries, we're going to test out ours in a sandbox as well, so we'll
be able to compare it to our (pretty beefy) DB servers.
It took around 2 minutes to add all the indexes in our sandbox, thats
with a total of about 6 million jobs across 5 systems (of which 2 are
now decommissioned).

We're about to do it for real.

All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel-***@public.gmane.org Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci
Loading...