Give Jobsets an ID; add jobset_id to builds and jobs. #710
Conversation
A PostgreSQL column which is non-null and unique is treated with the same optimisations as a primary key, so there is no need to try to recreate the `id` as the primary key. No read paths are impacted by this change, and the database will automatically create an ID for each insert. Thus, no code needs to change.
Also, adds an explicitly named "jobs" accessor to the Jobsets Schema object, which uses the project/jobset name.
Also, adds an explicitly named "builds" accessor to the Jobsets Schema object, which uses the project/jobset name.
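The claim above, that a non-null unique column is optimised like a primary key, can be illustrated with a small runnable sketch. This uses SQLite purely for demonstration (production Hydra runs PostgreSQL, where a `UNIQUE` constraint likewise creates a unique index); the table shape is assumed from this PR:

```python
import sqlite3

# Illustration only: a UNIQUE constraint is backed by an index, so a
# lookup on the new non-null unique `id` column is an index search,
# just like a primary-key lookup would be.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Jobsets (
        id      INTEGER NOT NULL UNIQUE,  -- new surrogate id from this PR
        project TEXT    NOT NULL,
        name    TEXT    NOT NULL,
        PRIMARY KEY (project, name)       -- the existing natural key
    )
    """
)
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Jobsets WHERE id = 42"
).fetchall()
# The plan reports an index SEARCH rather than a full-table SCAN.
print(plan[0][-1])
```

The same reasoning is why the read paths did not need to change: lookups by `id` are served by the unique index either way.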
Vacuum every 10 iterations, update 10k at a time.
-- very important to figure out where the nullable columns came from.

ALTER TABLE Builds
    ALTER COLUMN jobset_id SET NOT NULL;
Drop the `project` and `jobset` columns?
I think we should eventually, but I'm a bit afraid to do that in this PR since it makes it basically impossible to undo if there is a mistake. Also, it means we need to be 100% certain every read path is updated. I think that is a good improvement to make for a follow-up PR, maybe within a few weeks of this being deployed. Sound okay?
What is the motivation for this change?
Every slow query reported by our database server involves reading or writing to the Builds table. Here are some table sizes from hydra.nixos.org:
Here is the EXPLAIN ANALYZE for the
In prior tests, changing this to use a
This pull request has been mentioned on NixOS Discourse. There might be relevant details there: https://discourse.nixos.org/t/upcoming-hydra-schema-change/5806/1
I've opened #714 for part two. Both part 1 and part 2 work quite nicely on my replica from production.
**Deployment of Part 1**

Part 1 is deployed and the backfiller is running as of Mon Feb 10 18:12:38 CET 2020. The first vacuum finished at 18:43, and it is now updating the Jobs table in batches of 10,000 at about 3-4 batches per second. Looks like there are 3.5 million rows.
Pass 2 of the Jobs table finished at Mon Feb 10 18:46:08 CET 2020. Now on to the Builds table; this is the big one. Counting the number of rows itself takes an appreciable amount of time (84 seconds). The Builds table is able to do a 10k batch in about 2-3 seconds. 108,403,861 rows to do, ~108,303,900 rows remaining.
It has started the first VACUUM at Mon Feb 10 19:04:06 CET 2020, averaging about 2.5s per 10k batch. VACUUM finished at Mon Feb 10 19:17:43 CET 2020.
The third VACUUM just started (Mon Feb 10 19:52:18 CET 2020). We have 93,403,861 rows remaining. This server is handling the migration much faster than my test box. We will very plausibly finish this in less than 8 hours.
The sixth vacuum has started. Currently updated from IDs 1 to 33,581,838, with 78,403,861 IDs to go.
We're halfway, so another 5-6 hours I guess.
Deploy of part 1 finished.
After a minor oopsie (8347934), we're deploying part 2.
We're done: parts 1 and 2 are merged and deployed.
Some background on this issue is available:
Note: each commit here is pretty small and should be very easy to review.
Here is the proposed plan for this migration:

1. Add an `id` column to the `Jobsets` table: serial, non-null, unique.
2. Add a `jobset_id` column to the `Jobs` table: nullable, foreign key to `Jobsets`. Jobs should begin writing the `jobset_id` (`hydra/src/script/hydra-eval-jobset`, line 420 in 9da60e3).
3. Add a `jobset_id` column to the `Builds` table: nullable, foreign key to `Jobsets` (3f074388). Builds should begin writing the `jobset_id` (`hydra/src/script/hydra-eval-jobset`, line 465 in 9da60e3) (52283da9).
4. Backfill `Jobs` with `jobset_id` values, running this in a loop. Every N iterations, run `VACUUM`; `VACUUM` will prevent 2x table bloat from happening all at once.
5. Backfill `Builds` with `jobset_id` values, running this in a loop. Every N iterations, run `VACUUM`; `VACUUM` will prevent 2x table bloat from happening all at once.
6. Check for null `jobset_id` fields; this would indicate a bug which needs to be fixed.
7. Alter the `Builds` table, doing two things in one transaction: backfill any remaining null `jobset_id` values (any rows with a null `jobset_id` are likely from a bug we should fix), then alter the `Builds` table to make `jobset_id` not-null. Hopefully not very slow; I think postgresql will only validate that no rows have a null `jobset_id`.
8. Alter the `Jobs` table, doing two things in one transaction: backfill any remaining null `jobset_id` values (any rows with a null `jobset_id` are likely from a bug we should fix), then alter the `Jobs` table to make `jobset_id` not-null. Hopefully not very slow; I think postgresql will only validate that no rows have a null `jobset_id`.
9. End state: a non-null `jobset_id` field for Builds and Jobs.

Under this plan we would:

FIRST: merge the first half of these changes (up to, but not including, `Jobs.jobset_id: make not-null`), deploy to production, run `hydra-backfill-ids`, then run `hydra-backfill-ids` again to see if any new records have been created with NULL `jobset_id` columns.

THEN: merge the second half (the migration making them not-null) and deploy to production.
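The batched backfill with periodic `VACUUM` described in the plan above can be sketched as follows. This is an illustrative sketch only, not Hydra's actual `hydra-backfill-ids` implementation: it uses Python with SQLite so it is runnable, the table and column names follow this PR's schema, and batching by `rowid` is a SQLite-ism (against PostgreSQL you would batch by an id range or ctid instead).

```python
import sqlite3

def backfill_jobset_ids(conn, table, batch_size=10_000, vacuum_every=10):
    """Fill in NULL jobset_id values in batches, vacuuming periodically.

    Sketch only: assumes `table` has (project, jobset, jobset_id) columns
    and that Jobsets has (id, project, name), per the migration plan.
    """
    batches = 0
    while True:
        cur = conn.execute(
            f"""
            UPDATE {table}
               SET jobset_id = (SELECT j.id
                                  FROM Jobsets j
                                 WHERE j.project = {table}.project
                                   AND j.name    = {table}.jobset)
             WHERE rowid IN (SELECT rowid FROM {table}
                              WHERE jobset_id IS NULL
                              LIMIT ?)
            """,
            (batch_size,),
        )
        conn.commit()
        if cur.rowcount == 0:
            break  # nothing left to backfill
        batches += 1
        if batches % vacuum_every == 0:
            # Periodic VACUUM keeps the table from bloating to 2x its
            # size all at once during the long-running update.
            conn.execute("VACUUM")
    return batches
```

The deployment-log numbers above (10k rows per batch, a vacuum every 10 iterations) correspond to `batch_size` and `vacuum_every` here.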
This becomes a bit more complicated for users other than nixos.org. I have tried to make a clear warning in hydra-init. We might want to create a branch off of master after merging the first half, in case there are bugs found in the code after we have merged the second half.
Currently marking this as a draft; here is my personal to-do list before merging:

- `nix-build ./release.nix -A build -A manual -A tests.install -A tests.api -A tests.notifications` works a final time.