MBS-8736: Provide a partial data dump for testing purposes #571

Merged
mwiencek merged 38 commits into metabrainz:master from mbs-8736 on Feb 2, 2018

mwiencek (Member) commented Nov 27, 2017

No description provided.

mwiencek force-pushed the mbs-8736 branch 3 times, most recently from 5b7420e to 67d1e26, November 30, 2017 19:44

mwiencek (Member, Author) commented Dec 1, 2017

I consider this "done" now, apart from any bugs that turn up. I decided not to put it into cron, because we should rarely need to generate a new dump except after schema changes or when someone requests one.

mwiencek (Member, Author) commented Dec 1, 2017

The dump includes tags, ratings, and subscriptions for editors who haven't configured those to be private (obviously, only the information related to entities included in the dump). This information, although publicly visible on the website, has never been included in dumps before. It was decided in August 2014 that MBS-7560 is okay, though I don't know whether the referenced blog post ever happened.

yvanzo self-requested a review December 28, 2017 15:29
mwiencek force-pushed the mbs-8736 branch 3 times, most recently from 487a4a8 to dab2bc7, January 6, 2018 18:27

mwiencek (Member, Author) commented Jan 6, 2018

This is now hooked into cron (running monthly) and can be synced to FTP. I tested the RunSampleDataDump script on hip and it worked fine.

To be used for MBS-8736.

While this package is currently only used by script/dump-entities-sql.pl
to print SQL to stdout, it will eventually be used as part of MBS-8736
to create actual data dumps, which don't involve printing.

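The commit doesn't show the package's interface, so what follows is only a minimal sketch, in Python rather than the project's Perl, of one way to decouple traversal from output: the dumping loop hands each row to a caller-supplied sink, so printing SQL to stdout and writing a dump become interchangeable. `dump_rows`, `sql_sink`, `TableCollector`, and the `(table, row)` shape are all hypothetical.

```python
def sql_literal(value):
    """Render ints, booleans, None and strings as standard SQL literals."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):  # check before int: bool is a subclass of int
        return "TRUE" if value else "FALSE"
    if isinstance(value, int):
        return str(value)
    # Double any embedded single quotes, per standard SQL string syntax.
    return "'" + str(value).replace("'", "''") + "'"


def dump_rows(rows, sink):
    """Hand each (table, row) pair to `sink`; this function never decides
    how or where the output is written."""
    for table, row in rows:
        sink(table, row)


def sql_sink(write=print):
    """Hypothetical sink that emits one INSERT statement per row (printed by default)."""
    def sink(table, row):
        columns = ", ".join(row)
        values = ", ".join(sql_literal(v) for v in row.values())
        write(f"INSERT INTO {table} ({columns}) VALUES ({values});")
    return sink


class TableCollector:
    """Hypothetical sink that groups rows per table, standing in for dump files."""

    def __init__(self):
        self.tables = {}

    def __call__(self, table, row):
        self.tables.setdefault(table, []).append(row)
```

Passing `lines.append` instead of `print` to `sql_sink` captures the statements in memory, which is also a convenient way to test the stdout behavior.
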
This builds up the related entities encountered into %path_ids and fetches
relationships just once at the very end. This is better because we don't end
up fetching the same relationships twice, and we can have our entity methods
skip IDs that were already fetched (which was previously not possible,
because the result of those functions sometimes depended on @link_path).

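A minimal sketch of this collect-then-fetch pattern, in Python rather than Perl: `path_ids` plays the role of `%path_ids`, while the class, its method names, and the toy relationship rows are hypothetical. It also assumes a relationship is kept only when both of its endpoints were collected.

```python
from collections import defaultdict

# Toy relationship rows (hypothetical): (entity_type0, id0, entity_type1, id1).
RELATIONSHIPS = [
    ("artist", 1, "recording", 10),
    ("artist", 1, "release", 20),
    ("artist", 2, "recording", 10),
    ("label", 5, "release", 20),
]


class SampleDumper:
    """Collect entity IDs while dumping, then fetch relationships once at the end."""

    def __init__(self):
        # entity type -> set of IDs; plays the role of %path_ids.
        self.path_ids = defaultdict(set)
        # Counts relationship lookups, to show that only one happens.
        self.relationship_queries = 0

    def dump_entities(self, entity_type, ids):
        """Record IDs not collected before and return only those (sorted).

        Skipping known IDs is safe because the result no longer depends on
        the link path that led here.
        """
        seen = self.path_ids[entity_type]
        new_ids = sorted(set(ids) - seen)
        seen.update(new_ids)
        return new_ids

    def fetch_relationships(self):
        """Single pass over relationships whose two endpoints were both collected."""
        self.relationship_queries += 1
        return [
            (t0, id0, t1, id1)
            for (t0, id0, t1, id1) in RELATIONSHIPS
            if id0 in self.path_ids.get(t0, ()) and id1 in self.path_ids.get(t1, ())
        ]
```

Because relationships are fetched once over the complete ID set, a relationship reachable from two different entities is returned a single time, and a second `dump_entities("recording", [10])` call simply returns an empty list.
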
This is useful when we are dumping the entirety of those tables
separately.

This dump will be useful for people wanting to set up a development
server with some data to use, but without having to perform a full
import of our entire DB (which takes a lot of time and space).

This doesn't even work where it's used, and can be functionally replaced
by just generating a JSON dump from the new sample data dumps.

Instead of copying custom configuration to config.user.sh, scripts can
set the MBS_ADMIN_CONFIG env var to point to the location of the custom
config (relative to admin/). This is safer in case multiple scripts
requiring custom configuration are executing concurrently.
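
A minimal sketch of both sides of this, in Python (the actual admin scripts are shell and may differ): the loader resolves `MBS_ADMIN_CONFIG` relative to `admin/`, and a caller sets the variable only in the environment of the child process it launches, so two concurrent scripts never overwrite a shared file. The function names and the fallback to `config.user.sh` when the variable is unset are assumptions.

```python
import os
from pathlib import Path

# Assumption: the file used when MBS_ADMIN_CONFIG is unset or empty.
DEFAULT_ADMIN_CONFIG = "config.user.sh"


def resolve_admin_config(admin_dir, environ=None):
    """Hypothetical loader: return the config path, taking MBS_ADMIN_CONFIG
    relative to admin_dir."""
    environ = os.environ if environ is None else environ
    relative = environ.get("MBS_ADMIN_CONFIG") or DEFAULT_ADMIN_CONFIG
    return Path(admin_dir) / relative


def env_with_admin_config(relative_path, base=None):
    """Hypothetical helper: copy the environment and set MBS_ADMIN_CONFIG
    for a single child process.

    The parent's own environment and any shared files are left untouched,
    so concurrent callers can each point at a different config.
    """
    env = dict(os.environ if base is None else base)
    env["MBS_ADMIN_CONFIG"] = relative_path
    return env
```

A caller might then run `subprocess.run([script], env=env_with_admin_config("config.sample.sh"), check=True)`, where both `script` and `config.sample.sh` are placeholders.
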
mwiencek (Member, Author) commented:

@brainzbot retest this please

mwiencek merged commit 26bfe50 into metabrainz:master Feb 2, 2018
mwiencek deleted the mbs-8736 branch February 2, 2018 20:48
2 participants