New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
CB-231: Use direct database access for the Release Group entity #122
Conversation
2c182cc
to
df07ede
Compare
54b2426
to
f231a37
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. I think we should have a talk now about number of db queries per method call before we go much further, but otherwise this seems to be going well.
from brainzutils import cache | ||
|
||
|
||
DEFAULT_CACHE_EXPIRATION = 12 * 60 * 60 # seconds (12 hours) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this shared between all musicbrainz_db
module files? It might be better to set it only once, unless you think there's a reason that we should refresh some entities more often
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All entities are updated after the same time. Updated as suggested.
|
||
if 'artists' in includes: | ||
for release_group in release_groups: | ||
artist_credit_names = release_group.artist_credit.artists |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you looked into how many individual SQL queries a single fetch_
method makes? I'm looking at this method and my guess is at least 7 per mbid (main query, artist - releases - rg rel - url rel - work rel - tags includes, one more if the mbid is a redirect).
There's no rule of thumb for the number of queries that we should make, but each query is going to require a round trip to the sql server (and in the case of production CB, it will be a network trip to a different physical server).
Looking at this method I can see a few basic optimisations:
- If there are many mbids, get them all at once. This may require a modified version of
get_something_by_gid
- You can join into the
artist_credit
item of the releasegroup at the initialget
query
For the other entity relations I'm not sure if there is any further optimisation that we can do, as each query will be against a different table. It's possible we could do some kind of UNION
query but I'm not sure that will give us any benefit of speed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggested, I've added a get_entities_by_gid
function for fetching multiple entities. Also, added joinedload
for artist_credit_names
, artists
and release_group_meta
for saving db queries. A single fetch_
method now would take atmost 7 queries to the db (for any number of mbids). (Previously, it was making 5 queries per mbid (seperate ones for artist_credit_name
, artist_credit
, meta
, artist_credit_name
and release_group
(excluding redirects queries)) + 6 more for relationships, releases and tags). Thanks for suggesting that :)
if 'work-rels' in includes: | ||
entity_relation_helper(db, 'work', 'release_group', release_group_ids, includes_data) | ||
|
||
if 'tags' in includes: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will you have to do this tag query often? It might pay to have a helper function to do it. Do you need tags for Places? I see you have it here but not for Place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As suggeted, added a get_tags
function for fetching entity tags (as it may be used for other entities like artists
)
data['join_phrase'] = artist_credit_name.join_phrase | ||
return data | ||
|
||
|
||
def to_dict_release_groups(release_group, includes={}): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we should have tests for these conversion methods to make sure that we correctly create a dict from a particular releasegroup id.
I think that this will be a bit difficult - we can't really do the query against a real musicbrainz database because the data might change in the database. I'm not sure that it's worth testing only this function, I'd like to see it tested in conjunction with fetch_multiple_release_groups
Maybe @gentlecat has a suggestion here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd definitely like to see some tests for all of this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. Regarding the test database, we can make use ot mbdata.sample_data.create_sample_data()
. I've started some minimal example work here > ferbncode@866538a (works, but still WIP). I can make a list of testable entities (entities in sample_data.py). Is this a desirable way?
Also, these test (may) fail in case of schema_changes (in musicbrainz database). But I think that would be good for us?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, read the backlogs later (on irc) about the script in musicbrainz-server
(that way might be better for testing different type of entities)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like testing needs to be set up with an MB database to make them work now.
@ferbncode, I assume you are working on tests with MB database now. When you finish that part, put it into a separate PR please so that it's more manageable. ;) |
9cc3c8c
to
d1de138
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
critiquebrainz/frontend/views/test/test_release_group.py is removed but not replaced with anything else. Can you clarify this? It's good to have at least a test that checks that page renders correctly.
[mbid], | ||
includes=['artists', 'releases', 'release-group-rels', 'url-rels', 'work-rels', 'tags'] | ||
)[mbid] | ||
cache.set(key=key, val=release_group, time=DEFAULT_CACHE_EXPIRATION) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be inside the if not release_group:
block, otherwise it's being unnecessarily reset each time.
release_group = fetch_multiple_release_groups( | ||
[mbid], | ||
includes=['artists', 'releases', 'release-group-rels', 'url-rels', 'work-rels', 'tags'] | ||
)[mbid] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to create a wrapper function for fetch_multiple_release_groups
that retrieves only one release group at a time.
includes_data = defaultdict(dict) | ||
check_includes('release_group', includes) | ||
with mb_session() as db: | ||
query = db.query(models.ReleaseGroup).options(joinedload("meta")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What does .options(joinedload("meta"))
part do? It might be worth making a comment clarifying parts like these that might be more difficult to understand for someone unfamiliar with how mbdata works.
options(joinedload("artist_credit.artists")).\ | ||
options(joinedload("artist_credit.artists.artist")) | ||
|
||
release_groups = get_entities_by_gids(query, models.ReleaseGroup, models.ReleaseGroupGIDRedirect, mbids) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In a case like this it would be better to specify names of the arguments explicitly. When I look at get_entities_by_gids(query, models.ReleaseGroup, models.ReleaseGroupGIDRedirect, mbids)
I have no idea what these models are unless I look up the function definition and its docstring/implementation.
@@ -96,7 +96,7 @@ | |||
{% macro show_tags(tags) %} | |||
{% if tags %} | |||
{% for tag in tags %} | |||
<a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag['name']) }}">{{ tag['name'] }}</a> | |||
<a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag) }}">{{ tag }}</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to use format
instead of just outputting the value directly into the template?
d1de138
to
35bd76b
Compare
35bd76b
to
7841e51
Compare
@@ -18,7 +18,8 @@ | |||
TAG_INCLUDES = ["tags", "user-tags"] | |||
RATING_INCLUDES = ["ratings", "user-ratings"] | |||
VALID_INCLUDES = { | |||
'place': ["aliases", "annotation"] + RELATION_INCLUDES + TAG_INCLUDES, | |||
'place' : ["aliases", "annotation"] + RELATION_INCLUDES + TAG_INCLUDES, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change looks unnecessary.
[mbid], | ||
includes=['artist-rels', 'place-rels', 'release-group-rels', 'url-rels'], | ||
)[mbid] | ||
return place |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can simply return fetch_multiple_places(...
here.
self.assert200(response) | ||
self.assertIn("Days Are Gone", str(response.data)) | ||
self.assertIn("No reviews found", str(response.data)) | ||
# TODO(roman): Try to add review and check it's displayed there! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this fixed now?
02d70bf
to
418e5be
Compare
418e5be
to
379ee1c
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are changes in critiquebrainz/frontend/external/musicbrainz_db/place.py related to release groups?
@@ -96,7 +96,7 @@ | |||
{% macro show_tags(tags) %} | |||
{% if tags %} | |||
{% for tag in tags %} | |||
<a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag['name']) }}">{{ tag['name'] }}</a> | |||
<a href="{{ 'https://musicbrainz.org/tag/%s' % tag }}">{{ tag }}</a> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you fix indentation here please?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, you are still using formatting. I meant this:
<a href="https://musicbrainz.org/tag/{{ tag }}">{{ tag }}</a>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed 😅
album = spotify_api.get_album(spotify_id) | ||
except ExternalServiceException: | ||
flash.error(gettext("You need to specify existing album from Spotify!")) | ||
return redirect(url_for('.spotify_list', release_group_id=release_group_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this change. Can you explain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was getting https://gist.github.com/ferbncode/307bc3f3e13f5d65428aad6180ace1ed error when running pylint. I have undone this change now as it may be done separately.
379ee1c
to
e3ec18c
Compare
No, changes in |
That's fine. |
Added
fetch_multiple_release_group
andget_release_group_by_id
for fetching release group info.