CB-231: Use direct database access for the Release Group entity #122

ferbncode · 2017-07-05T07:36:30Z

Added fetch_multiple_release_groups and get_release_group_by_id for fetching release group info.

alastair

Looks good. I think we should talk about the number of DB queries per method call before we go much further, but otherwise this seems to be going well.

alastair · 2017-07-12T13:12:55Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+from brainzutils import cache
+
+
+DEFAULT_CACHE_EXPIRATION = 12 * 60 * 60 # seconds (12 hours)


Is this shared between all musicbrainz_db module files? It might be better to set it only once, unless you think there's a reason to refresh some entities more often.

All entities are refreshed after the same interval. Updated as suggested.

alastair · 2017-07-12T18:40:00Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+
+        if 'artists' in includes:
+            for release_group in release_groups:
+                artist_credit_names = release_group.artist_credit.artists


Have you looked into how many individual SQL queries a single fetch_ method makes? Looking at this method, my guess is at least 7 per mbid (the main query, plus one each for the artist, releases, rg rel, url rel, work rel and tags includes, and one more if the mbid is a redirect).
There's no rule of thumb for the number of queries that we should make, but each query requires a round trip to the SQL server (and in the case of production CB, that is a network trip to a different physical server).

Looking at this method I can see a few basic optimisations:

If there are many mbids, get them all at once. This may require a modified version of get_something_by_gid.

You can join into the artist_credit item of the release group in the initial get query.

For the other entity relations I'm not sure there is any further optimisation that we can do, as each query will be against a different table. It's possible we could do some kind of UNION query, but I'm not sure that would give us any speed benefit.

As suggested, I've added a get_entities_by_gid function for fetching multiple entities. I also added joinedload for artist_credit_names, artists and release_group_meta to save db queries. A single fetch_ method now takes at most 7 queries to the db (for any number of mbids). Previously, it was making 5 queries per mbid (separate ones for artist_credit_name, artist_credit, meta, artist_credit_name and release_group, excluding redirect queries), plus 6 more for relationships, releases and tags. Thanks for suggesting that :)
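
To make the round-trip arithmetic above concrete, here is a minimal sketch that uses only the standard library's sqlite3 instead of the mbdata/SQLAlchemy models from this PR. The two-table schema, the gid values and the CountingDB helper are all invented for illustration. It only shows why a single joined IN query can replace two queries per mbid; it is not the code that was merged.

```python
import sqlite3


class CountingDB:
    """In-memory SQLite database that counts the queries sent to it."""

    def __init__(self):
        self.queries = 0
        self.conn = sqlite3.connect(":memory:")
        # Toy schema and rows, invented for this example only.
        self.conn.executescript("""
            CREATE TABLE artist_credit (id INTEGER PRIMARY KEY, name TEXT);
            CREATE TABLE release_group (
                gid TEXT PRIMARY KEY,
                name TEXT,
                artist_credit INTEGER REFERENCES artist_credit (id)
            );
            INSERT INTO artist_credit VALUES (1, 'Artist A'), (2, 'Artist B');
            INSERT INTO release_group VALUES
                ('gid-1', 'First', 1), ('gid-2', 'Second', 2), ('gid-3', 'Third', 2);
        """)

    def query(self, sql, params=()):
        self.queries += 1  # one call here stands for one round trip
        return self.conn.execute(sql, params).fetchall()


def fetch_one_by_one(db, gids):
    """Two queries per gid: the release group, then its artist credit."""
    result = {}
    for gid in gids:
        name, credit_id = db.query(
            "SELECT name, artist_credit FROM release_group WHERE gid = ?", (gid,)
        )[0]
        credit_name = db.query(
            "SELECT name FROM artist_credit WHERE id = ?", (credit_id,)
        )[0][0]
        result[gid] = {"name": name, "artist_credit": credit_name}
    return result


def fetch_batched(db, gids):
    """One query for any number of gids: IN (...) plus a JOIN."""
    if not gids:
        return {}
    placeholders = ", ".join("?" for _ in gids)
    rows = db.query(
        "SELECT rg.gid, rg.name, ac.name "
        "FROM release_group AS rg "
        "JOIN artist_credit AS ac ON ac.id = rg.artist_credit "
        "WHERE rg.gid IN (" + placeholders + ")",
        tuple(gids),
    )
    return {gid: {"name": name, "artist_credit": credit} for gid, name, credit in rows}


db = CountingDB()
gids = ["gid-1", "gid-2", "gid-3"]
slow = fetch_one_by_one(db, gids)
per_item_queries = db.queries
db.queries = 0
fast = fetch_batched(db, gids)
batched_queries = db.queries
```

Both functions return the same mapping. The per-item version grows by two queries for every extra gid, while the batched version stays at one.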

alastair · 2017-07-12T18:57:21Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+        if 'work-rels' in includes:
+            entity_relation_helper(db, 'work', 'release_group', release_group_ids, includes_data)
+
+        if 'tags' in includes:


Will you have to do this tag query often? It might pay to have a helper function for it. Do you need tags for Places? I see you have it here but not for Place.

As suggested, I added a get_tags function for fetching entity tags (as it may be used for other entities like artists).

alastair · 2017-07-12T20:08:22Z

critiquebrainz/frontend/external/musicbrainz_db/serialize.py

+        data['join_phrase'] = artist_credit_name.join_phrase
+    return data
+
+
 def to_dict_release_groups(release_group, includes={}):


I think that we should have tests for these conversion methods to make sure that we correctly create a dict from a particular releasegroup id.
I think that this will be a bit difficult - we can't really do the query against a real musicbrainz database because the data might change in the database. I'm not sure that it's worth testing only this function, I'd like to see it tested in conjunction with fetch_multiple_release_groups
Maybe @gentlecat has a suggestion here?

I'd definitely like to see some tests for all of this.

Thanks. Regarding the test database, we can make use of mbdata.sample_data.create_sample_data(). I've started a minimal example here: ferbncode@866538a (it works, but is still WIP). I can make a list of testable entities (the entities in sample_data.py). Is this a desirable approach?

Also, these tests may fail when the schema of the MusicBrainz database changes. But I think that would be good for us?

Sorry, I only later read the IRC backlog about the script in musicbrainz-server; that approach might be better for testing different types of entities.

gentlecat

Looks like testing needs to be set up with an MB database for the tests to work now.

gentlecat · 2017-07-20T02:29:02Z

@ferbncode, I assume you are working on tests with MB database now. When you finish that part, put it into a separate PR please so that it's more manageable. ;)

gentlecat

critiquebrainz/frontend/views/test/test_release_group.py is removed but not replaced with anything else. Can you clarify this? It's good to have at least one test that checks that the page renders correctly.

gentlecat · 2017-08-02T17:03:51Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+            [mbid],
+            includes=['artists', 'releases', 'release-group-rels', 'url-rels', 'work-rels', 'tags']
+        )[mbid]
+    cache.set(key=key, val=release_group, time=DEFAULT_CACHE_EXPIRATION)


This should be inside the if not release_group: block, otherwise it's being unnecessarily reset each time.
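
A minimal sketch of the pattern being asked for, assuming a dict-backed stand-in instead of the real brainzutils cache. FakeCache, get_cached and the load callable are hypothetical names for this example; only the key/val/time keyword arguments mirror the call shown in the diff.

```python
DEFAULT_CACHE_EXPIRATION = 12 * 60 * 60  # seconds (12 hours)


class FakeCache:
    """Dict-backed stand-in for a cache client; counts writes."""

    def __init__(self):
        self.store = {}
        self.set_calls = 0

    def get(self, key):
        return self.store.get(key)

    def set(self, key, val, time):
        # `time` (expiry in seconds) is accepted but ignored by this stand-in.
        self.set_calls += 1
        self.store[key] = val


def get_cached(cache, key, load):
    """Return the cached value, calling `load` and writing only on a miss."""
    value = cache.get(key)
    if not value:
        value = load()
        # Inside the block: a cache hit never rewrites the entry.
        cache.set(key=key, val=value, time=DEFAULT_CACHE_EXPIRATION)
    return value


cache = FakeCache()
load_calls = []


def load_release_group():
    load_calls.append(1)
    return {"id": "some-mbid", "title": "Example"}


first = get_cached(cache, "release-group_some-mbid", load_release_group)
second = get_cached(cache, "release-group_some-mbid", load_release_group)
```

With cache.set outside the if block, the second call would write the same value again and push the expiry forward on every read.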

gentlecat · 2017-08-02T17:06:28Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+        release_group = fetch_multiple_release_groups(
+            [mbid],
+            includes=['artists', 'releases', 'release-group-rels', 'url-rels', 'work-rels', 'tags']
+        )[mbid]


It might be better to create a wrapper function for fetch_multiple_release_groups that retrieves only one release group at a time.
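
One possible shape for such a wrapper, as a sketch only. The body of fetch_multiple_release_groups below is a stand-in that fabricates a dict per mbid, and fetch_release_group is a hypothetical name, not necessarily what the PR ended up using.

```python
def fetch_multiple_release_groups(mbids, includes=None):
    """Stand-in for the real batch fetcher: returns {mbid: data}."""
    includes = includes or []
    return {mbid: {"id": mbid, "includes": list(includes)} for mbid in mbids}


def fetch_release_group(mbid, includes=None):
    """Fetch a single release group by delegating to the batch function."""
    return fetch_multiple_release_groups([mbid], includes=includes)[mbid]


release_group = fetch_release_group("some-mbid", includes=["artists", "tags"])
```

Callers that need one release group then avoid building a one-element list and indexing the result by mbid themselves.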

gentlecat · 2017-08-02T17:08:26Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+    includes_data = defaultdict(dict)
+    check_includes('release_group', includes)
+    with mb_session() as db:
+        query = db.query(models.ReleaseGroup).options(joinedload("meta"))


What does the .options(joinedload("meta")) part do? It might be worth adding a comment to clarify parts like this, which may be hard to understand for someone unfamiliar with how mbdata works.

gentlecat · 2017-08-02T17:44:04Z

critiquebrainz/frontend/external/musicbrainz_db/release_group.py

+                    options(joinedload("artist_credit.artists")).\
+                    options(joinedload("artist_credit.artists.artist"))
+
+        release_groups = get_entities_by_gids(query, models.ReleaseGroup, models.ReleaseGroupGIDRedirect, mbids)


In a case like this it would be better to specify the names of the arguments explicitly. When I look at get_entities_by_gids(query, models.ReleaseGroup, models.ReleaseGroupGIDRedirect, mbids), I have no idea what these models are unless I look up the function definition and its docstring/implementation.
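
As a sketch of one way to enforce this in Python, a bare * in the signature makes every following parameter keyword-only, so the call site has to name each argument. The parameter names entity_model and redirect_model are assumptions made for this example, and the body is a placeholder rather than the real lookup.

```python
def get_entities_by_gids(*, query, entity_model, redirect_model, mbids):
    """Placeholder body: just reports what it was asked to look up."""
    return {
        "query": query,
        "entity_model": entity_model,
        "redirect_model": redirect_model,
        "mbids": list(mbids),
    }


# Self-describing call site: each model's role is visible without a docstring.
result = get_entities_by_gids(
    query="<query object>",
    entity_model="ReleaseGroup",
    redirect_model="ReleaseGroupGIDRedirect",
    mbids=["some-mbid"],
)

# A positional call is rejected before the function body runs.
try:
    get_entities_by_gids("<query object>", "ReleaseGroup", "ReleaseGroupGIDRedirect", ["some-mbid"])
    positional_rejected = False
except TypeError:
    positional_rejected = True
```

Simply passing keyword arguments at the existing call site achieves the same readability without changing the function signature.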

gentlecat · 2017-08-02T17:48:10Z

critiquebrainz/frontend/templates/macros.html

@@ -96,7 +96,7 @@
 {% macro show_tags(tags) %}
  {% if tags %}
    {% for tag in tags %}
-      <a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag['name']) }}">{{ tag['name'] }}</a>
+      <a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag) }}">{{ tag }}</a>


Is there any reason to use format instead of just outputting the value directly into the template?

gentlecat · 2017-08-06T15:01:01Z

critiquebrainz/frontend/external/musicbrainz_db/includes.py

@@ -18,7 +18,8 @@
 TAG_INCLUDES = ["tags", "user-tags"]
 RATING_INCLUDES = ["ratings", "user-ratings"]
 VALID_INCLUDES = {
-    'place': ["aliases", "annotation"] + RELATION_INCLUDES + TAG_INCLUDES,
+    'place' : ["aliases", "annotation"] + RELATION_INCLUDES + TAG_INCLUDES,


This change looks unnecessary.

gentlecat · 2017-08-06T15:02:00Z

critiquebrainz/frontend/external/musicbrainz_db/place.py

+        [mbid],
+        includes=['artist-rels', 'place-rels', 'release-group-rels', 'url-rels'],
+    )[mbid]
+    return place


You can simply return fetch_multiple_places(... here.

gentlecat · 2017-08-06T15:03:14Z

critiquebrainz/frontend/views/test/test_release_group.py

        self.assert200(response)
-        self.assertIn("Days Are Gone", str(response.data))
-        self.assertIn("No reviews found", str(response.data))
-        # TODO(roman): Try to add review and check it's displayed there!


Is this fixed now?

gentlecat

Are changes in critiquebrainz/frontend/external/musicbrainz_db/place.py related to release groups?

gentlecat · 2017-08-06T17:55:58Z

critiquebrainz/frontend/templates/macros.html

@@ -96,7 +96,7 @@
 {% macro show_tags(tags) %}
  {% if tags %}
    {% for tag in tags %}
-      <a href="{{ 'https://musicbrainz.org/tag/%s'|format(tag['name']) }}">{{ tag['name'] }}</a>
+    <a href="{{ 'https://musicbrainz.org/tag/%s' % tag }}">{{ tag }}</a>


Can you fix indentation here please?

Also, you are still using formatting. I meant this:

<a href="https://musicbrainz.org/tag/{{ tag }}">{{ tag }}</a>

gentlecat · 2017-08-06T17:58:41Z

critiquebrainz/frontend/views/mapping.py

+        album = spotify_api.get_album(spotify_id)
+    except ExternalServiceException:
+        flash.error(gettext("You need to specify existing album from Spotify!"))
+        return redirect(url_for('.spotify_list', release_group_id=release_group_id))


I don't understand this change. Can you explain?

I was getting this error when running pylint: https://gist.github.com/ferbncode/307bc3f3e13f5d65428aad6180ace1ed. I have undone this change for now, as it can be done separately.

ferbncode · 2017-08-06T18:36:49Z

No, the changes in place.py that add the wrapper _get_place_by_id for fetch_multiple_places are not part of the release group work. I removed that part and will make a separate PR for it. The other changes in place.py fix the block for the cache.set statement and import the DEFAULT_CACHE_EXPIRATION variable (as per #122 (comment)). Should they be done in a separate PR too?

gentlecat · 2017-08-07T13:42:10Z

Should they be done in a separate PR too?

That's fine.

ferbncode changed the title ~~[WIP]: Use direct database access for entity: release_group~~ [WIP]: CB-231: Use direct database access for entity: release_group Jul 5, 2017

ferbncode force-pushed the release_group_work branch 5 times, most recently from 2c182cc to df07ede Compare July 7, 2017 13:33

ferbncode changed the title ~~[WIP]: CB-231: Use direct database access for entity: release_group~~ CB-231: Use direct database access for entity: release_group Jul 7, 2017

ferbncode force-pushed the release_group_work branch 5 times, most recently from 54b2426 to f231a37 Compare July 12, 2017 19:02

alastair reviewed Jul 12, 2017

gentlecat suggested changes Jul 12, 2017

ferbncode force-pushed the release_group_work branch 6 times, most recently from 9cc3c8c to d1de138 Compare July 26, 2017 03:49

gentlecat suggested changes Aug 2, 2017

ferbncode added 2 commits August 4, 2017 17:22

CB-231: Use direct database access for entity: release_group

d851a53

Reduce number of queries, add tag_helper, fix urls

75a71d3

ferbncode force-pushed the release_group_work branch from d1de138 to 35bd76b Compare August 4, 2017 21:13

Add tests for release_group_page & musicbrainz db functions, other fixes

7841e51

ferbncode force-pushed the release_group_work branch from 35bd76b to 7841e51 Compare August 4, 2017 21:17

gentlecat reviewed Aug 6, 2017

ferbncode force-pushed the release_group_work branch from 02d70bf to 418e5be Compare August 6, 2017 16:13

Check that reviews appear on release_group page, other changes

55c2dbc

ferbncode force-pushed the release_group_work branch from 418e5be to 379ee1c Compare August 6, 2017 16:22

gentlecat suggested changes Aug 6, 2017

Fix macro url, undo some changes

e3ec18c

ferbncode force-pushed the release_group_work branch from 379ee1c to e3ec18c Compare August 6, 2017 18:33

gentlecat approved these changes Aug 7, 2017

gentlecat changed the title ~~CB-231: Use direct database access for entity: release_group~~ CB-231: Use direct database access for the Release Group entity Aug 7, 2017

gentlecat merged commit 154782d into metabrainz:master Aug 7, 2017


