
Update plugins/wikidata/wikidata.py #181

Merged
merged 4 commits into metabrainz:2.0 on Dec 10, 2018

Conversation

Contributor

@pbrackin pbrackin commented Nov 12, 2018

Refer to

https://tickets.metabrainz.org/browse/PICARD-1411
https://community.metabrainz.org/t/picard-keeps-locking-up/358547/4

I didn't like some of the esoteric genres coming from last.fm, so I decided to try wikidata, and while it worked in some cases, I found it was not always stable. This seems to be pretty well documented in the JIRA DB and the community forums.

The most glaring issue was a recursion that happened when you right-click the "other versions" option to switch releases. It would eventually call finalize on the album, which would call run_track_metadata_processors, which would process the track and then call finalize again, and then...

So, I spent some time carefully stepping through, trying to understand the plugin flow and the data structures that are in place. I was able to deduce that the one problematic call to finalize there was not needed, and thus eliminate the recursion. There are some locks in place which seem to acknowledge a re-entrancy issue, but perhaps the authors didn't see through to the recursion. As best I can tell, no other threads ever come in, so I am not sure the locks are needed once the recursion problem is lifted. I did some basic testing with both the recursion and the locks removed, and everything seemed to work OK for me. But I left the locks in for now, in case there are edge cases I am not seeing. Locks do represent a certain amount of overhead though, so some consideration might be given to their removal.

Besides the recursion, I identified extra album insertions that may have been causing some albums to end up un-finalized. Removing the extra insertions seems to have eliminated these dangling albums.

Also, I moved the server & port to instance variables to avoid hundreds of unneeded config lookups.
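A minimal sketch of what this looks like, assuming the standard Picard config keys (the attribute names here are illustrative, not the plugin's actual code):

from picard import config

# Method on the plugin class (illustrative): read the server settings once per
# process_track call instead of on every request built afterwards.
def process_track(self, album, metadata, track, release):
    self.mb_host = config.setting['server_host']
    self.mb_port = config.setting['server_port']
    # ... build requests using self.mb_host / self.mb_port ...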

The above items in combination seemed to also speed the overall process up a bit. Nothing super-major, but noticeable.

Lastly, I sorted the genre list: if left unsorted, the sets can come out in a different order on every lookup, causing "false positives" for the genre field (it looks like a change when, in fact, only the order has changed).
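A minimal sketch of the sorting fix (example values, not real lookup results):

# Converting the set to a sorted list gives a stable order across lookups, so an
# unchanged genre set no longer shows up as a change in the genre field.
genre_set = {'rock', 'blues rock', 'hard rock'}
genre_list = sorted(genre_set)
print(genre_list)  # always ['blues rock', 'hard rock', 'rock']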

I ran the unit tests under the test folder and it said 68 tests OK. Not sure if you have other tests, since this is my first PR on this project.

Remove recursion, Remove redundant album insertions, Sort genre list, Make mb host+port instance variables.
Member

@phw phw left a comment


Your changes make sense, but I think this does not go far enough. So the feedback below is not really about your code, but the existing code.

As I read the code there is a race condition between process_wikidata and parse_wikidata_response. In process_wikidata the requests for all albums are increased:

for album in self.albums[item_id]:
    album._requests += 1

In parse_wikidata_response the counts get removed with:

for album in self.albums[item_id]:
    album._requests -= 1

The problem here is that the list of albums when process_wikidata gets called is not necessarily the same as in parse_wikidata_response. I also don't get why the counts for all albums need to be increased. Wouldn't it be sufficient to do this for the currently processed album only?

And if there is a reason to do it for all albums in the list, a copy of the list should probably be passed to parse_wikidata_response so that it decrements the counts only for those albums that were previously incremented.
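A hypothetical sketch of that suggestion (method and helper names are illustrative, not the plugin's actual code): snapshot the album list when the counters are incremented and pass the same snapshot to the response handler, so decrements always match the albums that were incremented, even if self.albums[item_id] changes in between.

from functools import partial

def process_wikidata(self, item_id, path):
    albums = list(self.albums[item_id])              # copy, not a live reference
    for album in albums:
        album._requests += 1
    handler = partial(self.parse_wikidata_response, item_id, albums)
    self._send_request(path, handler)                # placeholder for the real webservice call

def parse_wikidata_response(self, item_id, albums, response, reply, error):
    for album in albums:                             # exactly the albums incremented above
        album._requests -= 1
    # ... parse the response and finalize as appropriate ...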

Contributor Author

pbrackin commented Nov 12, 2018

OK. Well, in general, I agree that the design is somewhat flawed. I came across many of the same questions as you - only without the benefit of your in-depth understanding of Picard's guts :P At the very least, the current design can benefit from some more changes - some of which I discussed in the PR. At the most, possibly a complete re-do with only a recycling of the parse code could be in order.

However, my efforts here were mainly targeted towards "just get it working". Not sure where you're at in your release cycle, but it may be prudent to get the current changes into the next release and then allow some time for discussion on the design and the subsequent implementation. It may be helpful to hear from the original authors, if they are still around, on the rationale behind the existing design; they may want to make some changes too.

Of course... holding back the changes in this PR on all that may not be prudent as the plugin is pretty broken in the current state & with my current PR changes, it does work much better.

Having said that... if you feel strongly about holding off until a more robust solution is in place and we don't hear from the original authors, I'd be happy to dig in and revamp it. Of course, I'd need some time.

I will go ahead and provide some feedback on your feedback in the meantime :O)

Member

phw commented Nov 13, 2018

Since plugin changes can be published independently of Picard we are not bound to any release schedule here. I'm happy to merge this as a step forward if the config issue gets resolved.

But I still think this needs some more cleanup before we can consider it fixed.

    self.cache[item_id] = genre_list
    log.debug('WIKIDATA: setting genre : %s ' % genre_list)
else:
    log.info('WIKIDATA: Genre not found in wikidata')

-log.info('WIKIDATA: Seeing if we can finalize tags %s ' %
-         len(self.albums[item_id]))
+log.info('WIKIDATA: Seeing if we can finalize tags %s ' % len(self.albums[item_id]))
Collaborator


A %d format string placeholder makes more sense with len().
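That would read roughly:

log.info('WIKIDATA: Seeing if we can finalize tags %d ' % len(self.albums[item_id]))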

Contributor Author


I didn't actually change this line other than making it one line, but I can throw this in the next commit for you :)

@pbrackin
Contributor Author

Good call. I will try to help you out further & rework it some more if we don't hear from the authors on any of this.

One more thing... you have a design flaw with all the calls to the protected album member _finalize_loading. This can be dangerous. You should consider a wrapper for this. You could use the wrapper for additional checks and logic, like checking for recursion, to avert plugins sinking the whole ship, as we saw here when using the "Other releases" context item.
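A hypothetical sketch of that wrapper idea (not an existing Picard API; names are illustrative):

from picard import log

# Would live on the Album class (illustrative): a public entry point around
# _finalize_loading with a simple re-entrancy guard, so a plugin calling back
# into it cannot recurse endlessly.
def finalize_loading(self, error=None):
    if getattr(self, '_finalizing', False):
        log.warning('finalize_loading re-entered; ignoring to avoid recursion')
        return
    self._finalizing = True
    try:
        self._finalize_loading(error)
    finally:
        self._finalizing = False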

Member

phw commented Nov 13, 2018

you have a design flaw with all the calls to the protected album member _finalize_loading. This can be dangerous. You should consider a wrapper for this.

See the existing ticket at https://tickets.metabrainz.org/browse/PICARD-1403. Suggestions and code are welcome, especially on how to best ensure only one request decrement for each increment.

allow mb host & port config update checks once per process_track call.  clean some empty spaces, change a log call formatting
@pbrackin
Contributor Author

OK good!

Member

phw commented Nov 17, 2018

This plugin is somewhat tricky. I noticed another issue, which actually might be the root cause of some of the issues users have. To check if an album can be removed from the plugin's list we have something like this in the code:

if not album._requests:
    self.albums[item_id].remove(album)
    album._finalize_loading(None)

But this will only ever work reliably if wikidata is the only thing doing requests, which is seldom the case. So if wikidata finishes its requests, but other plugins or cover art loaders are still busy, the album will stay in wikidata's list. Any later attempt to reload this album will get stuck.

Having _finalize_loading already getting called in process_request seems to have been an attempt to fix this. But the proper way is probably for the wikidata plugin to track only its own requests for an album (in addition to manipulating the album request counter).
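A minimal sketch of that idea, with illustrative names (assumptions, not the actual plugin code): the plugin keeps its own per-album counter alongside album._requests and finalizes based on its own bookkeeping, so other plugins' outstanding requests no longer keep an album stuck in the wikidata list.

# self._own_requests = {} would be set up in the plugin's __init__ (illustrative).
def _start_request(self, album):
    album._requests += 1
    self._own_requests[album] = self._own_requests.get(album, 0) + 1

def _finish_request(self, album, error=None):
    album._requests -= 1
    self._own_requests[album] -= 1
    if self._own_requests[album] <= 0:
        del self._own_requests[album]         # wikidata is done with this album
        album._finalize_loading(error)        # _finalize_loading checks the album counter itself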

I did some debugging here; I will see if I can submit a fix for this directly to this pull request tomorrow.

Contributor Author

pbrackin commented Nov 18, 2018

Not sure I fully understand the dependency of one plugin on other plugins that you are describing, or even the design intent of album._requests. It seems like you would want all of that managed in the album object or higher, and just let the plugin mark an album when a change is needed, so the album would open and finalize as needed, managing requests internally and eliminating any plugin inter-dependencies. But it seems like the plugins have too much power, with the ability to block the main thread, leave albums open, etc. Not being critical, just saying this approach could perhaps be improved over time. In the meantime, some more exhaustive testing of new plugins or plugin mods should be considered.

I have continued to test the current revision of wikidata (the one in this PR) against subsets of a large media library and it's been working flawlessly for me. It's truly a massive improvement over what's currently being made available to users. I could test the reloading problem you are describing if you can explain the test procedure more clearly. I am running a couple of other plugins - the fanart.tv one, mainly.

I was going to start on the additional improvements we discussed next week, and was going to create a new JIRA ticket to track that. But if you are going to make changes here, maybe it's better to wait on those...

Member

phw commented Nov 18, 2018

My attempt at improving this: pbrackin#1

@phw phw force-pushed the PICARD-1411-Multiple_Wikidata_Issues branch 2 times, most recently from c02fd8e to 21a5906 on November 24, 2018 08:53
Made the albums dict a straight dict of albums (now called itemAlbums), ensured itemAlbums and requests dicts are empty after each run, removed locks, cleaned up logging.
Contributor Author

pbrackin commented Dec 5, 2018

I am still testing, but so far it's been working well. One thing I found, though... if there is a network problem, a track may get an incomplete genre set compared to what it would otherwise get. To the user this shows up as a change, and the change is a reduced genre set.

The problem seems to be that the webservice does not send an error indication via the callback, so wikidata is kind of in the dark that there was a problem.

Below is a snippet from the log. Here again, I am not sure about the design intent, so maybe you can guide me on whether it is considered "normal" behavior for the plugin to not get an error indication.

I noticed that if I simulate a complete network outage by driving the musicbrainz.org requests to null, then I get the red album discs in the UI. This seems good, as it indicates to the user that they need to retry later. But for the case I am describing here, it's not turning red.

Probably can address this in a separate ticket, but wanted to bring it up since I see it in my testing on occasion.

E: 07:45:44,103 C:\Users\paul\Documents\GitHub\picard\picard\webservice\__init__._handle_reply:383: Network request error for https://musicbrainz.org:443/ws/2/release-group/467a5697-30e7-34b8-9813-d07e6af9c653?inc=url-rels: Error transferring https://musicbrainz.org:443/ws/2/release-group/467a5697-30e7-34b8-9813-d07e6af9c653?inc=url-rels - server replied: Service Temporarily Unavailable (QT code 403, HTTP code 200)
D: 07:45:44,105 C:\Users\paul\Documents\GitHub\picard\picard\webservice\__init__._handle_reply:395: Retrying https://musicbrainz.org:443/ws/2/release-group/467a5697-30e7-34b8-9813-d07e6af9c653?inc=url-rels (#2)

Member

@phw phw left a comment


I think this is looking good now. I really appreciate that you did the cleanup here. I think it is better to have this clean along with the fixes; otherwise the next person working on this just wonders what it is all about.

album = self.itemAlbums[item_id]
album._requests -= 1
if not album._requests:
    self.itemAlbums = {k: v for k, v in self.itemAlbums.items() if v != album}
Member


I think we still have an issue here, since this line only ever gets executed when album._requests becomes 0, but actually this will not always be true if other plugins are used in parallel (only one plugin will be the one dealing with the last request, and this might not be wikidata). So we have the risk of not updating itemAlbums correctly.

Anyway I think we should merge this first and deal with this issue separately.

Member


The only way to solve this probably is for this plugin to keep track of its own request count.

Contributor Author


OK yeah I can do that too in a separate PR. If you change the album request counter thing to an API / wrapper call, this change could be made at the same time, or if that is not coming soon we could go ahead and do this.

Member


We shouldn't wait for any Picard API changes, because there are no concrete plans yet and the earliest release that could have some changes here could be Picard 2.2.

Member

phw commented Dec 9, 2018

The problem seems to be that the webservice does not send an error indication via the callback so wikidata is kind of in the dark that there was a problem.

Below is a snippet from the log. Here again, I am not sure about the design intent, so maybe you can guide me in on if this is considered "normal" behavior for the plugin to not get an error indication.

The problem here is that this plugin never passes any error to album._finalize_loading(). To actually show an error the plugin is supposed to call album._finalize_loading(error). Note that currently the call to _finalize_loading is guarded by a check against album._requests being 0 (see also my comment above), which strictly speaking is not necessary since this is checked in _finalize_loading itself. That is especially true when passing an error, which must be done regardless of the status of album._requests.
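A hypothetical sketch of passing the error through as described above (names are illustrative, not the plugin's actual code): the response handler forwards any error to _finalize_loading regardless of the request counter, so the album is marked as errored instead of silently ending up with an incomplete genre set.

from picard import log

def parse_wikidata_response(self, item_id, response, reply, error):
    album = self.itemAlbums.get(item_id)
    if album is None:
        return
    album._requests -= 1
    if error:
        log.error('WIKIDATA: request error for %s: %s' % (item_id, error))
        album._finalize_loading(error)        # report the failure to Picard
        return
    # ... normal genre parsing would go here ...
    if not album._requests:
        album._finalize_loading(None)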

Regarding the design intent, you are asking the wrong person; until a few weeks ago I had never looked at the wikidata code in detail. We are kind of in the same boat here :)

LGTM!

Contributor Author

pbrackin commented Dec 9, 2018

But how can the plugin pass an error if it never gets an error back from the webservice to pass? From what I could tell, the callback just never gets called in these cases. I suppose you could add some sort of timer and raise an error on timeout, but it seems like the right solution is for the webservice to notify the plugin?

Member

phw commented Dec 9, 2018

But how can the plugin pass an error if it never gets an error back from the webservice to pass?

On a request error the callback will be called with the error object set; see the error handling with if error.

Contributor Author

pbrackin commented Dec 9, 2018

Yeah that is what I thought. That is not happening in this case.

Member

phw commented Dec 10, 2018

Yeah that is what I thought. That is not happening in this case.

It definitely is supposed to happen, and it does happen here in my tests. If there is any network error (I tested by changing the wikidata.org host to something invalid, but it could also be an error response like 500 from the server), the error object will be set and the wikidata logging happens.

If that really does not work for you, can you give some details about the exact error condition?

Collaborator

@zas zas left a comment


I think we have enough positive changes in this PR to merge; it looks good. Further improvements should be made in new PRs.

@phw phw merged commit 7e2dd80 into metabrainz:2.0 Dec 10, 2018
Contributor Author

pbrackin commented Dec 11, 2018

OK, when I get a chance I will open a new ticket with more data on the "lack of callback error" for the one case I was describing. Thanks for y'all's help getting this PR in to make this plugin operational :O)

Contributor Author

pbrackin commented Dec 12, 2018

One more question... I saw the merge happened. When & how does this new plugin version become available? I saw the generate.py script in the plugin repo has to be run. Does that get run by your build system? Or do I need to do something with my local Picard to see it?

Member

phw commented Dec 12, 2018

Unfortunately not; the website needs to be redeployed manually, and I was told this is a rather painful process. We talked about this today and would like to streamline the process, but it'll require some work.

Collaborator

zas commented Dec 13, 2018

When & how does this new plugin version become available?

New plugin versions were deployed just now. As explained by @phw, it's a tedious process; we have plans to improve this, but for now it just sucks ;)
