Spotify listen importer #407

alastair · 2018-05-14T07:38:51Z

Summary

This is a…
- Feature addition
Describe this change in 1-2 sentences: Add the ability to connect your spotify account to Listenbrainz and automatically import listens

Solution

Spotify has an API that can read the 50 most recent songs played by a user. We can do an oauth connection and read the user's recent plays and import them as listens.

The user profile page has a connect to spotify button which does an oauth workflow.
A separate script loops through all users who have performed this login flow and gets listens for each of them, converting them to listens and importing into LB.
Listens are imported starting from the user who last had their listens imported least recently.

Action

I tried a few new things with code layout, specifically a domain package that contains methods relevant to spotify, but disconnected from our database methods. This allows us to test database methods in db directly with the database and not worry about interaction, and methods that "do stuff", while letting us mock database access to run tests faster.
I'm open to feedback on if this is a direction that we want to take with this code, but if it's to confusing we can rearrange it into a format closer to the existing code.

There are some tests missing, especially in the domain package. There are a number of comments in the spotify_read_listens.pyscript especially regarding error catching from Spotify about what we should do in particular failure cases. This isn't implemented yet.

I don't know the preferred way of starting scripts in the LB production environment, especially regarding config file reading, etc. This will need to be updated.

We have to keep an eye on how long this operation will take to run. At only 50 songs per history for a user, we may find that with too many users that we start running behind. Perhaps we'll have to run multiple scripts. This is a future concern.

alastair · 2018-05-14T08:33:11Z

Last.fm's spotify connect documentation (https://getsatisfaction.com/lastfm/topics/spotify-scrobbling-gvfb1uv1ff1cs) says

Private sessions are not supported
We don't currently support Private Sessions so all listens will appear on your Last.fm profile. If you're using Private Sessions to hide your listening history, then please don't enable the new scrobbler yet.

We should confirm if this is a concern that we might have... perhaps spotify returns these listens in their play history but with a flag in the json? If so we should skip it. If they report it with no flag we should ping Hugh

alastair · 2018-05-14T09:45:56Z

@mayhem requested a breakdown of what I think should be done before and after a merge:

Before:

Config/running of script

Handle failing error codes from spotify:

listenbrainz-server/listenbrainz/spotify_updater/spotify_read_listens.py

Lines 61 to 70 in fa9e1da

    
           # TODO: if error 
        
           # If it's 401, try and refresh one more time and get it 
        
           # If that time fails, report the failure 
        
           # If refresh fails, report the failure 
        
           # What other types of error can we make? (Try bad/invalid keys), 404? 
        
           # If you get status code 429, it means that you have sent too many requests. If this happens, have a look in the Retry-After header 
        
           # spotify.update_error(user.user_id, False, error_msg) 
        
           # https://developer.spotify.com/web-api/user-guide/#response-status-codes 
        
           # 400, can't retry. 401, refresh. 403/404, can't retry. 
        
           # 5xx: retry some times?

Handle failures from adding listens:

listenbrainz-server/listenbrainz/spotify_updater/spotify_read_listens.py

Lines 87 to 88 in fa9e1da

    
           # TODO: This could fail with some listens, but is something that we create 
        
           # TODO: so we should report to developers and not tell the user?

After:

Finalise tests
Improve the way that we tell a user if their account is connected, when it was last run, and if there was any error while performing a query
Confirm that we don't record private listens
decide if the domain package is nice
admin function to list status of users, and deactivate or reactive an account
Logger/monitor to make sure that we keep up with updates given the time that it takes to run over all users given the limit of 50 songs per user on the spotify api

alastair · 2018-05-14T09:49:20Z

One final thing - we should decide if we're happy with the data that we add to listenbrainz:

listenbrainz-server/listenbrainz/spotify_updater/spotify_read_listens.py

Lines 21 to 33 in fa9e1da

    
           additional = {'tracknumber': track['track_number'], 
        
                         'spotify_artist_ids': [a['external_urls']['spotify'] for a in artists], 
        
                         'artist_names': [a['name'] for a in artists], 
        
                         'listening_from': 'spotify', 
        
                         'disc_number': track['disc_number'], 
        
                         'duration_ms': track['duration_ms'], 
        
                         'spotify_album_id': track['album']['external_urls']['spotify'], 
        
                         # Named 'release_*' because 'release_name' is an official name in the docs 
        
                         'release_artist_name': album_artist_name, 
        
                         'release_artist_names': [a['name'] for a in album_artists], 
        
                         # Named 'album_*' because Spotify calls it album and this is spotify-specific 
        
                         'spotify_album_artist_ids': [a['external_urls']['spotify'] for a in album_artists], 
        
                         }

mayhem

Thank you for this! We clearly have a lot of things loose ends to tighten up that also depend on our production setup. Seems like the perfect time to pass this off!

mayhem · 2018-05-14T09:43:30Z

listenbrainz/domain/spotify.py

+    def get_spotipy_client(self):
+        return spotipy.Spotify(auth=self.user_token)
+
+    def nice_last_updated(self):


Is this a function to make it nice and human readable? If so, maybe should also also use an "ago" function to make the timestamp more approachable.

I think this was for a potential status message on the user profile page ("You have connected your Spotify account to ListenBrainz and it was last updated x", 'never' is for the case where a user connects but it hasn't run yet) - you're right that "ago" would be a good update here.

mayhem · 2018-05-14T09:47:33Z

listenbrainz/domain/spotify.py

+
+    @property
+    def token_expired(self):
+        print(self.token_expires)


mayhem · 2018-05-14T09:48:04Z

listenbrainz/domain/spotify.py

+    def __str__(self):
+        return "<Spotify(user:%s)>" % self.user_id
+
+# TODO: All documentation for these


@paramsingh : please add tickets for all the TODOs in this PR.

mayhem · 2018-05-14T09:49:19Z

listenbrainz/domain/spotify.py

+    row = db_spotify.get_user(user_id)
+    if row:
+        return Spotify.from_dbrow(row)
+


Hmm. Not sure I'm a fan of implicit returns..

explicit else return None? Fine by me

mayhem · 2018-05-14T09:50:48Z

listenbrainz/domain/tests/test_spotify.py

+class SpotifyDomainTestCase(ServerTestCase):
+
+    def test(self):
+        pass


pass what??

pass the tests, please

mayhem · 2018-05-14T09:54:32Z

listenbrainz/spotify_updater/spotify_read_listens.py

+    # TODO: if error
+    # If it's 401, try and refresh one more time and get it
+    # If that time fails, report the failure
+    # If refresh fails, report the failure


Param: Please report failures to sentry.

In addition to developer reports, I had the idea of a field spotify.update_error which could be used to report to the user if an error happened while doing an update (e.g. 'we no longer have permission to read your listens, please re-auth with spotify'). I've not used this yet, but my plan with this comment was to report this user to the user (this specific error is that if we try and refresh the user's oauth access key and it fails we should tell this to the user)

mayhem · 2018-05-14T09:55:32Z

listenbrainz/spotify_updater/spotify_read_listens.py

+        spotify.update_last_updated(user.user_id, False, 'Unable to validate listens')
+        return
+        # TODO: This is our problem, report it to LB developers with a logger (sentry?)
+        # TODO: Skip invalid listens instead of skipping all listens?


Yes, get as many valid listens in as possible.

mayhem · 2018-05-14T09:59:08Z

listenbrainz/spotify_updater/spotify_read_listens.py

+def main():
+    app = listenbrainz.webserver.create_app()
+
+    with app.app_context():


Should there be some throttling here? While we don't have a lot of users, this tight loop would hammer the spotify API unnecessarily. It would be ideal to find some delay here that we can sleep for and then wake up again to start over. I think this point will need some discussion...

Yes good, but we need to keep track of this in the medium term (with a logging system) - we need to know how long it takes for a run to happen and how many users we have - if it takes more than 150 minutes to run (50 songs * 3 minutes) then we could potentially miss listens

Agreed -- we need to build a system that detects when we're getting close to falling behind so we can take corrective measures.

mayhem · 2018-05-14T10:00:14Z

listenbrainz/webserver/views/profile.py

+
+    try:
+        token = sp_oauth.get_access_token(code)
+        print("code: {}".format(code))


also cruft?

paramsingh · 2018-05-15T20:33:17Z

Created

mayhem · 2018-05-16T11:51:20Z

I also added https://tickets.metabrainz.org/browse/LB-361

mayhem · 2018-05-16T12:47:11Z

And also: https://tickets.metabrainz.org/browse/LB-362 , https://tickets.metabrainz.org/browse/LB-363 , https://tickets.metabrainz.org/browse/LB-364 .

We have quite a lot of stuff to do here -- I wonder if we should merge this to another branch and not quite commit to merging to master yet.

paramsingh · 2018-06-21T22:07:16Z

I wonder if we should merge this to another branch and not quite commit to merging to master yet.

+1, there's a lot of stuff to do here.

mayhem · 2018-06-22T09:59:11Z

Well, the branch is already in this repo, so we can close this PR for now and get back to it in a bit.

paramsingh · 2018-06-22T18:05:50Z

listenbrainz/spotify_updater/spotify_read_listens.py

+    # If that time fails, report the failure
+    # If refresh fails, report the failure
+    # What other types of error can we make? (Try bad/invalid keys), 404?
+    # If you get status code 429, it means that you have sent too many requests. If this happens, have a look in the Retry-After header


Spotipy is nice enough to do this for us, it seems. :)

https://github.com/plamere/spotipy/blob/master/spotipy/client.py#L152

Spotify importer

fa9e1da

mayhem reviewed May 14, 2018

View reviewed changes

mayhem closed this Jun 22, 2018

paramsingh reviewed Jun 22, 2018

View reviewed changes

paramsingh mentioned this pull request Jun 25, 2018

Spotify Listen Importer #422

Merged

7 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Spotify listen importer #407

Spotify listen importer #407

alastair commented May 14, 2018

alastair commented May 14, 2018

alastair commented May 14, 2018 •

edited

alastair commented May 14, 2018

mayhem left a comment

mayhem May 14, 2018

alastair May 14, 2018

mayhem May 14, 2018

mayhem May 14, 2018

mayhem May 14, 2018

alastair May 14, 2018

mayhem May 14, 2018

alastair May 14, 2018

mayhem May 14, 2018

alastair May 14, 2018

mayhem May 14, 2018

mayhem May 14, 2018

alastair May 14, 2018

mayhem May 14, 2018

mayhem May 14, 2018

paramsingh commented May 15, 2018

mayhem commented May 16, 2018

mayhem commented May 16, 2018

paramsingh commented Jun 21, 2018

mayhem commented Jun 22, 2018

paramsingh Jun 22, 2018

Spotify listen importer #407

Spotify listen importer #407

Conversation

alastair commented May 14, 2018

Summary

Solution

Action

alastair commented May 14, 2018

alastair commented May 14, 2018 • edited

alastair commented May 14, 2018

mayhem left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

paramsingh commented May 15, 2018

mayhem commented May 16, 2018

mayhem commented May 16, 2018

paramsingh commented Jun 21, 2018

mayhem commented Jun 22, 2018

Choose a reason for hiding this comment

alastair commented May 14, 2018 •

edited