Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Spotify listen importer #407

Closed
wants to merge 1 commit into from
Closed

Spotify listen importer #407

wants to merge 1 commit into from

Conversation

alastair
Copy link
Collaborator

Summary

  • This is a…
    • Feature addition
  • Describe this change in 1-2 sentences: Add the ability to connect your spotify account to Listenbrainz and automatically import listens

Solution

Spotify has an API that can read the 50 most recent songs played by a user. We can do an oauth connection and read the user's recent plays and import them as listens.

The user profile page has a connect to spotify button which does an oauth workflow.
A separate script loops through all users who have performed this login flow and gets listens for each of them, converting them to listens and importing into LB.
Listens are imported starting from the user who last had their listens imported least recently.

Action

I tried a few new things with code layout, specifically a domain package that contains methods relevant to spotify, but disconnected from our database methods. This allows us to test database methods in db directly with the database and not worry about interaction, and methods that "do stuff", while letting us mock database access to run tests faster.
I'm open to feedback on if this is a direction that we want to take with this code, but if it's to confusing we can rearrange it into a format closer to the existing code.

There are some tests missing, especially in the domain package. There are a number of comments in the spotify_read_listens.pyscript especially regarding error catching from Spotify about what we should do in particular failure cases. This isn't implemented yet.

I don't know the preferred way of starting scripts in the LB production environment, especially regarding config file reading, etc. This will need to be updated.

We have to keep an eye on how long this operation will take to run. At only 50 songs per history for a user, we may find that with too many users that we start running behind. Perhaps we'll have to run multiple scripts. This is a future concern.

@alastair
Copy link
Collaborator Author

Last.fm's spotify connect documentation (https://getsatisfaction.com/lastfm/topics/spotify-scrobbling-gvfb1uv1ff1cs) says

Private sessions are not supported
We don't currently support Private Sessions so all listens will appear on your Last.fm profile. If you're using Private Sessions to hide your listening history, then please don't enable the new scrobbler yet.

We should confirm if this is a concern that we might have... perhaps spotify returns these listens in their play history but with a flag in the json? If so we should skip it. If they report it with no flag we should ping Hugh

@alastair
Copy link
Collaborator Author

alastair commented May 14, 2018

@mayhem requested a breakdown of what I think should be done before and after a merge:

Before:

  • Config/running of script
  • Handle failing error codes from spotify:
    # TODO: if error
    # If it's 401, try and refresh one more time and get it
    # If that time fails, report the failure
    # If refresh fails, report the failure
    # What other types of error can we make? (Try bad/invalid keys), 404?
    # If you get status code 429, it means that you have sent too many requests. If this happens, have a look in the Retry-After header
    # spotify.update_error(user.user_id, False, error_msg)
    # https://developer.spotify.com/web-api/user-guide/#response-status-codes
    # 400, can't retry. 401, refresh. 403/404, can't retry.
    # 5xx: retry some times?
  • Handle failures from adding listens:
    # TODO: This could fail with some listens, but is something that we create
    # TODO: so we should report to developers and not tell the user?

After:

  • Finalise tests
  • Improve the way that we tell a user if their account is connected, when it was last run, and if there was any error while performing a query
  • Confirm that we don't record private listens
  • decide if the domain package is nice
  • admin function to list status of users, and deactivate or reactive an account
  • Logger/monitor to make sure that we keep up with updates given the time that it takes to run over all users given the limit of 50 songs per user on the spotify api

@alastair
Copy link
Collaborator Author

One final thing - we should decide if we're happy with the data that we add to listenbrainz:

additional = {'tracknumber': track['track_number'],
'spotify_artist_ids': [a['external_urls']['spotify'] for a in artists],
'artist_names': [a['name'] for a in artists],
'listening_from': 'spotify',
'disc_number': track['disc_number'],
'duration_ms': track['duration_ms'],
'spotify_album_id': track['album']['external_urls']['spotify'],
# Named 'release_*' because 'release_name' is an official name in the docs
'release_artist_name': album_artist_name,
'release_artist_names': [a['name'] for a in album_artists],
# Named 'album_*' because Spotify calls it album and this is spotify-specific
'spotify_album_artist_ids': [a['external_urls']['spotify'] for a in album_artists],
}

Copy link
Member

@mayhem mayhem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this! We clearly have a lot of things loose ends to tighten up that also depend on our production setup. Seems like the perfect time to pass this off!

def get_spotipy_client(self):
return spotipy.Spotify(auth=self.user_token)

def nice_last_updated(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a function to make it nice and human readable? If so, maybe should also also use an "ago" function to make the timestamp more approachable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this was for a potential status message on the user profile page ("You have connected your Spotify account to ListenBrainz and it was last updated x", 'never' is for the case where a user connects but it hasn't run yet) - you're right that "ago" would be a good update here.


@property
def token_expired(self):
print(self.token_expires)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cruft?

def __str__(self):
return "<Spotify(user:%s)>" % self.user_id

# TODO: All documentation for these
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@paramsingh : please add tickets for all the TODOs in this PR.

row = db_spotify.get_user(user_id)
if row:
return Spotify.from_dbrow(row)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm. Not sure I'm a fan of implicit returns..

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

explicit else return None? Fine by me

class SpotifyDomainTestCase(ServerTestCase):

def test(self):
pass
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass what??

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pass the tests, please

# TODO: if error
# If it's 401, try and refresh one more time and get it
# If that time fails, report the failure
# If refresh fails, report the failure
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Param: Please report failures to sentry.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to developer reports, I had the idea of a field spotify.update_error which could be used to report to the user if an error happened while doing an update (e.g. 'we no longer have permission to read your listens, please re-auth with spotify'). I've not used this yet, but my plan with this comment was to report this user to the user (this specific error is that if we try and refresh the user's oauth access key and it fails we should tell this to the user)

spotify.update_last_updated(user.user_id, False, 'Unable to validate listens')
return
# TODO: This is our problem, report it to LB developers with a logger (sentry?)
# TODO: Skip invalid listens instead of skipping all listens?
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, get as many valid listens in as possible.

def main():
app = listenbrainz.webserver.create_app()

with app.app_context():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be some throttling here? While we don't have a lot of users, this tight loop would hammer the spotify API unnecessarily. It would be ideal to find some delay here that we can sleep for and then wake up again to start over. I think this point will need some discussion...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good, but we need to keep track of this in the medium term (with a logging system) - we need to know how long it takes for a run to happen and how many users we have - if it takes more than 150 minutes to run (50 songs * 3 minutes) then we could potentially miss listens

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed -- we need to build a system that detects when we're getting close to falling behind so we can take corrective measures.


try:
token = sp_oauth.get_access_token(code)
print("code: {}".format(code))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also cruft?

@mayhem
Copy link
Member

mayhem commented May 16, 2018

@mayhem
Copy link
Member

mayhem commented May 16, 2018

And also: https://tickets.metabrainz.org/browse/LB-362 , https://tickets.metabrainz.org/browse/LB-363 , https://tickets.metabrainz.org/browse/LB-364 .

We have quite a lot of stuff to do here -- I wonder if we should merge this to another branch and not quite commit to merging to master yet.

@paramsingh
Copy link
Collaborator

I wonder if we should merge this to another branch and not quite commit to merging to master yet.

+1, there's a lot of stuff to do here.

@mayhem
Copy link
Member

mayhem commented Jun 22, 2018

Well, the branch is already in this repo, so we can close this PR for now and get back to it in a bit.

@mayhem mayhem closed this Jun 22, 2018
# If that time fails, report the failure
# If refresh fails, report the failure
# What other types of error can we make? (Try bad/invalid keys), 404?
# If you get status code 429, it means that you have sent too many requests. If this happens, have a look in the Retry-After header
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Spotipy is nice enough to do this for us, it seems. :)

https://github.com/plamere/spotipy/blob/master/spotipy/client.py#L152

@paramsingh paramsingh mentioned this pull request Jun 25, 2018
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
3 participants