New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spotify listen importer #407
Conversation
Last.fm's spotify connect documentation (https://getsatisfaction.com/lastfm/topics/spotify-scrobbling-gvfb1uv1ff1cs) says
We should confirm if this is a concern that we might have... perhaps spotify returns these listens in their play history but with a flag in the json? If so we should skip it. If they report it with no flag we should ping Hugh |
@mayhem requested a breakdown of what I think should be done before and after a merge: Before:
After:
|
One final thing - we should decide if we're happy with the data that we add to listenbrainz:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for this! We clearly have a lot of things loose ends to tighten up that also depend on our production setup. Seems like the perfect time to pass this off!
def get_spotipy_client(self): | ||
return spotipy.Spotify(auth=self.user_token) | ||
|
||
def nice_last_updated(self): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a function to make it nice and human readable? If so, maybe should also also use an "ago" function to make the timestamp more approachable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this was for a potential status message on the user profile page ("You have connected your Spotify account to ListenBrainz and it was last updated x", 'never' is for the case where a user connects but it hasn't run yet) - you're right that "ago" would be a good update here.
|
||
@property | ||
def token_expired(self): | ||
print(self.token_expires) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cruft?
def __str__(self): | ||
return "<Spotify(user:%s)>" % self.user_id | ||
|
||
# TODO: All documentation for these |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@paramsingh : please add tickets for all the TODOs in this PR.
row = db_spotify.get_user(user_id) | ||
if row: | ||
return Spotify.from_dbrow(row) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm. Not sure I'm a fan of implicit returns..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
explicit else return None
? Fine by me
class SpotifyDomainTestCase(ServerTestCase): | ||
|
||
def test(self): | ||
pass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass what??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pass the tests, please
# TODO: if error | ||
# If it's 401, try and refresh one more time and get it | ||
# If that time fails, report the failure | ||
# If refresh fails, report the failure |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Param: Please report failures to sentry.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In addition to developer reports, I had the idea of a field spotify.update_error
which could be used to report to the user if an error happened while doing an update (e.g. 'we no longer have permission to read your listens, please re-auth with spotify'). I've not used this yet, but my plan with this comment was to report this user to the user (this specific error is that if we try and refresh the user's oauth access key and it fails we should tell this to the user)
spotify.update_last_updated(user.user_id, False, 'Unable to validate listens') | ||
return | ||
# TODO: This is our problem, report it to LB developers with a logger (sentry?) | ||
# TODO: Skip invalid listens instead of skipping all listens? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, get as many valid listens in as possible.
def main(): | ||
app = listenbrainz.webserver.create_app() | ||
|
||
with app.app_context(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be some throttling here? While we don't have a lot of users, this tight loop would hammer the spotify API unnecessarily. It would be ideal to find some delay here that we can sleep for and then wake up again to start over. I think this point will need some discussion...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good, but we need to keep track of this in the medium term (with a logging system) - we need to know how long it takes for a run to happen and how many users we have - if it takes more than 150 minutes to run (50 songs * 3 minutes) then we could potentially miss listens
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed -- we need to build a system that detects when we're getting close to falling behind so we can take corrective measures.
|
||
try: | ||
token = sp_oauth.get_access_token(code) | ||
print("code: {}".format(code)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also cruft?
I also added https://tickets.metabrainz.org/browse/LB-361 |
And also: https://tickets.metabrainz.org/browse/LB-362 , https://tickets.metabrainz.org/browse/LB-363 , https://tickets.metabrainz.org/browse/LB-364 . We have quite a lot of stuff to do here -- I wonder if we should merge this to another branch and not quite commit to merging to master yet. |
+1, there's a lot of stuff to do here. |
Well, the branch is already in this repo, so we can close this PR for now and get back to it in a bit. |
# If that time fails, report the failure | ||
# If refresh fails, report the failure | ||
# What other types of error can we make? (Try bad/invalid keys), 404? | ||
# If you get status code 429, it means that you have sent too many requests. If this happens, have a look in the Retry-After header |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spotipy is nice enough to do this for us, it seems. :)
https://github.com/plamere/spotipy/blob/master/spotipy/client.py#L152
Summary
Solution
Spotify has an API that can read the 50 most recent songs played by a user. We can do an oauth connection and read the user's recent plays and import them as listens.
The user profile page has a connect to spotify button which does an oauth workflow.
A separate script loops through all users who have performed this login flow and gets listens for each of them, converting them to listens and importing into LB.
Listens are imported starting from the user who last had their listens imported least recently.
Action
I tried a few new things with code layout, specifically a
domain
package that contains methods relevant to spotify, but disconnected from our database methods. This allows us to test database methods in db directly with the database and not worry about interaction, and methods that "do stuff", while letting us mock database access to run tests faster.I'm open to feedback on if this is a direction that we want to take with this code, but if it's to confusing we can rearrange it into a format closer to the existing code.
There are some tests missing, especially in the domain package. There are a number of comments in the
spotify_read_listens.py
script especially regarding error catching from Spotify about what we should do in particular failure cases. This isn't implemented yet.I don't know the preferred way of starting scripts in the LB production environment, especially regarding config file reading, etc. This will need to be updated.
We have to keep an eye on how long this operation will take to run. At only 50 songs per history for a user, we may find that with too many users that we start running behind. Perhaps we'll have to run multiple scripts. This is a future concern.