Add artist count to user profile page #173

Closed
phinjensen wants to merge 1 commit

phinjensen
Contributor

This adds a total artist count to the user profile page by counting the distinct artist IDs in the listenstore. Fetching the listen count and fetching the artist count are very similar, so to avoid duplicating logic I created a helper function that looks up the details for each count in a dictionary of constants.
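
As a rough illustration of the helper-plus-constants idea described above, here is a minimal sketch. It is not the code from this PR: the names `COUNT_SPECS` and `build_count_query`, the field names `track_name` and `artist_msid`, and the choice to build an InfluxQL query string are all assumptions made for the example.

```python
# Hypothetical sketch: every name and field below is an assumption,
# not taken from the PR's actual diff.
COUNT_SPECS = {
    "listens": {"field": "track_name", "distinct": False},
    "artists": {"field": "artist_msid", "distinct": True},
}


def build_count_query(kind, measurement):
    """Build an InfluxQL count query for one kind of count.

    The measurement name is interpolated without escaping, so this
    sketch is only suitable for trusted input.
    """
    spec = COUNT_SPECS[kind]
    column = '"{}"'.format(spec["field"])
    if spec["distinct"]:
        # Count each distinct value once, e.g. one per artist ID.
        column = "DISTINCT({})".format(column)
    return 'SELECT COUNT({}) FROM "{}"'.format(column, measurement)


listen_query = build_count_query("listens", "some_user")
artist_query = build_count_query("artists", "some_user")
```

The only difference between the two counts lives in the dictionary entry, which is what lets a single helper serve both.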

@mayhem
Member

mayhem commented May 19, 2017

Hi and thanks for this PR out of the blue, I really appreciate that!

However, during my recent testing I found that counting actual listens doesn't scale well. If a user has 1M listens (and we have users with last.fm profiles that large), counting those listens has a detrimental impact on the influx server.

This is why I left that feature blank and started working on a new branch: listen-counts-using-influx

The basic idea is that:

  1. Each time the listenstore saves a batch of points, the total of the points entered is also written to another measurement with a 7 day retention policy (meaning that these figures get tossed out after 7 days). ( https://github.com/metabrainz/listenbrainz-server/blob/listen-counts-using-influx/listenstore/influx_listenstore.py#L195 )

  2. Periodically (every 5000 listens currently) the counts are summed up with the last total value into a new updated total value. ( https://github.com/metabrainz/listenbrainz-server/blob/listen-counts-using-influx/listenstore/influx_listenstore.py#L219 ) This approach has the potential drawback that the sums and total values may get lost if this process doesn't run for 7 days. I'm still debating if this is a good idea or not.

Currently the totals are written into the same 7 day retention policy, but I am considering writing the totals into a measurement with a permanent retention policy that gets downsampled to day or week accuracy after a while, so that it can also serve as a timeline of database growth.

The upshot of this approach is that the total values are always available via a simple O(1) call, rather than an O(n) call. This is a typical big data trade-off: store more stuff in exchange for querying it less often.
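
The scheme described above can be sketched with a small in-memory model. This is only an illustration under stated assumptions, not the code on the listen-counts-using-influx branch: the class `ListenCountSketch` and its method names are hypothetical, plain Python lists stand in for the influx measurements, and a single global counter is assumed.

```python
import time

RETENTION_SECONDS = 7 * 24 * 60 * 60  # the 7 day retention policy
ROLLUP_EVERY = 5000                   # listens between rollups


class ListenCountSketch:
    """In-memory stand-in for the count measurements (hypothetical)."""

    def __init__(self):
        self.batch_counts = []  # (timestamp, count) points awaiting rollup
        self.total = 0          # last rolled-up total
        self.pending = 0        # listens saved since the last rollup

    def save_batch(self, listens, now=None):
        """Record the size of a saved batch next to the listens themselves."""
        now = time.time() if now is None else now
        self.batch_counts.append((now, len(listens)))
        self.pending += len(listens)
        if self.pending >= ROLLUP_EVERY:
            self.rollup(now)

    def rollup(self, now):
        """Fold the surviving batch counts into the running total."""
        # Points older than the retention window have been dropped, so
        # they can no longer be summed. This models the drawback that
        # counts get lost if the rollup does not run for 7 days.
        live = [c for ts, c in self.batch_counts if now - ts <= RETENTION_SECONDS]
        self.total += sum(live)
        self.batch_counts = []
        self.pending = 0

    def get_total(self):
        """O(1) read; may lag by up to ROLLUP_EVERY listens."""
        return self.total


store = ListenCountSketch()
store.save_batch(list(range(3000)), now=0)
store.save_batch(list(range(2500)), now=10)  # crosses 5000, triggers a rollup
```

Reading the total never touches the individual listens, which is where the O(1) versus O(n) difference comes from. The cost is the extra bookkeeping on every write and a total that can be slightly stale between rollups.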

I would love it if you could have a look at my branch and give me your thoughts. I'll soon finish the branch, fix the one failing test and put up my own PR.

@mayhem
Member

mayhem commented Jun 6, 2017

My PR is up now:

#191

That is the right way to count anything -- listens or artists. If you feel like re-doing this PR with that approach, that would be great! In the meantime, I'm going to close this.

Thanks!

mayhem closed this Jun 6, 2017