
LB-113: Update robots.txt for Scrobble API endpoint #106

Merged
merged 1 commit into from Jan 23, 2017

Conversation

dpmittal
Contributor

Changed the robots.txt file to block paths to the new Last.fm API and Auth pages.
Issue Link: https://tickets.metabrainz.org/browse/LB-113

@@ -2,3 +2,5 @@ User-agent: *
Disallow: /current-status
Disallow: /user/
Disallow: /1/
Disallow: /api
Disallow: /*lastfm
Contributor
Why is this line needed?

@pinkeshbadjatiya
Contributor

You also missed the URL for the 2.0 endpoint. Please update it.

@dpmittal
Contributor Author

@pinkeshbadjatiya Made the changes

@@ -2,3 +2,5 @@ User-agent: *
Disallow: /current-status
Disallow: /user/
Disallow: /1/
Disallow: /api
Member
Is there a reason for using /api here and not /api/? (I'm a bit rusty on my robots.txt syntax.)

Contributor Author

There's a file, /api.py, which needs to be blocked for bots. /api blocks every URL that starts with '/api', whereas '/api/' would only block the /api/ folder (which is not present in LB), not the file :)

Contributor
@gentlecat gentlecat Dec 31, 2016

You are blocking URLs, not files or folders. It is better to have a trailing slash everywhere by default unless you are intentionally blocking prefixes like /apistuff.
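A minimal sketch of the distinction being discussed, using illustrative paths rather than the project's actual file (per the standard robots.txt prefix-matching rules, a Disallow value matches any URL path that begins with it):

```
User-agent: *
# Matches any URL starting with "/api": /api, /api.py, /apistuff, /api/foo, ...
Disallow: /api
# Matches only URLs under "/api/", e.g. /api/foo -- NOT /api.py or /apistuff
Disallow: /api/
```

So the unslashed form is the broad prefix block, and the trailing slash narrows the rule to the path hierarchy, which is why the reviewers prefer the slashed form by default.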

Contributor Author
@dpmittal dpmittal Dec 31, 2016

Actually, I didn't know about robots.txt. When I googled it, the results described the syntax as /filename and /foldername/, so I thought robots.txt worked like that, but now I know that it blocks URL prefixes. I have changed the file now. Thanks :D

@pinkeshbadjatiya
Contributor

This looks good to me.

Member
@mayhem mayhem left a comment

I agree with Gentlecat's last comment -- use trailing slashes for directories, then block specific scripts with explicit lines. Make this fix and this should be good to go.

@dpmittal
Contributor Author

@mayhem @gentlecat I have made the changes :)

@alastair alastair merged commit 574d858 into metabrainz:master Jan 23, 2017