Skip to content

Commit

Permalink
Combat link decay with Nikola, check redirects
Browse files Browse the repository at this point in the history
Recommend updating permanent redirects to their new location. Warn
about temporary redirects. Follow the redirects and check on their
targets too.
  • Loading branch information
da2x committed Aug 18, 2015
1 parent e36d230 commit beeacb0
Show file tree
Hide file tree
Showing 2 changed files with 17 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGES.txt
Expand Up @@ -4,6 +4,7 @@ New in master
Features
--------

* Checking remote links also checks redirects (nikola check -lr)
* Update suggested license to its latest version (Issue #1950)
* Add Punjabi language, by Jasdeep Singh (Issue #1940)
* New option to use custom, and several ``TEASER_END`` values
Expand Down
17 changes: 16 additions & 1 deletion nikola/plugins/command/check.py
Expand Up @@ -230,12 +230,27 @@ def analyze(self, fname, find_sources=False, check_remote=False):
# Check the remote link works
req_headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0 (Nikola)'} # I’m a real boy!
resp = requests.head(target, headers=req_headers)

# Retry client errors (4xx) as GET requests
if resp.status_code >= 400 and resp.status_code <= 499:
time.sleep(0.5)
resp = requests.get(target, headers=req_headers)

self.checked_remote_targets[target] = resp.status_code
# Follow redirects and see where they lead, redirects to errors will be reported twice
if resp.status_code in [301, 302, 307, 308]:
redir_status_code = resp.status_code
time.sleep(0.5)
resp = requests.head(target, headers=req_headers, allow_redirects=True)
# Permanent redirects should be updated
if redir_status_code in [301, 308]:
self.logger.error("Remote link moved PERMANENTLY to \"{0}\" and should be updated in {1}: {2} [HTTP: {3}]".format(resp.url, filename, target, redir_status_code))
if redir_status_code in [302, 307]:
self.logger.warn("Remote link temporarily redirected to \"{0}\" in {1}: {2} [HTTP: {3}]".format(resp.url, filename, target, redir_status_code))
self.checked_remote_targets[resp.url] = resp.status_code
self.checked_remote_targets[target] = redir_status_code
else:
self.checked_remote_targets[target] = resp.status_code

if resp.status_code > 399: # Error
self.logger.warn("Broken link in {0}: {1} [Error {2}]".format(filename, target, resp.status_code))
continue
Expand Down

0 comments on commit beeacb0

Please sign in to comment.