
Sphinx-style permalinks filter #2743

Merged: 6 commits merged into master on May 14, 2017

Conversation

@ralsina (Member) commented May 4, 2017

Pull Request Checklist

  • I’ve read the guidelines for contributing.
  • I updated AUTHORS.txt and CHANGES.txt (if the change is non-trivial) and documentation (if applicable).
  • I tested my changes.

Description

A filter implementing Sphinx-style header links. Bonus: it works for any compiler, since it's done as HTML post-processing.

@felixfontein (Contributor):

Permalinks are not supposed to change every time an HTML page is regenerated, right? But isn't that exactly what happens here, since uuid.uuid4() generates a fresh value on every call?

for h in ['h1', 'h2', 'h3', 'h4']:
    nodes = doc.findall('*//%s' % h)
    for node in nodes:
        new_node = lxml.html.fragment_fromstring('<a id="{0}" href="#{0}" class="headerlink" title="Permalink to this headline">&nbsp;&pi;</a>'.format(uuid.uuid4()))
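To make the concern concrete, a tiny hypothetical sketch (not code from the PR): each rebuild calls uuid.uuid4() afresh, so the same header gets a different anchor every time.

```python
import uuid

# Simulated "builds": the same header gets a fresh random id each time
# the page is regenerated, so any saved #fragment link goes stale.
id_build_1 = str(uuid.uuid4())
id_build_2 = str(uuid.uuid4())
print(id_build_1 == id_build_2)  # False (with overwhelming probability)
```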
Review comment (Member):

You’re relying on UUIDs too heavily. I would:

  • check if the header has an id already, and use that
  • otherwise, create a slugified form of the header and use that, watching out for conflicts within the same page and appending numbers (example for Django)
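A minimal sketch of that suggestion (hypothetical helpers, not Nikola's actual implementation): reuse a slugified form of the header text and append a counter when the same slug already appears on the page.

```python
import re

def slugify(text):
    # Lower-case, replace runs of non-alphanumerics with hyphens.
    return re.sub(r'[^a-z0-9]+', '-', text.lower()).strip('-')

def unique_slug(text, seen):
    # Return a page-unique slug: reuse the slugified header, and on a
    # conflict within the same page append a counter (-2, -3, ...).
    base = slugify(text) or 'section'
    slug, n = base, 1
    while slug in seen:
        n += 1
        slug = '{0}-{1}'.format(base, n)
    seen.add(slug)
    return slug

used = set()
print(unique_slug('Getting Started', used))  # getting-started
print(unique_slug('Getting Started', used))  # getting-started-2
```

Unlike a random UUID, this scheme is deterministic, so the anchors survive a rebuild as long as the header text does.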

Review comment (Member):

Also, what @felixfontein said about permalinks being, um, permanent.

Reply (Member Author):

Honestly, I don't care about this enough to do the extra work.

Reply (Member):

I can take over if you want.

Reply (Member Author):

@Kwpolska that would be great, thanks!

@ralsina (Member Author) commented May 5, 2017

@felixfontein that is a very good point. So, the id generation here is useless.

@ralsina (Member Author) commented May 8, 2017

LGTM with @Kwpolska's latest fix

@Kwpolska (Member) commented May 8, 2017

I’m not done yet; I want to at least implement the blacklist (or perhaps a whitelist).

@felixfontein (Contributor):

Since the filter has no persistent state, the link generation algorithm cannot be changed after this is merged (or at least after the next release). Otherwise this filter generates not permalinks but code-dependent links, which will change as soon as the code is touched in any non-trivial way! So we really shouldn't merge this too early.

@Kwpolska (Member) commented May 9, 2017

The link generation algorithm I have in mind will be pretty stable, so no worries. (Whether the links stay permanent also depends on user actions.)

@Kwpolska (Member):

@ralsina, @felixfontein: this is ready for review. Header/id deduplication will be implemented in another PR, because it needs to be done for all IDs, not only the few headers we touch here.

@felixfontein (Contributor) left a review:

LGTM, but kind of incomplete without that extra deduplication PR. Also, how permanent these permalinks will be depends a lot on the deduplication algorithm and what kind of page this is. (If it is an index where newer posts will appear before older ones, this will be a big problem.)

xpath_list = ['*//div[@class="e-content entry-content"]//{hx}']
for xpath_expr in xpath_list:
    for hx in ['h1', 'h2', 'h3', 'h4', 'h5', 'h6']:
        nodes = doc.findall(xpath_expr.format(hx=hx))
Review comment (Contributor):

If xpath_expr does not contain {hx}, wouldn't this add a permalink six times for every matching node?

Follow-up (Contributor):

(Maybe first put the formatted expressions into a set, then convert the set to a sorted list, and iterate over it. That should fix this problem.)
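A sketch of that set-based fix (the helper name is illustrative, not the PR's actual code): formatting every expression into a set first collapses expressions that ignore {hx} to a single entry, so each one is evaluated only once.

```python
def expand_xpath_exprs(xpath_list,
                       headers=('h1', 'h2', 'h3', 'h4', 'h5', 'h6')):
    # Format every expression once per header level; the set collapses
    # expressions that ignore {hx} to a single entry, so a node matched
    # by such an expression is visited once, not six times.
    exprs = {expr.format(hx=hx) for expr in xpath_list for hx in headers}
    return sorted(exprs)

# One expression uses {hx}, one hard-codes h2:
exprs = expand_xpath_exprs(['*//div//{hx}', '*//article//h2'])
print(len(exprs))  # 7: six expansions plus the single fixed expression
```

Sorting the set before iterating also keeps the traversal order deterministic across builds, which matters for link permanence.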

Reply (Member):

Yes, it will. This sounds like an edge case, but are you aware of any good solutions for this?

Reply (Member):

Made it a set. There’s still the case of the same header being matched by multiple expressions; should I also work around that?

Roberto Alsina and others added 6 commits on May 14, 2017.

From one commit message: "This prevents permalinks being created where they shouldn’t be, e.g. post or site titles."

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska (Member):

> LGTM, but kind of incomplete without that extra deduplication PR.

The code is waiting on the deduplicate-headers branch until this lands.

> Also, how permanent these permalinks will be depends a lot on the deduplication algorithm and what kind of page this is. (If it is an index where newer posts will appear before older ones, this will be a big problem.)

The deduplication algorithm will run primarily on indexes. We can discuss whether it should change links from the bottom up in that PR.

@felixfontein (Contributor):

Bottom-up would be a good idea, I think. This will at least make the links permanent when INDEXES_STATIC is True and you're not on the front page. If INDEXES_STATIC is False, all hope is lost anyway...

@ralsina (Member Author) commented May 14, 2017

+1 to bottom-up

@Kwpolska (Member):

@ralsina: review welcome.

@felixfontein: alright, done on header-deduplication branch.

@ralsina (Member Author) left a review:

LGTM other than the typo in the doc. I can't officially +1 because it's my PR :-)

.headerlink { opacity: 0.1; margin-left: 0.2em; }
.headerlink:hover { opacity: 1; text-decoration: none; }

Additionally, you can provide a custom list of XPath expressions which should be used for finding headers (``{hx}}`` is replaced by headers h1 through h6).
Review comment (Member Author):

Unbalanced braces

@Kwpolska (Member):

Merging, thanks for the reviews.

@Kwpolska Kwpolska merged commit 3270735 into master May 14, 2017
@Kwpolska Kwpolska deleted the add-title-permalinks branch May 14, 2017 17:56