New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Header/id deduplication #2763
Header/id deduplication #2763
Conversation
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
nikola/filters.py
Outdated
# Results are ordered the same way they are ordered in document | ||
offending_elements = doc.xpath('//*[@id="{}"]'.format(i)) | ||
counter = 2 | ||
for e in offending_elements[1::-1]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be offending_elements[::-1]
without the first 1
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice counter = 2
? There will be foo
, foo-2
, foo-3
…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But this only enumerates elements 1 and 0 of the list, not any other element. ['a', 'b', 'c', 'd', 'e'][1::-1] == ['b', 'a']
.
You probably want offending_elements[-2::-1]
.
nikola/filters.py
Outdated
offending_elements = doc.xpath('//*[@id="{}"]'.format(i)) | ||
counter = 2 | ||
for e in offending_elements[1::-1]: | ||
new_id = '{0}-{1}'.format(i, counter) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if this is ID is in use as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will fix.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But trying until you find a free one will be a problem if you want to make permalinks (as for the Sphinx permalinks filter). Assume the two oldest entries have headers with IDs a
and a
. If there's nothing else around, one will get a
and the other a-2
. But now assume a new post is added which uses a-2
(for whatever reason). Then the old a-2
will end up as a-3
, because a-2
is already taken, and links pointing to a
in the second oldest post are suddenly pointing to the wrong place.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is one of those cases which we can’t reliably fix, at least as a filter. To make it work in every scenario, we’d need to add post names to the IDs, or otherwise do unusual stuff. The thing is, most people are unlikely to use those permalinks on indexes, and this plugin’s aims are (a) to fix HTML validation issues on indexes, (b) to fix IDs for Sphinx permalinks and other uses clashing, on a single page. You can’t protect against changing permalinks if those are maintained by code, not humans, whilst allowing said humans to edit the page contents (an edit to a post/page could trigger a deduplication)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. You'd need state to track the permalinks. We should mention that somewhere in the documenation, though. Otherwise we'll sooner or later get bug reports for that :)
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
h/t @felixfontein Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
I updated the documentation and used a different approach: going bottom→top in indexes, and top→bottom in posts/pages. This way, we can get a sensible ordering for both use cases. |
@Kwpolska I resolved the conflicts, hope I did not mess it, feel free to push merge |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM except... :)
nikola/filters.py
Outdated
off = offending_elements[-2::-1] | ||
for e in off: | ||
new_id = i | ||
while doc.xpath('//*[@id="{}"]'.format(new_id)): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not use (and later update) seen_ids
instead of hoping that the query runs as fast as possible?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Makes sense, fixed.
nikola/filters.py
Outdated
# Find headerlinks that we can fix. | ||
headerlinks = e.find_class('headerlink') | ||
for hl in headerlinks: | ||
# We might get headerlinks of child elements |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If one of the header links belongs to a child element with the same ID, you change the link to something wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How do you suggest to fix that? A break
?
h/t @felixfontein Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@felixfontein, please explain, and perhaps suggest a solution to:
|
Maybe something like:
(And there's another
because the To avoid this, you probably have to check if there's another |
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
It sounds as if doing a |
In the case I sketched above it won't work, because the |
Please propose a better solution then? I’m inclined to say “edge case not worth fixing”. |
Is there an efficient way to find the first parent object of |
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
I’m afraid this might not be efficient enough, and that would be needed only for an edge case. This happening in real input is very unlikely. (reST won’t ever generate it, for example) |
Just manually walking up the DOM tree shouldn't be very slow. Usually DOM trees aren't excessively deep that you really get a performance hit. And since this search has only to be done for very few elements, it shouldn't really matter. And yes, while this is an edge case, it is a bug and really shitty for a user if this happens. And if we ever fix this, it might break some existing links where someone might have worked around this bug. In my opinion, we should get this filter into a form that should never require any changes, because every change might break existing links, which are supposed to be permalinks. But then, I probably won't be using it (in particular the headerlink feature), so I won't really mind if you decide to ignore this edge case :) Edit: I just noticed that this only screws up the header links, but not the IDs themselves. So this won't destroy permalinks later on, it will just generate incorrect header links. So I'm really fine with leaving this as-is. On the other hand, walking up the DOM tree isn't that hard and inefficient either. |
Ok, just thought of one reason where this can happen more realistically in real life: if someone decides to name a subsection the same as a section. Some people tend to do that. And if you don't use a compiler which prevents colliding IDs, you have a problem. Edit: ah, I forgot about the |
Thanks, merged. Will rebuild getnikola.com to use this in a second. |
Fixes #2570. Also contains a doc fix for #2743, because I didn’t notice which branch I was committing to.
Should I bother with
<a name="">
as well?cc @wichtounet