Use encodelink() everywhere #2037
OK, after 3 hours I’ve concluded that py27 support for Would it be okay to throw an error and die if a user tries using
Everything is awesome. Should work*. Does it work for you, @masayuko?
* OS X may still explode because of its unique NFD preference. It seems to work fine, but really it shouldn’t.
We wouldn’t mind not supporting Python 2 here.
This is ready to merge with py27 and py3 support. Someone want to sanity check it? @Kwpolska? @masayuko?
Sorry for the late reply. I am checking it now. I’ll report back later.
OK. I have just tested in several ways. It works for me.
in nikola/plugins/task/sitemap/__init__.py.
NFKC is destructive, whereas NFC is not. NFKC offers no advantages here as far as I know. Email discussions on W3C’s mailing list recommended NFC over NFKC, so I’m sticking with that. (I couldn’t be bothered to dig up the link again.) NFD and NFKD are out of the question because they can’t be used reliably to produce the same output. Besides, everyone but Apple is using NFC everywhere for everything. Regarding the issue with periods at the end of paths, could you make a separate issue about that?
The netloc should follow nameprep (RFC 3491, RFC 3454), so NFKC is preferable there. For example, ﾊ (U+FF8A) is half-width, and ﾊﾟ (U+FF8A U+FF9F) is half-width and two code points. The path, on the other hand, is used as a location name on a filesystem, so NFC is preferable there.
I have just checked this myself.
NFC is no problem. Please disregard what I said above.
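For anyone who wants to reproduce the NFC versus NFKC difference discussed above, here is a minimal sketch using only Python 3’s standard `unicodedata` module. The variable names are illustrative and not taken from the pull request. It uses the half-width sequence U+FF8A U+FF9F mentioned earlier, plus an accented Latin letter to show the NFD form that OS X prefers.

```python
import unicodedata

# Half-width katakana HA followed by the half-width semi-voiced sound mark
# (U+FF8A U+FF9F): two code points.
halfwidth_pa = "\uff8a\uff9f"

# NFC only applies canonical mappings, so the half-width form is preserved.
nfc = unicodedata.normalize("NFC", halfwidth_pa)

# NFKC also applies compatibility mappings, which fold the pair into the
# single full-width code point U+30D1. The original spelling is lost.
nfkc = unicodedata.normalize("NFKC", halfwidth_pa)

# NFD splits a precomposed character into base letter + combining mark;
# NFC puts it back together.
e_acute = "\u00e9"
nfd = unicodedata.normalize("NFD", e_acute)
roundtrip = unicodedata.normalize("NFC", nfd)

print([hex(ord(c)) for c in nfc])
print([hex(ord(c)) for c in nfkc])
print([hex(ord(c)) for c in nfd])
```

This is why NFKC can be called destructive here: after NFKC there is no way to tell that the input was written with half-width characters.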
Replaces the pull request suggested in #2005.
utils.encodelink(str) takes a string and NFC normalizes it, encodes the netloc to IDN punycode, and percent-encodes the path (RFC 3986 / RFC 3987). The scheme, query, and fragment components of the address are also NFC normalized but are otherwise untouched. The function can be called multiple times non-destructively.
@masayuko, would you please test this? Set USE_SLUGIFY = False to support tags and categories. No other option is required. Please also test nikola check -l (Nikola’s link checker).
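The behaviour described above can be sketched roughly as follows. This is a Python 3 only approximation written for illustration, not the code in this pull request: the name `encodelink_sketch` is hypothetical, it ignores the py27 handling discussed in the thread, and it deliberately drops any userinfo (`user:pass@`) from the netloc to keep the example short.

```python
import unicodedata
from urllib.parse import quote, unquote, urlsplit, urlunsplit


def encodelink_sketch(iri):
    """Hypothetical approximation of the steps described for encodelink()."""
    # Step 1: NFC-normalize the whole address.
    iri = unicodedata.normalize("NFC", iri)
    parts = urlsplit(iri)

    # Step 2: encode the host name with the built-in "idna" codec (punycode).
    # An already-encoded ASCII host passes through unchanged.
    netloc = ""
    if parts.hostname:
        netloc = parts.hostname.encode("idna").decode("ascii")
        if parts.port is not None:
            netloc += ":{}".format(parts.port)

    # Step 3: percent-encode the path. Unquoting first means an already
    # encoded path is not double-encoded on a second call.
    path = quote(unquote(parts.path), safe="/~")

    # Scheme, query, and fragment are passed through as normalized above.
    return urlunsplit((parts.scheme, netloc, path, parts.query, parts.fragment))


once = encodelink_sketch("http://b\u00fccher.example/cafe\u0301")
twice = encodelink_sketch(once)
print(once)
print(twice == once)
```

The unquote-then-quote step is what makes repeated calls safe in this sketch. Whether the real function takes the same approach is an assumption.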