-
Notifications
You must be signed in to change notification settings - Fork 460
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ensure link path to be URL encoded #2005
Conversation
@@ -41,7 +41,7 @@ | |||
|
|||
<%def name="open_graph_metadata(post)"> | |||
%if use_open_graph: | |||
<meta property="og:site_name" content="${blog_title|striphtml}"> | |||
<meta property="og:site_name" content="${blog_title|striphtml,h}"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is ,h
really necessary alongside striphtml
and is it valid syntax?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. It is valid syntax. http://docs.makotemplates.org/en/latest/filtering.html
striphtml call markupsafe.Markup.striptags().
https://github.com/mitsuhiko/markupsafe/blob/master/markupsafe/__init__.py#L148
This method only remove |<[^>]*> blocks and unescape.
Is URL encoding really needed? In 2015, where UTF-8 is the only encoding that exists, and we explicitly disallow |
At least as far as I know, In Japanese environment with UTF-8, URL encoding surely has been used. |
It should just work without encoding (or it should be handled by someone else), are you sure this is necessary? Please test using a Japanese link in Chrome and Firefox, with slugify disabled and without your patch. |
ea1b36f
to
e5da7ae
Compare
016ca61
to
c63bdef
Compare
c63bdef
to
019896d
Compare
I made it configurable. Please check this again. |
I really don’t know if we should merge this, especially with the switch… @Aeyoun: what’s your opinion? Should this go in the core, or should this be something users would just do themselves (in custom templates)? |
Adding All URLs in Nikola are already compiling with “RFC-3986 standard for URIs, the RFC-3987 standard for IRIs, and the XML standard.” (sitemap FAQ). Nikola uses the same URL fromat in the sitemap as well as for link canonicalization so there is no confusion in the terms of search engines and duplicate content. So the SEO argument is something I wouldn’t worry about at all in this case. (And I’m usually the SEO worrying type.) We’re already following best-practices. I’m not sure if any particular servers would do better with one format over the other. This is the only possible argument for changing this. This would have ether be a problem everyone or no one, so there should be no need for an option to optionally break URLs. From the attached screenshot, everything part of the browser (and Nikola) seem to be doing the correct thing. In summary, besides |
TL;DR; unless we need to change URLs because of either RFC-3986 or RFC-3987, we shouldn’t change a thing with the URLs. |
Please refer to https://support.google.com/webmasters/answer/35653?hl=en |
That article covers the same RFCs as I referred to. |
Maybe I’m misunderstand the issues here. I believe we already do use RFC-3986 / RFC-3987 (plus XML reserved-characters escaping) everywhere. If this is not the case, than things are broken and should be fixed and made non-optional. Updating these links should be done internally and not require any changes to any templates. @masayuko, is Nikola not meeting RFC-3986 / RFC-3987 everywhere? nowhere? somewhere? what about sitemaps? atom/rss? If not, the link utilities that produced the links should be considered broken and need fixing. The link issue should not require template changes at all as Nikola should already be producing standard links. We could add a new method |
The attached screenshot means the follow process is occured: I hope the below part:
|
@Aeyoun At least, I hope sitemap, RSS/Atom, link tag for canonical is encoded with URL encoding. They have something to do with search engines. |
The following method should produce correctly encoded links for anything that Nikola produces. (Limited to netloc and paths, because that’s all that Nikola controls.) I’ll make a new branch, stick this in Opinions? @Kwpolska? @masayuko?
Some examples, all producing the expected result. Why a method fulfilling this purpose isn’t already provided by
|
It's great and thank you for understanding. I agree. |
This PR allow to use characters not permitted in URL. For example, in the case of tag, category and author not sluged appear in a generated path.