Allowing to write HTML fragments instead of only whole documents. #2546

felixfontein · 2016-10-23T18:12:44Z

Test implementation for the discussion at the bottom of #2544.

Kwpolska · 2016-10-23T18:16:22Z

CHANGES.txt

@@ -16,6 +16,8 @@ Features
  translated (Issue #2116)
 * Pass ``post`` object and ``lang`` to post compilers (Issue #2531)
 * Pass ``url_type`` into template's context.
+* ``render_template`` and ``generic_renderer`` can now create HTML


PS. new changes go at the top.

Should I also move the url_type part, then?

Kwpolska · 2016-10-23T18:22:06Z

nikola/nikola.py

+            doc = lxml.html.document_fromstring(data, parser)
+        self.rewrite_links(doc, src, context['lang'], url_type, is_fragment=is_fragment)
+        if is_fragment:
+            data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()])


TIL lxml has awful trees.

How would the solution that adds the extra div tag look? I don’t think we would really mind one little <div> there. Or worst case scenario, if we know the <div> is guaranteed to always be there, we could slice it out.

The extra <div></div> is produced by lxml.html.tostring(doc, encoding='utf-8', method='html').

I don't really like cutting stuff out via text replacements.

Kwpolska · 2016-10-23T18:23:12Z

nikola/nikola.py

        """Replace links in document to point to the right places."""
        # First let lxml replace most of them
        doc.rewrite_links(lambda dst: self.url_replacer(src, dst, lang, url_type), resolve_base_href=False)

        # lxml ignores srcset in img and source elements, so do that by hand
-        objs = list(doc.xpath('(*//img|*//source)'))
+        objs = list(doc.xpath('({0}//img|{0}//source)'.format('' if is_fragment else '*')))


Please rewrite this as a 4-line if/else tree instead of abusing str.format. We should also find out what that asterisk changes (can non-fragment code work without it? Does it reach all <img> and <source> elements if the asterisk is not present?).

It doesn't reach top-level <img> tags with the asterisk there. I think (I don't know the xpath syntax well enough to be sure) that the asterisk eats up a top level element, and then it looks for children (or children-of-children etc.) of that top element.

What about no asterisk and full document? ~~(Please rewrite in 4 lines anyway)~~ done

I think that should work as well, since there are no top-level <img> or <source> elements which have to be handled differently. I've changed it accordingly, which removes the whole if/else.

(The original syntax was introduced in 24a45e1.)
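
To illustrate the asterisk behaviour discussed above, here is a minimal sketch. It assumes the standard library's xml.etree.ElementTree as a stand-in for lxml so that it runs without extra dependencies; ElementTree does not accept absolute paths on an element, so `.//img` plays the role of the asterisk-free expression here. The sample markup and file names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: one top-level <img> and one nested inside <p>.
doc = ET.fromstring(
    '<div>'
    '<img src="top.png" />'
    '<p><img src="nested.png" /></p>'
    '</div>'
)

# '*' first steps into each child of doc, then '//img' looks below that
# child, so an <img> that is itself a direct child is skipped.
with_asterisk = [img.get('src') for img in doc.findall('*//img')]

# Without the asterisk, every <img> below doc is found.
without_asterisk = [img.get('src') for img in doc.findall('.//img')]

print(with_asterisk)
print(without_asterisk)
```

In a full document the top-level element is <html>, which is why the asterisk form never missed anything there.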

ralsina · 2016-10-24T13:45:34Z

nikola/nikola.py

        self.rewrite_links(doc, src, context['lang'], url_type)
-        data = b'<!DOCTYPE html>\n' + lxml.html.tostring(doc, encoding='utf8', method='html', pretty_print=True)
+        if is_fragment:
+            data = (doc.text or '').encode('utf-8') + ''.encode('utf-8').join([lxml.html.tostring(child, encoding='utf-8', method='html') for child in doc.iterchildren()])


I don't understand what this does. Doesn't it end up producing 2 copies of the doc's text?

doc.text is only the text in doc that appears before the first child element, so nothing is duplicated. Text that appears after an HTML element is stored as that element's tail, and tostring() emits it along with the element. (lxml is weird.)

I added some comments with a similar explanation, hope it makes it clearer.
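
For readers unfamiliar with the text/tail model, a minimal sketch follows. It assumes the standard library's xml.etree.ElementTree, whose element model lxml mirrors, rather than lxml itself, and the sample markup is invented for the example.

```python
import xml.etree.ElementTree as ET

# The <div> stands in for the wrapper element around a parsed fragment.
doc = ET.fromstring('<div>Hello <b>world</b> and <i>more</i>!</div>')

# Text before the first child belongs to the parent element.
leading = doc.text
# Text after a child is stored on that child as its tail.
tails = [child.tail for child in doc]

# Serializing a child includes its tail, so prepending doc.text
# rebuilds the fragment without the wrapper and without duplicates.
inner = (doc.text or '') + ''.join(
    ET.tostring(child, encoding='unicode') for child in doc
)

print(repr(leading))
print(tails)
print(inner)
```

The same pattern is what the `is_fragment` branch in the diff does, only with bytes and lxml.html.tostring.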

ralsina · 2016-10-25T18:41:01Z

Ok then!

felixfontein · 2016-10-25T19:05:25Z

One more thing I had to try out after seeing that lxml.html.tostring has a doctype argument. :)

felixfontein · 2016-10-25T19:17:08Z

The one test fails because there were some connection problems to pypi while installing doit. All other tests had no problems.

Kwpolska · 2016-10-26T14:21:35Z

Those tests are pretty worthless anyways.

felixfontein · 2016-10-26T16:06:47Z

Ok. And thanks for the approval. Anyone mind if I merge?

Kwpolska · 2016-10-26T16:08:06Z

Go ahead.

Allowing to write HTML fragments instead of only whole documents.

a1eb111

felixfontein added the question label Oct 23, 2016

Stupid mistake.

9fbb5bf

Kwpolska reviewed Oct 23, 2016

felixfontein added 2 commits October 23, 2016 20:27

Improvements.

0280de3

Simplifying.

8785681

ralsina reviewed Oct 24, 2016

Improving documentation.

9a2551d

Using lxml.html.tostring's doctype argument.

3a3cd7c

Kwpolska approved these changes Oct 26, 2016

felixfontein merged commit 9a09d5d into master Oct 26, 2016

felixfontein deleted the allow-html-fragment-output branch October 26, 2016 17:40
