Completely rewrote shortcode parser. #2200

felixfontein · 2015-12-28T01:09:06Z

This PR fixes some shortcomings in the current shortcode parsers:

{{% a "%}}" %}} is parsed as expected;
better error messages are produced for various malformed situations;
no more infinite loops in cases such as ==> {{% <==;
strange constructs such as {{% a "b"c"d" %}} aren't allowed anymore.

Kwpolska · 2015-12-28T09:06:17Z

nikola/shortcodes.py

+
+
+def _format_position(data, pos):
+    """Return position formatted as line/column."""


Could use a note that this is for pretty error messages.
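
For readers following along, here is a minimal sketch of what a line/column helper for error messages could look like. It is not the code from this PR: the name `format_position`, the 1-based numbering, and the output wording are all assumptions made for illustration.

```python
def format_position(data, pos):
    """Return a 0-based offset as a 1-based line/column string (for error messages)."""
    # Every newline before pos moves the position one line down.
    line = data.count('\n', 0, pos) + 1
    # The column is the distance from the last newline before pos.
    # rfind returns -1 when there is none, which yields a 1-based column on line 1.
    last_newline = data.rfind('\n', 0, pos)
    column = pos - last_newline
    return 'line {0}, column {1}'.format(line, column)
```

An error message can then say where a malformed shortcode starts instead of printing a raw character offset.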

Kwpolska · 2015-12-28T09:12:17Z

nikola/shortcodes.py

-        args.append(cname)
-
-    return name, (args, kwargs)
+    empty_string = data[:0]  # same string type as data; to make Python 2 happy


I think '' will be enough in our case. By the way, this file didn’t have a from __future__ import unicode_literals import — I just fixed that.

This makes some tests fail, since it always returns a unicode string. My final fix makes it return the same string type as the input. But we could of course also fix the tests :)
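
A small sketch of why `data[:0]` keeps the input's string type. The helper name `empty_like` is hypothetical. On Python 2 the two types in question are `str` and `unicode`; the example uses text and byte strings so that it also runs on Python 3.

```python
def empty_like(data):
    """Return an empty string of the same type as data."""
    # Slicing never changes the type, so an empty slice is an empty
    # string of whatever type the caller passed in.
    return data[:0]


text_result = empty_like(u'some text')
bytes_result = empty_like(b'some bytes')

# Both results are empty, and each has the type of its input.
print(type(text_result) is type(u''), type(bytes_result) is type(b''))
```

A plain `''` literal, by contrast, has a fixed type that depends on whether `unicode_literals` is imported in the module.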

Kwpolska · 2015-12-28T09:27:59Z

What does non-trivial mean?
How fast is it for large inputs?

felixfontein · 2015-12-28T09:32:41Z

You can also replace non-trivial with non-empty if you want.

About large inputs: since it uses str.find('{{%', pos) to find the start of the next shortcode, and only then starts going through the string character by character until the end of this shortcode, it should be pretty efficient. If you use no shortcodes, it should not be noticeably slower than the old code.
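
The scanning strategy described above can be sketched roughly as follows. This is a simplified illustration, not the parser from this PR: the function name `find_shortcodes` is hypothetical, and the sketch only understands double-quoted arguments (no escapes, no closing tags, no nesting).

```python
def find_shortcodes(data):
    """Yield (start, end, inner_text) for each {{% ... %}} tag in data."""
    pos = 0
    while True:
        # str.find skips plain text quickly; nothing is examined
        # character by character until a tag actually starts.
        start = data.find('{{%', pos)
        if start < 0:
            return
        i = start + 3
        in_quotes = False
        # Walk the tag character by character so that a "%}}" inside
        # a quoted argument is not mistaken for the end of the tag.
        while i < len(data):
            if data[i] == '"':
                in_quotes = not in_quotes
            elif not in_quotes and data.startswith('%}}', i):
                break
            i += 1
        else:
            # Reached the end of the input without a closing %}}.
            raise ValueError(
                'unterminated shortcode starting at offset {0}'.format(start))
        yield start, i + 3, data[start + 3:i].strip()
        pos = i + 3
```

With this shape, text without any shortcodes costs a single `str.find` call, and an unterminated tag such as `==> {{% <==` raises an error instead of looping.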

Kwpolska · 2015-12-28T10:09:37Z

Would you mind adding tests (real tests, not doctests!) for all the things this parser fixes?

felixfontein · 2015-12-28T17:17:22Z

I found some more bugs, improved some error messages, and added proper tests.

Kwpolska · 2015-12-28T19:05:24Z

LGTM, anybody else?

ralsina · 2016-01-04T13:20:09Z

LGTM too. Merging.

ralsina · 2016-01-04T13:21:53Z

@felixfontein how could we expand on this to make it handle both Hugo shortcode cases?

{{% %}} contents of the shortcode are passed as-is
{{< >}} contents of the shortcode are passed through the markup compiler before replacing

/me is totally dumb about parsers and stuff. No computer-thingies college courses, see ;-)

Kwpolska · 2016-01-04T15:46:06Z

Let’s not do < >. It’s tricky, because some things in the foodchain might replace it with &lt; and &gt;. We don’t need full feature parity with Hugo.

ralsina · 2016-01-04T19:23:30Z

We could use a different marker.

Kwpolska · 2016-01-04T19:26:48Z

Is % really not enough?

felixfontein · 2016-01-04T20:29:00Z

We might already have some problems: if you write something like {{% a "b<c" %}}, the argument might end up as b&lt;c instead of b<c, since shortcodes are applied to the transformed result.

It would be better to first replace the shortcodes with some unique IDs, then use markdown, and finally replace the markers by their transformed content. This would also make it trivial to implement support for {{< >}}: just apply these shortcodes before applying markdown.
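
A rough sketch of the placeholder idea, under several assumptions: the regular expression is a simplified stand-in for the real parser, `fake_markup` stands in for the markup compiler, and all function names are hypothetical.

```python
import re
import uuid

# Simplified matcher: ignores the quoting rules the real parser handles.
SHORTCODE_RE = re.compile(r'\{\{%\s*(.*?)\s*%\}\}')


def protect_shortcodes(text):
    """Swap each shortcode for a unique marker; return (text, {marker: shortcode})."""
    saved = {}

    def stash(match):
        marker = 'SC{0}'.format(uuid.uuid4().hex)
        saved[marker] = match.group(1)
        return marker

    return SHORTCODE_RE.sub(stash, text), saved


def restore_shortcodes(text, saved, render):
    """Replace each marker with the rendered output of its shortcode."""
    for marker, shortcode in saved.items():
        text = text.replace(marker, render(shortcode))
    return text


def fake_markup(text):
    """Stand-in for a markup compiler that escapes '<'."""
    return text.replace('<', '&lt;')


source = 'before {{% a "b<c" %}} after'
protected, saved = protect_shortcodes(source)
compiled = fake_markup(protected)  # the markers contain nothing to escape
result = restore_shortcodes(compiled, saved, lambda sc: '[' + sc + ']')
print(result)
```

Because the compiler only ever sees the markers, the argument reaches the shortcode handler as `b<c` rather than `b&lt;c`. Shortcodes whose content should go through the compiler could be expanded before the protection step instead.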

ralsina · 2016-01-04T20:41:06Z

The reason for "shortcodes with markup inside them" is pretty good. Imagine you want to make a paragraph be a bootstrap warning panel, or whatever. You could do this:

{{< warning-panel >}}
Some markdown or rest here
{{< /warning-panel >}}

Requiring that the content of the shortcode not be processed by the compiler makes this use case suck.

Completely rewrote shortcode parser.

43f9ace

felixfontein mentioned this pull request Dec 28, 2015

Shortcodes #2193

Merged

felixfontein added 2 commits December 28, 2015 02:20

Making sure the resulting string type is unicode, to make Python 2 happy.

e19120a

Try 2.

d109043

Kwpolska reviewed Dec 28, 2015

Unicode literals in shortcodes

cb86619

Kwpolska reviewed Dec 28, 2015

Kwpolska added 2 commits December 28, 2015 10:17

Add file name to shortcode errors

e2b09e9

Signed-off-by: Chris Warrick <kwpolska@gmail.com>

Python 2.7 has to die

5ecacd5

Signed-off-by: Chris Warrick <kwpolska@gmail.com>

Filename is optional

f8cccda

Signed-off-by: Chris Warrick <kwpolska@gmail.com>

felixfontein added 6 commits December 28, 2015 17:49

Fixed off-by-one bug.

a936a0a

Improved docstring.

484498d

Improving and fixing error messages.

e984e7c

Allowing to raise exceptions instead of terminating with error message.

6253cbb

Improving error message.

e63412a

Adding more test cases.

52c64ec

felixfontein added 2 commits December 28, 2015 21:31

Improved wording.

93f464f

Updated test.

855f333

ralsina added a commit that referenced this pull request Jan 4, 2016

Merge pull request #2200 from getnikola/improving_shortcodes

7803f7a

Completely rewrote shortcode parser.

ralsina merged commit 7803f7a into master Jan 4, 2016

ralsina deleted the improving_shortcodes branch January 4, 2016 13:20
