Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely rewrote shortcode parser. #2200

Merged
merged 15 commits into from Jan 4, 2016
Merged

Completely rewrote shortcode parser. #2200

merged 15 commits into from Jan 4, 2016

Conversation

felixfontein
Copy link
Contributor

This PR fixes some shortcomings in the current shortcode parsers:

  • {{% a "%}}" %}} is parsed as expected;
  • better error messages are produced for various malformed situations;
  • no more infinite loops in cases such as ==> {{% <==;
  • strange constructs such as {{% a "b"c"d" %}} aren't allowed anymore.

@felixfontein felixfontein mentioned this pull request Dec 28, 2015


def _format_position(data, pos):
"""Return position formatted as line/column."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could use a note that this is for pretty error messages.

args.append(cname)

return name, (args, kwargs)
empty_string = data[:0] # same string type as data; to make Python 2 happy
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think '' will be enough in our case. By the way, this file didn’t have a from __future__ import unicode_literals import — I just fixed that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes some test fail, since it always returns a unicode string. My final fix makes it return the same string type as the input was. But we could of course also fix the tests :)

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska
Copy link
Member

  • What does non-trivial mean?
  • How fast is it for large inputs?

@felixfontein
Copy link
Contributor Author

You can also replace non-trival with non-empty if you want.

About large inputs: since it uses str.find('{{%', pos) to find the start of the next shortcode, and only then starts going through the string character by character until the end of this shortcode, it should be pretty efficient. If you use no shortcodes, it should not be noticeably slower than the old code.

Signed-off-by: Chris Warrick <kwpolska@gmail.com>
@Kwpolska
Copy link
Member

Would you mind adding tests (real tests, not doctests!) for all the things this parser fixes?

@felixfontein
Copy link
Contributor Author

I found some more bugs, improved some error messages, and added proper tests.

@Kwpolska
Copy link
Member

LGTM, anybody else?

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

LGTM too. Merging.

ralsina added a commit that referenced this pull request Jan 4, 2016
Completely rewrote shortcode parser.
@ralsina ralsina merged commit 7803f7a into master Jan 4, 2016
@ralsina ralsina deleted the improving_shortcodes branch January 4, 2016 13:20
@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

@felixfontein how could we expand on this to make it handle both hugo shortcode cases?

  1. {{% %}} contents of the shortcode are passed as-is
  2. {{< >}} contents of the shortcode are passed through the markup compiler before replacing

/me is totally dumb about parsers and stuff. No computer-thingies college courses, see ;-)

@Kwpolska
Copy link
Member

Kwpolska commented Jan 4, 2016

Let’s not do < >. It’s tricky, because some things in the foodchain might replace it with &lt; and &gt;. We don’t need full feature parity with Hugo.

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

We could use a different marker.

On Mon, Jan 4, 2016 at 12:46 PM Chris Warrick notifications@github.com
wrote:

Let’s not do < >. It’s tricky, because some things in the foodchain might
replace it with < and >. We don’t need full feature parity with
Hugo.


Reply to this email directly or view it on GitHub
#2200 (comment).

@Kwpolska
Copy link
Member

Kwpolska commented Jan 4, 2016

Is % really not enough?

On 4 January 2016 at 20:23, Roberto Alsina notifications@github.com wrote:

We could use a different marker.

On Mon, Jan 4, 2016 at 12:46 PM Chris Warrick notifications@github.com
wrote:

Let’s not do < >. It’s tricky, because some things in the foodchain might
replace it with < and >. We don’t need full feature parity with
Hugo.


Reply to this email directly or view it on GitHub
#2200 (comment).


Reply to this email directly or view it on GitHub
#2200 (comment).

Chris Warrick https://chriswarrick.com/
PGP: 5EAAEA16

@felixfontein
Copy link
Contributor Author

We might already have some problems, if you write something along {{% a "b<c" %}} it might be that the argument will be b&lt;c instead of b<c since shortcodes are applied to the transformed result.

It would be better to first replace the shortcodes with some unique IDs, then use markdown, and finally replace the markers by their transformed content. This would also make it trivial to implement support for {{< >}}: just apply these shortcodes before applying markdown.

@ralsina
Copy link
Member

ralsina commented Jan 4, 2016

The reason for "shortcodes with markup inside them" is pretty good. Imagine you want to make a paragraph be a bootstrap warning panel, or whatever. You could do this:

{{< warning-panel >}}
Some markdown or rest here
{{< /warning-panel >}}

Requiring the content of the shortcode not be processed by the compiler makes this usecase suck.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants