Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Whitespace control inconsistent with Handlebars.js #113

Closed
imsedim opened this issue May 10, 2016 · 15 comments
Closed

Whitespace control inconsistent with Handlebars.js #113

imsedim opened this issue May 10, 2016 · 15 comments
Labels

Comments

@imsedim
Copy link
Contributor

imsedim commented May 10, 2016

  1. Using "~" after opening braces strips the whitespace both before and after the mustache statement, while it should leave whitespace after mustache intact.

  2. HandlebarsDotNet does not remove the whitespace around standalone helpers as it does handlebars.js:

default behavior [is] stripping lines that are "standalone" helpers (only a block helper, comment, or partial and whitespace)
(from here)

I made two tests to indicate these two issues:
https://dotnetfiddle.net/wtzQtB

And for comparison, the same examples made with handlebars.js:
https://jsfiddle.net/gk145cvh/

Sure these are pretty minor issues in this otherwise outstanding port, stuck upon them when one of my unit tests failed.

@rexm
Copy link
Member

rexm commented May 12, 2016

Found the bug in the lexer. Planning to include a fix on the next release.

@rexm rexm added the bug label May 12, 2016
@rexm
Copy link
Member

rexm commented May 18, 2016

Pushed 1.6.8 which addresses the ~ whitespace control bug.

The other whitespace control deviations from handlebarsjs (block helpers, comments, partials) can be added in the future. The high-level solution for these is easy to understand, but the implementation may be a little tricky. PR always welcome.

Details:
The core lexer only understands a few key control characters: {, }, ~, ( and ). It delegates other characters to specific types of parsers. When the lexer parses a }}, it checks to see if the previous character it saw was a ~ to know whether the EndExpressionToken should trim trailing whitespace. In the case where it would see {{~word}} a sub-parser would handle word, so the last core character it saw was indeed ~, and marked the EndExpressionToken as needing to trim trailing whitespace. Interestingly, this bug would produce the effect where {{~word}} would not behave as expected, but {{~word }} would behave as expected, since the extra space before the }} would give the lexer an extra step to find its bearings, so to speak.

We needed to improve the lexer so that the last-seen pre-word character would not overlap with the last-seen post-word character. That was a minor change to the lexer to make the last-seen pre-word character to be the first character of the word itself.

@imsedim
Copy link
Contributor Author

imsedim commented May 19, 2016

Great news, thanks for the update.

As for PR for the rest - could you please give some hint / starting point?

@rexm
Copy link
Member

rexm commented May 19, 2016

Sure... bit of background: the parser works in 3 phases:
The Lexer scans each character in the template and turns it into a sequence of tokens. For example, Hello {{#each person}}{{this}}{{/each}}! would produce the following token sequence:

  • Static token ("Hello ")
  • StartExpression token
  • Word token ("#each")
  • Word token ("person")
  • EndExpression token
  • StartExpression token
  • Word token ("this")
  • EndExpression token
  • StartExpression token
  • Word token ("/each")
  • EndExpression token
  • Static token ("!")

The next phase is conversion, where we convert the token sequence into an abstract syntax tree. Each converter in the conversion process is responsible for looping once over the token sequence and searching for a specific thing and replacing it in the sequence, until all tokens have been replaced. For example, the block helper converter would search through the token list above, and change it to this (roughly):

  • Static token ("Hello ") (no change)
  • Block helper expression
    • Name ("each")
    • Arguments
      • "person"
    • Sub-template
      • "this"
  • Static token ("!") (no change)

The final phase is compilation, converting the abstract syntax tree to instructions that mean something to .NET. You're probably interested in the second phase, conversion.

So what you'd want to do is probably add a new set of converters, each to handle the whitespace for a specific scenario. So in the example of removing whitespace before and after block helpers, you'd want to run it after the block helper converter, so you can search for a Static token, hold it in a variable, if the next element is a BlockHelper expression, you can yield back a modified version of the Static token with the trailing whitespace removed. Then see if the next element after that is another Static token, remove the leading whitespace, and yield it.

Each converter in the converters folder is an example of this process.

@imsedim
Copy link
Contributor Author

imsedim commented May 19, 2016

Could introducing the separate newline token be beneficial? On one hand, it means making changes in the two subsystems (lexer and converter) instead of one, which means more possible bugs. But on the other hand it might reduce the pain (well, or pleasure) of implementing this new suggested converter.

@rexm
Copy link
Member

rexm commented May 19, 2016

Could be. That might make the existing whitespace control handlers more complicated. Keen to find out what you learn!

@imsedim
Copy link
Contributor Author

imsedim commented May 27, 2016

Just as you said, it got a little bit tricky, so I'd need your feedback about one change I'm about to make about comments.

Comments are stripped out too early during conversion. I need my new whitespace converter to know about comments and I would also like to take advantage of ExpressionScopeConverter to simplify this converter. My suggestion: introduce CommentStatement and strip it out after whitespace converter, near the very end of conversion pipeline.

There are also layout tokens out there, but since they are not part of handlebars / mustache spec, I can either consider convert them to CommentStatement (since they resemble comments), or just leave them be without whitespace conversions.

@imsedim
Copy link
Contributor Author

imsedim commented May 27, 2016

Actually, maybe I should not even delete comment expressions, if there is an intention to make AST public.

@rexm
Copy link
Member

rexm commented May 27, 2016

Using a comment statement and moving the handling sounds reasonable, though I'm curious if you tried it yet - the further down the chain you move a processor, you have to consider not only searching the sequence of tokens but also searching already-converted expressions which may have trees of descendant expressions; and inversely you have to ensure each of those preceding converters won't choke when they now see a new kind of unconverted token they never saw before.

@imsedim
Copy link
Contributor Author

imsedim commented May 27, 2016

I made a prototype (although in this prototype I was using Expression.Empty instead of CommentStatement) and it seemed to work fine. I was thinking to stick the new whitespace converter between the ExpressionScopeConverter and BlockAccumulator - at that moment there are already no tokens but there are no deep expression trees yet.

@imsedim
Copy link
Contributor Author

imsedim commented May 27, 2016

Oh, I see what you are saying.

I think I messed up with names in my last few comments. I meant CommentExpression inside StatementExpression (just like PartialExpression resides inside StatementExpression).

@imsedim
Copy link
Contributor Author

imsedim commented May 27, 2016

Anyway, since there are no general objections, I'll wrap it all up in a PR and then I can redo it in case it's necessary.

@imsedim
Copy link
Contributor Author

imsedim commented May 31, 2016

Is it possible to release the version with these changes included?

@rexm
Copy link
Member

rexm commented May 31, 2016

Yep, I've got a lot of chores to take care of on this project this week, that's definitely on the list.

@rexm
Copy link
Member

rexm commented Jun 4, 2016

Pushed to nuget 1.7.1

@rexm rexm closed this as completed Jun 4, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants