Whitespace control inconsistent with Handlebars.js #113

imsedim · 2016-05-10T15:30:49Z

Using "~" after the opening braces ({{~) strips the whitespace both before and after the mustache statement, when it should leave the whitespace after the mustache intact.
HandlebarsDotNet also does not remove the whitespace around standalone helpers, as handlebars.js does:

default behavior [is] stripping lines that are "standalone" helpers (only a block helper, comment, or partial and whitespace)
(from here)
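To illustrate the first issue in isolation, here is a minimal Python model of "~" as handlebars.js documents it. It is a simplified sketch, not code from either library; render_whitespace and its regex-based tag splitting are hypothetical. {{~ trims the whitespace in the static text before the tag, and ~}} trims the whitespace in the static text after it, independently of each other.

```python
import re

# Hypothetical, simplified model of Handlebars "~" whitespace control.
# Group 1 is a leading "~", group 2 the expression, group 3 a trailing "~".
TAG = re.compile(r"\{\{(~?)(.*?)(~?)\}\}")

def render_whitespace(template: str, values: dict) -> str:
    # Alternate static text and tags; a (possibly empty) static part
    # always surrounds each tag, so parts[i - 1] and parts[i + 1] exist.
    parts = []
    pos = 0
    for m in TAG.finditer(template):
        parts.append(["static", template[pos:m.start()]])
        parts.append(["tag", m])
        pos = m.end()
    parts.append(["static", template[pos:]])

    for i, (kind, item) in enumerate(parts):
        if kind != "tag":
            continue
        if item.group(1):  # "{{~" trims whitespace BEFORE the tag only
            parts[i - 1][1] = parts[i - 1][1].rstrip()
        if item.group(3):  # "~}}" trims whitespace AFTER the tag only
            parts[i + 1][1] = parts[i + 1][1].lstrip()

    out = []
    for kind, item in parts:
        if kind == "static":
            out.append(item)
        else:
            out.append(str(values.get(item.group(2).strip(), "")))
    return "".join(out)
```

With this model, "a  {{~x}}  b" renders as "aX  b", whereas the behavior reported above strips both sides and produces "aXb".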

I made two tests to demonstrate these two issues:
https://dotnetfiddle.net/wtzQtB

And for comparison, the same examples made with handlebars.js:
https://jsfiddle.net/gk145cvh/

Sure, these are pretty minor issues in an otherwise outstanding port; I stumbled upon them when one of my unit tests failed.

rexm · 2016-05-12T11:34:27Z

Found the bug in the lexer. Planning to include a fix on the next release.

rexm · 2016-05-18T00:57:35Z

Pushed 1.6.8 which addresses the ~ whitespace control bug.

The other whitespace control deviations from handlebars.js (block helpers, comments, partials) can be added in the future. The high-level solution for these is easy to understand, but the implementation may be a little tricky. PRs are always welcome.

Details:
The core lexer only understands a few key control characters: {, }, ~, ( and ). It delegates all other characters to specific sub-parsers. When the lexer parses a }}, it checks whether the previous character it saw was a ~ to decide whether the EndExpressionToken should trim trailing whitespace. Given {{~word}}, a sub-parser handled word, so the last core character the lexer had seen was still the leading ~, and it marked the EndExpressionToken as needing to trim trailing whitespace. Interestingly, this meant {{~word}} did not behave as expected while {{~word }} did, since the extra space before the }} gave the lexer an extra step to find its bearings, so to speak.

We needed to improve the lexer so that the last-seen pre-word character would not overlap with the last-seen post-word character. That was a minor change: the lexer now treats the first character of the word itself as the last-seen pre-word character.
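A toy Python model of the bug and the fix may make this easier to follow. It is a hypothetical sketch, not the actual HandlebarsDotNet lexer: lex_mustache, the last_seen variable, and the assumption that a space before }} updates the last-seen character are all illustrative.

```python
def lex_mustache(src: str, fixed: bool) -> dict:
    # Toy model of a single mustache such as "{{~word}}" (hypothetical).
    # Only "~" and whitespace update last_seen in the core loop; words are
    # consumed by a separate "sub-parser" loop.
    assert src.startswith("{{") and src.endswith("}}")
    body = src[2:-2]
    last_seen = "{"          # last character the core lexer itself saw
    trim_leading = False
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "~":
            trim_leading = trim_leading or i == 0  # "{{~" trims before the tag
            last_seen = ch
            i += 1
        elif ch.isspace():
            last_seen = ch   # assumed: the "extra step" a space before }} gives
            i += 1
        else:
            if fixed:
                # Fix: the word's first character becomes the last-seen
                # pre-word character, so a leading "~" no longer leaks past it.
                last_seen = ch
            # Sub-parser consumes the word; in the buggy version last_seen
            # is left unchanged.
            while i < len(body) and not body[i].isspace() and body[i] != "~":
                i += 1
    # On reaching "}}", trim trailing whitespace if the lexer last saw "~".
    return {"trim_leading": trim_leading, "trim_trailing": last_seen == "~"}
```

Here lex_mustache("{{~word}}", fixed=False) reports trim_trailing as True (the bug), while both fixed=True and the "{{~word }}" variant report False.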

imsedim · 2016-05-19T07:25:42Z

Great news, thanks for the update.

As for a PR for the rest, could you please give me a hint or a starting point?

rexm · 2016-05-19T13:29:24Z

Sure. A bit of background: the parser works in three phases.
The lexer scans each character in the template and turns the template into a sequence of tokens. For example, Hello {{#each person}}{{this}}{{/each}}! would produce the following token sequence:

Static token ("Hello ")
StartExpression token
Word token ("#each")
Word token ("person")
EndExpression token
StartExpression token
Word token ("this")
EndExpression token
StartExpression token
Word token ("/each")
EndExpression token
Static token ("!")
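As a rough illustration, the following Python sketch produces the same token sequence. It is a simplification under stated assumptions: the real lexer scans character by character and handles ~, ( and ), whereas this hypothetical tokenize function just uses a regex and splits each expression's words on whitespace.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value)

def tokenize(template: str) -> List[Token]:
    tokens: List[Token] = []
    pos = 0
    for m in re.finditer(r"\{\{(.*?)\}\}", template):
        if m.start() > pos:                      # text before the mustache
            tokens.append(("Static", template[pos:m.start()]))
        tokens.append(("StartExpression", "{{"))
        for word in m.group(1).split():          # "#each person" -> two words
            tokens.append(("Word", word))
        tokens.append(("EndExpression", "}}"))
        pos = m.end()
    if pos < len(template):                      # trailing text
        tokens.append(("Static", template[pos:]))
    return tokens
```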

The next phase is conversion, where we convert the token sequence into an abstract syntax tree. Each converter in the conversion process loops once over the token sequence, searches for a specific pattern, and replaces it in the sequence; the converters run in turn until all tokens have been replaced. For example, the block helper converter would search through the token list above and change it to this (roughly):

Static token ("Hello ") (no change)
Block helper expression
- Name ("each")
- Arguments
  - "person"
- Sub-template
  - "this"
Static token ("!") (no change)
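The block helper step could be sketched in Python as below. This is a hypothetical illustration, not the real converter: it uses two small passes (grouping each expression's words, then folding #name ... /name into a node with a stack), represents plain expressions as ("Expression", text) tuples, and does not check that the closing name matches the opening one.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Token = Tuple[str, str]  # (kind, value), e.g. ("Word", "#each")

@dataclass
class BlockHelper:
    name: str
    arguments: List[str]
    body: list = field(default_factory=list)  # the sub-template

def convert_blocks(tokens: List[Token]) -> list:
    # Pass 1: collapse each StartExpression ... EndExpression run into a word list.
    items: list = []
    i = 0
    while i < len(tokens):
        if tokens[i][0] == "StartExpression":
            words = []
            i += 1
            while tokens[i][0] != "EndExpression":
                words.append(tokens[i][1])
                i += 1
            items.append(words)
        else:
            items.append(tokens[i])  # Static tokens pass through unchanged
        i += 1
    # Pass 2: fold "#name args" ... "/name" into BlockHelper nodes with a stack.
    root: list = []
    stack = [root]
    for item in items:
        if isinstance(item, list) and item[0].startswith("#"):
            node = BlockHelper(item[0][1:], item[1:])
            stack[-1].append(node)
            stack.append(node.body)  # following items go into the sub-template
        elif isinstance(item, list) and item[0].startswith("/"):
            stack.pop()              # close the innermost open block
        elif isinstance(item, list):
            stack[-1].append(("Expression", " ".join(item)))
        else:
            stack[-1].append(item)
    return root
```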

The final phase is compilation, converting the abstract syntax tree to instructions that mean something to .NET. You're probably interested in the second phase, conversion.
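To make the compilation phase concrete, here is a small Python sketch that turns a tree like the one above into a callable. Python closures stand in for the .NET instructions; Static, Expression and Each are hypothetical node types, and only the each helper and simple paths are supported.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Static:
    text: str

@dataclass
class Expression:
    path: str

@dataclass
class Each:
    argument: str
    body: List[object] = field(default_factory=list)

Template = Callable[[object], str]

def compile_nodes(nodes: List[object]) -> Template:
    # Compile every node once, then concatenate their outputs at render time.
    parts = [compile_node(n) for n in nodes]
    return lambda ctx: "".join(part(ctx) for part in parts)

def compile_node(node: object) -> Template:
    if isinstance(node, Static):
        return lambda ctx: node.text
    if isinstance(node, Expression):
        if node.path == "this":
            return lambda ctx: str(ctx)
        return lambda ctx: str(ctx.get(node.path, ""))
    if isinstance(node, Each):
        body = compile_nodes(node.body)  # compile the sub-template once
        return lambda ctx: "".join(body(item) for item in ctx.get(node.argument, []))
    raise TypeError(f"unsupported node: {node!r}")
```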

What you'd probably want to do is add a new set of converters, each handling the whitespace for a specific scenario. For example, to remove whitespace before and after block helpers, you'd run a converter after the block helper converter: search for a Static token and hold it in a variable; if the next element is a BlockHelper expression, yield a modified version of the Static token with its trailing whitespace removed. Then, if the element after that is another Static token, remove its leading whitespace and yield it.

Each converter in the converters folder is an example of this process.
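A sketch of such a converter in Python, under some assumptions: it works on a flat sequence (before block bodies are nested into trees), uses a hypothetical ("Block", label) node for any tag that can be standalone (block helper open/close, comment, or partial), and only strips when the tag's line holds nothing but that tag and whitespace, following the handlebars.js rule quoted at the top of the issue. Decisions are made on the original text first and applied afterwards, so two adjacent standalone tags do not affect each other's check.

```python
from typing import List, Tuple

# ("Static", text) or ("Block", label); "Block" is any standalone-capable tag.
Node = Tuple[str, str]

def _is_blank(s: str) -> bool:
    return s.strip() == ""

def strip_standalone(nodes: List[Node]) -> List[Node]:
    last = len(nodes) - 1
    plans = []  # (block index, indentation length to cut from the previous Static)
    for i, (kind, _) in enumerate(nodes):
        if kind != "Block":
            continue
        # Left: only whitespace between the previous newline (or start) and the tag.
        if i == 0:
            left_ok, indent = True, 0
        elif nodes[i - 1][0] == "Static":
            text = nodes[i - 1][1]
            tail = text.rsplit("\n", 1)[-1]
            left_ok = _is_blank(tail) and ("\n" in text or i - 1 == 0)
            indent = len(tail)
        else:
            left_ok, indent = False, 0
        # Right: only whitespace between the tag and the next newline (or end).
        if i == last:
            right_ok = True
        elif nodes[i + 1][0] == "Static":
            text = nodes[i + 1][1]
            head = text.split("\n", 1)[0]
            right_ok = _is_blank(head) and ("\n" in text or i + 1 == last)
        else:
            right_ok = False
        if left_ok and right_ok:
            plans.append((i, indent))

    out = [list(n) for n in nodes]  # mutable copies
    for i, indent in plans:
        if i > 0 and indent:
            prev = out[i - 1]
            prev[1] = prev[1][: len(prev[1]) - indent]   # drop the indentation
        if i < last:
            nxt = out[i + 1]
            # Drop the rest of the tag's line, including its newline.
            nxt[1] = nxt[1].split("\n", 1)[1] if "\n" in nxt[1] else ""
    return [tuple(n) for n in out]
```

For example, the static text around a standalone {{#each}} / {{/each}} pair in "<ul>\n  {{#each}}\n  <li>x</li>\n  {{/each}}\n</ul>" joins to "<ul>\n  <li>x</li>\n</ul>", while "Hello {{#each}}!" is left untouched because that tag shares its line with other text.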

imsedim · 2016-05-19T15:44:55Z

Could introducing a separate newline token be beneficial? On one hand, it means making changes in two subsystems (lexer and converter) instead of one, which means more possible bugs. On the other hand, it might reduce the pain (well, or pleasure) of implementing the suggested new converter.
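One way to see the potential benefit: with a hypothetical Newline token, the standalone check becomes a per-line question. The Python sketch below (split_newlines and standalone_blocks are illustrative names, and "Block" again stands for any block helper, comment, or partial tag) splits static text on newlines and then reports which Block tokens sit alone on their line with only whitespace around them.

```python
import re
from typing import List, Tuple

Token = Tuple[str, str]  # ("Static" | "Newline" | "Block", text)

def split_newlines(static_text: str) -> List[Token]:
    """Split static text into Static and Newline tokens (hypothetical)."""
    tokens: List[Token] = []
    for piece in re.split(r"(\r?\n)", static_text):  # capture group keeps separators
        if piece:
            tokens.append(("Newline" if piece in ("\n", "\r\n") else "Static", piece))
    return tokens

def standalone_blocks(tokens: List[Token]) -> List[int]:
    """Indices of Block tokens that are alone on their line apart from whitespace."""
    result: List[int] = []
    line: List[int] = []
    for i, (kind, _) in enumerate(tokens + [("Newline", "")]):  # sentinel ends last line
        if kind != "Newline":
            line.append(i)
            continue
        blocks = [j for j in line if tokens[j][0] == "Block"]
        others = [j for j in line if tokens[j][0] != "Block"]
        if len(blocks) == 1 and all(
            tokens[j][0] == "Static" and not tokens[j][1].strip() for j in others
        ):
            result.append(blocks[0])
        line = []
    return result
```

The trade-off mentioned above shows up here too: the lexer has to emit the extra token, but the converter no longer has to search inside Static text for line boundaries.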

rexm · 2016-05-19T16:13:42Z

Could be. That might make the existing whitespace control handlers more complicated. Keen to find out what you learn!

imsedim · 2016-05-27T10:57:37Z

Just as you said, it got a little tricky, so I'd like your feedback on a change I'm about to make regarding comments.

Comments are stripped out too early during conversion. My new whitespace converter needs to know about comments, and I would also like to take advantage of ExpressionScopeConverter to simplify it. My suggestion: introduce CommentStatement and strip it out after the whitespace converter, near the very end of the conversion pipeline.
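A minimal Python sketch of why the ordering matters, assuming the shape described later in the thread (a CommentExpression wrapped in a StatementExpression). All class and function names are hypothetical stand-ins, and the whitespace check is deliberately simplified to comments that sit on their own line between two static texts.

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Static:
    text: str

@dataclass
class CommentExpression:
    text: str

@dataclass
class StatementExpression:
    body: object  # e.g. a CommentExpression

Converter = Callable[[List[object]], List[object]]

def is_comment(node: object) -> bool:
    return isinstance(node, StatementExpression) and isinstance(node.body, CommentExpression)

def whitespace_converter(nodes: List[object]) -> List[object]:
    out = list(nodes)
    for i in range(1, len(out) - 1):
        prev, node, nxt = out[i - 1], out[i], out[i + 1]
        if not (is_comment(node) and isinstance(prev, Static) and isinstance(nxt, Static)):
            continue
        body, nl, indent = prev.text.rpartition("\n")
        rest_of_line, sep, rest = nxt.text.partition("\n")
        # Standalone: only whitespace between the surrounding newlines.
        if nl and sep and not indent.strip() and not rest_of_line.strip():
            out[i - 1] = Static(body + nl)
            out[i + 1] = Static(rest)
    return out

def strip_comments(nodes: List[object]) -> List[object]:
    return [n for n in nodes if not is_comment(n)]

def run_pipeline(nodes: List[object], converters: List[Converter]) -> List[object]:
    for convert in converters:
        nodes = convert(nodes)
    return nodes
```

Running [whitespace_converter, strip_comments] on the nodes for "a\n  {{! note }}  \nb" yields "a\nb"; reversing the order leaves "a\n    \nb", because the comment is gone before the whitespace converter can see it.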

There are also layout tokens, but since they are not part of the Handlebars/Mustache spec, I can either convert them to CommentStatement (since they resemble comments) or just leave them without whitespace conversions.

imsedim · 2016-05-27T11:33:45Z

Actually, maybe I should not delete comment expressions at all, if there is an intention to make the AST public.

rexm · 2016-05-27T12:14:02Z

Using a comment statement and moving the handling sounds reasonable, though I'm curious whether you've tried it yet. The further down the chain you move a processor, the more you have to consider searching not only the sequence of tokens but also already-converted expressions, which may have trees of descendant expressions. Conversely, you have to ensure that each of the preceding converters won't choke when it now sees a new kind of unconverted token it has never seen before.
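The concern about descendant expressions can be illustrated with a small Python helper that applies a sequence-level converter to every nested body, not just the top-level sequence. Block and apply_deep are hypothetical; the point is only that a converter placed late in the chain has to recurse.

```python
from dataclasses import dataclass, field, replace
from typing import Callable, List

@dataclass
class Block:
    name: str
    body: List[object] = field(default_factory=list)

Converter = Callable[[List[object]], List[object]]

def apply_deep(nodes: List[object], convert: Converter) -> List[object]:
    """Run `convert` on this sequence and, recursively, on every nested block body.

    Returns new Block objects instead of mutating the input tree.
    """
    return [
        replace(n, body=apply_deep(n.body, convert)) if isinstance(n, Block) else n
        for n in convert(list(nodes))
    ]
```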

imsedim · 2016-05-27T12:22:16Z

I made a prototype (although in it I used Expression.Empty instead of CommentStatement) and it seemed to work fine. I was thinking of placing the new whitespace converter between ExpressionScopeConverter and BlockAccumulator: at that point no tokens are left, but there are no deep expression trees yet either.

imsedim · 2016-05-27T12:30:11Z

Oh, I see what you are saying.

I think I mixed up the names in my last few comments. I meant a CommentExpression inside a StatementExpression (just as PartialExpression resides inside StatementExpression).

imsedim · 2016-05-27T12:32:17Z

Anyway, since there are no general objections, I'll wrap it all up in a PR and can redo it if necessary.

imsedim · 2016-05-31T14:08:23Z

Is it possible to release a version with these changes included?

rexm · 2016-05-31T22:34:08Z

Yep, I've got a lot of chores to take care of on this project this week; that's definitely on the list.

rexm · 2016-06-04T16:44:26Z

Pushed 1.7.1 to NuGet.

rexm added the bug label May 12, 2016

imsedim mentioned this issue May 28, 2016: Standalone whitespace #129 (Merged)

rexm closed this as completed Jun 4, 2016
