Parser: add missing escape sequence for Char #5075

makenowjust · 2017-10-04T01:29:28Z

The compiler accepts this code:

p "\x64" # => "d"

But the compiler does not accept this:

p '\x64' #  invalid char escape sequence

The \100-style (octal) escape sequence has the same issue.

This PR adds \xFF (hex) and \100 (octal) style escape sequences for Char.

Add `\xFF` and `\100` style escape sequence for `Char`

straight-shoota · 2017-10-04T08:41:36Z

src/compiler/crystal/syntax/lexer.cr

          when 'u'
            value = consume_char_unicode_escape
            @token.value = value.chr
-          when '0'
-            @token.value = '\0'
+          when '0', '1', '2', '3', '4', '5', '6', '7'


Maybe use '0'..'7'?
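
For illustration, a minimal sketch of the range form suggested above. The helper name `octal_digit?` is hypothetical and is not part of the lexer; it only shows that a `Char` range in a `when` clause matches the same eight digits as the explicit list in the diff.

```crystal
# Hypothetical helper, not taken from lexer.cr: reports whether a char
# is an octal digit by using a Char range as the `when` condition.
def octal_digit?(char : Char) : Bool
  case char
  when '0'..'7' # same set as '0', '1', '2', '3', '4', '5', '6', '7'
    true
  else
    false
  end
end

p octal_digit?('0') # => true
p octal_digit?('7') # => true
p octal_digit?('8') # => false
```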

asterite · 2017-10-04T12:14:20Z

This was removed on purpose, the x escape is incorrect for chars. For example \xff is not a valid unicode codepoint.

asterite · 2017-10-04T12:15:26Z

Same goes with octal. This was removed explicitly by me some months ago. Let's not add it back.

straight-shoota · 2017-10-04T12:26:26Z

related to #2886

asterite · 2017-10-04T12:32:03Z

Why?

straight-shoota · 2017-10-04T12:36:31Z

That issue is also about character escape codes, and you suggested there that \x.. escapes might be removed from strings at some point.
If (and when) that happens, there would be no difference between "\x64" and '\x64', because both would be invalid.
I thought it was worth mentioning here.

makenowjust · 2017-10-04T12:52:01Z

@straight-shoota Thank you.

@asterite I don't understand "for example \xff is not a valid unicode codepoint", because U+00FF is 'LATIN SMALL LETTER Y WITH DIAERESIS' and I take \xff to mean that. But the purpose makes sense to me, and I think we should remove octal and hex style escape sequences from string literals once #2886 is resolved.

asterite · 2017-10-04T12:59:17Z

In a String, "\xff" will generate a string with one byte whose value is 255. That's not a valid UTF-8 string but it's valid as just a sequence of bytes (there's a big discussion on whether this should be allowed or not, or maybe just allowed in Slice(UInt8) literals, but that doesn't exist yet).

But a Char is an Int32 that holds a Unicode codepoint. A byte value and a codepoint are different things. \xff means "a byte with value 255", but in the comment above it is taken to mean "a char with codepoint 255", which is 'ÿ'. A String written as "\xff" is not the same as "ÿ", and "ÿ" in bytes is [195, 191].

It's a bit confusing, and in the original implementation '\xff' did generate 'ÿ', but that's wrong.
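
The byte-versus-codepoint distinction described above can be checked with a short sketch. It assumes a Crystal version in which "\xFF" is still accepted in string literals; the variable names `raw` and `ch` are illustrative only.

```crystal
# A string literal with \xFF holds a single raw byte of value 255,
# which on its own is not valid UTF-8.
raw = "\xFF"
p raw.bytes           # => [255]
p raw.valid_encoding? # => false

# The codepoint U+00FF is the character 'ÿ'. Encoded as UTF-8 it takes
# two bytes, so it is not the same thing as the single byte 255.
ch = '\u00FF'
p ch.ord        # => 255
p ch.to_s.bytes # => [195, 191]

# The two strings therefore differ.
p raw == ch.to_s # => false
```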

asterite · 2017-10-04T13:03:33Z

Another way to show it. Using this PR, do this:

"\xFF"[0] == '\xFF'

You will see that it is false, which is quite unexpected.

makenowjust · 2017-10-04T13:05:39Z

@asterite Good example!

Add missing escape sequence for Char

ab1c575

Add `\xFF` and `\100` style escape sequence for `Char`

straight-shoota reviewed Oct 4, 2017

asterite closed this Oct 4, 2017

makenowjust deleted the feature/add-missing-escape-sequence-for-char branch October 4, 2017 12:52
