Parser: add missing escape sequence for Char #5075

makenowjust
Contributor

The compiler accepts this code:

p "\x64" # => "d"

But the compiler does not accept this:

p '\x64' #  invalid char escape sequence

The \100-style (octal) escape sequence has the same issue.

This PR adds \xFF (hex) and \100 (octal) style escape sequences for Char.

Add `\xFF` and `\100` style escape sequence for `Char`
when 'u'
value = consume_char_unicode_escape
@token.value = value.chr
when '0'
@token.value = '\0'
when '0', '1', '2', '3', '4', '5', '6', '7'
Member

Maybe use '0'..'7'?
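A minimal sketch of what the range form could look like, assuming a standalone helper rather than the lexer's actual code (the name `octal_digit?` is hypothetical and does not appear in the PR):

```crystal
# Hypothetical helper: classifies a character with a Char range in `when`
# instead of listing '0', '1', ..., '7' one by one.
def octal_digit?(char : Char) : Bool
  case char
  when '0'..'7'
    true
  else
    false
  end
end

puts octal_digit?('5') # => true
puts octal_digit?('8') # => false
```

`when '0'..'7'` works because `case` matches with `===`, and a `Range` of `Char` tests for inclusion.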

@asterite
Member

asterite commented Oct 4, 2017

This was removed on purpose: the \x escape is incorrect for chars. For example, \xff is not a valid Unicode codepoint.

@asterite asterite closed this Oct 4, 2017
@asterite
Member

asterite commented Oct 4, 2017

The same goes for octal. I removed it explicitly some months ago. Let's not add it back.

@straight-shoota
Member

Related to #2886.

@asterite
Member

asterite commented Oct 4, 2017

Why?

@straight-shoota
Member

straight-shoota commented Oct 4, 2017

That issue is also about character escape codes, and you suggested there that \x.. escapes might be removed from strings at some point.
If and when they are removed, there would be no difference between "\x64" and '\x64', because both would be invalid.
I thought it was worth mentioning here.

@makenowjust
Contributor Author

@straight-shoota Thank you.

@asterite I don't understand "for example \xff is not a valid unicode codepoint", because U+00FF is 'LATIN SMALL LETTER Y WITH DIAERESIS' and I take \xff to mean that. But the purpose makes sense to me, and I think we should remove octal and hex style escape sequences from string literals after #2886 is resolved.

@makenowjust makenowjust deleted the feature/add-missing-escape-sequence-for-char branch October 4, 2017 12:52
@asterite
Member

asterite commented Oct 4, 2017

In a String, "\xff" will generate a string with one byte whose value is 255. That's not a valid UTF-8 string but it's valid as just a sequence of bytes (there's a big discussion on whether this should be allowed or not, or maybe just allowed in Slice(UInt8) literals, but that doesn't exist yet).

But a Char is an Int32 that holds a Unicode codepoint. A byte value and a codepoint are different things. In a String, \xff means "a byte with value 255", but in the comment above it is taken to mean "a char with codepoint 255", which is 'ÿ'. A String written as "\xff" is not the same as "ÿ": "ÿ" in bytes is [195, 191].

It's a bit confusing, and in the original implementation '\xff' did generate 'ÿ', but that's wrong.
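The byte-versus-codepoint distinction described above can be checked directly. This is a small sketch, assuming a Crystal version that still accepts `\x` escapes in string literals; the variable names `raw` and `encoded` are illustrative only:

```crystal
# "\xff" is a single raw byte; on its own it is not valid UTF-8.
raw = "\xff"
# "\u00FF" is the codepoint U+00FF ('ÿ'), stored as two UTF-8 bytes.
encoded = "\u00FF"

p raw.bytes               # => [255]
p raw.valid_encoding?     # => false
p encoded.bytes           # => [195, 191]
p encoded.valid_encoding? # => true
p 'ÿ'.ord                 # => 255
p raw == encoded          # => false
```

The codepoint of 'ÿ' is 255, but its UTF-8 encoding is the two bytes 195 and 191, so the two strings differ.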

@asterite
Member

asterite commented Oct 4, 2017

Another way to show it. Using this PR, do this:

"\xFF"[0] == '\xFF'

You will see that it is false, which is quite unexpected.
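Since this PR was closed, '\xFF' does not compile, so the comparison can only be approximated. The sketch below assumes '\u00FF' as a stand-in for what '\xFF' produced under the PR (the char 'ÿ'). The exact char returned by indexing the invalid string may depend on the Crystal version, so only the comparison result is shown:

```crystal
# One raw byte with value 255 (not valid UTF-8).
byte_string = "\xFF"
# Indexing decodes a Char from the string's bytes.
first_char = byte_string[0]
# Stand-in (assumption) for the '\xFF' literal proposed in this PR.
codepoint_char = '\u00FF'

p first_char == codepoint_char # => false
p byte_string.to_slice         # => Bytes[255]
p codepoint_char.bytes         # => [195, 191]
```

The string holds the single byte 255, while the char 'ÿ' encodes to two bytes, so the string's first char cannot be 'ÿ'.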

@makenowjust
Contributor Author

@asterite Good example!

3 participants