Invalid codepoints don't raise RangeError on #chr #1921

Closed
yous opened this issue Aug 26, 2014 · 3 comments

yous commented Aug 26, 2014

I'm using JRuby-1.7.13.

http://en.wikipedia.org/wiki/UTF-8#Invalid_byte_sequences

Not all sequences of bytes are valid UTF-8. A UTF-8 decoder should be prepared for:

  • A 4-byte sequence (starting with 0xF4) that decodes to a value greater than U+10FFFF

In Ruby 1.9.3,

>> 0x110000.chr(Encoding::UTF_8)
RangeError: invalid codepoint 0x110000 in UTF-8
    from (irb):3:in `chr'
    from (irb):3
    from /Users/yous/.rvm/rubies/ruby-1.9.3-p547/bin/irb:12:in `<main>'

In JRuby,

>> 0x110000.chr(Encoding::UTF_8)
=> "\xF4\x90\x80\x80"

http://en.wikipedia.org/wiki/UTF-8#Invalid_code_points

According to the UTF-8 definition (RFC 3629) the high and low surrogate halves used by UTF-16 (U+D800 through U+DFFF) are not legal Unicode values, and their UTF-8 encoding should be treated as an invalid byte sequence.

In Ruby 1.9.3,

>> 0xD800.chr(Encoding::UTF_8)
RangeError: invalid codepoint 0xD800 in UTF-8
    from (irb):1:in `chr'
    from (irb):1
    from /Users/yous/.rvm/rubies/ruby-1.9.3-p547/bin/irb:12:in `<main>'
>> 0xDFFF.chr(Encoding::UTF_8)
RangeError: invalid codepoint 0xDFFF in UTF-8
    from (irb):2:in `chr'
    from (irb):2
    from /Users/yous/.rvm/rubies/ruby-1.9.3-p547/bin/irb:12:in `<main>'

In JRuby,

>> 0xD800.chr(Encoding::UTF_8)
=> "\xED\xA0\x80"
>> 0xDFFF.chr(Encoding::UTF_8)
=> "\xED\xBF\xBF"
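
The comparison above can be condensed into a small script to run on any Ruby implementation. This is only a sketch: the constant `INVALID_CODEPOINTS` and the helper `rejects_codepoint?` are names made up for illustration, and the list covers just the three values from this report.

```ruby
# Codepoints from this report that Integer#chr should reject for UTF-8.
# (Hypothetical constant name, used only for this sketch.)
INVALID_CODEPOINTS = [0x110000, 0xD800, 0xDFFF].freeze

# Hypothetical helper: returns true if Integer#chr raises RangeError for
# the codepoint, false if it silently returns a string instead.
def rejects_codepoint?(codepoint)
  codepoint.chr(Encoding::UTF_8)
  false
rescue RangeError
  true
end

INVALID_CODEPOINTS.each do |cp|
  status = rejects_codepoint?(cp) ? "ok (RangeError)" : "BUG (returned a string)"
  puts format("0x%X: %s", cp, status)
end
```

On an implementation that behaves like Ruby 1.9.3 above, every line should report `ok (RangeError)`; on the affected JRuby versions the expectation is that each line reports `BUG`.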
yous changed the title from "Invalid codepoints are not raise RangeError on #chr" to "Invalid codepoints does not raise RangeError on #chr" on Feb 26, 2015
yous changed the title from "Invalid codepoints does not raise RangeError on #chr" to "Invalid codepoints don't raise RangeError on #chr" on Feb 26, 2015

yous commented Feb 26, 2015

Still reproducible with JRuby-1.7.19 and JRuby-9.0.0.0.pre1.
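
For anyone stuck on an affected version, one possible workaround is to validate the codepoint before calling `#chr`. The sketch below is not taken from JRuby or from any commit referenced here; `strict_chr` is a hypothetical helper name, and the range checks simply restate the two rules quoted in the original report (nothing above U+10FFFF, no surrogates U+D800 through U+DFFF).

```ruby
# Hypothetical helper (not part of Ruby or JRuby): behaves like
# Integer#chr(Encoding::UTF_8) but raises RangeError itself for values
# that are not valid Unicode scalar values.
def strict_chr(codepoint)
  in_range  = codepoint.between?(0, 0x10FFFF)      # upper bound of Unicode
  surrogate = codepoint.between?(0xD800, 0xDFFF)   # UTF-16 surrogate halves
  unless in_range && !surrogate
    raise RangeError, format("invalid codepoint 0x%X in UTF-8", codepoint)
  end
  codepoint.chr(Encoding::UTF_8)
end
```

With this guard, `strict_chr(0x110000)` and `strict_chr(0xD800)` raise `RangeError` regardless of how the underlying `#chr` behaves, while valid values such as `strict_chr(0x41)` pass through unchanged.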

yous added a commit to yous/raheui that referenced this issue Feb 26, 2015

k77ch7 commented May 23, 2017

@yous this works in 9.1.9.0 and 9.1.10.0-SNAPSHOT (2.3.3) 2017-05-20 7d0ecb9.

yous commented Jun 8, 2017

I can confirm that this was fixed in JRuby-9.0.0.0.pre2, maybe by f90bcc9. Thank you, @k77ch7.

yous closed this as completed on Jun 8, 2017
kares added this to the JRuby 9.0.1.0 milestone on Jun 8, 2017