Support grapheme detection via \X #4568

Closed
janlelis opened this issue Apr 20, 2017 · 4 comments

Comments

@janlelis

JRuby should support matching "grapheme clusters" (glyphs), which can be made up of multiple Unicode codepoints.

Expected Behavior (MRI)

glyphs = "\u{61 308 62}".scan(/\X/) # => ["ä", "b"]
glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } #=> [["61", "308"], ["62"]]

Actual Behavior (JRuby)

glyphs = "\u{61 308 62}".scan(/\X/) # => ["a", "b"]
glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } #=> [["61"], ["62"]]
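
For anyone who wants to check an implementation quickly, here is a minimal sketch of the comparison above as a standalone script. The helper name `grapheme_clusters` is made up for illustration and is not an existing JRuby or MRI method; it only wraps the `scan(/\X/)` call from this report. On an implementation where \X matches full grapheme clusters, the cluster count should be lower than the codepoint count for this sample.

```ruby
# Hypothetical helper (name chosen for this sketch only): split a string
# into grapheme clusters using the \X regexp escape discussed here.
def grapheme_clusters(str)
  str.scan(/\X/)
end

# "a" + U+0308 COMBINING DIAERESIS + "b", the sample from this report.
sample = "\u{61 308 62}"

clusters   = grapheme_clusters(sample)
codepoints = sample.codepoints

# Hex codepoints grouped per cluster, same shape as the output above.
cluster_hex = clusters.map { |c| c.codepoints.map { |cp| cp.to_s(16) } }

puts "codepoints: #{codepoints.length}"
puts "clusters:   #{clusters.length}"
p cluster_hex
```

If the two counts printed are equal, \X is matching single codepoints rather than clusters, which is the behavior reported here.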

Related Links

@headius
Member

headius commented Nov 28, 2017

Given that this does not appear to be fully baked in MRI 2.3 I think this is safe to defer to our 2.3-compatible release in 9.2.0.0.

@lopex
Contributor

lopex commented Dec 30, 2017

This should work out of the box with the new joni.

@headius
Member

headius commented Jan 25, 2018

@lopex I forget our status here. We can move 9.1 to the new joni, yes?

@headius
Member

headius commented Feb 13, 2018

Works in 9.1.16.0.

irb(main):001:0> glyphs = "\u{61 308 62}".scan(/\X/) # => ["ä", "b"]`
(irb):1: warning: character class has duplicated range
=> ["ä", "b"]
irb(main):002:0> glyphs.map{ |e| e.codepoints.map{ |f| f.to_s(16) } } #=> [["61", "308"], ["62"]]
=> [["61", "308"], ["62"]]
