UTF-16LE Regexp not working as expected #2863
Comments
Works ok on master, but that's not too surprising (reimplemented encodings since 1.7). I would suspect the problem is in our Regexp encoding negotiation (prepareRegexp and such); it's not allowing this combination when it should.
Here's the stack trace:
As you can see, the error is raised when IRB tries to printf the result of the evaluation, i.e. The actual difference between JRuby and MRI seems to be in the result of the
JRuby 1.7.20:
MRI 2.2.2p95:
MRI forces US-ASCII if the Regexp encoding is not compatible with ASCII: https://github.com/ruby/ruby/blob/ruby_2_2/re.c#L434
But JRuby uses the original encoding:
@azolotko you did such a good job finding this; is there any chance you can submit a pull request too?
@enebo
Sorry guys, I didn't manage to tackle the issue and can't spend more time on this. Maybe someone else will have better luck.
Ok, I am not going to fix this before 1.7.25 goes out, but I learned some stuff so I will document something at least:
abc8 = "abc"
p abc8.encoding # <Encoding:UTF-8>
regex8 = Regexp.new(abc8) # succeeds
p regex8.inspect
p regex8.inspect.encoding
abc16 = "abc".encode("UTF-16LE")
p abc16.encoding # <Encoding:UTF-16LE>
regex16 = Regexp.new(abc16)
p regex16.inspect
p regex16.inspect.encoding
1.7 and 9k are both wrong. 9k does not throw the originally reported exception, but it still does not convert the encoding to the same one as MRI. I moved the 9k logic from regexpDescription over to 1.7 as a test and the exception goes away, but the result is still wrong, so I do not think it is appropriate to merely make it less broken.
Results: (mri 1.9)
mri2.3
The only difference between the MRI versions is that the default encoding changed, so we see UTF-8 in the first printout. JRuby 1.7.24 has the originally reported problem, Encoding::CompatibilityError.
9.0.5.0
So the difference is, as @azolotko said, that in MRI inspect converts the inspect output of a regexp to US-ASCII and then prints multibyte characters as \x codes.
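The rule described above can be sketched in plain Ruby. This is only an illustration of the behaviour discussed in this thread, not MRI's or JRuby's actual code: `inspect_encoding_for` and `describe_pattern` are hypothetical helper names, and the `\x{...}` escape format is an assumption about how the multibyte characters get printed.

```ruby
# Hypothetical helper: choose the encoding for the inspect string.
# ASCII-compatible source encodings are kept; anything else
# (e.g. UTF-16LE) falls back to US-ASCII.
def inspect_encoding_for(source_encoding)
  source_encoding.ascii_compatible? ? source_encoding : Encoding::US_ASCII
end

# Hypothetical helper: build a "/.../" description of a pattern source.
# For non-ASCII-compatible sources, printable ASCII characters are kept
# and everything else is written as a \x{...} escape (assumed format).
def describe_pattern(source)
  target = inspect_encoding_for(source.encoding)
  return "/#{source}/" if target == source.encoding

  body = source.each_char.map do |ch|
    cp = ch.ord
    cp.between?(0x20, 0x7E) ? cp.chr : format('\x{%X}', cp)
  end.join
  "/#{body}/".force_encoding(target)
end

abc16 = "abc".encode("UTF-16LE")
p inspect_encoding_for(abc16.encoding)
p describe_pattern(abc16)
p describe_pattern(abc16).encoding
```

To see how a given runtime compares, the result can be checked against `Regexp.new(abc16).inspect` and its `encoding` on MRI and on JRuby.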
This works on 1.7 HEAD, so I'll call it fixed in 1.7.26.
Not able to create a UTF-16LE encoded Regexp object. I tried the following in JRuby 1.7.19 in IRB: