You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
This test in MRI's test_m17n.rb fails because we kick the method name out. It appears that our logic for checking identifiers is not correct, and RubyLexer.identifier (and down stream) need updating.
TestM17N#test_nonascii_method_name:
Java::OrgJcodingsException::EncodingException: invalid code point value
org.jcodings.specific.BaseEUCJPEncoding.codeToMbcLength(BaseEUCJPEncoding.java:55)
org.jcodings.specific.EUCJPEncoding.codeToMbcLength(EUCJPEncoding.java:24)
org.jruby.lexer.yacc.RubyLexer.isMultiByteChar(RubyLexer.java:638)
org.jruby.lexer.yacc.RubyLexer.getIdentifier(RubyLexer.java:1365)
org.jruby.lexer.yacc.RubyLexer.identifier(RubyLexer.java:1779)
I started a patch to do the re-porting, but I'm not familiar enough with the lexer to complete it.
diff --git a/core/src/main/java/org/jruby/lexer/yacc/RubyLexer.java b/core/src/main/java/org/jruby/lexer/yacc/RubyLexer.java
index fa26107..4db7e14 100644
--- a/core/src/main/java/org/jruby/lexer/yacc/RubyLexer.java+++ b/core/src/main/java/org/jruby/lexer/yacc/RubyLexer.java@@ -621,7 +621,7 @@ public class RubyLexer {
* mri: is_identchar
*/
public boolean isIdentifierChar(int c) {
- return Character.isLetterOrDigit(c) || c == '_' || isMultiByteChar(c);+ return encoding.isAlnum(c) || c == '_' || !Encoding.isAscii(c);
}
public boolean isASCII(int c) {
@@ -1768,6 +1768,20 @@ public class RubyLexer {
}
private int identifier(int c, boolean commandState) throws IOException {
+ ByteList aggregate = new ByteList();+ do {+ if (!Encoding.isAscii(c)) mb = ENC_CODERANGE_UNKNOWN;+ if (tokenAddMBC(c, aggregate) == -1) return 0;+ c = nextToken();+ } while (isIdentifierChar(c));++ int result = 0;++ if (c == '!' || c == '?') {+ result = Tokens.tFID;+ } else {+ this+ }
if (!isIdentifierChar(c)) {
String badChar = "\\" + Integer.toOctalString(c & 0xff);
throw new SyntaxException(PID.CHARACTER_BAD, getPosition(), getCurrentLine(),
@@ -1791,8 +1805,7 @@ public class RubyLexer {
} else {
src.unread(c);
}
-- int result = 0;+
last_state = lex_state;
if (lastBangOrPredicate) {
The text was updated successfully, but these errors were encountered:
There may be other tests in test_m17n.rb (or elsewhere in our suites) that fail due to this. Probably should be fixed since it breaks MBC identifiers for many cases.
I am declaring this fixed since we no long die in the lexer. In fact we pass some of these tests now. The newer issue is that we convert chars to a Java string for methods but then when we call .name on a ruby method object we try and convert that Java string back to a Ruby string and the encodings info is lost. I am nearly 100% certain we already have an issue on this .name problem.
This test in MRI's test_m17n.rb fails because we kick the method name out. It appears that our logic for checking identifiers is not correct, and RubyLexer.identifier (and down stream) need updating.
I started a patch to do the re-porting, but I'm not familiar enough with the lexer to complete it.
The text was updated successfully, but these errors were encountered: