Parser seems to choke on Japanese-encoded text #3679

Closed
headius opened this issue Feb 18, 2016 · 3 comments

@headius
Member

headius commented Feb 18, 2016

Environment

JRuby on ruby-2.3 branch

Expected Behavior

The following script should parse and execute:

eval "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")

Actual Behavior

The parser gets an error from the jcodings library. This could mean the incoming text is improperly encoded, but that seems less likely than the parser not walking characters correctly.

$ jruby -e 'eval "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")'
Unhandled Java exception: org.jcodings.exception.EncodingException: invalid code point value
org.jcodings.exception.EncodingException: invalid code point value
      codeToMbcLength at org/jcodings/specific/BaseEUCJPEncoding.java:57
      codeToMbcLength at org/jcodings/specific/EUCJPEncoding.java:24
      isMultiByteChar at org/jruby/lexer/LexingCommon.java:253
     isIdentifierChar at org/jruby/lexer/LexingCommon.java:243
           identifier at org/jruby/lexer/yacc/RubyLexer.java:1446
                yylex at org/jruby/lexer/yacc/RubyLexer.java:1048
            nextToken at org/jruby/lexer/yacc/RubyLexer.java:336
              yyparse at org/jruby/parser/RubyParser.java:1618
              yyparse at org/jruby/parser/RubyParser.java:1569
                parse at org/jruby/parser/RubyParser.java:5359
                parse at org/jruby/parser/Parser.java:121
                parse at org/jruby/parser/Parser.java:77
            parseEval at org/jruby/Ruby.java:2793
            prepareIC at org/jruby/ir/interpreter/Interpreter.java:213
           evalCommon at org/jruby/ir/interpreter/Interpreter.java:168
      evalWithBinding at org/jruby/ir/interpreter/Interpreter.java:202
           evalCommon at org/jruby/RubyKernel.java:1006
               eval19 at org/jruby/RubyKernel.java:973
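
One way to test the "improperly encoded input" possibility mentioned above is to inspect the transcoded string before it reaches eval. The following is a minimal sketch, not part of the original report; the variable name `src` is only illustrative. If the string reports a valid encoding and the expected character and byte counts, the input itself is probably fine and the lexer is the more likely culprit.

```ruby
# Build the same source text the issue passes to eval (without evaluating it).
src = "C\u{30a8 30e9 30fc} = 1".encode("EUC-JP")

# A malformed transcoding result would report false here.
puts src.encoding.name     # EUC-JP
puts src.valid_encoding?   # true

# Each katakana character occupies two bytes in EUC-JP, so the character
# count and the byte count differ. A lexer that advances by the wrong
# width would land in the middle of a multibyte character.
puts src.length            # 8 characters
puts src.bytesize          # 11 bytes

# Dump the raw bytes to see the two-byte sequences after the leading "C".
puts src.bytes.map { |b| format("%02X", b) }.join(" ")
# 43 A5 A8 A5 E9 A1 BC 20 3D 20 31
```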

A different but possibly related issue occurs with ISO-2022-JP:

$ jruby -e 'eval "class C\u{30a8 30e9 30fc} < RuntimeError; self; end".encode("ISO-2022-JP")'
SyntaxError: (eval):1: Invalid char `\33' (') in expression
class CB%(%i!< < RuntimeError; self; end
   eval at org/jruby/RubyKernel.java:973
  <top> at -e:1
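
The `\33` in that message is the octal form of the ESC byte (0x1B). ISO-2022-JP is a stateful encoding that switches character sets with escape sequences, and Ruby marks it as a "dummy" encoding. The sketch below is an illustration rather than part of the original report (the name `src` is an assumption); it shows that the transcoded source contains ESC bytes and otherwise consists of 7-bit ASCII, which matches the `%(%i!<` text echoed in the error.

```ruby
# Build the same source text the issue passes to eval (without evaluating it).
src = "class C\u{30a8 30e9 30fc} < RuntimeError; self; end".encode("ISO-2022-JP")

# Ruby flags stateful encodings such as ISO-2022-JP as dummy encodings,
# meaning character handling for them is not fully implemented.
puts src.encoding.dummy?          # true

# The katakana are wrapped in escape sequences, so the byte stream
# contains ESC (0x1B, octal 033), the character the lexer rejects.
puts src.bytes.include?(0x1B)     # true

# Viewed as raw bytes, the katakana payload is plain printable ASCII.
puts src.b.include?("%(%i!<")     # true
puts src.b.inspect
```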

This affects the Ruby 2.3 test TestException#test_errinfo_encoding_in_debug.

@enebo
Member

enebo commented Mar 1, 2016

This was fixed as part of ruby-2.3 branch and I am too lazy to figure out which commit (parser had many many changes recently).

enebo closed this as completed Mar 1, 2016
@headius
Member Author

headius commented Mar 3, 2016

I attempted to untag the specs I tagged, but there are other issues that prevent them from passing (exception messages only supporting Unicode, for one).

@enebo
Member

enebo commented Mar 3, 2016

@headius ok. So things parse, but we likely snag later when we need a proper Java string.
