JRuby 9.1.13.0 can't parse UTF-16 code with globals (both LE and BE) #4782

Open
slawo-ch opened this issue Sep 7, 2017 · 1 comment

slawo-ch commented Sep 7, 2017

Environment

jruby-complete-9.1.13.0.jar
macOS Sierra

Expected Behavior

RubyLexer should be able to provide all tokens in all supported charsets.

Actual Behavior

RubyLexer throws a SyntaxException (raised through compile_error) when it encounters a global variable in UTF-16 encoded source, both LE and BE.

Test:

import org.jcodings.Encoding;
import org.jruby.Ruby;
import org.jruby.common.NullWarnings;
import org.jruby.lexer.ByteListLexerSource;
import org.jruby.lexer.LexerSource;
import org.jruby.lexer.LexingCommon;
import org.jruby.lexer.yacc.RubyLexer;
import org.jruby.parser.ParserConfiguration;
import org.jruby.parser.ParserSupport;
import org.jruby.parser.RubyParserResult;
import org.jruby.util.ByteList;
import org.junit.Test;

import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class LexerTest {

  private RubyLexer makeLexer(LexerSource lexerSource){
    ParserSupport parserSupport = new ParserSupport();
    RubyLexer lexer = new RubyLexer(parserSupport, lexerSource, new NullWarnings(Ruby.getGlobalRuntime()));
    parserSupport.setLexer(lexer);
    parserSupport.setConfiguration(new ParserConfiguration(Ruby.getGlobalRuntime(), 0, false, true, false));
    parserSupport.setResult(new RubyParserResult());
    parserSupport.setWarnings(new NullWarnings(Ruby.getGlobalRuntime()));
    parserSupport.initTopLocalVariables();

    lexer.setState(LexingCommon.EXPR_BEG);
    return lexer;
  }

  // Encodes the code with the given charset and pulls tokens from the lexer until EOF.
  public void lexIt(String code, Encoding encoding, Charset charset) throws Exception {
    LexerSource lexerSource = new ByteListLexerSource("test", 0, new ByteList(code.getBytes(charset)), null);
    // Tell the lexer which jcodings Encoding the raw bytes are in.
    lexerSource.setEncoding(encoding);

    RubyLexer lexer = makeLexer(lexerSource);
    while (!lexer.eofp) {
      lexer.nextToken();
    }
  }

  @Test
  public void test_UTF8_no_globals() throws Exception {
    lexIt("puts 'Hello World'", Encoding.load("UTF8"), StandardCharsets.UTF_8);
  }
  @Test
  public void test_UTF8_with_globals() throws Exception {
    lexIt("puts $PROGRAM_NAME", Encoding.load("UTF8"), StandardCharsets.UTF_8);
  }

  @Test
  public void test_UTF16LE_no_globals() throws Exception {
    lexIt("puts 'Hello World'", Encoding.load("UTF16LE"), StandardCharsets.UTF_16LE);
  }

  @Test
  public void test_UTF16LE_with_globals() throws Exception {
    lexIt("puts $PROGRAM_NAME", Encoding.load("UTF16LE"), StandardCharsets.UTF_16LE);
  }

  @Test
  public void test_UTF16BE_no_globals() throws Exception {
    lexIt("puts 'Hello World'", Encoding.load("UTF16BE"), StandardCharsets.UTF_16BE);
  }

  @Test
  public void test_UTF16BE_with_globals() throws Exception {
    lexIt("puts $PROGRAM_NAME", Encoding.load("UTF16BE"), StandardCharsets.UTF_16BE);
  }
  
}

Gist of test: https://gist.github.com/slawo-ch/8a412cf76015af7963213f2de183692c

Test output is:
org.jruby.lexer.yacc.SyntaxException: `$�' is not allowed as a global variable name
puts $PROGRAM_NAME

	at org.jruby.lexer.yacc.RubyLexer.compile_error(RubyLexer.java:331)
	at org.jruby.lexer.yacc.RubyLexer.dollar(RubyLexer.java:1379)
	at org.jruby.lexer.yacc.RubyLexer.yylex(RubyLexer.java:1047)
	at org.jruby.lexer.yacc.RubyLexer.nextToken(RubyLexer.java:347)
	at LexerTest.lexIt(LexerTest.java:38)
	at LexerTest.test_UTF16LE_with_globals(LexerTest.java:59)
...


org.jruby.lexer.yacc.SyntaxException: `$�' is not allowed as a global variable name
puts $PROGRAM_NAME

	at org.jruby.lexer.yacc.RubyLexer.compile_error(RubyLexer.java:331)
	at org.jruby.lexer.yacc.RubyLexer.dollar(RubyLexer.java:1379)
	at org.jruby.lexer.yacc.RubyLexer.yylex(RubyLexer.java:1047)
	at org.jruby.lexer.yacc.RubyLexer.nextToken(RubyLexer.java:347)
	at LexerTest.lexIt(LexerTest.java:38)
	at LexerTest.test_UTF16BE_with_globals(LexerTest.java:69)
...
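
The stack traces above both end in RubyLexer.dollar, so it may help to look at the raw bytes that surround the `$` in each encoding. The sketch below is only an illustration and does not touch JRuby: the class name DollarByteDemo and the helper byteAfterDollar are hypothetical, and the idea that the lexer inspects the single byte after `$` is an assumption, not something confirmed in this thread. It uses only java.nio.charset to show which byte follows the first 0x24 byte.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of JRuby.
public class DollarByteDemo {

  // Returns the unsigned value of the byte that immediately follows the
  // first 0x24 ('$') byte in the encoded source, or -1 if there is none.
  static int byteAfterDollar(String code, Charset charset) {
    byte[] bytes = code.getBytes(charset);
    for (int i = 0; i < bytes.length - 1; i++) {
      if (bytes[i] == '$') {
        return bytes[i + 1] & 0xFF;
      }
    }
    return -1;
  }

  public static void main(String[] args) {
    String code = "puts $PROGRAM_NAME";
    // In UTF-8 the next byte is 'P' (0x50).
    System.out.printf("UTF-8:    0x%02X%n", byteAfterDollar(code, StandardCharsets.UTF_8));
    // In UTF-16LE '$' is encoded as 24 00, so the next byte is 0x00.
    System.out.printf("UTF-16LE: 0x%02X%n", byteAfterDollar(code, StandardCharsets.UTF_16LE));
    // In UTF-16BE '$' is encoded as 00 24 and 'P' as 00 50, so the next byte is again 0x00.
    System.out.printf("UTF-16BE: 0x%02X%n", byteAfterDollar(code, StandardCharsets.UTF_16BE));
  }
}
```

If the lexer does look at one byte rather than one decoded character at that point, a 0x00 after `$` would be consistent with the unprintable character in the error message, but that remains a guess until someone checks RubyLexer.dollar.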

headius commented Sep 7, 2017

MRI does appear to be able to run files in UTF-16. This is likely a problem with how we handle the encodings of symbols, for which there are numerous other bugs and an in-progress branch. Thoughts, @enebo?
