Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

A DStrNode is generated when parsing squigly heredocs #4172

Closed
jsyeo opened this issue Sep 23, 2016 · 11 comments
Closed

A DStrNode is generated when parsing squigly heredocs #4172

jsyeo opened this issue Sep 23, 2016 · 11 comments

Comments

@jsyeo
Copy link
Contributor

jsyeo commented Sep 23, 2016

When parsing squigly heredocs, a DStrNode is produced if there are spaces between the first heredoc line and the content. Like for example: "<<~TEST\n hello\nTEST". If I try to parse it without any spaces, only the StrNode is produced.

[23] pry(main)> parse("<<~TEST\nhello\nTEST").map { |a| a }
=> [#<Java::OrgJrubyparserAst::RootNode:0x33617539>, #<Java::OrgJrubyparserAst::StrNode:0x5db4c359>]
[24] pry(main)> parse("<<~TEST\n hello\nTEST").map { |a| a }
=> [#<Java::OrgJrubyparserAst::RootNode:0x7997b197>,
 #<Java::OrgJrubyparserAst::DStrNode:0x460f76a6>,
 #<Java::OrgJrubyparserAst::StrNode:0x11acdc30>,
 #<Java::OrgJrubyparserAst::StrNode:0x4a8ab068>]

Environment

» jruby --version
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 Java HotSpot(TM) 64-Bit Server VM 25.102-b14 on 1.8.0_102-b14 +jit [darwin-x86_64]
@jsyeo
Copy link
Contributor Author

jsyeo commented Sep 23, 2016

Here's the output from ast

$ cat heredoc.rb
<<~TEST
 hello
TEST
$ ast heredoc.rb
AST:
RootNode 0
  DStrNode 0
    StrNode 0
    StrNode 1

$ cat nospace_heredoc.rb
<<~TEST
hello
TEST
$ ast nospace_heredoc.rb
AST:
RootNode 0
  StrNode 0

@enebo
Copy link
Member

enebo commented Sep 23, 2016

@jsyeo What is the bug? Or what problem are you seeing with us processing tilde heredocs this way? MRI I believe also does this same thing internally.

@headius
Copy link
Member

headius commented Sep 23, 2016

@enebo I'm not sure if -y tells us what we want to know, but MRI appears to parse both of these identically. I think the concern here is that JRuby's creating a DStr for a non-dynamic string only when there's spaces ahead of the first heredoc line.

@enebo
Copy link
Member

enebo commented Sep 24, 2016

@headius I don't think -y sheds any light on this. It shows the lex tokens and which productions are hit. It does not report on the actual tree nodes generated. Both JRuby and MRI have a heredoc_dedent method in the lexer which keeps reading each line and concat'ing it as elements on a DStr. (update: I think MRI changed this from dstr + str concats to using memmove at some point?).

So my original question might be answered by @headius which is that this may not build in IR as efficient as it could, but I am not positive this is why @jsyeo is asking. He might just feel the AST is not what he wants for some tooling possibly?

I guess regardless how it is asked I can probably collapse it into a single string literal but I still sort of need to know whether AST is important or not? If not then I think I would fix it in IRBuilder as it will be a simpler fix.

@jsyeo
Copy link
Contributor Author

jsyeo commented Sep 24, 2016

He might just feel the AST is not what he wants for some tooling possibly?

Yep. I've forked the jruby-parser and updated the code with the main line parser and I was writing specs for it. While I was writing specs for #4173, I wrote something like this:

describe Parser do
  it 'parses squiggly heredoc with single quotes' do
    code = "obj = <<~'TEST'
                  hello
            TEST"
    str_node = parse(code).find_node.tap do |s|
      expect(str_node).to_not be_nil
      expect(str_node.value).to eq "hello\n"
    end
  end
end

This spec fails because the first StrNode that was found contains an empty string instead of "hello". It turns out that the parser was generating a DStrNode that wraps around two StrNodes and the string value is in the second StrNode and I don't think that's what one would expect if someone is traversing the AST.

@jsyeo
Copy link
Contributor Author

jsyeo commented Sep 24, 2016

I guess regardless how it is asked I can probably collapse it into a single string literal but I still sort of need to know whether AST is important or not?

I guess from a higher level point of view, as long as the semantics are right, we don't really need the generated AST to be correct. However, I think Ripper might not be behaving in the same way as MRI but I couldn't test this hypothesis. >_< I am having problems using Ripper to lex squigly heredocs. I have reported the issue in #4176.

@enebo
Copy link
Member

enebo commented Sep 24, 2016

@jsyeo ok thanks for the update. We can fix this in the AST so that it lines up and I guess we shall see how ripper should behave once it is working.

@headius
Copy link
Member

headius commented Sep 26, 2016

I've forked the jruby-parser and updated the code with the main line parser

Nice! Hopefully you can turn this into a PR to update jruby-parser?

@headius
Copy link
Member

headius commented Sep 26, 2016

Perhaps this issue should be moved to jruby/jruby-parser, since for standard JRuby uses, nobody will ever see that this is a DStr?

@jsyeo
Copy link
Contributor Author

jsyeo commented Sep 27, 2016

We can fix this in the AST so that it lines up and I guess we shall see how ripper should behave once it is working.

Now that ripper is working, I can confirm that this does not affect ripper. I will close this issue and move it to jruby/jruby-parser.

» bin/jruby -e "require 'ripper'; p Ripper.lex(File.read('heredoc.rb'))"
[[[1, 0], :on_heredoc_beg, "<<~TEST"], [[1, 7], :on_nl, "\n"], [[2, 1], :on_tstring_content, "hello\n"], [[3, 0], :on_heredoc_end, "TEST\n"]]
» bin/jruby -e "require 'ripper'; p Ripper.lex(File.read('heredoc_nospace.rb'))"
[[[1, 0], :on_heredoc_beg, "<<~TEST"], [[1, 7], :on_nl, "\n"], [[2, 0], :on_tstring_content, "hello\n"], [[3, 0], :on_heredoc_end, "TEST\n"]]

@enebo
Copy link
Member

enebo commented Sep 27, 2016

@jsyeo thanks for submitting a PR to jruby-parser. I also think if you have n lines in the heredoc each line gets emitted in ripper. So that might be worth checking to make sure they are the same.

@enebo enebo added this to the Invalid or Duplicate milestone Nov 9, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants