Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Identical Regexp are not equal to each other #4259

Closed
andy-twosticks opened this issue Nov 1, 2016 · 8 comments
Closed

Identical Regexp are not equal to each other #4259

andy-twosticks opened this issue Nov 1, 2016 · 8 comments

Comments

@andy-twosticks
Copy link

andy-twosticks commented Nov 1, 2016

Environment

%: uname -a
Linux centos7andy.jhallpr.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

%: jruby -v
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 OpenJDK 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 [linux-x86_64]

Expected Behavior

"Things equal to the same thing are equal to each other" -- Euclid

Regardless of how a Regexp is created or what it contains, identical Regexp should always be equal to each other.

The behaviour below turns up in MRI 1.9.3, too, but appears to be gone by 2.0.0:

%: rvm use ruby-2.0.0 
Using /home/jonea/.rvm/gems/ruby-2.0.0-p648 

%: irb               
2.0.0-p648 :001 > %r|foo/bar| == /foo\/bar/             
 => true                                                

%: rvm use ruby-1.9.3   
Using /home/jonea/.rvm/gems/ruby-1.9.3-p551             

%: irb               
1.9.3-p551 :001 > %r|foo/bar| == /foo\/bar/             
 => false                                               

Actual Behavior

%: jirb
:001 > /foo\/bar/ == Regexp.new("foo/bar")
 => false

:002 > /foo\/bar/ == Regexp.new("foo\/bar")
=> false

:003 > /foo\/bar/ == %r|foo/bar|
=> false

:004 > %r|foo/bar| == Regexp.new("foo\/bar")
=> true

 :005 >  /foobar/ == %r|foobar|
 => true

As you can see, the problem seems to be with Regexp created using the basic slash syntax when the regexp itself also contains a slash.

@headius
Copy link
Member

headius commented Nov 1, 2016

@andy-twosticks Nice find! This would point to a difference in how the parser constructs Regex versus how the constructor forms construct Regex.

@headius headius added the parser label Nov 1, 2016
@headius
Copy link
Member

headius commented Nov 1, 2016

@enebo Any idea why this might be? Something odd with the escaping?

@enebo
Copy link
Member

enebo commented Nov 1, 2016

@headius My only guess would be that bytelist after parsing does not remove the \ and it does it in the constructor of the regexp. That would make the bytelists different perhaps?

@headius
Copy link
Member

headius commented Nov 1, 2016

It looks like the sources are difference in these two cases.

[] ~/projects/jruby $ jruby -e 'p /foo\/bar/.source'
"foo\\/bar"

[] ~/projects/jruby $ jruby -e 'p %r[foo/bar].source'
"foo/bar"

@headius
Copy link
Member

headius commented Nov 1, 2016

Ok, so stepping through the parser showed me that it's constructing the // form using foo\/bar in a ByteList and the %r[] form using foo/bar. Both proceed along the same path from there, and the compiled Regexps that result are the same. Only the sources differ.

Seems like either the one needs to be escaped or the other needs to be unescaped before constructing the Regexp.

@headius
Copy link
Member

headius commented Nov 1, 2016

Ruby 2.3 for comparison:

[] ~/projects/jruby $ ruby23 -e 'p %r[foo/bar].source'
"foo/bar"

[] ~/projects/jruby $ ruby23 -e 'p /foo\/bar/.source'
"foo/bar"

@headius
Copy link
Member

headius commented Nov 1, 2016

A bit more comparison with MRI:

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("/foo\/bar/")'
["/", "foo", "/bar", "/"]

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("%r[foo/bar]")'
["%r[", "foo/bar", "]"]

We produce the same tokens, but when combining those string pieces we seem to be escaping the / again. Ripper isn't exactly the same as the normal parser, but it should be close.

@enebo enebo added this to the JRuby 9.1.6.0 milestone Nov 2, 2016
@enebo enebo closed this as completed in 49bd27d Nov 2, 2016
@enebo
Copy link
Member

enebo commented Nov 2, 2016

So there was an extra append '' in our StringTerm code which MRI did not have nor did our ripper version of StringTerm. So pretty easy to delete that line. Spec added in commit b0abc53.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants