Identical Regexp are not equal to each other #4259

andy-twosticks · 2016-11-01T10:21:35Z

Environment

%: uname -a
Linux centos7andy.jhallpr.com 3.10.0-327.18.2.el7.x86_64 #1 SMP Thu May 12 11:03:55 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

%: jruby -v
jruby 9.1.5.0 (2.3.1) 2016-09-07 036ce39 OpenJDK 64-Bit Server VM 25.91-b14 on 1.8.0_91-b14 [linux-x86_64]

Expected Behavior

"Things equal to the same thing are equal to each other" -- Euclid

Regardless of how a Regexp is created or what it contains, identical Regexp should always be equal to each other.

The behaviour below turns up in MRI 1.9.3, too, but appears to be gone by 2.0.0:

%: rvm use ruby-2.0.0 
Using /home/jonea/.rvm/gems/ruby-2.0.0-p648 

%: irb               
2.0.0-p648 :001 > %r|foo/bar| == /foo\/bar/             
 => true                                                

%: rvm use ruby-1.9.3   
Using /home/jonea/.rvm/gems/ruby-1.9.3-p551             

%: irb               
1.9.3-p551 :001 > %r|foo/bar| == /foo\/bar/             
 => false

Actual Behavior

%: jirb
:001 > /foo\/bar/ == Regexp.new("foo/bar")
 => false

:002 > /foo\/bar/ == Regexp.new("foo\/bar")
=> false

:003 > /foo\/bar/ == %r|foo/bar|
=> false

:004 > %r|foo/bar| == Regexp.new("foo\/bar")
=> true

 :005 >  /foobar/ == %r|foobar|
 => true

As you can see, the problem seems to be with Regexp created using the basic slash syntax when the regexp itself also contains a slash.

The text was updated successfully, but these errors were encountered:

headius · 2016-11-01T17:44:21Z

@andy-twosticks Nice find! This would point to a difference in how the parser constructs Regex versus how the constructor forms construct Regex.

headius · 2016-11-01T17:44:46Z

@enebo Any idea why this might be? Something odd with the escaping?

enebo · 2016-11-01T18:54:46Z

@headius My only guess would be that bytelist after parsing does not remove the \ and it does it in the constructor of the regexp. That would make the bytelists different perhaps?

headius · 2016-11-01T21:13:21Z

It looks like the sources are difference in these two cases.

[] ~/projects/jruby $ jruby -e 'p /foo\/bar/.source'
"foo\\/bar"

[] ~/projects/jruby $ jruby -e 'p %r[foo/bar].source'
"foo/bar"

headius · 2016-11-01T21:19:31Z

Ok, so stepping through the parser showed me that it's constructing the // form using foo\/bar in a ByteList and the %r[] form using foo/bar. Both proceed along the same path from there, and the compiled Regexps that result are the same. Only the sources differ.

Seems like either the one needs to be escaped or the other needs to be unescaped before constructing the Regexp.

headius · 2016-11-01T21:21:09Z

Ruby 2.3 for comparison:

[] ~/projects/jruby $ ruby23 -e 'p %r[foo/bar].source'
"foo/bar"

[] ~/projects/jruby $ ruby23 -e 'p /foo\/bar/.source'
"foo/bar"

headius · 2016-11-01T21:29:27Z

A bit more comparison with MRI:

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("/foo\/bar/")'
["/", "foo", "/bar", "/"]

[] ~/projects/jruby $ ruby23 -rpp -rripper -e 'pp Ripper.tokenize("%r[foo/bar]")'
["%r[", "foo/bar", "]"]

We produce the same tokens, but when combining those string pieces we seem to be escaping the / again. Ripper isn't exactly the same as the normal parser, but it should be close.

enebo · 2016-11-02T15:07:23Z

So there was an extra append '' in our StringTerm code which MRI did not have nor did our ripper version of StringTerm. So pretty easy to delete that line. Spec added in commit b0abc53.

kares added core beginner JRuby 9000 labels Nov 1, 2016

headius added the parser label Nov 1, 2016

enebo added this to the JRuby 9.1.6.0 milestone Nov 2, 2016

enebo closed this as completed in 49bd27d Nov 2, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Identical Regexp are not equal to each other #4259

Identical Regexp are not equal to each other #4259

andy-twosticks commented Nov 1, 2016 •

edited

Loading

headius commented Nov 1, 2016

headius commented Nov 1, 2016

enebo commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

enebo commented Nov 2, 2016

Identical Regexp are not equal to each other #4259

Identical Regexp are not equal to each other #4259

Comments

andy-twosticks commented Nov 1, 2016 • edited Loading

Environment

Expected Behavior

Actual Behavior

headius commented Nov 1, 2016

headius commented Nov 1, 2016

enebo commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

headius commented Nov 1, 2016

enebo commented Nov 2, 2016

andy-twosticks commented Nov 1, 2016 •

edited

Loading