Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Error when matching regex in multiple threads #3670

Closed
marshalium opened this issue Feb 15, 2016 · 17 comments
Closed

Error when matching regex in multiple threads #3670

marshalium opened this issue Feb 15, 2016 · 17 comments

Comments

@marshalium
Copy link
Contributor

1.7.24 and 9.0.5.0 both have a pretty serious regression when matching against utf8 strings in multiple threads at the same time. This bug does not appear to be present in 1.7.23 or 9.0.4.0.

If multiple threads are running code like this, with completely unshared string and regex objects:

str = "foobar"
str.force_encoding("UTF-8")
str.gsub(/foo/i, '')

eventually one of the threads will throw an error like this:

Exception in thread "Ruby-0-Thread-6: ./recreate_utf8_bug.rb:3" java.lang.ArrayIndexOutOfBoundsException: 6
  at org.jcodings.specific.BaseUTF8Encoding.mbcCaseFold(BaseUTF8Encoding.java:167)
  at org.jcodings.specific.UTF8Encoding.mbcCaseFold(UTF8Encoding.java:24)
  at org.joni.SearchAlgorithm$SLOW_IC.lowerCaseMatch(SearchAlgorithm.java:238)
  at org.joni.SearchAlgorithm$SLOW_IC.search(SearchAlgorithm.java:206)
  at org.joni.Matcher.forwardSearchRange(Matcher.java:140)
  at org.joni.Matcher.searchInterruptible(Matcher.java:451)
  at org.jruby.RubyRegexp$SearchMatchTask.run(RubyRegexp.java:273)
  at org.jruby.RubyThread.executeBlockingTask(RubyThread.java:1066)
  at org.jruby.RubyRegexp.matcherSearch(RubyRegexp.java:235)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3123)
  at org.jruby.RubyString.gsubCommon19(RubyString.java:3106)
  at org.jruby.RubyString.gsub19(RubyString.java:3101)
  at org.jruby.RubyString.gsub19(RubyString.java:3069)
  at org.jruby.RubyString$INVOKER$i$gsub19.call(RubyString$INVOKER$i$gsub19.gen)
  at org.jruby.internal.runtime.methods.JavaMethod$JavaMethodOneOrTwoOrNBlock.call(JavaMethod.java:367)
  at org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
  at $_dot_.recreate_utf8_bug.block_2$RUBY$__file__(./recreate_utf8_bug.rb:13)
  at $_dot_$recreate_utf8_bug$block_2$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_2$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yieldSpecificInternal(CompiledBlock19.java:117)
  at org.jruby.runtime.CompiledBlock19.yieldSpecific(CompiledBlock19.java:92)
  at org.jruby.runtime.Block.yieldSpecific(Block.java:111)
  at org.jruby.RubyFixnum.times(RubyFixnum.java:275)
  at org.jruby.RubyFixnum$INVOKER$i$0$0$times.call(RubyFixnum$INVOKER$i$0$0$times.gen)
  at org.jruby.runtime.callsite.CachingCallSite.callBlock(CachingCallSite.java:143)
  at org.jruby.runtime.callsite.CachingCallSite.callIter(CachingCallSite.java:154)
  at $_dot_.recreate_utf8_bug.chained_0_rescue_1$RUBY$SYNTHETIC__file__(./recreate_utf8_bug.rb:10)
  at $_dot_.recreate_utf8_bug.block_1$RUBY$__file__(./recreate_utf8_bug.rb:9)
  at $_dot_$recreate_utf8_bug$block_1$RUBY$__file__.call($_dot_$recreate_utf8_bug$block_1$RUBY$__file__)
  at org.jruby.runtime.CompiledBlock19.yield(CompiledBlock19.java:159)
  at org.jruby.runtime.CompiledBlock19.call(CompiledBlock19.java:87)
  at org.jruby.runtime.Block.call(Block.java:101)
  at org.jruby.RubyProc.call(RubyProc.java:300)
  at org.jruby.RubyProc.call(RubyProc.java:230)
  at org.jruby.internal.runtime.RubyRunnable.run(RubyRunnable.java:99)
  at java.lang.Thread.run(Thread.java:745)

The error doesn't occur if I remove the call to #force_encoding or if I remove the "i" flag from the regex.

See this gist for a full reproducible test case and examples of running it on different JRuby versions:
https://gist.github.com/marshalium/3e62c2affbd2ce95757f

@enebo enebo added this to the JRuby 1.7.25 milestone Feb 15, 2016
@enebo enebo added the core label Feb 15, 2016
@rsim
Copy link

rsim commented Feb 22, 2016

We are experiencing this issue as well and it resulted in a large number of corrupted data in YAML serialized ActiveRecord columns.

@rsim
Copy link

rsim commented Feb 22, 2016

Based on the original example, I created a simple test case which always fails with ArrayIndexOutOfBoundsException is some threads:

10.times.map{Thread.new{100000.times{"foobar".force_encoding("UTF-8") =~ /foo/i}}}.map(&:value)

I would recommend that http://jruby.org/2016/01/20/jruby-1-7-24 is updated with this critical bug information and with recommendation not to use this version in multi-threaded production applications. This bug most probably will not be caught by application test suites and might cause critical problems in production.

lopex added a commit to jruby/joni that referenced this issue Feb 22, 2016
@lopex
Copy link
Contributor

lopex commented Feb 22, 2016

Temporary fix is in jruby/joni@f23bffa by making SLOW_IC stateless and moving the cached buffer into matcher instance. In the future we need to get rid of those stateful caches and IntHolder by making them local.
@headius @enebo No idea why it might behave as a regression, this was always broken.

@lopex
Copy link
Contributor

lopex commented Feb 23, 2016

Commit that revealed that bug in EXACT_IC: jruby/joni@eda1681

@headius
Copy link
Member

headius commented Feb 23, 2016

@lopex Perhaps the caching of Regex was not as good and something got fixed around these releases? That would trigger more sharing of this field, wouldn't it?

@lopex
Copy link
Contributor

lopex commented Feb 23, 2016

@headius jruby/joni@eda1681 changed the way heuristics applied fast skipping / fast failing by favouring EXACT_IC for this very case. EXACT_IC had a buffer as an instance field and since it itself is scoped by Regexp instance, the buffer essentially turned out to be Regexp global (in case of a literal regexp). The fix just moved that buffer into Matcher instance for potential reuse when jumping in/out of bytecode interpreter. So caching of regexps never had any effect on this.

@headius
Copy link
Member

headius commented Feb 24, 2016

I updated ruby-2.3 to use new joni, but we'll need a release for 9.1.

@headius
Copy link
Member

headius commented Feb 24, 2016

Oh, and we'll need to apply it to 1.7 of course.

@ncuesta
Copy link

ncuesta commented Mar 2, 2016

Hi all, we are experiencing this same issue. Which is the recommended workaround until new versions including a fix for this are released? Should we downgrade to 9.0.4.0 (or 1.7.23 for the 1.7 branch)?

Thanks

@enebo
Copy link
Member

enebo commented Mar 2, 2016

@ncuesta downgrading I think is the safest bet until we put out new releases. We will hopefully have a new release out in the next week to 2 weeks at most. At this point we are tidying up after the last dev cycle...

@ncuesta
Copy link

ncuesta commented Mar 2, 2016

Thanks @enebo!

@colinsurprenant
Copy link
Contributor

we also started to notice this in the new logstash 2.3 release which includes 1.7.24, in useragent filter logstash-plugins/logstash-filter-useragent#25 and we are tracking this regex issue in elastic/logstash#4977

@enebo
Copy link
Member

enebo commented Apr 11, 2016

joni has been updated and released for a few weeks with this addressed. Closing.

@enebo enebo closed this as completed Apr 11, 2016
@MartinKoerner
Copy link

Will there be a new 9.0.x release for this fix or do we have to wait until 9.1?

The JDBC MSSQL adapter code is full of //i Regex expressions, which produce 'nice' exceptions like 'no columns for table: ' or 'undefined method `[]' for nil:NilClass'

@enebo
Copy link
Member

enebo commented Apr 18, 2016

@MartinKoerner There will only be a 9.1 release which should be this week...

@enebo
Copy link
Member

enebo commented May 11, 2016

@DzmitryNikitsin I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com

@DzmitryNikitsin
Copy link

Yes, issue was fixed on 9.1.0.0.

Thanks,
Dzmitry

From: Thomas E Enebo notifications@github.com
Reply-To: jruby/jruby reply@reply.github.com
Date: Wednesday, May 11, 2016 at 2:30 PM
To: jruby/jruby jruby@noreply.github.com
Cc: Dzmitry Nikitsin dzmitry.nikitsin@gmail.com, Comment comment@noreply.github.com
Subject: Re: [jruby/jruby] Error when matching regex in multiple threads (#3670)

I see this comment sent to me as email but not in github issues. Was this
comment withdrawn?

On Fri, May 6, 2016 at 9:15 PM, DzmitryNikitsin notifications@github.com
wrote:

@enebo https://github.com/enebo i've just updated to 9.1.0.0, using
regexp match with multithreading, still seeing the same issue :(


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#3670 (comment)

blog: http://blog.enebo.com twitter: tom_enebo
mail: tom.enebo@gmail.com


You are receiving this because you commented.
Reply to this email directly or view it on GitHub

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants