Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory leak in JRuby 1.7.25 #4232

Closed
JasonLunn opened this issue Oct 17, 2016 · 7 comments
Closed

Memory leak in JRuby 1.7.25 #4232

JasonLunn opened this issue Oct 17, 2016 · 7 comments

Comments

@JasonLunn
Copy link
Contributor

JasonLunn commented Oct 17, 2016

Environment

Provide at least:

  • JRuby 1.7.25
  • Tomcat 7
  • AWS Linux Linux 4.1.17-22.30.amzn1.x86_64

Other relevant info you may wish to add:

  • httparty 0.13.7
  • Rails 4.2.7

Expected Behavior

  • JRuby 1.7.25 would be a suitable upgrade for an application that was already running JRuby 1.7.19. The application is a high throughput (~60K/request per minute) HTTPS Proxy

Actual Behavior

  • The application began exhibiting OoME's ~10 minutes after startup following an upgrade from JRuby 1.1.19 to 1.7.25 - in a Tomcat process allocated 10 GB of RAM. Reverting only the JRuby upgrade eliminated the problem. This stack trace was common in our log:
Java::JavaLang::OutOfMemoryError (GC overhead limit exceeded):
   sun.security.x509.RDN.<init>(RDN.java:254)
   sun.security.x509.X500Name.parseDER(X500Name.java:803)
   sun.security.x509.X500Name.<init>(X500Name.java:317)
   javax.security.auth.x500.X500Principal.<init>(X500Principal.java:212)
   org.bouncycastle.jcajce.provider.asymmetric.x509.X509CertificateObject.getSubjectX500Principal(Unknown Source)
   org.jruby.ext.openssl.x509store.X509AuxCertificate.getSubjectX500Principal(X509AuxCertificate.java:206)
   org.jruby.ext.openssl.x509store.Certificate.matches(Certificate.java:56)
   org.jruby.ext.openssl.x509store.Store.matchedObject(Store.java:315)
   org.jruby.ext.openssl.x509store.Store.addCertificate(Store.java:288)
   org.jruby.ext.openssl.x509store.Lookup.loadCertificateOrCRLFile(Lookup.java:331)
   org.jruby.ext.openssl.x509store.Lookup$ByFile.call(Lookup.java:525)
   org.jruby.ext.openssl.x509store.Lookup$ByFile.call(Lookup.java:508)
   org.jruby.ext.openssl.x509store.Lookup.control(Lookup.java:154)
   org.jruby.ext.openssl.x509store.Lookup.loadFile(Lookup.java:123)
   org.jruby.ext.openssl.x509store.Store.setDefaultPaths(Store.java:371)
   org.jruby.ext.openssl.X509Store.set_default_paths(X509Store.java:181)
   org.jruby.ext.openssl.X509Store$INVOKER$i$0$0$set_default_paths.call(X509Store$INVOKER$i$0$0$set_default_paths.gen)
   org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
   org.jruby.ast.CallNoArgNode.interpret(CallNoArgNode.java:60)
   org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
   org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
   org.jruby.ast.IfNode.interpret(IfNode.java:118)
   org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
   org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
   org.jruby.ast.IfNode.interpret(IfNode.java:116)
   org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
   org.jruby.ast.BlockNode.interpret(BlockNode.java:71)
   org.jruby.ast.IfNode.interpret(IfNode.java:116)
   org.jruby.ast.NewlineNode.interpret(NewlineNode.java:105)
   org.jruby.evaluator.ASTInterpreter.INTERPRET_METHOD(ASTInterpreter.java:74)
   org.jruby.internal.runtime.methods.InterpretedMethod.call(InterpretedMethod.java:225)
   org.jruby.internal.runtime.methods.DefaultMethod.call(DefaultMethod.java:222)

as was this one:

Java::JavaLang::OutOfMemoryError (GC overhead limit exceeded):
   java.util.Arrays.copyOf(Arrays.java:2367)
   java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
   java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
   java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:415)
   java.lang.StringBuilder.append(StringBuilder.java:132)
   sun.security.x509.AVA.toRFC2253CanonicalString(AVA.java:1003)
   sun.security.x509.RDN.toRFC2253StringInternal(RDN.java:452)
   sun.security.x509.RDN.toRFC2253String(RDN.java:431)
   sun.security.x509.X500Name.getRFC2253CanonicalName(X500Name.java:730)
   sun.security.x509.X500Name.equals(X500Name.java:417)
   javax.security.auth.x500.X500Principal.equals(X500Principal.java:461)
   org.jruby.ext.openssl.x509store.Certificate.matches(Certificate.java:56)
   org.jruby.ext.openssl.x509store.Store.matchedObject(Store.java:315)
   org.jruby.ext.openssl.x509store.Store.addCertificate(Store.java:288)
   org.jruby.ext.openssl.x509store.Lookup.loadCertificateOrCRLFile(Lookup.java:331)
   org.jruby.ext.openssl.x509store.Lookup$ByFile.call(Lookup.java:525)
   org.jruby.ext.openssl.x509store.Lookup$ByFile.call(Lookup.java:508)
   org.jruby.ext.openssl.x509store.Lookup.control(Lookup.java:154)
   org.jruby.ext.openssl.x509store.Lookup.loadFile(Lookup.java:123)
   org.jruby.ext.openssl.x509store.Store.setDefaultPaths(Store.java:371)
   org.jruby.ext.openssl.X509Store.set_default_paths(X509Store.java:181)
   org.jruby.ext.openssl.X509Store$INVOKER$i$0$0$set_default_paths.call(X509Store$INVOKER$i$0$0$set_default_paths.gen)
   org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
   rubyjit.HTTParty::ConnectionAdapter$$attach_ssl_certificates_56479577896ae365ba9c44e161ae36f76c232352950878365.__file__(httparty (0.13.7) lib/httparty/connection_adapter.rb:150)
   rubyjit.HTTParty::ConnectionAdapter$$attach_ssl_certificates_56479577896ae365ba9c44e161ae36f76c232352950878365.__file__(httparty (0.13.7) lib/httparty/connection_adapter.rb)
   org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:221)
   org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:202)
   rubyjit.HTTParty::ConnectionAdapter$$connection_2f8c751e55673d8db359002d68867514cd0f2b91950878365.__file__(httparty (0.13.7) lib/httparty/connection_adapter.rb:82)
   rubyjit.HTTParty::ConnectionAdapter$$connection_2f8c751e55673d8db359002d68867514cd0f2b91950878365.__file__(httparty (0.13.7) lib/httparty/connection_adapter.rb)
   org.jruby.internal.runtime.methods.JittedMethod.call(JittedMethod.java:141)   org.jruby.runtime.callsite.CachingCallSite.call(CachingCallSite.java:134)
   rubyjit.HTTParty::ConnectionAdapter$$call_55dd73ca703374cd0280323f30d805d8f1365811950878365.__file__(httparty (0.13.7) lib/httparty/connection_adapter.rb:58)

I understand that the site where the OoME exception may not highly correlate to where the leak is happening, but this Rails up does very little else beside take incoming HTTPS requests, make outbound HTTPS requests, and relay the results, so it seems likely that the issue is somewhere near the SSL integration.

Trivia

I attempted to follow the suggestions for profiling JRuby with a trivial example:

jruby -J-Djruby.reify.classes=true -rhttparty -e 'puts "Sleeping while you get the pid"; sleep 30; puts "Game on"; threads = []; 1000.times { threads << Thread.new { HTTParty.get "https://example.com"; nil } }; puts "Joining"; threads.each { |t| t.join }; puts "Sleeping before niling threads"; sleep 30; threads = nil; puts "Sleeping again"; sleep 60'

I looked for objects that appeared in multiples of the 1000 threads that I used on the command line above. On 1.7.25, there appear to be a bunch that persist as long as the threads that ran the request exist, and vanish when the the threads array are nil'ed, suggesting that they are thread local:

jmap -histo:live `jps -l | grep jruby | cut -f1 -d\ ` | grep 000
  64:          1000         112000  sun.nio.ch.SocketChannelImpl
 123:          2000          48000  java.net.InetSocketAddress$InetSocketAddressHolder
 132:          1000          40000  org.jruby.RubyContinuation$Continuation
 133:          1000          40000  sun.nio.ch.SocketAdaptor
 155:          2000          32000  java.net.InetSocketAddress
 156:          1000          32000  [Ljava.nio.channels.SelectionKey;
 157:          1000          32000  sun.security.ssl.SSLSessionContextImpl
 179:          1000          24000  com.sun.crypto.provider.TlsMasterSecretGenerator$TlsMasterSecretKey

I can't reproduce similar instance counts on 1.7.19.

@headius
Copy link
Member

headius commented Oct 18, 2016

These instance counts do indeed look strange. What we really need to see is a heap dump, so we can find these leaking objects and figure out why they are not being collected.

You can get a heap dump using jmap. Ideally, it would not be a 10GB dump...hopefully something smaller (like in the 1-2GB range) would still clearly show the leaking objects.

I'll have a look at the socket subsystem in 1.7 to see if anything obvious shows up.

@headius
Copy link
Member

headius commented Oct 18, 2016

Another thing that would obviously help is if you could try to bisect this problem to at least a specific JRuby version, if not a specific revision.

I do not see anything obvious looking at socket logic yet.

@JasonLunn
Copy link
Contributor Author

Attached is a heap.bin created via jhat against jruby-1.7.25 running the command line script above after joining is complete but before nil'ing the thread array.
heap.bin.gz

I have a 1.1G snapshot taken, using YourKit, against the running Tomcat instance before we pulled it out of service. It potentially has our customer data in it which presents some challenges. Would you be comfortable with signing an NDA? If that's not feasible, maybe we could arrange a time for a remote screen share, or if you know what you're looking for you could tell us what you need and we can go get it for you.

@JasonLunn
Copy link
Contributor Author

(FWIW, this heap.bin was created on OS X El Capitan, not on AWS Linux)

@headius
Copy link
Member

headius commented Mar 15, 2017

Hmm...got sidetracked and lost track of this one. Can we pick it up again? I assume it is still a problem for you with 1.7.26, but that would be good to confirm if you have not already.

@JasonLunn
Copy link
Contributor Author

Thanks to 9.1.7.0, last week we were able to deploy 9K in production without having to revert back to the 1.7.x series as we had had to do with all the previous 9K releases we had tried. At this point I am no longer concerned about 1.7.x's behavior, so if you want to close this I'm okay with that.

@headius
Copy link
Member

headius commented Mar 15, 2017

Ok, I'll close this. We've had no other reports of this kind of leak in 1.7.25, and not much to go on for leads, so let's hope you were just lucky :-)

@headius headius closed this as completed Mar 15, 2017
@headius headius modified the milestones: Won't Fix, JRuby 1.7.27 Mar 15, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants