UNIX Sockets raising Errno::ECONNRESET or EOFError ( 9.0.0.0 && 1.7.19 ) #2750

Closed
digitalextremist opened this issue Mar 24, 2015 · 27 comments

@digitalextremist
Contributor

The breakage is demonstrated by the break-unix-sockets branch of Reel, which was being held back until all Rubies could support UNIX socket connections properly.

The test, which passes under Rubinius and MRI, fails under JRuby with:

Failures:

1) Reel::Server::UNIX allows connections over UNIX sockets
Failure/Error: response = Net::HTTPResponse.read_new(sock)
Errno::ECONNRESET:
  Connection reset by peer - Connection reset by peer
# org/jruby/RubyIO.java:2858:in `read_nonblock'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:141:in `rbuf_fill'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:122:in `readuntil'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:132:in `readline'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2571:in `read_status_line'
# RVM/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2560:in `read_new'
# ./spec/reel/unix_server_spec.rb:28:in `(root)'
# RVM/rubies/jruby-1.7.19/lib/ruby/shared/tmpdir.rb:0:in `create'
# ./spec/reel/unix_server_spec.rb:21:in `(root)'
# org/jruby/RubyBasicObject.java:1562:in `instance_exec'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:177:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:474:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:612:in `run_around_example_hooks_for'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/hooks.rb:474:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:343:in `with_around_example_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:385:in `with_around_and_singleton_context_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example.rb:174:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:548:in `run_examples'
# org/jruby/RubyArray.java:2412:in `map'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:544:in `run_examples'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/example_group.rb:512:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:110:in `run_specs'
# org/jruby/RubyArray.java:2412:in `map'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:110:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/configuration.rb:1526:in `with_suite_hooks'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:109:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/reporter.rb:62:in `report'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:108:in `run_specs'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:86:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:70:in `run'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/lib/rspec/core/runner.rb:38:in `invoke'
# RVM/gems/jruby-1.7.19/gems/rspec-core-3.2.2/exe/rspec:4:in `(root)'

This is the only issue remaining for our 0.6.0 release, and we'd be very excited to include UNIX socket servers after a year or so of holding that functionality back.

@enebo
Member

enebo commented Mar 24, 2015

marked against 1.7.20 so we do not forget to evaluate what is wrong here before next release...

@digitalextremist
Contributor Author

Thanks @enebo. Short of brushing up on my Java, is there any way I could troubleshoot this further and help you guys surround it?

@enebo
Member

enebo commented Mar 24, 2015

@digitalextremist if you could try JRuby 9.0.0.0pre1 and see if it works there it would help. Our IO subsystem was re-written and it would be good to know if we potentially have one or two problems.

@digitalextremist
Contributor Author

@enebo it's definitely not working with 9.0.0.0 either.

I've tested with -SNAPSHOT ... you can see both strains failing here:

https://travis-ci.org/celluloid/reel/builds/55636058

@digitalextremist
Contributor Author

For the record: you'll notice jruby-openssl is also failing under 9.0.0.0 but that's tangential.

Important tidbit:

  • 1.7.19 fails with Errno::ECONNRESET
  • 9.0.0.0 fails with EOFError

Both fail at the same call though, at different locations per version:

  • org/jruby/RubyIO.java:2858:in read_nonblock (1.7.19)
  • org/jruby/RubyIO.java:2768:in read_nonblock (9.0.0.0)

Both of those are failing at this line in the test: response = Net::HTTPResponse.read_new(sock)
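
For anyone who wants to poke at that call without Reel or Celluloid in the picture, here is a minimal stdlib-only sketch. It is not the Reel spec: the helper name `unix_http_roundtrip`, the socket file name, and the canned response are all made up for illustration. It serves one hard-coded HTTP response over a UNIX socket and reads it back through `Net::BufferedIO` and `Net::HTTPResponse.read_new`, which is the path that ends in `read_nonblock` in the traces above.

```ruby
require 'socket'
require 'net/http'
require 'tmpdir'

# Hypothetical helper, not part of Reel: serves one canned HTTP response over
# a UNIX socket and reads it back the way the failing spec line does.
def unix_http_roundtrip
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sketch.sock')
    server = UNIXServer.new(path)

    responder = Thread.new do
      peer = server.accept
      # Drain the request head (everything up to the blank line).
      while (line = peer.gets)
        break if line == "\r\n"
      end
      peer.write("HTTP/1.1 200 OK\r\nContent-Length: 2\r\nConnection: close\r\n\r\nok")
      peer.close
    end

    sock = Net::BufferedIO.new(UNIXSocket.new(path))
    request = Net::HTTP::Get.new('/')
    request['Host'] = 'localhost'
    request.exec(sock, '1.1', '/')

    # This is the call that raises ECONNRESET / EOFError in the reports above.
    response = Net::HTTPResponse.read_new(sock)
    body = response.reading_body(sock, true) { response.read_body }

    sock.close
    responder.join
    server.close
    [response.code, body]
  end
end
```

Calling `unix_http_roundtrip` should hand back the status code and body as a two-element array. Note that `exec` and `reading_body` are internal net/http methods, so treat this as a debugging aid rather than a supported way to speak HTTP over a UNIX socket.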

Here is the complete test:

@digitalextremist
Contributor Author

@enebo I think it's raising an exception near here for 9.0.0.0:

It's finding ret to be nil after getPartial

@digitalextremist
Contributor Author

@enebo and for 1.7.19 it seems like it's here, not sure why ECONNRESET though:

I'm sure it's in/after/during read_nonblock but not sure why there's different behavior for each.

@digitalextremist
Contributor Author

The only place where I see ECONNRESET happen is related to UDP sockets.

@digitalextremist
Contributor Author

The extremely confusing thing is that we're wrapping the calls in a rescue covering both those.
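
As a rough illustration of what a rescue covering both looks like, here is a small sketch using `UNIXSocket.pair`. The helper names `read_or_report` and `rescue_demo` are hypothetical and this is not the code in Reel or Celluloid::IO. One thing it does make visible: a rescue only protects the calls made inside its own method or begin block, so a read issued somewhere else is not covered by it.

```ruby
require 'socket'

# Hypothetical helper: returns the bytes read, or a symbol naming which
# condition was rescued.
def read_or_report(io, maxlen = 1024)
  io.read_nonblock(maxlen)
rescue IO::WaitReadable
  :would_block          # nothing to read yet
rescue Errno::ECONNRESET
  :reset                # peer reset the connection
rescue EOFError
  :eof                  # peer closed cleanly
end

def rescue_demo
  ours, theirs = UNIXSocket.pair
  results = []
  results << read_or_report(ours)   # no data written yet
  theirs.write('hi')
  results << read_or_report(ours)   # data is available
  theirs.close
  results << read_or_report(ours)   # peer has gone away
  ours.close
  results
end
```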

digitalextremist changed the title from "UNIX Sockets raising Errno::ECONNRESET under 1.7.19" to "UNIX Sockets raising Errno::ECONNRESET or EOFError ( 9.0.0.0 && 1.7.19 )" on Mar 24, 2015
@enebo
Member

enebo commented Mar 24, 2015

@digitalextremist This could be as simple as something we are missing in UNIX domain socket support causing getPartial to return nil. I am pretty sure we bypass Java and use our native callouts for UDS.

@digitalextremist
Contributor Author

@enebo, right on. Is there a crash-test-dummy level of exposure I could get in actually attempting to modify the Java and test that on-the-fly without needing to rebuild jruby every time I modify a file?

@digitalextremist
Contributor Author

@enebo, @headius I've picked up an issue further down that's "cascading" into the ones I've shown. Those failures were visible right away; this one I had to dig to find:

Entire chain for 1.7.19 ...

ArgumentError: mode not supported for this object: r
    org/nio4r/Nio4r.java:172:in `register'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:43:in `wait'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:22:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io.rb:53:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/unix_server.rb:19:in `accept'
    /home/de/FOSS/reel/lib/reel/server.rb:49:in `run'
    org/jruby/RubyKernel.java:1507:in `loop'
    /home/de/FOSS/reel/lib/reel/server.rb:47:in `run'
    org/jruby/RubyKernel.java:1958:in `public_send'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:26:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:137:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:60:in `invoke'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:71:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/actor.rb:357:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/tasks.rb:57:in `initialize'
    /home/de/FOSS/celluloid/lib/celluloid/tasks/task_fiber.rb:14:in `create'
Errno::ECONNRESET: Connection reset by peer - Connection reset by peer
     read_nonblock at org/jruby/RubyIO.java:2858
     read_nonblock at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/forwardable.rb:201
         rbuf_fill at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:141
         readuntil at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:122
          readline at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/protocol.rb:132
  read_status_line at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2571
          read_new at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/1.9/net/http.rb:2560
        __ensure__ at 180.rb:51
            (root) at 180.rb:45
            create at /home/de/.rvm/rubies/jruby-1.7.19/lib/ruby/shared/tmpdir.rb:0
            (root) at 180.rb:44

Entire chain for 9.0.0.0-pre ...

ArgumentError: mode not supported for this object: r
    org/nio4r/Nio4r.java:172:in `register'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:43:in `wait'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/reactor.rb:22:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io.rb:53:in `wait_readable'
    /home/de/FOSS/celluloid-io/lib/celluloid/io/unix_server.rb:19:in `accept'
    /home/de/FOSS/reel/lib/reel/server.rb:49:in `run'
    org/jruby/RubyKernel.java:1300:in `loop'
    /home/de/FOSS/reel/lib/reel/server.rb:47:in `run'
    org/jruby/RubyKernel.java:1832:in `public_send'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:26:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/calls.rb:137:in `dispatch'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:60:in `invoke'
    /home/de/FOSS/celluloid/lib/celluloid/cell.rb:71:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/actor.rb:357:in `task'
    /home/de/FOSS/celluloid/lib/celluloid/tasks.rb:57:in `initialize'
    /home/de/FOSS/celluloid/lib/celluloid/tasks/task_fiber.rb:14:in `create'
EOFError: No message available
               read_nonblock at org/jruby/RubyIO.java:2751
               read_nonblock at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/forwardable.rb:183
                   rbuf_fill at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:153
                   readuntil at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:134
                    readline at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/protocol.rb:144
            read_status_line at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/http/response.rb:39
                    read_new at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/net/http/response.rb:28
  180.rb_CLOSURE_2__180.rb_1 at 180.rb:51
                      create at /home/de/.rvm/rubies/jruby-9.0.0.0.pre1-pre1/lib/ruby/stdlib/tmpdir.rb:146
                  __script__ at 180.rb:44

headius added a commit to jnr/jnr-enxio that referenced this issue Mar 24, 2015
headius added a commit to jnr/jnr-unixsocket that referenced this issue Mar 24, 2015
@headius
Member

headius commented Mar 25, 2015

I've pushed revisions to jnr-enxio and jnr-unixsocket that modify both to allow READ among the select operations for server sockets. I've also pushed a change to jruby-1_7 to update to these snapshot versions.
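
For context on why READ matters for a server socket: at the Ruby level, a listening socket is considered readable when a connection is waiting to be accepted, which is what the `wait_readable` then `accept` sequence in the traces above relies on. A minimal sketch of that pattern with plain `IO.select` follows; the helper names `accept_when_readable` and `accept_demo` and the socket file name are made up, and this is not the Celluloid::IO implementation.

```ruby
require 'socket'
require 'tmpdir'

# Hypothetical helper: waits until the listening socket is readable (a client
# is pending), then accepts it. Returns nil if the timeout expires first.
def accept_when_readable(server, timeout)
  ready, = IO.select([server], nil, nil, timeout)
  return nil unless ready

  server.accept_nonblock
end

def accept_demo
  Dir.mktmpdir do |dir|
    path = File.join(dir, 'sketch.sock')
    server = UNIXServer.new(path)

    idle = accept_when_readable(server, 0.05)  # no client yet, so this times out

    client = UNIXSocket.new(path)
    client.write('ping')

    peer = accept_when_readable(server, 5)     # a client is pending now
    received = peer.read(4)

    [client, peer, server].each(&:close)
    [idle, received]
  end
end
```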

Let me know how it goes!

@digitalextremist
Contributor Author

Alright! Well I built a custom jruby, 1.7.20-SNAPSHOT and mounted it in rvm. Thank you very much for the patched release @headius. I'm getting the same error though.

ArgumentError: mode not supported for this object: r

You said perhaps nio4r or even Celluloid::IO might be misconfiguring the socket, but it was unlikely. How can I test that? From what you did, it ought to be readable, correct? Are there any setsockopt configurations we need to do?

Thank you for giving so much of your time today. We really, really appreciate it.

It was really cool to build, mount, and run my own jruby binary.

@digitalextremist
Contributor Author

Awesome! Thanks @headius. I'll check this in a bit and pin Reel 0.6.0.pre1 to the version that fixes this. What point release will that be?

On March 26, 2015 4:45:38 AM PDT, Charles Oliver Nutter notifications@github.com wrote:

Closed #2750 via 052e0d0.



@digitalextremist
Contributor Author

@headius I think this closed automatically but we've tested it and it didn't work, right? Can this be reopened until it does pass?

headius reopened this Mar 26, 2015
@digitalextremist
Contributor Author

@headius, thank you sir.

@headius
Member

headius commented Apr 2, 2015

Back on this one today...

@digitalextremist
Contributor Author

@headius so happy to hear that. Thank you.

@headius
Member

headius commented Apr 2, 2015

Ah-ha!

I believe the remaining issue is a bug in nio4r; it uses a selector from the wrong provider, and that error gets misinterpreted as a bad selection operation.

When I add printStackTrace to Nio4r.java:172, I get this:

java.nio.channels.IllegalSelectorException
    at sun.nio.ch.SelectorImpl.register(SelectorImpl.java:128)
    at java.nio.channels.spi.AbstractSelectableChannel.register(AbstractSelectableChannel.java:212)
    at java.nio.channels.SelectableChannel.register(SelectableChannel.java:280)
    at org.nio4r.Nio4r$Selector.register(Nio4r.java:170)

UNIX sockets in JRuby come from jnr-enxio, which has its own selector provider. Selectors and selectable channels must come from the same provider.
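
To make the failure mode concrete, here is a toy model in plain Ruby. It is not nio4r's or the JDK's code, and every class and method name in it is invented for illustration. It shows a selector that rejects channels from a different provider, a wrapper that swallows that error and reports a misleading "mode not supported" message, and a wrapper that keeps the real cause.

```ruby
# Toy stand-in for java.nio.channels.IllegalSelectorException.
class IllegalSelectorError < StandardError; end

# A channel only remembers its name and which provider created it.
ToyChannel = Struct.new(:name, :provider)

class ToySelector
  def initialize(provider)
    @provider = provider
  end

  # Rejects channels that were created by a different provider.
  def register(channel, interest)
    unless channel.provider == @provider
      raise IllegalSelectorError,
            "#{channel.name} belongs to #{channel.provider}, selector belongs to #{@provider}"
    end
    [channel.name, interest]
  end
end

# Masks every failure as a bad interest mode, hiding the real cause.
def register_masking(selector, channel, interest)
  selector.register(channel, interest)
rescue StandardError
  raise ArgumentError, "mode not supported for this object: #{interest}"
end

# Keeps the provider mismatch visible in the raised error.
def register_preserving(selector, channel, interest)
  selector.register(channel, interest)
rescue IllegalSelectorError => e
  raise IOError, "cannot register #{channel.name}: #{e.message}"
end
```

With a selector built for a `:default` provider and a channel tagged `:enxio`, `register_masking` raises the same `ArgumentError` text seen earlier in this thread, while `register_preserving` names the mismatch.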

I'll see if I can come up with a patch for nio4r.

@digitalextremist
Contributor Author

Awesome! Excited to see what you turn up next. Will be ready to check-in a nio4r patch. /cc: @tarcieri

@headius
Member

headius commented Apr 2, 2015

This will take a bit more work than I'd hoped; nio4r needs to duplicate logic we have in JRuby for dealing with selectors from different providers.
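
The thread does not show what that JRuby logic looks like, so the following is only one plausible shape, sketched as a toy in plain Ruby with invented names: keep one selector per provider and route each channel to the selector that matches it.

```ruby
# Toy channel: a name plus the provider that created it (both hypothetical).
ChannelSketch = Struct.new(:name, :provider)

class SelectorPoolSketch
  # Toy selector: remembers its provider and the channels registered with it.
  SelectorSketch = Struct.new(:provider, :registered)

  def initialize
    @selectors = {}
  end

  # Lazily creates one selector per provider and reuses it afterwards.
  def selector_for(channel)
    @selectors[channel.provider] ||= SelectorSketch.new(channel.provider, [])
  end

  # Registers the channel with the selector from its own provider.
  def register(channel)
    selector = selector_for(channel)
    selector.registered << channel.name
    selector
  end

  def providers
    @selectors.keys
  end
end
```

The trade-off in a design like this is that waiting on readiness means polling more than one selector, which is presumably part of why the change is more work than a one-line fix.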

@headius
Member

headius commented Apr 2, 2015

In the interim I will test 1.7 with the updated jnr-unixsocket stuff and see if it has reduced to the same problem.

@tarcieri

tarcieri commented Apr 2, 2015

@headius is there any kind of API we can standardize on to avoid the duplication?

@headius
Member

headius commented Apr 2, 2015

I've confirmed the EOFError in 1.7 is now also caused by this illegal selector error. The difference in exception is probably due to 9k having recent ports of MRI's IO logic.

I'm going to resolve this as fixed, since jnr-unixsocket and jnr-enxio and jruby itself appear to be doing the right thing. We'll deal with the nio4r issue separately.

@headius
Member

headius commented May 4, 2015

@tarcieri Should I file an issue about this, or shall we discuss it in real time a bit more? In any case I'm closing this because JRuby should be doing the right thing if you use Ruby APIs, and the work to be done is in nio4r.

headius closed this as completed May 4, 2015
@tarcieri

tarcieri commented May 4, 2015

Maybe open an nio4r issue about this and we can discuss there. FWIW I feel like nio4r is somewhat coupled to JRuby internals, and maybe needs some APIs surfaced (even just in Java-land) to bind to for this sort of thing.
