Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

undefined "close_on_exec=" for TCPSocket #4858

Closed
HoneyryderChuck opened this issue Nov 18, 2017 · 20 comments
Closed

undefined "close_on_exec=" for TCPSocket #4858

HoneyryderChuck opened this issue Nov 18, 2017 · 20 comments
Labels

Comments

@HoneyryderChuck
Copy link

Environment

  • JRuby versions: 9.1.13.0 and 9.1.14.0
  • Operating system and platform: Darwin Macintosh.lan 16.7.0 Darwin Kernel Version 16.7.0: Wed Oct 4 00:17:00 PDT 2017; root:xnu-3789.71.6~1/RELEASE_X86_64 x86_64

I'm getting the following error when using the "close_on_exec" APIs in TCP Server/Sockets:

require 'socket'
s = TCPServer.new("localhost", 9999)
s.close_on_exec = true
s.close_on_exec = true
# NotImplementedError: close_on_exec=
# 	from org/jruby/RubyIO.java:2172:in `close_on_exec='

At first I just thought this was one of the MRI/JRuby differences, but then I saw there are tests implemented for this in jruby, and the "did you mean" functionality throws out some weird errors when I'm pretty close to the method name:

s.close_on_exec
irb(main):006:0> s.close_on_exec
# NoMethodError: undefined method `close_on_exec' for #<TCPServer:fd 20>
# Did you mean?  close_on_exec?
#              close_on_exec=
#	from (irb):6:in `<eval>'
s.close_on_exec? # this should actually be defined, this is also a bug
# NotImplementedError: close_on_exec=
#	from org/jruby/RubyIO.java:2178:in `close_on_exec?'
#	from (irb):7:in `<eval>'

From what I understood, this should be implemented, and it's important for the "exec and keep IOs open in child process" functionality, which jruby should support.

@headius
Copy link
Member

headius commented Nov 18, 2017

The default behavior for all IO streams on JRuby is to not close on exec. MRI defaults the other way.

The main complexity here is that we don't have a way to implement close_on_exec= for IO streams provided by the JDK. We can set it for streams we create with native file descriptors on our own (including process streams, stdio streams, pipes, and some files) but we decided having half the streams CLOEXEC and half not would be worse than not doing it for any of them.

We can certainly open up the close_on_exec= method, but should we error or warn when a stream can't be set to CLOEXEC?

The other methods are simple enough to add.

@headius
Copy link
Member

headius commented Nov 18, 2017

FWIW, MRI does not implement these methods on all platforms; only on those which support CLOEXEC. I think that would exclude Windows at least, but not sure about others.

@headius headius added this to the JRuby 9.1.15.0 milestone Nov 19, 2017
@headius headius added the core label Nov 19, 2017
@HoneyryderChuck
Copy link
Author

The default behavior for all IO streams on JRuby is to not close on exec.

That's actually good to know, that's my intended case.

@HoneyryderChuck
Copy link
Author

and thx for the fix :)

@HoneyryderChuck
Copy link
Author

The default behavior for all IO streams on JRuby is to not close on exec.

@headius I added a script below which doesn't really show a tcp socket being preserved after exec. This is a small reproduction of the larger problem that I have: passing listener fds in the env, exec and keep the reference.

 require "socket"
 server = TCPServer.new("localhost", 9999)
 
 if ENV.key?("SOCKID")
   fd = ENV["SOCKID"].to_i
   puts "we have a winner: #{fd}"
   s = TCPServer.for_fd(fd)
   puts "server: #{s.inspect}"
 else
   ENV["SOCKID"] = server.fileno.to_s
   exec($0)
 end
#######
# we have a winner: 20
# IOError: closed stream

@headius
Copy link
Member

headius commented Nov 21, 2017

Hmm...perhaps I was wrong about the JDK not setting CLOEXEC. I don't believe it used to, but they may have added it.

Grepping the jdk sources, it looks like the JDK will set all of the following cloexec:

  • Streams set up for processes; the JDK treats these as files and does not expose them to the user directly.
  • zlib files; this will not usually affect us because we use a pure-Ruby zlib port (jzlib).
  • Perhaps files; I did not see CLOEXEC in the places I see files opened, but I saw other comments that lead me to believe files are also CLOEXEC.
  • Perhaps all streams. The JDK may have a define somewhere for standard file modes to apply.

So the issue of wrapping the JDK socket classes becomes more of a problem. If they can't be unwrapped to a file descriptor, and we can't change their cloexec flag, they'll always go away.

In JRuby, the following streams will usually be "truly native" and allow you to unset CLOEXEC:

(This applies to UNIX environments only)

  • Files created through the normal means. Notable, Tempfile is not pure-native.
  • Process streams
  • Pipes
  • UNIX sockets
  • stdio

Because of the depth of work required to support the entire socket library, we delegate most of that to the JDK, which leads to our current predicament.

@headius headius reopened this Nov 21, 2017
@headius
Copy link
Member

headius commented Nov 21, 2017

Reopen; the issue is fixed for streams that are backed by a native file descriptor, but TCPSocket and most of the other socket classes are not native and will not respond to close_on_exec=.

@HoneyryderChuck
Copy link
Author

I've repeated the script using the UNIXServer class, and it throws the Errno::EBADF error. This might mean that CLOEXEC isn't set by default as the assumption was. I forgot also to mention my environment: Mac OS X 10.12.6, java 1.8.0_25.

@headius
Copy link
Member

headius commented Nov 29, 2017

CLOEXEC can be implemented on a piecemeal basis for the streams which are native, as listed above. Since we can't fix it for all cases all the time, perhaps you can illustrate which cases are important to you for this issue?

headius added a commit that referenced this issue Nov 29, 2017
@headius
Copy link
Member

headius commented Nov 29, 2017

With 12bb2fa, we should now support CLOEXEC for all streams which we back with a "real" file descriptor, which includes most items I list in #4858 (comment).

At this time I do not believe IO.pipe produces native streams, but it could. It may have to move to FFI on Java 9 anyway due to the impossibility of digging out the real file descriptors. Currently, if we can dig it out reflectively, it will work with CLOEXEC. It will not otherwise.

Tempfile is the other notable exception, along with all IO on Windows or platforms where we don't have FFI support. If we can't dig the file descriptor out reflectively, CLOEXEC will fail to work.

headius added a commit that referenced this issue Dec 1, 2017
* Port 2015 redirect logic for popen.
* Improve cloexec for subprocess IO.
* Make cloexec work for any streams we can get realFileno for.
** This allows it to work for JVM-provided pipes if they can be
   unwrapped.

See #4858
headius added a commit that referenced this issue Dec 1, 2017

Verified

This commit was created on GitHub.com and signed with GitHub’s verified signature.
See #4858
headius added a commit that referenced this issue Dec 5, 2017
headius added a commit that referenced this issue Dec 5, 2017
* Port 2015 redirect logic for popen.
* Improve cloexec for subprocess IO.
* Make cloexec work for any streams we can get realFileno for.
** This allows it to work for JVM-provided pipes if they can be
   unwrapped.

See #4858
@headius
Copy link
Member

headius commented Dec 5, 2017

I'm going to call the original issue here fixed, since close_on_exec= and friends now exist and will usually do the right thing. I also have it set up to warn if a stream can't be set to cloexec, which will effect systems where reflection is locked down (stock Java 9, secured environments, etc) or where native IO doesn't work properly yet (secured environments, unsupported platforms, Windows).

I'll file a separate issue about native Pipe and cloexec.

@headius headius closed this as completed Dec 5, 2017
@headius
Copy link
Member

headius commented Dec 5, 2017

FYI, at least one MRI-provided cloexec test passes, after I spent a lot of time fiddling with cloexec logic. So the testing side of this is largely covered.

@HoneyryderChuck
Copy link
Author

I finally had time to look on the issue, and I saw that 9.1.15.0 was released, thx!

Testing my script, I see that the unix sockets now live beyond exec, and that's cool! I tried it again with tcp sockets, as I wasn't sure whether that was addressed by your fixes, and still doesn't work, but I think that you said that was a limitation of the JDK sockets and access to the real fd.

What confused me was your statement:

I also have it set up to warn if a stream can't be set to cloexec,...

This is still a TODO, right? As this should have warned me that this operation is impossible with tcp sockets, and it didn't.

@ketan
Copy link
Member

ketan commented Dec 19, 2017

This seems to have broken selenium-webdriver (3.7.0) and childprocess (0.8.0).

https://github.com/SeleniumHQ/selenium/blob/selenium-3.7.0/rb/lib/selenium/webdriver/common/socket_lock.rb#L61-L62

https://github.com/enkessler/childprocess/blob/0bce27d3f84ffac06ee49aa4dcb9ce8608cc49d3/lib/childprocess.rb#L153-L156

This combination raises an exception (on windows) —

NotImplementedError: fcntl unsupported or native support failed to load; see http://wiki.jruby.org/Native-Libraries
org/jruby/RubyIO.java:2206:in `close_on_exec='
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/childprocess-0.8.0/lib/childprocess.rb:155:in `close_on_exec'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/socket_lock.rb:62:in `can_lock?'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/socket_lock.rb:50:in `lock'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/socket_lock.rb:36:in `locked'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/service.rb:69:in `start'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/ie/driver.rb:51:in `initialize'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/driver.rb:42:in `for'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver.rb:84:in `for'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/spec/javascripts/support/jasmine_with_selenium_runner.rb:25:in `initialize'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/spec/javascripts/support/jasmine_helper.rb:41:in `block in (root)'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/jasmine-2.8.0/lib/jasmine/ci_runner.rb:20:in `run'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/jasmine-2.8.0/lib/jasmine/tasks/jasmine.rake:55:in `block in (root)'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/task.rb:251:in `block in execute'
org/jruby/RubyArray.java:1735:in `each'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/task.rb:251:in `execute'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/task.rb:195:in `block in invoke_with_call_chain'
C:/go-agent-17.11.0/pipelines/build-windows-PR/tools/jruby/lib/ruby/stdlib/monitor.rb:214:in `mon_synchronize'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/task.rb:188:in `invoke_with_call_chain'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/task.rb:181:in `invoke'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:160:in `invoke_task'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:116:in `block in top_level'
org/jruby/RubyArray.java:1735:in `each'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:116:in `block in top_level'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:125:in `run_with_threads'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:110:in `top_level'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:83:in `block in run'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:186:in `standard_exception_handling'
C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/rake-12.2.1/lib/rake/application.rb:80:in `run'
./bin/rake:4:in `<main>'
Tasks: TOP => jasmine:ci
NoMethodError: undefined method `poll_for_exit' for nil:NilClass
                stop at C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/service.rb:78
      block in start at C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/service.rb:67
  block in exit_hook at C:/go-agent-17.11.0/pipelines/build-windows-PR/server/webapp/WEB-INF/rails.new/vendor/bundle/jruby/2.3.0/gems/selenium-webdriver-3.7.0/lib/selenium/webdriver/common/platform.rb:136

@headius
Copy link
Member

headius commented Dec 20, 2017

@HoneyryderChuck Can you give me a simple example that shows TCPSocket is not staying alive into the child?

@ketan Ugh, ok. Windows has almost no native IO, so it should be just warning or no-op'ing the cloexec call. Can you open a separate issue for that please, and we'll fix it for 9.1.16.0?

@headius
Copy link
Member

headius commented Dec 20, 2017

Reopen while we investigate lingering issues with TCPSocket.

@HoneyryderChuck
Copy link
Author

@headius just running the script I pasted previously.

@headius
Copy link
Member

headius commented Jan 24, 2018

@HoneyryderChuck Ok, what I can see is the following:

  • TCPServer/Socket.for_fd is not fully functional in JRuby. There's no way to reproduce a JDK Server/SocketChannel given a file descriptor. You should be able to use the descriptor directly. This lack of support may make the rest of the discussion moot.
  • I can see the close_on_exec= logic does actually call fcntl on the JDK socket's fd, and I can see it do appropriate logic for both true and false. However, it doesn't seem like it has any effect. The fd is available and open in the subprocess either way.

So the real issues that remain are that setting CLOEXEC on a JDK socket fd does not appear to change its lifecycle after exec, and that JDK-originated streams can't be "for_fd" in a subprocess, because we have no way to reconstitute the JDK-provided object that wraps them.

@headius
Copy link
Member

headius commented Jan 24, 2018

The issue @ketan mentions is fixed for 9.1.16.

@headius
Copy link
Member

headius commented Jan 24, 2018

The original issue is fixed to my satisfaction, since close_on_exec and friends are implemented for sockets now. I've filed separate issues for the remaining problems: #5008 and #5009.

@headius headius closed this as completed Jan 24, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

3 participants