File.stat sometimes fails with SystemCallError in JRuby 1.7.21 #3145
Comments
In talking to @rsim a little on IRC I can see we have not updated jnr-posix since 1.7.20, and this is dying while trying to stat within jnr-posix. My only thought is that we must be passing in the filename differently. I know we changed how we canonicalize required files, and there were also changes mentioned in the previous comments. I think running with JRUBY_OPTS='-d -Xbacktrace.style=raw' might pick up some more info. The only extra data point so far is that this might be happening when accessing a file on a NetApp appliance mounted to a Linux machine. If I had to make a wild guess, I would say we end up passing the wrong pathname to stat, which happens to cause the error. |
the file to test comes from a |
@mkristian I suspect that it is just the first gemspec and the first "LICENSE" file there and that If I won't be able to repeat this issue then I will try to ask this one customer if they could allow me to connect to their server and experiment with JRuby 1.7.21 and see if I can reproduce the issue with simple |
@enebo Digging more into if (stat == null) throw runtime.newErrnoFromInt(file.errno(), filename); So it seems that So I am wondering how to debug why Also I got more details from a different customer that had this issue and they do not have any network mounted filesystem - everything is on local filesystem. So my initial guess that this might be caused by a network mounted filesystem is wrong. I will try to add some debug messages to find out the current directory for the "LICENSE" file to see if it will help to find something strange. |
the currentDirectory should give some hints since https://github.com/jruby/jruby/blob/1.7.21/core/src/main/java/org/jruby/RubyFileStat.java#L122 where things did change between the two releases. |
Debugged further and found out that the issue was not caused by changes in JRuby 1.7.21 but by changes in bundler between versions 1.9.x and 1.10.x, combined with the error code that the jnr-posix file stat returns for missing files. Here are the details. Found out that this "LICENSE" file was referenced from one private gem but during the build this file was not included in the application package. When we built the application with JRuby 1.7.21 we also used the latest bundler version 1.10.5 (but previously we used bundler 1.9.6). The latest bundler version during setup is calling And so the root cause is that In our application we will fix this issue by removing this invalid reference to the "LICENSE" file in the gemspec for this private gem. Should this be tracked further as a jnr-posix bug that |
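A gemspec that lists a file which was never packaged can be spotted before bundler trips over it. The following is only a minimal sketch of such a check, not something bundler or RubyGems provides: `missing_gem_files` is a hypothetical helper, and the file names in the demo are made up for illustration.

```ruby
require "tmpdir"

# Hypothetical helper: return the entries of a gemspec-style file list
# that do not exist under the gem's root directory.
def missing_gem_files(gem_root, files)
  files.reject { |relative| File.exist?(File.join(gem_root, relative)) }
end

# Demo with a throwaway directory that contains README.md but no LICENSE.
Dir.mktmpdir do |root|
  File.write(File.join(root, "README.md"), "demo")
  p missing_gem_files(root, ["README.md", "LICENSE"]) # prints ["LICENSE"]
end
```

For a real gem, the list would come from the `files` attribute of the loaded gemspec and the root from the directory the gem was unpacked into.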
in the recent past there were quite a few issue with so my guess here is that on some linux the native ffi does not load for example when /tmp is not writable. jruby-complete.jar comes with the ".so" file packed but needs to copy it onto the filesystem before it can load it. I know some linux distributions (sabayon) make their /tmp read-only-root or so. could this explain the random behaviour here ? |
@rsim @mkristian I have seen weird returns from stat on Windows with native enabled when hitting UNC mounted filesystems. In that case the error return was not 0 but had really weird return values. I do not think we have figured out the best way to address this. If we fix it in jnr-posix it means we are deciding for all other projects that use it how we should error. If we fix it in JRuby then it means all other projects need to do mapping hacks if they do not want to cope with all the exotic return values. In any case, your problem is even stranger (at least to me)... @rsim I am really curious how this ends up as errno() 0. If as @mkristian is considering that native does not load I can actually see a code path where we can fail and return 0 from stat. In JavaLibCHelper any IOException thrown setting up the stat leads to a return 0 (note the TODOs in that part of the source). This looks super wrong, but I would love to get a reproduction on this before we change it so we can try and make some more tests. @rsim Could you open an issue on jnr-posix. I think @mkristian idea of jnr-posix native not loading might be worth investigating on the customer system. Having them type:
Should end up printing:
If it doesn't then we definitely have a problem with our pure-Java version. It also means there may be some additional issues to work through since they are failing to load native. |
I have one customer who reproduced this issue from the
and then they got irb(main):019:0> File.stat("missing file")
SystemCallError: Unknown error - Unknown Error (0) - missing file
from org/jruby/RubyFile.java:894:in `stat'
from (irb):19:in `evaluate' But now I don't know how to debug it further. When I tried to execute irb(main):001:0> File.stat("missing file")
Errno::ENOENT: No such file or directory - missing file
from org/jruby/RubyFile.java:894:in `stat'
from (irb):1:in `evaluate' Any other suggestions what can we try on the customer server to identify the cause for this? And it would be better if we could use |
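To compare the two results above on different machines, a small script can classify what `File.stat` raises. This is only a sketch: `stat_outcome` is a hypothetical helper name, and it relies on nothing beyond the fact that `Errno::ENOENT` is a subclass of `SystemCallError`.

```ruby
# Hypothetical helper: classify the outcome of File.stat for a path.
def stat_outcome(path)
  File.stat(path)
  :ok
rescue Errno::ENOENT
  # The expected result for a file that does not exist.
  :missing
rescue SystemCallError => e
  # Any other errno, including the generic "Unknown Error (0)" case
  # reported in this issue.
  "unexpected: #{e.class} (errno=#{e.errno.inspect}) #{e.message}"
end

puts stat_outcome("missing file")
```

A healthy setup should report `missing`; a line starting with `unexpected:` would point at the behaviour seen on the customer server.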
@rsim ah from jruby-complete I don't think we support -X directly. Try:
|
@enebo When I execute
both on my Mac OS X and on my Ubuntu 12.04 then I do not see any output. Does it mean that I have the pure-Java version of jnr-posix? Then pure-Java version |
@enebo Ah, when I execute $ java -Djruby.native.verbose=true -jar jruby-complete-1.7.21.jar -e "File.stat 'missing file'"
Successfully loaded native POSIX impl.
Errno::ENOENT: No such file or directory - missing file
stat at org/jruby/RubyFile.java:894
(root) at -e:1 then I see that native POSIX is loaded. Will ask to execute this also on our customer server. |
just realized we can switch off native via a cli:
so it is the native not loaded ! @rsim still wondering why native does not load for those customers. is /tmp writable for their java process ? (just thinking) |
@mkristian @enebo Got the response from the customer and as I see the reason for failing to load native POSIX implementation is failing to load missing
|
@mkristian @enebo Got more information from the customer which might provide a hint what is wrong. I asked them to list all libraries which have
When I execute this on my Ubuntu 12.04 then I get
So as I see they have |
@rsim this is very similar to #2913 (comment) I guess we should be able to load the libcrypt more reliable (IMO) |
@mkristian Yes, this is the same root cause and the same workaround worked. They have the following libcrypt* files:
After creating a symbolic link
native POSIX implementation is loaded and the correct Errno::ENOENT exception is raised:
So probably the solution would be to look not just for |
using the tar archive of oracle jdk on ubuntu does not setup the load library path for finding libcrypt (used by jnr-posix). this results in failed native support for jnr-posix. installing the jdk through some debian/ubuntu repositories do not show this problem. this patch also looks at /etc/ld.so.conf and /etc/ld.so.conf.d/* to setup the internal search path. with this the above combination of oracle jdk and ubuntu does find the native libraries for jnr-posix. fixes jruby/jruby#2913 and jruby/jruby#3145 probably also elastic/logstash#3127 (comment) Sponsored by Lookout Inc.
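The idea of reading /etc/ld.so.conf and /etc/ld.so.conf.d/* can be illustrated in a few lines of Ruby. This is a rough sketch and not the actual patch, which lives in the Java libraries: `ld_so_conf_dirs` is a hypothetical name, it assumes one directory per line, and it assumes relative `include` patterns are resolved against the directory of the including file.

```ruby
# Sketch: collect library directories from an ld.so.conf-style file.
# Assumptions: one directory per line, "#" starts a comment, and relative
# "include" patterns are resolved against the including file's directory.
def ld_so_conf_dirs(conf_path, seen = [])
  return [] if seen.include?(conf_path) || !File.file?(conf_path)
  seen << conf_path # guard against include cycles

  File.readlines(conf_path).flat_map do |line|
    entry = line.sub(/#.*/, "").strip
    next [] if entry.empty?

    if entry =~ /\Ainclude\s+(.+)\z/
      pattern = File.expand_path($1, File.dirname(conf_path))
      Dir.glob(pattern).sort.flat_map { |included| ld_so_conf_dirs(included, seen) }
    else
      [entry]
    end
  end.uniq
end

# Prints nothing on systems without /etc/ld.so.conf.
puts ld_so_conf_dirs("/etc/ld.so.conf")
```

Note that this only yields what the configuration files mention; directories that the dynamic linker searches by default would still have to be added separately.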
@mkristian is this still broken? I think we have updated jnr-ffi since then (or lib which deps on it)... |
Did this fix make it into jruby 1.7.24? I've recently got this following error: https://gist.github.com/atambo/acf070000c74f91466e7 When trying to run:
on a redhat 7.1 machine. |
When I run
|
@atambo A hunch: try |
@headius, when I run I get:
So it looks like jruby just doesn't look in /lib64. |
@headius (and anyone else interested) This is not the right place for this comment but I am adding it just so we start to think about it. Perhaps this path loading list we hardcode in jnr-ffi should be made to be separately releasable? I know it is in a maven artifact and I do not really want another artifact but every OS and version of OS seems to move this stuff around or have wrinkles. It is also temporal in nature. Where will a new Ubuntu store something? If we could just gem release (I am less interested in solving this on Java side) updated location info then we would not necessarily need to release the planet of jnr-* artifacts. |
@enebo google says that redhat is using /etc/ld.so.conf and /etc/ld.so.conf.d/* as ubuntu/debian/mint are doing. there are no hardcoded paths in the code, so it is hard to say what is going wrong here with not finding /lib64. I would need to install redhat or centos for debugging |
@mkristian, when I look at
and when I look at the /etc/ld.so.conf.d/ directory I see nothing inside that directory. |
@atambo thanx for this info and then I am not surprised we can not find "any" library :P |
This probably should have been made into a second issue, since I am guessing @atambo's issue is a different one about how we process ld.so.conf files, but I will just leave it open and advance to 1.7.26. |
@atambo when I use following Dockerfile
and execute
and do see following java.load.path
how did you install java ? I see something like |
I was using the IBM JRE so that may be the underlying reason. |
@atambo ok - let me see how to install the IBM JRE then :) |
We had a similar problem (JRuby 1.7.25 and 1.7.26) on RHEL 7 and after some debugging found out that on CentOS/RHEL /etc/ld.so.conf.d does not contain references to the standard library paths (e.g. /lib64, apparently the standard paths are hard-coded in the dynamic linker). Therefore, if one sets the java.library.path system property to some custom value, on RHEL JNR will only search that path for libraries, failing to find any and eventually resorting to the Java System.mapLibraryName call (see jnr.ffi.Platform and jnr.ffi.LibraryLoader). If the system property is not overridden, at least OpenJDK will populate it with the system default library paths and everything works. The posix interface tries to load "libc.so.6" and "crypt" -libraries, latter of which is present in the system with the name libcrypt.so.1. JNR search algorithm would normally find it, but in this case the mapLibraryName fall-back will produce the name "libcrypt.so" and loading the library and the native posix interface fails. |
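The difference between searching for versioned names and falling back to a single mapped name can be sketched as follows. This is an illustration only and not the JNR algorithm: `find_shared_library` is a hypothetical helper, and the directory list is an assumed set of common defaults that may differ between distributions.

```ruby
# Hypothetical helper: look for lib<name>.so or any versioned variant
# (lib<name>.so.1, lib<name>.so.1.2, ...) in the given directories.
def find_shared_library(name, dirs)
  dirs.each do |dir|
    exact = File.join(dir, "lib#{name}.so")
    return exact if File.file?(exact)

    versioned = Dir.glob(File.join(dir, "lib#{name}.so.*")).sort
    return versioned.first unless versioned.empty?
  end
  nil
end

# Assumed default directories; the dynamic linker's built-in list may differ.
default_dirs = %w[/lib64 /usr/lib64 /lib /usr/lib]
p find_shared_library("crypt", default_dirs)
```

A lookup restricted to the exact name libcrypt.so would miss a system that only ships libcrypt.so.1, which matches the failure described above.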
I'm not sure what there is to fix here. Since the last reports, the libraries in question have been updated to do a better job of searching for needed libraries. I'm going to close this as invalid. If someone's still affected by this on 1.7 (or 9k) please open a new issue with your reproduction. |
For posterity, I resolved this same issue by removing the 'noexec' flag from a tmpdir mount. |
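Since jruby-complete has to copy its native library onto the filesystem before loading it, a temp directory mounted with noexec is worth ruling out. The sketch below lists mount points carrying that flag; `noexec_mount_points` is a hypothetical helper and it assumes the whitespace-separated /proc/mounts layout (device, mount point, filesystem type, options) found on Linux.

```ruby
# Hypothetical helper: return mount points whose options include "noexec".
# Expects text in /proc/mounts layout: device mount_point fstype options ...
def noexec_mount_points(mounts_text)
  mounts_text.each_line
             .map(&:split)
             .select { |fields| fields.length >= 4 && fields[3].split(",").include?("noexec") }
             .map { |fields| fields[1] }
end

# Only meaningful on Linux; other systems skip the check.
if File.readable?("/proc/mounts")
  puts noexec_mount_points(File.read("/proc/mounts"))
end
```

If the directory used for temporary files shows up in that list, pointing the JVM at a different temp directory or remounting without noexec would be the things to try.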
We have a downloadable application where we embed jruby-complete jar and in the latest version we updated JRuby from 1.7.20 to 1.7.21. After this update for several customers bundler initialization started to fail with
As I checked, in this line File.stat on a "LICENSE" file from a gem is done, and in RubyFile.java the line raises this SystemCallError. When we change jruby-complete from 1.7.21 to 1.7.20 then the application starts without an error.
We are investigating what is common on these customer servers that might cause this error (as the majority of customers do not have this problem). All these customers are using different Linux versions and different Java 7 or 8 versions. From one customer we learned that the application is installed on NAS storage, so I suspect that the error may appear when File.stat is done on a file that is located on a NAS-mounted file system. Will update this issue as I get more information.
I am trying to find what changes might cause this error. I checked that jnr-posix version 3.0.12 is the same in JRuby 1.7.20 and 1.7.21.
@mkristian I was checking the commit log and was wondering whether one of these commits 0137667, a50803b or edac27b could cause this error in File.stat execution.