JRuby's impl of FFI differs in behavior from MRI; segfaults/crashes #4413
Comments
Looking into this a bit now, hopefully for 9.1.7.0. |
Ok, I'm going to need a little more help with this. I'd like to be able to reproduce, but getting this set up with appropriate paths to Vulkan is confusing. I'm also on Linux...so it may not even be possible to get it running before we need to push a 9.1.7.0 release. Two things could help me here:
The conversion error should be simple enough. Presumably the CRuby FFI just takes the Integer value and assumes you're giving it a valid pointer, while the JRuby version demands an actual MemoryPointer. Problem is, I can't see who would be raising it and I don't have a backtrace. Is it just trying to construct a struct? If you can get backtraces for the non-crashers, with and without The other crasher is likely similar; some value is not coercing properly, so we end up with a null pointer that doesn't manifest until Vulkan eventually tries to access that field of the struct. |
Error site details
As far as I can tell, the following is where our error happens:
# Create a shader module from a SPIR-V file.
def load_shader( path )
  bin = open( path, 'rb'){|io| io.read }
  smci = VkShaderModuleCreateInfo.new(
    sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
    codeSize: bin.length,
    pCode: bin.to_ptr # FIXME jruby validation sees NULL; MemoryPointer exposed in diag
  )
  # create it
  shader = FFI::MemoryPointer.new( :pointer )
  ok = vkCreateShaderModule( @logical_device, smci, nil, shader ) # KABOOM
  raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
  VkShaderModule.new( shader.read_pointer )
end
The line with the Diag output
From it we can tell that a pointer of 1,281 bytes in length is created, which matches the same length reported by MRI. So that's good. The second comment, "KABOOM," is the site of the actual segmentation fault. For some reason the struct gets passed in with LunarG validation output
This, tied with the reference to
Predefined test script
Added the branch jruby-hotfix which includes a version of the file that will configure itself (hopefully correctly) for both standard Windows installs and Linux, presuming the instructions for LunarG's SDK on Linux are correct and they appear in the /home of a user. You might have to substitute version numbers, however. If you can help me with producing the backtraces, I'll be happy to provide all details possible! JRuby prints the following (where "test/make_window.rb" is the original to the "preconf_test.rb" file):
|
Just did a quick diagnostic test. This is what's printed out after our VkShaderModuleCreateInfo struct is created:
For some reason, dumping its values is reflecting an Could that be what's causing our problem, or am I looking at the wrong thing? |
Your command line would need The pointer inside the struct probably is an FFI::Pointer. FFI::MemoryPointer#to_ptr returns this. But the handling of the struct when passed to native is not consuming FFI::Pointer properly. What does it do if you yank the actual pointer integer out and pass that for pCode? |
Seems the segfault is preventing the JVM from properly dumping the backtraces; the file contains only the usual header and the .TMP of the output is empty. It's definitely working otherwise, though, as there is a noticeable performance difference as the profiler runs. Using the corrected command line, when I swap the pointer to be a raw integer, I get the following. It fails to coerce the result of
That's the same issue I had first encountered where the constant The profiler appears to have output 119MB of data. The files are here:
|
Re-ran without the profiler to validate that the integer value is correct. It appears to be:
|
I'm confused...that memory address doesn't look valid, and it seems like it's still trying to create the struct using the MemoryPointer rather than the Integer value from your logging output. Can you show me the updated code for clarification? |
Original code:
# Create a shader module from a SPIR-V file.
def load_shader( path )
  bin = open( path, 'rb'){|io| io.read }
  smci = VkShaderModuleCreateInfo.new(
    sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
    codeSize: bin.length,
    pCode: bin.to_ptr # equivalent to FFI::MemoryPointer.from_string( bin )
  )
  # create it
  shader = FFI::MemoryPointer.new( :pointer )
  ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
  raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
  VkShaderModule.new( shader.read_pointer )
end
Updated code to use the integer value:
# Create a shader module from a SPIR-V file.
def load_shader( path )
  bin = open( path, 'rb'){|io| io.read }
  ptr = bin.to_ptr # equivalent to FFI::MemoryPointer.from_string( bin )
  loc = ptr.to_i # pull integer address
  puts "INFO: creating VkShaderModuleCreateInfo with: #{ptr} at #{loc}"
  smci = VkShaderModuleCreateInfo.new(
    sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
    codeSize: bin.length,
    pCode: loc # pass in the integer address instead of the pointer; causes TypeError
  )
  puts "INFO: #{smci.to_h}"
  # create it
  shader = FFI::MemoryPointer.new( :pointer )
  ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
  raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
  VkShaderModule.new( shader.read_pointer )
end
I'm still really unsure of why it's saying
The long format of the code without my mixins would be:
def load_shader( path )
  bin = open( path, 'rb'){|io| io.read }
  ptr = FFI::MemoryPointer.from_string( bin )
  loc = ptr.to_i # pull integer address
  puts "INFO: creating VkShaderModuleCreateInfo with: #{ptr} at #{loc}"
  smci = VkShaderModuleCreateInfo.new()
  smci[:sType] = :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
  smci[:codeSize] = bin.length
  smci[:pCode] = loc # <- TypeError is raised here
  puts "INFO: #{smci.to_h}"
  # create it
  shader = FFI::MemoryPointer.new( :pointer )
  ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
  raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
  VkShaderModule.new( shader.read_pointer )
end |
Ok, so the TypeError with the raw Integer is probably because the struct says it wants a But it still begs the question why |
Am I right in thinking that all errors so far appear to be problems getting the pointer to propagate into native code properly? Values not coercing, null pointers getting dereferenced, etc? |
I ... believe so? So far the show-stopper has been that weird null out of nowhere, so I would assume it is an issue of translating the pointer to native code, as introspection does correctly show what I would expect once it's loaded into the FFI Struct. Once it moves to native code and is consumed, however, we get the issue. So yes, that could well be it. |
Interesting. I used
So it definitely knows how to handle that, possibly maybe. If I instead change the struct's layout as follows, the Ruby integer address (no java conversions this time) takes just fine but we crash Java and get the same SPIR-V magic number failure output from the validation layer.
class Vulkan::VkShaderModuleCreateInfo < FFI::Struct
  layout sType: :VkStructureType,
         pNext: :pointer,
         flags: :VkShaderModuleCreateFlags,
         codeSize: :size_t,
         pCode: :ulong
end |
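Since the validation layer complains about the SPIR-V magic number, one way to rule out a bad shader file before suspecting the pointer is to inspect the buffer on the Ruby side first. The sketch below assumes the standard SPIR-V header (first 32-bit word 0x07230203, in either byte order) and a module length that is a multiple of four bytes. `spirv_problems` is a hypothetical helper, not part of Hephaestus or FFI.

```ruby
# Hypothetical helper (not part of Hephaestus or FFI): basic sanity checks on a
# binary string that is supposed to hold a SPIR-V module.
SPIRV_MAGIC = 0x07230203

def spirv_problems( bin )
  problems = []
  problems << 'shorter than one 32-bit word' if bin.bytesize < 4
  problems << 'length is not a multiple of 4 bytes' unless ( bin.bytesize % 4 ).zero?
  if bin.bytesize >= 4
    little = bin.unpack( 'V' ).first # first word read as little-endian
    big    = bin.unpack( 'N' ).first # first word read as big-endian
    problems << 'magic number mismatch' unless [little, big].include?( SPIRV_MAGIC )
  end
  problems
end

good = [SPIRV_MAGIC, 0x00010000].pack( 'V2' ) # stand-in for a real module header
text = "#version 450\n".b                     # raw GLSL source, not SPIR-V

puts spirv_problems( good ).inspect
puts spirv_problems( text ).inspect
```

If the file passes these checks on the Ruby side but the validation layer still reports a bad magic number, the bytes are probably being lost somewhere between the struct field and native code rather than on disk.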
Quick update after using some introspection:
So far it's able to tell us that we are indeed loading a file and reading 1,280 bytes of ASCII-8BIT encoded string data. This then becomes a 1,281-byte string when
Execution has not been changed from the original implementation, except to provide the debug output.
# Create a shader module from a SPIR-V file.
def load_shader( path )
  puts "DEBUG: load_shader(#{path.inspect})"
  bin = open( path, 'rb'){|io| io.read }
  puts "DEBUG: read #{bin.length} bytes of #{bin.encoding}"
  ptr = FFI::MemoryPointer.from_string( bin )
  puts "DEBUG: created pointer: #{ptr}"
  smci = VkShaderModuleCreateInfo.new(
    sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
    codeSize: ptr.size, # account for extra byte
    pCode: ptr
  )
  # create it
  shader = FFI::MemoryPointer.new( :pointer )
  ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
  raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
  VkShaderModule.new( shader.read_pointer )
end
When changed to match FFI's conventions rather than using my mix-ins ....
# replace smci declaration:
smci = VkShaderModuleCreateInfo.new()
smci[:sType] = :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
smci[:codeSize] = ptr.size
smci[:pCode] = ptr
... we get similar output, but the crash is still encountered:
If instead the
smci[:pCode] = ptr.address
... we find the previous, and still very peculiar,
|
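The 1,280 versus 1,281 byte counts reported above are consistent with a NUL-terminated copy of the file contents. The plain-Ruby model below involves no FFI; `c_string_copy` is a hypothetical stand-in for what FFI::MemoryPointer.from_string is understood to do. It shows where the extra byte comes from, and why the payload size rather than the pointer size is probably the safer value for codeSize, on the assumption that Vulkan expects a multiple of four bytes there.

```ruby
# Plain-Ruby model, no FFI required. c_string_copy is a hypothetical stand-in
# for a NUL-terminated copy of a binary string.
def c_string_copy( bin )
  bin.b + "\x00".b
end

bin  = "\x03\x02\x23\x07".b * 320 # 1,280 bytes of stand-in data
copy = c_string_copy( bin )

puts "payload:  #{bin.bytesize} bytes, multiple of 4: #{( bin.bytesize % 4 ).zero?}"
puts "with NUL: #{copy.bytesize} bytes, multiple of 4: #{( copy.bytesize % 4 ).zero?}"
```

This does not explain the crash by itself, but it removes one variable when comparing the MRI and JRuby runs.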
Thanks for the additional investigation. This issue is far too involved for me to explore with a jetlaggy head, but I'll try to get into it tomorrow. If you IRC, we might be able to communicate more quickly (Freenode #jruby during US business hours). |
Finally circling back to this. After tweaking things a bit I appear to have your preconf-test.rb script "working" but I'm unsure if this is an expected result or not:
Beyond that, smoketest.rb and vulkan.rb both run to completion without any indication of errors. Perhaps this branch (jruby-hotfix) has patched around the problem? I tried to revert npomf/hephaestus@4d40fe7871bace77d29cd685472c204078f07fd6 but it doesn't revert cleanly. Now that I have a working env based off your jruby-hotfix branch, what else do I need to do to reproduce? |
Good to hear that the smoketest is working as expected; I haven't checked that one for that branch recently. However, there's a difference in the two tests. The one you've shown above relies on the Vulkan validation layers, specifically those shipped from LunarG in their VulkanSDK. The error you're getting indicates that the layer was not found on the indicated You should be able to acquire them for Linux at LunarXchange; direct link to version 1.0.39.1. I should be able to provide an updated version of the branch with more up-to-date code and without the validation layers, though I'm unsure of what the impact of running without them will be. |
The jruby-hotfix branch has been updated with a cleaner I'm presently unable to commit to merging the latest code in due to a limitation in refinements with the current JRuby (does not seem to honor |
Ok, so I had some trouble getting it to find the libraries, which led me to copy them directly into cwd and modify the ffi_lib lines. I'll try to get it working against exactly what's in the sdk. |
Well, I got further!
I'm going to try booting back into X11 (instead of Wayland) and see if that helps. |
Oh, I was running X. Not sure what this error means or how to remedy it. Poking around. |
Ah-ha, I'm guessing this might be a problem with running the open-source NVidia driver, "Nouveau". I'll see if I can get a proper NVidia driver installed. |
That's the same issue I've had when trying to test on a linux VM (no native system for testing yet). However, it should only crop up if no device could be found (as obviously you have the necessary libvulkan .so's) which met the minimal criteria for use, which ... doesn't make sense if you're passing the smoketest. Though the Nouveau drivers COULD be at fault. Hard to say. This VM refuses to play nicely still and I'll have to get back to you once I can test this from my system proper. |
Well I'm running NVidia drivers now and it still gives me the same error, after briefly popping up a blank window. I guess I'm stumped. I've been meaning to get a Windows VM set up but it will be a while. |
And smoketest does indeed still result in "SUCCESS". |
I need to see if there was a code fix for polling that I need to port over. So far I am unable to merge the master branch into this due to Issue #4482. Will update shortly with more info. |
Worked to hand-build a version of the code to bypass the above issue and the funky You can find that via the smoketest or the vulkan tools. Example:
Results
You should be able to give it the name of whatever graphics adapter the smoke-test auto-chooses and it should work. |
Looks like my GPU ID is "940MX" according to vulkaninfo. So...we're closer!
|
Ok, I swapped MAILBOX for FIFO and I have now, finally, amazingly reproduced your original error!
|
I think you're going to like this:
diff --git a/core/src/main/java/org/jruby/ext/ffi/StructLayout.java b/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
index 933bf8f..6963250 100644
--- a/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
+++ b/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
@@ -1136,7 +1136,7 @@ public final class StructLayout extends Type {
                 ptr.getMemoryIO().putMemoryIO(m.offset, mem);
             } else if (value instanceof RubyInteger) {
-                ptr.getMemoryIO().putAddress(m.offset, Util.int64Value(ptr));
+                ptr.getMemoryIO().putAddress(m.offset, Util.int64Value(value));
             } else if (value.isNil()) {
                 ptr.getMemoryIO().putAddress(m.offset, 0L);
With that fix, I believe it runs correctly:
I do get the following error during shutdown, but I presume it's unrelated:
|
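To see why a one-word change matters, here is a plain-Ruby model of a pointer-field setter. It is not JRuby's actual code: `FakePointer`, `int64_value`, `put_pointer_buggy` and `put_pointer_fixed` are hypothetical names. The buggy variant converts the struct's own backing pointer instead of the assigned value, which would be consistent with a TypeError that names a pointer class even though an Integer was assigned.

```ruby
# Plain-Ruby model of the one-word fix; this is NOT JRuby's implementation.
# FakePointer and both setters are hypothetical names used for illustration.
FakePointer = Struct.new( :address )

# Strict integer conversion, loosely mimicking an "implicit conversion" check.
def int64_value( obj )
  return obj if obj.is_a?( Integer )
  raise TypeError, "no implicit conversion of #{obj.class} to Integer"
end

# Buggy: converts the struct's backing pointer rather than the assigned value.
def put_pointer_buggy( fields, name, struct_ptr, value )
  fields[name] = int64_value( struct_ptr ) if value.is_a?( Integer )
end

# Fixed: converts the value that was actually assigned.
def put_pointer_fixed( fields, name, struct_ptr, value )
  fields[name] = int64_value( value ) if value.is_a?( Integer )
end

fields     = {}
struct_ptr = FakePointer.new( 0xdeadbeef )

begin
  put_pointer_buggy( fields, :pCode, struct_ptr, 0x1234 )
rescue TypeError => e
  puts "buggy: #{e.message}"
end

put_pointer_fixed( fields, :pCode, struct_ptr, 0x1234 )
puts "fixed: pCode = 0x#{fields[:pCode].to_s( 16 )}"
```

In the model, the buggy setter never looks at the Integer it was given, so any integer address fails the same way regardless of its value.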
Looks like this was just a mistake from the earliest days of JRuby FFI, and nobody had run into it until now: 7b7b5ad I'll push the fix. Perhaps you can come up with an FFI test or spec we can use? |
I might just be able to piece something together, sure! Super glad to hear there was an easy fix for this one, too. Thanks a ton! (Don't worry, the cleanup method is really poorly tested and needs desperate work.) |
Possible minimalist test created. Can't seem to get RSpec to actually output anything, though? View it on Gist, including a simplified format that just says OK or FAILED:
require 'rspec'
require 'ffi' # builtin, but here for completeness
# A trivial struct to handle some very basic operational tests.
class Trivial < FFI::Struct
  layout id: :uint,
         name: :string,
         pNext: :pointer
end
# Our spec.
describe Trivial do
  context 'When performing coercion testing in JRuby FFI' do
    # declare instance
    before( :all ){ @struct = Trivial.new() }
    # basic integer passing test
    it 'should output 42 when the ID is set' do
      expect( @struct[:id] = 42 ).to eql 42
    end
    # convert a Ruby string into a C string
    # this **WILL FAIL** in MRI with an ArgumentError !!!
    it "should be named 'JRuby'" do
      str = 'JRuby'
      expect( @struct[:name] = str ).to eql str
    end
    # test 'proper' pointer syntax using an arbitrary pointer
    it 'should accept an FFI::Pointer reference' do
      ptr = FFI::Pointer.new( 0x1234 )
      @struct[:pNext] = ptr
      # use the @struct instance variable; a bare `struct` is undefined here
      expect( @struct[:pNext].address ).to eql ptr.address
    end
    # test C-style integer offset passing
    it 'should interpret integer offsets to pointers correctly' do
      @struct[:pNext] = nil # force it to be 0x00 / NULL to ensure test accuracy
      offset = 0x1234
      @struct[:pNext] = offset
      expect( @struct[:pNext] ).to be_instance_of FFI::Pointer
      expect( @struct[:pNext].address ).to eql offset # we expect :offset to be a pointer now
    end
  end
end |
Environment
Environment details:
Gems in use:
Additional source code:
Expected Behavior
arg || CONSTANT (evaluates to nil || 0x00 # => 0) should not cause a TypeError.
NULL should not be received.
Actual Behavior
All observations are currently drawn from the source file environment.rb in the Hephaestus project, a wrapper for Vulkan written in Ruby. So far it works as expected on the pre-compiled mingw32-x64 build of ruby-ffi, but the -java derivative is encountering strange behavioral differences.
from || VK_NULL_HANDLE, which by default translates to nil || 0x00, causes a TypeError ("no implicit conversion of FFI::MemoryPointer to Integer").
pCode: bin.to_ptr assignment using the Struct constructor mixin to assign from kwargs causes the following, after which JRuby may segfault or simply crash with an OS-level "application not responding" warning:
Inside the diagnostic output (enabled with -v on the jruby-fixes branch), the reference pCode is clearly identified as being an FFI::MemoryPointer with a real address and a length of 1281 bytes (+1 byte from appending \x00, which is reflected from core FFI behaviors).
The .to_ptr contrivance is another mix-in which allows most basic Ruby objects to be rendered into FFI::MemoryPointer instances by translating their contents. In the above case, it is shorthand for FFI::MemoryPointer.from_string( bin ).
Caveats
Redoing the test without the validation layer causes the crash regardless, which makes me suspect that Vulkan itself may not be what is causing the crash (the SDK is now being bypassed). However, I do recall there having been a potential crash in MRI before when I had accidentally passed in the raw GLSL files (as opposed to the correct SPIR-V files).
This does not change the fact that identical syntax is not behaving identically between the two implementations of Ruby, or that a pointer appears to be shedding or losing its content without warning, as though it were being released / GC'd while within the scope of its creation.
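The fallback idiom behind the first symptom can be checked in plain Ruby. In the sketch below, VK_NULL_HANDLE is assumed to be 0, as described under Actual Behavior, and `handle_or_null` is a hypothetical helper. The point is that the value reaching the struct field is an Integer, never nil, so it takes the Integer path of whatever setter receives it.

```ruby
VK_NULL_HANDLE = 0x00 # assumption: defined as 0, per the description above

# Hypothetical helper wrapping the `arg || CONSTANT` idiom.
def handle_or_null( arg )
  arg || VK_NULL_HANDLE
end

puts handle_or_null( nil ).inspect    # falls back to the constant
puts handle_or_null( 0x1234 ).inspect # a real handle passes through unchanged
puts handle_or_null( 0 ).inspect      # 0 is truthy in Ruby, so no fallback is needed
```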
JVM Dump
I've captured some findings (replicated above), as well as the output from the JVM when the crash is encountered. You can view it in this gist here.
Test File
I've reproduced the test file below, which uses the raw source of the project (as opposed to the gem, to allow more rapid development). It has not been altered from the code I would use for my personal machine, and thus the adapter name and environment variables may not reflect what you would use.