Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JRuby's impl of FFI differs in behavior from MRI; segfaults/crashes #4413

Closed
ghost opened this issue Dec 28, 2016 · 34 comments
Closed

JRuby's impl of FFI differs in behavior from MRI; segfaults/crashes #4413

ghost opened this issue Dec 28, 2016 · 34 comments

Comments

@ghost
Copy link

ghost commented Dec 28, 2016

Environment

Environment details:

  • Java HotSpot(TM) 64-Bit Server VM (25.51-b03) for windows-amd64 JRE (1.8.0_51-b16), built on Jun 8 2015 18:03:07 by "java_re" with MS VC++ 10.0 (VS2010)
  • Windows 7 , 64 bit Build 7601 (6.1.7601.23572)
  • JRuby 9.1.6.0

Gems in use:

  • FFI 1.9.14-java

Additional source code:

Expected Behavior

  • The syntax of arg || CONSTANT (evaluates to nil || 0x00 # => 0) should not cause a TypeError.
  • When a Memory Pointer with a length which matches the source ASCII-8Bit -encoded string from which it is derived is passed, a NULL should not be received.
  • Execution of code which operates correctly in Ruby 2.3.3 (MRI) should not greatly differ when executed in JRuby 9k.

Actual Behavior

All observations are currently drawn from the source file environment.rb in the Hephaestus project, a wrapper for Vulkan written in Ruby. Insofar it works as expected on the pre-compiled mingw32-x64 build of ruby-ffi, but the -java derivative is encountering strange behavioral differences.

  • At line 245, from || VK_NULL_HANDLE, which by default translates to nil || 0x00 causes a TypeError ("no implicit conversion of FFI::MemoryPointer to Integer").
  • Likewise at line 318, pCode: bin.to_ptr assignment using the Struct constructor mixin to assign from kwargs causes the following, after which JRuby may segfault or simply crash with an OS-level "application not responding" warning:
ParameterValidation(ERROR): object: 0x0 type: 0 location: 190 msgCode: 4: vkCre
ateShaderModule: required parameter pCreateInfo->pCode specified as NULL
SC(ERROR): object: 0x0 type: 0 location: 10277 msgCode: 5: SPIR-V module not va
lid: Invalid SPIR-V magic number.

Inside the diagnostic output (enabled with -v on the jruby-fixes branch), the reference pCode is clearly identified as being an FFI::MemoryPointer with a real address and a length of 1281 bytes (+1 byte from appending \x00, which is reflected from core FFI behaviors).

The .to_ptr contrivance is another mix-in which allows most basic Ruby objects to be rendered into FFI::MemoryPointer instances by translating their contents. In the above case, it is shorthand for FFI::MemoryPointer.from_string( bin ).

Caveats

Redoing the test without the validation layer causes the crash regardless, making me suspect it is only potentially not an issue with Vulkan (as the SDK is now being bypassed) which is causing the crash. However, I do recall there having been a potential crash in MRI before when I had accidentally passed in the raw GLSL files (as opposed to the correct SPIR-V files).

This does not change the fact that identical syntax between the two implementations of Ruby are not behaving in an identical manner, or that a pointer appears to be shedding or losing its content without warning as though it were being released / GC'd while within the scope of its creation.

JVM Dump

I've captured some findings (replicated above), as well as the output from the JVM when the crash is encountered. You can view it in this gist here.

Test File

I've reproduced the test file below, which uses the raw source of the project (as opposed to the gem, to allow more rapid development). It has not been altered from the code I would use for my personal machine, and thus the adapter name and environment variables may not reflect what you would use.

# modify environment
ENV['PATH'] = './bin;' + ENV['PATH']
ENV['VK_LAYER_PATH'] = 'D:/VulkanSDK/1.0.33.0/Bin'

# load and track time
top = Time.now
require_relative '../vulkan'
require_relative '../lib/utils'
include Vulkan, GLFW3
puts "Loaded Vulkan in #{Time.now - top} seconds"

# mark head
$mark = Time.now
$failed = false

# setup
begin
    puts 'Creating environment ...'
    $env = Vulkan::Environment.new(
        'Test',
        adapter: 'GTX 660',
        validation: 'VK_LAYER_LUNARG_standard_validation'
    )

    puts 'Setting up window ...'
    win = Vulkan::Window.new( 1280, 800, $env )
    puts 'Finishing initialization: '

    puts ' - create rendering path ...'
    $env.create_default_rendering_path

    puts ' - create framebuffers ...'
    $env.create_framebuffers

    puts ' - create commandpool ...'
    $env.create_commandpool

    puts ' - create command buffers ...'
    $env.create_commandbuffers

    puts 'Done.'
rescue Exception => err
    root = Dir.pwd + '/'
    warn "#{err.class.name}: #{err}", *err.backtrace.map {|line| "    " + line.gsub( root, '') }
    $failed = true
ensure
    dT = Time.now - $mark
    puts "Initialization attempt #{$failed ? 'FAILED' : 'succeeded'} after #{dT} seconds",
         "Environment details:",
         "--------------------",
         $env.inspect
end
@headius headius added this to the JRuby 9.1.7.0 milestone Jan 4, 2017
@headius
Copy link
Member

headius commented Jan 4, 2017

Looking into this a bit now, hopefully for 9.1.7.0.

@headius
Copy link
Member

headius commented Jan 5, 2017

Ok, I'm going to need a little more help with this. I'd like to be able to reproduce, but getting this set up with appropriate paths to Vulkan is confusing. I'm also on Linux...so it may not even be possible to get it running before we need to push a 9.1.7.0 release.

Two things could help me here:

  • If you can sort out individual calls that are failing and simulate them with a simple C file and Ruby script, I can reproduce anywhere. I realize that's a bit of work.
  • Alternatively, if you can create a temporary repo that has all paths set up with a reproduction script, I can give it a shot. If Windows is the only option for you, that's fine too...it just means I'll have to take some extra time to set up a Windows VM.

The conversion error should be simple enough. Presumably the CRuby FFI just takes the Integer value and assumes you're giving it a valid pointer, while the JRuby version demands an actual MemoryPointer. Problem is, I can't see who would be raising it and I don't have a backtrace. Is it just trying to construct a struct?

If you can get backtraces for the non-crashers, with and without -Xbacktrace.style=full I might be able to fix it without even running it.

The other crasher is likely similar; some value is not coercing properly, so we end up with a null pointer that doesn't manifest until Vulkan eventually tries to access that field of the struct.

@ghost
Copy link
Author

ghost commented Jan 5, 2017

Error site details

As far as I can tell, the following is where our error happens:

# Create a shader module from a SPIR-V file.
def load_shader( path )
    bin = open( path, 'rb'){|io| io.read }
    smci = VkShaderModuleCreateInfo.new(
        sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        codeSize: bin.length,
        pCode: bin.to_ptr   # FIXME jruby validation sees NULL; MemoryPointer exposed in diag
    )

    # create it
    shader = FFI::MemoryPointer.new( :pointer )
    ok = vkCreateShaderModule( @logical_device, smci, nil, shader )  # KABOOM
    raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
    VkShaderModule.new( shader.read_pointer )
end

The line with the FIXME comment will correctly show diagnostic data in debug mode of the expected pointer, which is only printed after assignment, so it's definitely not the FFI::Struct causing the fault.

Diag output
This output appears when the shader module is created in debug mode.

DIAG: #<Vulkan::VkShaderModuleCreateInfo:0x7cb502c>: sType (:sType) -> :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
DIAG: #<Vulkan::VkShaderModuleCreateInfo:0x7cb502c>: codeSize (:codeSize) -> 1280
DIAG: #<Vulkan::VkShaderModuleCreateInfo:0x7cb502c>: pCode (:pCode) -> #<FFI::MemoryPointer address=0x2d0000 size=1281>

From it we can tell that a pointer of 1,281 bytes in length is created, which matches the same length reported by MRI. So that's good.

The second comment, "KABOOM," is the site of the actual segmentation fault. For some reason the struct gets passed in with pCreateInfo->pCode = NULL, which is causing Vulkan to lose its mind, and is the error printed in the validation layer's diagnostics before failing.

LunarG validation output

ParameterValidation(ERROR): object: 0x0 type: 0 location: 190 msgCode: 4: vkCreateShaderModule: required parameter pCrea
teInfo->pCode specified as NULL
SC(ERROR): object: 0x0 type: 0 location: 10277 msgCode: 5: SPIR-V module not valid: Invalid SPIR-V magic number.

This, tied with the reference to nvoglv64.dll+0xedae47 as the problematic frame in question is making me think that the segfault is potentially part of Vulkan. The NULL content of the pointer, however, is definitely JRuby.

Predefined test script

Added the branch jruby-hotfix which includes a version of the file that will configure itself (hopefully correctly) for both standard Windows installs and Linux, presuming the instructions for LunarG's SDK on Linux are correct and they appear in the /home of a user. You might have to substitute version numbers, however.

If you can help me with producing the backtraces, I'll be happy to provide all details possible! JRuby prints the following (where "test/make_window.rb" is the original to the "preconf_test.rb" file):

D:\git\hephaestus>jruby -v -d -Xbacktrace.style=full -Xrunhprof test/make_window.rb
jruby: warning: unknown property jruby.runhprof

@ghost
Copy link
Author

ghost commented Jan 5, 2017

Just did a quick diagnostic test. This is what's printed out after our VkShaderModuleCreateInfo struct is created:

DIAG: #<Vulkan::VkShaderModuleCreateInfo:0x275bf9b3>: pCode (:pCode) -> #<FFI::MemoryPointer ad
dress=0x211ab0 size=1281>
INFO: {:sType=>:VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO, :pNext=>#<FFI::Pointer address=0x0
>, :flags=>0, :codeSize=>1280, :pCode=>#<FFI::Pointer address=0x211ab0>}

For some reason, dumping its values is reflecting an FFI::Pointer instead of the FFI::MemoryPointer passed in, but which has the same address?

Could that be what's causing our problem, or am I looking at the wrong thing?

@headius
Copy link
Member

headius commented Jan 6, 2017

Your command line would need -J in front of -Xrunhprof if you want that, but I'm not sure what you are using that flag for (it's a JVM-level profiler). The backtrace flag looks correct.

The pointer inside the struct probably is an FFI::Pointer. FFI::MemoryPointer#to_ptr returns this. But the handling of the struct when passed to native is not consuming FFI::Pointer properly. What does it do if you yank the actual pointer integer out and pass that for pCode?

@ghost
Copy link
Author

ghost commented Jan 6, 2017

Seems the segfault is preventing the JVM from properly dumping the backtraces; the file contains only the usual header and the .TMP of the output is empty. It's definitely working otherwise, though, as there is a noticeable performance difference as the profiler runs.

Using the corrected command line, when I swap the pointer to be a raw integer, I get the following. It fails to coerce the result of .to_i() on the pointer, and prevents the struct from being created:

DIAG: #<Vulkan::VkShaderModuleCreateInfo:0x2049a9c1>: codeSize (:codeSize) -> 1280
TypeError: no implicit conversion of FFI::MemoryPointer into Integer
    java/lang/Thread.java:1552:in `getStackTrace'
    org/jruby/runtime/backtrace/TraceType.java:246:in `getBacktraceData'
    org/jruby/runtime/backtrace/TraceType.java:47:in `getBacktrace'
    org/jruby/RubyException.java:236:in `prepareBacktrace'
    org/jruby/exceptions/RaiseException.java:216:in `preRaise'
    org/jruby/exceptions/RaiseException.java:183:in `preRaise'
    org/jruby/exceptions/RaiseException.java:111:in `<init>'
    org/jruby/Ruby.java:4182:in `newRaiseException'
    org/jruby/Ruby.java:3853:in `newTypeError'
    org/jruby/util/TypeConverter.java:400:in `handleImplicitlyUncoercibleObject'
    org/jruby/util/TypeConverter.java:396:in `handleUncoercibleObject'
    org/jruby/util/TypeConverter.java:99:in `convertToType19'
    org/jruby/util/TypeConverter.java:69:in `convertToType'
    org/jruby/RubyBasicObject.java:744:in `convertToInteger'
    org/jruby/RubyNumeric.java:219:in `other2long'
    org/jruby/RubyNumeric.java:207:in `num2long'
    org/jruby/ext/ffi/Util.java:100:in `longValue'
    org/jruby/ext/ffi/Util.java:75:in `int64Value'
    org/jruby/ext/ffi/StructLayout.java:1139:in `put'
    org/jruby/ext/ffi/StructLayout.java:495:in `put'
    org/jruby/ext/ffi/Struct.java:293:in `setFieldValue'
    org/jruby/ext/ffi/Struct$INVOKER$i$2$0$setFieldValue.gen:-1:in `call'
    org/jruby/internal/runtime/methods/JavaMethod.java:740:in `call'
    org/jruby/ir/targets/InvokeSite.java:152:in `fail'
    lib/structs.rb:49:in `block in new'
    org/jruby/runtime/CompiledIRBlockBody.java:156:in `yieldDirect'
    org/jruby/runtime/MixedModeIRBlockBody.java:122:in `yieldDirect'
    org/jruby/runtime/BlockBody.java:108:in `yield'
    org/jruby/runtime/Block.java:167:in `yield'
    org/jruby/RubyHash.java:1363:in `visit'
    org/jruby/RubyHash.java:1360:in `visit'
    org/jruby/RubyHash.java:669:in `visitLimited'
    org/jruby/RubyHash.java:654:in `visitAll'
    org/jruby/RubyHash.java:1320:in `iteratorVisitAll'
    org/jruby/RubyHash.java:1355:in `each_pairCommon'
    org/jruby/RubyHash.java:1344:in `each'
    org/jruby/RubyHash$INVOKER$i$0$0$each.gen:-1:in `call'
    org/jruby/internal/runtime/methods/JavaMethod.java:497:in `call'
    org/jruby/runtime/callsite/CachingCallSite.java:77:in `callBlock'
    org/jruby/runtime/callsite/CachingCallSite.java:83:in `call'
    org/jruby/ir/instructions/CallBase.java:428:in `interpret'
    org/jruby/ir/interpreter/InterpreterEngine.java:356:in `processCall'
    org/jruby/ir/interpreter/StartupInterpreterEngine.java:73:in `interpret'
    org/jruby/ir/interpreter/InterpreterEngine.java:84:in `interpret'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:179:in `INTERPRET_METHOD'
    lib/structs.rb:48:in `new'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:165:in `call'
    org/jruby/internal/runtime/methods/DynamicMethod.java:200:in `call'
    org/jruby/runtime/callsite/CachingCallSite.java:338:in `cacheAndCall'
    org/jruby/runtime/callsite/CachingCallSite.java:163:in `call'
    org/jruby/ir/interpreter/InterpreterEngine.java:315:in `processCall'
    org/jruby/ir/interpreter/StartupInterpreterEngine.java:73:in `interpret'
    org/jruby/ir/interpreter/InterpreterEngine.java:84:in `interpret'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:179:in `INTERPRET_METHOD'
    lib/util/environment.rb:315:in `load_shader'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:165:in `call'
    org/jruby/internal/runtime/methods/DynamicMethod.java:200:in `call'
    org/jruby/runtime/callsite/CachingCallSite.java:338:in `cacheAndCall'
    org/jruby/runtime/callsite/CachingCallSite.java:163:in `call'
    org/jruby/ir/interpreter/InterpreterEngine.java:315:in `processCall'
    org/jruby/ir/interpreter/StartupInterpreterEngine.java:73:in `interpret'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:109:in `INTERPRET_METHOD'
    lib/util/environment.rb:591:in `create_rendering_path'
    org/jruby/internal/runtime/methods/MixedModeIRMethod.java:95:in `call'
    org/jruby/internal/runtime/methods/AliasMethod.java:101:in `call'
    org/jruby/ir/targets/InvokeSite.java:140:in `invoke'
    test/make_window.rb:30:in `<main>'
    java/lang/invoke/MethodHandle.java:625:in `invokeWithArguments'
    org/jruby/ir/Compiler.java:111:in `load'
    org/jruby/Ruby.java:846:in `runScript'
    org/jruby/Ruby.java:761:in `runNormally'
    org/jruby/Ruby.java:779:in `runNormally'
    org/jruby/Ruby.java:592:in `runFromMain'
    org/jruby/Main.java:425:in `doRunFromMain'
    org/jruby/Main.java:313:in `internalRun'
    org/jruby/Main.java:242:in `run'
    org/jruby/Main.java:204:in `main'

That's the same issue I had first encountered where the constant 0x00 would raise the error.

The profiler appears to have output 119MB of data. The files are here:

@ghost
Copy link
Author

ghost commented Jan 6, 2017

Re-ran without the profiler to validate that the integer value is correct. It appears to be:

INFO: creating VkShaderModuleCreateInfo with: #<FFI::MemoryPointer address=0x390000 size=1281> 
at 3735552

@headius
Copy link
Member

headius commented Jan 6, 2017

I'm confused...that memory address doesn't look valid, and it seems like it's still trying to create the struct using the MemoryPointer rather than the Integer value from your logging output. Can you show me the updated code for clarification?

@ghost
Copy link
Author

ghost commented Jan 6, 2017

Original code:

# Create a shader module from a SPIR-V file.
def load_shader( path )
    bin = open( path, 'rb'){|io| io.read }
    smci = VkShaderModuleCreateInfo.new(
        sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        codeSize: bin.length,
        pCode: bin.to_ptr  # equivalent to FFI::MemoryPointer.from_string( bin )
    )

    # create it
    shader = FFI::MemoryPointer.new( :pointer )
    ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
    raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
    VkShaderModule.new( shader.read_pointer )
end

Updated code to use the integer value:

# Create a shader module from a SPIR-V file.
def load_shader( path )
    bin = open( path, 'rb'){|io| io.read }
    ptr = bin.to_ptr  # equivalent to FFI::MemoryPointer.from_string( bin )
    loc = ptr.to_i    # pull integer address 
    puts "INFO: creating VkShaderModuleCreateInfo with: #{ptr} at #{loc}"
    smci = VkShaderModuleCreateInfo.new(
        sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        codeSize: bin.length,
        pCode: loc  # pass in the integer address instead of the pointer; causes TypeError
    )
    puts "INFO: #{smci.to_h}"

    # create it
    shader = FFI::MemoryPointer.new( :pointer )
    ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
    raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
    VkShaderModule.new( shader.read_pointer )
end

I'm still really unsure of why it's saying conversion of FFI::MemoryPointer into Integer when the opposite order is what's being passed in. Regardless, the TypeError from the previous comment occurs during the assignment of the pCode member of the struct. We aren't reaching the invocation of vkCreateShaderModule() anymore.

The long format of the code without my mixins would be:

def load_shader( path )
    bin = open( path, 'rb'){|io| io.read }
    ptr = FFI::MemoryPointer.from_string( bin )
    loc = ptr.to_i    # pull integer address 
    puts "INFO: creating VkShaderModuleCreateInfo with: #{ptr} at #{loc}"
    smci = VkShaderModuleCreateInfo.new()
    smci[:sType] = :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
    smci[:codeSize] = bin.length
    smci[:pCode] = loc # <- TypeError is raised here
    puts "INFO: #{smci.to_h}"

    # create it
    shader = FFI::MemoryPointer.new( :pointer )
    ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
    raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
    VkShaderModule.new( shader.read_pointer )
end

@headius
Copy link
Member

headius commented Jan 7, 2017

Ok, so the TypeError with the raw Integer is probably because the struct says it wants a :pointer. If you change that to something suitable for a numeric pointer on your system (long or ulong I'd guess) the modified code would work.

But it still begs the question why :pointer is having trouble with an FFI::MemoryPointer. Hmm.

@headius
Copy link
Member

headius commented Jan 7, 2017

Am I right in thinking that all errors so far appear to be problems getting the pointer to propagate into native code properly? Values not coercing, null pointers getting dereferenced, etc?

@ghost
Copy link
Author

ghost commented Jan 7, 2017

I ... believe so? Insofar the show-stopper has been that weird null out of nowhere, so I would assume it would be an issue of translating the pointer to native code, as introspection does correctly show what I would expect once it's loaded into the FFI Struct.

Once it moves to native code and is consumed, however, we get the issue. So yes, that could well be it.

@ghost
Copy link
Author

ghost commented Jan 7, 2017

Interesting. I used .to_java( :long ) on the address and got a different error:

INFO: creating VkShaderModuleCreateInfo with: #<FFI::MemoryPointer address=0x4519d0 size=1281> 
at 4528592
ArgumentError: Invalid pointer value
    org/jruby/ext/ffi/Struct.java:293:in `[]='
    lib/structs.rb:49:in `block in new'
    org/jruby/RubyHash.java:1344:in `each'
    lib/structs.rb:48:in `new'
    lib/util/environment.rb:318:in `load_shader'
    lib/util/environment.rb:594:in `create_rendering_path'
    test/make_window.rb:30:in `<main>'

So it definitely knows how to handle that, possibly maybe.

If I instead change the struct's layout as follows, the Ruby integer address (no java conversions this time) takes just fine but we crash Java and get the same SPIR-V magic number failure output from the validation layer.

class Vulkan::VkShaderModuleCreateInfo < FFI::Struct
    layout sType:           :VkStructureType,
           pNext:           :pointer,
           flags:           :VkShaderModuleCreateFlags,
           codeSize:        :size_t,
           pCode:           :ulong
end

And our old friend:
java-stopped

@enebo enebo modified the milestones: JRuby 9.1.8.0, JRuby 9.1.7.0 Jan 10, 2017
@ghost
Copy link
Author

ghost commented Feb 5, 2017

Quick update after using some introspection:

DEBUG:  load_shader("res/vert.spv")
DEBUG:  read 1280 bytes of ASCII-8BIT
DEBUG:  created pointer: #<FFI::MemoryPointer address=0x3d0430 size=1281>
ParameterValidation(ERROR): object: 0x0 type: 0 location: 190 msgCode: 4:
vkCreateShaderModule: required parameter pCreateInfo->pCode specified as 
NULL
SC(ERROR): object: 0x0 type: 0 location: 10277 msgCode: 5: SPIR-V module 
not valid: Invalid SPIR-V magic number.

Insofar it's able to tell us that we are indeed loading a file and reading 1,280 bytes of ASCII-8BIT encoded string data. This then becomes a 1,281-byte string when 0x00 is appended so that it is null-terminated (thanks to FFI's FFI::MemoryPointer.from_string() method).

Execution has not been changed from the original implementation, except to provide the debug output.

# Create a shader module from a SPIR-V file.
def load_shader( path )
    puts "DEBUG:  load_shader(#{path.inspect})"
    bin = open( path, 'rb'){|io| io.read }
    puts "DEBUG:  read #{bin.length} bytes of #{bin.encoding}"
    ptr = FFI::MemoryPointer.from_string( bin )
    puts "DEBUG:  created pointer: #{ptr}"
    smci = VkShaderModuleCreateInfo.new(
        sType: :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO,
        codeSize: ptr.size,  # account for extra byte
        pCode: ptr
    )

    # create it
    shader = FFI::MemoryPointer.new( :pointer )
    ok = vkCreateShaderModule( @logical_device, smci, nil, shader )
    raise LoadError, "Failed to load shader: #{ok}" unless ok.eql? :VK_SUCCESS
    VkShaderModule.new( shader.read_pointer )
end

When changed to match FFI's conventions rather than using my mix-ins ....

# replace smci declaration:
smci = VkShaderModuleCreateInfo.new()
smci[:sType] = :VK_STRUCTURE_TYPE_SHADER_MODULE_CREATE_INFO
smci[:codeSize] = ptr.size
smci[:pCode] = ptr

... we get similar output, but the crash is still encountered:

DEBUG:  load_shader("res/vert.spv")
DEBUG:  read 1280 bytes of ASCII-8BIT
DEBUG:  created pointer: #<FFI::MemoryPointer address=0x3f0430 size=1281>
ParameterValidation(ERROR): object: 0x0 type: 0 location: 190 msgCode: 4: 
vkCreateShaderModule: required parameter pCreateInfo->pCode specified as 
NULL
SC(ERROR): object: 0x0 type: 0 location: 10277 msgCode: 5: SPIR-V module 
not valid: Invalid SPIR-V magic number.

If instead the :pCode field declaration is changed to use Ruby's working address convention ...

smci[:pCode] = ptr.address

... we find the previous, and still very peculiar, TypeError raised at ffi/Struct.java:293:

TypeError: no implicit conversion of FFI::MemoryPointer into Integer
    org/jruby/ext/ffi/Struct.java:293:in `[]='
    lib/util/environment.rb:336:in `load_shader'
    lib/util/environment.rb:608:in `create_rendering_path'
    lib/util/environment.rb:82:in `setup'
    lib/util/window.rb:35:in `initialize'
    test/make_window.rb:26:in `<main>'

@headius
Copy link
Member

headius commented Feb 9, 2017

Thanks for the additional investigation. This issue is far to involved for me to explore with a jetlaggy head, but I'll try to get into it tomorrow. If you IRC, we might be able to communicate more quickly (Freenode #jruby during US business hours).

@headius
Copy link
Member

headius commented Feb 23, 2017

Finally circling back this.

After tweaking things a bit I appear to have your preconf-test.rb script "working" but I'm unsure if this is an expected result or not:

$ jruby preconf_test.rb 
Probably on Linux; setting paths ...
Loaded Vulkan in 0.732283 seconds
Creating environment ...
RuntimeError: Failed to acquire Vulkan instance: VK_ERROR_LAYER_NOT_PRESENT
    lib/util/environment.rb:665:in `create_instance'
    lib/util/environment.rb:67:in `initialize'
    preconf_test.rb:26:in `<main>'
Initialization attempt FAILED after 0.025133000000000003 seconds
Environment details:
--------------------
nil

Beyond that, smoketest.rb and vulkan.rb both run to completion without any indication of errors. Perhaps this branch (jruby-hotfix) has patched around the problem? I tried to revert npomf/hephaestus@4d40fe7871bace77d29cd685472c204078f07fd6 but it doesn't revert cleanly.

Now that I have a working env based off your jruby-hotfix branch, what else do I need to do to reproduce?

@ghost
Copy link
Author

ghost commented Feb 23, 2017

Good to hear that the smoketest is working as expected; I haven't checked that one for that branch recently.

However, there's a difference in the two tests. The one you've shown above relies on the Vulkan validation layers, specifically those shipped from LunarG in their VulkanSDK. The error you're getting indicates that the layer was not found on the indicated PATH.

You should be able to acquire them for Linux at LunarXchange; direct link to version 1.0.39.1.

I should be able to provide an updated version of the branch with more up-to-date code and without the validation layers, though I'm unsure of what the impact of running without them will be.

@ghost
Copy link
Author

ghost commented Feb 23, 2017

The jruby-hotfix branch has been updated with a cleaner preconf_test.rb file that should bypass validation layers if they aren't found and ignore the adapter specification.

I'm presently unable to commit to merging the latest code in due to a limitation in refinements with the current JRuby (does not seem to honor .singleton_class refinements yet).

@headius
Copy link
Member

headius commented Feb 23, 2017

Ok, so I had some trouble getting it to find the libraries, which led me to copy them directly into cwd and modify the ffi_lib lines. I'll try to get it working against exactly what's in the sdk.

@headius
Copy link
Member

headius commented Feb 23, 2017

Well, I got further!

Probably on Linux; setting paths ...
Loaded Vulkan in 0.714981 seconds
Skipping validation layers as they were not found!
Creating environment ...
Setting up window ...
RuntimeError: No suitable device found!
    lib/util/environment.rb:694:in `enumerate_devices'
    lib/util/environment.rb:73:in `setup'
    lib/util/window.rb:33:in `initialize'
    preconf_test.rb:34:in `<main>'
Initialization attempt FAILED after 0.17169299999999998 seconds
Environment details:
--------------------
#<Vulkan::Environment:0x305ffe9e @framebuffers=[], @engine=nil, @vulkan=#<Vulkan::VkInstance address=0x7f04e88189c0>, @validation=nil, @name="Test", @adapter=nil, @surface=#<Vulkan::VkSurfaceKHR address=0x7f04e8813c40>, @shaders=[], @semaphores=[]>

I'm going to try booting back into X11 (instead of Wayland) and see if that helps.

@headius
Copy link
Member

headius commented Feb 23, 2017

Oh, I was running X. Not sure what this error means or how to remedy it. Poking around.

@headius
Copy link
Member

headius commented Feb 23, 2017

Ah-ha, I'm guessing this might be a problem with running the open-source NVidia driver, "Nouveau". I'll see if I can get a proper NVidia driver installed.

@ghost
Copy link
Author

ghost commented Feb 23, 2017

That's the same issue I've had when trying to test on a linux VM (no native system for testing yet). However, it should only crop up if no device could be found (as obviously you have the necessary libvulkan .so's) which met the minimal criteria for use, which ... doesn't make sense if you're passing the smoketest.

Though the Nouveau drivers COULD be at fault. Hard to say. This VM refuses to play nicely still and I'll have to get back to you once I can test this from my system proper.

@headius
Copy link
Member

headius commented Feb 23, 2017

Well I'm running NVidia drivers now and it still gives me the same error, after briefly popping up a blank window. I guess I'm stumped.

I've been meaning to get a Windows VM set up but it will be a while.

@headius
Copy link
Member

headius commented Feb 23, 2017

And smoketest does indeed still result in "SUCCESS".

@ghost
Copy link
Author

ghost commented Feb 23, 2017

I need to see if there was a code fix for polling that I need to port over. Insofar I am unable to merge the master branch into this due to Issue #4482. Will update shortly with more info.

@headius
Copy link
Member

headius commented Feb 23, 2017

I've built the Vulkan samples and they work fine...so I'm out of ideas :-)
screenshot from 2017-02-23 16-16-22

@ghost
Copy link
Author

ghost commented Feb 23, 2017

Worked to hand-build a version of the code to bypass the above issue and the funky refine SomeClass.singleton_class issue I've run into. I'm encountering the same problem as well, now. I'll need to do more research in MRI and JRuby as to why that's failing. Using a "known" adapter should work, though, and I've updated the file to use ARGV[0] as the suggested adapter name.

You can find that via the smoketest or the vulkan tools. Example:

D:\git\hephaestus>jruby smoketest.rb
. . .
Listing devices ...
Device 0 ~ #<FFI::Pointer address=0x5c096350>
 => properties: #<Vulkan::VkPhysicalDeviceProperties:0x1b1cfb87> (align: 8; size: 816 bytes)
--- DETAILS ---
API Version:    0x400025
Driver Version: 0x5e8c4000
Vendor ID:      0x10de
Device ID:      0x11c0
Device Name:    GeForce GTX 660
Using device #0: #<FFI::Pointer address=0x5c096350>

Results

D:\git\hephaestus>jruby preconf_test.rb "GeForce GTX 660"
Detected MS Windows; setting paths ...
Loaded Vulkan in 0.99 seconds
Skipping validation layers as they were not found!
Creating environment ...
Setting up window ...
TypeError: no implicit conversion of FFI::MemoryPointer into Integer
    org/jruby/ext/ffi/Struct.java:293:in `[]='
    lib/structs.rb:46:in `block in new'
    org/jruby/RubyHash.java:1343:in `each'
    lib/structs.rb:45:in `new'
    lib/util/environment.rb:254:in `create_swapchain'
    lib/util/environment.rb:81:in `setup'
    lib/util/window.rb:35:in `initialize'
    preconf_test.rb:35:in `<main>'
Initialization attempt FAILED after 0.363 seconds
. . .

You should be able to give it the name of whatever graphics adapter the smoke-test auto-chooses and it should work.

@headius
Copy link
Member

headius commented Feb 23, 2017

Looks like my GPU ID is "940MX" according to vulkaninfo. So...we're closer!

~/projects/jruby/hephaestus $ jruby preconf_test.rb "940MX"
Probably on Linux; setting paths ...
Loaded Vulkan in 0.484694 seconds
Skipping validation layers as they were not found!
Creating environment ...
Using graphics adapter '940MX'
Setting up window ...
ArgumentError: Unsupported present mode VK_PRESENT_MODE_MAILBOX_KHR
    lib/util/environment.rb:225:in `create_swapchain'
    lib/util/environment.rb:81:in `setup'
    lib/util/window.rb:35:in `initialize'
    preconf_test.rb:36:in `<main>'
Initialization attempt FAILED after 0.153452 seconds
Environment details:
--------------------
#<Vulkan::Environment:0x79e4c792 @queues=[0], @framebuffers=[], @engine=nil, @vulkan=#<Vulkan::VkInstance address=0x7f1d8480e8d0>, @validation=nil, @name="Test", @present_queue=#<Vulkan::VkQueue address=0x7f1d84ab83b0>, @adapter="940MX", @surface=#<Vulkan::VkSurfaceKHR address=0x7f1d848806c0>, @physical_device=#<Vulkan::VkPhysicalDevice address=0x7f1d84880de0>, @shaders=[], @semaphores=[], @graphics_queue=#<Vulkan::VkQueue address=0x7f1d84ab83b0>, @logical_device=#<Vulkan::VkDevice address=0x7f1d84995b40>>

@headius
Copy link
Member

headius commented Feb 23, 2017

Ok, I swapped MAILBOX for FIFO and I have now, finally, amazingly reproduced your original error!

~/projects/jruby/hephaestus $ git diff
diff --git a/lib/util/environment.rb b/lib/util/environment.rb
index 467bda8..21812cc 100644
--- a/lib/util/environment.rb
+++ b/lib/util/environment.rb
@@ -220,7 +220,7 @@ class Vulkan::Environment
 
     # Create the swapchain.  The `from` value indicates the pointer used by `oldSwapchain` when the swapchain is being
     # recreated.  Also defines the viewport.
-    def create_swapchain( width, height, mode: :VK_PRESENT_MODE_MAILBOX_KHR, format: :VK_FORMAT_B8G8R8A8_UNORM, colorspace: :VK_COLOR_SPACE_SRGB_NONLINEAR_KHR, from: nil )
+    def create_swapchain( width, height, mode: :VK_PRESENT_MODE_FIFO_KHR, format: :VK_FORMAT_B8G8R8A8_UNORM, colorspace: :VK_COLOR_SPACE_SRGB_NONLINEAR_KHR, from: nil )
         raise ArgumentError, "Unsupported surface format #{format}, #{colorspace}" unless format_supported? format, colorspace
         raise ArgumentError, "Unsupported present mode #{mode}" unless present_mode_supported? mode
 

~/projects/jruby/hephaestus $ jruby preconf_test.rb "940MX"
Probably on Linux; setting paths ...
Loaded Vulkan in 0.478298 seconds
Skipping validation layers as they were not found!
Creating environment ...
Using graphics adapter '940MX'
Setting up window ...
TypeError: no implicit conversion of FFI::MemoryPointer into Integer
    org/jruby/ext/ffi/Struct.java:293:in `[]='
    lib/structs.rb:46:in `block in new'
    org/jruby/RubyHash.java:1343:in `each'
    lib/structs.rb:45:in `new'
    lib/util/environment.rb:254:in `create_swapchain'
    lib/util/environment.rb:81:in `setup'
    lib/util/window.rb:35:in `initialize'
    preconf_test.rb:36:in `<main>'
Initialization attempt FAILED after 0.185739 seconds
Environment details:
--------------------
#<Vulkan::Environment:0x8519cb4 @queues=[0], @viewport=#<Vulkan::VkViewport:0x2d901eb0>, @present_mode=:VK_PRESENT_MODE_FIFO_KHR, @format=:VK_FORMAT_B8G8R8A8_UNORM, @shaders=[], @graphics_queue=#<Vulkan::VkQueue address=0x7fd1d8ae0860>, @height=800, @colorspace=:VK_COLOR_SPACE_SRGB_NONLINEAR_KHR, @framebuffers=[], @engine=nil, @vulkan=#<Vulkan::VkInstance address=0x7fd1d883c2c0>, @validation=nil, @name="Test", @width=1280, @present_queue=#<Vulkan::VkQueue address=0x7fd1d8ae0860>, @adapter="940MX", @surface=#<Vulkan::VkSurfaceKHR address=0x7fd1d88a8d60>, @physical_device=#<Vulkan::VkPhysicalDevice address=0x7fd1d88a8910>, @semaphores=[], @logical_device=#<Vulkan::VkDevice address=0x7fd1d89be410>>

@headius
Copy link
Member

headius commented Feb 23, 2017

I think you're going to like this:

diff --git a/core/src/main/java/org/jruby/ext/ffi/StructLayout.java b/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
index 933bf8f..6963250 100644
--- a/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
+++ b/core/src/main/java/org/jruby/ext/ffi/StructLayout.java
@@ -1136,7 +1136,7 @@ public final class StructLayout extends Type {
                 ptr.getMemoryIO().putMemoryIO(m.offset, mem);
 
             } else if (value instanceof RubyInteger) {
-                ptr.getMemoryIO().putAddress(m.offset, Util.int64Value(ptr));
+                ptr.getMemoryIO().putAddress(m.offset, Util.int64Value(value));
 
             } else if (value.isNil()) {
                 ptr.getMemoryIO().putAddress(m.offset, 0L);

With that fix, I believe it runs correctly:

$ jruby preconf_test.rb "940MX"
Probably on Linux; setting paths ...
Loaded Vulkan in 0.460969 seconds
Skipping validation layers as they were not found!
Creating environment ...
Using graphics adapter '940MX'
Setting up window ...
Done.
Initialization attempt succeeded after 0.222168 seconds
Environment details:
--------------------
#<Vulkan::Environment:0x1caa0244 @queues=[0], @renderpass=#<Vulkan::VkRenderPass address=0x7f807ce8e6c0>, @layout=#<Vulkan::VkPipelineLayout address=0x7f807ce8dd50>, @viewport=#<Vulkan::VkViewport:0x7ce69770>, @present_mode=:VK_PRESENT_MODE_FIFO_KHR, @swapchain=#<Vulkan::VkSwapchainKHR address=0x7f807c98c380>, @pipeline=#<Vulkan::VkPipeline address=0x7f807c98bd80>, @format=:VK_FORMAT_B8G8R8A8_UNORM, @shaders=[#<Vulkan::VkShaderModule address=0x7f807ce8ea00>, #<Vulkan::VkShaderModule address=0x7f807ce8eb20>], @graphics_queue=#<Vulkan::VkQueue address=0x7f807caf6810>, @height=800, @colorspace=:VK_COLOR_SPACE_SRGB_NONLINEAR_KHR, @framebuffers=[#<Vulkan::VkFramebuffer address=0x7f807ce94d30>, #<Vulkan::VkFramebuffer address=0x7f807ce94e70>], @engine=nil, @vulkan=#<Vulkan::VkInstance address=0x7f807c84e750>, @validation=nil, @name="Test", @width=1280, @present_queue=#<Vulkan::VkQueue address=0x7f807caf6810>, @adapter="940MX", @surface=#<Vulkan::VkSurfaceKHR address=0x7f807c8bed60>, @physical_device=#<Vulkan::VkPhysicalDevice address=0x7f807c8bf4f0>, @commandpool=#<Vulkan::VkCommandPool address=0x7f807cea7770>, @semaphores=[], @logical_device=#<Vulkan::VkDevice address=0x7f807c9d4dc0>, @command_buffers=[#<Vulkan::VkCommandBuffer address=0x7f807cea2d90>, #<Vulkan::VkCommandBuffer address=0x7f807cea3be0>]>

I do get the following error during shutdown, but I presume it's unrelated:

ArgumentError: wrong number of arguments (2 for 3)
  destroy! at /home/headius/projects/jruby/hephaestus/lib/util/environment.rb:100
    <main> at preconf_test.rb:48

@headius
Copy link
Member

headius commented Feb 23, 2017

Looks like this was just a mistake from the earliest days of JRuby FFI, and nobody had run into it until now: 7b7b5ad

I'll push the fix. Perhaps you can come up with an FFI test or spec we can use?

@ghost
Copy link
Author

ghost commented Feb 24, 2017

I might just be able to piece something together, sure!

Super glad to hear there was an easy fix for this one, too. Thanks a ton!

(Don't worry, the cleanup method is really poorly tested and needs desperate work.)

@ghost
Copy link
Author

ghost commented Feb 26, 2017

Possible minimalist test created. Can't seem to get RSpec to actually output anything, though?

View it on Gist, including a simplified format that just says OK or FAILED:

require 'rspec'
require 'ffi'  # builtin, but here for completeness

# A trivial struct to handle some very basic operational tests.
class Trivial < FFI::Struct
    layout id:              :uint,
           name:            :string,
           pNext:           :pointer
end

# Our spec.
describe Trivial do
    context 'When performing coercion testing in JRuby FFI' do
        # declare instance
        before( :all ){ @struct = Trivial.new() }
        
        # basic integer passing test
        it 'should output 42 when the ID is set' do
            expect( @struct[:id] = 42 ).to eql 42
        end
        
        # convert a Ruby string into a C string
        # this **WILL FAIL** in MRI with an ArgumentError !!!
        it "should be named 'JRuby'" do
            str = 'JRuby'
            expect( @struct[:name] = str ).to eql str
        end
        
        # test 'proper' pointer syntax using an arbitrary pointer
        it 'should accept an FFI::Pointer reference' do
            ptr = FFI::Pointer.new( 0x1234 )
            @struct[:pNext] = ptr
            expect( struct[:pNext].address ).to eql ptr.address
        end
        
        # test C-style integer offset passing
        it 'should interpret integer offsets to pointers correctly' do
            @struct[:pNext] = nil   # force it to be 0x00 / NULL to ensure test accuracy
            offset = 0x1234
            @struct[:pNext] = offset
            expect( @struct[:pNext] ).to be_instance_of FFI::Pointer
            expect( @struct[:pNext].address ).to eql offset  # we expect :offset to be a pointer now
        end
    end
end

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants