FIRFilter crashes on OpenCL #501

Closed
thirtythreeforty opened this issue Jun 14, 2021 · 6 comments

Comments

@thirtythreeforty

Having now recompiled with OpenCL headers available, I get an internal error when adding a FIRFilter with the default settings:

$ glscopeclient --debug --trace FIRFilter
OMP_WAIT_POLICY not set to PASSIVE. Re-exec'ing with correct environment
Detecting CPU features...
    * AVX2
    * FMA
    
Detecting OpenCL devices...
    Platform 0
        CL_PLATFORM_NAME    = Intel(R) OpenCL HD Graphics
        CL_PLATFORM_PROFILE = FULL_PROFILE
        CL_PLATFORM_VENDOR  = Intel(R) Corporation
        CL_PLATFORM_VERSION = OpenCL 3.0 
        Device 0
            CL_DEVICE_NAME              = Intel(R) HD Graphics 5500 [0x1616]
            CL_DEVICE_OPENCL_C_VERSION  = OpenCL C 3.0 
            CL_DEVICE_PROFILE           = FULL_PROFILE
            CL_DEVICE_VENDOR            = Intel(R) Corporation
            CL_DEVICE_VERSION           = OpenCL 3.0 NEO 
            CL_DRIVER_VERSION           = 1.0.0
    Platform 1
        CL_PLATFORM_NAME    = Clover
        CL_PLATFORM_PROFILE = FULL_PROFILE
        CL_PLATFORM_VENDOR  = Mesa
        CL_PLATFORM_VERSION = OpenCL 1.1 Mesa 21.1.2
        No GPUs found
clFFT support: not present at compile time

Importing CSV file "/home/georgev/WAV00.csv" to new session
Context: OpenGL 4.6 compatibility profile
    GL_VENDOR                   = Intel
    GL_RENDERER                 = Mesa Intel(R) HD Graphics 5500 (BDW GT2)
    GL_VERSION                  = 4.6 (Core Profile) Mesa 21.1.2
    GL_SHADING_LANGUAGE_VERSION = 4.60
    Initial GL error code       = 0
    GL_ARB_gpu_shader_int64     = supported
INTERNAL ERROR: OpenCL error: clEnqueueNDRangeKernel (-54)
    This indicates a bug in the program, please file a report via Github
Aborted (core dumped)

I'm not 100% sure what else to report. If you want me to add extra instrumentation, let me know.

@azonenberg
Collaborator

That's CL_INVALID_WORK_GROUP_SIZE. Interesting, I'll want to add a bit more debug information to figure out why it's getting that.

How many samples did your test waveform have? It's possible there's some fairly small limit on older GPUs. I did most of my dev on an RTX 2080 Ti, but we might have to break filters up into multiple invocations in some cases. It could also be a shared memory limitation.
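For readers decoding the `-54` in the log above: the numeric status can be turned into its symbolic name with a small lookup. The following is a minimal sketch, not glscopeclient's actual error handling. The function name `DescribeLaunchError` is hypothetical, and the table covers only a handful of launch-related codes, using the values defined in the Khronos `CL/cl.h` header.

```cpp
#include <string>

// Hypothetical helper: maps a few OpenCL status codes that commonly come back
// from clEnqueueNDRangeKernel to their symbolic names. The numeric values are
// the ones defined in the Khronos CL/cl.h header; the list is not exhaustive.
std::string DescribeLaunchError(int err)
{
    switch(err)
    {
        case 0:   return "CL_SUCCESS";
        case -5:  return "CL_OUT_OF_RESOURCES";
        case -52: return "CL_INVALID_KERNEL_ARGS";
        case -53: return "CL_INVALID_WORK_DIMENSION";
        case -54: return "CL_INVALID_WORK_GROUP_SIZE";
        case -55: return "CL_INVALID_WORK_ITEM_SIZE";
        case -56: return "CL_INVALID_GLOBAL_OFFSET";
        case -63: return "CL_INVALID_GLOBAL_WORK_SIZE";
        default:  return "unknown OpenCL error " + std::to_string(err);
    }
}
```

Printing the name next to the number in the "INTERNAL ERROR" message would save a trip to the header when triaging reports like this one.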

@azonenberg
Collaborator

Just pushed an update that prints a whole bunch of OpenCL device stats during startup. Please post a dump from that.

@thirtythreeforty
Author

glscopeclient --debug --trace FIRFilter
OMP_WAIT_POLICY not set to PASSIVE. Re-exec'ing with correct environment
Detecting CPU features...
    * AVX2
    * FMA
    
Detecting OpenCL devices...
    Platform 0
        CL_PLATFORM_NAME    = Intel(R) OpenCL HD Graphics
        CL_PLATFORM_PROFILE = FULL_PROFILE
        CL_PLATFORM_VENDOR  = Intel(R) Corporation
        CL_PLATFORM_VERSION = OpenCL 3.0 
        Device 0
            CL_DRIVER_VERSION                   = 1.0.0
            CL_DEVICE_NAME                      = Intel(R) HD Graphics 5500 [0x1616]
            CL_DEVICE_OPENCL_C_VERSION          = OpenCL C 3.0 
            CL_DEVICE_PROFILE                   = FULL_PROFILE
            CL_DEVICE_VENDOR                    = Intel(R) Corporation
            CL_DEVICE_VERSION                   = OpenCL 3.0 NEO 
            CL_DEVICE_GLOBAL_MEM_CACHE_SIZE     = 0.250 MB
            CL_DEVICE_GLOBAL_MEM_CACHELINE_SIZE = 64
            CL_DEVICE_GLOBAL_MEM_SIZE           = 6.12 GB
            CL_DEVICE_LOCAL_MEM_SIZE            = 64.00 kB
            CL_DEVICE_MAX_CLOCK_FREQUENCY       = 950 MHz
            CL_DEVICE_MAX_COMPUTE_UNITS         = 24
            CL_DEVICE_MAX_CONSTANT_ARGS         = 8
            CL_DEVICE_MAX_CONSTANT_BUFFER_SIZE  = 2088960.00 kB
            CL_DEVICE_MAX_MEM_ALLOC_SIZE        = 1.99 GB
            CL_DEVICE_MAX_PARAMETER_SIZE        = 2048
            CL_DEVICE_MAX_WORK_GROUP_SIZE       = 256
            CL_DEVICE_MAX_WORK_ITEM_SIZES       = 256, 256, 256
            CL_DEVICE_EXTENSIONS:
                cl_khr_byte_addressable_store
                cl_khr_fp16
                cl_khr_global_int32_base_atomics
                cl_khr_global_int32_extended_atomics
                cl_khr_icd
                cl_khr_local_int32_base_atomics
                cl_khr_local_int32_extended_atomics
                cl_intel_command_queue_families
                cl_intel_subgroups
                cl_intel_required_subgroup_size
                cl_intel_subgroups_short
                cl_khr_spir
                cl_intel_accelerator
                cl_intel_driver_diagnostics
                cl_khr_priority_hints
                cl_khr_throttle_hints
                cl_khr_create_command_queue
                cl_intel_subgroups_char
                cl_intel_subgroups_long
                cl_khr_il_program
                cl_intel_mem_force_host_memory
                cl_khr_subgroup_extended_types
                cl_khr_subgroup_non_uniform_vote
                cl_khr_subgroup_ballot
                cl_khr_subgroup_non_uniform_arithmetic
                cl_khr_subgroup_shuffle
                cl_khr_subgroup_shuffle_relative
                cl_khr_subgroup_clustered_reduce
                cl_intel_device_attribute_query
                cl_khr_fp64
                cl_khr_subgroups
                cl_intel_spirv_media_block_io
                cl_intel_spirv_subgroups
                cl_khr_spirv_no_integer_wrap_decoration
                cl_intel_unified_shared_memory_preview
                cl_khr_mipmap_image
                cl_khr_mipmap_image_writes
                cl_intel_planar_yuv
                cl_intel_packed_yuv
                cl_khr_int64_base_atomics
                cl_khr_int64_extended_atomics
                cl_khr_image2d_from_buffer
                cl_khr_depth_images
                cl_khr_3d_image_writes
                cl_intel_media_block_io
                cl_intel_va_api_media_sharing
                cl_intel_sharing_format_query
    Platform 1
        CL_PLATFORM_NAME    = Clover
        CL_PLATFORM_PROFILE = FULL_PROFILE
        CL_PLATFORM_VENDOR  = Mesa
        CL_PLATFORM_VERSION = OpenCL 1.1 Mesa 21.1.2
        No GPUs found
clFFT support: not present at compile time

@thirtythreeforty
Author

thirtythreeforty commented Jun 14, 2021

> How many samples did your test waveform have?

A total of 32768. It's the same file as I've been working with in #499, but with the timebase fixed.

@azonenberg
Collaborator

Looks like the FIR kernel tries to run with a local size of 1024, and your implementation has CL_DEVICE_MAX_WORK_GROUP_SIZE of 256. So that's why it fails. Gimme a bit...
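One common way to avoid this class of failure is to clamp the requested local size to what the device reports and round the global size up to a multiple of it. The sketch below illustrates that idea only. The thread does not show the fix that was actually applied, and `LaunchDims` and `ComputeLaunchDims` are hypothetical names. In real code, `deviceMaxLocal` would come from `clGetDeviceInfo` with `CL_DEVICE_MAX_WORK_GROUP_SIZE` (or the per-kernel limit from `clGetKernelWorkGroupInfo` with `CL_KERNEL_WORK_GROUP_SIZE`).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical container for 1-D launch dimensions.
struct LaunchDims
{
    std::size_t global;  // total work items, a multiple of local
    std::size_t local;   // work items per work group
};

// Clamp the preferred local size to the device limit, then round the global
// size up to the next multiple of the local size. Before OpenCL 2.0 the
// global size must be evenly divisible by the local size.
LaunchDims ComputeLaunchDims(
    std::size_t numSamples,
    std::size_t preferredLocal,
    std::size_t deviceMaxLocal)
{
    assert(preferredLocal > 0 && deviceMaxLocal > 0);

    std::size_t local = std::min(preferredLocal, deviceMaxLocal);
    std::size_t groups = (numSamples + local - 1) / local;  // ceiling division
    return { groups * local, local };
}
```

With the numbers from this report (32768 samples, a preferred local size of 1024, and a device limit of 256) this gives a local size of 256 and a global size of 32768. When the sample count is not a multiple of the local size, the global size is padded, so the kernel needs its own bounds check on the work-item index.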

@thirtythreeforty
Author

thirtythreeforty commented Jun 14, 2021 via email
