
apitrace: coherent mapping not implemented #232

Closed
degasus opened this issue Feb 24, 2014 · 10 comments · Fixed by #623

Comments

@degasus

degasus commented Feb 24, 2014

Both ARB_buffer_storage and AMD_pinned_memory allow using coherent memory in shaders. As apitrace doesn't catch writes to this memory, the results are likely wrong.

AMD_pinned_memory could easily be disabled, but as ARB_buffer_storage is required for GL 4.4, it needs to be emulated.

tbh, I don't see any OS-independent way to implement this other than having a second buffer synced before/after almost every call.
But there are useful OS-dependent ways: e.g. we could allocate memory that is both write and read protected and only sync the accessed pages.
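
A minimal sketch of that OS-dependent idea, assuming POSIX mmap/mprotect on Linux (allocProtectedShadow is a hypothetical name, not apitrace code): the region is mapped with all access revoked, so every read or write faults and only the touched pages need syncing.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: returns a page-aligned region that faults on any access.
void *allocProtectedShadow(size_t size) {
    // Anonymous mappings are page-aligned by construction.
    void *mem = mmap(nullptr, size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return nullptr;
    }
    // Every read or write now raises SIGSEGV; a fault handler can record
    // which pages were touched and sync only those before/after GL calls.
    return mem;
}
```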

@amonakov
Member

The same problem exists for GL client arrays. There's a thread starting at http://lists.freedesktop.org/archives/mesa-dev/2014-February/053328.html confirming the problem.

@jrfonseca jrfonseca self-assigned this Feb 25, 2014
@jrfonseca
Member

Yep.

The https://github.com/apitrace/apitrace/tree/virtual-memory-regions branch has some experimental code to improve support for the old GL client arrays by keeping track of the content in user memory. We'll need something along these lines. (Though GL client arrays are slightly harder, as we don't even know how big the user regions are.)

For ARB_buffer_storage and AMD_pinned_memory we'll need to maintain a list of active user memory regions, and check for changes whenever there is a draw call.

But the first step would be to get some test cases for this on https://github.com/apitrace/apitrace-tests . If anybody is willing to help, this would be a great way to do so.

@eero-t

eero-t commented Apr 14, 2016

Thanks for adding the warning!

Because of this issue, tracing fails for BioShock Infinite & DiRT Showdown (Steam games).

Of the open-source games, for example SuperTuxKart v0.9.x traces replay incorrectly with 3D drivers supporting GL 4.3 (e.g. Mesa with GL & GLSL version and compute extension overrides): https://en.wikipedia.org/wiki/SuperTuxKart

Are there other freely available use-cases where this issue breaks tracing?

@marekolsak

https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt
This can be used to handle persistent mappings in apitrace. Note that it only works with non-device memory. The idea is that apitrace will return malloc'd memory for persistent mappings and use the dirty page tracking to do manual uploads.
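
A rough sketch of how that soft-dirty mechanism could be polled, per the kernel document above (Linux-only; the helper names are illustrative, not apitrace code): write "4" to /proc/self/clear_refs to reset the soft-dirty bits, let the app write into the malloc'd mapping, then test bit 55 of each page's /proc/self/pagemap entry before uploading.

```cpp
#include <cstdint>
#include <cstdio>
#include <unistd.h>

// Reset the soft-dirty bits for all pages of this process.
void clearSoftDirty() {
    if (FILE *f = fopen("/proc/self/clear_refs", "w")) {
        fputs("4", f);
        fclose(f);
    }
}

// Return true if the page containing addr was written since the last clear.
bool pageIsSoftDirty(const void *addr) {
    const long pageSize = sysconf(_SC_PAGESIZE);
    FILE *f = fopen("/proc/self/pagemap", "rb");
    if (!f)
        return true;                        // be conservative: assume dirty
    uint64_t entry = 0;
    long offset = (long)((uintptr_t)addr / pageSize) * (long)sizeof(entry);
    bool dirty = true;
    if (fseek(f, offset, SEEK_SET) == 0 &&
        fread(&entry, sizeof(entry), 1, f) == 1)
        dirty = entry & (1ull << 55);       // bit 55 = soft-dirty
    fclose(f);
    return dirty;
}
```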

@eero-t

eero-t commented May 10, 2017

VulkanTools "vktrace" uses that, and in my testing it works fine with DOTA2.

@werman
Contributor

werman commented Jun 25, 2019

I'd like to implement tracing of the coherent mapping.

VulkanTools "vktrace" uses that, and in my testing it works fine with DOTA2.

Some time ago vktrace dropped pagemap support and moved to mprotect (LunarG/VulkanTools@a8b5df4#diff-d658adcb226a1d34c995bd96a3c31291), maybe to get more unified code between platforms, maybe because of the ugly loop which checked whether a page is dirty (I'm not exactly sure why it was needed).

Here I assume that it isn't possible to protect the pointer returned by glMapBufferRange, since it could be device memory. Is that correct?

How I see the implementation:

As @marekolsak said, we would need a second, malloc'd memory region that will be returned to the app. It would be similar to vktrace and to apitrace's new apitrace/memtrace-exception-handler branch (for DX11).

We will write-protect the memory returned to the app (a sketch follows the list), then:

  • On write - remove write protection and mark the page dirty.
  • On unmap - commit the writes to the GL buffer.
  • On draw/dispatch/flushes/fences/barriers (GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT?) - commit all writes to the GL buffers and re-protect the memory.
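
A sketch of that write-trap flow under those assumptions (Linux/x86, single mapping; the globals and helpers are made-up illustrations, and a real implementation would need to care about async-signal safety and multiple buffers):

```cpp
#include <sys/mman.h>
#include <csignal>
#include <cstring>
#include <algorithm>
#include <unistd.h>

static unsigned char *gShadow;       // memory handed to the app (write-protected)
static unsigned char *gRealMapping;  // pointer returned by glMapBufferRange
static size_t gShadowSize;
static size_t gPageSize;
static bool *gDirty;                 // one flag per shadow page

static void segvHandler(int, siginfo_t *info, void *) {
    unsigned char *addr = static_cast<unsigned char *>(info->si_addr);
    if (addr < gShadow || addr >= gShadow + gShadowSize)
        _exit(1);                    // not our region: a genuine crash
    size_t page = (addr - gShadow) / gPageSize;
    gDirty[page] = true;             // mark the page dirty...
    mprotect(gShadow + page * gPageSize, gPageSize,
             PROT_READ | PROT_WRITE);        // ...and let the write proceed
}

static void installHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = segvHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
}

// On unmap / draw / dispatch / flush / barrier: copy each dirty page into the
// real mapping (this is where the trace would record the data) and re-protect.
static void commitDirtyPages() {
    for (size_t page = 0; page * gPageSize < gShadowSize; ++page) {
        if (!gDirty[page])
            continue;
        size_t off = page * gPageSize;
        size_t len = std::min(gPageSize, gShadowSize - off);
        memcpy(gRealMapping + off, gShadow + off, len);
        mprotect(gShadow + off, gPageSize, PROT_READ);
        gDirty[page] = false;
    }
}
```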

However, this leaves the issue of reads, where the most problematic case will be:
App ensures the GL buffer is synchronized -> writes to our shadow memory -> apitrace catches this, marks the page dirty, removes protection -> app reads from another part of the same page -> app gets stale data

In that case we won't be able to catch the read, because after the first write all protection is removed (removing write protection implies removing read protection).

That would be a rare case; the more common one will be:
App ensures the GL buffer is synchronized -> reads from our shadow memory -> apitrace catches the read, fetches the up-to-date data from the GL buffer, and removes read protection -> app continues to read updated memory

Adding read protection and distinguishing between read and write faults will solve the common case for Windows and Linux on x86-64 (distinguishing read and write faults is not straightforward on Linux).

And the full solution will probably be reading all buffers when the app synchronizes the GPU and CPU. I think at the moment it would be enough to implement the common case, which will also have good enough performance.

I'd like to get some feedback before trying to implement this.

Thanks.

@jrfonseca
Member

@werman thanks for volunteering. Your plan matches pretty much what I had in mind too.

In short, have a shadow buffer and trap writes.

The only details I'd add are:

  • there should be acceleration at page granularity when the app writes to a buffer sequentially, so that writing N pages triggers O(log(N)) exceptions, as opposed to O(N) exceptions (a sketch follows this list)
  • we also need to flush on calls that operate on PBOs (grep the source for pack_function_regex and unpack_function_regex)
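
A possible way to get the O(log(N)) behaviour mentioned in the first bullet, reusing the hypothetical gShadow/gDirty/gPageSize globals from the earlier sketch: on each fault in a sequential run, unprotect twice as many pages as the previous fault did.

```cpp
static size_t gRunStart;    // first page unprotected by the previous fault
static size_t gRunLength;   // number of pages unprotected by the previous fault

// Called from the fault handler instead of unprotecting a single page.
static void unprotectForWrite(size_t faultPage) {
    size_t count = 1;
    if (gRunLength && faultPage == gRunStart + gRunLength)
        count = gRunLength * 2;              // sequential run: double the window
    size_t numPages = (gShadowSize + gPageSize - 1) / gPageSize;
    if (faultPage + count > numPages)
        count = numPages - faultPage;        // clamp to the end of the shadow
    mprotect(gShadow + faultPage * gPageSize, count * gPageSize,
             PROT_READ | PROT_WRITE);
    for (size_t i = 0; i < count; ++i)
        gDirty[faultPage + i] = true;        // conservatively treat them all as dirty
    gRunStart = faultPage;
    gRunLength = count;
}
```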

Reads are indeed trickier to handle with shadows. We could trap reads from the shadow too and trigger a flush from real memory to the shadow, but there are always situations that might not be granular enough. If we really want to handle reads effectively, we'll need to track when buffers are bound for writing into GL contexts, so that we flush the reads to the shadow when the corresponding fence is signaled. (Thankfully there are way fewer bind points where a buffer can be bound for write on GL.)

Another alternative for reads is to use the shadow just for memcompares, and pass the GPU-accessible memory to the app. That is, compromise performance for correctness. If we could restrict this to the few buffers that are eventually used for reads, it would be a good compromise.

I think at the moment it would be enough to implement the common case, which will also have good enough performance.

I agree.

@werman
Contributor

werman commented Jun 27, 2019

there should be acceleration at page granularity when the app writes to a buffer sequentially, so that writing N pages triggers O(log(N)) exceptions, as opposed to O(N) exceptions

Good idea, thanks.

we also need to flush on calls that operate on PBOs (grep the source for pack_function_regex and unpack_function_regex)

noted

If we really want to handle reads effectively, we'll need to track when buffers are bound for writing into GL contexts, so that we flush the reads to the shadow when the corresponding fence is signaled. (Thankfully there are way fewer bind points where a buffer can be bound for write on GL.)

So if we have coherent buffers mapped for read:

  • Track glBindBuffersBase, glBindBufferRange, and glBindBuffersRange for them, and add a fence after the draw/dispatch which uses such buffers.
  • Track clears, copies, and pixel transfer operations on the buffer and add a fence after them.

The question is: where do we check that the fence is signaled?

Another alternative for reads is to use the shadow just for memcompares, and pass the GPU-accessible memory to the app. That is, compromise performance for correctness. If we could restrict this to the few buffers that are eventually used for reads, it would be a good compromise.

If we pass the GPU-accessible memory to the app, how will we track writes, since the buffer could be both readable and writable? I'm not sure I understand this correctly; could you explain it further?

At the moment my priorities are:

  1. Implement writes to shadow memory.
  2. A simple implementation for reads.
  3. If it makes sense, a correct implementation for reads.

About the tests: I see that there is a simple test in the tests repo. I think I would need to write tests which check more cases and also check the correctness of reads and writes.

@jrfonseca
Member

The question is: where do we check that the fence is signaled?

The rules of GL_ARB_buffer_storage's MAP_COHERENT_BIT are such that an app needs to do glFenceSync and wait for it for side effects to be visible to the CPU.

So, a way of dealing with this is (sketched below):

  • every time a draw/dispatch happens, go over all buffer bindings, and move buffers bound for read into a list attached to the GL context
  • when a glFenceSync call happens, move that list to the sync object (e.g., using a std::map<GLsync,...>), and empty the context list
  • the first time a sync object completes, go over every buffer attached to that sync object, and flush reads
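
A sketch of that bookkeeping (CoherentBuffer and flushReadsToShadow are illustrative placeholders, not existing apitrace symbols):

```cpp
#include <map>
#include <vector>

typedef struct __GLsync *GLsync;              // as declared in the GL headers

struct CoherentBuffer;                        // a coherently mapped buffer
void flushReadsToShadow(CoherentBuffer *);    // copy real memory into the shadow

static std::vector<CoherentBuffer *> gBoundForRead;               // per-context list
static std::map<GLsync, std::vector<CoherentBuffer *>> gPendingReads;

// 1. After every draw/dispatch, record the coherent buffers bound for read.
void onDraw(const std::vector<CoherentBuffer *> &readBindings) {
    gBoundForRead.insert(gBoundForRead.end(),
                         readBindings.begin(), readBindings.end());
}

// 2. When the app calls glFenceSync, attach the accumulated list to the sync.
void onFenceSync(GLsync sync) {
    gPendingReads[sync] = std::move(gBoundForRead);
    gBoundForRead.clear();
}

// 3. The first time the sync is seen to complete (glClientWaitSync, glWaitSync,
//    glGetSynciv, glFinish), flush reads for every buffer attached to it.
void onSyncSignaled(GLsync sync) {
    auto it = gPendingReads.find(sync);
    if (it == gPendingReads.end())
        return;
    for (CoherentBuffer *buf : it->second)
        flushReadsToShadow(buf);
    gPendingReads.erase(it);
}
```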

If we pass the GPU-accessible memory to the app, how will we track writes, since the buffer could be both readable and writable? I'm not sure I understand this correctly; could you explain it further?

Every time a draw/etc. that uses that buffer happens, we'd need to memcmp with the shadow and emit a fake memcpy for the touched regions. I agree it's not easy. And one can't distinguish the GPU writes from CPU writes. But it's a possibility.
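
A rough sketch of that memcmp alternative (emitFakeMemcpy stands in for whatever call the tracer would use to record the region's new contents): the app writes straight into the GPU-visible mapping, and before each draw the mapping is diffed against a shadow copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

void emitFakeMemcpy(const void *ptr, size_t len);   // illustrative placeholder

// Diff the GPU-visible mapping against the shadow copy in blocks and record a
// fake memcpy for every block that changed since the last draw.
void diffAndEmit(unsigned char *real, unsigned char *shadow, size_t size,
                 size_t blockSize = 4096) {
    for (size_t off = 0; off < size; off += blockSize) {
        size_t len = std::min(blockSize, size - off);
        if (memcmp(real + off, shadow + off, len) != 0) {
            emitFakeMemcpy(real + off, len);        // record the new contents
            memcpy(shadow + off, real + off, len);  // bring the shadow up to date
        }
    }
}
```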

I think I would need to write tests which check more cases and also check the correctness of reads and writes.

Yes. Currently the tests are in a separate repo. I've been thinking it would be easier to merge them into the main repo, but I'm not sure when I'll get around to doing it.

@werman
Contributor

werman commented Jun 27, 2019

The rules of GL_ARB_buffer_storage's MAP_COHERENT_BIT are such that an app needs to do glFenceSync and wait for it for side effects to be visible to the CPU.

"If MAP_COHERENT_BIT is set and the server does a write, the app must
call FenceSync with SYNC_GPU_COMMANDS_COMPLETE (or Finish). Then the
CPU will see the writes after the sync is complete."

Yes, I see; for some reason my thoughts went off in some other weird direction.

So, a way of dealing with this:

Thank you for the details.

Every time a draw/etc. that uses that buffer happens, we'd need to memcmp with the shadow and emit a fake memcpy for the touched regions. I agree it's not easy. And one can't distinguish the GPU writes from CPU writes. But it's a possibility.

I like the first method more =)

Ok, I think I've got enough information to start implementing it.
