
apitrace: coherent mapping not implemented #232

Closed
degasus opened this issue Feb 24, 2014 · 10 comments · Fixed by #623

Comments

@degasus

degasus commented Feb 24, 2014

Both ARB_buffer_storage and AMD_pinned_memory allow using coherent memory in shaders. As apitrace doesn't catch writes to this memory, the results are likely wrong.

AMD_pinned_memory could easily be disabled, but as ARB_buffer_storage is required for GL 4.4, it needs to be emulated.

tbh, I don't see any OS-independent way to implement this other than having a second buffer synced before/after almost every call.
But there are useful OS-dependent ways: e.g. we could allocate memory that is both write and read protected and only sync the accessed pages.
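
A minimal sketch of that OS-dependent idea, assuming POSIX mmap/mprotect on Linux (allocProtectedShadow is a hypothetical name, not apitrace code): the region is mapped with all access revoked, so every read or write faults and only the touched pages need syncing.

```cpp
#include <sys/mman.h>
#include <cstddef>
#include <cstdio>

// Hypothetical helper: returns a page-aligned region that faults on any access.
void *allocProtectedShadow(size_t size) {
    // Anonymous mappings are page-aligned by construction.
    void *mem = mmap(nullptr, size, PROT_NONE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
        perror("mmap");
        return nullptr;
    }
    // Every read or write now raises SIGSEGV; a fault handler can record
    // which pages were touched and sync only those before/after GL calls.
    return mem;
}
```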

@amonakov
Member

The same problem exists for GL client arrays. There's a thread starting at http://lists.freedesktop.org/archives/mesa-dev/2014-February/053328.html confirming the problem.

@jrfonseca jrfonseca self-assigned this Feb 25, 2014
@jrfonseca
Member

Yep.

The https://github.com/apitrace/apitrace/tree/virtual-memory-regions branch has some experimental code to improve support for the old GL client arrays by keeping track of the content in user memory. We'll need something along these lines. (Though GL client arrays are slightly harder, as we don't even know how big the user regions are.)

For ARB_buffer_storage and AMD_pinned_memory we'll need to maintain a list of active user memory regions, and check for changes whenever there is a draw call.

But the first step would be to get some test cases for this on https://github.com/apitrace/apitrace-tests . If anybody is willing to help, this would be a great way to do so.

@eero-t

eero-t commented Apr 14, 2016

Thanks for adding the warning!

Because of this issue, tracing fails for BioShock Infinite & DiRT Showdown (Steam games).

Of the open-source games, for example SuperTuxKart v0.9.x traces replay incorrectly with 3D drivers supporting GL 4.3 (e.g. Mesa with GL & GLSL version and compute extension overrides): https://en.wikipedia.org/wiki/SuperTuxKart

Are there other freely available use-cases where this issue breaks tracing?

@marekolsak

https://www.kernel.org/doc/Documentation/vm/soft-dirty.txt
This can be used to handle persistent mappings in apitrace. Note that it only works with non-device memory. The idea is that apitrace will return malloc'd memory for persistent mappings and use the dirty page tracking to do manual uploads.
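
A rough sketch of how that soft-dirty mechanism could be polled, per the kernel document above (Linux-only; the helper names are illustrative, not apitrace code): write "4" to /proc/self/clear_refs to reset the soft-dirty bits, let the app write into the malloc'd mapping, then test bit 55 of each page's /proc/self/pagemap entry before uploading.

```cpp
#include <cstdint>
#include <cstdio>
#include <unistd.h>

// Reset the soft-dirty bits for all pages of this process.
void clearSoftDirty() {
    if (FILE *f = fopen("/proc/self/clear_refs", "w")) {
        fputs("4", f);
        fclose(f);
    }
}

// Return true if the page containing addr was written since the last clear.
bool pageIsSoftDirty(const void *addr) {
    const long pageSize = sysconf(_SC_PAGESIZE);
    FILE *f = fopen("/proc/self/pagemap", "rb");
    if (!f)
        return true;                        // be conservative: assume dirty
    uint64_t entry = 0;
    long offset = (long)((uintptr_t)addr / pageSize) * (long)sizeof(entry);
    bool dirty = true;
    if (fseek(f, offset, SEEK_SET) == 0 &&
        fread(&entry, sizeof(entry), 1, f) == 1)
        dirty = entry & (1ull << 55);       // bit 55 = soft-dirty
    fclose(f);
    return dirty;
}
```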

@eero-t

eero-t commented May 10, 2017

VulkanTools "vktrace" uses that, and in my testing it works fine with DOTA2.

@werman
Contributor

werman commented Jun 25, 2019

I'd like to implement tracing of the coherent mapping.

VulkanTools "vktrace" uses that, and in my testing it works fine with DOTA2.

Some time ago vktrace dropped pagemap support and moved to mprotect (LunarG/VulkanTools@a8b5df4#diff-d658adcb226a1d34c995bd96a3c31291), maybe to get more unified code between platforms, maybe because of the ugly loop which checked whether a page is dirty (I'm not exactly sure why it was needed).

Here I assume that it isn't possible to protect the pointer returned by glMapBufferRange, since it could be device memory. Is that correct?

How I see the implementation:

As @marekolsak said, we would need a second, malloc'd memory region that will be returned to the app. It would be similar to vktrace and to apitrace's new apitrace/memtrace-exception-handler branch (for DX11).

We will write-protect the memory returned to the app (a sketch follows the list), then:

  • On write - remove write protection and mark the page dirty.
  • On unmap - commit the writes to the GL buffer.
  • On draw/dispatch/flushes/fences/barriers (GL_CLIENT_MAPPED_BUFFER_BARRIER_BIT and GL_BUFFER_UPDATE_BARRIER_BIT?) - commit all writes to the GL buffers and re-protect the memory.
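
A sketch of that write-trap flow under those assumptions (Linux/x86, single mapping; the globals and helpers are made-up illustrations, and a real implementation would need to care about async-signal safety and multiple buffers):

```cpp
#include <sys/mman.h>
#include <csignal>
#include <cstring>
#include <algorithm>
#include <unistd.h>

static unsigned char *gShadow;       // memory handed to the app (write-protected)
static unsigned char *gRealMapping;  // pointer returned by glMapBufferRange
static size_t gShadowSize;
static size_t gPageSize;
static bool *gDirty;                 // one flag per shadow page

static void segvHandler(int, siginfo_t *info, void *) {
    unsigned char *addr = static_cast<unsigned char *>(info->si_addr);
    if (addr < gShadow || addr >= gShadow + gShadowSize)
        _exit(1);                    // not our region: a genuine crash
    size_t page = (addr - gShadow) / gPageSize;
    gDirty[page] = true;             // mark the page dirty...
    mprotect(gShadow + page * gPageSize, gPageSize,
             PROT_READ | PROT_WRITE);        // ...and let the write proceed
}

static void installHandler() {
    struct sigaction sa = {};
    sa.sa_sigaction = segvHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, nullptr);
}

// On unmap / draw / dispatch / flush / barrier: copy each dirty page into the
// real mapping (this is where the trace would record the data) and re-protect.
static void commitDirtyPages() {
    for (size_t page = 0; page * gPageSize < gShadowSize; ++page) {
        if (!gDirty[page])
            continue;
        size_t off = page * gPageSize;
        size_t len = std::min(gPageSize, gShadowSize - off);
        memcpy(gRealMapping + off, gShadow + off, len);
        mprotect(gShadow + off, gPageSize, PROT_READ);
        gDirty[page] = false;
    }
}
```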

However, this leaves the issue of reads, where the most problematic case will be:
App ensures the GL buffer is synchronized -> writes to our shadow memory -> apitrace catches this, marks the page dirty, removes protection -> app reads from another part of the same page -> app gets stale data

In that case we won't be able to catch the read, because after the first write all protection is removed (removing write protection implies removing read protection).

That would be a rare case; the more common one will be:
App ensures the GL buffer is synchronized -> reads from our shadow memory -> apitrace catches the read, fetches the up-to-date data from the GL buffer, and removes read protection -> app continues to read updated memory

Adding read protection and distinguishing between read and write faults will solve the common case for Windows and Linux on x86-64 (distinguishing read and write faults is not straightforward on Linux).

And the full solution will probably be reading all buffers when the app synchronizes the GPU and CPU. I think at the moment it would be enough to implement the common case, which will also have good enough performance.

I'd like to get some feedback before trying to implement this.

Thanks.

@jrfonseca
Member

@werman thanks for volunteering. Your plan matches pretty much what I had in mind too.

In short, have a shadow buffer and trap writes.

The only details I'd add are:

  • there should be acceleration at page granularity when the app writes to a buffer sequentially, so that writing N pages triggers O(log(N)) exceptions, as opposed to O(N) exceptions (a sketch follows this list)
  • we also need to flush on calls that operate on PBOs (grep the source for pack_function_regex and unpack_function_regex)
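
A possible way to get the O(log(N)) behaviour mentioned in the first bullet, reusing the hypothetical gShadow/gDirty/gPageSize globals from the earlier sketch: on each fault in a sequential run, unprotect twice as many pages as the previous fault did.

```cpp
static size_t gRunStart;    // first page unprotected by the previous fault
static size_t gRunLength;   // number of pages unprotected by the previous fault

// Called from the fault handler instead of unprotecting a single page.
static void unprotectForWrite(size_t faultPage) {
    size_t count = 1;
    if (gRunLength && faultPage == gRunStart + gRunLength)
        count = gRunLength * 2;              // sequential run: double the window
    size_t numPages = (gShadowSize + gPageSize - 1) / gPageSize;
    if (faultPage + count > numPages)
        count = numPages - faultPage;        // clamp to the end of the shadow
    mprotect(gShadow + faultPage * gPageSize, count * gPageSize,
             PROT_READ | PROT_WRITE);
    for (size_t i = 0; i < count; ++i)
        gDirty[faultPage + i] = true;        // conservatively treat them all as dirty
    gRunStart = faultPage;
    gRunLength = count;
}
```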

Reads are indeed trickier to handle with shadows. We could trap reads from the shadow too and trigger a flush from real memory to the shadow, but there are always situations that might not be granular enough. If we really want to handle reads effectively, we'll need to track when buffers are bound for writing into GL contexts, so that we flush the reads to the shadow when the corresponding fence is signaled. (Thankfully there are way fewer bind points where a buffer can be bound for write on GL.)

Another alternative for reads is to use the shadow just for memcompares, and pass the GPU-accessible memory to the app. That is, compromise performance for correctness. If we could restrict this to the few buffers that are eventually used for reads, it would be a good compromise.

I think at the moment it would be enough to implement the common case, which will also have good enough performance.

I agree.

@werman
Contributor

werman commented Jun 27, 2019

there should be acceleration at page granularity when the app writes to a buffer sequentially, so that writing N pages triggers O(log(N)) exceptions, as opposed to O(N) exceptions

Good idea, thanks.

we also need to flush on calls that operate on PBOs (grep the source for pack_function_regex and unpack_function_regex)

noted

If we really want to handle reads effectively, we'll need to track when buffers are bound for writing into GL contexts, so that we flush the reads to the shadow when the corresponding fence is signaled. (Thankfully there are way fewer bind points where a buffer can be bound for write on GL.)

So if we have coherent buffers mapped for read:

  • Track glBindBuffersBase, glBindBufferRange, and glBindBuffersRange for them, and add a fence after the draw/dispatch which uses such buffers.
  • Track clears, copies, and pixel transfer operations on the buffer and add a fence after them.

The question is: where do we check that the fence is signaled?

Another alternative for reads is to use the shadow just for memcompares, and pass the GPU-accessible memory to the app. That is, compromise performance for correctness. If we could restrict this to the few buffers that are eventually used for reads, it would be a good compromise.

If we pass the GPU-accessible memory to the app, how will we track writes, since the buffer could be both readable and writable? I'm not sure I understand this correctly; could you explain it further?

At the moment my priorities are:

  1. Implement writes to shadow memory.
  2. A simple implementation for reads.
  3. If it makes sense, a correct implementation for reads.

About the tests: I see that there is a simple test in the tests repo. I think I would need to write tests which check more cases and also check the correctness of reads and writes.

@jrfonseca
Member

The question is: where do we check that the fence is signaled?

The rules of GL_ARB_buffer_storage's MAP_COHERENT_BIT are such that an app needs to do glFenceSync and wait for it for side effects to be visible to the CPU.

So, a way of dealing with this is (sketched below):

  • every time a draw/dispatch happens, go over all buffer bindings, and move buffers bound for read into a list attached to the GL context
  • when a glFenceSync call happens, move that list to the sync object (e.g., using a std::map<GLsync,...>), and empty the context list
  • the first time a sync object completes, go over every buffer attached to that sync object, and flush reads
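
A sketch of that bookkeeping (CoherentBuffer and flushReadsToShadow are illustrative placeholders, not existing apitrace symbols):

```cpp
#include <map>
#include <vector>

typedef struct __GLsync *GLsync;              // as declared in the GL headers

struct CoherentBuffer;                        // a coherently mapped buffer
void flushReadsToShadow(CoherentBuffer *);    // copy real memory into the shadow

static std::vector<CoherentBuffer *> gBoundForRead;               // per-context list
static std::map<GLsync, std::vector<CoherentBuffer *>> gPendingReads;

// 1. After every draw/dispatch, record the coherent buffers bound for read.
void onDraw(const std::vector<CoherentBuffer *> &readBindings) {
    gBoundForRead.insert(gBoundForRead.end(),
                         readBindings.begin(), readBindings.end());
}

// 2. When the app calls glFenceSync, attach the accumulated list to the sync.
void onFenceSync(GLsync sync) {
    gPendingReads[sync] = std::move(gBoundForRead);
    gBoundForRead.clear();
}

// 3. The first time the sync is seen to complete (glClientWaitSync, glWaitSync,
//    glGetSynciv, glFinish), flush reads for every buffer attached to it.
void onSyncSignaled(GLsync sync) {
    auto it = gPendingReads.find(sync);
    if (it == gPendingReads.end())
        return;
    for (CoherentBuffer *buf : it->second)
        flushReadsToShadow(buf);
    gPendingReads.erase(it);
}
```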

If we pass the GPU-accessible memory to the app, how will we track writes, since the buffer could be both readable and writable? I'm not sure I understand this correctly; could you explain it further?

Every time a draw/etc. that uses that buffer happens, we'd need to memcmp with the shadow and emit a fake memcpy for the touched regions. I agree it's not easy. And one can't distinguish the GPU writes from CPU writes. But it's a possibility.
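
A rough sketch of that memcmp alternative (emitFakeMemcpy stands in for whatever call the tracer would use to record the region's new contents): the app writes straight into the GPU-visible mapping, and before each draw the mapping is diffed against a shadow copy.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>

void emitFakeMemcpy(const void *ptr, size_t len);   // illustrative placeholder

// Diff the GPU-visible mapping against the shadow copy in blocks and record a
// fake memcpy for every block that changed since the last draw.
void diffAndEmit(unsigned char *real, unsigned char *shadow, size_t size,
                 size_t blockSize = 4096) {
    for (size_t off = 0; off < size; off += blockSize) {
        size_t len = std::min(blockSize, size - off);
        if (memcmp(real + off, shadow + off, len) != 0) {
            emitFakeMemcpy(real + off, len);        // record the new contents
            memcpy(shadow + off, real + off, len);  // bring the shadow up to date
        }
    }
}
```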

I think I would need to write tests which check more cases and also check the correctness of reads and writes.

Yes. Currently the tests are in a separate repo. I've been thinking it would be easier to merge them into the main repo, but I'm not sure when I'll get around to doing it.

@werman
Contributor

werman commented Jun 27, 2019

The rules of GL_ARB_buffer_storage's MAP_COHERENT_BIT are such that an app needs to do glFenceSync and wait for it for side effects to be visible to the CPU.

"If MAP_COHERENT_BIT is set and the server does a write, the app must
call FenceSync with SYNC_GPU_COMMANDS_COMPLETE (or Finish). Then the
CPU will see the writes after the sync is complete."

Yes, I see; for some reason my thoughts went off in some other weird direction.

So, a way of dealing with this:

Thank you for the details.

Every time a draw/etc. that uses that buffer happens, we'd need to memcmp with the shadow and emit a fake memcpy for the touched regions. I agree it's not easy. And one can't distinguish the GPU writes from CPU writes. But it's a possibility.

I like the first method more =)

Ok, I think I've got enough information to start implementing it.
