Faster VPR rrgraph load #333
I've spent the day profiling the rr graph load and identified a couple of low-hanging fruit. Around 40 seconds is wasted doing the following:
After commenting out check_rr_graph and disabling the reading of edge metadata, the rr graph creation time dropped ~24 seconds. The vast majority of that time was spent in [...] In addition, if an mmap-based capnp rr graph is used, the edge metadata could be proxied from the capnp rr graph, avoiding the double memory usage and the memcpy of the edge metadata strings.
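To illustrate the proxying idea, here is a minimal sketch of the difference between copying a metadata string out of a mapped buffer and handing out a view into it. Everything here is an assumption made for illustration: `MappedBuffer`, `MetaRef`, `copy_metadata` and `view_metadata` are hypothetical names, a `std::vector<char>` stands in for the mmap'd file, and none of this is VPR's or Cap'n Proto's actual API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <vector>

// Stand-in for a read-only mmap'd region holding the serialized rr graph.
// In a real reader this would be the mapped file, not a std::vector.
using MappedBuffer = std::vector<char>;

// Hypothetical record locating one metadata string inside the buffer.
struct MetaRef {
    std::size_t offset;
    std::size_t length;
};

// Copying accessor: duplicates the bytes into a new std::string,
// so the data exists twice (once in the buffer, once in the string).
std::string copy_metadata(const MappedBuffer& buf, MetaRef ref) {
    assert(ref.offset + ref.length <= buf.size());
    return std::string(buf.data() + ref.offset, ref.length);
}

// Proxying accessor: returns a view into the buffer without copying.
// The view is only valid while the buffer (the mapping) stays alive.
std::string_view view_metadata(const MappedBuffer& buf, MetaRef ref) {
    assert(ref.offset + ref.length <= buf.size());
    return std::string_view(buf.data() + ref.offset, ref.length);
}
```

The trade-off is lifetime: a view-based accessor requires the mapping to outlive every consumer of the metadata, which the copying version does not.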
FYI - @duck2
Disabling the reading of metadata except for genfasm is a good idea, given that it's not used in the other stages of vpr. I'm not sure how to get rid of copying the capnp structure into VPR, though. Can we copy that much data in 2-3 seconds? Maybe a RRGraph object-like structure but with accessors to capnp instead of a [...] For genfasm, as we have discussed on IRC, optimizing the [...]
First cut implementation of the uxsdcxx capnp based reader:
versus the uxsdcxx XML based reader:
So not only was the capnp reader faster, but it consumes half the memory! Not bad at all!
9 seconds for the whole 50T graph is impressive! I think this can be made faster if we pool the edges instead of allocating a new unique_ptr for each [...]
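A rough sketch of what pooling could look like, compared with one heap allocation per element. The comment above is cut off, so what exactly gets its own unique_ptr is not stated; this sketch assumes it is a per-edge record. `Edge`, `load_edges_individually` and `load_edges_pooled` are hypothetical names, not VPR's actual types or functions.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical edge record (assumption: not VPR's real edge type).
struct Edge {
    int src_node;
    int sink_node;
    short switch_id;
};

// Per-edge allocation: every edge is a separate heap object,
// so loading n edges costs n allocations plus pointer chasing later.
std::vector<std::unique_ptr<Edge>> load_edges_individually(std::size_t n) {
    std::vector<std::unique_ptr<Edge>> edges;
    edges.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int id = static_cast<int>(i);
        edges.push_back(std::make_unique<Edge>(Edge{id, id + 1, 0}));
    }
    return edges;
}

// Pooled allocation: all edges live in one contiguous block,
// reserved once up front because the edge count is known.
std::vector<Edge> load_edges_pooled(std::size_t n) {
    std::vector<Edge> edges;
    edges.reserve(n);
    for (std::size_t i = 0; i < n; ++i) {
        const int id = static_cast<int>(i);
        edges.push_back(Edge{id, id + 1, 0});
    }
    assert(edges.size() == n);
    return edges;
}
```

Both functions produce the same logical content. The pooled version replaces n small allocations with a single reservation, which is the kind of saving suggested here, though the actual gain would need to be measured.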
I had a similar thought. However, the next step is to complete the integration of the reader into the [...]
The changes mentioned in this issue have been committed!
Problem statement
The current 50T rr graph load time is relatively high (~60 seconds), and the memory footprint is pretty big:
The largest Artix 7-series part is a 200T, or 4 times larger. The largest 7-series part is a 1140T, which is ~23 times larger. And the largest US+ part is a 9000T, which is ~180 times larger.
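To make the scaling concern concrete, here is a back-of-the-envelope sketch. It assumes load time grows linearly with device size, which is an assumption for illustration and not something measured in this issue. `projected_load_seconds` is a hypothetical helper.

```cpp
#include <cassert>

// Projected load time, ASSUMING time scales linearly with device size.
// base_seconds: measured time for the base part (e.g. ~60 s for the 50T).
// base_size / target_size: part sizes in the same unit (e.g. 50 and 200).
double projected_load_seconds(double base_seconds, double base_size, double target_size) {
    assert(base_size > 0.0);
    return base_seconds * (target_size / base_size);
}
```

Under that linear assumption, the ~60 second 50T load would become roughly 240 seconds for a 200T, about 1368 seconds (around 23 minutes) for a 1140T, and 10800 seconds (3 hours) for a 9000T.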