
Faster VPR rrgraph load #333

Closed
litghost opened this issue Nov 27, 2019 · 7 comments
@litghost

Problem statement

The current 50T rr graph load time is relatively high (~70 seconds), and the memory footprint is large:

## Loading routing resource graph took 70.99 seconds (max_rss 12693.1 MiB, delta_rss +12526.7 MiB)

The largest Artix 7-series part is a 200T, which is 4 times larger. The largest 7-series part is an 1140T, which is ~23 times larger. And the largest US+ part is a 9000T, which is ~180 times larger.

@litghost
Author

I've spent the day profiling the rr graph load and identified a couple of pieces of low-hanging fruit. Around 40 seconds is wasted on the following:

  • ~12 seconds in check_rr_graph, which generates 4 million lines of warnings that we ignore
  • ~27 seconds in process edges, the majority of which is spent handling edge metadata that is only used during genfasm, so it can safely be avoided

After commenting out check_rr_graph and disabling the reading of edge metadata, the rr graph creation time dropped by ~24 seconds. The vast majority of the remaining time was spent in pugixml. Based on some experiments, those 24 seconds should drop to ~2-3 seconds if we switch to the mmap-based capnp rr graph.

In addition, if an mmap-based capnp rr graph is used, the edge metadata could be proxied from the capnp rr graph, avoiding the double memory usage and the memcpy of the edge metadata strings.

@mithro
Member

mithro commented Nov 28, 2019

FYI - @duck2

@duck2
Member

duck2 commented Nov 28, 2019

Disabling the reading of metadata except for genfasm is a good idea, given that it's not used in the other stages of vpr.

I'm not sure how to get rid of copying the capnp structure into VPR, though. Can we copy that much data in 2-3 seconds? Maybe an RRGraph-like object, but with accessors into the capnp structure instead of a create_rr_graph?

For genfasm, as we have discussed in IRC, optimizing the std::unordered_map<std::string, std::vector<std::string>> will go a long way.

@litghost
Author

litghost commented Dec 4, 2019

First cut implementation of the uxsdcxx capnp based reader:

## Loading routing resource graph

Warning 134: This architecture version is for VPR 8.0.0-rc1+3020d7146-dirty while your current VPR version is 8.0.0-rc1+92e57c2d0 compatability issues may arise

## Loading routing resource graph took 9.34 seconds (max_rss 3563.2 MiB, delta_rss +3469.8 MiB)
  RR Graph Nodes: 3539499
  RR Graph Edges: 22966465
# Create Device took 9.88 seconds (max_rss 3563.2 MiB, delta_rss +3469.8 MiB)

versus the uxsdcxx XML based reader:

## Loading routing resource graph

Warning 134: This architecture version is for VPR 8.0.0-rc1+3020d7146-dirty while your current VPR version is 8.0.0-rc1+92e57c2d0 compatability issues may arise

## Loading routing resource graph took 18.58 seconds (max_rss 7307.4 MiB, delta_rss +7216.5 MiB)
  RR Graph Nodes: 3539499
  RR Graph Edges: 22966465
# Create Device took 19.12 seconds (max_rss 7307.4 MiB, delta_rss +7216.5 MiB)

So not only is the capnp reader faster, it also consumes half the memory! Not bad at all!

@duck2
Member

duck2 commented Dec 4, 2019

9 seconds for the whole 50T graph is impressive! I think this can be made faster if we pool the edges instead of allocating a new unique_ptr for each node.edges.

@litghost
Author

litghost commented Dec 4, 2019

> 9 seconds for the whole 50T graph is impressive! I think this can be made faster if we pool the edges instead of allocating a new unique_ptr for each node.edges.

I had a similar thought. However, the next step is to complete the integration of the reader into master+wip, along with an XML -> capnp writer tool, so that the improved memory usage and runtime can actually be used.

@litghost
Author

litghost commented Mar 4, 2020

The changes mentioned in this issue have been committed!
