
Faster VPR rrgraph load #333

Closed
litghost opened this issue Nov 27, 2019 · 7 comments
@litghost

Problem statement

The current 50T rr graph load time is relatively high (~70 seconds), and the memory footprint is large:

## Loading routing resource graph took 70.99 seconds (max_rss 12693.1 MiB, delta_rss +12526.7 MiB)

The largest Artix 7-series part is a 200T, which is 4 times larger. The largest 7-series part is an 1140T, which is ~23 times larger. And the largest US+ part is a 9000T, which is ~180 times larger.

@litghost
Author

I've spent the day profiling the rr graph load and identified a couple of pieces of low-hanging fruit. Around 40 seconds is wasted on the following:

  • ~12 seconds in check_rr_graph, which generates 4 million lines of warnings that we ignore
  • ~27 seconds in process edges, the majority of which is spent handling edge metadata that is only used during genfasm, so it can safely be avoided

After commenting out check_rr_graph and disabling the reading of edge metadata, the rr graph creation time dropped by ~24 seconds. The vast majority of the remaining time was spent in pugixml. Based on some experiments, those 24 seconds should drop to ~2-3 seconds if we switch to the mmap-based capnp rr graph.

In addition, if an mmap-based capnp rr graph is used, the edge metadata could be proxied from the capnp rr graph, avoiding the double memory usage and the memcpy of the edge metadata strings.

@mithro
Member

mithro commented Nov 28, 2019

FYI - @duck2

@duck2
Member

duck2 commented Nov 28, 2019

Disabling the reading of metadata except for genfasm is a good idea, given that it's not used in the other stages of vpr.

I'm not sure how to get rid of copying the capnp structure into VPR, though. Can we copy that much data in 2-3 seconds? Maybe an RRGraph-like object, but with accessors into the capnp structure instead of a create_rr_graph?

For genfasm, as we have discussed in IRC, optimizing the std::unordered_map<std::string, std::vector<std::string>> will go a long way.

@litghost
Author

litghost commented Dec 4, 2019

First cut implementation of the uxsdcxx capnp based reader:

## Loading routing resource graph

Warning 134: This architecture version is for VPR 8.0.0-rc1+3020d7146-dirty while your current VPR version is 8.0.0-rc1+92e57c2d0 compatability issues may arise

## Loading routing resource graph took 9.34 seconds (max_rss 3563.2 MiB, delta_rss +3469.8 MiB)
  RR Graph Nodes: 3539499
  RR Graph Edges: 22966465
# Create Device took 9.88 seconds (max_rss 3563.2 MiB, delta_rss +3469.8 MiB)

versus the uxsdcxx XML based reader:

## Loading routing resource graph

Warning 134: This architecture version is for VPR 8.0.0-rc1+3020d7146-dirty while your current VPR version is 8.0.0-rc1+92e57c2d0 compatability issues may arise

## Loading routing resource graph took 18.58 seconds (max_rss 7307.4 MiB, delta_rss +7216.5 MiB)
  RR Graph Nodes: 3539499
  RR Graph Edges: 22966465
# Create Device took 19.12 seconds (max_rss 7307.4 MiB, delta_rss +7216.5 MiB)

So not only is the capnp reader faster, it also consumes half the memory! Not bad at all!

@duck2
Member

duck2 commented Dec 4, 2019

9 seconds for the whole 50T graph is impressive! I think this can be made faster if we pool the edges instead of allocating a new unique_ptr for each node.edges.

@litghost
Author

litghost commented Dec 4, 2019

> 9 seconds for the whole 50T graph is impressive! I think this can be made faster if we pool the edges instead of allocating a new unique_ptr for each node.edges.

I had a similar thought. However, the next step is to complete the integration of the reader into master+wip, along with an XML -> capnp writer tool, so that the improved memory usage and runtime can actually be used.

@litghost
Author

litghost commented Mar 4, 2020

The changes mentioned in this issue have been committed!
