New Lookahead map status #517

Open
acomodi opened this issue Jun 17, 2020 · 21 comments

@acomodi
Collaborator

acomodi commented Jun 17, 2020

Related issues/PRs

There have been efforts to push the lookahead used in this fork upstream. All the related work/information can be found in the following links:
verilog-to-routing#1325
verilog-to-routing#1351
verilog-to-routing#1367

Current status

The integration of the connection-box-based lookahead with the upstream lookahead is partially done. To reproduce the results obtained so far, the following versions of the tools are needed:

The VTR branch is still under development, as it is not yet 100% functional.
In fact, two assertions still fail and have been temporarily disabled so that the P&R flow can continue.
They are probably due to incorrect lookahead values for some delta_x/y values and certain segment types and/or channels.

The results obtained by using the above commits are the following for the base litex design:

  • 174.95 seconds
  • 17.597 ns (CPD)
  • Works on HW
  • Lookahead size 15M (37x reduction)
  • Lookahead computation run-time: 532 seconds (~9 minutes) on a 20-core machine.
@acomodi
Collaborator Author

acomodi commented Jun 17, 2020

@litghost @mithro @HackerFoo FYI

@acomodi
Collaborator Author

acomodi commented Jul 10, 2020

I have resumed the work on getting the lookahead into a mergeable state with upstream. First, I successfully reproduced the results I had obtained a couple of weeks ago; then I also moved to the latest yosys master+wip and got improvements in placer run-time and quality of results as well.

By further analyzing the router behaviour with debugging and the profiler enabled, I still see a non-trivial number of connections that take far too long compared to the others, as well as several backtracking steps due to incorrect lookahead cost retrieval.

I need to collect more data to have a clearer view of what is happening.

One behaviour that I have seen, which should not happen IMO, is that the placer produces a solution with an estimated CPD ~1.5 ns lower than the final CPD obtained after the routing step.

@acomodi
Collaborator Author

acomodi commented Jul 13, 2020

By analyzing the routing debug information when running some of the symbiflow-arch-defs tests, I could see that the worst connections fall into four categories:

  • Constant nets: it appears that constant nets take, at worst, almost double the time compared to the connection box lookahead:
Lookahead map:
Worst conn was RR node: 0 type: SOURCE location: (1,1) class: 0 capacity: 1 to RR node: 1101656 type: SINK location: (76,59) class: 235 capacity: 1 (crit: 0.980000) took 0.059810

Connection Box Lookahead map:
Worst conn was RR node: 0 type: SOURCE location: (1,1) class: 0 capacity: 1 to RR node: 1855792 type: SINK location: (61,106) class: 8 capacity: 1 (crit: 0.980000) took 0.028980
  • Connections that have a sink pin set to DMUX

  • Connections which involve a BRAM pin

  • Clock input connections:

Worst conn was RR node: 1396500 type: SOURCE location: (114,78) class: 53 capacity: 1 to RR node: 1855791 type: SINK location: (61,106) class: 7 capacity: 1 (crit: 0.980000) took 0.121228

To get a clearer view of what is happening, I need a version of VPR that implements both the connection box lookahead and the new lookahead map. So far I have tested the whole flow (packing, placement and routing) with two different VtR versions; to compare the results properly, the routing step needs to receive the same .place and .net files as input.

@acomodi
Collaborator Author

acomodi commented Jul 22, 2020

After some debugging of the router behaviour, I have applied some fixes to the new lookahead map code. The fixes I have added are:

  1. In add_paths, remove the site_pin_delay from the path.
  2. Fix the site_pin_delay value which, prior to the fix, could return an infinite value.
  3. Allow negative deltas. Prior to the fix, the absolute delta was taken, collapsing opposite deltas into a single delay and producing a symmetrical lookahead (see the sketch below).
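
A minimal sketch of what fix 3 changes, with invented names (the real VPR cost map is more involved): indexing by the absolute deltas folds opposite directions, e.g. (+3, -2) and (-3, +2), into the same entry, forcing a symmetric lookahead, while keeping the sign and offsetting by the map origin preserves direction-dependent delays.

// Hypothetical sketch, not the actual VPR data structure.
#include <cstdlib>
#include <vector>

struct CostEntry {
    float delay = 0.f;
    float congestion = 0.f;
};

struct DeltaCostMap {
    int x_min, y_min;   // most negative deltas covered by the map
    int width, height;  // extent of the map in each direction
    std::vector<CostEntry> entries;

    // Old behaviour: opposite directions collapse into one entry.
    const CostEntry& lookup_symmetric(int dx, int dy) const {
        return entries[std::abs(dy) * width + std::abs(dx)];
    }

    // Fixed behaviour: keep the sign and shift into the valid index range,
    // so (+dx, +dy) and (-dx, -dy) can hold different delays.
    const CostEntry& lookup(int dx, int dy) const {
        return entries[(dy - y_min) * width + (dx - x_min)];
    }
};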

I have run the baselitex test twice: once with the current master+wip and arch-defs master (baseline), and once with the new lookahead VPR code (https://github.com/acomodi/vtr-verilog-to-routing/tree/new-lookahead-map-fixes) together with the arch-defs master plus some fixes to get it running with the new lookahead map (https://github.com/antmicro/symbiflow-arch-defs/tree/new-lookahead-map-integration) (changes).

Both tests were run with the latest yosys master+wip, which itself introduced a worsening in both CPD and runtime.

Baseline:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 75: 29 timing startpoints were not constrained during timing analysis
Warning 76: 3105 timing endpoints were not constrained during timing analysis
   1   29.1     0.0    0 4.0e+08   15283   56484   26759 ( 0.914%)  688970 (10.7%)   19.300     -378.1     -2.634      0.000      0.000      N/A
   2    8.9     4.0    0 9.3e+07   10892   41502    8515 ( 0.291%)  832288 (13.0%)   18.360     -127.1     -1.694      0.000      0.000      N/A
   3    8.3     5.2   13 7.2e+07    6175   22296    4796 ( 0.164%)  879231 (13.7%)   18.293     -138.5     -1.627      0.000      0.000      N/A
   4   10.0     6.8   50 7.8e+07    3674   15065    2616 ( 0.089%)  912145 (14.2%)   18.327     -136.1     -1.661      0.000      0.000      N/A
   5    9.6     8.8   53 7.2e+07    2086    9442    1264 ( 0.043%)  941215 (14.7%)   18.480     -165.4     -1.814      0.000      0.000      N/A
   6    8.4    11.4   80 6.1e+07    1102    5125     651 ( 0.022%)  956935 (14.9%)   18.496     -167.2     -1.830      0.000      0.000      N/A
   7   10.8    14.9   75 7.4e+07     657    2749     368 ( 0.013%)  968462 (15.1%)   18.454     -163.2     -1.788      0.000      0.000      N/A
   8   10.9    19.3   48 7.8e+07     391    1723     229 ( 0.008%)  975072 (15.2%)   18.581     -194.8     -1.915      0.000      0.000      N/A
   9   13.1    25.1   45 8.7e+07     243     983     145 ( 0.005%)  979506 (15.2%)   18.676     -222.5     -2.010      0.000      0.000      N/A
  10   15.8    32.6   38 9.6e+07     179     587     100 ( 0.003%)  983145 (15.3%)   18.676     -238.8     -2.010      0.000      0.000       18
  11   21.8    42.4   28 1.3e+08     131     539      65 ( 0.002%)  986148 (15.4%)   18.676     -238.8     -2.010      0.000      0.000       20
  12   19.0    55.1   23 1.1e+08      90     469      54 ( 0.002%)  988512 (15.4%)   18.676     -239.7     -2.010      0.000      0.000       20
  13   21.6    71.7   16 1.3e+08      76     235      43 ( 0.001%)  990044 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       22
  14   15.0    93.2   19 8.7e+07      58     179      31 ( 0.001%)  990801 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       23
  15    9.9   121.1    9 5.8e+07      43     126      21 ( 0.001%)  991256 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       24
  16    4.6   157.5   10 2.9e+07      37      70      15 ( 0.001%)  991920 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       24
  17    5.0   204.7   11 3.0e+07      24      53      16 ( 0.001%)  992253 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       25
  18    5.3   266.2    5 3.2e+07      26      65      15 ( 0.001%)  992536 (15.4%)   18.676     -239.9     -2.010      0.000      0.000       26
  19    5.5   346.0    4 3.2e+07      21      55      12 ( 0.000%)  992766 (15.5%)   18.676     -239.9     -2.010      0.000      0.000       28
  20    7.2   449.8    4 4.0e+07      20      47      13 ( 0.000%)  993150 (15.5%)   18.676     -239.9     -2.010      0.000      0.000       29
  21    3.1   584.8    4 1.8e+07      15      86       6 ( 0.000%)  993749 (15.5%)   18.676     -239.9     -2.010      0.000      0.000       32
  22    1.7   760.2    5 9954971      10      13       4 ( 0.000%)  994033 (15.5%)   18.676     -239.9     -2.010      0.000      0.000       30
  23    1.7   988.3    1 1.0e+07       8      15       2 ( 0.000%)  994270 (15.5%)   18.676     -240.5     -2.010      0.000      0.000       29
  24    0.8  1284.7    2 4480699       3       5       2 ( 0.000%)  994370 (15.5%)   18.676     -240.5     -2.010      0.000      0.000       28
  25    2.8  1670.2    0 1.5e+07       4      14       1 ( 0.000%)  994397 (15.5%)   18.676     -240.5     -2.010      0.000      0.000       27
  26    0.1  2171.2    1    2193       1       1       0 ( 0.000%)  994363 (15.5%)   18.676     -240.5     -2.010      0.000      0.000       27
Restoring best routing
Critical path: 18.6762 ns
...
# Routing took 252.15 seconds (max_rss 3466.0 MiB, delta_rss +0.0 MiB)

Router run-time: 252.15 seconds
CPD: 18.6762 ns

Changes:

---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Iter   Time    pres  BBs    Heap  Re-Rtd  Re-Rtd Overused RR Nodes      Wirelength      CPD       sTNS       sWNS       hTNS       hWNS Est Succ
      (sec)     fac Updt    push    Nets   Conns                                       (ns)       (ns)       (ns)       (ns)       (ns)     Iter
---- ------ ------- ---- ------- ------- ------- ----------------- --------------- -------- ---------- ---------- ---------- ---------- --------
Warning 305: 29 timing startpoints were not constrained during timing analysis
Warning 306: 3105 timing endpoints were not constrained during timing analysis
   1   47.8     0.0    0 6.5e+08   15283   56484   27050 ( 0.924%)  682596 (10.6%)   20.496     -1235.     -3.830      0.000      0.000      N/A
   2   14.7     4.0    2 1.4e+08   10819   40675    7724 ( 0.264%)  819197 (12.8%)   19.085     -197.3     -2.419      0.000      0.000      N/A
   3   13.5     5.2   14 1.1e+08    5926   21728    4322 ( 0.148%)  859622 (13.4%)   19.064     -209.6     -2.398      0.000      0.000      N/A
   4   13.4     6.8   32 1.0e+08    3416   15002    2288 ( 0.078%)  892266 (13.9%)   19.095     -208.7     -2.429      0.000      0.000      N/A
   5   11.2     8.8   52 8.1e+07    1848    8928    1023 ( 0.035%)  915860 (14.3%)   19.124     -216.9     -2.458      0.000      0.000      N/A
   6    6.5    11.4   46 4.7e+07     941    4213     445 ( 0.015%)  930869 (14.5%)   19.275     -264.7     -2.609      0.000      0.000      N/A
   7    4.2    14.9   58 3.3e+07     419    2688     176 ( 0.006%)  938294 (14.6%)   19.275     -271.1     -2.609      0.000      0.000      N/A
   8    5.2    19.3   26 3.4e+07     199     825      94 ( 0.003%)  942985 (14.7%)   19.275     -270.8     -2.609      0.000      0.000      N/A
   9    5.1    25.1   25 3.3e+07     108     384      59 ( 0.002%)  944694 (14.7%)   19.275     -270.7     -2.609      0.000      0.000      N/A
  10    6.4    32.6   10 4.0e+07      80     251      35 ( 0.001%)  946547 (14.7%)   19.275     -270.9     -2.609      0.000      0.000       14
  11    6.0    42.4   10 3.4e+07      46     105      24 ( 0.001%)  947326 (14.7%)   19.275     -270.9     -2.609      0.000      0.000       16
  12    7.4    55.1    5 3.9e+07      36      92      16 ( 0.001%)  948416 (14.8%)   19.275     -270.8     -2.609      0.000      0.000       16
  13    4.1    71.7    8 2.2e+07      25      67      14 ( 0.000%)  949500 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       18
  14    7.1    93.2    7 3.6e+07      20      48      10 ( 0.000%)  949967 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       19
  15    5.7   121.1    4 2.8e+07      18      43       7 ( 0.000%)  950189 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       20
  16    2.2   157.5    2 1.1e+07      10      29       6 ( 0.000%)  950498 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       20
  17    6.6   204.7    1 2.7e+07      13      38       5 ( 0.000%)  950935 (14.8%)   19.275     -271.3     -2.609      0.000      0.000       21
  18    2.5   266.2    3 9575677       8      66       3 ( 0.000%)  951352 (14.8%)   19.275     -271.3     -2.609      0.000      0.000       22
  19    0.3   346.0    3 1583485       4      11       4 ( 0.000%)  951448 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       22
  20    5.0   449.8    0 1.7e+07       6      27       3 ( 0.000%)  951613 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       23
  21    2.3   584.8    2 8493268       5       9       2 ( 0.000%)  951715 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       24
  22    2.1   760.2    3 7624080       4      11       2 ( 0.000%)  951657 (14.8%)   19.275     -271.2     -2.609      0.000      0.000       24
  23    1.7   988.3    0 7082008       4       5       1 ( 0.000%)  951808 (14.8%)   19.275     -271.4     -2.609      0.000      0.000       25
  24    0.0  1284.7    0    1612       1       1       0 ( 0.000%)  951806 (14.8%)   19.275     -271.4     -2.609      0.000      0.000       24
Restoring best routing
Critical path: 19.2752 ns
...
# Routing took 183.15 seconds (max_rss 3602.4 MiB, delta_rss +0.0 MiB)

Router run-time: 183.15 seconds
CPD: 19.2752 ns

To sum up, these tests show a 27.36% reduction in router run-time and a 3.10% increase in CPD w.r.t. the baseline.


@litghost

To sum up, these tests show a 27.36% reduction in router run-time and a 3.10% increase in CPD w.r.t. the baseline.

I think this is close enough that we really should work on getting the lookahead changes upstream. I expect that the reduction in runtime could be converted into the same runtime with the same or better CPD by lowering the astar factor.

Overall, great work. Let's start to get your changes in upstream VPR.

@litghost

To sum up, these tests show a 27.36% reduction in router run-time and a 3.10% increase in CPD w.r.t. the baseline.

To be clear, the new lookahead is based on an extension of the upstream map lookahead, comparing against the downstream connection box lookahead?

@acomodi
Collaborator Author

acomodi commented Jul 22, 2020

To be clear, the new lookahead is based on an extension of the upstream map lookahead, comparing against the downstream connection box lookahead?

Indeed. Actually, the new lookahead uses code from both the upstream map and the connection box lookahead, but the connection boxes themselves are not used anymore.
In general:

From upstream:

  • constant-time expansion of OPIN to CHAN
  • the above, adapted to also cover the CHAN to IPIN delay

From downstream connection boxes:

  • sampling method
  • Dijkstra expansion (see the rough sketch after this list)
  • penalty costs on lookahead entries
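
For context, here is a rough, self-contained sketch of the Dijkstra-expansion idea, with invented types (this is not VPR's RR graph code): expand from one sample node and record, for every reached node, the cheapest delay keyed by its (dx, dy) offset from the sample.

// Rough illustrative sketch only; types and names are invented.
#include <functional>
#include <limits>
#include <map>
#include <queue>
#include <utility>
#include <vector>

struct Edge { int to; float delay; };
struct Node { int x, y; std::vector<Edge> out; };

std::map<std::pair<int, int>, float>
expand_from_sample(const std::vector<Node>& graph, int sample) {
    std::vector<float> best(graph.size(), std::numeric_limits<float>::infinity());
    using Item = std::pair<float, int>;  // (cost so far, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    best[sample] = 0.f;
    heap.push({0.f, sample});

    std::map<std::pair<int, int>, float> cost_by_delta;
    while (!heap.empty()) {
        auto [cost, n] = heap.top();
        heap.pop();
        if (cost > best[n]) continue;  // stale heap entry

        // Record the cheapest delay seen for this (dx, dy) offset.
        int dx = graph[n].x - graph[sample].x;
        int dy = graph[n].y - graph[sample].y;
        auto key = std::make_pair(dx, dy);
        auto it = cost_by_delta.find(key);
        if (it == cost_by_delta.end() || cost < it->second)
            cost_by_delta[key] = cost;

        for (const Edge& e : graph[n].out) {
            float next = cost + e.delay;
            if (next < best[e.to]) {
                best[e.to] = next;
                heap.push({next, e.to});
            }
        }
    }
    return cost_by_delta;
}

A real implementation would run this from many sample points per segment type and merge the per-delta results into the cost map, with the penalty costs mentioned above applied on top.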

@litghost

Okay, so the sampling method and Dijkstra expansion (for place delay) should both go upstream, yes? The penalty cost stuff is something we need to handle separately.

@litghost

For my own clarity, all the changes you have that are not upstream only affect the generation of the lookahead files, not their use?

@acomodi
Collaborator Author

acomodi commented Jul 23, 2020

Okay, so the sampling method and Dijkstra expansion (for place delay) should both go upstream, yes?

Dijkstra expansion is actually related to the lookahead generation, i.e. to expanding the sample nodes to find the costs. The upstream lookahead uses a different way of expanding nodes, which is less performant than the connection box expansion.
The problem I am facing is that the regression tests, when run with the current new lookahead generation (sampling + node expansion), take far too long to compute the lookahead itself.
I am thinking that, in order not to interfere with the upstream lookahead map, it would be best to add another lookahead option, possibly called extended_map, which makes use of Dusty's sampling method (which produces more accurate samples, at least for 7-Series).

The penalty cost stuff is something we need to handle separately

To be clear, the penalty cost stuff refers to assigning a cost to each lookahead map entry based on its distance from the origin node, here

For my own clarity, all the changes you have that are not upstream only affect the generation of the lookahead files, not in their use?

The only change in how the lookahead file is used is an addition to fetch the CHAN -> IPIN delay cost as well as the OPIN -> CHAN one. Apart from that, the way the lookahead is used does not change.
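
To make the query-side change concrete, a small sketch with invented placeholder names (these are not the real VPR functions): the expected cost is simply the sum of the OPIN -> CHAN component, the stored map cost, and the new CHAN -> IPIN component.

// Illustrative sketch only; the callbacks below are placeholders for the
// real cost components, not actual VPR APIs.
#include <functional>

struct LookaheadQuery {
    std::function<float(int)>             opin_to_chan_delay;
    std::function<float(int, int, float)> map_cost;
    std::function<float(int)>             chan_to_ipin_delay;  // the new component

    float expected_cost(int from_node, int to_node, float criticality) const {
        return opin_to_chan_delay(from_node)
             + map_cost(from_node, to_node, criticality)
             + chan_to_ipin_delay(to_node);
    }
};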

@acomodi
Collaborator Author

acomodi commented Jul 23, 2020

Upstream PR opened here: verilog-to-routing#1449

Additional note: the fixes resulted in an increase in the lookahead.bin size from 15M to 55M for the 50T device.

@acomodi
Collaborator Author

acomodi commented Jul 28, 2020

Update:
I am trying to make the upstream lookahead and the downstream new lookahead as similar as possible.
I have updated verilog-to-routing#1449 to move towards the goal of having fewer differences between the two approaches.
I am also testing the PR with the quick titan QoR tests to understand how the changes to the upstream lookahead affect QoR. From the preliminary results, CPD is lower, but both the lookahead computation time and the routing runtime have increased.

I have, however, encountered one issue regarding QoR and runtime with the extended lookahead and the symbiflow targets: I am not able to reproduce the results reported in the above comment.
Building everything with those commits gives worse results, and I am trying to figure out what is causing this behaviour.
I think I still need to spend more time in the debugging phase of the new lookahead, as it is clearly still unstable.

@acomodi
Collaborator Author

acomodi commented Jul 29, 2020

Results differences

I was able to identify the cause of the difference between the results obtained recently and the ones reported in the above comment.

The difference is due to the place delay matrix calculation. For some reason, the test that produced the new lookahead results used the delay matrix generated with the astar method instead of dijkstra. After several experiments, it turns out that, with the "new" lookahead, the delay matrix generated with dijkstra performs worse.
The same applies to the connection box lookahead. I have noticed that, even though the dijkstra expansion for the delay matrix is already in master+wip, it is not currently enabled in arch-defs; therefore, the delay matrix uses the original astar method (which makes use of the lookahead to get the delay results).

From what I have seen, the dijkstra delay matrix generation instead uses no lookahead to get the delays, which I think makes sense, as otherwise we would fall back to using the astar method to calculate the delays.

Other issues:

  1. CHANX/CHANY distinction

During debugging, I also found a bug in the new lookahead code.
Instead of the connection box ids, the new lookahead distinguishes between CHANX and CHANY nodes to get the cost entry for a specific node -> SINK connection.

The lookahead matrix is indexed as follows:

cost_map[seg_index][chan_type][deltax][deltay]

The problem was that the CHAN type was always the same. This resulted from the fact that I erroneously took the rr_type from the target node instead of the source node. The bug went undetected because the node type was wrong both when generating the lookahead and when querying it for the cost entry.

After fixing this issue and distinguishing between CHANX and CHANY for the source node, the router performs worse than in the case where CHANX and CHANY were merged into a single entry.
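
A minimal sketch of the bug, with invented types and helpers (not the real code): the chan_type index into cost_map has to describe the source CHAN node, whereas the buggy version took it from the target node; since the same wrong index was used both while building the map and while querying it, the bug stayed hidden.

// Hypothetical sketch of the indexing bug; names are invented.
enum class ChanType { CHANX = 0, CHANY = 1 };

struct CostKey {
    int seg_index;
    ChanType chan_type;
    int dx, dy;
};

// Buggy version: chan_type taken from the node we are routing *to*
// (a SINK, so the value never varied).
CostKey make_key_buggy(int seg_index, ChanType /*from_type*/, ChanType to_type,
                       int dx, int dy) {
    return {seg_index, to_type, dx, dy};
}

// Fixed version: chan_type describes the *source* CHAN node, since the map
// answers "cost from a CHANX/CHANY wire of this segment to a SINK at
// offset (dx, dy)".
CostKey make_key(int seg_index, ChanType from_type, ChanType /*to_type*/,
                 int dx, int dy) {
    return {seg_index, from_type, dx, dy};
}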

  2. Huge run-time to calculate the delay matrix with the astar method

The other thing I debugged was the huge runtime of the astar method used to compute the delay matrix.
When calculating the delay matrix for the 50T, some unroutable connections appear when trying to connect the sources and sinks of the CLBs to all the other nodes.

For instance, I get the following unroutable connection:

Cannot route from BLK-TL-CLBLL_L[0].CLBLL_LL_A[0] (RR node: 31398 type: SOURCE location: (11,2) class: 31 capacity: 1) to BLK-TL-HCLK_IOI3[0].HCLK_IOI_IDELAYCTRL_REFCLK[0] (RR node: 2927656 type: SINK location: (114,133) class: 5 capacity: 1) -- no possible path

Clearly, the CLBs cannot be connected to the REFCLK of the IDELAYCTRL, as that pin is within the clocking network and can only be reached from a clock tile.

When using the connection box lookahead, the expected cost returned is infinite. This happens here:

if (to_node_type == SINK) {
    const auto& sink_to_ipin = device_ctx.connection_boxes.find_sink_connection_boxes(to_node_ind);
    float cost = std::numeric_limits<float>::infinity();
    // Find cheapest cost from from_node_ind to IPINs for this SINK.
    for (int i = 0; i < sink_to_ipin.ipin_count; ++i) {
        cost = std::min(cost,
                        get_map_cost(
                            from_node_ind,
                            sink_to_ipin.ipin_nodes[i], criticality_fac));
        if (cost <= 0.f) break;
    }
    return cost;
}

I suppose the for loop never gets executed, so the lookahead cost stays infinite; the router then exits, reports an unroutable connection, and moves on to the next delay.

With the new lookahead, instead, the expected cost does not turn out to be infinite and the cost entry appears to be valid. The router therefore tries to find a path, taking a huge amount of time before concluding that the connection is unroutable.

Given this newly discovered data, I am unsure how to proceed.
Regarding the CHANX/CHANY distinction, I think that collapsing the cost entries into a single entry is beneficial, but the new lookahead greatly depends on the placement delay matrix.

@acomodi
Collaborator Author

acomodi commented Jul 29, 2020

@litghost I think that, with the new information I got and summarized above, there are still issues to be solved with the lookahead if we want to stop using the connection boxes.

Currently, I am unsure of how to deal with the place_delay_matrix issue.

@litghost

@litghost I think that, with the new information I got and summarized above, there are still issues to be solved with the lookahead if we want to stop using the connection boxes.

Currently, I am unsure of how to deal with the place_delay_matrix issue.

The existing baseline usage of the connection box lookahead should be using Dijkstra for computing the place_delay_matrix, so we should leave that variable fixed.

@acomodi
Collaborator Author

acomodi commented Jul 29, 2020

Ok, we should then enable that in arch-defs, as it is not currently set. From what I have seen, this will likely result in a worsening of QoR and runtime.

@litghost

litghost commented Jul 29, 2020

Ok, we should then enable that in arch-defs, as it is not currently set. From what I have seen, this will likely result in a worsening of QoR and runtime.

It already is enabled, see https://github.com/SymbiFlow/symbiflow-arch-defs/blob/f0e7b4212544e1d55da776fb7a2ff79117e01454/xc/common/cmake/arch_define.cmake#L64


@acomodi
Collaborator Author

acomodi commented Jul 29, 2020

Actually, I think we might also have missed updating the device_define parameters, hence the behaviour I have seen:
https://github.com/SymbiFlow/symbiflow-arch-defs/blob/f0e7b4212544e1d55da776fb7a2ff79117e01454/xc/common/cmake/device_define.cmake#L357


@litghost

Good catch, let me review the CI logs.

@litghost

litghost commented Jul 29, 2020

Actually, I think we might also have missed updating the device_define parameters, hence the behaviour I have seen:
https://github.com/SymbiFlow/symbiflow-arch-defs/blob/f0e7b4212544e1d55da776fb7a2ff79117e01454/xc/common/cmake/device_define.cmake#L357


You are right, but this was an oversight. I believe the cache args were overlooked when we incorporated the Dijkstra expansion upstream :(


@acomodi
Collaborator Author

acomodi commented Aug 3, 2020

I am currently running tests and trying to get better results out of the extended lookahead map.
So far, the tests I have performed are based on a master+wip VTR version that has been rebased on top of the current upstream VTR master and on which I have also applied the extended lookahead.

Even though the VTR version is the same for both the changes and baseline tests (so the place delay matrix and the packing output are the same), the placer seems to produce different results when building the two sets of tests.

To overcome this issue, and given that the packed netlist is the same in both test sets, I could use the placer output of the changes test set for the baseline test set as well.

Below there is a summary of these tests:

Run Time (s)        Baseline    Changes    Ratio
murax                   7.55      13.75    1.68x
picosoc                18.18      30.68    1.68x
mini litex             22.67      35.37    1.56x
mini litex ddr         71.86     117.52    1.63x
litex linux           333.26     293.20    0.87x

Critical Path (ns)  Baseline    Changes    Ratio
murax                13.3948    13.3404   0.995x
picosoc              15.8749    15.8356   0.997x
mini litex           13.3472    13.421    1.005x
mini litex ddr       18.3625    18.2212   0.992x
litex linux          19.1891    18.4803   0.963x

The run-time hit is still present, with the exception of the baselitex design, while the CPD is roughly the same in both test sets.

Regarding the VTR regression tests, even after fixing the CHANX/CHANY collapse issue, some of the tests still end up failing the QoR checks.
I am currently trying to get an implementation that keeps the old upstream lookahead generation and usage, while having the two solutions (lookahead map and extended lookahead map) be as similar as possible.
