Quicklogic: Router run time very high #1486
Comments
By setting --max_criticality to 0, all these designs route very fast. Attached is the log for one of the designs with that setting.
This points to a bad timing model. Has the present timing model been validated?
One thing that may also affect the runtime is the number of RR graph nodes. The graph for EOS S3 has roughly 6-7 times more nodes than the graph for the iCE40 device. This is mostly because, when building the graph for EOS S3, certain connection rules are enforced that do not necessarily have to be obeyed. For example, when connecting two parallel CHANX nodes, an intermediate CHANY node is inserted. This costs one extra node and one extra edge in the graph. The function responsible for doing that can be found here in the code: https://github.com/antmicro/symbiflow-arch-defs/blob/fba277748427f7338128e586db61d14607d0da35/quicklogic/utils/routing_import.py#L144 I'd first suggest replacing the content of that function with a single call to
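To make the cost of the intermediate CHANY node concrete, here is a minimal sketch contrasting the two topologies. The `Graph` class and `connect_*` helpers are illustrative placeholders, not the actual API of routing_import.py:

```python
# Toy graph model; node ids and helpers are hypothetical, for counting only.
class Graph:
    def __init__(self):
        self.nodes, self.edges = [], []

    def add_node(self, kind):
        self.nodes.append(kind)
        return len(self.nodes) - 1

    def add_edge(self, src, dst):
        self.edges.append((src, dst))


def connect_via_chany(graph, chanx_a, chanx_b):
    """Current topology: CHANX -> CHANY -> CHANX (+1 node, +2 edges)."""
    chany = graph.add_node("CHANY")
    graph.add_edge(chanx_a, chany)
    graph.add_edge(chany, chanx_b)


def connect_direct(graph, chanx_a, chanx_b):
    """Simplified topology: CHANX -> CHANX (+0 nodes, +1 edge)."""
    graph.add_edge(chanx_a, chanx_b)


g_a, g_b = Graph(), Graph()
for g in (g_a, g_b):
    g.add_node("CHANX")
    g.add_node("CHANX")

connect_via_chany(g_a, 0, 1)
connect_direct(g_b, 0, 1)

print(len(g_a.nodes), len(g_a.edges))  # 3 2
print(len(g_b.nodes), len(g_b.edges))  # 2 1
```

Scaled over every pair of parallel CHANX nodes in the device, the extra node and edge per connection accounts for a sizable part of the graph growth.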
@mkurc-ant, I remember you mentioning that within the switchbox each N-input mux is represented as N+2 nodes and N+1 edges to model variable load timing. Could this also be causing the extra number of nodes?
@tpagarani That's true. You should only need N nodes and N edges per mux. Actually, since we assume all driver resistances to be 1 Ohm, we could "integrate" these resistances into the switches that model sinks. That would require refactoring the VPR database generation script (switch types have to be modified) and the routing import script (different graph topology).
@mkurc-ant, I am thinking that if we don't model the input-load-based delay for each mux and just consider a single (worst-case) delay through the mux, then we can represent the whole switchbox with fewer nodes and edges. For example, in STAGE1 there are 3 switches, each consisting of eight 6-input muxes. Since all 8 muxes share the same inputs coming from STAGE0, we can directly connect the output nodes of STAGE0 with the output nodes of STAGE1. Is that possible?
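A quick back-of-the-envelope count, using the STAGE1 numbers from this thread (3 switches, each with eight 6-input muxes) and the per-mux counts mentioned above, shows how much the simplified representation saves:

```python
# Rough node/edge count for STAGE1 of the switchbox.
switches, muxes_per_switch, inputs = 3, 8, 6
muxes = switches * muxes_per_switch  # 24 muxes total

# Current representation: N+2 nodes and N+1 edges per N-input mux.
nodes_now = muxes * (inputs + 2)
edges_now = muxes * (inputs + 1)

# Simplified representation: N nodes and N edges per mux.
nodes_new = muxes * inputs
edges_new = muxes * inputs

print(nodes_now, edges_now)  # 192 168
print(nodes_new, edges_new)  # 144 144
```

For this one stage alone that is 48 fewer nodes and 24 fewer edges, before counting any savings from sharing the STAGE0 inputs across muxes.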
@tpagarani So if I understand you correctly, you want to do it as in the "B" part of the picture below. If you don't need the delay to vary with the number of active inputs, then that is doable. Otherwise, you should double-check whether you can build the timing model with that topology using the available VPR constructs (switches).
Representation B is more efficient and should still be able to model load-dependent delays by using the Cinternal keyword when creating the switches used in the relevant parts of the rr-graph. See https://docs.verilogtorouting.org/en/latest/arch/reference/#switches. Specifying a Cinternal on the switches used to go from stage N to stage N+1 in figure B, for example, lets the timing model account for the internal capacitance of those switches.
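For illustration, a switch definition in the architecture file carrying a Cinternal value might look like the fragment below. The name and all element values are placeholders, not taken from the actual EOS S3 architecture:

```xml
<!-- Hypothetical switch for the stage N -> stage N+1 connections in
     figure B; Cinternal models capacitance internal to the switch. -->
<switch type="mux" name="sb_stage_mux"
        R="1" Cin="0.0e-15" Cout="2.0e-15" Cinternal="5.0e-15"
        Tdel="120e-12" mux_trans_size="1.0" buf_size="8.0"/>
```

The rr-graph edges between the two stages would then reference this switch, so the load seen through the switchbox is captured without the extra per-input nodes.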
The router run time for the QuickLogic S3 device appears to be significantly slower than the run time on an iCE40 device of roughly similar size. Please find attached the VPR logs for 3 different designs.
S3 log files
vpr_stdout.log
vpr_stdout.log
vpr_stdout.log
iCE40 was run from the master branch
iCE40 log files
vpr_stdout.log
vpr_stdout.log
vpr_stdout.log
Looking at the logs, it appears that for the S3 device the router starts with Net Criticality = 1 for all the connections, which results in a very high TNS.