-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
UART DDR design state #1301
Comments
@litghost FYI |
Clock PessimismThere is a value that is used during timing analysis in Vivado which is Clock pessimism, is used to delete this differences, which is fictitious and not real. This can be seen in the following example:
The arrival time calculation until the BUFGCTRL output, report a Clock pessimism should delete their difference. This should not affect VPR timing analysis, as the timing is already being calculated starting from the BUFGCTRL output. In fact the same path is reported as follows by VPR:
|
BUFHCE incorrect routingThere is a situation for which a clock is being routed incorrectly between BUFHCEs. The incorrect routing stands in the fact that the BUFHCE on the right's input should be routed from a BUFG. Being the clock the same, the BUFHCE on the right should get the clock from the same pip as the left BUFHCE. Instead VPR chooses a longer route through the whole horizontal clock to get to the left CMT and go back to the BUFHCE on the right. This adds a huge delay in timing analysis, which is actually not captured by VPR: Vivado report
VPR report
Timing delay to route to the FDRE/C endpoint in required time calculation:
The same |
The Vivado timing report shows a WNS of |
@litghost I forgot to mention that to output more detailed timing reports for VPR I added the following parameters:
|
I believe the Vivado timing analysis requires additional constraints, can you please provide what constraints you used? |
@litghost just the clock creation.
Another thing to add is actually that there should be false paths set for the reset Flip Flops, because they are asynchronous resets. The xdc file from the minitest uses these constraints:
ars_ff1 and ars_ff2 are related to asynchronous flip flops for the asynchronous resets signals. |
One thing we should probably add to the |
As a random thought, did we try increasing the criticality of nets touching clocks yet? I expect it might have a positive effect on #1301 (comment) |
I believe the problem is that the ISERDES and OSERDES is missing all of it's timing model, both in prjxray-db and in symbiflow-arch-defs. The bad timing paths from these missing paths. |
I've added I/OSERDES timing data to the prjxray-db 007 fuzzer in f4pga/prjxray#1229 |
I've identified the same timing arc in each analysis, and VPR's model is wrong. Reports are attached: |
Basic analysis looks like an under-predict of the interconnect delay. The tcl function
|
By running VPR with some parameters I could get better timing results out of Vivado (still violating the setup)
With these settings I got the worst setup path to be |
There is absolutely zero point in tuning parameters until the timing model is fixed. |
Initial fixes to Initial timing worksheet from updated scripts: timing_ws.zip |
Turns out trying to have ~6500 timing calculations in 1 spreadsheet doesn't work great. I've added filtering to the timing worksheet generator, and here is the filtered spreadsheet: |
I may have identified the problem, and I'm working on testing a solution. |
Using the new timing data from f4pga/prjxray#1238 and this commit, I believe we have a design that closes timing. I do not have access to an Arty right now, so I have attached the output from the P&R. @acomodi / @mithro if you could test the bitstream to see if it is working, that would be awesome. Vivado is reporting 4 very small hold violations, but I hope the hold violations are small enough to be within timing model margins:
|
@litghost @kgugala @mithro I have noticed that in the FASM there were some wrong PULLTYPE settings, meaning that a normal Yosys+Vivado flow was setting some PULLTYPE features to NONE for the DDR related IOBs, while SymbiFlow still does not support those. By manually updating the wrong PULLTYPES from PULLDOWN to NONE in the top.fasm in the zip archive, and by regenerating the bitstream with fasm2frames and frames2bit, the DDR test now works on HW as expected! |
And here are the bitstream and fasm files from @acomodi |
Correction
This was actually a wrong statement. I have double checked and most probably the bistream I used to perform the test was a Vivado generated one. By repeating the manual procedure and replacing the PULLTYPEs, the bitstream does not work yet, unfortunately. |
To try to understand what may be wrong I have performed a series of tests to understand what the issue may be, as now timing is met. To do so I have first made sure that fasm2bels did correctly work and produced working bitstreams following this flow:
With the above test, it is verified that fasm2bels produces valid netlists that Vivado is able to implement, obtaining working bitstreams. Then I proceeded with going through the same flow, starting from VPR generated FASMs:
I have run a series of symbiflow tests with this approach, and, for now, the only one not working on HW is still the DDR one, even though timings were met. This can lead to the conclusion that there may be something wrong with the architecture model that we currently have. To perform this kind of tests I have also created an automated script to speed up this process: f2b_vivado |
I created a diagram at https://docs.google.com/drawings/d/1NJlN-cPLNx4nULHiL4938RD-H14izpayqtNSQ4XRjfA/edit which I believe should match what you are attempting @acomodi ?
|
@mithro Yes, this diagram describes the tests I have performed. There are some problems though when adopting this flow, that I have worked around in f2b_vivado:
|
I believe all remaining issues have been resolved, and the UART DDR is working consistently on hardware |
This issue is to track the state of the UART DDR design that is currently under PR: #1294.
Though CI goes green, the implementation still does not properly work on HW.
How to verify it works on HW
The memory calibration should find some correct delay and bitslip values, such as the following (obtained with a 50 MHz working UART DDR implemented design):
There is also the possibility to manually
read/write
to memory after the calibration delay has completed (the script selects the first valid bitslip and delay values).Current state
There are several possible issues with the SymbiFlow implementation that affect the results. Here are listed the main ones:
Inverted clock for ISERDESEs
The design is mainly driven by two clocks, a fast one (200MHz) for IOSERDESes, and a slow one (50 MHz), used in the system.
In addition, the ISERDESEs make use of an inverted fast clock, which is fed into the CLKB port.
There are two ways in which this inverted clock is handled:
Poor placement
By inspecting the
.dcp
file generated after fasm2bels, the placement chosen by VPR seems to be off, as it spreads all the blocks in a column style, instead of having a more compact result:This can result in critical blocks to be placed far way from each other (such as the white path in the image, corresponding to the clk200 reset that drives the reset signal of the IDELAYCTRL)
Timing violations
By extracting the timing summary from Vivado after the fasm2bels step, there are several endpoints which fail in the Setup/Hold timing analysys (here an example from the timing summary):
Below there is a tar containing the various timing reports, both from Vivado and VPR. The commit hashes of the various tools are the following:
master+wip
WIP: Update to VPR (including router refactor) SymbiFlow/vtr-verilog-to-routing#393. The commit has is SymbiFlow/vtr-verilog-to-routing@21df3a5The tarball containing reports is here
Notes
E3
pin in this case).The text was updated successfully, but these errors were encountered: