
WIP: Expand and document lib.cdc #47

Closed
nmigen-issue-migration opened this issue Mar 4, 2019 · 32 comments


Issue by Wren6991
Monday Mar 04, 2019 at 21:06 GMT
Originally opened as m-labs/nmigen#40


Very much a work in progress, but I wanted to open early to get feedback on the approach/direction. I'm porting/rewriting some of the missing modules from migen.genlib.cdc.

Currently in this patch set:

  • Add docstrings to the existing modules. I tried to copy the "house style" I saw elsewhere.
  • Basic checking to give more meaningful traces when you do something like request a MultiReg of length 0. Again, I tried to copy the idioms I saw elsewhere.
  • Implement PulseSynchronizer
  • Implement Gearbox
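For reference, the PulseSynchronizer mentioned above follows the usual oMigen toggle scheme: an input-domain pulse flips a toggle register, the toggle level crosses through a MultiReg, and an edge detector in the output domain recovers the pulse. A behavioral sketch in plain Python (not the actual nMigen code; class and method names here are illustrative):

```python
class PulseSynchronizerModel:
    """Behavioral model of a toggle-based pulse synchronizer.

    An input pulse flips `toggle_i`; the level crosses a flop chain
    (modelling the MultiReg) and an edge detector recovers the pulse
    in the output domain.
    """
    def __init__(self, stages=2):
        self.toggle_i = 0            # toggle flop in the input domain
        self.sync = [0] * stages     # synchronizer chain in the output domain
        self.last = 0                # previous chain output, for edge detect

    def clock_i(self, pulse_in):
        # input-domain clock edge: a pulse flips the toggle
        if pulse_in:
            self.toggle_i ^= 1

    def clock_o(self):
        # output-domain clock edge: shift the chain, then edge-detect
        self.sync = [self.toggle_i] + self.sync[:-1]
        pulse = self.sync[-1] ^ self.last
        self.last = self.sync[-1]
        return pulse
```

With the default two stages, a single input pulse produces exactly one output pulse, two output-domain clocks later.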

Known issues:

  • Smoke tests only; no formal checks
  • Checks are limited by single-clock simulation -- maybe it was a bad idea to play with the CDC library first :)
  • I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)
  • The storage size calculation in this version of Gearbox is different to the one in Migen. IMO the old one was unsafe, but this one may be too conservative. I could also be plain wrong!
  • (edit:) The output mux on the Gearbox is driven from two clock domains. This is a legitimate thing to do here, but I saw mention somewhere that doing this should cause an error.

I definitely still intend to port:

  • BusSynchronizer
  • ElasticBuffer

And we probably ought to do something about GrayCounter at some point, but I think it's less important than the others.

As a general style question, there seem to be two ways of doing module wiring in nmigen.lib at the moment. One is to pass external signals into __init__, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in __init__, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?
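The two wiring styles can be sketched like this (plain Python with a stand-in Signal class so it runs without nMigen; the class names are made up for illustration):

```python
class Signal:
    """Minimal stand-in for nmigen's Signal, just enough to illustrate."""
    def __init__(self, width=1, name=""):
        self.width, self.name = width, name

class SyncStyle1:
    """Style 1 (Verilog-like): the caller creates the signals and
    passes them into the constructor."""
    def __init__(self, i, o):
        self.i, self.o = i, o

class SyncStyle2:
    """Style 2 (port signals): the module creates its own ports in
    __init__, and the parent wires them up during elaboration."""
    def __init__(self, width):
        self.i = Signal(width, name="i")
        self.o = Signal(width, name="o")

# Style 2 avoids extraneous caller-side signals and lets the module
# derive all related port widths from a single parameter:
m = SyncStyle2(width=8)
```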


Wren6991 included the following code: https://github.com/m-labs/nmigen/pull/40/commits


Comment by codecov[bot]
Monday Mar 04, 2019 at 21:15 GMT


Codecov Report

Merging #40 into master will decrease coverage by 0.95%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master      #40      +/-   ##
==========================================
- Coverage    82.3%   81.34%   -0.96%     
==========================================
  Files          34       32       -2     
  Lines        5565     5158     -407     
  Branches     1190     1105      -85     
==========================================
- Hits         4580     4196     -384     
+ Misses        852      829      -23     
  Partials      133      133
Impacted Files Coverage Δ
nmigen/lib/fifo.py 98.76% <ø> (+7.94%) ⬆️
nmigen/lib/cdc.py 100% <100%> (+16%) ⬆️
nmigen/back/verilog.py 61.9% <0%> (-5.67%) ⬇️
nmigen/build/run.py 28.33% <0%> (-2.05%) ⬇️
nmigen/compat/genlib/fsm.py 56.11% <0%> (-1.64%) ⬇️
nmigen/compat/fhdl/verilog.py 69.23% <0%> (-1.36%) ⬇️
nmigen/compat/fhdl/module.py 71.96% <0%> (-1.02%) ⬇️
nmigen/compat/fhdl/specials.py 29.88% <0%> (-0.46%) ⬇️
nmigen/hdl/dsl.py 99.28% <0%> (-0.42%) ⬇️
nmigen/tracer.py 94.44% <0%> (-0.16%) ⬇️
... and 23 more

Continue to review full report at Codecov.

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 51269ad...2e6cf3c. Read the comment docs.


Comment by Wren6991
Monday Mar 04, 2019 at 21:38 GMT


Last commit is just to fix a small coverage blackspot in the existing code. I was having far too much fun with the graphs on Codecov


Comment by Wren6991
Tuesday Mar 05, 2019 at 08:55 GMT


BusSynchronizer is as per Migen except:

  • number of synchronisation stages is optionally parameterised
  • timeout retry can be disabled by setting timeout to 0 (it is a bit risky, since it can cause metastability in the capturing domain)
  • the timer is not factored out into a module, as nMigen does not have a misc library right now

Again it needs formal checks, and tests are devalued by running them in one clock domain.

Edit: updated the commit. Previously there was a clock-enabled odomain register driven by a clock-enabled idomain register. This is ok for CDC purposes, because the CDC-safe handshake controls the clock enables, but I've also put in a (shortened) MultiReg between these registers, because "all cross-domain signals go through a MultiReg" seems like a useful property.

Edit: squashed in another small change: making sure that sync_stages gets passed on to the MultiReg for the width=1 special case.


Comment by Wren6991
Tuesday Mar 05, 2019 at 11:40 GMT


Add ElasticBuffer. Faithful reproduction of the Migen version, except without the reset synchronisers (since these would normally be per-reset-domain, and there is still some code motion wrt reset/clock domains in nMigen as a whole).

Edit: also this elastic buffer is unusual, usually there is some provision for slipping the pointers to account for long-term unbounded phase change. This would be more light-weight than an AsyncFIFO but only works for a few PPM difference. I'm implementing as-is because something like that would have a different interface to current ElasticBuffer.

Force push due to finding small issues in earlier commits, e.g. gearbox increment didn't work for non-pow-2 chunk counts (which can't be tested in one clock domain, I just spotted it while reading the code). I've tried to keep it so that each commit can be separately checked out, tested and reviewed. If you'd rather I just push fixes, let me know :)

Also, I've copied the _incr used in lib.fifo. Might be worth factoring this out, but that's outside of this PR.


Comment by whitequark
Tuesday Mar 05, 2019 at 14:32 GMT


  • I'm passing in the domain names rather than using DomainRenamer. Not sure whether it's best to leave this PR open until this is in place, or merge early so people can start hacking on this. Not my decision :)

I'll try to fix DomainRenamer to work on Modules as soon as possible!

As a general style question, there seem to be two ways of doing module wiring in nmigen.lib at the moment. One is to pass external signals into __init__, which gives you more of a Verilog-style instantiation. The other is to create "port" signals in __init__, which the parent then wires up during elaboration. I've used the second style because it seems a bit cleaner, and doesn't require creating extraneous signals like in Verilog, but not sure if this is the right thing to do?

I use the second style, with some exceptions like MultiReg. The reason is that the signals aren't always completely independent, for example you might want to pass a width and have a set of signals with that width, double that width, etc created for you. Also, constructors with a large number of arguments are unwieldy and bug-prone, especially if you change the constructor later.


Comment by Wren6991
Tuesday Mar 05, 2019 at 17:52 GMT


I'll try to fix DomainRenamer to work on Modules as soon as possible!

Ok, no sweat, I'll rebase + fix as and when :)


Comment by Wren6991
Sunday Apr 07, 2019 at 20:10 GMT


Rebased on latest master and fixed up docstrings:

  • Reflowed to 100 characters
  • Fixed ReST formatting
  • Americanized spellings to be consistent with class names
  • Some edits for clarity

I also stared at the code for a while, only one minor fix which was something that really oughtn't be retimed, but didn't have the no_retiming attribute. Not saying there's nothing to fix, just that it's reached the point where squinting at my own code is nonproductive.

Domains are passed as strings, will fix this after the xfrm.py changes go in.


Comment by whitequark
Wednesday Apr 10, 2019 at 00:32 GMT


@Wren6991 Done!


Comment by Wren6991
Wednesday Apr 10, 2019 at 01:05 GMT


Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.

For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:

  • Matches the signal naming convention used in e.g. lib/coding.py: it makes sense for signal i to live in domain "i", o in "o", etc.
  • Having domains be the same number of characters is super nice. I have bad habits from RTL of lining up blocks of assignments in my code, and it's nice if this happens without inserting whitespace
  • DomainRenamer({"i": "gpu", "o": "display"})(MyCDCModule()) is just a bit pithier

Was just something that occurred to me when editing the files, it's not up to me.


Comment by whitequark
Wednesday Apr 10, 2019 at 01:13 GMT


Thanks :) I've fixed up PulseSynchronizer and Gearbox so far. Unfortunately I'm away on a trip at the moment and don't have all the necessary tools set up on my work laptop, so it'll take me a while to get the fixes tested and pushed.

For consistency I'm using "read" and "write" for default domain names (matching AsyncFIFO), but there is a not-totally-frivolous argument to use something like "i" and "o" instead:

Now that I think more about it, I think it doesn't make sense to use DomainRenamer for synchronization primitives. The reason being, for these primitives, you always want to rename them, so requiring each caller to do that work is clearly superfluous, and it results in unnecessarily hard to read code too.

There's a good argument that PulseSynchronizer should work the same as AsyncFIFO. I agree. I think AsyncFIFO should use explicit domain specification as well! Pretty much every instance of AsyncFIFO is immediately followed by a DomainRenamer call.

WDYT?


Comment by Wren6991
Wednesday Apr 10, 2019 at 01:27 GMT


So a flow could be:

  • If you are writing a block of synchronous logic, you use sync, and then apply DomainRenamer to yeet it into the correct domain
  • Knowledge of domains is confined to a specific region of your code, which places synchronous blocks into the correct domains and also glues them together with CDC modules
  • Since CDC + DomainRenamer is actually a single constructor in a trenchcoat, skip the extra step and pass domains in directly

If so, that feels cleaner, yeah!
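A toy model of that flow, with domain renaming reduced to a dict lookup (this only mimics what nMigen's DomainRenamer does; the function and data shapes are illustrative, not the real API):

```python
def rename_domains(statements, mapping):
    """Re-map domain names in a block of (domain, statement) pairs.

    Logic is written against a generic "sync" domain; at integration
    time a mapping like {"sync": "gpu"} yeets it into the right domain.
    Domains not in the mapping (e.g. "comb") pass through unchanged.
    """
    return [(mapping.get(dom, dom), stmt) for dom, stmt in statements]

block = [("sync", "counter += 1"), ("comb", "out = counter")]
renamed = rename_domains(block, {"sync": "gpu"})
```

A CDC primitive that takes domain names directly is this constructor and rename step fused into one, which is the "single constructor in a trenchcoat" point above.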


Comment by whitequark
Wednesday Apr 10, 2019 at 01:30 GMT


Yep!

to yeet it into the correct domain

Incredible.


Comment by Wren6991
Wednesday Apr 10, 2019 at 06:34 GMT


Awesome. I guess the next step for this PR is a lot more testing then. I did note #28, which I guess I could start to take a look at at some point, as this bumps against it. It would be good to learn about that part. Formal would be good too.

For CDC it would also be cool to do some soak testing on real FPGAs, to catch genuine CDC bugs rather than just state machine bugs (albeit they're difficult to debug once found). Not sure if this is something you already had ideas on.


Comment by whitequark
Wednesday Apr 10, 2019 at 07:03 GMT


For CDC it would also be cool to do some soak testing on real FPGAs

The CDC primitives in oMigen are extremely well tested. I think we mainly need to make sure the nMigen ones don't stray from them, like it happened with AsyncFIFO...


Comment by Wren6991
Wednesday Apr 10, 2019 at 07:40 GMT


Ok that's a good point. Will read through the oMigen library again later today. For the most part it's pretty faithful but mistakes creep in!

There are things in the old library that seemed odd though, e.g. for Gearbox with lcm(iwidth, owidth) == iwidth, the write port will have depth 2. The pointers start 180° apart, but there is no guard period between final o read on an ichunk and the next write to that ichunk, or conversely, first read from an ichunk and the preceding write. You're at the mercy of dice roll on the two reset synchronisers. I'm currently using a different storage size calculation because of this (it can end up larger or smaller than oMigen depending on ratio).

I'll definitely make sure the logic is all ok, and then maybe raise questions about things like the above.


Comment by whitequark
Wednesday Apr 10, 2019 at 07:53 GMT


Sounds good!


Comment by Wren6991
Wednesday Apr 17, 2019 at 16:36 GMT


I did a side-by-side of oMigen vs current state of my code

  • PulseSynchronizer
    • Actual logic identical
    • Number of sync stages in MultiReg is parameterised.
    • Currently I don't have the reset_less attribute on the non-CDC flops in this module, but the MultiReg defaults to reset_less=True. It would probably be better to parameterise this, and either have everything or nothing reset_less
  • BusSynchronizer
    • Handshake logic is identical: req/ack with two PulseSynchronizers
    • Number of sync stages parameterised
    • Option to disable timeout, as it has small chance of temporarily breaking the handshake for oclock << iclock (or oclock sometimes gated); multiple reqs can be in flight!
    • I removed one layer of flops from data CDC path. See diagram
    • Again some confusion over what should/should not be reset_less. Currently all synchronous logic is resettable but the MultiRegs are not
    • Found an issue with my code: ibuffer clock enable was driven by only _pong.o (ack, resynced to idomain) in oMigen, but mine was more permissive, and also enabled on timer restarts and start-of-day flag. Think this is ok, but I've fixed it to be exactly like oMigen.
  • ElasticBuffer
    • Logic is faithful reproduction of oMigen
    • oMigen code also has some reset structure inside. It ORs together the reset from each domain, and resynchronises this new reset internally. I think this is problematic because reset assertion is asynchronous, so e.g. an input-domain reset can inject metastabilities into the output domain. This is better handled by proper reset sequencing at system level. WDYT?
  • GearBox
    • Same comment on resets as ElasticBuffer
    • Actual logic is faithful to oMigen
    • Storage size calculation is different in my version: oMigen takes LCM of two widths, and doubles it unconditionally. This is unnecessarily large for many coprime widths (e.g. 5:4 for video) and for lcm(i, o) in (i, o) it seems too small to me. My code starts with lcm(i, o) and doubles it until the depth of each port is at least four, logic being that this gives you one chunk of no-man's-land on each side between the chunk you are reading/writing and the chunk the other side is writing/reading. WDYT?
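The proposed storage-size rule in the last bullet can be written down directly. This is a reconstruction from the description above (function name is made up), not the actual PR code:

```python
from math import gcd

def gearbox_storage_bits(iwidth, owidth):
    """Gearbox storage sizing as described above: start from
    lcm(iwidth, owidth) and double until each port sees a depth of at
    least four chunks, leaving one chunk of no-man's-land on each side
    of the chunk currently being read/written."""
    storage = iwidth * owidth // gcd(iwidth, owidth)  # lcm
    while storage // iwidth < 4 or storage // owidth < 4:
        storage *= 2
    return storage
```

For 5:4 (video-style coprime widths) this gives 20 bits, smaller than oMigen's unconditional 2*lcm = 40; for 8:8, where lcm(i, o) == i == o, it gives 32 bits, larger than oMigen's 16.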

BusSynchronizer datapath

I drew a rough box diagram of oMigen BusSynchronizer:

[box diagram of the oMigen BusSynchronizer datapath]

My problem with this is that there is a 3-flop delay for both the request and the data (one flop in idomain, two in odomain). This means there is a race between the obuffer clock enable and the data: due to bit skew observed at the inputs to the req and data MultiRegs, it seems possible to have req = 1 at the output but garbled data.

I've fixed the race by removing one stage from the datapath MultiReg. Note there are still two non-retimable layers of flops in odomain on the datapath, and also, sync_stages can be increased to stretch out the entire BusSynchronizer.


Comment by whitequark
Wednesday May 01, 2019 at 08:50 GMT


@Wren6991 Sorry, I completely missed your update, GitHub decided to not send me an email for it for some reason.


Comment by Wren6991
Wednesday May 01, 2019 at 12:00 GMT


Do you agree on my diagram of oMigen BusSynchronizer? I might have misunderstood e.g. the PulseSynchronizer logic. Haven't looked at it for a while now.


Comment by whitequark
Wednesday May 01, 2019 at 12:03 GMT


I'll need to think more about it as I wasn't the one who wrote it. @sbourdeauducq ?


Comment by sbourdeauducq
Thursday May 16, 2019 at 02:57 GMT


I've fixed the race by removing one stage from the datapath MultiReg.

But then, how do you preserve the MultiReg constraints between the anti-metastability registers?

Sounds easier to add one register of delay into the request path.


Comment by Wren6991
Tuesday Jun 11, 2019 at 15:52 GMT


@sbourdeauducq You're 100% right -- it's possible to just falsepath/MCP the connection between write data buffer and the MultiReg, but it's cleaner if it can just rely on constraints on paths internal to the MultiReg.

Added a patch which restores the data path to its original length, and lengthens the request path to compensate.


Comment by Wren6991
Tuesday Jun 11, 2019 at 16:09 GMT


Fixed merge conflict and made classes derive from Elaboratable (sorry for force-push)


Comment by whitequark
Wednesday Jun 12, 2019 at 19:56 GMT


Thanks, I'll take another look soon.


Comment by Wren6991
Friday Jun 28, 2019 at 05:49 GMT


Updated naming conventions according to #97. I've also updated the only use of lib.cdc elsewhere in the standard library (in lib.fifo) to reflect this. Sorry for being unresponsive, day job has been busy


Comment by whitequark
Friday Jun 28, 2019 at 05:52 GMT


That's fine, I have a lot of other work here, so you're not really blocking anything.


Comment by Wren6991
Friday Jun 28, 2019 at 06:35 GMT


I saw this in #113

CDC improvements like CDC safety (#4) and automatic false path constraints (#87; depends on #88). These are definitely needed, but there are still plenty of designs that will be just fine without them, and I think for most people waiting to pick up nMigen they wouldn't be an issue. Again, these are more of 0.2 issues.

And have also noted #87. Wanted to mention that falsepath is not necessarily safe for all of these primitives: in particular, BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.

Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.
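For concreteness, such a constraint might look like the following in Vivado XDC (a hypothetical sketch only: the clock names and the 4.5 ns figure are invented, and the value would have to be derived per-design as capture period minus setup):

```tcl
# Instead of set_false_path, bound the CDC routing delay so bit skew
# across the bus cannot exceed one capture period minus setup.
# -datapath_only excludes clock skew from the calculation.
set_max_delay -datapath_only \
    -from [get_clocks clk_write] -to [get_clocks clk_read] 4.5
```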


Comment by whitequark
Friday Jun 28, 2019 at 06:40 GMT


Wanted to mention that falsepath is not necessarily safe for all of these primitives

That seems very troublesome, and certainly something we want to loudly warn about. Would you be able to prepare a detailed evaluation of all of our CDC primitives with this in mind?

BusSynchronizer is unsafe if there is too much skew between the request path and the datapath, and false paths can accumulate huge routing delays when FPGA utilisation is high.

Are there any ways to work around this?

Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup. This isn't perfectly safe (clock skew), and does make closure harder, but might be worth considering.

This seems like a dramatic increase in complexity of platform code. It's an option, but maybe we can avoid it completely...


Comment by whitequark
Friday Jul 05, 2019 at 13:29 GMT


Relevant: m-labs/migen@f4979a2#commitcomment-34195474


Comment by Wren6991
Friday Jul 12, 2019 at 22:21 GMT


Relevant: m-labs/migen@f4979a2#commitcomment-34195474

That is an interesting handshake, but seems to overlap quite a lot with BusSynchronizer. I have had the occasional shower thought that BusSynchronizer should have an optional valid/ready handshake in the launching domain (and perhaps a valid pulse in the capturing domain), which would give you something similar to this BlindTransfer.

I'm also happy to just port BlindTransfer wholesale!

@nmigen-issue-migration nmigen-issue-migration added this to the 0.2 milestone Jan 27, 2020
@whitequark whitequark removed this from the 0.2 milestone Feb 13, 2020
Comment by whitequark

Since we want the widest possible toolchain support, one option is a maxdelay constraint of capture period minus setup.

I think we actually have maxdelay constraint now on Vivado platforms.

awygle added a commit to awygle/nmigen that referenced this issue Feb 16, 2020
PulseSynchronizer ported from issue amaranth-lang#47, which was ported from oMigen.
Comment by whitequark

Adding more functionality to the standard library today would require going through our RFC process, so I'm closing this PR as obsolete. Thank you for your effort, @Wren6991!

@whitequark whitequark closed this as not planned Feb 9, 2024