Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Strange simulator behavior: clock signal pausing while sync blocks are speeding up #554

Closed
hansfbaier opened this issue Dec 3, 2020 · 12 comments

Comments

@hansfbaier
Copy link

hansfbaier commented Dec 3, 2020

I have the following simulator configuration:

receiver = ADATReceiver()
clk_freq = 120e6
# 24 bit plus the 6 nibble separator bits for eight channel
# then 1 separator, 10 sync bits (zero), 1 separator and 4 user bits
adat_freq = 48000 * ((24 + 6) * 8 + 1 + 10 + 1 + 4)
clockratio = clk_freq / adat_freq
sim = Simulator(receiver)
sim.add_clock(1.0/clk_freq, domain="sync")
sim.add_clock(1.0/adat_freq, domain="adat")
print(f"clock ratio: {clockratio}")

cycles = 10

def sync_process():
    for _ in range(int(clockratio) * cycles):
        yield Tick("sync")

def adat_process():
    testdata = one_empty_adat_frame() + generate_sixteen_frames_with_channel_numbers_in_most_significant_nibble_and_sample_numbers_in_sample()
    for bit in testdata[224:512 * 2]:
        yield receiver.adat_in.eq(bit)
        yield Tick("adat") # if I don't put this here, the adat signal is constant zero

sim.add_sync_process(sync_process, domain="sync")
sim.add_sync_process(adat_process, domain="adat")
with sim.write_vcd('receiver-smoke-test.vcd', traces=[receiver.adat_in, receiver.clk_in, receiver.adat_clk_in]):
    sim.run()

When I simulate this, I have strange parts, where the clock of domain sync seems to pause,
but at the same time a counter in that domain seems to speed up:

image

What is going on here? Am I using the simulator in a wrong way, or is this a bug in the simulator?

@whitequark
Copy link
Member

I have absolutely no idea how you managed to get this behavior; I'm assuming a simulator bug.

@whitequark
Copy link
Member

whitequark commented Dec 3, 2020

Please attach the full code necessary to reproduce this issue.

@hansfbaier
Copy link
Author

hansfbaier commented Dec 3, 2020

@whitequark I just pushed my code. It is my first nontrivial FPGA project, so I am pretty much a newbie here. It is also my first nmigen project.
https://github.com/hansfbaier/adat-fpga

@hansfbaier
Copy link
Author

@whitequark Could you reproduce the behavior with the code I pushed above?

@whitequark
Copy link
Member

Sorry, I have only got around to it now. I get this:

Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/whitequark/adat-fpga/adat/cli.py", line 62, in <module>
    main()
  File "/home/whitequark/adat-fpga/adat/cli.py", line 29, in main
    args.with_icache, args.icache_nways, args.icache_nlines, args.icache_nwords,
AttributeError: 'Namespace' object has no attribute 'with_icache'

@hansfbaier
Copy link
Author

hansfbaier commented Dec 14, 2020 via email

@hansfbaier
Copy link
Author

image
The behavior seems to happen whenever combinatorial logic operates. Then the clock takes a gasp,
but synchronous changes happen anyway, but with "squeezed" timings.
Fortunately that did not hinder me completing the receiver. Just pushed all the changes. The design probably can be much simplified, but it is my first non trivial FPGA project, and it just passed the smoke test, wohoo!

@whitequark
Copy link
Member

The design probably can be much simplified, but it is my first non trivial FPGA project, and it just passed the smoke test, wohoo!

Congratulations! I haven't been able to look into this issue yet as I've been quite busy working on CXXRTL, but it's something that I think is very serious, so I'll make sure to address it before the next release.

@hansfbaier
Copy link
Author

hansfbaier commented Jan 19, 2021

As the repository mentioned above is a moving target,
here is a snapshot to reproduce the issue:

clk-bug.tar.gz

Steps to reproduce:

  1. cd adat
  2. python3 receiver.py
  3. open the receiver-smoke-test.vcd with gtkwave and look where clk_in drops out

@andresmanelli
Copy link

I also noticed this. In my case, only the next clock edge is executed (no more than one extra transition).

The code in the zip is quite straight forward: There are two clocks that go through a mux, and for a couple of cycles the output clock is inactive when switching clocks. At the beginning of this period the "off-by-one" error is clearly visible.

I added markers to the following image:

  • A -> Clock 1 stops
  • B -> Extra transition
  • C -> Clock 2 stops
  • D -> Extra transition

Screenshot_20210928_220100

Same as @hansfbaier , run as python clock_bug_test.py. Tested with nmigen 0.3.dev261+g9f78ac0.

clock_bug_test.zip

@wanda-phi
Copy link
Member

I have debugged both examples. For the original bug report, the issue comes from the following interaction:

  • the clock signal of sync domain is sync_time_detector.clk_in
  • the clock signal of sync is driven by the simulator by add_clock
  • sync_time_detector.clk_in is also combinationally driven from top.clk_in (a const-0 signal due to never being assigned)
  • the driver conflict is not detected (which is a bug tracked as Python testbenches should not be able to assign combinatorially driven signals #557)
  • the two drivers fight unpredictably, with the comb driver winning whenever comb logic in top runs for unrelated reasons, sometimes changing value multiple times in 0 simulation time

The other report (from @andresmanelli) has a completely different failure mode, and is instead caused by the glitchy way in which the ClockMux is written:

  • some time after the selector is switched from clka to clkb, clka edge happens while and_1_reg_3 is still high
  • o_clk immediately goes high (being combinationally set to and_1_reg_3 & clka)
  • clka edge triggers logic in clka domain and makes and_1_reg_3 transition low
  • o_clk immediately goes back low

The above happens all within 0 simulation time, making it invisible in VCD. It can be seen using the recently added fs_per_delta option.

In both cases the solution is to be more careful when driving your clock signals.

@whitequark whitequark closed this as not planned Won't fix, can't repro, duplicate, stale Apr 11, 2024
@Lunaphied
Copy link
Contributor

Thank you for making us aware of fs_per_delta I can see us getting a lot of use out of it

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests

5 participants