Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

class Memory: add reset_less parameter that will not initialize memories with 0 value in generated RTL. #270

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

Fatsie
Copy link
Contributor

@Fatsie Fatsie commented Nov 16, 2019

I am using nmigen for generating RTL to be implemented on an ASIC. In an ASIC the content of a memory block is random at startup. Currently generated RTL by nmigen initializes memories by default with 0, which does not match behavior of a memory on an ASIC.
I am using external simulators and not pysim as my designs currently contain external verilog and VHDL code.
Some comments on the code:

  • I named the parameter reset_less after the same parameter name for Signal. I am open for suggestion for other/better name.
  • I added some unit test code but the feature would actually only be tested fully when generated rtlil code is read into yosys. Did not know how to best implement that in the nmigen unit test framework.
  • For pysim it may be good if memories could be initialized with random values if reset_less is set to True. I suppose pysim does not want to support 'X' or 'U' values for signals.

…ies with 0 value in generated RTL.

This is to simulate better the behavior for ASIC memory implementation where
memory blocks will have random value after startup.
This parameter can be configured on class level so it can be set for all
generated memory blocks if generating code for ASICs.
@codecov
Copy link

codecov bot commented Nov 16, 2019

Codecov Report

Merging #270 into master will increase coverage by <.01%.
The diff coverage is 85.71%.

Impacted file tree graph

@@            Coverage Diff            @@
##           master    #270      +/-   ##
=========================================
+ Coverage   82.29%   82.3%   +<.01%     
=========================================
  Files          34      34              
  Lines        5598    5601       +3     
  Branches     1201    1202       +1     
=========================================
+ Hits         4607    4610       +3     
  Misses        851     851              
  Partials      140     140
Impacted Files Coverage Δ
nmigen/hdl/mem.py 98.14% <100%> (+0.05%) ⬆️
nmigen/back/rtlil.py 83.25% <66.66%> (ø) ⬆️
nmigen/tracer.py 94.59% <0%> (ø) ⬆️
nmigen/hdl/ir.py 94.43% <0%> (ø) ⬆️
nmigen/hdl/ast.py 86.83% <0%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 834fe3c...8bf6551. Read the comment docs.

@whitequark
Copy link
Contributor

Currently generated RTL by nmigen initializes memories by default with 0, which does not match behavior of a memory on an ASIC.

This is true. However, nMigen initializes signals to 0 as well (even reset_less Signals), which does not match behavior of registers on an ASIC. How are you dealing with that?

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

This is true. However, nMigen initializes signals to 0 as well (even reset_less Signals), which does not match behavior of registers on an ASIC. How are you dealing with that?

I don't use reset_less signals at the moment. Difference is that for registers (asynchronous) reset flipflops will be used and get thus the right value using the reset. This is not the case for memories.

Actually I have been thinking of doing that 'X' initialization also for Signal but the testing required to have it right for both synchronous as combinatorial logic, slicing etc. scared me off.

@whitequark
Copy link
Contributor

Difference is that for registers (asynchronous) reset flipflops will be used and get thus the right value using the reset. This is not the case for memories.

But unlike on an FPGA, on an ASIC the flip-flops won't be reset unless you explicitly do so. So I see the challenges here as quite similar: both FPGA registers and (in most cases) memories are initialized, and neither ASIC registers nor memories are initialized.

I think adding this parameter to Memory is not appropriate, because (specifically in your case) this is not a property of the Memory, but rather a property of the platform. I believe it would be appropriate to alter the RTLIL backend such that it has an "ASIC mode" where it does not emit initial statements for either signals or memories.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

I believe it would be appropriate to alter the RTLIL backend such that it has an "ASIC mode" where it does not emit initial statements for either signals or memories.

Actually it's introducing back a feature from Migen where RTL generation actually was made altered for ASIC. I thought one of the goals for nMigen was to get rid of this difference...

Anyway so what do you think about having an extra emit_initial parameter to (rtilil|verilog).convert?

@whitequark
Copy link
Contributor

Actually it's introducing back a feature from Migen where RTL generation actually was made altered for ASIC.

That feature was removed from Migen because it was thought that no one used it, and it bitrotted. I do not see a way to generate the same code for FPGA and ASIC because FPGA and ASIC inherently have different behavior.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

Actually one of the reasons for using (n)Migen is to avoid the problem that designs that have been tested and are working on a FPGA don't work as an ASIC. Ideally nMigen code would always behave the same on FPGA and ASIC. If that is not possible at least the flow should error out if a feature is used that is not supported on a certain platform. It should not be the case that the behavior is dependent on the platform.
That's why I would like to implement the following:

  • reset_less synchronous signals are not initialized in the initial section of generated RTL code independent of platform. To me setting reset_less to True indicates that one does not care what the actual value is at start-up.
  • memory that is not explicitly initialized by passing values in init of Memory.__init__() is also not initialized
  • pysim uses random values as start for both synchronous reset_less signals and memory not explicitly initialized.
  • when one tries to implement a RAM on an ASIC where memory is initialized explicitly the flow will error. One difficulty is that ROMs, e.g. memories with only read ports, typically are possible on an ASIC by using a ROM compiler or even as synthesized logic as I do now for ao68000.

@whitequark
Copy link
Contributor

whitequark commented Nov 16, 2019

  • reset_less synchronous signals are not initialized in the initial section of generated RTL code independent of platform. To me setting reset_less to True indicates that one does not care what the actual value is at start-up.

reset_less has never meant this in the entire history of (n)Migen. The (n)Migen reset rules for synchronous signals are:

  • all signals are reset (to the reset value) at power-on reset;
  • non-reset-less signals are reset (to the reset value) at clock domain reset (for the clock domain that drives them).

Changing the behavior to what you suggest will break all (n)Migen code in existence.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

Bugger, due to lack of good documentation this was not clear to me up to now.

Can at least this expectation be adapted for memories only ? POR (power-on-reset) for ASIC and flip-flops can be implemented if really needed. For memories it would mean adding a ROM and loading that into the RAM at POR.

Can you give also example of code that actually depends on the fact that value is set at POR but not at clock domain reset ?

@whitequark
Copy link
Contributor

whitequark commented Nov 16, 2019

Can you give also example of code that actually depends on the fact that value is set at POR but not at clock domain reset ?

Any FPGA design that is synthesized without an explicit reset (either external or generated inside the design) by relying on the ability to initialize flip-flops.

For example, Xilinx specifically recommends against adding a global reset unless it is unavoidable: rather than resetting the design after power-on, they recommend gating the clock until end of startup. By default, nMigen generates code that complies with these recommendations.

Can at least this expectation be adapted for memories only ?

It would have to be, if only because some FPGAs have memories that cannot be initialized (e.g. iCE40 SPRAM; IIRC, Xilinx UltraRAM). But it is not clear to me what is the ideal path forward here. One of the major design goals behind (n)Migen is that every design you write (without using instances) should be deterministic (modulo asynchronous clocks), so it tries very hard to make sure there are no uninitialized design elements.

Whatever is the language-level solution here, I think that it should be applied to both signals and memories. Otherwise, it would lead to simulation-synthesis mismatch on ASICs, which is just as bad as platform-dependent behavior, if not worse.

I do not like the idea of using random values in place of uninitialized ones in simulation. In case of registers, there is no guarantee that at power-on, the flip-flop is not in a metastable state. That can violate invariants in downstream logic. In all cases, using random values may give the impression that the downstream logic must behave correctly if it passes enough simulation runs, but of course that is not true, e.g. let's say such downstream logic breaks only if a 32-bit register has one particular value. So it can hide bugs.

Are you using property checking with SymbiYosys? If you are, it is easy to set it up so that every possible power-on state will be checked. Of course, this does not address the issue of metastability.

Overall, I am leaning towards implementing conservative X-propagation in pysim to handle uninitialized registers and memories. But it is a sizable task.

@whitequark
Copy link
Contributor

Bugger, due to lack of good documentation this was not clear to me up to now.

It is documented in the docstring for Signal, at least in nMigen:

nmigen/nmigen/hdl/ast.py

Lines 809 to 812 in 834fe3c

reset_less : bool
If ``True``, do not generate reset logic for this ``Signal`` in synchronous statements.
The ``reset`` value is only used as a combinatorial default or as the initial value.
Defaults to ``False``.

But I agree that the documentation is currently not very accessible and must be improved.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

It is documented in the docstring for Signal, at least in nMigen

An ASIC designer like me sees POR also as reset logic; so 'no reset logic for this Signal in synchronous statements' I interprete than also as no init during power on if reset_less is given. Combinatorial signals are not a problem as that is fully deterministic after synthesis and no difference between FPGA and ASIC.

@whitequark
Copy link
Contributor

An ASIC designer like me sees POR also as reset logic; so 'no reset logic for this Signal in synchronous statements' I interprete than also as no init during power on if reset_less is given.

Indeed that is a very FPGA-centric statement. Before nMigen explicitly declares ASIC support all such statements should be clarified.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

Any FPGA design that is synthesized without an explicit reset (either external or generated inside the design) by relying on the ability to initialize flip-flops.

For example, Xilinx specifically recommends against adding a global reset unless it is unavoidable: rather than resetting the design after power-on, they recommend gating the clock until end of startup. By default, nMigen generates code that complies with these recommendations.

That should in itself not be a problem. It should be possible to use global reset for ASIC and rely on initialization for FPGA. The problem is when you do want reset_less to have POR but not set/reset on the clock domain reset signal. So it would be good to know the current use cases where reset_less is set to True.

BTW, on ASICs external reset signals are going through a reset synchronizer. The synchronous reset deassertion is to make sure all set/reset signals for all the flip-flops are deasserted in the same clock cycle. Otherwise it could mess up state machines.

In case of registers, there is no guarantee that at power-on, the flip-flop is not in a metastable state.

Metastability is only a problem for CDC, during power-on flip-flops will always initialize to 0 or 1. The metastable state is not stable due to noise in transistors and the live time is much shorter than the time taken for power ramp up.
Actually metastability is a kind of a misnomer when used for CDC problems. It is actually the fact that when a signal is connected to different gates and it is in transition some of the gates may still 'see' the old value and some the new value. If at this time the output of the gates are latched in another domain inconsistent values will be clocked in. Thus vectors can have wrong values.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 16, 2019

I do not like the idea of using random values in place of uninitialized ones. In case of registers, there is no guarantee that at power-on, the flip-flop is not in a metastable state. That can violate invariants in downstream logic. In all cases, using random values may give the impression that the downstream logic must behave correctly if it passes enough simulation runs, but of course that is not true, e.g. let's say such downstream logic breaks only if a 32-bit register has one particular value. So it can hide bugs.

Actually I was only talking about random values for pysim. For external simulators like verilator/icarus I would use classic 'X' during simulation. I agree that also implementing unitialized value support in pysim would be the best solution but this would have a performance impact.

@whitequark
Copy link
Contributor

I agree that also implementing unitialized value support in pysim would be the best solution but this would have a performance impact.

I propose using a special value Uninitialized that participates in the same operations as int (which is used to represent all signals internally) but which is a fixpoint for all those operations. That is, Undefined + 1 == Undefined and so on.

This is far more coarse grained than Verilog X-propagation--in fact in my proposal (call it "U-propagation") you would not be able to erase undefinedness by &ing it with 0, only by using a different pmux branch (i.e. using m.If / m.Switch constructs). However, it is also extremely cheap to track: it requires almost no changes to pysim code, and when it never appears in the design, it is zero-cost.

One drawback is that, by not behaving exactly like Verilog X-propagation, there will be a mismatch between pysim and Verilog simulators; more on that later.

The metastable state is not stable due to noise in transistors and the live time is much shorter than the time taken for power ramp up.

I agree that if your ASIC comes with a reset synchronizer then metastability on power-on cannot possibly be a problem. (Yes, I know what a reset synchronizer is :)

I'll elaborate the reason I mention metastability in this context below.

That should in itself not be a problem. It should be possible to use global reset for ASIC and rely on initialization for FPGA.

So let's take a step back and consider the wider context in which determinism appears in nMigen. It may be helpful to mention that my background is memory-safe low-level languages, so when working on nMigen, I tend to think in similar categories. In general, I see the two primary goals of nMigen as:

  1. Improving reliability by ensuring safety properties hold at compile time, and
  2. Improving reliability by reducing opportunity for error through emphasis on ergonomics.

The second one should be self-evident. In case of the first one, which safety properties would these be? Well, nMigen is a low-level, RTL language--it does not even have a type system--so it does not provide extensive modeling tools that could uphold domain invariants. What it can do is provide tools for tackling nondeterminism.

I believe that discussions of nondeterminism in HDLs are hampered by grouping three categories of nondeterminism under the same nondescript label 'x. These categories are (in my personal classification; I am not aware of an established one specifically in HDLs):

  1. Invariant-preserving nondeterminism. This category primarily consists of sampling unknown but well-defined values. Its defining property is that all invariants provided by the HDL hold. In other words, provided the design makes no assumptions about the sampled value, then all invariants in the design are preserved. In that case, no recovery is required.
  2. Invariant-breaking nondeterminsm. This category primarily consists of setup/hold violations. Its defining property is that once this kind of nondeterminism starts propagating through the design, it may get the design into a state it may not normally reach according to the invariants provided by the HDL. The only way to recover from it in general is to reset the control domain where it occurred. Provided that the rest of the design makes no assumptions about the signals sampled from the affected domain, no additional recovery is required.
  3. Invariant-introducing nondeterminism. This category primarily consists of using don't-cares 'x in expressions (not case patterns) to guide synthesis. Its defining property is that it is deliberately introduced by the designer into the parts of the state space that are assumed to be unreachable. So long as these states are not dynamically reached, no actual nondeterminism occurs. However, if they are in fact reached, then this category behaves even worse than the category (2): the synthesis tool is permitted to propagate the introduced invariants across control domains and through the entire design. (Consider that if careless use of 'x causes the input to a synchronizer to be inferred as always 0, there is nothing that stops an enterprising logic optimization tool from eliminating the synchronizer and all of its downstream logic entirely.) It is not possible to recover* if this category of nondeterminism has occurred.

* With any current synthesis toolchain out of the box, as far as I know. In principle, with appropriate toolchain support or a preprocessor simulating it, this category may be reduced to (1); more on that later.

By completely** (at the moment) excluding the ability to use don't-cares in expressions, nMigen eliminates category (3). By completely** (at the moment) excluding the ability to use uninitialized registers and memories, nMigen eliminates category (1). By requiring (in the future; see #4) explicit synchronization in every case where a signal is sampled that is not known to be synchronous, and adding (as currently done) appropriate clock constraints, nMigen has / will partially eliminate category (2); it is, of course, still the responsibility of the designer to ensure that the input signals meet declared constraints. (An additional practical complication is permitting the use of synchronous inputs with complex timing relationships, such as those requiring phase shifts during sampling, without unnecessary synchronization stages.)

** Of course, instances may be used to reintroduce both of these categories.

This near-complete elimination of nondeterminism on language level is highly desirable. The reason I am hesitating regarding your suggestions is that they introduces much of it back--indeed, more than one might expect. Let's consider two of them by themselves.

First, power-on reset. On FPGAs, the internal power-on reset signal is, in general, asynchronous to any user clock. This means that every design meets category (2) right at the start. As a consequence, every in-tree FPGA platform performs power-on sequencing in the default startup code, either synchronizing the reset to the clock, or synchronously gating the clock with the reset. Without this, the safety properties would not hold. The implementation of the checker #4 should take this into account.

The reason I mentioned metastability is because it is not clear just what the contract of nMigen on ASIC should be. Should it consider the ASIC power-on reset an asynchronous event it has to take into account? Then the timings of that event in relation to clocks, etc, would need to be considered.

In any case this problem is comparatively minor, since, as you have mentioned, you can always add a global power-on reset signal and make the behavior identical to existing FPGA platforms. If it is desirable to avoid that, and the platform guarantees that every flip-flop ends up as stable 0 or 1 shortly after reset, then they may be modelled as uninitialized values, i.e. category (1).

Second, memory initialization. It is undeniable that all ASIC RAMs and some FPGA RAMs cannot be initialized. There is simply no way around this platform restriction, so these would have to be modelled as uninitialized values, i.e. category (1).


And here's where the problem lies. Verilog collapses all three categories of nondeterminism into just 'x; any uninitialized values in registers or memories, category (1), end up being represented as 'x. Unfortunately, once that happens, the synthesis tool is free to interpret such values, if they ever end up being actually used by your logic, as category (3). In other words, after Verilog conversion, the explicit semantics and design intent that signal "a defined but unspecified bit value" are free to be interpreted as "a value indicating an unreachable area of state space". I think this is a serious problem.

(As an aside, VHDL appears to support "uninitialized" and "strong 0-or-1" as separate values. I am not sure what their exact semantics is, though, but it looks kind of like it would solve this problem.)

This is not a new problem. Treating access to uninitialized data (and other illegal operations) as a marker of unreachable state space is something that C and C++ did for a very long time, often with disastrous results. The way undefined behavior would propagate without bounds has long been recognized as a source of serious issues in critical software, and recently LLVM has introduced the freeze instruction that makes it bounded.

In context of a HDL in general and Yosys in particular, an equivalent for the LLVM freeze instruction would be, perhaps, a combinatorial $freeze cell with the following contract: when the input is to 0 or 1, the output is to the same; when the input is x, the output is either 0 or 1, at the liberty of the toolchain. Effectively, this cell converts nondeterminism of category (3) into category (1).

In terms of implementation, I think this cell could be lowered to Verilog as a function that includes a casez statement.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 17, 2019

But the reset_less parameter is now already introducing non-determinism in the design. You don't know what value the signal has when the reset comes and it will retain that value after reset. Again I don't see why signals that can have this behavior can't have also an unknown value after POR.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 17, 2019

The reason I mentioned metastability is because it is not clear just what the contract of nMigen on ASIC should be. Should it consider the ASIC power-on reset an asynchronous event it has to take into account? Then the timings of that event in relation to clocks, etc, would need to be considered.

No, taking care of the timing would be a task of the platform but the nMigen contract should be setup in such a way that the platform (e.g. ASIC) can do it efficiently.

In any case this problem is comparatively minor, since, as you have mentioned, you can always add a global power-on reset signal and make the behavior identical to existing FPGA platforms. If it is desirable to avoid that, and the platform guarantees that every flip-flop ends up as stable 0 or 1 shortly In any case this problem is comparatively minor, since, as you have mentioned, you can always add a global power-on reset signal and make the behavior identical to existing FPGA platforms. If it is desirable to avoid that, and the platform guarantees that every flip-flop ends up as stable 0 or 1 shortly after reset, then they may be modelled as uninitialized values, i.e. category (1).after reset, then they may be modelled as uninitialized values, i.e. category (1).

The problem is not that minor because if you introduce logic on asynchronous signals like a reset you have to make sure it is glitch free. This also means this has to be custom and can't be done with regular synthesis flow and also that the synthesis flow does not try to do optimizations on the logic.
Defining the contract that the platform guarantees that an unitialized signal gets either 0 or 1 at start-up is a contract that ASICs can easily adhere to. It's intrinsic to flip-flops and SRAM.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 17, 2019

Should it consider the ASIC power-on reset an asynchronous event it has to take into account? Then the timings of that event in relation to clocks, etc, would need to be considered.

As said timing should be taken care of by platform, but for ASIC handling of initialization with an asynchronous reset can be a solution.
Current (FPGA) generated code:

  reg sync_sig = 1'h0;
  reg \sync_sig$next ;
  always @(posedge clk)
      sync_sig <= \sync_sig$next ;

ASIC friendly code:

  reg sync_sig;
  reg \sync_sig$next ;
  always @(posedge clk or negedge por_reset_n)
    if (por_reset_n = 0)
      sync_sig <= 1'h0;
    else
      sync_sig <= \sync_sig$next ;
    end

Active low logic for por_reset is handy as then the signal can be generated by extra delay on supply voltage during power-up with a RC circuit without needing an on-chip POR circuit. But this could also be done by custom yosys code for the ASIC platform.
This makes the solution independent of how reset_less is handled but I would still advocate to use following code for reset_less signals, e.g. also consider the POR reset as reset logic:

  reg sync_sig;
  reg \sync_sig$next ;
  always @(posedge clk)
      sync_sig <= \sync_sig$next ;

And also handle the signal as unitialized in the simulators.
Avoiding asynchronous reset flip-flops reduces area and removes needed routing.

@whitequark
Copy link
Contributor

But the reset_less parameter is now already introducing non-determinism in the design. You don't know what value the signal has when the reset comes and it will retain that value after reset.

Right now that is of course not the case because reset_less signals are initialized to their reset value at power-on reset. We just went over that.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 18, 2019

But the reset_less parameter is now already introducing non-determinism in the design. You don't know what value the signal has when the reset comes and it will retain that value after reset.

Right now that is of course not the case because reset_less signals are initialized to their reset value at power-on reset. We just went over that.

I'm not sure how I can make my point more clear. A reset_less signal is initialized in the beginning, true, but basically has a random value after a reset has been done on the clock domain that it is part of. The reset can come at any time so the signal can have any value when the reset comes. With reset_less True the value won't be changed by the clock domain reset so basically the value can be seen as a random value after the reset in that domain. So in order to test if the circuit performs right after clock domain reset in theory one has to do a clock domain reset simulation for each possible value of the reset_less signal; e.g. very similar difficulty as starting with a random value for the reset_less signal at the start.

I am looking for a real life use case for such a signal where you need the POR reset but not the clock domain reset other than not having to support unitialized values in the simulator. Such a use case would allow me to think about improving nMigen ASIC friendliness taking it into account.

Personally I would currently say it is even better to use an unitialized value for reset_less signals at the start. It will force the use of reset_less signals only where it is save to do. I don't think typically in depth checks are performed on clock domain resets, so current behavior may risk overlooking clock domain reset problems. And I understood that one of the goals of nMigen is trying to avoid that.
Given that unitialized signals support is likely needed anyway for supporting RAMs it should not be much extra work to use it also for reset_less signals.

@whitequark
Copy link
Contributor

The reset can come at any time so the signal can have any value when the reset comes. With reset_less True the value won't be changed by the clock domain reset so basically the value can be seen as a random value after the reset in that domain. So in order to test if the circuit performs right after clock domain reset in theory one has to do a clock domain reset simulation for each possible value of the reset_less signal; e.g. very similar difficulty as starting with a random value for the reset_less signal at the start.

OK. I understand your argument now. Thank you for the explanation. I would not call this nondeterminism (at least when the clock domain has a synchronous reset) because the circuit behaves in a completely predictable way, but I agree that it introduces similar design challenges.

And I understood that one of the goals of nMigen is trying to avoid that.
Given that unitialized signals support is likely needed anyway for supporting RAMs it should not be much extra work to use it also for reset_less signals.

I agree that we can change the semantics of reset_less signals to be uninitialized at power-on on all platforms. (But there is still the question of how to treat uninitialized signals robustly in Verilog.)

but for ASIC handling of initialization with an asynchronous reset can be a solution.

nMigen already supports asynchronous resets. Use ClockDomain(async_reset=True).

Active low logic for por_reset is handy

It would be trivial to add negedge resets, and this is planned specifically for ASICs, see #184.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 18, 2019

I agree that we can change the semantics of reset_less signals to be uninitialized at power-on on all platforms. (But there is still the question of how to treat uninitialized signals robustly in Verilog.)

Robust in what way ? Do you mean the difference in behavior of pysim handling of unitialized values and 'X'/'U' propogation in different simulators ? For example verilator is known to not follow all Verilog rules strictly.
If understood well the unitialized value behavior in pysim would be more strict than 'X'/'U' propagation in Verilog. So one can add documentation to the reset_less parameter to need to be checked with pysim simulations to check correct behavior and not only by external simulators.

but for ASIC handling of initialization with an asynchronous reset can be a solution.

nMigen already supports asynchronous resets. Use ClockDomain(async_reset=True).

OK, but I would like to use the asynchronous reset with a signal coming from a POR circuit for handling the POR behavior as given in the example code earlier. For clock domains the current default of synchronous reset is also good for ASICs.
But if we remove any difference in the circuit state after POR and clock domain reset on ASIC solely the clock domain reset can be used and an extra POR circuit is not needed. That was actually the purpose of original proposed changes in this thread and is AFAICS achieved by making reset_less signals unitialized at start. It is common for ASICs to need a reset before an ASIC will work correctly. ASIC platforms would then enforce the use of clock domains with a reset.
FPGA could still depend on POR and even have clock domains without a reset by default to avoid the need for routing a global reset through your design.

@whitequark
Copy link
Contributor

Robust in what way ? Do you mean the difference in behavior of pysim handling of unitialized values and 'X'/'U' propogation in different simulators ? For example verilator is known to not follow all Verilog rules strictly. If understood well the unitialized value behavior in pysim would be more strict than 'X'/'U' propagation in Verilog.

The behavior of X-propagation in simulation is fine. You are correct that it is different in pysim, iverilog and verilator, but I believe that difference is not a significant hazard. However, the behavior of X-propagation in synthesis can be very harmful.

Suppose you have a reset_less register that you do not assign anything to, or always assign another uninitialized value. (As a trivial example, suppose your code contains s = Signal(reset_less=True); m.d.sync += s.eq(s)). In this case, this signal will be always uninitialized and evaluate to 'x. If the synthesis tool can prove that this is indeed the case, it is permitted to assume that it takes any value that it wants, and this value does not have to be consistent if s is used multiple times. For example, it may optimize the circuit as if s and ~s are both 1, potentially violating some user invariant.

OK, but I would like to use the asynchronous reset with a signal coming from a POR circuit for handling the POR behavior as given in the example code earlier. For clock domains the current default of synchronous reset is also good for ASICs.

What I meant is that nMigen can (with the exception of polarity, which is easy to fix) generate your "ASIC friendly code" example above. It is up to you whether and where you want to use that capability.

@Fatsie
Copy link
Contributor Author

Fatsie commented Nov 18, 2019

For example, it may optimize the circuit as if s and ~s are both 1, potentially violating some user invariant.

Understood, thanks. Is it the same for 'U', e.g. unitialized ?

What I meant is that nMigen can (with the exception of polarity, which is easy to fix) generate your "ASIC friendly code" example above. It is up to you whether and where you want to use that capability.

From what I understood it is only for clock domains and not the case for the POR behavior. For POR behavior code currently depends on initial behavior of Verilog which is not ASIC compatible. But like said above if clock domain reset is guaranteed to have the same effect than POR, POR would not be needed for ASIC by enforcing a clock domain reset at start. It would then be OK to just ignore initial setup in Verilog as is done for ASIC synthesis.
So what I am trying to say is that part of the contract between nMigen and platforms should be that POR and clock domain reset put circuit in exact the same state and that either one them may be left out of the implementation by the platform.
And as you say using asynchronous reset by default is already supported if wanted by the platform and everything is already there in nMigen if the contract clause is in place. That is currently not the case because of how reset_less and the initialization of Memory is handled.
I hope this summarizes the possible solution well.

@whitequark
Copy link
Contributor

Is it the same for 'U', e.g. unitialized ?

Neither Yosys nor Verilog have U, only X. VHDL does have both X and U; I did not investigate this in depth but after asking some quick questions it looks like VHDL U works the same way as Verilog X, and VHDL X indicates a bus conflict and may not appear in synthesis.

@jordens
Copy link
Member

jordens commented Nov 18, 2019

Is the use case of undefined as a special value in simulation to prove that this value does not have an impact on some output? If yes, then aren't there better tools to prove that from formal verification?

@whitequark
Copy link
Contributor

then aren't there better tools to prove that from formal verification?

AFAIK, right now SymbiYosys does not model X. So nMigen would have to emit special code to handle this case, at least. Even if it did, the simulation behavior should match the property testing behavior, and both should match synthesis, so this is not an "either-or" or "one of" situation.

@jordens
Copy link
Member

jordens commented Nov 18, 2019

There may be no need for x or u at all if all you want is to prove that the actual value 0 or 1 does not matter (in an application specific sense of matter).
Ultimately x and u are artificial values that don't exist in logic or asics as such. They are annotations and you all you want to do is to show that these don't affect your logic, i.e. that some invariants hold with respect to them.

@whitequark
Copy link
Contributor

There may be no need for x or u at all if all you want is to prove that the actual value 0 or 1 does not matter (in an application specific sense of matter).

That's only if you already have a formal specification, and this specification is sound. If you don't have one, there is no automatic way to verify this property.

@jordens
Copy link
Member

jordens commented Nov 18, 2019

But the same is true for analysing a simulation with x/u. If you don't have a specification of where they must not propagate then the entire endeavor is meaningless.

@whitequark
Copy link
Contributor

If you don't have a specification of where they must not propagate then the entire endeavor is meaningless.

Sure. But you probably already have an informal specification, which you apply by looking at gtkwave for the output corresponding to your test vectors. That is a much weaker approach, but if you already consider simulation valuable in spite of it being much weaker than property testing, you're fine with that.

Also, running yield expr in a Python process where expr depends on an uninitialized value should probably raise.

@jordens
Copy link
Member

jordens commented Nov 18, 2019

True. But it's not obvious to me that covering that use case with some augmented formal verification tooling would be harder than adding support for u/x/ throughout the graph. It would be tooling tailored to proving the invariance of this value w.r.t. some input/reset value.

@whitequark
Copy link
Contributor

adding support for u/x/ throughout the graph

Adding support for conservative U-propagation is the same as adding a Python class that overrides the arithmetic/bitwise operations on integers to return itself. It is essentially trivial.

@programmerjake
Copy link
Contributor

If Uninitialized does get implemented, it would be a good idea to have Mux behave like m.If rather than producing Uninitialized if any input is Uninitialized.

@programmerjake
Copy link
Contributor

it may be a good idea to track Uninitialized at the bit-level by using a mask, since that allows packing signals together using Cat and then unpacking them later without known bits becoming uninitialized. If you like, I could work on a pull request to implement that for the simulator.

@whitequark
Copy link
Contributor

it may be a good idea to track Uninitialized at the bit-level by using a mask, since that allows packing signals together using Cat and then unpacking them later without known bits becoming uninitialized.

I'm not sure if that's a good idea. My proposal (coarse-grained Uninitialized) can be computed at very low cost and with almost no changes to the simulator. Your proposal has a significant runtime cost and development cost (how do you validate it? every operation gains a third row/column in the truth table, like in Verilog) and requires major changes to the simulator.

@programmerjake
Copy link
Contributor

it may be a good idea to track Uninitialized at the bit-level by using a mask, since that allows packing signals together using Cat and then unpacking them later without known bits becoming uninitialized.

I'm not sure if that's a good idea. My proposal (coarse-grained Uninitialized) can be computed at very low cost and with almost no changes to the simulator. Your proposal has a significant runtime cost and development cost (how do you validate it? every operation gains a third row/column in the truth table, like in Verilog) and requires major changes to the simulator.

If values where all bits are initialized are implemented using the existing mechanics of just using an int and values where not all bits are initialized are modeled using something like:

class PartiallyInitialized:
    """
    value: int
        the value. uninitialized bits should be 0.

    mask: int
        the mask for which bits are initialized in `value`. 0 bits mean that the
        corresponding bit in `value` is uninitialized. 1 bits mean that the
        corresponding bit in `value` is initialized. The sign is treated similarly,
        allowing values to have an uninitialized sign bit. mask is assumed
        to have some zero bits, otherwise an int could have been used instead
        of PartiallyUninitialized
    """
    def __init__(self, value, mask):
        self.value = value
        self.mask = mask

    @classmethod
    def make(cls, value, mask):
        """get PartiallyInitialized or int."""
        assert isinstance(value, int)
        assert isinstance(mask, int)
        value &= mask
        if mask == -1:
            return value
        return cls(value, mask)

    @staticmethod
    def get_value_and_mask(value):
        if isinstance(value, PartiallyInitialized):
            return self.value, self.mask
        assert isinstance(value, int)
        return value, -1

    @classmethod
    def uninitialized(cls):
        """totally uninitialized value"""
        return cls(0, 0)

    # all arithmetic operations return uninitialized
    def __add__(self, other):
        return self.uninitialized()

    def __radd__(self, other):
        return self.uninitialized()

    # similarly for all other arithmetic operations

    # all bitwise operations `&` the masks together and do the operation on the values
    def __invert__(self):
        return self.make(~self.value, self.mask)

    def __and__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        return self.make(self.value & other_value, self.mask & other_mask)

    def __rand__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        return self.make(self.value & other_value, self.mask & other_mask)

    def __or__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        return self.make(self.value | other_value, self.mask & other_mask)

    def __ror__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        return self.make(self.value | other_value, self.mask & other_mask)

    # similarly for xor

    # shifts return `uninitialized` for partially-initialized shift amounts,
    # otherwise just shift value and mask
    def __lshift__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        if other_mask != -1:
            return self.uninitialized()
        return self.make(self.value << other_value, self.mask << other_value)

    def __rlshift__(self, other):
        # self is partially uninitialized
        return self.uninitialized()

    def __rshift__(self, other):
        other_value, other_mask = self.get_value_and_mask(other)
        if other_mask != -1:
            return self.uninitialized()
        return self.make(self.value >> other_value, self.mask >> other_value)

    def __rrshift__(self, other):
        # self is partially uninitialized
        return self.uninitialized()

bitslicing, part, and Cat can be built on top of the bitwise and shift operations.

@whitequark
Copy link
Contributor

I see. In terms of performance impact, your proposal is similar to mine (they both rely on bimorphic call sites) with one important caveat: if you reconstruct a full-width mask from several partial-width masks, in this implementation it wouldn't become -1.

In terms of semantics, this is essentially the same as Verilog, right? I don't feel comfortable using it without randomized testing against Yosys, then. (Actually, it is bad enough that we don't test pysim against Yosys, but current pysim is way too slow. I'm working on pysim2 right now that should address this problem.)

@programmerjake
Copy link
Contributor

I see. In terms of performance impact, your proposal is similar to mine (they both rely on bimorphic call sites) with one important caveat: if you reconstruct a full-width mask from several partial-width masks, in this implementation it wouldn't become -1.

I don't quite follow, could you give a concrete example?

In terms of semantics, this is essentially the same as Verilog, right?

yeah, except that & and | don't short-circuit: if either bit is uninitialized, then the result is uninitialized. this makes it a bit less special-casey, since it either returns uninitialized if any input bits at all are uninitialized (arithmetic operations), or does the same thing but in a bitwise manner (bitwise operations). shifts treat the shift amount like an arithmetic op and the shifted value like a bitwise op.

I picked the above rules since uninitialized bits always propagate, they can't be hidden (by anding with 0, for example). Mux would have to be implemented using a python if, using (a & ~s) | (b & s) wouldn't work.

I don't feel comfortable using it without randomized testing against Yosys, then.

I'm more comfortable since uninitialized bits can't be hidden without Mux or actual control-flow, but I think testing against yosys is a good idea even if uninitialized doesn't get implemented at all.

(Actually, it is bad enough that we don't test pysim against Yosys, but current pysim is way too slow. I'm working on pysim2 right now that should address this problem.)

yay for pysim2!

@whitequark
Copy link
Contributor

I don't quite follow, could you give a concrete example?

Nevermind that, on second look I think I was just wrong.

I picked the above rules since uninitialized bits always propagate, they can't be hidden (by anding with 0, for example). Mux would have to be implemented using a python if, using (a & ~s) | (b & s) wouldn't work.

Is this a good thing? I'm worried that we'll end up with rules almost but not quite matching Verilog X-propagation, so people will come to expect netlist simulators like iverilog to match pysim, which will not be the case but also won't be obvious. I think this isn't obviously desirable, and needs further thought. I wonder if we should rather do X-propagation completely on our own somehow, and use casex to detect the case of reading uninitialized data from memory, or something like that.

@programmerjake
Copy link
Contributor

I picked the above rules since uninitialized bits always propagate, they can't be hidden (by anding with 0, for example). Mux would have to be implemented using a python if, using (a & ~s) | (b & s) wouldn't work.

Is this a good thing? I'm worried that we'll end up with rules almost but not quite matching Verilog X-propagation, so people will come to expect netlist simulators like iverilog to match pysim, which will not be the case but also won't be obvious. I think this isn't obviously desirable, and needs further thought. I wonder if we should rather do X-propagation completely on our own somehow, and use casex to detect the case of reading uninitialized data from memory, or something like that.

I'm not sure myself. There is on the one hand the much less complex rules that uninitialized input bits always output uninitialized bits (unless using Mux or control-flow), on the other hand compatibility with Verilog and all its gnarly details does have its own benefits.

If we implement uninitialized propagation ourselves (using casex or similar), we need to ensure it doesn't result in less optimal synthesis results, which may be difficult. We could also just disable all the uninitialized-handling code when synthesizing, which may work (assuming yosys or other tools don't take liberties).

@whitequark
Copy link
Contributor

We could also just disable all the uninitialized-handling code when synthesizing, which may work (assuming yosys or other tools don't take liberties).

They do take liberties, so this is definitely not an option. Please refer to my earlier comment on the 3rd category of nondeterminism for a deeper explanation why.

If we implement uninitialized propagation ourselves (using casex or similar), we need to ensure it doesn't result in less optimal synthesis results

I think less optimal synthesis results could be a consequence of "sealing away" uninitializedness at the point where it introduced, which is orthogonal to having it propagate. Indeed, even if nMigen would not have any built-in notion of uninitializedness whatsoever (like in @Fatsie's original proposal, which has the simulator randomize these values), then I would still insist that it use casex or a $freeze cell or the like to make sure that any uninitialized values read from registers or memories are considered opaque in synthesis, and do not propagate there.

Once we do that, any X-propagation rules we impose on top would be considered unused logic by synthesis tools, so they won't affect synthesis any further.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants