Reconsider simulator interface #228

Open · whitequark opened this issue Sep 23, 2019 · 42 comments

@whitequark (Contributor) commented Sep 23, 2019

The oMigen simulator had a strange interface where user processes would run after a clock edge, but before synchronous signals assume their new value. Not only is this very hard to use and bug-prone (my simulation testing workflow, for example, involves adding yield until tests pass...), but it also makes certain patterns impossible! For example, let's consider the FIFOInterface.write() simulation function:

def write(self, data):
    """Write method for simulation."""
    assert (yield self.w_rdy)
    yield self.w_data.eq(data)
    yield self.w_en.eq(1)
    yield  # first cycle: the write actually happens
    yield self.w_en.eq(0)
    yield  # second cycle: wasted, only to let w_rdy update

It has to take two clock cycles. This is because, after the first yield, it is not yet possible to read the new value of self.w_rdy; running (yield self.w_rdy) would return the value from the previous cycle. To make sure it's updated, it is necessary to deassert self.w_en, and waste another cycle doing nothing.

Because of this, I've removed these methods from nMigen FIFOInterface in a57b76f.

I think we should look at prior art for cosimulation (cocotb?) and adopt a usage pattern that is easier to use. This is complicated by the fact that it is desirable to run oMigen testbenches unmodified.

@sbourdeauducq (Member)

> The oMigen simulator had a strange interface where user processes would run after a clock edge, but before synchronous signals assume their new value.

The reason for this is to ensure determinism when there are multiple processes on the same clock communicating through signals. The result is independent of the order in which those processes are run.

@sbourdeauducq (Member)

Was this changed already? This prints 1 0:

from nmigen import *
from nmigen.back.pysim import *

class Dummy(Elaboratable):
    def __init__(self):
        self.x = Signal()
        self.y = Signal()

    def elaborate(self, platform):
        m = Module()
        m.d.sync += self.y.eq(self.x)
        return m

if __name__ == "__main__":
    dummy = Dummy()
    with Simulator(dummy) as sim:
        def process():
            yield dummy.x.eq(1)
            print((yield dummy.x))
            yield dummy.x.eq(0)
            print((yield dummy.x))
        sim.add_clock(1e-6)
        sim.add_sync_process(process)
        sim.run()

@whitequark (Contributor, Author)

You aren't using dummy.y at all in this code, no?

@sbourdeauducq (Member)

Yes. It is a dummy signal to work around #262.

@whitequark (Contributor, Author)

> Was this changed already?

I didn't intend to change this. I think the reason you see the behavior you do is that I did not understand the design of the oMigen simulator, so I didn't implement it entirely correctly.

> The reason for this is to ensure determinism when there are multiple processes on the same clock communicating through signals. The result is independent of the order in which those processes are run.

OK, so this works like VHDL. That's good. Unfortunately, the problem I outlined in this issue stands: not being able to write a single-cycle FIFOInterface.read() is a serious defect.

I've been thinking about this problem and came to the conclusion that there should probably be three kinds of simulator processes (a registration sketch follows the list):

  • add_sync_process, working like in oMigen, where the Python code works like synchronous logic;
  • add_comb_process, where the Python code works like combinatorial logic (#11: "Make it possible for generators to simulate combinatorial logic");
  • add_toplevel_process (or something like that), only one of which can be present in a design, and where you could implement single-cycle FIFOInterface.read.
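
A hypothetical registration sketch of these three kinds; only add_sync_process existed at this point, and the sensitive= keyword is an assumption:

sim.add_sync_process(driver)                     # Python code run like synchronous logic
sim.add_comb_process(mirror, sensitive=[dut.x])  # Python code run like combinatorial logic
sim.add_toplevel_process(testbench)              # single testbench observing settled values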

@sbourdeauducq (Member) commented Oct 28, 2019

What about an option to run synchronous processes at twice the clock rate?

For example, a synchronous process could yield a full cycle (default), or a high half-cycle, or a low half-cycle.

At the beginning of a cycle, it could yield either a full cycle (default) or a high half-cycle. At the middle of a cycle it could yield only a low half-cycle. Doing otherwise raises an error, which ensures that processes are always "in phase" with the clock.

This is backward compatible, can be deterministic using the VHDL-style behavior, and supports an arbitrary number of processes.

@whitequark (Contributor, Author)

The basic idea is sound, thanks. But, what happens if someone uses a negedge clock domain (#185) in the design? There are many valid reasons to do so, and even some FPGA primitives (Xilinx UltraRAM) trigger on both clock edges.

So I think we should change it a bit: rather than use a half cycle, run all "sync" processes at the posedge (or the negedge) as it currently happens, and then run all "testbench" (or whatever they are called) processes simultaneously, a very small amount of time later (however much it takes for combinatorial logic to propagate, and for "comb" processes to run as well).

@jordens (Member) commented Oct 28, 2019

I would want to differentiate that bare yield (i.e. yield None). Semantically, yield currently means continuing the simulation until just before the next clock edge: "update sync signals and settle comb signals again". At that point the values of the sync signals for the next cycle are known (and can be altered from the process), and the sync signals are about to update.

What you actually want instead is to be able to differentiate between yielding to update the sync signals and yielding to settle the comb signals.
I don't think mapping this to two half cycles is correct since that will most likely clash with support for multiple clock domains: the settling of comb signals does not take the length of a half cycle. That's arbitrary. In reality you have the sync clock cycle and -- semantically different from it -- a delta cycle that just settles comb.

@whitequark (Contributor, Author)

In terms of the UI, currently add_process requires submitting explicit wait commands (i.e. Tick("domain")), and add_sync_process(domain=) automatically substitutes yield Tick(domain) for any yield with no arguments. We could split that into yield BeforeTick(domain) (the same as the current Tick) and yield AfterTick(domain). In both cases you can only observe "current" values for all signals and update "next" values for signals, preserving determinism; but all built-in synchronous logic, and any "sync" processes, wait for the BeforeTick event if you do yield with no argument, whereas simulation testbenches wait for the AfterTick event in that case.

This solves an important issue where something like FIFOInterface.read cannot just yield because the meaning of yield changes depending on the kind of process it's in. But if it can explicitly do yield AfterTick() (with the domain perhaps substituted by some sort of default), then it would be possible to use it unambiguously.
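
For illustration, a single-cycle FIFOInterface.read could then look like this. A sketch only: AfterTick is still a proposal here, and the default-domain substitution is assumed; the r_rdy/r_en/r_data names follow FIFOInterface:

def read(self):
    """Read method for simulation (sketch)."""
    assert (yield self.r_rdy)
    yield self.r_en.eq(1)
    yield AfterTick()   # sync logic has updated; settled values are observable
    data = (yield self.r_data)
    yield self.r_en.eq(0)
    return data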

@jordens (Member) commented Oct 28, 2019

And this extends to multiple clock domains accordingly (sync cycles executed in the pattern defined by their timing beat and comb/delta cycles in between). A process should be able to specify what kind of cycle it wants to yield for: not just a bare yield but maybe yield None for comb settling vs yield dut.sync or yield "sync" for the next corresponding clock edge update.

@jordens (Member) commented Oct 28, 2019

I don't think BeforeTick/AfterTick is sufficient: there are good reasons to do multiple delta cycles between sync cycles.

@whitequark (Contributor, Author)

Can you provide some examples of those reasons?

@jordens (Member) commented Oct 28, 2019

Otherwise yield x.eq(1) could only ever be a sync update. If your process wants to do the equivalent of comb logic, this is limiting. There would not be a way for a process to react to non-trivial comb signal changes from the simulated gateware within the same cycle.

@jordens (Member) commented Oct 28, 2019

E.g. if you don't have any synchronous clock domain.
In the simplest case if you want to test whether your module reacts to a signal immediately:

yield stb.eq(1)
yield  # settle comb
assert (yield busy)
yield stb.eq(0)
yield  # settle comb
assert not (yield busy)

@whitequark (Contributor, Author)

> If your process wants to do the equivalent of comb logic, this is limiting.

Comb processes are something I wanted to introduce explicitly (that's the second oldest nMigen issue, #11). In my mind, they would require specifying the exact set of signals they will wait on during creation, and then always wait on exactly those signals; that way, they would have no choice but to be well behaved.

However, that would not work for the snippet you show. And I think you are making a very good point with it. I'm inclined to agree that a separate command for comb settling is desirable.

@whitequark (Contributor, Author) commented Oct 28, 2019

Perhaps a trio of yield Tick(domain) (where if the domain is omitted, it is taken from the process), waiting until the next clock edge; yield Settle(), waiting until the comb fixpoint is reached; and yield Wait(signals...), waiting until any signal in the sensitivity list changes?
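
With that trio, the earlier snippet from jordens could be written with explicit commands. All three command names are still proposals at this point; stb and busy are from that snippet, done is hypothetical:

yield stb.eq(1)
yield Settle()          # run comb logic to its fixpoint; no simulated time passes
assert (yield busy)
yield Tick()            # advance to the next clock edge of the process's domain
yield Wait(busy, done)  # sleep until any signal in the sensitivity list changes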

@jordens (Member) commented Oct 28, 2019

Yes. yield Tick(domain) will imply a yield Settle() just before the respective domain update. And now we can debate if yield Tick(domain) should be "settle, update sync, and settle again". Then it would be robustly equivalent to the current behavior. I would not want yield Tick(domain) to be equivalent to just "settle, update" since that in practice is always an unstable and inconsistent transient state.

@jordens (Member) commented Oct 28, 2019

The yield Wait(signals) (or Watch() or Trigger()) seems a bit orthogonal since it passively waits for some signal-related event while the others wait for something time-related (even though in the case of the delta cycle this is a very short time).

@jordens (Member) commented Oct 28, 2019

Ah. But Wait() is indeed needed to implement coexistence of combinatorial processes and synchronous clock domains and sync processes. You can't implement it as while not (yield trigger): yield Settle(). That would block.

@jordens (Member) commented Oct 28, 2019

Do you even need to differentiate add_sync_process and add_comb_process with the above?

@whitequark (Contributor, Author)

> Do you even need to differentiate add_sync_process and add_comb_process with the above?

No, but for that matter, add_sync_process was never strictly a necessity either. Today it is simple syntactic sugar over Tick. (However, without this sugar, you'd have to give a domain name to FIFOInterface.read, which seems suboptimal.)

One option would be to get rid of yield with no arguments completely, and then add a notion of "default domain" to all processes. So doing yield Tick() would use that default domain. It is a bit verbose, but I like its lack of ambiguity.
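
A sketch of what that could look like; the default_domain keyword is purely hypothetical:

def process():
    yield Tick()        # uses the process's default domain, "sys" here
    yield Tick("pix")   # explicitly waits on another domain

sim.add_process(process, default_domain="sys")  # hypothetical keyword argument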

@whitequark (Contributor, Author)

> And now we can debate if yield Tick(domain) should be "settle, update sync, and settle again". Then it would be robustly equivalent to the current behavior. I would not want yield Tick(domain) to be equivalent to just "settle, update" since that in practice is always an unstable and inconsistent transient state.

I'm confused by this paragraph. The current behavior is something like "settle, stop just before updating sync, run all processes", not "settle, update, settle". Or I misunderstand you.

@jordens (Member) commented Oct 28, 2019

Yes. yield Tick() is better. If there is a default sync clock domain for elaboration then there should be something analogous for processes.

@jordens (Member) commented Oct 28, 2019

The current behavior of yield is "update sync, settle comb". You need to have "update sync" in there.

@jordens (Member) commented Oct 28, 2019

And "stop just before updating sync" is in fact a NOP because after settling comb nothing would happen until the next sync update.

@whitequark (Contributor, Author)

You're right. We want the same thing but I was not describing it precisely.

Thank you for this discussion. I now have a very clear picture of what the next iteration of the simulator should look like, and I am happy with every aspect of it.

@sbourdeauducq (Member)

> what happens if someone uses a negedge clock domain (#185) in the design?

A negedge clock domain is just like another clock domain but with the clock shifted 180 degrees. If simulation supports multiple clock domains (it should) then there is nothing really special about it, I think?
Maybe just rename "high/low half-cycles" to "first/second half-cycles" or similar to avoid confusion.

@whitequark (Contributor, Author)

> If simulation supports multiple clock domains (it should) then there is nothing really special about it, I think?

There is if there are two clock domains defined to be clocked by different edges of the same signal. Then you have the same problem in a single process that interacts with both.

@sbourdeauducq (Member)

> Then you have the same problem in a single process that interacts with both.

Not if such processes are not allowed (and I think they should not be allowed: a synchronous process must be associated with a single clock signal and a single sensitivity edge).

@whitequark (Contributor, Author)

> and I think they should not be allowed: a synchronous process must be associated with a single clock signal and a single sensitivity edge

Among other things this restriction would make it impossible to simulate ResetSynchronizer, which is something that nMigen does today. So it would be a step back.

@whitequark (Contributor, Author) commented Nov 20, 2019

@jordens Bikeshed: what should be the names for the new commands?

I propose (a combined usage sketch follows the list):

  • Settle(): waits until the combinatorial fixpoint;
  • WaitTime(interval): waits until interval passes;
  • WaitComb(signal, [signal...]): waits for any change in any of the signals;
  • WaitSync("domain_name" | clock_domain): waits for a clock tick (or async reset assertion, if any)
    in the provided domain, executing concurrently with any m.d.<domain_name> statements;
  • Active() and Passive(): self-explanatory.
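
A combined sketch of a passive monitor built from these proposed commands; every command name is from the proposal above and did not exist yet, and dut.error is a hypothetical signal:

def monitor():
    yield Passive()              # don't block simulation termination on this process
    while True:
        yield WaitSync("sync")   # wake on each clock tick of the "sync" domain
        yield Settle()           # let combinatorial logic reach its fixpoint
        if (yield dut.error):    # observe the settled value
            print("error asserted")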

@whitequark (Contributor, Author)

As an additional nicety, I propose an ability to attach generator functions to fragments, so that any given fragment could be implemented either using nMigen code or Python code completely transparently to its users.
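
A rough sketch of the shape that could take; the simulate() hook name is invented here, and WaitComb is the command proposed above:

class PyInverter(Elaboratable):
    def __init__(self):
        self.i = Signal()
        self.o = Signal()

    def simulate(self):             # hypothetical hook picked up by the simulator
        while True:
            yield WaitComb(self.i)
            yield self.o.eq((yield self.i) ^ 1)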

@jordens (Member) commented Nov 22, 2019

You need to define whether the Wait* commands include a final Settle() (which does not consume any simulation time) or not. IMO they should.
I'd stay away from Time meaning a duration/interval and call it WaitDuration or WaitInterval.
Active() and Passive() are not that self-explanatory to me. I end up (wrongly) asking myself whether a Passive() process is allowed to actively change signals. Is the only difference that the termination of a Passive() process is not a precondition for termination of the simulation run?

One thing that I'm annoyed by: In many processes, I'd like to return the data that I have gathered during the run. But with the current setup, I have to pass in a handle or use a global and attach all the data to that. With the Python async/await structure it seems to me that one could solve this in a better way. Make the simulator behave like the asyncio event loop, and have:

async def process(dut):
    await Settle()
    await dut.x.eq(1)
    log = []
    for i in range(100):
        await WaitSync("sys")
        log.append(await dut.y)
    return log

Maybe there are other niceties that would come with this for free (e.g. hooking up vpi or yosys or other external components).

@jordens
Copy link
Member

jordens commented Nov 22, 2019

await settle(), await sleep(123e-9), await sync("sys"), await trigger(dut.x) might even be more natural sounding than before. But I'm not that radical with the name bikeshedding. I can live with any of those.

And maybe we want to get rid of .eq() in processes since it's definitely not the same as .eq() in elaboration. E.g. a combinatorial yield dut.x.eq(dut.y) has a well defined meaning in elaboration, while in a simulation process, it might not even be allowed. If it is allowed, then it would be in the form and meaning of i = yield dut.y; yield dut.x.eq(i), which is very different from the combinatorial assignment. Maybe we want to have accurate getter/setter methods for simulation that are distinct from logic wiring methods (.eq()): either await dut.x.set(await dut.y.get()) or even non-async dut.x.set(dut.y.get()) if that's possible.
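
Spelling out the contrast drawn here (the get/set methods are hypothetical):

# In elaborate(): builds permanent combinatorial logic.
m.d.comb += dut.x.eq(dut.y)

# In a simulation process: a one-time copy of the current value,
# equivalent to i = yield dut.y; yield dut.x.eq(i).
dut.x.set(dut.y.get())   # hypothetical getter/setter form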

@whitequark (Contributor, Author) commented Nov 22, 2019

> You need to define whether the Wait* commands include a final Settle() (which does not consume any simulation time) or not. IMO they should.

I think they should not.

For sync logic: without the final Settle(), any sync logic can be straightforwardly implemented by taking HDL, removing all m.d assignments except for one specific m.d.<domain>, replacing any nMigen AST nodes with the corresponding Python code, and placing the entire thing in a loop like while True: yield WaitSync(); <code>. (In fact that's almost exactly how the new simulator works internally.) With the final Settle(), we would need a separate WaitSyncNoSettle command to implement such logic.
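
For example, HDL like m.d.sync += counter.eq(counter + 1) would translate mechanically into the following, with WaitSync as proposed above and without a final Settle():

def counter_process():
    while True:
        yield WaitSync("sync")
        value = (yield counter)
        yield counter.eq(value + 1)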

For comb logic: it doesn't make sense to have WaitComb() include a final Settle() because Python code that implements combinatorial logic is a part of the settling process. If your combinatorial task is well-behaved then it would make no difference except for requiring twice as much work to be done, since it would arrive at the same value but on the next step (still after zero physical time) rather than the next delta cycle.

Assuming that having full feature parity between simulator tasks implemented in HDL and ones implemented in Python is important (and I think it is quite important, as it allows you to gradually migrate code from Python to HDL, or replace slow numeric HDL with fast numpy-based code), we should have a non-settling version of these commands. We could also have a settling version of WaitSync that would be useful for writing testbenches, or we could make it legal to do yield [WaitSync(), Settle()], which is basically as good as having a WaitSyncAndSettle() command.

> I'd stay away from Time meaning a duration/interval and call it WaitDuration or WaitInterval.

WaitInterval is fine with me.

> With the Python async/await structure it seems to me that one could solve this in a better way.

Integration with async/await is on my radar, mostly because it makes it possible to write integration tests where host Python code talks to device nMigen code through some PC interface (USB, Ethernet, ...) by cutting out the interface logic and directly driving simulated FIFOs. Glasgow applets are tested that way, for example.

However, I'm not yet convinced this is a good approach in general, because I don't know how to integrate nicely with the asyncio event loop. Simulator tasks that call async functions are easy (the simulator only needs to block if it receives one, or have an async run method, or both), but the other way around seems very gnarly.

For your use case, we could return a handle from add_process, and handle.result would contain the return value of the generator function. It makes sense to have such handles anyway in case one wants to reuse the same simulator across many testbenches, since elaborating a large design can be slow and there's no reason to do it repeatedly if only the Python task code changes.
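
That might look as follows (the handle object and its result attribute are hypothetical):

handle = sim.add_process(process)   # hypothetical return value
sim.run()
log = handle.result                 # return value of the finished generator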

> Maybe there are other niceties that would come with this for free (e.g. hooking up vpi or yosys or other external components).

Not as far as I know. I have a Yosys-based C++ simulation generator planned as well, but, like VPI, it would export a C API. If you wanted to use it with asyncio you would have to wrap it first, which is actually more work.

> And maybe we want to get rid of .eq() in processes since it's definitely not the same as .eq() in elaboration.

It is, if you are writing a task using WaitSync or WaitComb and it is well-behaved. (That is, you use only one of those wait commands, and in the WaitComb case, you compute the same function each time you get woken up.) That is the whole point of having parity between HDL and Python tasks.

> E.g. a combinatorial yield dut.x.eq(dut.y) has a well defined meaning in elaboration, while in a simulation process, it might not even be allowed.

You can use that command right now (in the old simulator). In the new simulator, which can support WaitComb, a task like:

while True:
    yield WaitComb(dut.y)
    yield dut.x.eq(dut.y)

does the exact same thing as:

m = Module()
m.d.comb += dut.x.eq(dut.y)

@whitequark (Contributor, Author): this comment has been minimized.

@whitequark (Contributor, Author)

Nevermind the previous comment. I think that's an unrelated bug.

@jordens (Member) commented Nov 22, 2019

Let me elaborate on the problem I have with .eq() in simulation. In the following it is not clear in simulation context whether the dummy.y.eq(dummy.x) assignment is comb or sync. But let's assume this is not important. It's also not clear whether a) is a one-time sync assignment or whether it happens at each subsequent clock cycle in the same way m.d.sync += self.y.eq(self.x) would happen.

I think the reason might lie in the fact that commands of different types are passed through the same generator interface, and that the .eq() method has a different meaning in elaborate() than it has in simulation:

  • commands to interact with the time line (yield None, yield WaitComb());
  • commands to interact with instantaneous signal values (yield dut.x, yield dut.x.eq(2)).

The latter is different from the elaborate() context, where dut.y.eq(dut.x) builds logic.

With await and explicit one-time getters and setters this would be clear:

  • await to interact with the time line (as the name says)
  • and dut.x.get()/dut.x.set(dut.y.get()) to interact with the values of signals;
  • .eq() is forbidden in simulation. There is no support for building logic during simulation. I think this should be clarified.

from nmigen import *
from nmigen.back.pysim import *

class Dummy(Elaboratable):
    def __init__(self):
        self.x = Signal()
        self.y = Signal()

    def elaborate(self, platform):
        m = Module()
        z = Signal()
        with m.If(~z):                # was: m.d.sync += If(~z, self.y.eq(1)), z.eq(1)
            m.d.sync += self.y.eq(1)
        m.d.sync += z.eq(1)
        return m

dummy = Dummy()
with Simulator(dummy) as sim:
    def process():
        yield
        # a)
        yield dummy.y.eq(dummy.x)
        yield
        # b)
        i = yield dummy.x
        yield dummy.y.eq(i)
    sim.add_clock(1e-6)
    sim.add_sync_process(process)
    sim.run()

@whitequark (Contributor, Author)

> In the following it is not clear in simulation context whether the dummy.y.eq(dummy.x) assignment is comb or sync.

Do you mean it's not clear in general, or not clear in comparison to b)? I'm not sure what the case b) is demonstrating. It is exactly equivalent to a) in your code.

> and dut.x.get()/dut.x.set(dut.y.get()) to interact with the values of signals

This isn't possible to implement because simulation does not (and should not) mutate the design itself, just like synthesis doesn't, and similarly, there shouldn't be a 1:1 correspondence between designs and simulator instances. Some alternatives are await dut.x.get() or sim.get(dut.x).
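
That is, the simulator rather than the design would own the simulated state; with the hypothetical sim.get/sim.set alternative:

sim.set(dut.x, 1)            # hypothetical: state lives in the simulator
assert sim.get(dut.x) == 1   # dut itself is never mutated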

> await to interact with the time line (as the name says)

I don't think await should be used unless it's somehow possible to integrate with the asyncio event loop. After all, async/await are features specifically introduced for asynchronous I/O that we hijack, and it would be quite confusing to override them in a way that completely changes their meaning.

> The latter is different from the elaborate() context, where dut.y.eq(dut.x) builds logic.
> .eq() is forbidden in simulation. There is no support for building logic during simulation. I think this should be clarified.

I think your view of the simulator is unnecessarily limiting. I see nMigen statements as describing behavior. The statement x.eq(y), by itself, describes instantaneous behavior. By placing it within the context of m.d.sync += (or add_sync_process), or m.d.comb += (or add_comb_process), this behavior is placed on the timeline: either on each active clock edge, or each time any signals on the RHS change. The behavior added to a module can change depending on the control flow statements like m.If, and the behavior added as a process can change depending on the control flow statements like if.

I think the ability to use nMigen values in commands, as opposed to just signals, is valuable. It allows performing computations like nMigen does, which can otherwise be a major hassle. Bit-slicing, wrapping arithmetic, signed arithmetic, match with wildcards, ... all would otherwise require the user to reimplement tricky and already well-tested parts of nMigen on their own, introducing subtle bugs.
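
For example (dut.data and dut.counter stand in for signals of a simulated design):

low_byte = yield dut.data[:8]          # bit-slicing with nMigen's own semantics
yield dut.counter.eq(dut.counter + 1)  # wrap-around at the signal's width, as in HDL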

I see absolutely no reason to prohibit adding (or removing) logic during simulation. There is no technical reason to do so and in fact it would be more work to prevent that. Conversely, it can be useful when simulating a design that uses partial reconfiguration, or perhaps a simulation of an entire platform with peripherals that are hot-swapped.

To summarize, I now see nMigen as a pair of languages that are duals and appear on even terms: a synchronous hardware [description] language and a synchronous [reactive] software language. I am certain that the ergonomics of this arrangement can be improved, but I do not understand why you want to strip it down to something much less elegant and useful.

@jordens (Member) commented Nov 22, 2019

> Do you mean it's not clear in general, or not clear in comparison to b)? I'm not sure what the case b) is demonstrating. It is exactly equivalent to a) in your code.

Pretty sure it's not. They are executed in different clock cycles.
But I may be happy to concede the point that .eq() may be meaningfully defined by simulation context as a synchronous assignment for the given clock cycle.

The next question is: assuming that elaboration defines whether a signal is a register or a combinatorial value, can simulation override that behavior?
And does yield dut.x.eq(a); b = yield dut.x; assert a == b hold for a combinatorial dut.x without yield Settle() in between? If yes, for how long does it hold (given that the combinatorial logic also assigns something)?

You say both

> simulation does not (and should not) mutate the design itself

and

> I see absolutely no reason to prohibit adding (or removing) logic during simulation.

In my understanding of the terms this is strictly contradictory.

AFAICT currently altering the logic in a design is actually prevented by the fact that you generally don't have access to the elaborated Module() during simulation. Is that by accident or intentional? Migen has finalization after which you couldn't/shouldn't add/alter logic. How final is elaboration?

I don't see the problem with await on simulator futures. You say Python asyncio targets external I/O. That's certainly true. But even the most mundane await sleep(1.) doesn't concern itself with I/O, and many other asyncio applications/examples don't either. If you think of it as an event loop and a bunch of futures, then I fail to see why you are so worried about confusion. There is certainly no change in meaning.

Sure, await asyncio.sleep(1.) and await simulator.interval(1.) are different. But they coexist nicely: await asyncio.sleep(1.) doesn't tell you how much simulation time (or how many clock cycles) has gone by, and await simulator.interval(1.) doesn't tell you how much that took on the wall clock, in the same way as yield simulator.trigger(dut.x) doesn't tell you anything about the number of cycles that have passed on some clock domain. From the event loop's perspective it's simple: things that can proceed, proceed. Things that wait for external I/O (or loop.sleep()) can't proceed. The same would hold for the simulator: the simulator will proceed through its simulation timeline, and all its futures will be served accordingly, until there is a process that stalls the simulation timeline to wait for some external event (i.e. with simulator.stall(): await loop.sleep(1.) or similar).

Do you have an example where this would be problematic?

@whitequark (Contributor, Author) commented Nov 23, 2019

> Pretty sure it's not. They are executed in different clock cycles.

Sorry, I was unclear. I mean that this code:

    def process():
        yield 
        # a)
        yield dummy.y.eq(dummy.x)
        yield
        # b)
        i = yield dummy.x
        yield dummy.y.eq(i)

is exactly equivalent to this code:

    def process():
        yield 
        # b)
        i = yield dummy.x
        yield dummy.y.eq(i)
        yield
        # a)
        yield dummy.y.eq(dummy.x)

in terms of simulation behavior. (Of course the variable i is assigned at a different time.)

> The next question is: assuming that elaboration defines whether a signal is a register or a combinatorial value, can simulation override that behavior?

I would say no: if you did that, then the simulation would end up in an inconsistent state, and there's no case where this would be useful or deterministic. So if you add a combinatorial function to the simulation, either with m.d.comb += or with add_comb_process, then its outputs should be protected from interference.

> And does yield dut.x.eq(a); b = yield dut.x; assert a == b hold for a combinatorial dut.x without yield Settle() in between?

That assert would always fail without yield Settle().

> In my understanding of the terms this is strictly contradictory.

Let me state this more clearly. "The design" is a collection of user objects that inherit from Elaboratable. After they are created, these user objects (normally) contain only nMigen Signals; they do not immediately create any Modules or Statements. So in typical user code that looks something like:

dut = MinervaSoC(...)
sim = Simulator(dut)

the dut object is an Elaboratable that only contains Signals. If you were able to write dut.x.set(), then there somehow should be a link between dut and sim that is created when you evaluate Simulator(dut), but that would violate the principle that elaborating the design must be idempotent.
(In most cases, calling elaborate() should not change the objects that are a part of the design at all, but it is fine to populate a cache or something similar.)

Moreover, the following code is not only allowed but desirable:

dut = MinervaSoC(...)
sim1 = Simulator(dut)
sim2 = Simulator(dut)

because it allows e.g. parallelizing tests. (One can also imagine copying a simulator so as not to perform elaboration twice.)

> AFAICT currently altering the logic in a design is actually prevented by the fact that you generally don't have access to the elaborated Module() during simulation. Is that by accident or intentional? Migen has finalization after which you couldn't/shouldn't add/alter logic. How final is elaboration?

Intentional. Migen finalization had a severe issue where (a) finalization order is not controllable or reliable and (b) the correctness of your design sometimes depends on it. E.g. m-labs/migen#158 / m-labs/microscope#1. The reason this issue happens is this code:

https://github.com/m-labs/migen/blob/7882e71660469c8966b2d7ca987531ab32dd13af/migen/fhdl/module.py#L156-L159

Consider what happens if you want to add a microscope probe that samples the state of some FSM. FSMs use finalization, and the state signal does not exist until that happens, so you have to add another finalization method to insert the probe. But, microscope also uses finalization to insert probe logic. There is an implicit dependency from the microscope core module to every microscope probe, even those which may not have been created yet.

I think the only way to solve this problem while keeping finalization itself would be to iterate the design to fixpoint (i.e. until finalization does not create any further logic). In fact, Migen does one iteration of that (collecting the submodules that finalization added), but stops afterwards. However, it would be difficult to write code for e.g. microscope that can robustly handle this; you might find that you need to resize a FIFO for example, which isn't really possible right now.

nMigen avoids the problem entirely by replacing finalization (which repeatedly mutates design logic in the sequence chosen by Migen, and can be performed at most once) with elaboration (which creates design logic from scratch in the sequence chosen by the user, and can be performed however many times).

In other words, the finalize method allows late binding, but it also invokes side effects to do anything useful, which is why it is order-dependent. Conversely, the elaborate method also allows late binding, but by the time it runs, all side effects should have already been performed, and so the order in which it runs doesn't matter.

Is it a panacea? No. There is nothing that inherently stops you from changing the design in the elaborate method, and if you do that, then you'll get the same kinds of bugs. The reason I chose this design is that I believe it encourages writing correct code through demonstration of intent. If you are writing a finalize method, then mutating the design (or some global registry) is what you're expected to do, and it is not clear that this will lead to problems. If you are writing an elaborate method, then mutating the design (or some global registry) is very unusual, and in case of microscope, a more natural way to add probes would be to insert them into the returned fragment tree, which there are a few ways to do.

(The other reason I chose it is because Migen had the convention where ### separated the code that defined an interface in the constructor from the first half of the code that defined the implementation in the constructor, the second half being the code in the finalize method, and it seemed natural to codify and streamline this convention.)

> In my understanding of the terms this is strictly contradictory.

Note that the above does not in any way impact adding or removing logic during simulation. Consider the following code, where sub1 and sub2 are some elaboratables that talk to each other through some Signals they share:

cd_sync = ClockDomain()
m = Module()
m.clock_domains += cd_sync
m.submodules += sub1, sub2
sim = Simulator(m)

With a 2-line change to the new simulator to add a public add_hdl_process API, we could rewrite this as:

cd_sync = ClockDomain()
m1 = Module()
m1.clock_domains += cd_sync
m1.submodules += sub1
m2 = Module()
m2.clock_domains += cd_sync
m2.submodules += sub2
sim = Simulator()
sim.add_hdl_process(m1)
sim.add_hdl_process(m2)

The behavior would be exactly equivalent. In fact, this is almost identical to what Simulator(m) in the first snippet does internally, with the exception that the first snippet elaborates once and the second snippet twice. (After all transformations, e.g. domain lowering, and assuming there was no hierarchy flattening in m, the fragments corresponding to sub1 and sub2 would be identical in both cases.)

Why is it this easy? Well, because the new simulator is designed to make it possible to use Python code as a drop-in replacement for e.g. the HDL in sub2. So from its perspective, there is no difference between sim.add_hdl_process(m2) and a series of sim.add_comb_process(...) + sim.add_sync_process(...) that drive the same signals in the same domains.

> Do you have an example where this would be problematic?

I think there is no conceptual problem with your proposal. It is an implementation issue.

import asyncio

async def foo():
    await simulator.interval(1.)

asyncio.run(foo())

I don't know a way to make the above work nicely without adding a massive amount of overhead by creating lots of futures internally in the simulator. I know how to do it in a way where the simulator becomes the event loop (although that still has considerable overhead beyond what we can do with generators), but then you couldn't use asyncio futures together with the simulator.

Given that a lot of people complain that the old nMigen simulator is very slow, and that I put in a lot of effort making the new nMigen simulator faster (it is currently 4× faster than the old nMigen simulator, and should be 2× faster than the Migen simulator while being significantly more flexible), I have a lot of reservations about adding overhead there.

@whitequark (Contributor, Author)

I think I figured out semantics that would result in a robust simulator. I'm going to write a detailed description in a separate document.
