-
Notifications
You must be signed in to change notification settings - Fork 176
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
back.pysim performance improvements #348
Conversation
Curr/next comparisons implement the two-phase simulation algorithm that is deterministic in presence of asynchronous code. I looked at your fix and it will lead to incorrect simulation results. I can explain the algorithm if necessary.
I looked at your changes and while the intent seems reasonable, the patch seems fairly invasive at first glance. I'll take another look later.
Extremely good change, one I was going to make for some time but never got around to.
Also looks all reasonable. To summarize, 642e477, d1fe3c6 and 6540568 are mergeable as-is, 6902c99 needs more discussion, and 52e30bc is unsound. |
Codecov Report
@@ Coverage Diff @@
## master #348 +/- ##
==========================================
+ Coverage 82.69% 83.17% +0.47%
==========================================
Files 35 35
Lines 5919 5991 +72
Branches 1201 1179 -22
==========================================
+ Hits 4895 4983 +88
+ Misses 863 859 -4
+ Partials 161 149 -12
Continue to review full report at Codecov.
|
nmigen/hdl/ast.py
Outdated
@@ -124,6 +124,14 @@ def cast(obj): | |||
return Const(obj.value, Shape.cast(type(obj))) | |||
raise TypeError("Object {!r} cannot be converted to an nMigen value".format(obj)) | |||
|
|||
@staticmethod | |||
def castable(obj): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A patch dedicated to optimization should not introduce a new public API.
Thanks!
I'm not sure I understand, but it's easy enough to revert. Would you prefer I modify the history to keep it linear or layer revert commits etc. on top of what's already here?
Yeah, I wasn't entirely sure how to make this less invasive. A big part of the performance gain was from eliminating glue logic from what was previously ValueKey/SignalKey.
As it is, 642e477 depends on 6902c99 -- that was the motivation for the changes to ValueKey. I actually started with repr(value) as the key, but there was still a fair bit of overhead there.
Should that specific method be named _castable? Or should it go somewhere else? |
The basic idea is that:
The reason it's arranged like that is because rules (1) and (2) ensure that no matter in which order No part of this scheme is an "optimization"; it's solely used for correctness.
I strongly prefer keeping the history linear.
I'm not sure yet. I'll need to understand your proposed changes in detail to see if there is a less invasive way to do the same thing, or if there's really no other choice. |
I reverted the unsound curr/next change and rearranged the interning API to conserve the existing ValueKey and SignalKey classes. I think all the new abstractions should now be hidden. |
Thanks. I've cherry-picked the second two commits. I expect it will take a while to convince myself that the proposed interning changes are the way forward. I appreciate that you rearranged them to have no impact to the public API, but the bulk of my concern has to do with the increased internal complexity as well as soundness in face of mutability of AST nodes: I am hesitant to introduce any sort of persistent hashing while properties of most AST nodes remain writable. Have you considered adding fast paths to the simulator, for the case where bench code operates on simple signals alone? This removes the need for caching bytecode, is an easy to understand change with minimal impact, and should yield most of the performance benefit. |
Understood. I'm not necessarily expecting to get this specific design upstreamed, it's just where I ended up after probing for performance improvements.
That makes sense. Do you anticipate the AST becoming immutable?
I'm not sure I follow. Are you thinking along the lines of falling back to a simpler scheduler if only the default sync and comb domains are used? Or did you have something else in mind? |
Thank you for explaining this, BTW. The eval/commit/loop approach makes perfect sense to me. I think I see at least one way the unsoundness of 52e30bc could manifest: if one invocation sets next != curr, and a second invocation sets it back to curr, it will be left with the wrong value for next. That's important because a single fragment could set next repeatedly. I retraced my steps and realized that the call to pending.clear was the only part of that commit that actually mattered for performance. The rest of the changes were a result of trying to micro-optimize _SignalState.commit. If I'm not (still) very much mistaken, calling pending.clear should be correct since at that point in _SimulatorState.commit, every pending update from the current eval/commit loop has been effected, and the next eval step will set pending if necessary. What do you think? |
This is a bit of a complex issue. At the very beginning, I designed nMigen so that it never mutates ASTs (and it still doesn't). When it needs to change an AST, it emits a new one with the necessary elements changed (the FragmentTransformer stuff). I knew this could be a performance issue when writing it, but I didn't realize how just how slow it would be. This slowness could be fixed in two ways: a simpler but more flawed one--which is to copy the entire AST once, eliminating any sharing, and then mutating it; and a more complex but principled one--which is to not change ASTs in first place (with or without mutation), and instead treat The principled approach potentially has very significant benefits. It would dramatically improve performance on all designs but especially large generic ones. It would eliminate multiple complicated corner cases from the internal machinery. It would bring the internals of nMigen closer to well understood and well behaved compiler development principles, namely using variable bindings for late bound names (there's nothing wrong in principle with term rewriting it's currently doing, but this term rewriting is very ad-hoc and hard to analyze). It's also a lot of work. There is a separate issue, which is the user mutability of ASTs, whether interior or exterior. For most expression nodes (everything except Signal, ArrayProxy, and UserValue), this is clearly undesirable. For ArrayProxy and UserValue, it's just interior mutability, and there's already code to cope with it, which should be largely sufficient. For Signal, the story is a bit complex--I believe it's never been "officially" supported, but you can mutate Signals and the results are currently correct, provided you never use them in expressions before you stop mutating them. This is used internally in nMigen in the FSM generator code, so it seems like a valuable capability and we might want to enforce that invariant in a future immutable AST. To summarize, I think the AST should be immutable, but we're not quite there yet, and there is at least one corner case that currently presents an API with a soundness issue that would likely need to be handled explicitly. |
Yup, and there are many cases where this happens--for example each time you assert reset.
Could you show a patch that does what you want? I'm not quite sure exactly what change you're referring to, and it seems like preparing a patch isn't a lot of effort here. |
Oh... did I just completely forget to remove elements from |
I'm curious, could you measure the perf impact? I'm really glad you noticed this, it's a really silly mistake with severe consequences. |
I also cherry-picked it onto master:
Here's what I got with VCD output disabled:
|
I've cherry-picked commit d891a79, thank you!
Something else. You can notice that most simulator commands are along the lines of I think @jordens proposed (in #228) an improvement to the simulator interface that would make all commands except for these two impossible to express. I'm not yet fully convinced that this is the way to go, and we'll have to keep the existing code for Migen compatibility, but you can see that this special case is less narrow than it looks at first. |
I integrated the cherry-picked changes to clean up the PR in case you decide to do something with it. One thing that occurred to me that I haven't bothered to change is that the compilation cache should probably be scoped to the simulator rather than the coroutine process.
That makes a lot of sense. Speaking of the simulator interface, something I've thought of requesting is a way to wait asynchronously for a signal change from a coroutine process. It sounds from #228 like you already have some idea of how you'd like to do that? FWIW I found Value hashing useful for hacking together (emphasis on "hack") an implementation using _FragmentCompiler. |
Internally, this is easy to do. The external API needs to be designed. |
As per discussion above, I am closing this PR; I would encourage you (or anyone else) to submit another PR with a less invasive approach I suggested above. If it turns out that (contrary to my expectations) the simpler approach does not bring similar performance benefits, then I'm going to reevaluate the approach in this PR. Thanks for doing this work! |
I'm not sure how high a priority the Python simulation backend is or whether it's worth making the structural changes to hdl.ast to optimize hashing, but I figured I would share what I've figured out so far in case it's useful. Please let me know what you think.
The changes here fall roughly into the following categories:
I used the following simulation to make performance measurements: sim_perf_test.py. I have perf data for each commit in the chain, both with and without VCD output enabled. I can see about uploading them somewhere if there's interest, but in the meantime here's a summary:
With no VCD output:
With VCD output enabled:
I did run the nMigen unit tests, but I haven't figured out yet how to set up sby on Windows so I don't know whether those tests are broken by my changes.