pysim is very slow #160
I also tried using the following AST transformer to optimize out some no-ops pysim2 emits, but it resulted in less than 0.5% speedup, which is not worth the added complexity:

```diff
diff --git a/nmigen/back/pysim.py b/nmigen/back/pysim.py
index 6f905ff..1cb44d4 100644
--- a/nmigen/back/pysim.py
+++ b/nmigen/back/pysim.py
@@ -1,3 +1,5 @@
+import sys
+import ast
 import inspect
 import warnings
 from contextlib import contextmanager
@@ -337,6 +339,28 @@ class _Emitter:
         return name
 
 
+class _Optimizer(ast.NodeTransformer):
+    # Python has a built-in AST optimizer that can fold constant expressions like `100 & 1`
+    # just fine. However, it does not optimize some expressions, e.g. `x << 0` or `x & 1 & 1`
+    # because, while they can be algebraically simplified, there is no way to communicate that
+    # `x: int`, which is a precondition. However, we know this precondition holds, so we can
+    # just do it ourselves. Fortunately, the amount of pathological expressions we emit is low.
+    def visit_BinOp(self, node):
+        if type(node.op) in (ast.LShift, ast.RShift):
+            if type(node.right) is ast.Num and node.right.n == 0:
+                node.left = self.visit(node.left)
+        elif type(node.op) in (ast.BitAnd, ast.BitOr):
+            left = self.visit(node.left)
+            if type(left) is ast.BinOp and type(left.op) is type(node.op):
+                if type(node.right) is ast.Num and type(left.right) is ast.Num:
+                    if node.right.n == left.right.n:
+                        return left
+        elif type(node.op) is ast.Mult:
+            if type(node.right) is ast.Num and node.right.n == 1:
+                return self.visit(node.left)
+        return self.generic_visit(node)
+
+
 class _Compiler:
     def __init__(self, context, emitter):
         self.context = context
@@ -750,7 +774,10 @@ class _FragmentCompiler:
                 emitter.append(f"slots[{signal_index}].set(next_{signal_index})")
 
             exec_locals = {"slots": domain_process.context.slots, **_ValueCompiler.helpers}
-            exec(emitter.flush(), exec_locals)
+            code = emitter.flush()
+            if sys.flags.optimize > 0:
+                code = _Optimizer().visit(ast.parse(code))
+            exec(compile(code, domain_process.name, "exec"), exec_locals)
             domain_process.run = exec_locals["run"]
             processes.add(domain_process)
```
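For reference, here is a self-contained sketch of the same folding idea against current Python, where `ast.parse` has produced `ast.Constant` instead of `ast.Num` since 3.8 and `ast.unparse` exists since 3.9. The class name and the sample expression are made up for illustration; this is not the code from the diff:

```python
import ast

class _RedundantOpFolder(ast.NodeTransformer):
    # Same idea as the _Optimizer above, but matching ast.Constant nodes.
    def visit_BinOp(self, node):
        node = self.generic_visit(node)   # fold children first
        # (x & c) & c  ->  x & c   (and likewise for |)
        if type(node.op) in (ast.BitAnd, ast.BitOr):
            left = node.left
            if (type(left) is ast.BinOp and type(left.op) is type(node.op)
                    and type(node.right) is ast.Constant
                    and type(left.right) is ast.Constant
                    and node.right.value == left.right.value):
                return left
        # x * 1  ->  x
        if (type(node.op) is ast.Mult
                and type(node.right) is ast.Constant and node.right.value == 1):
            return node.left
        return node

tree = ast.parse("next_0 = slots[3] & 1 & 1")
tree = ast.fix_missing_locations(_RedundantOpFolder().visit(tree))
print(ast.unparse(tree))                     # next_0 = slots[3] & 1
code = compile(tree, "<generated>", "exec")  # the folded AST still compiles
```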
Another interesting detail: before, PyPy was significantly faster than CPython. With the new pysim, PyPy (after warmup) is pretty much exactly as fast as CPython. @programmerjake any idea why?
I'm not sure. I didn't get my SIMD multiplier code (which is probably a good benchmark due to having a short setup time and a long simulate time) to work; it just raises an exception. I haven't been paying very close attention, so I'm not sure how pysim vs. pysim2 changed at an architectural level. I would guess something to do with cache misses in a hash table making both be bottlenecked on memory latency?
pysim2 compiles nMigen AST to Python code that uses almost exclusively integer operations and control flow.
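To give a rough picture of what that generated code looks like, here is a hand-written sketch, not actual pysim2 output: the `_Slot` class, the slot layout, and the 8-bit mask are assumptions for illustration, while the `slots[i].set(...)` and `exec_locals["run"]` pattern mirrors the diff above.

```python
class _Slot:
    # Minimal stand-in for the simulator's signal storage, just enough to
    # execute the generated function below.
    def __init__(self, curr=0):
        self.curr = curr
    def set(self, value):
        self.curr = value

slots = [_Slot(200), _Slot(100), _Slot(0)]   # a, b, y -- three 8-bit signals

# The flavour of source a pysim2-style compiler emits and then exec()s into
# a run() function: nothing but integer arithmetic, masking, and slot access.
generated = """
def run():
    next_2 = (slots[0].curr + slots[1].curr) & 0xff
    slots[2].set(next_2)
"""
exec_locals = {"slots": slots}
exec(generated, exec_locals)
exec_locals["run"]()
print(slots[2].curr)   # 44, i.e. (200 + 100) & 0xff
```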
Pysim is slow at simulation (this gets much better with PyPy) and at elaboration (this gets worse with PyPy). Needs investigation as to why exactly.
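One way to start that investigation is to profile elaboration and simulation separately with the standard library; a minimal sketch, where `build_design()` and `run_simulation()` are hypothetical placeholders for a real testbench's two phases rather than nMigen API:

```python
import cProfile
import pstats

def build_design():
    pass   # placeholder: construct/elaborate the design here

def run_simulation():
    pass   # placeholder: run pysim on the elaborated design here

def profile_phase(label, fn):
    # Profile one phase on its own and print the hottest calls, so that
    # "slow at elaboration" and "slow at simulation" can be told apart.
    profiler = cProfile.Profile()
    profiler.enable()
    fn()
    profiler.disable()
    print(f"--- {label} ---")
    pstats.Stats(profiler).sort_stats("cumulative").print_stats(10)

profile_phase("elaboration", build_design)
profile_phase("simulation", run_simulation)
```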