Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

pysim is very slow #160

Closed
whitequark opened this issue Jul 25, 2019 · 4 comments
Closed

pysim is very slow #160

whitequark opened this issue Jul 25, 2019 · 4 comments

Comments

@whitequark
Copy link
Contributor

Pysim is slow at simulation (this gets much better with PyPy) and at elaboration (this gets worse with PyPy). Needs investigation as to why exactly.

@whitequark
Copy link
Contributor Author

I also tried using the following AST transformer to optimize out some no-ops pysim2 emits, but it resulted in less than 0.5% speedup, which is not worth the added complexity:

diff --git a/nmigen/back/pysim.py b/nmigen/back/pysim.py
index 6f905ff..1cb44d4 100644
--- a/nmigen/back/pysim.py
+++ b/nmigen/back/pysim.py
@@ -1,3 +1,5 @@
+import sys
+import ast
 import inspect
 import warnings
 from contextlib import contextmanager
@@ -337,6 +339,28 @@ class _Emitter:
         return name
 
 
+class _Optimizer(ast.NodeTransformer):
+    # Python has a built-in AST optimizer that can fold constant expressions like `100 & 1`
+    # just fine. However, it does not optimize some expressions, e.g. `x << 0` or `x & 1 & 1`
+    # because, while they can be algebraically simplified, there is no way to communicate that
+    # `x: int`, which is a precondition. However, we know this precondition holds, so we can
+    # just do it ourselves. Fortunately, the amount of pathological expressions we emit is low.
+    def visit_BinOp(self, node):
+        if type(node.op) in (ast.LShift, ast.RShift):
+            if type(node.right) is ast.Num and node.right.n == 0:
+                node.left = self.visit(node.left)
+        elif type(node.op) is (ast.BitAnd, ast.BitOr):
+            left = self.visit(node.left)
+            if type(left) is ast.BinOp and type(left.op) is type(node.op):
+                if type(node.right) is ast.Num and type(left.right) is ast.Num:
+                    if node.right.n == left.right.n:
+                        return left
+        elif type(node.op) is ast.Mult:
+            if type(node.right) is ast.Num and node.right.n == 1:
+                return self.visit(node.left)
+        return self.generic_visit(node)
+
+
 class _Compiler:
     def __init__(self, context, emitter):
         self.context = context
@@ -750,7 +774,10 @@ class _FragmentCompiler:
                 emitter.append(f"slots[{signal_index}].set(next_{signal_index})")
 
             exec_locals = {"slots": domain_process.context.slots, **_ValueCompiler.helpers}
-            exec(emitter.flush(), exec_locals)
+            code = emitter.flush()
+            if sys.flags.optimize > 0:
+                code = _Optimizer().visit(ast.parse(code))
+            exec(compile(code, domain_process.name, "exec"), exec_locals)
             domain_process.run = exec_locals["run"]
 
             processes.add(domain_process)

@whitequark
Copy link
Contributor Author

Another interesting detail: before, PyPy was significantly faster than CPython. With new pysim, PyPy (after warmup) is pretty much exactly as fast as CPython. @programmerjake any idea why?

@programmerjake
Copy link
Contributor

Another interesting detail: before, PyPy was significantly faster than CPython. With new pysim, PyPy (after warmup) is pretty much exactly as fast as CPython. @programmerjake any idea why?

I'm not sure, I didn't get my SIMD multiplier code (which is probably a good benchmark due to having a short setup time and a long simulate time) to work, it just raises nmigen.back.pysim.DeadlineError: Delta cycles exceeded process deadline; combinatorial loop?.

I haven't been paying very close attention so I'm not sure how pysim vs. pysim2 changed at an architectural level.

I would guess something to do with cache misses in a hash table making both be bottlenecked on memory latency?

@whitequark
Copy link
Contributor Author

I haven't been paying very close attention so I'm not sure how pysim vs. pysim2 changed at an architectural level.

pysim2 compiles nMigen AST to Python code that uses almost exclusively integer operations and control flow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants