Finding out what resources an nMigen module uses in the output design #289

Closed
JarrettBillingsley opened this issue Dec 31, 2019 · 12 comments

@JarrettBillingsley

JarrettBillingsley commented Dec 31, 2019

This is a question!

I've been learning FPGA design on an iCEStick with nMigen. I haven't had more than "toy-level" experience with Verilog or with the yosys-nextpnr toolchain. The problem I keep running into is sort of multi-faceted:

  • The optimization pass is very good at removing unused bits of circuitry
    • But sometimes it optimizes things too well and I don't realize that it's removing large parts of my circuit
  • This gives me a false sense of how many cells my design uses, and sometimes changing one wire connection adds hundreds of cells to the output design
    • Because before it was throwing out 1/3 of my circuit without me even knowing
  • So I'd like to find out what is actually taking up the cells/nets, to know where to optimize my design, as well as a report of what was optimized away
    • But I haven't found an easy way to do that

I've been reading the .rpt file, but that's huge, and a lot of it doesn't make much sense without thoroughly understanding the toolchain that produced it.

I also open the design in the nextpnr-ice40 GUI and then poke around in the cells/nets lists to see what signals they map back to. This is pretty awkward and indirect, and a lot of the cells/nets are autogenerated with names that only refer to the pass that generated them (e.g. simplemap_dff, alumacc, etc), and not the original code that they map to.

I think it's doing some serious global optimization, so I don't expect to be able to get a perfect report of "elaboratable X is using Y LCs" or something, but it would at least be nice to be able to get an estimate. Is this something that the toolchain provides? Is this even possible to do?

Thanks for any help you can give. Keep up the great work on this library.

@Ravenslofty
Contributor

  • The optimization pass is very good at removing unused bits of circuitry
    • But sometimes it optimizes things too well and I don't realize that it's removing large parts of my circuit
  • This gives me a false sense of how many cells my design uses, and sometimes changing one wire connection adds hundreds of cells to the output design
    • Because before it was throwing out 1/3 of my circuit without me even knowing

This is unfortunate behaviour, but when it happens it generally means you have a pretty major logic bug. The best way to catch these is through simulation and a healthy dose of paranoia.

  • So I'd like to find out what is actually taking up the cells/nets, to know where to optimize my design, as well as a report of what was optimized away
    • But I haven't found an easy way to do that

My personal approach here is to use verilog.convert to output Verilog and then run yosys -p "synth_ice40 -abc9 -noflatten" output.v. This will run just synthesis of it, and at the end you'll get output that looks like a series of these:

=== alpha_blend ===

   Number of wires:                701
   Number of wire bits:           1148
   Number of public wires:         701
   Number of public wire bits:    1148
   Number of memories:               0
   Number of memory bits:            0
   Number of processes:              0
   Number of cells:                777
     SB_CARRY                       72
     SB_DFFSR                      102
     SB_LUT4                       603

Here, alpha_blend is not the name of the Elaboratable class, but rather the name you gave the module when you did m.submodules["name"] = ... (this means that if you instantiate an Elaboratable multiple times, it will appear multiple times in the output). The "Number of wires" through "Number of processes" lines are not very useful, but the lines under "Number of cells" are; they tell you the resource usage in carry logic (SB_CARRY), flip-flops (SB_DFFSR) and LUTs (SB_LUT4) respectively. They're not final numbers; you would need to run nextpnr for that, but they're a useful guide, I think.
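
If you want to compare modules without reading the whole log, something like the following might help. It is only a rough sketch: it assumes the stat layout shown above (newer Yosys versions may print it differently), and the helper names yosys_command and parse_stat are made up for this example, not part of nMigen or Yosys.

```python
import re


def yosys_command(verilog_path):
    """Build the synthesis-only command line quoted in this thread."""
    return ["yosys", "-p", "synth_ice40 -abc9 -noflatten", verilog_path]


def parse_stat(log_text):
    """Collect per-module cell counts from Yosys `stat` sections.

    Assumes the layout shown in this thread: a `=== name ===` header,
    then indented `CELL_TYPE  count` lines under "Number of cells".
    """
    modules = {}
    current = None
    for line in log_text.splitlines():
        header = re.match(r"^=== (.+?) ===$", line.strip())
        if header:
            name = header.group(1)
            # Skip summary sections such as "design hierarchy".
            current = None if " " in name else modules.setdefault(name, {})
            continue
        # Cell lines are a single token followed by a count; the
        # "Number of ..." lines contain several words and do not match.
        cell = re.match(r"^\s+(\S+)\s+(\d+)\s*$", line)
        if current is not None and cell:
            current[cell.group(1)] = int(cell.group(2))
    return modules


SAMPLE = """\
=== alpha_blend ===

   Number of wires:                701
   Number of cells:                777
     SB_CARRY                       72
     SB_DFFSR                      102
     SB_LUT4                       603
"""

usage = parse_stat(SAMPLE)
```

You would feed parse_stat the captured Yosys output (for example from subprocess.run(yosys_command("output.v"), capture_output=True, text=True).stdout) and then sort the modules by their SB_LUT4 count.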

I've been reading the .rpt file, but that's huge, and a lot of it doesn't make much sense without thoroughly understanding the toolchain that produced it.

I'll admit Yosys does produce a lot of noise here, but there are useful things inside of it: the stat display (which I just went through) is helpful and memory_bram will usually explain why a Memory is being synthesised into flops instead of RAM cells.

I also open the design in the nextpnr-ice40 GUI and then poke around in the cells/nets lists to see what signals they map back to. This is pretty awkward and indirect, and a lot of the cells/nets are autogenerated with names that only refer to the pass that generated them (e.g. simplemap_dff, alumacc, etc), and not the original code that they map to.

Your Yosys could be a little old; Yosys added an autoname pass to deal with this problem in November, producing much more useful output.

@whitequark
Contributor

whitequark commented Dec 31, 2019

I think it's doing some serious global optimization, so I don't expect to be able to get a perfect report of "elaboratable X is using Y LCs" or something, but it would at least be nice to be able to get an estimate. Is this something that the toolchain provides? Is this even possible to do?

@ZirconiumX's suggestion of the -noflatten flag is the best way to do this that I know as well. A friend mentioned recently that it would be convenient to have an editor extension that would show this information next to your code, which I also think would be neat.

Their suggestion of using simulation (or property testing) to catch logic bugs is also good.

They're not final numbers; you would need to run nextpnr for that, but they're a useful guide, I think.

The FF numbers are final because nextpnr doesn't currently duplicate FFs (and most toolchains don't do it by default). The LUT and carry numbers aren't quite final because of legalization, but they are very close, and I would say you don't need to look at the nextpnr report.


In general unfortunately the issues you highlight are valid but they are not something nMigen can fix as it is at the mercy of the synthesis tool. For example, Lattice Diamond prints why it removed some cell or net each time it does so.

@JarrettBillingsley
Author

Thank you both for your responses!

Your Yosys could be a little old; Yosys added an autoname pass to deal with this problem in November, producing much more useful output.

I was using the 0.9 release, so I just tried building and installing the newest version. That is SO much better, thank you!

I'll admit Yosys does produce a lot of noise here, but there are useful things inside of it: the stat display (which I just went through) is helpful and memory_bram will usually explain why a Memory is being synthesised into flops instead of RAM cells.

Oh yes! I used that BRAM pass output to figure out why my 256-byte RAM wasn't really a RAM, haha. But sometimes the "efficiency" is too low and it uses DFFs anyway, and I'm not sure why... maybe you know?

This is unfortunate behaviour, but when it happens it generally means you have a pretty major logic bug. The best way to catch these is through simulation and a healthy dose of paranoia.

Yeah, I've mostly just gone in with guns blazing and tested with my ears (it's a little synthesizer). I'm still very new to hardware design and haven't learned the best practices for development yet. :D

My personal approach here is to use verilog.convert to output Verilog and then run yosys -p "synth_ice40 -abc9 -noflatten" output.v. This will run just synthesis of it, and at the end you'll get output that looks like a series of these:

That's pretty much EXACTLY what I was looking for! Thanks! I can already see some places where I need to look more closely at what's going on.

In general unfortunately the issues you highlight are valid but they are not something nMigen can fix as it is at the mercy of the synthesis tool. For example, Lattice Diamond prints why it removed some cell or net each time it does so.

Right, I figure this is mostly out of nMigen's hands. But I'm glad I was able to ask here and get some good answers. :)

I'll close this now since it was just a question. Thanks again!

@whitequark
Contributor

But sometimes the "efficiency" is too low and it uses DFFs anyway, and I'm not sure why... maybe you know?

BRAMs are a limited resource; if Yosys converted every pair of registers with a mux into a BRAM, you'd quickly exhaust them all. Worse, currently it does not even know how many are available in any given FPGA. Also, using a BRAM may make your design slower in some cases because it constrains the router.

"Efficiency" is an attempt to capture all these concerns with a single number. It's not perfect, but it helps.

My personal approach here is to use verilog.convert to output Verilog and then run yosys -p "synth_ice40 -abc9 -noflatten" output.v.

You don't have to use verilog.convert by the way; if you're using nMigen platforms, you can set the environment variable NMIGEN_synth_opts to -abc9 -noflatten.
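
If you would rather set that from the build script than from the shell, a sketch like this might do. The variable name NMIGEN_synth_opts is the one mentioned above; the helper add_synth_opts is hypothetical, and I am assuming the variable has to be set before platform.build(...) runs.

```python
import os


def add_synth_opts(extra, env=os.environ):
    """Append options to NMIGEN_synth_opts, keeping any already set.

    `add_synth_opts` is a hypothetical helper for this example only;
    call it before platform.build(...) so the build can pick it up.
    """
    current = env.get("NMIGEN_synth_opts", "")
    merged = " ".join(part for part in (current, extra) if part)
    env["NMIGEN_synth_opts"] = merged
    return merged
```

Passing a plain dict as env makes it easy to try out without touching the real environment.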

@JarrettBillingsley
Author

"Efficiency" is an attempt to capture all these concerns with a single number. It's not perfect, but it helps.

Makes sense. The log mentions "waste" (unused bits?) and seems to judge my memories too wasteful to bother with the BRAMs. But I have two 8-element arrays of 24-bit numbers, which is 384 bits, and that's a lot of flip-flops on an HX1K...
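
To put rough numbers on that, here is the arithmetic as a small sketch. The device figures (1280 logic cells with one flip-flop each on the HX1K, 4096-bit block RAMs at most 16 bits wide) are my reading of the iCE40 datasheet, so treat them as assumptions. The "fill" value is just a naive used-bits ratio for illustration; it is not the efficiency or waste figure Yosys computes.

```python
import math

# Memories described above: two arrays of 8 x 24-bit words.
ARRAYS, DEPTH, WIDTH = 2, 8, 24
total_bits = ARRAYS * DEPTH * WIDTH           # 2 * 8 * 24 = 384

# Assumed device figures (check against the iCE40 datasheet).
HX1K_LOGIC_CELLS = 1280    # one flip-flop per logic cell
BRAM_BITS = 4096           # bits per block RAM
BRAM_MAX_WIDTH = 16        # widest data port of one block

# Share of the chip's flip-flops consumed if both arrays become FFs.
ff_share = total_bits / HX1K_LOGIC_CELLS

# A 24-bit word does not fit one 16-bit-wide block, so each array
# would need two blocks side by side.
blocks_per_array = math.ceil(WIDTH / BRAM_MAX_WIDTH)

# Naive fill ratio of those blocks (NOT Yosys's own metric).
bram_fill = (DEPTH * WIDTH) / (blocks_per_array * BRAM_BITS)

print(f"{total_bits} bits, {ff_share:.0%} of FFs, "
      f"{blocks_per_array} BRAMs per array at {bram_fill:.1%} fill")
```

So under these assumptions the flip-flop version eats about 30% of the chip, while each BRAM pair would be only a couple of percent full, which may be why the heuristic considers it wasteful.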

Are there ways to "encourage" it to use BRAMs even when it doesn't think they're the best idea? I have a ton of timing slack right now so I don't even mind if there's some performance hit.

You don't have to use verilog.convert by the way; if you're using nMigen platforms, you can set the environment variable NMIGEN_synth_opts to -abc9 -noflatten.

I was just playing around with that! I did

platform.build(top, do_program = args.program or args.interactive, synth_opts = '-abc9 -noflatten')

And though yosys was happy, nextpnr died with an assertion failure (!).

libc++abi.dylib: terminating with uncaught exception of type nextpnr_ice40::assertion_failure: Assertion failure: !ctx->net_aliases.count(labelid) (/tmp/nextpnr-ice40-20191220-16205-avs0vj/json/jsonparse.cc:942)

@daveshah1

And though yosys was happy, nextpnr died with an assertion failure (!).

You will need bang-up-to-date nextpnr to use -noflatten hierarchical netlists with it (and even then beware that it is an experimental feature)

@whitequark
Contributor

Are there ways to "encourage" it to use BRAMs even when it doesn't think they're the best idea?

Wish granted: YosysHQ/yosys#1603

@JarrettBillingsley
Author

You will need bang-up-to-date nextpnr to use -noflatten hierarchical netlists with it (and even then beware that it is an experimental feature)

Fair enough! I can use the -noflatten just when I wanna look at the resource usage.

Wish granted: YosysHQ/yosys#1603

Very nice. How exactly would I use that though?

@whitequark
Contributor

How exactly would I use that though?

On iCE40, use (* ram_block *) to force BRAM, and (* logic_block *) to force FFRAM.

@JarrettBillingsley
Author

Sorry but, how about in nMigen?

@whitequark
Contributor

Oh oops sorry. That's um, actually not possible right now... primarily because Yosys did not have this feature before. Could you open an issue for attaching attributes to memories?

@JarrettBillingsley
Author

Sure thing.
