Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

XDR configuration should allow disabling gearboxes #532

Closed
sapphire-arches opened this issue Nov 7, 2020 · 8 comments
Closed

XDR configuration should allow disabling gearboxes #532

sapphire-arches opened this issue Nov 7, 2020 · 8 comments
Labels

Comments

@sapphire-arches
Copy link

I ran in to this while working on my port of Bob Miller's Icebreaker Candy. That project (ab)uses XDR pins to get effectively 2x resolution on a handful of important signals (LED panel blank, latch, and generating SCK). When using the raw Lattice primitives, this is easy:

    SB_IO #(
        .PIN_TYPE(6'b010001)
    ) it (
        .PACKAGE_PIN(ddr_pin),
        .LATCH_INPUT_VALUE(1'b0),
        .INPUT_CLK(clk),
        .OUTPUT_CLK(clk),
        .D_OUT_0(1),
        .D_OUT_1(0));

will effectively automatically generate the required pixel clock (SCK) signal. In nmigen however, the automatically generated gearbox for XDR=2 doesn't allow direct access to D_OUT_1. Confusingly the ECP5 platform does allow direct access to these pins (e.g. https://github.com/nmigen/nmigen/blob/c6150d05867bea1a7266e3c950d3d9846ba7ed58/nmigen/vendor/lattice_ecp5.py#L532). From reading the documentation on Pin it seems like ECP5 is "wrong" here -- we should be generating gearboxes there as well.

I have a local patch which does the "dumb" thing and just makes ICE40 behave like ECP5 (no gearbox) but I doubt you want that upstream =). Instead, either the platform.request API or the resource configuration should allow specifying whether gearboxes are desired. This probably interacts with #458.

@adamgreig
Copy link
Contributor

What goes wrong when you just use the normal nmigen-generated output? Such as

import nmigen as nm
from nmigen_boards.icebreaker import ICEBreakerPlatform
plat = ICEBreakerPlatform()
m = nm.Module()
plat.add_resources(plat.break_off_pmod)
led = plat.request("led_r", 1, xdr=2)
m.d.comb += led.o_clk.eq(nm.ClockSignal()), led.o0.eq(1), led.o1.eq(0)
plat.build(m)

From a quick look at the generated verilog this essentially generates exactly what you've put as your raw example, with the only exception being that D_OUT_1 has an extra FF added that captures on the OUTPUT_CLK rising edge, which won't affect the generated signal/latency (but improves hold time for that signal from your logic). With the snippet above you should get exactly the same square-wave-at-clock-frequency output. That said, for just generating the clock, you could also just m.d.comb += led.o.eq(ClockSignal()), right? What does your patch do?

@whitequark
Copy link
Member

whitequark commented Nov 7, 2020

it seems like ECP5 is "wrong" here -- we should be generating gearboxes there as well.

If I remember correctly, the gearbox in ECP5 is built into the IOB. But I might be wrong.

Instead, either the platform.request API or the resource configuration should allow specifying whether gearboxes are desired.

My view is that:

  • "requesting pin with XDR>0 results in i/o/oe synchronized to the corresponding clock, possibly with added latency" is a well-defined guarantee that every platform ought to be able to provide, and
  • "requesting pin with XDR>0 results in i/o/oe having platform-specific relationship to the clock and implementation-specific latency and synchronization requirements" is a useless guarantee that almost no downstream code would be able to actually use.

Pin provides an abstraction, specifically that of a pin with gearbox. Eliminating platform-specific differences is the whole point. If you don't want that, don't use the abstraction! Instantiate the IOB directly.

@daveshah1
Copy link

If I remember correctly, the gearbox in ECP5 is built into the IOB. But I might be wrong.

Yeah, that's correct, the total latency for ECP5 is 3 cycles as a result, and both inputs are synchronous to the posedge of the clock

@sapphire-arches
Copy link
Author

Hmm, I think my understanding of what went wrong is incorrect. Here is my patch:

diff --git a/nmigen/vendor/lattice_ice40.py b/nmigen/vendor/lattice_ice40.py
index d5e4dd6..dbddfcc 100644
--- a/nmigen/vendor/lattice_ice40.py
+++ b/nmigen/vendor/lattice_ice40.py
@@ -498,14 +498,14 @@ class LatticeICE40Platform(TemplatedPlatform):
                 pin_o0 = get_oneg(pin.o0, o_invert)
                 pin_o1 = get_oneg(pin.o1, o_invert)
 
-        if "i" in pin.dir and pin.xdr == 2:
-            i0_ff = Signal.like(pin_i0, name_suffix="_ff")
-            i1_ff = Signal.like(pin_i1, name_suffix="_ff")
-            get_dff(pin.i_clk, i0_ff, pin_i0)
-            get_dff(pin.i_clk, i1_ff, pin_i1)
-        if "o" in pin.dir and pin.xdr == 2:
-            o1_ff = Signal.like(pin_o1, name_suffix="_ff")
-            get_dff(pin.o_clk, pin_o1, o1_ff)
 
         for bit in range(len(port)):
             io_args = [
@@ -561,7 +561,7 @@ class LatticeICE40Platform(TemplatedPlatform):
                     # Re-register negedge output after it leaves fabric. This increases setup time
                     # to an entire cycle, and doesn't add latency.
                     io_args.append(("i", "D_OUT_0", pin_o0[bit]))
-                    io_args.append(("i", "D_OUT_1", o1_ff[bit]))
+                    io_args.append(("i", "D_OUT_1", pin_o1[bit]))
 
             if pin.dir in ("oe", "io"):
                 io_args.append(("i", "OUTPUT_ENABLE", pin.oe))

It removes the automatically generated FF from the o1 datapath. I think I've severely misunderstood the purpose of that FF though. I'll do some more experiments, but in the mean time I do have some followup questions:

  • Is there a way to retrieve the expected latency between the input ports on the I/O block and the output? For these LED panels (and I imagine other applications) I need the non-DDR signals to hit the edge of the chip at the same time as the DDR signals. I can manually add registers of course, but it would be nice if there was a way to do that generically.
  • Should there be another FF for the o0 signal here? Without it my design glitches on the panel (presumably because bits are getting scrambled, but I'll have to code up a test design for that because I don't actually have a debug tool fast enough to measure the panel signals at full speed). Adding one as follows fixes those glitches:
diff --git a/nmigen/vendor/lattice_ice40.py b/nmigen/vendor/lattice_ice40.py
index d5e4dd6..c29bdee 100644
--- a/nmigen/vendor/lattice_ice40.py
+++ b/nmigen/vendor/lattice_ice40.py
@@ -504,7 +504,9 @@ class LatticeICE40Platform(TemplatedPlatform):
             get_dff(pin.i_clk, i0_ff, pin_i0)
             get_dff(pin.i_clk, i1_ff, pin_i1)
         if "o" in pin.dir and pin.xdr == 2:
+            o0_ff = Signal.like(pin_o0, name_suffix="_ff")
             o1_ff = Signal.like(pin_o1, name_suffix="_ff")
+            get_dff(pin.o_clk, pin.o0, o0_ff)
             get_dff(pin.o_clk, pin_o1, o1_ff)
 
         for bit in range(len(port)):
@@ -560,7 +562,7 @@ class LatticeICE40Platform(TemplatedPlatform):
                 elif pin.xdr == 2:
                     # Re-register negedge output after it leaves fabric. This increases setup time
                     # to an entire cycle, and doesn't add latency.
-                    io_args.append(("i", "D_OUT_0", pin_o0[bit]))
+                    io_args.append(("i", "D_OUT_0", o0_ff[bit]))
                     io_args.append(("i", "D_OUT_1", o1_ff[bit]))
 
             if pin.dir in ("oe", "io"):

@sapphire-arches
Copy link
Author

sapphire-arches commented Nov 7, 2020

OK, I've thrown together this test design:


import nmigen as nm
from nmigen_boards.icebreaker import ICEBreakerPlatform
plat = ICEBreakerPlatform()

plat.default_clk = 'SB_HFOSC'
plat.hfosc_div = 3

m = nm.Module()
plat.add_resources(plat.break_off_pmod)

led0 = plat.request("led_g", 1, xdr=2) # PMOD2.1
led1 = plat.request("led_g", 2, xdr=2) # PMOD2.2
led2 = plat.request("led_g", 4) # PMOD2.3

led0_con = nm.Signal(2)
led1_con = nm.Signal(2)
flop = nm.Signal()
flop_ff = nm.Signal()

m.d.sync += flop_ff.eq(flop)

with m.FSM():
with m.State("S0"):
m.d.sync += led0_con.eq(0b00)
m.d.sync += led1_con.eq(0b01)
m.d.sync += flop.eq(0)
m.next = "S1"
with m.State("S1"):
m.d.sync += led0_con.eq(0b11)
m.d.sync += led1_con.eq(0b11)
m.d.sync += flop.eq(1)
m.next = "S2"
with m.State("S2"):
m.d.sync += led0_con.eq(0b00)
m.d.sync += led1_con.eq(0b10)
m.d.sync += flop.eq(0)
m.next = "S3"
with m.State("S3"):
m.d.sync += led0_con.eq(0b00)
m.d.sync += led1_con.eq(0b00)
m.d.sync += flop.eq(0)
m.next = "S0"

m.d.comb += led0.o_clk.eq(nm.ClockSignal()), led0.o0.eq(led0_con[0]), led0.o1.eq(led0_con[1])
m.d.comb += led1.o_clk.eq(nm.ClockSignal()), led1.o0.eq(led1_con[0]), led1.o1.eq(led1_con[1])
m.d.comb += led2.eq(flop_ff)

plat.build(m, do_program=True)

Which I think demonstrates the requirement for a FF on o0. led0 and led1 should go high for 1 clock cycle at a 180 degree phase offset to led2, which provides a reference for the internal clock. led0 stays high for 1 cycle, while led1 stays high for 2.
Without an FF on d0, we get the following signals out:

image

(top row is led0, then led1, then led2). Adding an FF to d0 using the following patch:

diff --git a/nmigen/vendor/lattice_ice40.py b/nmigen/vendor/lattice_ice40.py
index d5e4dd6..c29bdee 100644
--- a/nmigen/vendor/lattice_ice40.py
+++ b/nmigen/vendor/lattice_ice40.py
@@ -504,7 +504,9 @@ class LatticeICE40Platform(TemplatedPlatform):
             get_dff(pin.i_clk, i0_ff, pin_i0)
             get_dff(pin.i_clk, i1_ff, pin_i1)
         if "o" in pin.dir and pin.xdr == 2:
+            o0_ff = Signal.like(pin_o0, name_suffix="_ff")
             o1_ff = Signal.like(pin_o1, name_suffix="_ff")
+            get_dff(pin.o_clk, pin.o0, o0_ff)
             get_dff(pin.o_clk, pin_o1, o1_ff)
 
         for bit in range(len(port)):
@@ -560,7 +562,7 @@ class LatticeICE40Platform(TemplatedPlatform):
                 elif pin.xdr == 2:
                     # Re-register negedge output after it leaves fabric. This increases setup time
                     # to an entire cycle, and doesn't add latency.
-                    io_args.append(("i", "D_OUT_0", pin_o0[bit]))
+                    io_args.append(("i", "D_OUT_0", o0_ff[bit]))
                     io_args.append(("i", "D_OUT_1", o1_ff[bit]))
 
             if pin.dir in ("oe", "io"):

"fixes" the signals to be what I would expect:

image

I can submit a PR for the above change if desired, or I can contribute some documentation of this so others don't stub their toe on it =). My question about getting the gearbox latency from above still remains.

@adamgreig
Copy link
Contributor

I haven't tried it on hardware, but I think your demo has mixed up o0 and o1 - o0 is the value for the rising edge, o1 for the subsequent falling edge, so you've assigned the LSb to happen first, followed by the MSb. That means your state machine sequences, in time order: 10 11 01 00.

If you bit-reverse the values for led1_con, do you get the expected output? I think that adding the extra FF to o0 is inserting an extra cycle of latency into o0 compared to o1, which re-orders them in time, which is why it seems to fix your example.

Meanwhile I'd expect led0 and led2 to be identical (as in your first trace), since both flop and led0_con are assigned the same value at the same time, then you add one register to flop and led0 is at xdr=2 so contains one register too, meaning both end up with the same latency to the output.

@sapphire-arches
Copy link
Author

I see. I think there's nothing to do here then, I'm just a fool and need to fix my design. Thanks everyone, and sorry for the noise.

@whitequark
Copy link
Member

I'm just a fool and need to fix my design

We certainly should have better documentation here.

My question about getting the gearbox latency from above still remains.

You're correct in that gearbox latency should be provided but isn't. I have omitted it when I first implemented IO buffer insertion, and no one had enough time and inclination to add them since then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Development

No branches or pull requests

4 participants