This issue is primarily informational; it collects design choices for the XDR buffer factories on all interesting platforms.
The XDR buffer factory cannot possibly cover every conceivable permutation of modes that the platform can provide. It also shouldn't. It would be very hard to validate an abstraction for even two or three different XDR mode combinations. Nor would it be particularly useful for writing robust code: there will inevitably be combinations of platforms, modes and gearings that are unsupported, and portable code would avoid them anyway.
Instead, I chose a single mode that is the only one supported when using platform.request(..., xdr=n). This mode was chosen because it is supported on as many platforms as possible (hopefully every single one) and presents the fewest timing difficulties.
Specifically, XDR factories should instantiate a buffer that:
for output buffers, captures o0, o1, ... at the rising clock edge;
for input buffers, outputs i0, i1, ... at the rising clock edge, with one cycle of latency.
This way, no additional timing constraints are added: o* must be valid during the cycle before the edge, and i* remain valid during the cycle after the edge.
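The contract above can be checked with a small cycle-level model. This is an idealized sketch (no pin or routing delays) that assumes o0/i0 occupy the first half of each clock cycle on the pin and o1/i1 the second half; the helper names ddr_output and ddr_input are hypothetical and not part of any buffer factory API.

```python
# Idealized cycle-level model of the xdr=2 contract (hypothetical helpers,
# not a real buffer factory API). The pin is one value per half clock cycle:
# pin[2*n] is the first half of cycle n, pin[2*n + 1] the second half.
# None means "nothing captured yet".

def ddr_output(words):
    """words[n] = (o0, o1) presented by fabric logic during cycle n.

    Both values are captured at the rising edge that ends cycle n, so o0
    drives the pin during the first half of cycle n + 1 and o1 during the
    second half.
    """
    pin = [None, None]  # cycle 0: no edge has captured anything yet
    for o0, o1 in words:
        pin += [o0, o1]
    return pin


def ddr_input(pin):
    """Return (i0, i1) as seen by fabric logic in each cycle.

    The pin values of cycle n (first half -> i0, second half -> i1) are
    presented together at the rising edge that ends cycle n and stay valid
    for the whole of cycle n + 1: one cycle of latency.
    """
    samples = [(pin[k], pin[k + 1]) for k in range(0, len(pin) - 1, 2)]
    return [(None, None)] + samples


if __name__ == "__main__":
    tx = ddr_output([(1, 2), (3, 4)])
    print(tx)             # [None, None, 1, 2, 3, 4]
    print(ddr_input(tx))  # [(None, None), (None, None), (1, 2), (3, 4)]
```

In the loopback, a word written by fabric logic in cycle n is read back in cycle n + 2: one cycle for the output capture and one for the input latency, with every o*/i* signal changing only at rising edges.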
This can be implemented as follows for each FPGA family we support, considering only DDR (XDR=2) primitives:
iCE40: re-register D_OUT_1, and re-register D_IN_0 as well as D_IN_1 in fabric.
ECP5: the only available mode.
MachXO2: the only available mode.
Series 6: use DDR_ALIGNMENT=C0 and re-register Q0 in fabric.
Series 7: use DDR_CLK_EDGE=SAME_EDGE_PIPELINED for IDDR, and use DDR_CLK_EDGE=SAME_EDGE for ODDR.
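The iCE40 entry can be illustrated with a half-cycle model of the output side. It assumes that the iCE40 DDR output cell samples D_OUT_0 on the rising edge and D_OUT_1 on the falling edge of the output clock, and that fabric logic updates (o0, o1) right after each rising edge. The function name and its flag are hypothetical, and the input side (re-registering D_IN_0 and D_IN_1) is not modeled.

```python
# Half-cycle model of an iCE40-style DDR output (hypothetical helper).
# Assumption: the I/O cell samples D_OUT_0 on the rising edge and D_OUT_1
# on the falling edge; fabric[n] = (o0, o1) is driven during cycle n.

def ice40_ddr_output(fabric, reregister_d_out_1=True):
    """Return the pin value per half cycle (first half, second half, ...)."""
    idle = (None, None)
    pin = []
    for n in range(len(fabric) + 1):
        prev = fabric[n - 1] if n > 0 else idle
        cur = fabric[n] if n < len(fabric) else idle
        # Rising edge starting cycle n: D_OUT_0 captures the o0 that fabric
        # presented during cycle n - 1. The optional fabric re-register
        # captures o1 from cycle n - 1 at the same edge.
        pin.append(prev[0])
        # Falling edge in cycle n: D_OUT_1 is sampled. With the re-register
        # it still sees o1 from cycle n - 1; without it, fabric has already
        # moved on to the cycle-n value.
        pin.append(prev[1] if reregister_d_out_1 else cur[1])
    return pin


if __name__ == "__main__":
    words = [(1, 2), (3, 4)]
    print(ice40_ddr_output(words))         # [None, None, 1, 2, 3, 4]
    print(ice40_ddr_output(words, False))  # [None, 2, 1, 4, 3, None]
```

With the re-register, the pin sequence matches the generic contract (each o1 follows the o0 it was paired with). Without it, each o1 reaches the pin half a cycle before its own o0, which is why D_OUT_1 needs the extra fabric register.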
I've looked at several FPGA families (Series 7, ECP5, MachXO2) to see how they implement XDR for X > 2. There seem to be two common motifs: an additional, faster clock is used, and there is a clock domain crossing (CDC) from the low-speed clock to the high-speed clock, with an associated latency. This is probably the only realistic way to implement it, so I'd expect all other families to work the same way.
So, a minimal change to the buffer factories to enable this would be to add i/o_clk_fast to pin_layout and, since the CDC latency is both inevitable and variable, to let the platform communicate that latency back to the code.
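A sketch of how code might consume a platform-reported latency follows. It assumes a hypothetical XdrPinInfo record carrying xdr (bits per slow-clock cycle) and latency (total slow-clock cycles between presenting a word and seeing it on the pin); neither name exists in any platform API. Fabric logic that must line up with the pin, such as a frame marker, is delayed by the same number of cycles with a shift register.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class XdrPinInfo:
    """Hypothetical description a platform could return for an xdr=N pin."""
    xdr: int      # bits per slow-clock cycle
    latency: int  # slow-clock cycles until a word reaches the pin (platform-specific)


def serialize(words, info):
    """Model an xdr=N output: the N bits presented in slow cycle n appear on
    the pin, one per bit time, info.latency slow cycles later."""
    pin = [None] * (info.latency * info.xdr)
    for word in words:
        if len(word) != info.xdr:
            raise ValueError(f"expected {info.xdr} bits per word, got {len(word)}")
        pin.extend(word)
    return pin


def delay(values, cycles, fill=None):
    """Delay a per-cycle sequence by `cycles` cycles (a shift register)."""
    line = deque([fill] * cycles)
    out = []
    for value in values:
        line.append(value)
        out.append(line.popleft())
    return out


if __name__ == "__main__":
    info = XdrPinInfo(xdr=4, latency=2)
    pin = serialize([(1, 0, 1, 1), (0, 0, 1, 0)], info)
    # Group the pin stream back into slow cycles to compare with fabric signals.
    per_cycle = [tuple(pin[k:k + info.xdr]) for k in range(0, len(pin), info.xdr)]
    marker = delay(["A", "B", None, None], info.latency)
    for word, tag in zip(per_cycle, marker):
        print(word, tag)  # "A" lines up with (1, 0, 1, 1), "B" with (0, 0, 1, 0)
```

With xdr=2 and latency=1 this model reproduces the DDR contract above, which is one reason to report the latency as a plain number: the same compensation code then works for every gearing.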
Most realistic applications will also need fixed or configurable delay elements, which will also have to be exposed via pin_layout. This is not entirely trivial, because these delay elements often have platform-specific limitations, such as "input and output delay may not be used on the same pin". That will need to be addressed somehow, rather than by always instantiating every possible delay element.
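One way to handle such limitations is to validate each pin's delay request against per-platform rules and instantiate only the elements that were requested. The sketch below is hypothetical throughout: DelayRequest, DelayRules, the tap units, and the 127-tap limit in the example are illustrative assumptions, not any vendor's actual figures.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DelayRequest:
    """Hypothetical per-pin delay request; None means "no delay element"."""
    i_taps: Optional[int] = None
    o_taps: Optional[int] = None


@dataclass(frozen=True)
class DelayRules:
    """Hypothetical platform limitations for delay elements."""
    max_taps: int
    exclusive_in_out: bool  # True if input and output delay can't share a pin


def check_delays(pin: str, req: DelayRequest, rules: DelayRules) -> List[str]:
    """Return a list of problems with `req` (empty if it is acceptable)."""
    problems = []
    for kind, taps in (("input", req.i_taps), ("output", req.o_taps)):
        if taps is not None and not 0 <= taps <= rules.max_taps:
            problems.append(f"{pin}: {kind} delay of {taps} taps is outside 0..{rules.max_taps}")
    if rules.exclusive_in_out and req.i_taps is not None and req.o_taps is not None:
        problems.append(f"{pin}: input and output delay may not be used on the same pin")
    return problems


def elements_to_instantiate(req: DelayRequest) -> List[str]:
    """Only instantiate the delay elements that were actually requested."""
    return [kind for kind, taps in (("input", req.i_taps), ("output", req.o_taps))
            if taps is not None]


if __name__ == "__main__":
    rules = DelayRules(max_taps=127, exclusive_in_out=True)
    print(check_delays("d0", DelayRequest(i_taps=10, o_taps=5), rules))
    print(elements_to_instantiate(DelayRequest(o_taps=3)))  # ['output']
```

Reporting all problems at once, rather than failing on the first, lets a design with several misconfigured pins be fixed in one pass.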
I've implemented the ECP5 buffers. The MachXO2 buffers are basically the same, so I think this is all done! Of course, we can look at more platforms in the future, but I think three families demonstrate the viability of the concept quite well.
See also this IRC discussion.