doc/bus: update
Sebastien Bourdeauducq committed Jul 20, 2013
1 parent 411e6ec commit 0cef983
Showing 3 changed files with 38 additions and 60 deletions.
Binary file removed doc/asmi_topology.dia
Binary file not shown.
Binary file removed doc/asmi_topology.png
Binary file not shown.
98 changes: 38 additions & 60 deletions doc/bus.rst
@@ -5,21 +5,21 @@ Migen Bus contains classes providing a common structure for master and slave int

* Wishbone [wishbone]_, the general purpose bus recommended by Opencores.
* CSR-2 (see :ref:`csr2`), a low-bandwidth, resource-sensitive bus designed for accessing the configuration and status registers of cores from software.
* ASMIbus (see :ref:`asmi`), a split-transaction bus optimized for use with a high-performance, out-of-order SDRAM controller.
* LASMIbus (see :ref:`lasmi`), a bus optimized for use with a high-performance frequency-ratio SDRAM controller.
* DFI [dfi]_ (partial), a standard interface protocol between memory controller logic and PHY interfaces.

.. [wishbone] http://cdn.opencores.org/downloads/wbspec_b4.pdf
.. [dfi] http://www.ddr-phy.org/
It also provides interconnect components for these buses, such as arbiters and address decoders. The strength of the Migen procedurally generated logic can be illustrated by the following example: ::

wbcon = wishbone.InterconnectShared(
self.submodules.wbcon = wishbone.InterconnectShared(
[cpu.ibus, cpu.dbus, ethernet.dma, audio.dma],
[(lambda a: a[27:] == 0, norflash.bus),
(lambda a: a[27:] == 1, wishbone2asmi.wishbone),
(lambda a: a[27:] == 1, wishbone2lasmi.wishbone),
(lambda a: a[27:] == 3, wishbone2csr.wishbone)])

In this example, the interconnect component generates a 4-way round-robin arbiter, multiplexes the master bus signals into a shared bus, and connects all slave interfaces to the shared bus, inserting the address decoder logic in the bus cycle qualification signals and multiplexing the data return path. It can recognize the signals in each core's bus interface thanks to the common structure mandated by Migen Bus. All this happens automatically, using only that much user code. The resulting interconnect logic can be retrieved using ``wbcon.get_fragment()``, and combined with the fragments from the rest of the system.
In this example, the interconnect component generates a 4-way round-robin arbiter, multiplexes the master bus signals into a shared bus, and connects all slave interfaces to the shared bus, inserting the address decoder logic in the bus cycle qualification signals and multiplexing the data return path. It can recognize the signals in each core's bus interface thanks to the common structure mandated by Migen Bus. All this happens automatically, using only that much user code.
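
The address decoders in the example are lambdas that compare the upper bits of the address (``a[27:]``, i.e. bit 27 and above) with a region number. As a rough plain-Python illustration (not Migen code), assuming integer addresses and using the slave names only as labels, the selection rule can be modelled like this:

```python
# Plain-Python model of the address decoding rules from the example.
# On a Migen signal, a[27:] selects bits 27 and up; on a Python int the
# equivalent is a right shift. The slave names are illustrative labels.

def make_decoder(region):
    """Return a predicate matching addresses whose upper bits equal region."""
    return lambda a: (a >> 27) == region

SLAVES = [
    (make_decoder(0), "norflash"),
    (make_decoder(1), "wishbone2lasmi"),
    (make_decoder(3), "wishbone2csr"),
]

def select_slave(address):
    """Return the name of the first slave whose decoder matches, else None."""
    for decoder, name in SLAVES:
        if decoder(address):
            return name
    return None
```

In this model, an address in region 2 matches no decoder, so ``select_slave`` returns ``None``; what the generated hardware does for unmapped addresses is not covered by this sketch.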


Configuration and Status Registers
@@ -46,20 +46,21 @@ Migen Bank is a system comparable to wishbone-gen [wbgen]_, which automates the
Bank takes a description made up of a list of registers and generates logic implementing it with a slave interface compatible with Migen Bus.

A register can be "raw", which means that the core has direct access to it. It also means that the register width must be less than or equal to the bus word width. In that case, the register object provides the following signals:
The lowest-level description of a register is provided by the ``CSR`` class, which maps to the value at a single address on the target bus. The width of the register must be less than or equal to the bus word width. All accesses are atomic. It has the following signal properties as its interface to the user design:

* ``r``, which contains the data written from the bus interface.
* ``re``, which is the strobe signal for ``r``. It is active for one cycle, after or during a write from the bus. ``r`` is only valid when ``re`` is high.
* ``w``, which must provide at all times the value to be read from the bus.

Registers that are not raw are managed by Bank and contain fields. If the sum of the widths of all fields attached to a register exceeds the bus word width, the register will automatically be sliced into words of the maximum size and implemented at consecutive bus addresses, MSB first. Field objects have two parameters, ``access_bus`` and ``access_dev``, determining respectively the access policies for the bus and core sides. They can take the values ``READ_ONLY``, ``WRITE_ONLY`` and ``READ_WRITE``.
If the device can read, the field object provides the r signal, which contains at all times the current value of the field (kept by the logic generated by Bank).
If the device can write, the field object provides the following signals:
Compound CSRs (which are transformed into ``CSR`` plus additional logic for implementation) provide additional features optimized for common applications.

* ``w``, which provides the value to be written into the field.
* ``we``, which strobes the value into the field.
The ``CSRStatus`` class is meant to be used as a status register that is read-only from the CPU. The user design is expected to drive its ``status`` signal. The advantage of using ``CSRStatus`` instead of using ``CSR`` and driving ``w`` is that the width of ``CSRStatus`` can be arbitrary. Status registers larger than the bus word width are automatically broken down into several ``CSR`` registers to span several addresses. Be aware that the atomicity of reads is not guaranteed.

As a special exception, fields that are read-only from the bus and write-only for the device do not use the ``we`` signal. Instead, the device must permanently drive valid data on the ``w`` signal.
The ``CSRStorage`` class provides a memory location that can be read and written by the CPU, and read and optionally written by the design. It can also span several CSR addresses. An optional mechanism for atomic CPU writes is provided; when enabled, writes to the first CSR addresses go to a back-buffer whose contents are atomically copied to the main buffer when the last address is written. When ``CSRStorage`` can be written to by the design, the atomicity of reads by the CPU is not guaranteed.
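
The atomic write mechanism can be illustrated with a small behavioural model in plain Python. This is only a sketch, not the Migen implementation: the class name, the 8-bit default bus width, and the word ordering (most significant word at the lowest address) are assumptions made for the example.

```python
class AtomicStorageModel:
    """Behavioural model of a multi-word storage register with atomic writes.

    Assumptions for illustration: words live at addresses 0..n-1, with the
    most significant word at address 0 and the last address being n-1.
    """

    def __init__(self, width, bus_width=8):
        self.width = width
        self.bus_width = bus_width
        self.n_words = -(-width // bus_width)  # ceiling division
        self.back_buffer = [0] * (self.n_words - 1)
        self.storage = 0  # value visible to the design

    def bus_write(self, address, value):
        value &= (1 << self.bus_width) - 1
        if address < self.n_words - 1:
            # Early words are only staged; the design does not see them yet.
            self.back_buffer[address] = value
        else:
            # Writing the last word commits all words in a single step.
            result = 0
            for word in self.back_buffer + [value]:
                result = (result << self.bus_width) | word
            self.storage = result & ((1 << self.width) - 1)
```

With a 16-bit register on an 8-bit bus, writing ``0x12`` to address 0 leaves ``storage`` unchanged, and the following write of ``0x34`` to address 1 updates it to ``0x1234`` in one step.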

A module can provide bus-independent CSRs by implementing a ``get_csrs`` method that returns a list of instances of the classes described above. Similarly, bus-independent memories can be returned as a list by a ``get_memories`` method.

To avoid listing those manually, a module can inherit from the ``AutoCSR`` class, which provides ``get_csrs`` and ``get_memories`` methods that scan for CSR and memory attributes and return them as lists. If the module has child objects that implement ``get_csrs`` or ``get_memories``, the ``AutoCSR`` methods call them and add their CSRs and memories to the returned lists, with the child objects' names as prefixes.

Generating interrupt controllers
================================
@@ -68,20 +69,21 @@ The event manager provides a systematic way to generate standard interrupt contr
Its constructor takes as parameters one or several *event sources*. An event source is an instance of either:

* ``EventSourcePulse``, which contains a signal ``trigger`` that generates an event when high. The event stays asserted after the ``trigger`` signal goes low, and until software acknowledges it. An example use is to pulse ``trigger`` high for 1 cycle after the reception of a character in a UART.
* ``EventSourceLevel``, which contains a signal ``trigger`` that generates an event on its falling edge. The purpose of this event source is to monitor the status of processes and generate an interrupt on their completion. The signal ``trigger`` can be connected to the ``busy`` signal of a dataflow actor, for example.
* ``EventSourceProcess``, which contains a signal ``trigger`` that generates an event on its falling edge. The purpose of this event source is to monitor the status of processes and generate an interrupt on their completion. The signal ``trigger`` can be connected to the ``busy`` signal of a dataflow actor, for example.
* ``EventSourceLevel``, whose ``trigger`` contains the instantaneous state of the event. It must be set and released by the user design. For example, a DMA controller with several slots can use this event source to signal that one or more slots require CPU attention.

The ``EventManager`` provides a signal ``irq`` which is driven high whenever there is a pending and unmasked event. It is typically connected to an interrupt line of a CPU.

The ``EventManager`` provides a method ``get_registers`` that returns a list of registers to be used with Migen Bank. Each event source is assigned one bit in each of those registers. They are:
The ``EventManager`` provides a method ``get_csrs`` that returns a bus-independent list of CSRs to be used with Migen Bank as explained above. Each event source is assigned one bit in each of those registers. They are:

* ``status``: contains the current level of the trigger line of ``EventSourceLevel`` sources. It is 0 for ``EventSourcePulse``. This register is read-only.
* ``status``: contains the current level of the trigger line of ``EventSourceProcess`` and ``EventSourceLevel`` sources. It is 0 for ``EventSourcePulse``. This register is read-only.
* ``pending``: contains the currently asserted events. Writing 1 to the bit assigned to an event clears it.
* ``enable``: defines which asserted events will cause the ``irq`` line to be asserted. This register is read-write.
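
The interaction between the three registers and the ``irq`` line can be sketched with a cycle-level model in plain Python. The class and method names are hypothetical. The model also assumes that, for a level source, the pending bit simply mirrors ``trigger`` and cannot be cleared by software, which is an interpretation of the description above rather than a statement about the generated logic.

```python
class EventManagerModel:
    """Cycle-level model of the status, pending and enable registers."""

    def __init__(self, kinds):
        # kinds: one of "pulse", "process" or "level" per event source.
        self.kinds = list(kinds)
        n = len(self.kinds)
        self.trigger = [0] * n
        self.pending = [0] * n
        self.enable = [0] * n

    def tick(self, triggers):
        """Advance one clock cycle with the given trigger values."""
        for i, kind in enumerate(self.kinds):
            new = int(bool(triggers[i]))
            old = self.trigger[i]
            if kind == "pulse":
                if new:
                    self.pending[i] = 1  # latched until acknowledged
            elif kind == "process":
                if old and not new:
                    self.pending[i] = 1  # falling edge of trigger
            else:  # "level": assumed to mirror the trigger directly
                self.pending[i] = new
            self.trigger[i] = new

    def status(self):
        """Current trigger level, forced to 0 for pulse sources."""
        return [0 if k == "pulse" else t
                for k, t in zip(self.kinds, self.trigger)]

    def write_pending(self, bits):
        """Writing 1 to a pending bit clears it (assumed no-op for level)."""
        for i, bit in enumerate(bits):
            if bit and self.kinds[i] != "level":
                self.pending[i] = 0

    def irq(self):
        """High whenever an event is both pending and enabled."""
        return any(p and e for p, e in zip(self.pending, self.enable))
```

A typical interrupt handler in this model would read ``pending``, service the sources whose bits are set, and call ``write_pending`` with those bits to acknowledge them.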

.. _asmi:
.. _lasmi:

Advanced System Memory Infrastructure
*************************************
Lightweight Advanced System Memory Infrastructure
*************************************************

Rationale
=========
@@ -103,58 +105,38 @@ The first two techniques are explained with more details in [drreorder]_.

.. [drreorder] http://www.xilinx.com/txpatches/pub/documentation/misc/improving%20ddr%20sdram%20efficiency.pdf
To enable the efficient implementation of these mechanisms, a new communication protocol with the memory controller must be devised. Migen and Milkymist SoC (-NG) implement their own bus, called ASMIbus, based on the split-transaction principle.

Topology
========
The ASMI consists of a memory controller (e.g. ASMIcon) containing a hub that connects the multiple masters, handles transaction tags, and presents a view of the pending requests to the rest of the memory controller.

Each master has a number of dedicated transaction slots allocated inside the hub. Each slot is assigned a tag, that is later used in the data transfer to identify the slot the data belongs to.

It is suggested that memory controllers use an interface to a PHY compatible with DFI [dfi]_. The DFI clock can be the same as the ASMIbus clock, with optional serialization and deserialization taking place across the PHY, as specified in the DFI standard.

.. figure:: asmi_topology.png
:scale: 85 %
Migen and milkymist-ng implement their own bus, called LASMIbus, that features the last two techniques. Grouping by row had been previously explored with ASMI, but difficulties in achieving timing closure at reasonable latencies in FPGA combined with uncertain performance pay-off for some applications discouraged work in that direction.

ASMI topology.

Signals
=======
The ASMIbus consists of two parts: the control signals, and the data signals.

The control signals are used to issue requests.

* Master-to-Hub:

* ``adr`` communicates the memory address to be accessed. The unit is the word width of the particular implementation of ASMIbus.
* ``we`` is the write enable signal.
* ``stb`` qualifies the transaction request, and should be asserted until ``ack`` goes high.
Topology and transactions
=========================
The LASMI consists of one or several memory controllers (e.g. LASMIcon from milkymist-ng), multiple masters, and a crossbar interconnect.

* Hub-to-Master
Each memory controller can expose several bank machines to the crossbar. This way, requests to different SDRAM banks can be processed in parallel.

* ``tag_issue`` is an integer representing the transaction ("tag") attributed by the hub. The width of this signal is determined by the maximum number of in-flight transactions that the hub port can handle.
* ``ack`` is asserted when ``tag_issue`` is valid and the transaction has been registered by the hub. A hub may assert ``ack`` even when ``stb`` is low, which means it is ready to accept any new transaction and will do as soon as ``stb`` goes high.
Transactions on LASMI work as follows:

The data signals are used to complete requests.
1. The master presents a valid address and write enable signals, and asserts its strobe signal.
2. The crossbar decodes the bank address and, in a multi-controller configuration, the controller address, then connects the master to the appropriate bank machine.
3. The bank machine acknowledges the request from the master. The master can immediately issue a new request to the same bank machine, without waiting for data.
4. The bank machine sends data acknowledgements to the master, in the same order as it issued requests. After receiving a data acknowledgement, the master must either:

* Hub-to-Master
* present valid data after a fixed number of cycles (for writes). Masters must hold their data lines at 0 at all other times so that they can be simply ORed for each controller to produce the final SDRAM write data.
* sample the data bus after a fixed number of cycles (for reads).

* ``tag_call`` is used to identify the transaction for which the data is "called". It takes the tag value that has been previously attributed by the hub to that transaction during the issue phase.
* ``call`` qualifies ``tag_call``.
* ``data_r`` returns data from the DRAM in the case of a read transaction. It is valid for one cycle after CALL has been asserted and ``tag_call`` has identified the transaction. The value of this signal is undefined for the cycle after a write transaction data have been called.
5. In a multi-controller configuration, the crossbar multiplexes write and data signals to route data to and from the appropriate controller.
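
The rule that masters hold their write data lines at 0 when idle is what makes the simple OR combination work. A minimal plain-Python sketch, with a hypothetical function name and example values:

```python
from functools import reduce
from operator import or_

def combine_write_data(master_outputs):
    """OR together the write-data lines of all masters for one controller."""
    return reduce(or_, master_outputs, 0)

# Only the second master is presenting write data in this cycle; the others
# hold their lines at 0, so the OR yields that master's data unchanged.
outputs = [0x00000000, 0xDEADBEEF, 0x00000000]
sdram_write_data = combine_write_data(outputs)
```

If an idle master drove a nonzero value, the OR would corrupt the write data, which is why the protocol requires the lines to be held at 0.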

* Master-to-Hub
When there are queued requests (i.e. more request acknowledgements than data acknowledgements), the bank machine asserts its ``lock`` signal, which freezes the crossbar connection between the master and the bank machine. This simplifies two problems:

* ``data_w`` must supply data to the controller from the appropriate write transaction, on the cycle after they have been called using ``call`` and ``tag_call``.
* ``data_wm`` are the byte-granular write data masks. They are used in combination with ``data_w`` to identify the bytes that should be modified in the memory. The ``data_wm`` bit should be low for its corresponding ``data_w`` byte to be written.
#. Determining to which master a data acknowledgement from a bank machine should be sent.
#. Having to deal with a master queuing requests into multiple different bank machines which may collectively complete them in a different order than the master issued them.
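
The condition under which ``lock`` is asserted can be modelled by counting acknowledgements. The following plain-Python sketch uses a hypothetical class name and only tracks the two counters described above; it does not model the crossbar itself.

```python
class BankMachineLockModel:
    """Tracks outstanding requests to decide when ``lock`` is asserted."""

    def __init__(self):
        self.req_acks = 0  # request acknowledgements sent so far
        self.dat_acks = 0  # data acknowledgements sent so far

    def request_ack(self):
        """The bank machine acknowledges a request from the master."""
        self.req_acks += 1

    def data_ack(self):
        """The bank machine sends a data acknowledgement to the master."""
        if self.dat_acks >= self.req_acks:
            raise RuntimeError("data acknowledgement without a queued request")
        self.dat_acks += 1

    @property
    def lock(self):
        # Asserted while there are more request acks than data acks.
        return self.req_acks > self.dat_acks
```

While ``lock`` is true, the master stays connected to the same bank machine, so every data acknowledgement from it belongs to that master.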

In order to avoid duplicating the tag matching and tracking logic, the master-to-hub data signals must be driven low when they are not in use, so that they can be simply ORed together inside the memory controller. This way, only masters have to track (their own) transactions for arbitrating the data lines.
For each master, the memory system completes transactions in order. Reordering may only occur between masters: for example, a request that hits an open page may complete sooner than an earlier request from another master that requires a precharge/activate cycle on another bank.

Tags represent in-flight transactions. The hub can reissue a tag as soon as the cycle when it appears on ``tag_call``.
It is suggested that memory controllers use an interface to a PHY compatible with DFI [dfi]_. The DFI clock can be the same as the LASMIbus clock, with optional serialization and deserialization taking place across the PHY, as specified in the DFI standard.

SDRAM burst length and clock ratios
===================================
A system using ASMI must set the SDRAM burst length B, the ASMIbus word width W and the ratio between the ASMIbus clock frequency Fa and the SDRAM I/O frequency Fi so that all data transfers last for exactly one ASMIbus cycle.
A system using LASMI must set the SDRAM burst length B, the LASMIbus word width W and the ratio between the LASMIbus clock frequency Fa and the SDRAM I/O frequency Fi so that all data transfers last for exactly one LASMIbus cycle.

More explicitly, these relations must be verified:

@@ -163,7 +145,3 @@ B = Fi/Fa
W = B*[number of SDRAM I/O pins]

For DDR memories, the I/O frequency is twice the logic frequency.
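
As a worked example of the two relations, the following plain-Python sketch computes B and W from the clock frequencies and the number of I/O pins. The function name and the numbers (a 100 MHz bus clock, a DDR memory with 16 data pins) are hypothetical.

```python
from fractions import Fraction

def lasmi_parameters(bus_clock_hz, sdram_io_hz, io_pins):
    """Return (B, W) such that B = Fi/Fa and W = B * io_pins."""
    ratio = Fraction(sdram_io_hz, bus_clock_hz)
    if ratio.denominator != 1:
        raise ValueError("Fi/Fa must be an integer burst length")
    burst_length = ratio.numerator
    return burst_length, burst_length * io_pins

# Hypothetical DDR system: the bus clock equals the SDRAM logic clock
# (100 MHz), so the I/O frequency is twice that; the memory has 16 I/O pins.
logic_hz = 100_000_000
b, w = lasmi_parameters(logic_hz, 2 * logic_hz, 16)
```

Here B is 2 and W is 32 bits, so each bus cycle carries exactly one burst of two 16-bit transfers.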

Using ASMI with Migen
=====================
TODO: please document me!
