f-cpu/vhdl/registers/README.txt
created Mon Jun 24 00:31:19 CEST 2002 by whygee@f-cpu.org

The Register Set, also called "R7" for short
(because 7 is pronounced "set" in French...),
is where the pipeline starts and ends, so it is
critical for building the "execution core".


Some basics :

The register set is a synchronous SRAM with 3 data
read ports and 2 data write ports. (It seems that
an asynchronous SRAM uses much less power, but it
is not easily synthesised because of timing
constraints and the single clock domain.) There are
64 registers, each MAXSIZE bits wide, and register 0
is hardwired to zero.
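
As an illustration only, here is a minimal
behavioural sketch of such an array. It is NOT the
actual sram3r2w code : the entity name, the port
names and the MAXSIZE default are hypothetical
(the real MAXSIZE comes from FCPU_config), and the
one-cycle read latency is modelled by simply
registering the read data.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

entity r7_sketch is
  generic (
    MAXSIZE : natural := 64  -- assumed default, normally from FCPU_config
  );
  port (
    clk : in std_logic;
    -- 3 read ports (hypothetical names)
    r_addr0, r_addr1, r_addr2 : in  std_logic_vector(5 downto 0);
    r_data0, r_data1, r_data2 : out std_logic_vector(MAXSIZE-1 downto 0);
    -- 2 write ports (hypothetical names)
    w_addr0, w_addr1 : in std_logic_vector(5 downto 0);
    w_data0, w_data1 : in std_logic_vector(MAXSIZE-1 downto 0)
  );
end entity r7_sketch;

architecture behav of r7_sketch is
  type reg_array is array (0 to 63)
    of std_logic_vector(MAXSIZE-1 downto 0);
  -- all cells start at zero; cell 0 is never written, so it stays zero
  signal regs : reg_array := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      -- synchronous read: data is available one cycle after the address
      r_data0 <= regs(to_integer(unsigned(r_addr0)));
      r_data1 <= regs(to_integer(unsigned(r_addr1)));
      r_data2 <= regs(to_integer(unsigned(r_addr2)));

      -- writes: register number 0 means "nothing happens"
      if unsigned(w_addr0) /= 0 then
        regs(to_integer(unsigned(w_addr0))) <= w_data0;
      end if;
      -- if both ports target the same register, port 1 wins here
      -- (an arbitrary choice of this sketch)
      if unsigned(w_addr1) /= 0 then
        regs(to_integer(unsigned(w_addr1))) <= w_data1;
      end if;
    end if;
  end process;
end architecture behav;
```

In this model a value written at one clock edge
can be read back starting from the next edge, and
a read of register 0 always returns zero.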

However it is a bit more complex than that
because it has to maintain a certain number
of flags that are read by the instruction
decoder (the famous "condition" flags).
Though the "big part" of this "unit" is the
main SRAM array, the complex part is where
the flags are updated and handed to the decoder.

So in fact, because the flags have to stay coherent
with the register contents, the flag update is
performed inside a "black box", namely this "unit".


Characteristics :

The register set unit has one cycle of read latency
between the time the address is applied to a read
port and the time the data is available on that port.

Similarly, R7 has to wait for one cycle when writing,
before a new read cycle can be started again.
This means that one cycle elapses between the time
the address and data are provided, and the time the
data is actually stable in the cells.

To avoid coherency, scheduling or timing problems,
the Xbar implements a specific, 2-level bypass
network. To this end, the inputs and outputs of the
R7 are NOT latched (the only memory is in the arrays).
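
To give an idea of what a 2-level bypass can look
like, here is a hedged sketch of one operand path.
It is an assumption, not the actual Xbar code : the
entity and port names are hypothetical, and "new"
and "old" stand for the two most recent results
that may not yet be readable from the R7.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity bypass_sketch is
  generic (
    MAXSIZE : natural := 64  -- assumed default, normally from FCPU_config
  );
  port (
    -- register requested by the instruction, and what the R7 returned
    rd_addr  : in  std_logic_vector(5 downto 0);
    r7_data  : in  std_logic_vector(MAXSIZE-1 downto 0);
    -- level 1: most recent result (hypothetical names)
    new_addr : in  std_logic_vector(5 downto 0);
    new_data : in  std_logic_vector(MAXSIZE-1 downto 0);
    -- level 2: result from one cycle earlier (hypothetical names)
    old_addr : in  std_logic_vector(5 downto 0);
    old_data : in  std_logic_vector(MAXSIZE-1 downto 0);
    -- operand handed to the execution units
    operand  : out std_logic_vector(MAXSIZE-1 downto 0)
  );
end entity bypass_sketch;

architecture behav of bypass_sketch is
  constant R0 : std_logic_vector(5 downto 0) := (others => '0');
begin
  -- purely combinational; the newest matching result has priority.
  -- Register 0 is never bypassed: it always reads as zero from the R7.
  operand <= r7_data  when rd_addr = R0       else
             new_data when rd_addr = new_addr else
             old_data when rd_addr = old_addr else
             r7_data;
end architecture behav;
```

Since a write to register 0 means "no write", the
same address comparison also covers the case where
a bypass level holds no valid result.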

The F-CPU data format has also been recently changed :
when not running in SIMD mode, the MSBs are cleared
(either by the EUs or Xbar). This greatly simplifies
the R7's design because it doesn't have to keep
partial OR results. Another consequence of the
"no partial writes" policy is that the write enable
flags are not necessary anymore (this changes the
interface a bit) and the SRAM blocks can be composed
and arranged more freely (no 8-8-15-15-15 structure
anymore). The scheduler is also affected (simplified).
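
As an example of why full-width writes help,
consider a "zero" condition flag (the choice of
this flag, and all names below, are assumptions
made for illustration). With no partial writes,
the flag is a single OR over the whole written
word, and no per-chunk OR result has to be stored
and recombined later.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity zero_flag_sketch is
  generic (
    MAXSIZE : natural := 64  -- assumed default, normally from FCPU_config
  );
  port (
    w_data  : in  std_logic_vector(MAXSIZE-1 downto 0);  -- full-width write data
    is_zero : out std_logic                              -- '1' when all bits are '0'
  );
end entity zero_flag_sketch;

architecture behav of zero_flag_sketch is
begin
  process (w_data)
    variable any_bit : std_logic;
  begin
    -- OR-reduce the whole word in one go
    any_bit := '0';
    for i in w_data'range loop
      any_bit := any_bit or w_data(i);
    end loop;
    is_zero <= not any_bit;
  end process;
end architecture behav;
```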

A side effect is that the write_enable signal
is considered always ON, and we rely on the fact
that nothing happens when the register number is 0.
This is a bit weird but it further simplifies the
rest of the scheduler.


Hierarchy :

The toplevel is register_set.vhdl, which includes
 - FCPU_config (definition of the user parameters)
 - sram3r2w    (the generic main SRAM array)
     --> entity in sram3r2w.vhdl and
         architecture in sram3r2w_simple.vhdl
 - sram2r1w    (another generic SRAM array
              but smaller, for the flag cache)


PROBLEM : which bit is the sign bit connected to ?
Currently it is on R7_read_port_0(MAX_CHUNK_SIZE-1)
but I doubt that this is a wise decision !
