f-cpu/vhdl/xbar/README.txt
created Thu Jun 20 07:02:44 CEST 2002
by whygee@f-cpu.org

The "Xbar" entity connects all the execution units to the
register set and is the place where we can do bypasses.
It takes a whole cycle because it uses a fair number of
wires (it contains the 3 read ports and the 2 write ports).
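
As a rough illustration of what "doing bypasses" means here,
the following Python sketch models the read ports for one
cycle. It is only a toy behavioural model, not the F-CPU VHDL:
the function name xbar_read, the register count and the rule
that the last matching write port wins are all assumptions
made for the example. Only the port counts come from the
text above.

```python
# Toy behavioural model of the Xbar read ports with bypass.
# Assumptions (not taken from the F-CPU sources): the register
# count, all names, and "last matching write port wins".

NUM_REGS = 64          # assumed register count, for illustration only
NUM_READ_PORTS = 3     # from the text above
NUM_WRITE_PORTS = 2    # from the text above


def xbar_read(regs, read_addrs, writes):
    """Return the values seen on the read ports during one cycle.

    regs       -- list of current register values
    read_addrs -- register numbers requested on the read ports
    writes     -- (register number, value) pairs on the write ports,
                  i.e. results that have not reached the register set yet
    """
    assert len(read_addrs) <= NUM_READ_PORTS
    assert len(writes) <= NUM_WRITE_PORTS
    out = []
    for addr in read_addrs:
        value = regs[addr]              # default: take the stored value
        for waddr, wvalue in writes:    # bypass: a pending result wins
            if waddr == addr:
                value = wvalue
        out.append(value)
    return out


regs = list(range(100, 100 + NUM_REGS))
# r5 is being written with 42 in this cycle, so the read of r5 is bypassed.
print(xbar_read(regs, [5, 6, 7], [(5, 42), (9, 7)]))   # [42, 106, 107]
```

Each read port is just a selection between the stored value
and the pending results, which is the multiplexing job the
Xbar performs.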

However, contrary to what its name suggests, the Xbar
is not exactly a "crossbar" but rather a large and distributed
multiplexor. The original idea was to allocate some die area
for these wires, but real ASIC implementations will certainly
route the Xbar on the upper metal layers. Here's why.


Some layout basics:

The Execution Units are likely to have a very high height/length
(H/L) aspect ratio. While the logic that computes a single bit
can seem pretty long (L), it must be replicated for at least
64 bits in parallel (H), and this can make the wires that go
from bit 0 to bit 63 very long.

(one bit strip : FF FF FF ############## FF FF
 one pipeline stage :
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
 F#F
..... 
)

If you include the pipeline latches, the height/length ratio
is somewhere around 10. It seems logical to build the pipeline
with a series of these slices, one slice per EU. The slices
communicate with each other and with the register set through
wires routed over the EUs. This keeps the wires as short
as possible and saves some die area compared to the
initial approach (that you can still see on the schematic
views).

A further enhancement factors out the pipeline gates of two
neighbouring EUs. The consequence is that the "input" of
an EU cannot be on the same "side" as its "output".
The slices/EUs are arranged side by side, every second
unit being mirrored to share the FF of its two neighbours.
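
The effect of the mirroring can be checked with a small
Python sketch. It is a toy model under stated assumptions:
the names place and boundaries are hypothetical, each slice
is reduced to a "left side" and a "right side", and the
choice that the first slice has its input on the left is
arbitrary. The point it illustrates is that, once every
second slice is mirrored, two neighbours always face each
other with sides of the same kind, so one column of FF can
serve both.

```python
# Toy layout model: every second slice is mirrored.
# Assumption: the orientation of the first slice is arbitrary.

UNITS = ["Xbar/bypass", "rop2", "INC", "ASU", "SHL", "POPC", "BIST", "IDIV"]


def place(units):
    """Return (name, left_side, right_side) for each slice, left to right."""
    placed = []
    for i, name in enumerate(units):
        if i % 2 == 0:
            placed.append((name, "in", "out"))   # normal orientation
        else:
            placed.append((name, "out", "in"))   # mirrored slice
    return placed


def boundaries(placed):
    """Return the kind of side shared at each boundary between neighbours."""
    kinds = []
    for (_, _, right), (_, left, _) in zip(placed, placed[1:]):
        # With alternating mirroring, the facing sides are the same kind.
        assert right == left
        kinds.append(right)
    return kinds


print(boundaries(place(UNITS)))
# ['out', 'in', 'out', 'in', 'out', 'in', 'out']
```

Without the mirroring, every boundary would put an output
side against an input side and the assertion in boundaries
would fail.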


The layout looks like this:

Register set #<Xbar/bypass<#>rop2>#<INC<#>ASU>#<SHL<#>POPC>#<BIST<#>IDIV

and the multiplier (which is too fat to fit in a slice)
is placed in parallel with this layout.
