f-cpu/c/jaap.txt
notes about fcpusim by: Jaap Stolk (JWS) jwstolk@yahoo.com
version:
Sun Jul 21 04:12:04 CEST 2002 JWS: updated.

-notes are in no particular order.
-feel free to comment

status: %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 - most test_xxxx.c files are broken (but not needed for fcpusim)
 - very limited working of some units
 - add (and sunn,inc,dec) works
 - bypass should work (i haven't tried it yet)
 - a bit of a scheduler ( detects read register and write bus stalls)
 - results are not written back to the registers
 - ... 

compile the simulator:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 - there is no real need anymore to add my files to one of YG's
   snapshots. all needed include files are in the JWS snapshot.
 - run: runme.sh in the /f-cpu/c/fcpusim/  directory

 - if you need to modify the configuration files, 
   you will have to use the scripts from the YG snapshot

   i'm thinking of a better way of maintaining two snapshots

use the simulator:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 run: /f-cpu/c/fcpusim/fcpusim

 it uses a text-based interface. after each cycle you can select one of the units
 to look at. press <enter> to simulate the next cycle. (the selected view is
 updated automatically.)

 after every cycle you will be asked by the fetcher unit to enter a (hex!)
 instruction for the given IP. i will read them from a file soon.
 first the simulator clears the pipelines (just enter 0x0 for the instructions).
 it's not using the BIST unit yet.
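
 a minimal sketch of how the hex input could be checked before it is handed to
 the fetcher. the function name and the 32-bit instruction width are
 assumptions for illustration, this is not the code fcpusim actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper, not part of fcpusim: parse one line of user input
 * as a hexadecimal instruction word (assumed to be 32 bits wide).
 * Returns 0 on success, -1 on malformed input. */
int parse_hex_instruction(const char *line, uint32_t *out)
{
    char *end;
    unsigned long v;

    assert(line != NULL && out != NULL);

    errno = 0;
    v = strtoul(line, &end, 16);   /* base 16 accepts an optional "0x" prefix */
    if (end == line || errno == ERANGE || v > 0xFFFFFFFFul)
        return -1;                 /* no digits at all, or value too large */

    while (*end == ' ' || *end == '\t' || *end == '\r' || *end == '\n')
        end++;                     /* allow trailing whitespace / newline */
    if (*end != '\0')
        return -1;                 /* trailing garbage after the number */

    *out = (uint32_t)v;
    return 0;
}
```

 both "0x0" and "0" are accepted, so the pipeline-clearing input described above
 works with or without the prefix.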

 enter the hex code for the example instruction.
 look at the fetcher unit (f), it outputs the new instruction
 look at the register unit (r), it inputs the three opcodes

 press enter (and enter 0x0 for the next instruction)
 look at the register unit (r), it outputs the register values
 the decoder has done its job in the same cycle
 look at the asu unit (a), it inputs the instruction (ADD) and flags, but no
 input data yet.
 look at the xbar unit (x), xbar_R0_nr=0, which indicates a normal register read.

 ( look at the scheduler(s) )
 you can now cause a stall by reading a "pending output register" or by using all
 available write ports on the xbar.

 press enter (and enter 0x0 for the next instruction)
 look at the asu unit (a), the ADD instruction has moved into its pipeline, and the
 data is available on the input.

 press enter (and enter 0x0 for the next instruction)
 look at the asu unit (a), the first stage of the ASU unit is now completed, but for
 this 64bit ADD it will take two cycles.

 press enter (and enter 0x0 for the next instruction)
 look at the asu unit (a), the second stage of the ASU unit is now completed, and the
 result (asu_out1=carry) appears on the output port of the asu unit.

 now after all that work, the result is forgotten, as there is no scheduler yet to
 indicate when the results must be saved to the registers.....





 some ramblings about a simulator: 

I seem to have confused the D-latches between the pipeline stages
with flip-flops, sorry, i'll try and correct it later.

how it works:  %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

the simulator simulates each unit of the f-cpu. every unit has input and
output values. the simulator runs all units (in a random order) and then
connects all units by copying output values to input values. this
copying-stage simulates the Flip-Flops in the f-cpu. if a unit takes more
than one stage, it has internal Flip-Flops/values as well.
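
the run-then-copy scheme above can be sketched with two toy units. this is
only an illustration under my own assumptions: the unit names, the unit_t
layout and the wiring are made up and are not taken from the fcpusim sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only; these names are hypothetical, not fcpusim's. */
typedef struct {
    uint64_t in;   /* value latched by the last copying stage */
    uint64_t out;  /* value computed during the current cycle */
} unit_t;

enum { U_INC, U_DBL, N_UNITS };

static void run_inc(unit_t *u) { u->out = u->in + 1; }
static void run_dbl(unit_t *u) { u->out = u->in * 2; }

/* Simulate one clock cycle. `reverse` swaps the evaluation order to show
 * that the order in which the units are run does not matter. */
void cycle(unit_t units[N_UNITS], uint64_t feed, int reverse)
{
    assert(units != NULL);

    /* phase 1: every unit reads only its own latched input */
    if (reverse) {
        run_dbl(&units[U_DBL]);
        run_inc(&units[U_INC]);
    } else {
        run_inc(&units[U_INC]);
        run_dbl(&units[U_DBL]);
    }

    /* phase 2: the copying stage (the "flip-flops"),
     * outputs of this cycle become inputs of the next one */
    units[U_DBL].in = units[U_INC].out;
    units[U_INC].in = feed;
}
```

because no unit ever reads another unit's `out` during phase 1, running the
units in any order gives the same state after the copy.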

detecting / counting / visualising pipeline stalls would be handy for
optimising code. (it also needs to show the reason that caused the stall)
this information could even be used for automatic optimisation ( in a
compiler ??)
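
a rough sketch of what the stall counting could look like. the two stall
reasons are the ones the scheduler already detects (pending register read,
write bus full); the enum, struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Stall reasons come from the status notes; the identifiers are made up. */
typedef enum {
    STALL_NONE,          /* the instruction was issued normally */
    STALL_PENDING_REG,   /* a source register is still a pending output */
    STALL_WRITE_BUS,     /* all xbar write ports are already in use */
    STALL_REASONS
} stall_reason_t;

typedef struct {
    unsigned long cycles;                 /* total simulated cycles */
    unsigned long count[STALL_REASONS];   /* cycles per reason */
} stall_stats_t;

/* Call once per simulated cycle with the reason reported by the scheduler. */
void stall_record(stall_stats_t *s, stall_reason_t why)
{
    assert(s != NULL && (unsigned)why < STALL_REASONS);
    s->cycles++;
    s->count[why]++;
}

/* Print one line per reason, followed by the cycle total. */
void stall_report(const stall_stats_t *s, FILE *fp)
{
    static const char *const name[STALL_REASONS] = {
        "no stall", "pending register", "write bus full"
    };
    for (int i = 0; i < STALL_REASONS; i++)
        fprintf(fp, "%-18s %lu\n", name[i], s->count[i]);
    fprintf(fp, "%-18s %lu\n", "total cycles", s->cycles);
}
```

keeping the reason next to the count is the part that matters for
optimising code: a total stall count alone does not say what to change.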

 
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

 f = fetcher
 i = inc
 2 = rop2
 a = asu
   = shl
 s = scheduler 
 p = popc
 b = bist
 v = div
 m = mul
   = sr's (i really want to use "s" for shuffle and "r" for registers)
 x = xbar
 r = registers ( R/W ports)
 d = data unit/mem
 c = code unit/mem
 t = tlb's (data and code?)

 e = decoder + scheduler ?

speed:
i might optimise the code a bit more, but the only reason will be to be able to
test run programs in less than an hour.
this simulator is not intended to be fast, its written to test/try different
configurations of the f-cpu, and find bottlenecks, optimal TLB size, etc.
it will be possible to add/remove optional EU's, and it should be possible to
run it with 128 / 256 bit registers.
also testing different configurations of the X-bar would be interesting.
(sharing Xbar ports between different EU's) or even experiment with the
number of read/write busses of the Xbar ?

every simulated unit must be cross tested with the corresponding VHDL unit

it should be possible to run the simulator at maximum speed by not selecting
any particular status view. also run until next call/ret would be nice.
(or even put breakpoints on the use of register / memory / units / etc)

the copying stage that simulates the ff might be removed, if the units are
run in the correct order, but at some point (xbar?) they are still needed
to close the loop.

the actual f-cpu has no ff for the register unit, i change the simulation to
work the same way.

it would be nice to mix C and VHDL units, this could be done if the VHDL units
read/write their input/output ff from/to files that are read by the simulator.

a "save_state" function might be nice. if it's saved in a compact but readable
text format, it could be e-mailed, and someone else could look at that
situation? ( -> copy/paste of the terminal text would do the same job ?)
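
one possible shape for such a text format, as a sketch only: one line per
unit, with the unit name, the cycle number and the ff words in hex. the
format and the function name are my assumptions, nothing like this exists
in the simulator yet.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical line format: "<unit> <cycle> <hex word> <hex word> ...".
 * Returns the number of characters written (excluding the terminating
 * NUL), or -1 if buf is too small to hold the whole line. */
int save_state_line(char *buf, size_t len, const char *unit,
                    unsigned long cycle, const uint64_t *ff, size_t n)
{
    int used;

    assert(buf != NULL && unit != NULL);

    used = snprintf(buf, len, "%s %lu", unit, cycle);
    if (used < 0 || (size_t)used >= len)
        return -1;

    for (size_t i = 0; i < n; i++) {
        size_t room = len - (size_t)used;
        int w = snprintf(buf + used, room, " %016" PRIx64, ff[i]);
        if (w < 0 || (size_t)w >= room)
            return -1;             /* line was truncated */
        used += w;
    }
    return used;
}
```

fixed-width hex keeps the columns aligned, so two saved states can be
compared with a plain diff.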

what programs i would like to run on this simulator:
- things like the Winograd DCT algorithm, and other critical routines
  i.e. things that today's (and tomorrows) programs spend most cycles on
- L4 (or other micro kernels)
- programs to test different IRQ I/O designs

i could add a history buffer (ff status for the last 100 cycles), so we
could trace backward and find the cause of a pipeline stall.
(when walking back, the ff states are only changed, the units are not run)
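
the history buffer could be a simple ring buffer of ff snapshots. a sketch,
assuming a made-up ff_state_t with four words; the real ff state would be
whatever the copying stage moves around each cycle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HISTORY_DEPTH 100   /* "last 100 cycles" from the note above */
#define FF_WORDS      4     /* hypothetical size of the ff state */

typedef struct { uint64_t ff[FF_WORDS]; } ff_state_t;

typedef struct {
    ff_state_t slot[HISTORY_DEPTH];
    unsigned   head;    /* index where the next snapshot is written */
    unsigned   count;   /* valid snapshots, at most HISTORY_DEPTH */
} history_t;

void history_init(history_t *h) { memset(h, 0, sizeof *h); }

/* Store a snapshot, overwriting the oldest one once the buffer is full. */
void history_push(history_t *h, const ff_state_t *s)
{
    assert(h != NULL && s != NULL);
    h->slot[h->head] = *s;
    h->head = (h->head + 1) % HISTORY_DEPTH;
    if (h->count < HISTORY_DEPTH)
        h->count++;
}

/* Return the snapshot taken `back` cycles ago (0 = most recent), or NULL
 * if that cycle is not in the buffer. Only stored state is returned,
 * no unit is re-run. */
const ff_state_t *history_get(const history_t *h, unsigned back)
{
    if (back >= h->count)
        return NULL;
    return &h->slot[(h->head + HISTORY_DEPTH - 1 - back) % HISTORY_DEPTH];
}
```

walking back is then just history_get(&h, 1), history_get(&h, 2), ... until
the stall's cause shows up or NULL is returned.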

the simulator needs to show the ASM (and C?) code that is executed, as well as
(parts of) memory.

would it be possible to add a <script> </script> tag to the c files and turn
it into an on-line simulator ?? (i.e. type a few binary instructions and see
them flow through the f-cpu ?).

at some point the simulator needs to be connected to (simulated?) I/O
to start with: a serial port (for console).

the simulator also needs to test the power-up (random start) sequence!
this should be done by the BIST unit.

also show a TLB status screen.

