F-CPU Architecture: Register Organization
=========================================

Introduction
------------

One of the most important features defining a CPU architecture is its internal register organization, and the F-CPU
architecture breaks with tradition here: we have chosen a memory-to-memory architecture (with some twists).

One of the most important advantages of this architecture is that any patents relating to register-register
architectures are automatically avoided. Also, we are innovating, and innovation has value in itself.

But the choice of a memory-to-memory architecture was also motivated by technical and performance reasons:

1) We wanted to avoid the usual cycle where variables are loaded from memory into registers, then processed, then returned to
memory. These memory-to-CPU-to-memory cycles have three disadvantages: a) they increase the latency for processing any variable,
b) they do no useful work, yet increase the number of instructions needed for even the simplest assignment and c) the compiler
has to work harder at optimizing away these register load/unload cycles.

2) The presence of large, fast L1 caches inside CPUs and the constant improvements in VLSI technology allow more choices when it
comes to an efficient memory hierarchy, compared to the "traditional" memory hierarchy in Von Neumann machines:
main memory <---> caches <---> CPU registers. In fact, we skip an entire level, since all we have now is: main memory <---> caches.

3) A large register set means a longer context switch latency, because of the time needed to save the clobbered registers.
Without a large register set to save, the issue does not arise.

4) The issue of register windows has been previously researched and the conclusion is that the benefits brought by this technique
are independent of either the CISC or RISC basic architectural choice. So we decided to include this "twist" in our memory-to-memory
architecture. In fact, it's very easy to implement a "memory window" feature on top of a memory-to-memory architecture.

5) Since all data move instructions are now basically memory-to-memory move instructions, the resulting instruction set is simpler
and more orthogonal. The general purpose registers are _truly_ general purpose. This greatly simplifies the user-visible machine.

6) We also have the advantages of low register pressure and graceful performance degradation in those special cases where many
variables are needed.

7) The internal CPU architecture is simpler: the L0 data cache and the ALU are connected directly to the internal CPU bus. The
control unit structure is also simpler. We get the advantages of a large register set without spending any silicon real estate on one.

8) We can now tie our basic CPU clock frequency to how fast we can make the L0 data cache operate. This simplifies the basic job of
the F-1 VLSI implementation team.
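
To make point 1 concrete, here is a minimal sketch that counts the instructions needed for ``c = a + b`` on a toy load/store
machine and on a toy memory-to-memory machine. The mnemonics, the register names and the dictionary-based memory are all
hypothetical and exist only for this illustration; they are not F-CPU instructions.

```python
def run_load_store(mem):
    """Compute c = a + b on a hypothetical load/store machine."""
    regs = {}
    program = [
        ("load", "r1", "a"),        # memory -> register
        ("load", "r2", "b"),        # memory -> register
        ("add", "r3", "r1", "r2"),  # register-register operation
        ("store", "c", "r3"),       # register -> memory
    ]
    for op, dst, *src in program:
        if op == "load":
            regs[dst] = mem[src[0]]
        elif op == "add":
            regs[dst] = regs[src[0]] + regs[src[1]]
        elif op == "store":
            mem[dst] = regs[src[0]]
    return len(program)  # number of instructions executed


def run_mem_to_mem(mem):
    """Compute c = a + b on a hypothetical memory-to-memory machine."""
    program = [("add", "c", "a", "b")]  # operands are memory locations
    for _op, dst, x, y in program:
        mem[dst] = mem[x] + mem[y]
    return len(program)  # number of instructions executed
```

Both functions leave the same value in ``mem["c"]``; the load/store version executes four instructions and the memory-to-memory
version executes one.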

Register Organization Implementation
------------------------------------

The F-CPU architecture has the standard 64-bit Program Counter, or instruction pointer (PC), register and a standard 32-bit
Status, or flags (ST), register.

It also has a 64-bit Memory Window (MW) register, which contains the base address of the active memory window. A memory window
is a set of 32 8-byte blocks (256 bytes in total) that can be accessed using short versions of the standard instructions. Data
pointed to by a memory window is usually already in the L0 data cache, providing zero-latency accesses. With an 8KB L0 data
cache, 32 such windows (8KB / 256 bytes) can be accessed at any instant.
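
The window arithmetic can be sketched as follows. The constants come from the text; the function names, the assumption that
slot ``i`` lives at ``MW + 8 * i``, and the wrap-around at 64 bits are ours and are not confirmed by the F-CPU design.

```python
SLOTS_PER_WINDOW = 32                          # "a set of 32 8-byte blocks"
SLOT_BYTES = 8
WINDOW_BYTES = SLOTS_PER_WINDOW * SLOT_BYTES   # 256 bytes per window
L0_CACHE_BYTES = 8 * 1024                      # 8KB L0 data cache
ADDRESS_MASK = (1 << 64) - 1                   # addresses are 64 bits wide


def windows_in_l0(cache_bytes=L0_CACHE_BYTES):
    """Number of whole memory windows that fit in the L0 data cache."""
    return cache_bytes // WINDOW_BYTES


def slot_address(mw, slot):
    """Byte address of a slot in the window whose base is mw.

    Assumes consecutive slots (hypothetical layout).
    """
    if not 0 <= slot < SLOTS_PER_WINDOW:
        raise ValueError("slot index must fit in 5 bits")
    return (mw + slot * SLOT_BYTES) & ADDRESS_MASK
```

For example, ``windows_in_l0()`` gives 32, and ``slot_address(0x1000, 3)`` gives ``0x1018`` under the assumed layout.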

The F-CPU architecture also has many dedicated registers to control:

a) CPU Configuration

b) Memory Regions

c) FPU Control

d) Multiprocessing

e) Paging

f) Segmentation

g) Interrupt Processing

h) Coprocessor Control

i) Performance Monitoring

j) TimeStamp Counter Control

k) Reconfigurable Logic Control

Instruction Set
---------------

Addressing Modes
----------------

In a memory-to-memory architecture there is obviously no distinction between a register-based addressing mode and a memory-based
addressing mode. It also makes little sense to include immediate addressing modes, since these
can just as well be thought of as PC-relative memory-to-memory operations.

Consequently, the addressing modes of the F-CPU architecture are few and simple:

- direct.
- indirect.
- PC-relative direct.
- PC-relative displacement.

Other addressing modes can be synthesized using parallelizable instruction sequences, and hence at near-zero cost in terms of
performance.

Instruction Format
------------------

With few addressing modes, an external FPU and coprocessors, and a generally regular instruction set, the
F-CPU architecture has a very simple instruction format. All instructions are 64 bits long.

Two bits decode into the following four classes of instructions:

- Standard ALU and branch instructions.
- Control instructions.
- DMA instructions.
- FPU and coprocessor instructions.

Since the F-CPU instruction set is regular with respect to the size of data, all instructions can address either a byte (8 bits), a
word (16 bits), a double word (or double, meaning 32 bits) or a quad word (or quad, meaning 64 bits). Selecting the size takes
another two bits. There are no alignment requirements.

Referencing any one of the memory addresses in the current active window takes 5 bits per argument. Referencing an address in
any of the other 31 windows takes an extra 5 bits. Referencing an arbitrary location in memory obviously takes a full 64-bit address.
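
The bit budget described in this section can be tallied with a short sketch. The field widths are taken from the text; the
operand kind names and the helper functions are hypothetical, and the sketch says nothing about where each field sits inside
the instruction word, since the text does not specify that.

```python
INSTRUCTION_BITS = 64   # all instructions are 64 bits long
CLASS_BITS = 2          # selects one of the four instruction classes
SIZE_BITS = 2           # selects byte, word, double or quad

# Bits needed to name one operand, by (hypothetical) operand kind.
OPERAND_BITS = {
    "active_window": 5,      # slot index within the active window
    "other_window": 5 + 5,   # window selector plus slot index
    "absolute": 64,          # full memory address
}


def encoded_bits(operands):
    """Bits used by the class field, the size field and the given operands."""
    return CLASS_BITS + SIZE_BITS + sum(OPERAND_BITS[kind] for kind in operands)


def remaining_bits(operands):
    """Bits left in one 64-bit instruction word for everything else."""
    return INSTRUCTION_BITS - encoded_bits(operands)
```

With three operands in the active window, ``remaining_bits(["active_window"] * 3)`` gives 45; with three operands in other
windows it gives 30. For a single ``"absolute"`` operand the result is negative, which only shows that a full address cannot
share one 64-bit word with the class and size fields; how such addresses are carried is not stated here.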




