f-cpu/ygasm/structure.txt
created Sun Mar 17 08:40:26 GMT 2002 by whygee@f-cpu.org

 @@@@@ Structure of the ygasm program @@@@@

YGASM manipulates several types of data, and its code certainly
looks messy to newcomers. In fact, YGASM is the living branch of
several software generations that evolved over time, with
features added, changed or removed along the way.

Sometimes full-featured allocation functions were necessary,
sometimes only "alloc and exit" was needed, so the result is not
always very coherent. Some structures use linked lists, with
different access patterns (backwards or forwards); the symbols
use a binary tree; and there is even a static array that is
flushed when full (or before exit).

There is a main allocation pool that must be used for small
allocations, mainly because calling malloc() for every small
block tends to waste memory (per-block overhead) and makes
leaks easy to introduce.
However, not all structures use it :

 - #defines : the name is stored in the pool but the
associated string uses malloc() and realloc() because
its length is not known in advance. On top of that,
YGASM allows the user to redefine a symbol, and we might
then want to "erase" (free()) the old buffer.

 - binary output : this is a linked list of blocks
that are malloc()ed. The size is not known in advance,
but we can't use realloc() because forward references
require the program to "patch" the output when the
value is finally known. We keep pointers to the locations
that must be patched, and realloc() reserves the right
to move the block to a new base address, which would
leave those pointers dangling. When the output grows in
memory, a linked list of blocks keeps the pointers stable.
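
The stable-pointer argument above can be sketched as follows.
This is only an illustration, not the actual ygasm code : the
names (out_block, out_buffer, out_emit, out_free), the block
capacity and the 32-bit word size are all assumptions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define OUT_BLOCK_WORDS 1024        /* hypothetical block capacity */

typedef struct out_block {
    struct out_block *next;
    size_t used;                    /* words already written */
    uint32_t words[OUT_BLOCK_WORDS];
} out_block;

typedef struct {
    out_block *first, *last;
} out_buffer;

/* Append one word and return its address, or NULL if malloc() fails.
   Blocks are never moved, so the returned address stays valid. */
uint32_t *out_emit(out_buffer *ob, uint32_t w)
{
    assert(ob != NULL);
    if (ob->last == NULL || ob->last->used == OUT_BLOCK_WORDS) {
        out_block *b = malloc(sizeof *b);
        if (b == NULL)
            return NULL;
        b->next = NULL;
        b->used = 0;
        if (ob->last != NULL)
            ob->last->next = b;     /* chain behind the full block */
        else
            ob->first = b;          /* very first block */
        ob->last = b;
    }
    uint32_t *slot = &ob->last->words[ob->last->used++];
    *slot = w;
    return slot;
}

/* Release every block, e.g. once the output file has been written. */
void out_free(out_buffer *ob)
{
    while (ob->first != NULL) {
        out_block *b = ob->first;
        ob->first = b->next;
        free(b);
    }
    ob->last = NULL;
}
```

A forward reference would keep the pointer returned by out_emit()
and write the real value through it later, no matter how many
blocks have been appended in between.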


 * All other structures use a linked list (or a binary tree)
whose elements are stored in a "malloc pool" for small-grained
allocations. There is also a linked list (should it be a tree ?)
of free blocks so that freed blocks can be reused.
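
A minimal sketch of such a pool with its free list, assuming a
first-fit search and fixed-size chunks. The names (pool,
pool_alloc, pool_free, POOL_CHUNK) are hypothetical, and the
caller is assumed to remember the size of each block it frees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define POOL_CHUNK 4096             /* hypothetical chunk size */

typedef struct free_node {
    struct free_node *next;
    size_t size;
} free_node;

typedef struct {
    char *cur;                      /* next unused byte of the chunk */
    size_t left;                    /* bytes left in the chunk */
    free_node *free_list;           /* released blocks, most recent first */
} pool;

/* Round up so that any block can later hold a free_node. */
static size_t pool_round(size_t n)
{
    size_t a = sizeof(free_node);
    return (n + a - 1) / a * a;
}

void *pool_alloc(pool *p, size_t n)
{
    assert(n > 0 && n <= POOL_CHUNK);
    n = pool_round(n);
    /* First fit: reuse a released block if one is large enough. */
    for (free_node **link = &p->free_list; *link; link = &(*link)->next) {
        free_node *f = *link;
        if (f->size < n)
            continue;
        if (f->size > n) {          /* keep the unused tail on the list */
            free_node *rest = (free_node *)((char *)f + n);
            rest->size = f->size - n;
            rest->next = f->next;
            *link = rest;
        } else {
            *link = f->next;
        }
        return f;
    }
    if (p->left < n) {
        /* "alloc and exit": the old chunk's tail is simply abandoned
           and chunks are never handed back to the system. */
        p->cur = malloc(POOL_CHUNK);
        if (p->cur == NULL) {
            p->left = 0;
            return NULL;
        }
        p->left = POOL_CHUNK;
    }
    void *r = p->cur;
    p->cur += n;
    p->left -= n;
    return r;
}

/* Put a block of n bytes (the size that was requested) on the free list. */
void pool_free(pool *p, void *ptr, size_t n)
{
    free_node *f = ptr;
    f->size = pool_round(n);
    f->next = p->free_list;
    p->free_list = f;
}
```

This sketch does not merge contiguous free blocks; a linear list
also makes every allocation scan the released blocks, which is
why a tree is tempting.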

# the symbols are organised in a binary tree, but each symbol
can have its own linked list of fixup references, in case
the symbol is used before its value is defined. When the symbol
is finally associated with a value, the linked list is scanned
and the referenced locations are patched, then the list elements
are freed, i.e. added to the linked list of free blocks.
Contiguous free blocks should be merged.
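
The symbol tree and its per-symbol fixup lists might look like
the sketch below. All names (symbol, fixup, sym_lookup, sym_use,
sym_define) are hypothetical; it also assumes that a fixup
overwrites a whole 32-bit word, and it uses malloc()/free()
where ygasm would go through the pool.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct fixup {
    struct fixup *next;
    uint32_t *where;                /* output word waiting for the value */
} fixup;

typedef struct symbol {
    struct symbol *left, *right;
    const char *name;               /* not copied: must outlive the tree */
    int defined;
    uint32_t value;
    fixup *pending;                 /* uses seen before the definition */
} symbol;

/* Find a symbol by name, creating it (undefined) if it is missing. */
symbol *sym_lookup(symbol **root, const char *name)
{
    while (*root != NULL) {
        int c = strcmp(name, (*root)->name);
        if (c == 0)
            return *root;
        root = (c < 0) ? &(*root)->left : &(*root)->right;
    }
    symbol *s = malloc(sizeof *s);
    if (s == NULL)
        return NULL;
    s->left = s->right = NULL;
    s->name = name;
    s->defined = 0;
    s->value = 0;
    s->pending = NULL;
    *root = s;
    return s;
}

/* Use the symbol at *where: write the value now, or queue a fixup. */
int sym_use(symbol *s, uint32_t *where)
{
    if (s->defined) {
        *where = s->value;
        return 0;
    }
    fixup *f = malloc(sizeof *f);
    if (f == NULL)
        return -1;
    f->where = where;
    f->next = s->pending;
    s->pending = f;
    return 0;
}

/* Give the symbol its value, patch every queued use, drop the list. */
void sym_define(symbol *s, uint32_t value)
{
    assert(!s->defined);
    s->defined = 1;
    s->value = value;
    while (s->pending != NULL) {
        fixup *f = s->pending;
        s->pending = f->next;
        *f->where = value;
        free(f);                    /* ygasm: back to the free-block list */
    }
}
```

The "where" pointers are only safe because the output blocks
never move once allocated.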

# the #define'd symbols are stored in a linked list (it could
be transformed into a binary tree to speed up the search a bit).
There is no #undef yet, but if it is ever needed, both the
descriptor and the associated malloc()'ed block should be released.
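
As an illustration, here is what the list with redefinition and
a possible #undef could look like. Nothing below is taken from
ygasm : macro_def, def_find, def_set and def_undef are made-up
names, and the name is malloc()'ed here for simplicity although
ygasm keeps it in the pool.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct macro_def {
    struct macro_def *next;
    char *name;
    char *text;                     /* replacement string, malloc()'ed */
} macro_def;

static char *dup_string(const char *s)
{
    size_t n = strlen(s) + 1;
    char *d = malloc(n);
    if (d != NULL)
        memcpy(d, s, n);
    return d;
}

/* Linear search, as in the current linked-list organisation. */
macro_def *def_find(macro_def *head, const char *name)
{
    for (; head != NULL; head = head->next)
        if (strcmp(head->name, name) == 0)
            return head;
    return NULL;
}

/* Define or redefine a name; returns 0 on success, -1 if out of memory. */
int def_set(macro_def **head, const char *name, const char *text)
{
    assert(name != NULL && text != NULL);
    macro_def *d = def_find(*head, name);
    if (d != NULL) {                /* redefinition: resize the old buffer */
        char *t = realloc(d->text, strlen(text) + 1);
        if (t == NULL)
            return -1;
        strcpy(t, text);
        d->text = t;
        return 0;
    }
    d = malloc(sizeof *d);
    if (d == NULL)
        return -1;
    d->name = dup_string(name);
    d->text = dup_string(text);
    if (d->name == NULL || d->text == NULL) {
        free(d->name);
        free(d->text);
        free(d);
        return -1;
    }
    d->next = *head;
    *head = d;
    return 0;
}

/* Hypothetical #undef: unlink the descriptor, release it and its string.
   Returns 1 if the name was defined, 0 otherwise. */
int def_undef(macro_def **head, const char *name)
{
    for (macro_def **link = head; *link != NULL; link = &(*link)->next) {
        if (strcmp((*link)->name, name) == 0) {
            macro_def *d = *link;
            *link = d->next;
            free(d->text);
            free(d->name);
            free(d);
            return 1;
        }
    }
    return 0;
}
```

Unlike the output buffer, nobody keeps a pointer into the
replacement string, so realloc() moving it is harmless here.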

# the files are stored in a stack but never erased (the stack
elements are allocated from the pool). The file names might
be needed later to report errors, so when a file is popped
its descriptor is kept. Pushing a file always allocates
a new descriptor.
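
A small sketch of this push/pop behaviour, with hypothetical
names (file_desc, file_stack, file_push, file_pop) and a line
counter added as an assumption. malloc() stands in for the pool.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct file_desc {
    struct file_desc *below;        /* file that was open before this one */
    const char *name;               /* kept for later error messages */
    unsigned line;                  /* assumed: current line number */
} file_desc;

typedef struct {
    file_desc *top;
} file_stack;

/* Pushing always allocates a fresh descriptor. */
file_desc *file_push(file_stack *fs, const char *name)
{
    file_desc *f = malloc(sizeof *f);
    if (f == NULL)
        return NULL;
    f->name = name;
    f->line = 1;
    f->below = fs->top;
    fs->top = f;
    return f;
}

/* Popping only unlinks: the descriptor is NOT freed, so anything that
   still points to it (e.g. a pending error report) stays valid. */
file_desc *file_pop(file_stack *fs)
{
    assert(fs->top != NULL);
    file_desc *f = fs->top;
    fs->top = f->below;
    return f;
}
```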

I wish all the memory allocation routines and algorithms
could be merged and made uniform one day. Some algorithms
come to mind :
 - using a balanced binary tree (or better : a sorted table
searched by dichotomy, i.e. binary search) to find the free
block of the most suitable size (to reduce memory
fragmentation).
 - using several malloc domains, each one corresponding
to a particular block size.
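
Both ideas can be sketched in a few lines. This is only an
illustration under assumptions : the table is assumed to be
sorted by increasing size, the domains are assumed to be
powers of two starting at 16 bytes, and the names (free_entry,
best_fit, size_class, MIN_CLASS) are invented.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t size;                    /* size of the free block */
    void *addr;                     /* where it lives */
} free_entry;

/* Dichotomy over a table sorted by increasing size: returns the index
   of the smallest block with size >= want, or n if none is big enough. */
size_t best_fit(const free_entry *tab, size_t n, size_t want)
{
    assert(tab != NULL || n == 0);
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (tab[mid].size < want)
            lo = mid + 1;           /* too small: look in the upper half */
        else
            hi = mid;               /* fits: maybe a smaller one fits too */
    }
    return lo;
}

#define MIN_CLASS 16                /* hypothetical smallest domain */

/* Index of the domain serving a request of n bytes:
   0 for up to 16 bytes, 1 for up to 32, 2 for up to 64, ... */
unsigned size_class(size_t n)
{
    assert(n <= ((size_t)1 << 30)); /* keeps the shift below from overflowing */
    unsigned c = 0;
    size_t cap = MIN_CLASS;
    while (cap < n) {
        cap <<= 1;
        c++;
    }
    return c;
}
```

The table makes the search cheap but inserting or removing an
entry means shifting the entries after it; separate domains avoid
any search at the price of rounding every request up.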

However, it is not my goal to write a whole new memory
allocation library, and this is not critical yet because
we don't need millions of symbols or lines of code.
But the idea is interesting.
