next up previous contents
Next: 13.2 Automatic Garbage Collection Up: 13. Garbage Collection Previous: 13. Garbage Collection   Contents

13.1 Storage Allocation

Consider a typical storage organization of a program:

stack.gif

All dynamically allocated data are stored in the heap. These are the data created by malloc in C or new in C++. You can imagine the heap as a vector of bytes (characters) and end_of_heap a pointer to the first available byte in the heap:

char heap[heap_size];
int end_of_heap = 0;
A trivial implementation of malloc in C is:
void* malloc ( int size ) {
  void* loc = (void*) &heap[end_of_heap];
  end_of_heap = end_of_heap+size;
  return loc;
};
If you always create dynamic structures but never delete them, you will eventually run out of heap space. In that case, malloc will request the operating system for more heap (this code is not shown). This is very expensive, because it may require to move the stack data to higher memory locations. Alternatively, you can recycle the dynamically allocated data that you don't use. This is done using free in C or delete in C++. To do so, you need to link all deallocated data in the heap into a list:
typedef struct T { struct T *next; int size; } T;
T* free_list;
Initially, the free list contains only one element that covers the entire heap (ie, it's heap_size bytes long):
initialize_heap () {
  free_list = (T*) heap;
  free_list->next = null;
  free_list->size = heap_size;
};
(Pardon me for the terrible C code, but this can only be done by coercing bytes to pointers.) Note that each cell in the heap has a T component (ie, a next pointer and a size) followed by size-sizeof(T) bytes (so that it's size bytes long). malloc first searches the free list to find a cell large enough to fit the given number of bytes. If it finds one, it gets a bite out of the cell leaving the rest untouched:
void* malloc ( int size ) {
  T* prev = free_list;
  for(T* r=free_list; r!=null; prev=r, r=r->next)
     if (r->size >= size)
     {  T* new_r = (T*) (((char*) r)+size);
        new_r->next = r->next;
        new_r->size = r->size;
        if (prev==free_list)
           free_list = new_r;
        else prev->next = new_r;
        return (void*) r;
     };
  void* loc = (void*) &heap[end_of_heap];
  end_of_heap = end_of_heap+size;
  return loc;
};
free simply puts the recycled cell at the beginning of the free list:
free ( void* x ) {
  if ("size of *x" <= 8)
     return;
  ((T*) x)->next = free_list;
  ((T*) x)->size = "size of *x";
  free_list = (T*) x;
};
We can see that there is a lot of overhead in malloc since the free list may be very long. The most important problem though is fragmentation of the heap into tiny cells, so that even though the total free space in the free list is plenty, it is useless for large object allocation. The reason for fragmentation is that we unrestrictedly choose a cell from the free list to do allocation. It makes more sense to try to find a free cell that provides the requested number of bytes exactly. This can be done by maintaining a vector of free lists, so that the nth element of the vector is a free list that links all the free cells of size n.

The above malloc/free way of allocating dynamic cells is called manual memory allocation since the programmer is responsible for allocating and deleting objects explicitly. This is to be contrasted to automatic memory management where we leave the system manage the dynamic allocation automatically. Manual memory management is the source of the worst and hardest to find bugs. It's also the source of most of the mysterious core dumps, program crashes, etc. It takes the fun out of programming. Now of course all modern languages (Java, SML, Haskell, etc) use automatic garbage collection. Despite that, C and C++ are still popular and programmers spend most of their time trying to do memory allocation (and then after they remove most core dumps from their program, they port it to a different architecture and their program crashes!).

Why manual memory management is so hard? Basically, you need to have a global view of how dynamic instances of a type are created and passed around. This destroys the good software engineering principle that programs should be developed in small independent components. The problem is to keep track of pointer assignments. If more than one objects point to a dynamically allocated object, then the latter object should be deleted only if all objects that point to it do not need it anymore. So basically, you need to keep track of how many pointers are pointing to each object. This is called reference counting and used to be popular for OO languages like C++. We can't call this method automatic garbage collection because the programmer again is responsible of putting counters to every object to count references. This method is easy to implement for languages like C++ where you can redefine the assignment operator dest=source, when both dest and source are pointers to data. For each class C, we add a new attribute count that counts references to objects:

class C {
  int count;
  ...
  C () {
    count = 0;
  };
  ~C () {
    count--;
    if (count <= 0)
       free(this);
  };
  operator= ( C* dest, C* source ) {
    dest->count--;
    if (dest->count <= 0)
       free(dest);
    source->count++;
    dest = source;
  };
};

When an instance of class C is created, its counter is zero. When we delete the object, we just decrement the counter because there may be other objects pointing to it. But if the counter becomes zero, we delete it. The only way to move pointers around, is through an assignment from a C* object to a C* object. In that case, the object pointed by the source pointer will have one pointer more (the dest pointer), while the object pointed by the dest pointer will have one pointer less. Reference counting will avoid some misuses of the heap but it comes with a high cost: every assignment takes many cycles to be completed and some of the work may be unnecessary since you may pass a pointer around causing many unnecessary counter increments/decrements. In addition, we cannot get rid of cyclic objects (eg. when A points to B and B points to A) using reference counting. All objects that participate in a cyclic chain of references will always have their counters greater than zero, so they will never be deleted even if they are garbage.


next up previous contents
Next: 13.2 Automatic Garbage Collection Up: 13. Garbage Collection Previous: 13. Garbage Collection   Contents
Leonidas Fegaras
2000-12-27