18.5 Living in harmony with the garbage collector

18.5.1. Simple interface
18.5.2. Low-level interface

Unused blocks in the heap are automatically reclaimed by the garbage collector. This requires some cooperation from C code that manipulates heap-allocated blocks.

18.5.1 Simple interface

All the macros described in this section are declared in the memory.h header file.

Rule 1 A function that has parameters or local variables of type value must begin with a call to one of the CAMLparam macros and return with CAMLreturn, CAMLreturn0, or CAMLreturnT.

There are six CAMLparam macros: CAMLparam0 to CAMLparam5, which take zero to five arguments respectively. If your function has fewer than 5 parameters of type value, use the corresponding macros with these parameters as arguments. If your function has more than 5 parameters of type value, use CAMLparam5 with five of these parameters, and use one or more calls to the CAMLxparam macros for the remaining parameters (CAMLxparam1 to CAMLxparam5).

The macros CAMLreturn, CAMLreturn0, and CAMLreturnT are used to replace the C keyword return. Every occurence of return x must be replaced by CAMLreturn (x) if x has type value, or CAMLreturnT (t, x) (where t is the type of x); every occurence of return without argument must be replaced by CAMLreturn0. If your C function is a procedure (i.e. if it returns void), you must insert CAMLreturn0 at the end (to replace C's implicit return). Note: some C compilers give bogus warnings about unused variables caml__dummy_xxx at each use of CAMLparam and CAMLlocal. You should ignore them.

Example:

void foo (value v1, value v2, value v3)
{
  CAMLparam3 (v1, v2, v3);
  ...
  CAMLreturn0;
}

Note: if your function is a primitive with more than 5 arguments for use with the byte-code runtime, its arguments are not values and must not be declared (they have types value * and int).

Rule 2 Local variables of type value must be declared with one of the CAMLlocal macros. Arrays of values are declared with CAMLlocalN. These macros must be used at the beginning of the function, not in a nested block.

The macros CAMLlocal1 to CAMLlocal5 declare and initialize one to five local variables of type value. The variable names are given as arguments to the macros. CAMLlocalN(x, n) declares and initializes a local variable of type value [n]. You can use several calls to these macros if you have more than 5 local variables.

Example:

value bar (value v1, value v2, value v3)
{
  CAMLparam3 (v1, v2, v3);
  CAMLlocal1 (result);
  result = caml_alloc (3, 0);
  ...
  CAMLreturn (result);
}

Rule 3 Assignments to the fields of structured blocks must be done with the Store_field macro (for normal blocks) or Store_double_field macro (for arrays and records of floating-point numbers). Other assignments must not use Store_field nor Store_double_field.

Store_field (b, n, v) stores the value v in the field number n of value b, which must be a block (i.e. Is_block(b) must be true).

Example:

value bar (value v1, value v2, value v3)
{
  CAMLparam3 (v1, v2, v3);
  CAMLlocal1 (result);
  result = caml_alloc (3, 0);
  Store_field (result, 0, v1);
  Store_field (result, 1, v2);
  Store_field (result, 2, v3);
  CAMLreturn (result);
}

Warning:

The first argument of Store_field and Store_double_field must be a variable declared by CAMLparam* or a parameter declared by CAMLlocal* to ensure that a garbage collection triggered by the evaluation of the other arguments will not invalidate the first argument after it is computed.

Rule 4 Global variables containing values must be registered with the garbage collector using the register_global_root function.

Registration of a global variable v is achieved by calling caml_register_global_root(&v) just before a valid value is stored in v for the first time.

A registered global variable v can be un-registered by calling caml_remove_global_root(&v).

Note: The CAML macros use identifiers (local variables, type identifiers, structure tags) that start with caml__. Do not use any identifier starting with caml__ in your programs.

18.5.2 Low-level interface

We now give the GC rules corresponding to the low-level allocation functions caml_alloc_small and caml_alloc_shr. You can ignore those rules if you stick to the simplified allocation function caml_alloc.

Rule 5 After a structured block (a block with tag less than No_scan_tag) is allocated with the low-level functions, all fields of this block must be filled with well-formed values before the next allocation operation. If the block has been allocated with caml_alloc_small, filling is performed by direct assignment to the fields of the block:

Field(v, n) = v_n;

If the block has been allocated with caml_alloc_shr, filling is performed through the caml_initialize function:

caml_initialize(&Field(v, n), v_n);

The next allocation can trigger a garbage collection. The garbage collector assumes that all structured blocks contain well-formed values. Newly created blocks contain random data, which generally do not represent well-formed values.

If you really need to allocate before the fields can receive their final value, first initialize with a constant value (e.g. Val_unit), then allocate, then modify the fields with the correct value (see rule 6).

Rule 6 Direct assignment to a field of a block, as in

Field(v, n) = w;

is safe only if v is a block newly allocated by caml_alloc_small; that is, if no allocation took place between the allocation of v and the assignment to the field. In all other cases, never assign directly. If the block has just been allocated by caml_alloc_shr, use caml_initialize to assign a value to a field for the first time:

caml_initialize(&Field(v, n), w);

Otherwise, you are updating a field that previously contained a well-formed value; then, call the caml_modify function:

caml_modify(&Field(v, n), w);

To illustrate the rules above, here is a C function that builds and returns a list containing the two integers given as parameters. First, we write it using the simplified allocation functions:

value alloc_list_int(int i1, int i2)
{
  CAMLparam0 ();
  CAMLlocal2 (result, r);

  r = caml_alloc(2, 0);                   /* Allocate a cons cell */
  Store_field(r, 0, Val_int(i2));         /* car = the integer i2 */
  Store_field(r, 1, Val_int(0));          /* cdr = the empty list [] */
  result = caml_alloc(2, 0);              /* Allocate the other cons cell */
  Store_field(result, 0, Val_int(i1));    /* car = the integer i1 */
  Store_field(result, 1, r);              /* cdr = the first cons cell */
  CAMLreturn (result);
}

Here, the registering of result is not strictly needed, because no allocation takes place after it gets its value, but it's easier and safer to simply register all the local variables that have type value.

Here is the same function written using the low-level allocation functions. We notice that the cons cells are small blocks and can be allocated with caml_alloc_small, and filled by direct assignments on their fields.

value alloc_list_int(int i1, int i2)
{
  CAMLparam0 ();
  CAMLlocal2 (result, r);

  r = caml_alloc_small(2, 0);                  /* Allocate a cons cell */
  Field(r, 0) = Val_int(i2);              /* car = the integer i2 */
  Field(r, 1) = Val_int(0);               /* cdr = the empty list [] */
  result = caml_alloc_small(2, 0);        /* Allocate the other cons cell */
  Field(result, 0) = Val_int(i1);         /* car = the integer i1 */
  Field(result, 1) = r;                   /* cdr = the first cons cell */
  CAMLreturn (result);
}

In the two examples above, the list is built bottom-up. Here is an alternate way, that proceeds top-down. It is less efficient, but illustrates the use of modify.

value alloc_list_int(int i1, int i2)
{
  CAMLparam0 ();
  CAMLlocal2 (tail, r);

  r = caml_alloc_small(2, 0);             /* Allocate a cons cell */
  Field(r, 0) = Val_int(i1);              /* car = the integer i1 */
  Field(r, 1) = Val_int(0);               /* A dummy value
  tail = caml_alloc_small(2, 0);          /* Allocate the other cons cell */
  Field(tail, 0) = Val_int(i2);           /* car = the integer i2 */
  Field(tail, 1) = Val_int(0);            /* cdr = the empty list [] */
  caml_modify(&Field(r, 1), tail);        /* cdr of the result = tail */
  CAMLreturn (r);
}

It would be incorrect to perform Field(r, 1) = tail directly, because the allocation of tail has taken place since r was allocated.