[ << Programming work ] | [Top][Contents] | [ Release work >> ] |
[ < Conversion ] | [ Up : Programming work ] | [ LilyPond miscellany > ] |
10.18 Garbage collection for dummies
Note: Reading this section is strongly recommended before attempting complex C++ programming.
Within LilyPond, interaction with Guile is ubiquitous. LilyPond is written in C++ and Guile Scheme. Even in C++, most of the code uses Guile APIs to interface with the outside Scheme world, both with user and internal Scheme code.
Scheme is a garbage-collected language. This means that once in a while, a so-called garbage collector scans the memory for values that are no longer being used, and reclaims them. This process ensures that the memory is given back to the computer and made available for other uses. The garbage collector implementation used in Guile 2 and later is the Boehm-Demers-Weiser garbage collector (BDWGC).
C++, on the other hand, usually frees values at determined points of time (although most of the time they remain implicit, through the use of the famous “RAII” or “scope-bound resource management” technique). It has no direct support for garbage collection. This can make memory management of Scheme values in C++ a challenge (or a headache). Whenever you are using a value whose memory is managed by Guile, you must keep an eye on its lifetime.
To be more precise, the garbage collector works in a mark phase and a sweep phase. During marking, the collector scans values that the program is currently using, then asks these values for containing references to other values, and continues following references until all reachable objects have been found. Objects that are unreachable can logically no longer be used in the program, so they are freed in the sweep phase.
In Schemeland, the interpreter takes care of marking values for you. For instance, if you store a list in a variable, then during garbage collection, this list is automatically marked, and this causes all elements of the list to be marked in turn, which ensures they remain alive. In Cppland, you need to be very careful to keep values allocated on Guile’s heap as visible to the garbage collector if they cannot be reached from the Scheme side.
Understanding which values are under garbage-collected management
To begin with, which values are allocated on the Guile heap? The
basic Guile API type is the SCM type, which represents a value
boxed for usage in Scheme. The SCM type is pointer-sized piece of
data. It is either a pointer to Scheme data structures
(e.g. pair, double pair, etc.) – in this case, the pointer is
64-bit aligned and has its lower bits set to 0 –, or it is an
immediate value (short integer, boolean, '()
, etc.) – in
which case the lower order bits are non-zero. Smobs, vectors,
strings and many other Scheme data structures are represented as
pairs, where the car holds a tag value (non-aligned, lower order
bits set) and the cdr holds the pointer to data. From the scheme
side, the fact that these types are represented using pairs is
invisible.
Thus, for immediate SCM values, all the value is contained in the SCM itself. There is no concept of freeing these values, as they are never heap-allocated: they just keep being copied around, and dropped by normal C++ lifetime mechanisms when done (such as dropping local variables of a function when it returns). On the other hand, all other values point to memory allocated on the Guile heap. It is the lifetime of this memory that you need to care about.
LilyPond adds its own object types to Guile as well. They as called “smobs”, which depending on sources means “Scheme objects” or “small objects”. Smobs come in two flavors:
“Simple smobs” are objects that can be passed around by copy
without changing the meaning. Their classes derive from
Simple_smob
. Pitch
and Duration
are good
examples. The usual way to create them is just like a normal C++
object (e.g., Smob_type variable (constructor
parameters);
). When created in this way, simple smobs are
allocated on the stack like any other C++ automatic variable, and
dropped in the same way too. When you need to send a simple smob
to Schemeland, you should call the member function
smobbed_copy ()
. This calls the smob’s copy constructor to
make a copy under garbage collection control, packed in an SCM
value.
“Complex smobs” are objects with an identity, such as
Music
, Context
and Grob
. Their classes
derive from Smob
. They are always created via the C++
new
operator. After allocating, their memory is put under
the control of the garbage collector. A complex smob has a field
containing its SCM identity, which points back to itself. You can
access this field using the member function self_scm ()
.
The function to convert a SCM value back into the C++ smob type is
unsmob<Smob_type *> (value)
(which returns a null pointer
if the SCM was not a value of the smob type in question). Because
of the dual nature of simple smobs, you need to be mindful that if
Smob_type
derives from Simple_smob
, the memory
referred to by the result of unsmob<Smob_type>
(if
non-null) may either be on the stack or on the Guile heap, even
though most of the time it will be on the Guile heap. On the
other hand, for a complex smob, it will systematically be on the
Guile heap.
How values are protected
When the garbage collector starts a collection, it first scans all memory being used by the program at the current point of time. This is called the root set. For Scheme, it includes all global variables of all modules and local variables of the function being executed. C++ adds everything that is on the stack and in registers (FIXME: investigate global variables). The dependencies of these values are then marked, etc.
Marking roots
The marking of the C++ function stack is very simple: scan the stack and treat every value as a possible pointer. This principle is called “conservative garbage collection”, and has a few consequences. One is that there may be some false positives, if random values on the stack happen to look the same as pointers to memory in the Guile heap. These values will be held longer than necessary, which is harmless.
Another, much more nasty consequence is that values are only kept alive while they have an SCM presence on the stack. Here is an example of what not to do:
Complex_smob_type * func () { Complex_smob_type *object = new Complex_smob_type (); object->unprotect (); return object; }
When the caller of this function receives the object pointer,
there is no reason for the object’s SCM identity (what would be
returned by its self_scm ()
method) to be present on the
stack or in registers. Only the pointer to the C++ object is.
This does not work to protect the object from garbage collection.
The object could be freed if a GC pass occurs. The fix is to
unprotect later if possible, at a point where the object’s
self_scm ()
is placed in a long-lived reachable Scheme data
structure. Alternatively, if this is impractical, return an SCM
to keep the object protected. The unprotect ()
method
actually returns the SCM for convenience.
SCM func () { Complex_smob_type *object = new Complex_smob_type (); return object->unprotect (); }
A different, even nastier trap can be illustrated with this example:
LY_DEFINE (ly_func, "ly:func", 1, 0, 0, (SCM param), R"( Doc )") { Smob_type *object = unsmob<Smob_type> (param); // do some stuff here, including scm_cons (a, b) // ... return to_scm (object->some_field_); }
At first glance, this looks fine. The SCM value param
should remain on the stack until the end of the function, keeping
the smob protected. This is not always true, however. If the
compiler does a clever optimization, it might reuse the memory of
the param
variable for something else. If this happens,
the object is unprotected while the memory of the cons cell is
being allocated, which could cause the smob to be collected. The
access object->some_field_
is then use-after-free.
The solution to this is to use scm_remember_upto_here
,
which allows to forcefully keep the object alive:
LY_DEFINE (ly_func, "ly:func", 1, 0, 0, (SCM param), R"( Doc )") { Smob_type *object = unsmob<Smob_type> (param); // do some stuff here, including scm_cons (a, b) // ... SCM field = to_scm (object->some_field_); scm_remember_upto_here (param); return field; }
GC marking for smobs
Guile automatically marks the elements contained in compound
values of the types it provides, like lists and vectors.
LilyPond’s smobs must do the same in order to keep elements they
refer to alive while they are themselves alive. This is done by
implementing the member function SCM mark_smob () const
.
This function must call scm_gc_mark
on every Scheme value
that needs to be kept alive with the object. It can return an SCM
value, which is marked in the same way. (This dates back to Guile
1, which used the C++ function stack to mark objects. It was
necessary to keep the stack depth constant when marking objects
such as lists, or stack overflows would have easily ensued. It is
no longer very relevant in Guile 2.)
For many smob types, mark_smob
needs to add marking to the
implementation of the superclass. This is usually done using a
derived_mark
method. This is the case for translators, for
example. The child class should thus just implement
derived_mark
and not override mark_smob
.
For simple smobs allocated as automatic variables, i.e., outside
of Guile’s control, mark_smob
is not called during
garbage collection. In this case, the only marking that the
object receives is conservative scanning of the stack. This has
the strong implication that a simple smob must contain all SCM
values it refers to in its memory image on the stack. Anything
that needs more complex marking behavior should be a complex smob.
For example, it’s not OK for a simple smob to contain an
std::vector<SCM>
. On the other hand, that would be OK for
a complex smob as long as its mark_smob
function iterates
over the vector to mark each element. The simplest solution is
storing a Guile vector, of SCM type, which is OK even in
simple smobs because the memory image on the stack is an SCM
vector value, which during marking causes the marking of all
vector elements, unlike an std::vector<SCM>
.
Initial protection for complex smobs
When you create a complex smob, it receives an initial GC
protection, which should be removed with its unprotect ()
method once the complex smob enters an area where it is protected
by other means.
There is no such protection for a smobbed_copy ()
of a
simple smob because those tend to be more short-lived and are
often just returned to Scheme after being created.
TODO: expand on smob constructors, especially the need for Preinit classes. See lily/include/smobs.hh.
TODO: explain the quirks of finalization (non-)ordering.
See commit 6555b3841a
.
[ << Programming work ] | [Top][Contents] | [ Release work >> ] |
[ < Conversion ] | [ Up : Programming work ] | [ LilyPond miscellany > ] |