Current version

v1.10.4 (stable)

Navigation

Main page
Archived news
Downloads
Documentation
   Capture
   Compiling
   Processing
   Crashes
Features
Filters
Plugin SDK
Knowledge base
Contact info
 
Other projects
   Altirra

Archives

Blog Archive

On reference counting and garbage collection

I find it interesting that both Apple and Microsoft have opted to invest in compiler-assisted garbage collection through reference counting. It's called Automatic Garbage Collection (ARC) on Mac OS X / iOS and it's part of the Windows Runtime environment on the Microsoft side. This is despite reference counting being regarded as the weakest and poorest performing type of garbage collection.

Wikipedia has a much better writeup of reference counting than I could do, but in summary, you find a place to associate a reference count with each object, increment and decrement the reference count whenever references are created or destroyed, and destroy the object when the count reaches zero. This can either be done either intrusively (COM AddRef()/Release()) or non-intrusively (std::shared_ptr). Refcounting is thus a simple and effective mechanism, but the cost of maintaining the reference counts and the inability to handle cycles often make tracing garbage collection a preferable option for language runtimes.

The commonly cited problem with tracing GC is the delay imposed by the mark-and-sweep operation. There are lots of approaches used to mitigating this, such as generational and incremental collection, and for the most part I'd say they're effective on desktop apps, as we're long past the point where a Java program would visibly pause on screen while the GC cleaned house. What I don't see mentioned as often is the cost of getting this in place, namely:

These reasons why I don't like attempts to make tracing GC an optional language feature: the GC dependency tends to leak out into the rest of the program and gets its tendrils into everything. This is highly undesirable in C++, which is largely a pay-as-you-go language where you don't pay for a language feature if you don't use it. An imprecise, non-compacting conservative garbage collector is sometimes used to avoid imposing onerous requirements on the C++ compiler and runtime environment, but in my experience this is unsatisfactory: you still have to make sure the GC sees everything and you get memory leaks from the imprecise collection.

This brings us back to reference counting. The often touted advantage to reference counting is that it is deterministic, but I like even more that it has minimal requirements: adding a reference count and AddRef/Release operations to an object is all that is required. Simple and effective, can be used in just about any environment, and used only where needed. I also find that the performance cost of reference counting is overstated since data structures are often read more often than they are modified and there is usually no need to take a strong reference on temporary pointers during read access. Cycles are still a problem but can often be avoided by judicious use of untracked weak pointers, the most common case being parent pointers in a tree structure. Reference counting also doesn't require traversal of live but infrequently accessed objects, which is the main culprit in tracing GC performing badly when a virtual memory system is paging.

Despite this, using reference counting correctly and efficiently does require some manual effort, and I'm skeptical about trying to hide it by building it into the compiler as "automatic." It will definitely be easier and less error prone on average, but there will be cases where the compiler fails to optimize out refcount adjustments on critical paths, and the lower visibility will make it easier to fall into refcounting traps. Looking over the ARC and WinRT docs a bit, I had two thoughts:

That's not counting the performance overhead, which I haven't been able to assess not having used either implementation. As such, while I like refcounting, I have to think that compiler-aided refcounting is only suitable as a stop-gap until tracing GC can be further improved enough to be used universally.

Comments

This blog was originally open for comments when this entry was first posted, but was later closed and then removed due to spam and after a migration away from the original blog software. Unfortunately, it would have been a lot of work to reformat the comments to republish them. The author thanks everyone who posted comments and added to the discussion.