A different kind of C#: be nice to the GC

C# is a garbage collected language. Each time the ‘new’ keyword is used to create an instance of a reference type, the GC (garbage collector) begins tracking it. The programmer doesn’t have to do anything to free the memory; instead, at some point after the memory is guaranteed to no longer be in use, the GC takes care of it.

This tends to improve productivity by eliminating manual memory management as a concern. There’s no easy way to leak memory if you stick to GC-tracked references and never use any unmanaged resources. Allocation also tends to be very cheap: thanks to the GC’s bookkeeping, a typical allocation is little more than a pointer bump.

But there is a dark side: you will pay the price for all those allocations when the GC runs. Worse, your control over when this process occurs is limited. Your game might drop a frame (or five) at an inopportune moment.

To minimize this issue, performance-focused C# applications have to be kind to the GC. Ideally, that means not making any garbage to collect, or failing that, making collection very cheap.

The most common form of GC kindness is pooling: reusing instances rather than simply instantiating new ones on demand.
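
As a minimal sketch (hypothetical; real pools usually add capacity limits and state clearing), pooling looks something like this:

//Uses System.Collections.Generic.Stack<T>.
class Pool<T> where T : class, new()
{
  readonly Stack<T> available = new Stack<T>();

  //Hand out a previously returned instance when possible; otherwise allocate a new one.
  public T Take() => available.Count > 0 ? available.Pop() : new T();

  //Instead of letting the GC collect the instance, stash it for later reuse.
  public void Return(T instance) => available.Push(instance);
}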

This approach has a couple of problems:

  1. Pooling instances works against generational garbage collection. Gen0 collections are pretty cheap. Higher generations are more expensive, and allocations are only promoted to the higher generations if they stay alive. The entire point of pools is to keep allocations alive, so if those pooled instances are eventually collected, it will be more expensive. (The snippet after this list shows promotion in action.)

  2. The GC has to track all the living references somehow. Conceptually, this is a graph traversal where the nodes are the tracked allocations and edges are references between those allocations. The more nodes and edges there are in the graph, the more work the GC has to do. Pooling directly increases the size of this graph. Even if the pooled instances are never collected until the application is torn down, any other garbage collection will suffer from the increased heap complexity.
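
To make the first point concrete, here’s a quick experiment with GC.GetGeneration (exact numbers depend on the runtime and GC configuration):

var pooled = new byte[1024];
Console.WriteLine(GC.GetGeneration(pooled)); //typically 0: a fresh allocation
GC.Collect();
Console.WriteLine(GC.GetGeneration(pooled)); //typically 1: survived one collection
GC.Collect();
Console.WriteLine(GC.GetGeneration(pooled)); //typically 2: long-lived, and now the most expensive to collect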

Using GC-tracked instance pools is, in other words, a commitment to never causing a GC at all, or being okay with a potentially long hitch when a GC does eventually occur.

That might be okay for an application where you have full control over memory allocation, but a library ideally shouldn’t assume anything about the user’s memory management strategy.

For the same reason, it would be unwise for a library to trigger a bunch of collections under the assumption of a simple heap.

We could try to salvage instance pooling by exerting control over the GC. More recent versions of .NET allow you to suppress collections during critical times. That can be handy- for example, in high frequency trading, you would probably want to defer GC until after a trade is fully processed. In games, you might be able to defer many collections into the idle time between frames.
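
For example, GC.TryStartNoGCRegion (available since .NET Framework 4.6 and in .NET Core) can block collections for a critical window, provided the allocations in that window fit within a preallocated budget:

if (GC.TryStartNoGCRegion(64 * 1024 * 1024)) //request a 64MB allocation budget with no collections
{
  try
  {
    //...latency sensitive work goes here...
  }
  finally
  {
    //Collections can resume afterward. Note that EndNoGCRegion throws if the
    //region already ended, e.g. because the budget was exceeded.
    GC.EndNoGCRegion();
  }
}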

But the work is still being done. Those are CPU cycles you could use for something else.

What’s left?

If we want to eliminate the GC as a concern, we can just… not use the GC. Plenty of languages don’t even have a concept of a GC; this isn’t a new idea. How does this look in C#, a language which was built on the assumption of a quick GC?

Value types (like int, float, other primitives, and any struct) differ from reference types (any class) in that value types are not directly tracked by the GC. That is, whatever contains a value type instance is responsible for the memory of the instance. From the GC’s perspective, an array of N floats is a single tracked allocation, not N different allocations.

The only time a GC needs to concern itself with a value type instance is if the value type contains a GC-tracked reference. The GC can basically skip over “pure” value types that contain no references. These are sometimes called ‘unmanaged’ or ‘blittable’ types. (There’s some complexity here regarding generics, but that’s more of a language thing than a GC thing.)
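
The unmanaged generic constraint introduced in C# 7.3 captures this category of types directly:

//T is guaranteed to contain no GC-tracked references, so it can be treated as raw bytes.
unsafe static int SizeInBytes<T>(int count) where T : unmanaged
{
  //sizeof(T) is legal here because T is known to be unmanaged.
  return count * sizeof(T);
}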

The fact that the GC can ignore unmanaged value types means you can stick such types anywhere you want, including in unmanaged memory:

//Requires 'using System.Runtime.InteropServices;' and compiling with unsafe code enabled.
struct Unmanaged
{
  public float Q;
  public float P;
}
unsafe static void ItsOkayGCJustTakeANap(int elementCount)
{
  //Grab a raw block of unmanaged memory; the GC never sees it.
  var pointer = (Unmanaged*)Marshal.AllocHGlobal(elementCount * sizeof(Unmanaged));
  for (int i = 0; i < elementCount; ++i)
  {
    pointer[i] = new Unmanaged { Q = 3, P = i };
  }
  //No GC also means no automatic cleanup; the memory must be explicitly freed.
  Marshal.FreeHGlobal(new System.IntPtr(pointer));
}

You are free to write basically-C in C# if you really want to. Hooray?

The BEPUutilities library used by BEPUphysics v2 implements its own memory management. It’s not built on AllocHGlobal at the moment- it instead allocates big chunks out of the Large Object Heap and then suballocates from them- but that’s a fairly minor implementation detail.

The BEPUutilities BufferPool class is the source of most memory in the library. It’s a simple power-of-2 bucket buffer allocator that provides fast allocation and deallocation at the cost of some wasted space. If you scan through the source, you’ll likely run across a variety of Buffer<T> instances; all of those come from the BufferPool.
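
To illustrate the bucketing idea only (a hypothetical sketch, not the actual BufferPool implementation), a power-of-2 pool can be as simple as:

//Uses System.Collections.Generic, System.Numerics.BitOperations, and Marshal.
class BucketPool
{
  //One stack of free blocks per power-of-2 size class. Assumes byteCount > 0.
  readonly Stack<IntPtr>[] buckets = new Stack<IntPtr>[32];

  static int GetBucketIndex(int byteCount)
  {
    //Round the request up to the next power of 2; the exponent picks the bucket.
    return 32 - BitOperations.LeadingZeroCount((uint)(byteCount - 1));
  }

  public IntPtr Take(int byteCount)
  {
    var index = GetBucketIndex(byteCount);
    if (buckets[index] == null)
      buckets[index] = new Stack<IntPtr>();
    //O(1) allocation: pop a free block if one exists, otherwise grab fresh unmanaged memory.
    //The rounding wastes up to ~2x space in exchange for trivially fast reuse.
    return buckets[index].Count > 0 ? buckets[index].Pop() : Marshal.AllocHGlobal(1 << index);
  }

  public void Return(IntPtr block, int byteCount)
  {
    //O(1) deallocation: push the block back onto its bucket for later reuse.
    buckets[GetBucketIndex(byteCount)].Push(block);
  }
}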

Using the real BufferPool isn’t too ugly:

pool.Take<int>(128, out var buffer);
for (int i = 0; i < 128; ++i)
{
  buffer[i] = i;
}
pool.Return(ref buffer);

C# doesn’t offer much to help here. Value types don’t have scope-sensitive destructors (or any destructors at all- only reference types have finalizers in C#), so you have to clean up after yourself.

The closest language feature is the using statement, but it isn’t the prettiest construct. The picture might improve soon.
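
For example, a hypothetical wrapper (not something BEPUutilities ships) could hand a pooled buffer back when a using scope ends:

struct PooledBuffer : IDisposable
{
  readonly BufferPool pool;
  public Buffer<int> Buffer;

  public PooledBuffer(BufferPool pool, int count)
  {
    this.pool = pool;
    pool.Take<int>(count, out Buffer);
  }

  //The using statement calls this at scope exit, returning the memory to the pool.
  public void Dispose()
  {
    pool.Return(ref Buffer);
  }
}

using (var pooled = new PooledBuffer(pool, 128))
{
  for (int i = 0; i < 128; ++i)
    pooled.Buffer[i] = i;
}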

In practice, manual memory management is not much of a problem. Pulling from the pools is actually pretty rare because almost all allocations serve big batches of work, not per-instance function calls.

Per-thread buffer pools for ephemeral allocations could be simplified further with region allocators. Rather than needing to dispose of individual allocations, the whole region can be tossed out at once when the block of work is complete.
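
A hypothetical sketch of the idea- allocation is just a pointer bump, and everything dies together:

//Uses System.Runtime.InteropServices.Marshal; requires unsafe code.
unsafe struct RegionAllocator : IDisposable
{
  byte* start;
  int capacity;
  int used;

  public RegionAllocator(int capacityInBytes)
  {
    start = (byte*)Marshal.AllocHGlobal(capacityInBytes);
    capacity = capacityInBytes;
    used = 0;
  }

  public T* Allocate<T>(int count) where T : unmanaged
  {
    //A real implementation would align 'used' to T's requirements; skipped for brevity.
    var byteCount = count * sizeof(T);
    if (used + byteCount > capacity)
      throw new InvalidOperationException("Region exhausted.");
    var allocation = start + used;
    used += byteCount;
    return (T*)allocation;
  }

  //There is no per-allocation Return; the entire region is freed in one shot.
  public void Dispose()
  {
    Marshal.FreeHGlobal(new System.IntPtr(start));
  }
}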

The stackalloc keyword also covers temporary allocations of reasonable size. Using the stack for huge allocations isn’t a great idea, but there are plenty of cases where it fits. Just be careful about localsinit: while the C# spec doesn’t require that stackalloc’d memory be zeroed, it almost always will be unless you strip the locals initialization flag with a post build step or suppress it with the upcoming SkipLocalsInit attribute. This isn’t a correctness problem, but zeroing kilobytes of memory on every function call can be a noticeable performance cost.
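
For example, assigning stackalloc’d memory to a Span<int> (C# 7.2 and newer) keeps it safely bounds checked:

static int SumOfSquares(int count)
{
  //Stack memory: no GC involvement, and it's reclaimed automatically on return.
  //Keep count small; thread stacks are typically only a megabyte or so.
  Span<int> scratch = stackalloc int[count];
  for (int i = 0; i < count; ++i)
    scratch[i] = i * i;
  var sum = 0;
  for (int i = 0; i < count; ++i)
    sum += scratch[i];
  return sum;
}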

Any blog post about memory management in C# would be incomplete without mentioning Span<T> and friends. Spans provide a clean abstraction over any kind of memory: managed arrays, the stack, or unmanaged allocations.
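
The same span-accepting code can work against all three:

unsafe static void FillEverything()
{
  //A managed array implicitly converts to a span.
  Span<float> managed = new float[128];
  managed.Fill(1);

  //Stack memory works too.
  Span<float> onStack = stackalloc float[128];
  onStack.Fill(2);

  //And so does raw unmanaged memory.
  var unmanagedMemory = Marshal.AllocHGlobal(128 * sizeof(float));
  new Span<float>((void*)unmanagedMemory, 128).Fill(3);
  Marshal.FreeHGlobal(unmanagedMemory);
}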

You might notice that BEPUphysics doesn’t use Span<T> at all. That’s partially because v2 started development before Span<T> was ready, and partially because almost all the memory is known to be unmanaged anyway. If you’re a user of the library, you might still find spans to be very helpful when interoperating with systems that weren’t built on the assumption of unmanaged memory.

Notably, v2 still has some reference types floating around, but they are quite rare and their number does not scale with the size of a simulation. They’re things like the Simulation object itself or its stages. There are also arrays of TypeProcessors that use abstract classes to provide indirection for batch processing, but those could just be raw function pointers if C# supported them. (And it might later!)

Summary

You, too, can experience the joy of manual memory management. Immerse yourself in memory leaks and access violations. Come, don’t be afraid; build your own custom allocators, because you can.

Or don’t do that, and stick to something like Span<T> instead. But at least consider giving your poor GC a break and try using nice big buffers of value types instead of a billion object instances. Mix it together with a heaping scoop of batch processing and you’ve got the recipe for efficiency with a surprising amount of simplicity, even if it looks a little bit like code from 1995 sometimes.