What Does an Allocator Do? Memory Management Explained

An allocator manages your program’s memory by requesting large blocks from the operating system, then parceling out smaller pieces to your program as needed. When your program asks for memory (by calling something like malloc in C), the allocator finds a suitable chunk, marks it as in use, and hands back a reference to it. When your program is done with that memory, the allocator reclaims it so it can be reused later. This behind-the-scenes work is what makes dynamic memory possible.
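In C, that cycle looks like the sketch below: request a block, use it, hand it back. The function name `copy_message` is just illustrative, not a standard API.

```c
#include <stdlib.h>
#include <string.h>

/* The full lifecycle of a heap block: request it from the
   allocator, use it, and return it when done. */
char *copy_message(const char *msg) {
    char *buf = malloc(strlen(msg) + 1);  /* ask the allocator for a block */
    if (buf != NULL)
        strcpy(buf, msg);                 /* use the memory... */
    return buf;                           /* ...the caller must free() it later */
}
```

The caller is responsible for calling `free` on the returned pointer; forgetting to do so is a memory leak, because the allocator never learns the block is available again.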

Why Programs Need an Allocator

Your program uses two main types of memory: stack and heap. Stack memory is automatic. When you create a local variable inside a function, the system allocates space for it, and when the function ends, that space is reclaimed without you doing anything. Stack memory is fast but rigid. You can’t control how long it sticks around, and you can’t resize it.

Heap memory is different. The programmer allocates it explicitly, and it stays allocated until the programmer releases it. This is where the allocator lives. Any time your program needs memory whose size isn’t known at compile time, or memory that needs to outlive the function that created it, the allocator handles that request from the heap.
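The lifetime difference can be seen in a short C sketch; the function names here are illustrative:

```c
#include <stdlib.h>

/* Stack: `tmp` is created when the function is entered and
   reclaimed automatically when the function returns. */
int square(int x) {
    int tmp = x * x;                    /* lives on the stack */
    return tmp;
}

/* Heap: the buffer outlives the function that created it.
   The caller owns it and must free() it explicitly later. */
int *make_counters(size_t n) {
    return calloc(n, sizeof(int));      /* zero-initialized heap block */
}
```

The return value of `make_counters` remains valid long after the function returns, which no stack variable can do.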

How an Allocator Fills a Request

When your program requests, say, 27 bytes of memory, the allocator doesn’t go straight to the operating system every time. It maintains its own internal bookkeeping, most commonly a structure called a free list: a collection of previously used memory blocks that have been returned and are available for reuse. If a suitable block exists on the free list, the allocator removes it and hands it to your program. This is far faster than asking the OS for fresh memory.
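A toy version of this reuse pattern fits in a few lines. This is a sketch of the idea only, not any real allocator’s layout: it uses fixed-size blocks, and it stands in for an OS request with `malloc` itself. One classic trick it does show: a freed block stores the list link inside itself, so the free list costs no extra memory.

```c
#include <stddef.h>
#include <stdlib.h>

/* A toy free list: returned blocks are chained together, and a new
   request is served from the chain before going back to the OS. */
typedef struct free_node {
    struct free_node *next;
} free_node;

static free_node *free_list = NULL;
enum { BLOCK_SIZE = 64 };       /* fixed-size blocks keep the sketch simple */

void *toy_alloc(void) {
    if (free_list != NULL) {    /* fast path: reuse a previously freed block */
        free_node *n = free_list;
        free_list = n->next;
        return n;
    }
    return malloc(BLOCK_SIZE);  /* slow path: stands in for asking the OS */
}

void toy_free(void *p) {
    free_node *n = p;           /* the freed block itself stores the link */
    n->next = free_list;
    free_list = n;
}
```

Allocating, freeing, and allocating again returns the very same block, with no trip to the OS the second time.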

If the free list is empty or has nothing large enough, the allocator requests a new chunk from the operating system. Historically, this happened through a system call called sbrk, which extends the program’s heap by moving a boundary in virtual memory. Modern allocators more commonly use mmap, which requests independent regions of virtual memory and allows the allocator to manage multiple separate memory regions instead of one continuous block.

When your program frees memory, the allocator determines the block’s size and starting address, then adds it back to the appropriate free list so it’s available for future requests.

Choosing Where to Put Things

One of the allocator’s core challenges is deciding which free block to hand out when multiple options exist. Three classic strategies handle this differently.

  • First fit scans the free list and returns the first block large enough to satisfy the request. It’s simple and fast, but it can leave awkward gaps in memory over time.
  • Best fit searches for the smallest free block that’s still large enough. This minimizes wasted space per allocation but takes longer to search and can create many tiny leftover fragments.
  • Buddy system works with block sizes that are always powers of two. When a block needs to be split, it’s divided exactly in half. When two neighboring “buddy” blocks are both free, they’re recombined into one block twice as big. This makes merging fast and predictable, at the cost of some internal waste since allocations get rounded up to the next power of two.
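The first two strategies differ only in how far they scan. The sketch below runs them over a plain array of free-block sizes (a stand-in for walking a real free list) and returns the index of the chosen block, or -1 if nothing fits:

```c
#include <stddef.h>

/* First fit: stop at the first block that is big enough. */
int first_fit(const size_t *blocks, int n, size_t need) {
    for (int i = 0; i < n; i++)
        if (blocks[i] >= need)
            return i;
    return -1;
}

/* Best fit: always scan the whole list, remembering the
   tightest fit seen so far. */
int best_fit(const size_t *blocks, int n, size_t need) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (blocks[i] >= need &&
            (best == -1 || blocks[i] < blocks[best]))
            best = i;
    return best;
}
```

Given free blocks of 40, 130, 100, and 120 bytes and a 100-byte request, first fit takes the 130-byte block (the first one that works, leaving a 30-byte scrap), while best fit keeps scanning and takes the 100-byte block exactly.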

The Fragmentation Problem

Over time, repeated cycles of allocation and deallocation leave gaps in memory. This is fragmentation, and it comes in two forms.

External fragmentation means there’s enough total free memory to satisfy a request, but it’s scattered across small, non-contiguous blocks. Imagine needing 100 bytes when you have three free chunks of 40, 30, and 50 bytes. You technically have 120 bytes free, but no single block is large enough. The allocator can’t fulfill the request without asking the OS for more memory.

Internal fragmentation is the opposite problem. The allocated block is larger than what the program actually needs. If your program asks for 27 bytes and the allocator rounds up to 32 (a common practice for alignment and bookkeeping), those 5 extra bytes are wasted. They’re reserved but unused, and no other part of your program can touch them. Allocators also store their own metadata alongside each block to track its size and status, which adds a small overhead to every allocation.
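The rounding itself is usually a one-line bit trick. The 16-byte alignment below mirrors a common choice on 64-bit systems, but the exact figure is implementation-defined:

```c
#include <stddef.h>

/* Round a request up to the next multiple of a power-of-two
   alignment. Works because `align - 1` is a mask of the low bits:
   adding it and then clearing those bits rounds up. */
size_t round_up(size_t size, size_t align) {
    return (size + align - 1) & ~(align - 1);
}
```

With 16-byte alignment, a 27-byte request becomes a 32-byte block, and the 5-byte difference is the internal fragmentation described above.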

How Modern Allocators Handle Threads

In programs running many threads simultaneously, a single shared free list becomes a bottleneck. Every thread that wants to allocate or free memory has to wait its turn, which slows everything down. Modern allocators solve this with thread-local caching.

The default allocator in most Linux systems (part of glibc) uses a structure called a tcache: each thread gets its own small cache of recently freed memory chunks. A single tcache holds 64 linked lists, each storing up to seven chunks of the same size. When a thread needs memory, it checks its own tcache first, which requires no coordination with other threads. Only when the tcache is empty does the thread fall back to a shared structure called an arena, where it needs to acquire a lock.
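The essence of the tcache idea can be sketched with C11’s `_Thread_local`. This toy version has one size class and a depth of seven, where glibc’s real tcache has 64 size classes; the fallback to `malloc` here stands in for taking the arena lock.

```c
#include <stdlib.h>

/* Toy thread-local cache in the spirit of glibc's tcache: each
   thread keeps a small stack of recently freed fixed-size chunks
   it can reuse without any locking. */
enum { CACHE_DEPTH = 7, CHUNK_SIZE = 64 };

static _Thread_local void *cache[CACHE_DEPTH];
static _Thread_local int cached = 0;

void *cached_alloc(void) {
    if (cached > 0)                 /* lock-free fast path */
        return cache[--cached];
    return malloc(CHUNK_SIZE);      /* fallback: the shared arena in glibc */
}

void cached_free(void *p) {
    if (cached < CACHE_DEPTH)
        cache[cached++] = p;        /* stash for this thread's next request */
    else
        free(p);                    /* cache full: return to the shared allocator */
}
```

Because `cache` and `cached` are thread-local, the fast path touches no shared state at all, which is exactly why the design scales across threads.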

The system also uses specialized bins for different chunk sizes. Fast bins handle small chunks (under 160 bytes on 64-bit systems) that are requested and released frequently. An unsorted bin acts as a staging area where recently freed chunks land before being sorted into size-specific bins. The full lookup order is tcache, then fast bin, then unsorted bin, then small or large bin.

Specialized High-Performance Allocators

For applications that push the default allocator’s limits, drop-in replacements exist. Two of the most widely used are jemalloc and TCMalloc.

jemalloc was originally developed as the default allocator for FreeBSD, designed for low fragmentation and good scaling across many CPU cores. It’s now used by Facebook, Cassandra, and Android. In database benchmarks, switching from the standard glibc allocator to jemalloc cut query execution time in half. It handles “dirty” memory pages intelligently, reusing recently freed pages rather than immediately returning them to the OS, which keeps performance high under heavy allocation workloads. In tests with many concurrent queries, jemalloc showed the lowest latencies and smallest variance.

TCMalloc, developed by Google, takes a similar thread-caching approach. Each thread gets a local cache for small allocations (up to 256 KB), organized into size classes stored as linked lists. Freed memory is released aggressively to reduce fragmentation. However, benchmarks show TCMalloc’s performance drops on larger servers with many cores, where jemalloc and Intel’s TBBmalloc maintain better scaling.

Allocators and Memory Safety

Beyond performance, allocators play a growing role in catching memory bugs. Two of the most common errors are use-after-free (accessing memory that’s already been released) and double-free (releasing the same memory twice). Both can cause crashes or security vulnerabilities.

Traditional allocators don’t protect against these. If your program frees a block and then tries to read from it, the allocator won’t stop you. The data might still be there, or it might have been overwritten by a new allocation. The behavior is unpredictable.
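One common defensive habit neutralizes the double-free case in ordinary C, without any special allocator. The standard guarantees `free(NULL)` is a no-op, so nulling the pointer at the moment of release turns an accidental second free through that pointer into nothing:

```c
#include <stdlib.h>

/* Free a block and null the pointer in one step, so a later
   accidental free through the same pointer is harmless rather
   than undefined behavior. (Illustrative macro, not a standard one.) */
#define FREE_AND_NULL(p) do { free(p); (p) = NULL; } while (0)
```

This only guards against reuse of that one pointer variable; other copies of the address remain dangling, which is exactly the gap the pointer-tracking approaches below aim to close.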

Newer tools pair the allocator with the compiler to catch these problems. One approach, developed by researchers at Northwestern University, has the allocator track which pointers reference each memory block. When a block is freed, the allocator identifies every pointer still referencing that block and neutralizes them, making use-after-free access impossible. In testing against a standard suite of memory bugs, this approach successfully detected all double-free errors. The tradeoff is speed: safety-focused allocators add overhead that makes them more suited to testing and debugging than production use.