Measuring the overhead of HashMaps in Rust

Tuesday, November 22, 2022

While working on a project where I was putting a lot of data into a HashMap, I started to notice my hashmaps were taking up a lot of RAM. I mean, a lot of RAM. I did a back of the napkin calculation for what the minimum memory usage should be, and I was getting more than twice what I expected in resident memory.

I'm aware that HashMaps trade off space for time. By using more space, we're able to make inserts and retrievals much more efficient. But how much space do they trade off for that time?

I didn't have an answer to that question, so I decided to measure and find out. If you just want to know the answer, skip to the last section; you'll know you're there when you see charts. Also, all the supporting code and data is available if you want to do your own analysis.

Allocators in Rust

Rust takes care of a lot of memory management for you. In most cases, you don't need to think about the allocation behavior: Things are created when you ask for them, and they're dropped when you stop using them. The times when you do have to think about it, the borrow checker will usually make that clear to you.

Sometimes, though, you get into situations where memory allocation behavior matters a lot more for your system. This can be the case if you're very memory constrained (as I was) or if you are trying to avoid the cost of memory allocation. In these situations, Rust lets you define your own allocator with the behavior you want!

The System allocator is the default allocator used by Rust programs if you don't do anything special. It uses the default allocator provided by your operating system, so it's using malloc under the hood on Linux systems.

Another one I've seen referenced a lot is tikv-jemallocator, which provides a different malloc implementation with some characteristics like avoiding fragmentation. It comes from FreeBSD. I didn't explore using this one other than idly trying it in my main project, where it didn't make any discernible difference in memory overhead1.

There are some other fun allocators, too, and you can do some really neat things with them. Here are two that I thought were neat:

  • bumpalo has a cute name and is a bump allocator that can allocate super quickly, but generally cannot deallocate individual objects; niche in use
  • wee-alloc is also cutely (and descriptively!) named and is a "simple, correct implementation" of an allocator for WASM targets, so it generates small code for allocations

There are also a few allocators which help you measure overhead. But where's the fun in that??? Let's do it ourselves!

Writing an allocator to track allocations

It's tricky writing an allocator that does the useful work of allocation, and there's a lot of nuance. It's a lot easier to write an allocator that wraps around an existing one and records measurements! That's what we're doing.

The thing to know is that we need to implement the GlobalAlloc trait. It has two methods we have to define: alloc and dealloc. We will make something very simple which just wraps System without doing anything at all besides passing through data to some record functions.

We start with a struct.

/// TrackingAllocator records the sum of how many bytes are allocated
/// and deallocated for later analysis.
struct TrackingAllocator;

Note that our struct doesn't have any fields. We can't put anything dynamic in it. We'll need some atomic ints and such to track allocations. Since we don't expect to have multiple of these at once, we'll just put those as statics in the module scope. We could put the fields we want in the struct, but it makes constructing it a little more annoying and we won't have multiple allocators at once, so we're just going to make those statics.

static ALLOC: AtomicUsize = AtomicUsize::new(0);
static DEALLOC: AtomicUsize = AtomicUsize::new(0);

And now we define alloc and dealloc so that TrackingAllocator is GlobalAlloc. Implementing GlobalAlloc requires marking things unsafe. What we're doing here isn't really unsafe, but we satisfy the interface. All we're doing is passing through to System and recording it with some helper functions we'll define later.

unsafe impl GlobalAlloc for TrackingAllocator {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        let p = System.alloc(layout);
        record_alloc(layout);
        p
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        record_dealloc(layout);
        System.dealloc(ptr, layout);
    }
}

Now we also have to define the helper methods to record allocations. They're as straightforward as can be, just doing a fetch_add to record the size of the allocated or deallocated memory into its corresponding counter.

pub fn record_alloc(layout: Layout) {
    ALLOC.fetch_add(layout.size(), SeqCst);
}

pub fn record_dealloc(layout: Layout) {
    DEALLOC.fetch_add(layout.size(), SeqCst);
}

Now the functionality for the allocator itself is basically in place, and we can move on to using it!

Using our allocator

There are two things we need to do to use our allocator: Set it up as the global allocator, and add a little bit of functionality to get useful data out.

Let's make it the global allocator first. This is the easy bit. Somewhere in your program (such as in main.rs), you create an instance and mark it as the global allocator:

#[global_allocator]
static ALLOC: TrackingAllocator = TrackingAllocator;

And now that's done. That's all you need to do to change the global allocator! You can see also why we made initialization as easy as possible.

Now to address the ergonomics of use. As it stands, every allocation and deallocation will get recorded. That's not quite what we want. We want to isolate certain pieces of the program to measure their allocation separately from test setup and teardown. We also want to record stats from multiple separate runs and report them nicely.

The first thing to do is define a struct for the stats we want to return. We want the total allocation and deallocation, and it would also be convenient to have their difference. This can be calculated later, but let's just include it in the struct for now.

pub struct Stats {
    pub alloc: usize,
    pub dealloc: usize,
    pub diff: isize,
}

And now we need some helper methods to reset the counters, and get our stats out.


pub fn reset() {
    ALLOC.store(0, SeqCst);
    DEALLOC.store(0, SeqCst);
}


pub fn stats() -> Stats {
    let alloc: usize = ALLOC.load(SeqCst);
    let dealloc: usize = DEALLOC.load(SeqCst);
    let diff = (alloc as isize) - (dealloc as isize);

    Stats {
        alloc,
        dealloc,
        diff,
    }
}

And we have the pieces we need to use this nicely! We can call reset() to clear the values before an experiment, and call stats() to get them afterwards.

Putting together the pieces

Let's put together the pieces now and measure the overhead of HashMaps! As a bonus, we'll also measure the overhead of BTreeMaps.

First let's define a helper function that lets us measure and report on the allocations from a test scenario. This function should take in another function, which will return some data (this is important so that the data isn't dropped until after the measurement is complete, or the diff will be inaccurately low). The job of this function is to reset the allocator, run the function, report the stats, then drop the data.

pub fn run_and_track<T>(name: &str, size: usize, f: impl FnOnce() -> T) {
    alloc::reset();

    let t = f();

    let Stats {
        alloc,
        dealloc,
        diff,
    } = alloc::stats();
    println!("{name},{size},{alloc},{dealloc},{diff}");

    drop(t);
}

For simplicity we're just printing the results to stdout, and the CSV header will be defined elsewhere.

Now let's define our scenarios. For this, we'll first assume that we have constructed some data:

let pairs = generate_keys_values(1_000_000);

There's a helper function that fills a Vec with as many key/value pairs as we want. Each is a pair of a random u64 (key) and a 100-byte random u8 blob (value). The particular data here shouldn't matter too much, but I picked something of about 100 bytes to match the domain I originally saw this in.

We'll also have a list of sizes for the tests; later, we can just assume we have a usize called size which we can use. You can see the full details in the complete listing.

Now let's define the baseline. The baseline here is two Vecs, one of the keys and one of the values, constructed with exactly the capacity we need and no more.

run_and_track("vec-pair", size, || {
    let mut k: Vec<u64> = Vec::with_capacity(size);
    let mut v: Vec<DummyData> = Vec::with_capacity(size);

    for (key, val) in &pairs[..size] {
        k.push(*key);
        v.push(*val);
    }

    (k, v)
});

And now we can also define our BTree and HashMap scenarios.

run_and_track("hashmap", size, || {
    let mut m = HashMap::<u64, DummyData>::new();

    for (key, val) in &pairs[..size] {
        m.insert(*key, *val);
    }

    m
});

run_and_track("btreemap", size, || {
    let mut m = BTreeMap::<u64, DummyData>::new();

    for (key, val) in &pairs[..size] {
        m.insert(*key, *val);
    }

    m
});

When we run these (with some additional glue code), we'll get a CSV as output which we can then load into a spreadsheet and analyze.

The results (I brought charts)

The results surprised me, because I (naively, perhaps) expected the HashMap to maintain fairly constant, fairly low overhead. I was aware that hashmaps in general have a "load factor", but I didn't fully understand how it was utilized. It is used to define when the HashMap will resize to contain more elements. If your load factor is 1, then it will reallocate when the map is full. I think the load factor for Rust's HashMap is something like 7/8. This means that when it has 12.5% capacity remaining, it will reallocate (and probably double, so that the amortized cost of reallocating is O(1)).

If we do some analysis, we can reach a better estimate than my naive unthinking estimate that it would have 12.5% overhead. In fact, it's much higher than that. If the HashMap doubles its capacity when it hits 12.5% remaining (14% overhead), then after doubling it will have 56% free capacity, and the overhead of the extra space is about 125% of the used space. On average, we expect the overhead to be somewhere between those, perhaps around 70%.

How does this compare to what we see in this test?

First we can see the growth behavior of both containers against the baseline:

Chart of growth in memory usage of HashMaps and BTreeMaps against a baseline

Here we can see that BTreeMaps grow smoothly linearly with the size of the data, while HashMaps are growing stepwise. Additionally, it looks like HashMaps are almost always using more memory than BTreeMaps.

We can see the trends more clearly if instead of the direct memory usage, we plot the overhead: as a ratio, how much additional memory is it using compared to the baseline? For the baseline, the answer is 0. From our analysis, we expect the hashmap to average about 0.7.

Chart of the overhead ratio of HashMaps and BTreeMaps against a baseline

And here we see the behavior more clearly. BTreeMaps do indeed have fairly consistent overhead. On the other hand, HashMaps' overhead swings wildly. It goes up over 1.25 (about what we hypothesized), and drops as low as about 0.125 (also what we hypothesized).

And if we average it? 0.73. The hypothesis was bang on.

So in general, you can expect to allocate nearly twice as much memory as your elements alone if you put them in a Rust HashMap, and about 50% extra memory if you put them in a BTreeMap.

Hashmaps make a clear space-for-time tradeoff, and it's easier to make that tradeoff effectively if you know how much of each you're trading off! Measuring the time tradeoff is left as an exercise for the reader 😉.


1

I tried this when someone suggested my high resident memory might be from fragmentation, since jemalloc is better at avoiding fragments. This was before I realized the extent of the overhead of HashMaps, but it did lead me down this allocator journey.


If you have comments, questions, or feedback, please email my public inbox. To get new posts, please use my RSS feed.