# Speeding up queries 1000x by sorting my bitmaps

**Monday, January 23, 2023**

I'm working on a database system that stores and queries chess games and positions.
Right now, it contains 240 million unique positions^{1} from 3.8 million games.
One of the things it needs to do is quickly find all the games where a particular position occurs.
I'd also like it to do things like find games where this position occurs and it ends in a draw.

Bitmaps are really useful here, and with some care they can achieve unbelievable efficiency. They can also be really slow if you're not careful. It's a journey.

We'll start by looking at how my bitmaps are implemented, and then we'll see how an assumption punished me severely and how I fixed it to make things a lot faster.

# Bitmap encodings

A bitmap is a sequence of bits which indicate a boolean condition across a range of items. You can use them to store a true/false condition for any collection which you can assign sequential integer IDs to.

For example, if you have 8 cats, you could store for each one whether it is cute or not. The simple approach is to store a list of booleans, each which indicates this condition. This is wasteful: Each of those booleans takes at least one byte of memory, but you're only really using one bit from that byte! You would allocate 64 bits to use 8.

Instead, you could store their cuteness in a bitmap: each cat would have a 1 if it is cute and a 0 if it is not^{2}.
This would use just 8 bits total, or 1 byte^{3}.

Here are two hypothetical bitmaps containing eight values:

These bitmaps are stored in a dense form. Each bit physically exists! This makes a lot of things easy, and it's a nice mental model for bitmaps. But it's not very space efficient, since it makes absolutely no attempt at compressing the data down. If we're storing mostly the same value, like almost all 1s or almost all 0s, then we can store this a lot more efficiently!

There's a common encoding call run-length encoding. Instead of storing each item, you instead store the item and how many times it is repeated. This is really useful for times when you have the same value multiple times!

Here's one way we could store these same bitmaps with run-length encoding:

The first bitmap was `11000000`

as a dense bitmap, and is now represented as two pairs: `(1,2)`

saying we start with two 1s, and `(0,6)`

, saying we then have six 0s.
The second bitmap is represented the same way. It's `10011100`

, which is represented as `(1,1)`

for one 1, `(0,2)`

for two 0s, `(1,3)`

for three 1s, and `(0,2)`

for the final two 0s.

This gives us a nice reduction in space for large bitmaps with lots of long runs^{4}!
But we can still do better.

One trick we can use is to observe that we don't need to store the value, since it alternates.
This lets us save some space, but ends up complicating bitwise operations.
Instead, we can store pairs of `(position, count)`

for just 1s, and let all 0s be implicit.

This lets us remove half the space needed!
The first example says that there are two 1s at position 0; all the other bits are 0.
The second example says that there is one 1 at position 0, three 1s at position 3, and the rest are 0.
It ends up exactly equivalent to the earlier ones, with the added benefit of bitwise operations being **much easier to implement**.

# Logical operations

Bitmaps are usually used to say that something is true for given elements in a collection. By itself, this is fine. They let you get quick counts for how many things are true, or quickly find elements where the thing it true. But they become much more powerful when you use logical operations on them.

The usual bitwise logical operators (bitwise-and, -or, and -not) are useful here! With these, you can take bitmaps and combine them for more complex conditions, and then get the specific items or a count of matching items.

# (Chess) database usage of bitmaps

In my particular case, I have a collection of chess games and the corresponding positions that occur in those games. We assume that we can give an id to each game, and we want to know for any given position, how many games it has occurred in. Further, we want to know how many of those games were wins, losses, or draws for a given color. This is an ideal problem for bitmaps.

First, we form what's essentially **a sparse matrix of the data**.
The columns correspond to games, and the rows correspond to positions.
For a position, you look at its row to find all the games which it occurred in.
For a game, you look at the column to see all the positions which occurred during the game.
**We store one bitmap per position**.

They're drawn here as dense bitmaps, but they'd actually be sparse bitmaps; it's just easier to draw the dense form!

We also have **a bitmap for each possible game result**: white wins, black wins, draw, or other (game was aborted, unknown outcome, etc.).
For these, we have one column for each game (just like a position bitmap), which indicates if the game had that particular result.
So if white won game 3, then that bitmap would have a 1 in position 3, and all the other bitmaps would have a 0 there.

Using these, we can compute nice things like how many games from a given position end in each result: you take the position's bitmap and then do a logical-and with each of the position results, and count the number of bits set.

# Speeding up the bitmaps

I implemented this with sparse bitmaps, and it worked!

...slowly.

**A pageload took about 300ms.**
This was doing the counts for about 15 moves, each with 4 bitmap operations, so each bitmap operation was taking about 5 ms.
This seemed much slower than it needed to be, so I poked around at the data.

The position bitmaps seemed reasonable, but fairly large.
Each one had hundreds of runs in the sparse bitmap, but we can deal with that.
The problem was the game results bitmaps:
The white/black in bitmaps had **half a million runs** each.
Churning through those was super expensive, at least for my hand-rolled possibly-naive bitmap implementation.

Making major improvements turned out to be fairly easy once a key insight was found.
**The problem is because of excessive runs**, so if we can reduce the number of runs, we'll speed things up.

The games database was initially sorted in no particular order, just whatever order I pulled them out of the games archive (roughly chronological, but not exactly), because I didn't think the order would matter initially. What if we sort that data?

I decided to sort by a key composed of both the game result (with an arbitrary ordering) and the first `n`

moves of the game (initially 5 ply, but I'll extend it further).
The particular ordering isn't what matters so much as grouping like elements together in the bitmaps.

Before sorting, the positions index was 17 MB on the disk and contained about 500k runs bitmaps in the bitmaps for win/loss (not sure what draws looked like).
After sorting? This bitmap is now **96 bytes on the disk**. Since we put all the results where white wins together, we have **one run**, which it takes just a few 64-bit ints to represent. Same for all the others, so each bitmap ends up as 24 bytes, and we have 4 of them.

That's a **177,000x space savings**, which... damn.

The position bitmaps benefited from compression as well, but not as significantly:
The **positions index went from 7.8GB to 7.6GB, saving 200 MB**.
There was less gain to be had since most positions are unique and occur in only one game, so there were often no runs to collapse.
Most of the benefit came from compressing moves in the early opening.

We were concerned mostly with speed, not space.
Fortunately, the speed also improved dramatically.
Before, **page loads were 300ms and now they're about 300 microseconds**.
A 1000x improvement in computation time!
Once you get beyond the first 5 moves that we sorted by, things drop to only a 100x improvement, as times rise to ~3ms until you get back into more unique positions (thus why I'm going to do a deeper sort later).

# Takeaways

I originally thought the order of my database didn't matter and that there wasn't a natural ordering for it. That was wrong! The ordering matters tremendously.

The correct ordering isn't obvious, because it depends on what you want to do with the data! In general, for bitmaps, you'll get a lot of benefit if you can sort the data in a way that reduces the number of runs in your bitmaps.

The sorting was also pretty computationally expensive and slow. This is pushing me into parallelizing the data loading sooner than I would have otherwise prioritized it. There's a clear trade-off here between time spent processing the data for the index vs. time spent using the index in a query, and it's cool to be able to turn that knob to speed up queries!

^{1}

I want to increase this significantly in the future. Right now, I only have master-level games played on Lichess from 2013 to 2020. I want to expand this to include both over-the-board play and to include games from all rating levels, sampled at different rates.

^{2}

Since all cats are cute, this example is silly. It would be all 1s.

^{3}

Not including the overhead of the bitmap itself, of course. We have some bookkeeping, but it is negligible as the bitmap grows in size.

^{4}

On the other hand, if your runs are all *very short*, it will increase the space needed for your bitmap and will make it perform a lot worse.
There are other encoding techniques which will perform better here, such as roaring bitmaps.