Paper review: C-store
Friday, November 4, 2022
It's that time again: I read another paper, and here's what I took away from it! This week I read "C-store: a column-oriented DBMS" from chapter 4 of the Red Book. This one I picked since I thought it would be helpful for the chess database I'm working on, and it does seem applicable!
This paper was pretty significant for making a strong case for the utility of columnar databases in read-heavy situations. It demonstrated an architecture for a column database which not only beat the row-based databases of the time (on their workload) but also beat the proprietary column databases of the time. One of their key takeaways is that being columnar gives you:
- Very good compression that's not feasible with row stores
- A reduction in overhead of storing records
- The ability to store the same column in multiple sort orders for efficient querying
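To make the compression point concrete for myself, here's a minimal sketch of run-length encoding on a single column. This is only one kind of encoding and not the paper's implementation; the chess-flavored rows and the `run_length_encode` helper are made up for illustration. The idea is that a column stored in sorted order collapses into a few long runs, while the same values in insertion order barely compress.

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(values)]

# Hypothetical rows: (game_id, opening, result). Not data from the paper.
rows = [
    (1, "Sicilian", "1-0"),
    (2, "French", "0-1"),
    (3, "Sicilian", "1/2-1/2"),
    (4, "French", "1-0"),
    (5, "Sicilian", "0-1"),
    (6, "Sicilian", "1-0"),
]

# The "opening" column in insertion order, as a row store would see it.
opening_unsorted = [row[1] for row in rows]
# The same column stored in sorted order, as a sorted column could be.
opening_sorted = sorted(opening_unsorted)

unsorted_runs = run_length_encode(opening_unsorted)
sorted_runs = run_length_encode(opening_sorted)

print(len(unsorted_runs), "runs unsorted vs", len(sorted_runs), "runs sorted")
```

With real data the gap should get much wider, since a sorted low-cardinality column has roughly one run per distinct value no matter how many rows there are.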
The overall architecture they presented seems straightforward and perhaps deceptively simple. They have three major components:
- WS, the writeable store
- RS, the readable store
- Tuple mover, to move written data from WS into RS in bulk periodically
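Here's a toy sketch of how I picture those three pieces fitting together for a single sorted column. Everything specific in it is my own assumption rather than something from the paper: the `ToyColumnStore` class, the `move_threshold` trigger, running the move inline on insert (instead of as a separate periodic process), and rebuilding the read store with a plain sort.

```python
import bisect

class ToyColumnStore:
    """One sorted column with a small write store in front of it."""

    def __init__(self, move_threshold=4):
        self.rs = []  # read store: kept sorted, only changed by bulk moves
        self.ws = []  # write store: recent inserts in arrival order
        self.move_threshold = move_threshold  # assumed trigger, not from the paper

    def insert(self, key):
        self.ws.append(key)
        # Assumption: move once the write store holds enough keys.
        if len(self.ws) >= self.move_threshold:
            self.move_tuples()

    def move_tuples(self):
        """Bulk-move everything in WS into RS by building a new sorted RS."""
        self.rs = sorted(self.rs + self.ws)
        self.ws = []

    def count_in_range(self, lo, hi):
        """Count keys in [lo, hi]; a read has to consult both stores."""
        in_rs = bisect.bisect_right(self.rs, hi) - bisect.bisect_left(self.rs, lo)
        in_ws = sum(1 for key in self.ws if lo <= key <= hi)
        return in_rs + in_ws

store = ToyColumnStore(move_threshold=3)
for key in [5, 1, 9, 3]:
    store.insert(key)

print(store.rs, store.ws, store.count_in_range(2, 6))
```

The part that matters to me is that reads have to look at both stores, so the write store needs to stay small relative to the read store for reads to stay fast.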
Each of these components was described only briefly. There was enough detail to get the gist, but not enough to go write an implementation myself. I think this is the nature of publishing: you have limited space to publish in, and it's also nice to save some details to publish later. They also have a number of things which were planned but not implemented, so the sparse detail may also come from simply not having answers yet.
Some of the things I was left wondering were:
- When does the tuple mover decide that data is suitable to merge?
- How does the tuple mover merge process work?
- What ratio of reads:writes does this architecture support? Beyond what point does the WS become a bottleneck?
- What workloads are favorable to the row stores over the column store?
- How are the projections chosen? (This last one is probably my biggest open question.)
This paper really gave me some inspiration for how to structure the database I'm working on. Hopefully I'll have a post up about that database's structure once it's working (always more to do, always performance traps to fall into before it's done!) and I'll be able to talk more about how its design was informed by C-store.
Next week's paper will be the DynamoDB paper, which I'm excited to read! Later!