RC Week 6: Halfway done, wrote a parser!
Saturday, October 29, 2022
I'm halfway done with my RC batch now. Time feels like it has sped up. The feeling that my time at RC is infinite is gone. This was compounded by seeing folks from the Fall 1 batch conclude their batches yesterday. We'll get a new boost from the Winter 1 batch joining on Monday, which I'm really pumped for! New people, new excitement, new energy!
I'm happy with how things have gone so far in the batch. I don't think I want to do anything dramatically different in the second half of the batch, except be a little bit more focused on one project instead of splitting between two.
This was a less social week than most: some personal life stress (which should wrap up next week) made it hard to focus, and I withdrew a little bit. Despite that, I still had a pretty social week! My takeaway is that RC has shifted my understanding of where I get energy from and how much I benefit from social things.
This week, I...
- had 5 pairing sessions
- had 5 coffee chats
- went to the Rust and theorem prover groups
Next week I'll probably have more coffee chats, since the new folks are joining. But I'm going to remember to be kind to myself, and to be realistic. There are some factors outside of my control (taking a family member to a couple of appointments, plus two 8-hour drives to/from there) which will limit how much I can do next week.
But! I did get through some code this week, and I really want to share what I did.
Wrote my first parser!
I learned to use nom, a parser combinator library. I wrote a PGN parser (and improved it and sped it up, with the help of two other Recursers, yay pairing and community!) which worked pretty well. It can parse a 5.6 million game file in about 30 seconds on my laptop. In comparison, pgn-reader, which is reputed to be reasonably quick, takes 60 seconds for the same dataset on the same hardware.
Ultimately, though, I did trade in my own parser to use
Performance of the parser itself isn't what I want to spend my time on.
Actually, I don't want to spend my time on the parser at all right now, since I want to get into the database portion of my chess database.
My own parser was failing on 0.1% of the games, and there were enough edge cases that it was going to be a significant time sink to fix them all.
You can see the source for the custom parser at commit 8713d3ae if you're curious. It'll be around if I ever decide that no, I do want it to go ridiculously fast thankyouverymuch.
While I'm not using the parser I wrote, this was a very useful exercise. I'm not afraid of parsers anymore! They're a lot more approachable than I thought before. In fact, I'm going to have to write another one for this project. I'll have a query language of some sort (TBD, but I have a batchmate who is very into programming languages, compilers, and parsers, and I hope they'll help me design it!). That will, naturally, require a parser. That'll be a lot of fun to tackle, and I won't have any way to back out of it!
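For anyone who hasn't met parser combinators, here's a minimal sketch of the idea in plain Rust. It is not nom's API and it is not the parser from the commit above: the names `literal`, `take_while1`, `delimited`, and `tag_pair` are made up for illustration, and it only handles a simplified PGN tag pair (no escaped quotes, no empty values).

```rust
// A parser takes input and, on success, returns (remaining input, parsed value).
// This is a hand-rolled illustration of the combinator style, not nom itself.
type ParseResult<'a, T> = Option<(&'a str, T)>;

/// Match an exact literal at the start of the input.
fn literal<'a>(expected: &'static str) -> impl Fn(&'a str) -> ParseResult<'a, ()> {
    move |input: &'a str| input.strip_prefix(expected).map(|rest| (rest, ()))
}

/// Consume the longest non-empty prefix whose chars all satisfy `pred`.
fn take_while1<'a>(pred: impl Fn(char) -> bool) -> impl Fn(&'a str) -> ParseResult<'a, &'a str> {
    move |input: &'a str| {
        let end = input.find(|c: char| !pred(c)).unwrap_or(input.len());
        if end == 0 {
            None
        } else {
            Some((&input[end..], &input[..end]))
        }
    }
}

/// Combinator: run `open`, then `inner`, then `close`, keeping only `inner`'s value.
fn delimited<'a, T>(
    open: impl Fn(&'a str) -> ParseResult<'a, ()>,
    inner: impl Fn(&'a str) -> ParseResult<'a, T>,
    close: impl Fn(&'a str) -> ParseResult<'a, ()>,
) -> impl Fn(&'a str) -> ParseResult<'a, T> {
    move |input: &'a str| {
        let (rest, _) = open(input)?;
        let (rest, value) = inner(rest)?;
        let (rest, _) = close(rest)?;
        Some((rest, value))
    }
}

/// Parse one simplified PGN tag pair, e.g. `[Event "Rated Blitz game"]`,
/// into (name, value). Escaped quotes and empty values are not handled.
fn tag_pair<'a>(input: &'a str) -> ParseResult<'a, (&'a str, &'a str)> {
    let tag_name = take_while1(|c: char| c.is_ascii_alphanumeric() || c == '_');
    let tag_value = delimited(literal("\""), take_while1(|c| c != '"'), literal("\""));

    let (rest, _) = literal("[")(input)?;
    let (rest, name) = tag_name(rest)?;
    let (rest, _) = literal(" ")(rest)?;
    let (rest, value) = tag_value(rest)?;
    let (rest, _) = literal("]")(rest)?;
    Some((rest, (name, value)))
}
```

Calling `tag_pair("[Event \"Rated Blitz game\"]\n")` gives back the name and value along with the unconsumed `"\n"`. The appeal of the style is that the small pieces compose: the same `delimited` works for quotes, brackets, or comments.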
Started designing and implementing the DB (IsabellaDB)
Beyond pulling data in from the PGN file, there's a lot of work to do to actually make a chess database in the full sense of the word "database". People will often colloquially refer to a big PGN file as a database, but I'm referring to the software portion that allows loading that, querying on it, and doing analysis on it.
My initial design doc is available if anyone wants to look at it. I'm also working on reading through a paper on a columnar database, which matches how I was thinking about storing and indexing the data. There will be a lot of fun challenges with getting things to be searchable in a nice, efficient manner.
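To show what "columnar" means here, this is a rough sketch of the idea rather than IsabellaDB's actual design. Instead of storing one struct per game, each field gets its own vector, and position `i` in every vector belongs to game `i`. All type and field names below (`GameRow`, `GameColumns`, `white_elo`, and so on) are hypothetical.

```rust
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
enum GameResult {
    WhiteWins,
    BlackWins,
    Draw,
}

/// Row-oriented view of one game's header fields (hypothetical fields).
struct GameRow {
    white: String,
    black: String,
    white_elo: u16,
    result: GameResult,
}

/// Column-oriented storage: one vector per field.
/// Index `i` across all vectors describes game `i`.
#[derive(Default)]
struct GameColumns {
    white: Vec<String>,
    black: Vec<String>,
    white_elo: Vec<u16>,
    result: Vec<GameResult>,
}

impl GameColumns {
    /// Split a row across the columns and return the new game's id.
    fn push(&mut self, row: GameRow) -> usize {
        self.white.push(row.white);
        self.black.push(row.black);
        self.white_elo.push(row.white_elo);
        self.result.push(row.result);
        self.result.len() - 1
    }

    /// Ids of games where White's rating is at least `min_elo`.
    /// Only the `white_elo` column is scanned; player names are never touched.
    fn white_elo_at_least(&self, min_elo: u16) -> Vec<usize> {
        self.white_elo
            .iter()
            .enumerate()
            .filter_map(|(id, &elo)| if elo >= min_elo { Some(id) } else { None })
            .collect()
    }

    /// Count games with the given result by scanning only the `result` column.
    fn count_result(&self, wanted: GameResult) -> usize {
        self.result.iter().filter(|&&r| r == wanted).count()
    }
}
```

The point of the layout is that a query touching one field only reads that field's vector, which is what makes scans and per-column indexes cheap.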
What's planned next for IsabellaDB?
The data storage and retrieval side of things will stay a little fuzzy for me until I get into the weeds; some of the details are still out of focus. But I think the biggest unknowns really come down to product-type things:
- What queries will users want to do? (This is needed to choose what things to index on!)
- How should users interact with it? What should the query language be like?
Because these unknowns are product-y, I'm focusing right now on getting something usable that I can start playing with.
Next week, I want to get the position index up (so I can, given a position, find all other games that contained that position) and build a UI that exposes searching positions by clicking through an opening tree. That'll be enough for me to start thinking about how I want to use it. I'm also going to continue pondering design: I have some ideas on how to pull out fields to a columnar store, but I'm less clear on how query planning and optimization will work in any format, so that's on my docket to learn about!
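As a rough picture of what a position index could look like, here is a minimal sketch assuming the simplest possible design: a hash map from a position key to the ids of the games that reached it. The names `PositionIndex`, `add_game`, and `games_containing` are hypothetical, and the string keys are placeholders; a real index would more likely use a compact position encoding or hash.

```rust
use std::collections::HashMap;

type GameId = u32;

/// Maps a position key to the ids of the games in which it occurred.
#[derive(Default)]
struct PositionIndex {
    games_by_position: HashMap<String, Vec<GameId>>,
}

impl PositionIndex {
    /// Record every position reached in one game.
    /// Assumes all of a game's positions arrive in a single call,
    /// so checking the last stored id is enough to avoid duplicates.
    fn add_game<'a>(&mut self, game: GameId, positions: impl IntoIterator<Item = &'a str>) {
        for position in positions {
            let games = self
                .games_by_position
                .entry(position.to_string())
                .or_default();
            // A game can repeat a position; store each game id once per position.
            if games.last() != Some(&game) {
                games.push(game);
            }
        }
    }

    /// All games that contained `position` (empty if none did).
    fn games_containing(&self, position: &str) -> &[GameId] {
        self.games_by_position
            .get(position)
            .map(Vec::as_slice)
            .unwrap_or(&[])
    }
}
```

An opening-tree UI could sit on top of a lookup like this: each click produces a new position, and `games_containing` returns the games to show for it.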
Next week I'll have a pretty full schedule:
- Lighter pairing load, because of personal life obligations
- Coffee chats at least once a day! It'll be exciting with the new batch coming in.
- Write a couple of blog posts. I have two that are in the queue that I have some research done for, so I need to buckle down and write them. I'll stagger their releases.
- Work on IsabellaDB (position index and frontend)
- Finish reading and summarizing my current Red Book paper
It's going to be a full week! I'm excited to welcome in the new batch, and keep in contact with all the folks from Fall 1.
See you next week!