Making Rust builds fail from YAML config mistakes

Monday, October 14, 2024

I was talking to a friend recently, and zie[1] lamented that a Rust web framework uses YAML for its configuration. I'm far from one to defend YAML[2], but I dug in a little to understand zir issues with it: is it the trauma here, or is it something else? Ultimately, zie wanted something that I also seek in Rust: compile-time errors over runtime errors.

Checking it with a test

My first thought was to use a test to check the configuration. This winds up being pretty straightforward.

use loco_rs::{config::Config, environment::Environment};

#[test]
fn can_load_development_config() {
    let config = Config::new(&Environment::Development);
    assert!(config.is_ok());
}

We try to load the config from its default location (./config/development.yaml), then we check that it did actually load successfully!

This is a partial solution. It detects major errors, like malformed YAML files or missing required options. But it misses the subtle mistakes that can saddle you with a misconfiguration, like misspelling binding as bindimg. Misspelled optional configs are one of the things that can plague a debugging session. You think you've made a change, but you haven't, and it's often hard to notice a misspelling.
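To see why the `is_ok()` check lets this through, here is a minimal std-only sketch of a loader with one required and one optional field. `ServerConfig`, `from_map`, `raw_map`, and the `localhost` default are all hypothetical stand-ins, not Loco's actual types or values. The point is only that an unrecognized key is dropped without complaint and the default quietly wins.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for a config section; not Loco's real type.
#[derive(Debug, PartialEq)]
struct ServerConfig {
    port: u16,       // required
    binding: String, // optional, falls back to a default
}

/// Builds a config from already-parsed key/value pairs.
/// Keys that match no field are silently ignored, which mirrors what
/// serde does by default.
fn from_map(raw: &BTreeMap<String, String>) -> Result<ServerConfig, String> {
    let port = raw
        .get("port")
        .ok_or_else(|| "missing required field `port`".to_string())?
        .parse::<u16>()
        .map_err(|e| format!("invalid `port`: {e}"))?;
    // Assumed default for illustration only.
    let binding = raw
        .get("binding")
        .cloned()
        .unwrap_or_else(|| "localhost".to_string());
    Ok(ServerConfig { port, binding })
}

/// Convenience helper for building the raw map in examples.
fn raw_map(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}
```

Loading `port` plus a misspelled `bindimg` succeeds here, and `binding` comes back as the default rather than the value you thought you set. A missing `port`, on the other hand, is an error, which is the kind of mistake the first test does catch.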

Can we solve this, too?

You bet we can. Kind of.

The naive idea, which doesn't work well, is to deserialize the file into an any-valued type, then serialize the loaded config and deserialize that into the same any-valued type, and see if the original has anything extra. This could work, but not without a lot of extra effort, since you can't use direct equality: the round-tripped copy (deserialize, serialize, deserialize again) will have fields that weren't in your file, added during serialization because they were loaded as default values.
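To make the problem concrete, here is a rough std-only sketch. `Value` is a hand-rolled stand-in for an any-valued type like `serde_yaml::Value`, and `extra_paths`, `map`, and `s` are hypothetical helpers. Equality between the raw and round-tripped values breaks as soon as a default gets filled in, so you end up writing a one-directional walk that reports only the keys missing from the round-tripped side.

```rust
use std::collections::BTreeMap;

/// Hand-rolled stand-in for an any-valued type such as `serde_yaml::Value`.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Str(String),
    Map(BTreeMap<String, Value>),
}

/// Collects dotted paths of keys present in `raw` but absent from
/// `round_tripped`. Keys that exist only in `round_tripped` (defaults
/// added during serialization) are deliberately not reported.
fn extra_paths(raw: &Value, round_tripped: &Value, prefix: &str, out: &mut Vec<String>) {
    if let (Value::Map(raw_map), Value::Map(rt_map)) = (raw, round_tripped) {
        for (key, raw_child) in raw_map {
            let path = if prefix.is_empty() {
                key.clone()
            } else {
                format!("{prefix}.{key}")
            };
            match rt_map.get(key) {
                // Key survived the round trip: keep walking into it.
                Some(rt_child) => extra_paths(raw_child, rt_child, &path, out),
                // Key vanished: the typed config never consumed it.
                None => out.push(path),
            }
        }
    }
}

/// Builds a `Value::Map` from (key, value) entries.
fn map(entries: Vec<(&str, Value)>) -> Value {
    Value::Map(entries.into_iter().map(|(k, v)| (k.to_string(), v)).collect())
}

/// Builds a `Value::Str`.
fn s(text: &str) -> Value {
    Value::Str(text.to_string())
}
```

Even this sketch ignores sequences and type mismatches, which is part of the "lot of extra effort" you'd be signing up for.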

Instead, we can use serde_ignored to detect fields which are ignored during deserialization. We can adapt the example from the crate's README and wind up with this test. Instead of using the built-in loader we have to read the file in from the disk ourselves and render it (the config file is templated), then deserialize it with our nice serde_ignored wrapper.

use std::collections::HashSet;

use loco_rs::config::Config;
use tera::{Context, Tera};

#[test]
fn no_extra_fields_in_development_config() {
    let filename = "./config/development.yaml";

    let raw_content = std::fs::read_to_string(filename).unwrap();
    let context = Context::new();
    let rendered_content = Tera::one_off(&raw_content, &context, false).unwrap();

    let deserializer = serde_yaml::Deserializer::from_str(rendered_content.as_str());

    let mut unused_fields = HashSet::new();

    let _config: Config = serde_ignored::deserialize(deserializer, |path| {
        unused_fields.insert(path.to_string());
    })
    .unwrap();

    assert!(
        unused_fields.is_empty(),
        "got unexpected fields: {:?}",
        unused_fields
    );
}

And then when we run it, we get what we were looking for.

running 2 tests
test config::tests::can_load_development_config ... ok
test config::tests::no_extra_fields_in_development_config ... FAILED

failures:

---- config::tests::no_extra_fields_in_development_config stdout ----
thread 'config::tests::no_extra_fields_in_development_config' panicked at src/config.rs:31:9:
got unexpected fields: {"server.bindimg"}
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

It tells us that we have an unused field and exactly which one it is. Now we can fix our typo and go on our way!

This is how I would do it for a real project. It leaves the config separate from the build so that it can compile even if YAML is messed up, while still giving you guardrails for catching mistakes. But that's not where the fun ends, because zie specifically wanted a compile-time error. Well, Anya, you're in luck. I gotchu.

Failing the build because YAML

Rust lets you hook into the build system by writing a build.rs file. It runs before your crate compiles, so it can't really use anything defined in the crate itself. Usually this is used to compile parts that are written in other languages, or to do code generation.

We can certainly abuse it for this, though!

Let's check if the YAML is malformed first, then worry about detecting unused fields as well. First, we'll add a few dependencies in the build-dependencies section of our Cargo.toml file. I added loco-rs, serde, serde_yaml, serde_ignored, and tera. We'll only need loco and serde to start, but we'll use the others eventually as well.

Even though these exist already as dependencies (if we're using Loco), we have to add them as build dependencies so that they're pulled in early. This is not a small decision, because it impacts build times, requiring them to be compiled before starting the rest of your build!

After adding those dependencies, we can write a simple script. We'll just load the config and, if it fails, we print an error and exit with an error code.

use std::process::exit;

use loco_rs::{config::Config, environment::Environment};

fn main() {
    println!("cargo::rerun-if-changed=config/development.yaml");

    let config = Config::new(&Environment::Development);
    if let Err(err) = config {
        println!("Error while loading the config: {}", err);
        exit(1);
    }
}

Now if we run this with malformed configs, we get an error. Neat!

> cargo build
   Compiling premove v0.1.0 (/home/nicole/Code/premove-chess)
error: failed to run custom build command for `premove v0.1.0 (/home/nicole/Code/premove-chess)`

Caused by:
  process didn't exit successfully: `/home/nicole/Code/premove-chess/target/debug/build/premove-f804d605420bf9b9/build-script-build` (exit status: 1)
  --- stdout
  cargo::rerun-if-changed=config/development.yaml
  Error while loading the config: cannot parse `config/development.yaml`: could not find expected ':' at line 168 column 3, while scanning a simple key at line 167 column 1

We have the same problem as before, though: we only catch some of the errors. Let's copy the serde_ignored check over from our test. But this time, we'll be a little nicer, and we'll call unused fields a warning instead of an error[3].

This looks basically like our test did except that, instead of storing the fields in a set to check for emptiness, we print out a warning each time we hit one. These are printed in a particular format so that we can tell Cargo they're warnings to pass along.

use loco_rs::config::Config;
use tera::{Context, Tera};

fn main() {
    println!("cargo::rerun-if-changed=config/development.yaml");

    let filename = "./config/development.yaml";

    let raw_content = std::fs::read_to_string(filename).unwrap();
    let context = Context::new();
    let rendered_content = Tera::one_off(&raw_content, &context, false).unwrap();

    let deserializer = serde_yaml::Deserializer::from_str(rendered_content.as_str());

    let _config: Config = serde_ignored::deserialize(deserializer, |path| {
        println!("cargo::warning=Unused field in {}: {}", filename, path.to_string());
    })
    .unwrap();
}

Now we get this nice little warning if we have an unused field!

> cargo build
   Compiling premove v0.1.0 (/home/nicole/Code/premove-chess)
warning: premove@0.1.0: Unused field in ./config/development.yaml: server.bindimg
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 6.47s
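If you want warnings locally but a hard failure in CI, one option is to decide the outcome in a small pure function and keep `main` thin. This is a sketch under assumptions: `report_unused` and its `strict` flag are hypothetical (you might set `strict` from an environment variable of your choosing), and it reuses the `cargo::warning=` prefix from the script above. As far as I know the double-colon form needs Cargo 1.77 or newer; older versions only understand the single-colon `cargo:warning=` form.

```rust
/// Turns a list of unused config paths into the lines a build script
/// should print, plus the exit code it should finish with.
/// `strict` is a hypothetical switch, e.g. derived from an env var in CI.
fn report_unused(filename: &str, unused: &[String], strict: bool) -> (Vec<String>, i32) {
    let lines: Vec<String> = unused
        .iter()
        .map(|path| format!("cargo::warning=Unused field in {filename}: {path}"))
        .collect();
    // Only fail the build when strict mode is on and something was unused.
    let exit_code = if strict && !unused.is_empty() { 1 } else { 0 };
    (lines, exit_code)
}
```

In build.rs you would collect the paths inside the serde_ignored callback, print each returned line, and call `std::process::exit` with the code when it is nonzero.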

Don't do this in the build, probably

This is probably a bad idea and you shouldn't do it. The testing approach is much better.

First off, people ignore test failures much less than they ignore warnings. But if you made unused fields fail the compile step, then you wouldn't even be able to run any tests at all, which seems like the wrong trade-off to me (since the code itself isn't wrong).

Then you have the overhead of the build. If you do this in build.rs, you end up bringing quite a few dependencies into the build-dependencies section. This seems like a bad idea since you're adding a lot of overhead to the upfront stage of compilation, and you also risk these drifting out of sync with the rest of your code. Cargo will reuse them if it can, but it can't do that if you end up on different versions in build.rs and elsewhere.

But perhaps most important is that this is incredibly opaque. If you shove important checks into build.rs, people won't find them as much. Tests are something we should all be familiar with and using; far fewer of us spelunk into our build systems. By putting an important check in there, you're hiding it from most people on the project.

But do think about putting it in your tests. It's a nice way to shorten some frustrating debugging sessions.


[1] Zie uses zie/zir/zirs pronouns and has a handy pronunciation guide on zir homepage.

[2] I've been traumatized by the piles of YAML that constitute Kubernetes and Helm configurations.

[3] All warnings should be treated as errors in CI, but it's nice to be able to still, you know, compile things locally while developing even if you dare leave an unused variable for a moment. Yes, looking at you, Go.

