March 2021 – Wild Wacky Adventures in Senior Software Engineering

In all my work with infrastructure tools, particularly configuration management software, I have found things to like about each, but the same shortcomings keep showing up. Chef has the advantage of using a full programming language (Ruby), but it has a complex infrastructure and a high upfront time cost to build out a viable system. If you dig into the source code of any of these platforms, you’ll likely find that the custom resources, objects, and functions are, at their core, mostly wrappers around bash scripts with fancy regex to capture output, and the coverage is often incomplete. This is all buried in a series of case statements that determine the OS family (Linux, macOS, Windows) and the distro or version, since there are variations in how those system tools function or where certain values are stored. Ansible is significantly more lightweight and straightforward, but programmatic logic is expressed through ridiculous YAML values and ill-defined Jinja2 templating substitutions, and variables are established through 10+ layers of inheritance that are unintuitive and don’t merge hashes well (a hash is overwritten rather than merged, so to change one value you have to redefine the default object in its entirety). As someone who is pretty comfortable in a Linux shell, I find it very frustrating to dig through source code to figure out which arguments to pass or values to set just to produce the simple bash command I need.
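
To make the hash complaint concrete, here is a minimal Python sketch of the two behaviors. The function names and the `nginx` settings are hypothetical, chosen only for illustration; this is not code from Ansible or Chef. It just contrasts replacing a nested object wholesale with merging it key by key.

```python
from copy import deepcopy


def replace_merge(defaults, overrides):
    """Top-level merge: an overridden key replaces the whole default value."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def deep_merge(defaults, overrides):
    """Recursive merge: nested dictionaries are combined key by key."""
    merged = deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


# Hypothetical defaults and a one-value override.
defaults = {"nginx": {"port": 80, "worker_processes": 4, "user": "www-data"}}
overrides = {"nginx": {"port": 8080}}

print(replace_merge(defaults, overrides))  # the other nginx defaults are lost
print(deep_merge(defaults, overrides))     # only the port changes
```

With the replace behavior, changing the port means restating every other key under `nginx`; with the recursive merge, the override stays one line long.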

These tools are valuable, but I don’t see them as an end point. The bright side of these struggles as a user is that I can envision how I would design a similar tool. My ideal tool would be lightweight, only complicating infrastructure as necessary for required features like concurrent runs. It would be forceful: if system files are modified manually, they would be reset, and any configuration or package a system requires would need to be in the configuration management. It would be customizable: all resources should be easily inherited and overwritten, and I shouldn’t have to rewrite an entire tool just to change one value or the way one function works. It would include a local development tool via containers or virtual machines; it’s infuriating to have to push changes into a production or dev environment to make sure they behave as expected. Speaking of which, why are there never tools for testing pipeline tools locally? Maybe that’s a rant for another blog post.
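
As a rough illustration of the "forceful" and "customizable" ideas above, here is a small Python sketch. `ManagedFile`, `BannerFile`, and `converge` are hypothetical names I made up for this example, not part of any existing tool, and a real implementation would also need to handle ownership, permissions, and packages.

```python
import tempfile
from pathlib import Path


class ManagedFile:
    """Hypothetical resource: a file whose content is owned by the tool."""

    def __init__(self, path, content):
        self.path = Path(path)
        self.content = content

    def render(self):
        """Return the desired file content; subclasses can override this."""
        return self.content

    def converge(self):
        """Reset the file if it has drifted; return True when a change was made."""
        desired = self.render()
        current = self.path.read_text() if self.path.exists() else None
        if current == desired:
            return False
        self.path.write_text(desired)
        return True


class BannerFile(ManagedFile):
    """Changes one behavior by overriding a single method."""

    def render(self):
        return "# managed file, manual edits are reset\n" + self.content


with tempfile.TemporaryDirectory() as tmp:
    motd = BannerFile(Path(tmp) / "motd", "welcome\n")
    print(motd.converge())               # True: first run writes the file
    motd.path.write_text("hand edit\n")  # simulate a manual change
    print(motd.converge())               # True: drift detected, file reset
    print(motd.converge())               # False: already in the desired state
```

The subclass only touches `render`, which is the kind of one-value override I would want from every resource, and running it against a temporary directory hints at what local testing could look like.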