[personal profile] hartmans

For the past four years, I’ve been working on Carthage, a free-software Infrastructure as Code (IAC) framework. We’ve finally reached a point where it makes sense to talk about Carthage and what it can do. This is the first in a series of blog posts that will introduce Carthage, discuss what it can do, and show how it works.

Why Another IAC Framework

It seems everywhere you look, there are products designed to support the IAC pattern. On the simple side, you could check a Containerfile into Git. Products like Terraform and Vagrant allow you to template cloud infrastructure and VMs. There are more commercial offerings than I can keep up with.

We were disappointed by what was out there when we started Carthage. Other products have improved, but for many of our applications we’re happy with what Carthage can build. The biggest challenge we ran into is that products wanted us to specify things at the wrong level. For some of our cyber training work we wanted to say things like “We want 3 blue teams, each with a couple defended networks, a red team, and some neutral infrastructure for red to exploit.” Yet the tools we were trying to use wanted to lay things out at the individual machine/container level. We found ourselves contemplating writing a program to generate input for some other IAC tool.

Things were worse for our internal testing. Sometimes we’d ship hardware to a customer, sometimes we’d virtualize that same build in a lab, and sometimes we’d do a mixture of the two. So we wanted to completely separate the descriptions of machines, networks, and software from any information about whether they were realized on hardware, VMs, containers, or a mixture.

Dimensional Breakdown

When I was discussing Carthage with Enrico Zini, he pointed me at the Cognitive Dimensions of Notations framework as a way to think about how Carthage approaches the IAC problem. I’m more interested in the general idea of breaking a design down along dimensions that let you examine the design space than in strict adherence to Green’s original dimensions.

Low Viscosity, High Abstraction Reuse

One of our guiding principles is that we want to be able to reuse components at different scales and in different environments. That includes being able to:

  • Define an operation such as “Update a Debian system” and apply it in several environments: while building a base VM or container image, on an independently managed machine, or in a microservice container that does not run services like ssh or systemd.

  • Define a role such as DNS server that can be applied to a dedicated machine with only that role, to a traditional server with multiple roles, or in a microservice environment.

  • Write groups of functionality that are useful when describing a small number of machines but can also be reused in large environments, such as models of cyber infrastructure to be defended. In small environments things are simplified, but larger environments need integration with directories, authentication infrastructure, and the like.

  • Group functionality at multiple levels. So far I have talked about grouping software to be installed on a single machine or container. We also want to allow groups of containers (pods or otherwise), groups of machines, groups of networks, or even enclaves (think of a model of an entire company or a section of one). Each kind of grouping needs to be parametric and reusable.
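
As a rough illustration of parametric, reusable grouping, here is a minimal Python sketch of the cyber-training layout mentioned earlier (three blue teams, a red team, and neutral infrastructure). The classes `Network`, `Machine`, and `Group` and the functions `blue_team` and `exercise` are hypothetical names invented for this example; they are not Carthage’s API. The point is only that a grouping is a function of its parameters and can be nested inside larger groupings.

```python
from dataclasses import dataclass, field


@dataclass
class Network:
    name: str


@dataclass
class Machine:
    name: str
    networks: list[str]


@dataclass
class Group:
    """Hypothetical nestable collection of machines, networks, and subgroups."""
    name: str
    machines: list[Machine] = field(default_factory=list)
    networks: list[Network] = field(default_factory=list)
    subgroups: list["Group"] = field(default_factory=list)

    def all_machines(self) -> list[Machine]:
        """Flatten the hierarchy into a single list of machines."""
        result = list(self.machines)
        for sub in self.subgroups:
            result.extend(sub.all_machines())
        return result


def blue_team(name: str, defended_networks: int = 2) -> Group:
    """Hypothetical reusable grouping: one team with N defended networks."""
    team = Group(name)
    for i in range(defended_networks):
        net = Network(f"{name}-net{i}")
        team.networks.append(net)
        # One server per defended network, attached to that network.
        team.machines.append(Machine(f"{name}-srv{i}", [net.name]))
    return team


def exercise(blue_teams: int = 3) -> Group:
    """Compose a larger layout out of parametric sub-groupings."""
    top = Group("exercise")
    top.subgroups.extend(blue_team(f"blue{i}") for i in range(blue_teams))
    top.subgroups.append(Group("red", machines=[Machine("red-attack", ["red-net"])],
                               networks=[Network("red-net")]))
    top.subgroups.append(Group("neutral", machines=[Machine("neutral-web", ["neutral-net"])],
                               networks=[Network("neutral-net")]))
    return top


layout = exercise(blue_teams=3)
print(len(layout.all_machines()))  # 8
```

Changing `blue_teams` or `defended_networks` reshapes the whole layout without touching any individual machine definition.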

Hidden Dependencies

To accomplish these abstraction goals, dependencies need to be non-local. For example, a software role might need to integrate with a directory if one is present in the environment. Whoever writes the role cannot know which directory to use, or even whether a directory will be present. Taking the directory as an explicit input to the role is error-prone once the role is combined into larger abstract units (bigger roles or collections of machines). It is better to make it a non-local dependency and have the role find the directory if one is available. We accomplish this using dependency injection.
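
To make the idea of a non-local, optional dependency concrete, here is a minimal Python sketch of dependency injection with nested scopes. `Injector`, `Directory`, and `web_role` are hypothetical names for illustration only; this is not Carthage’s injection API. The role asks its environment for a directory and falls back to local configuration when none is provided.

```python
from typing import Any, Optional


class Injector:
    """Hypothetical minimal injector: maps keys to provided objects."""

    def __init__(self, parent: Optional["Injector"] = None):
        self._providers: dict[str, Any] = {}
        self._parent = parent

    def add_provider(self, key: str, value: Any) -> None:
        self._providers[key] = value

    def get(self, key: str, default: Any = None) -> Any:
        """Look up key here, then in enclosing scopes; return default if absent."""
        if key in self._providers:
            return self._providers[key]
        if self._parent is not None:
            return self._parent.get(key, default)
        return default


class Directory:
    def __init__(self, base_dn: str):
        self.base_dn = base_dn


def web_role(injector: Injector) -> dict[str, Any]:
    """A role that integrates with a directory only if one is present."""
    directory = injector.get("directory")
    if directory is None:
        return {"auth": "local"}
    return {"auth": "ldap", "base_dn": directory.base_dn}


# Small environment: no directory anywhere, so the role falls back.
small = Injector()
print(web_role(small))  # {'auth': 'local'}

# Large environment: the enclave provides a directory, and the machine-level
# scope inherits it without the role taking it as an explicit input.
enclave = Injector()
enclave.add_provider("directory", Directory("dc=example,dc=com"))
machine = Injector(parent=enclave)
print(web_role(machine))  # {'auth': 'ldap', 'base_dn': 'dc=example,dc=com'}
```

The same `web_role` works unchanged in both environments; only the surrounding scope differs.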

In addition to being non-local, dependencies are sometimes hidden. It is very easy to overwhelm our cognitive capacity with even a fairly simple IAC description. An effective notation allows us to focus on the parts that matter when working with a particular part of the description. I’ve found hiding dependencies, especially indirect dependencies, to be essential in building complex descriptions.

Obviously, tools are required for examining these dependencies as part of debugging.
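
A debugging aid for hidden, indirect dependencies can be as simple as walking the dependency graph. The minimal Python sketch below uses a hypothetical hard-coded `DEPENDS_ON` table with invented component names; in a real layout the graph would come from the framework. It uses the standard-library `graphlib` module (Python 3.9+) to compute a valid resolution order.

```python
from graphlib import TopologicalSorter

# Hypothetical declared dependencies: each component lists what it needs.
# Someone editing "web" sees only its direct needs; the rest stays hidden.
DEPENDS_ON: dict[str, set[str]] = {
    "web": {"tls-cert", "directory-client"},
    "tls-cert": {"dns"},
    "directory-client": {"directory"},
    "directory": {"dns"},
    "dns": set(),
}


def transitive_deps(component: str) -> set[str]:
    """Return every direct and indirect dependency of component."""
    seen: set[str] = set()
    stack = list(DEPENDS_ON[component])
    while stack:
        dep = stack.pop()
        if dep not in seen:
            seen.add(dep)
            stack.extend(DEPENDS_ON.get(dep, ()))
    return seen


def print_tree(component: str, depth: int = 0) -> None:
    """Show how a component's dependencies fan out, one level per indent."""
    print("  " * depth + component)
    for dep in sorted(DEPENDS_ON.get(component, ())):
        print_tree(dep, depth + 1)


print(sorted(transitive_deps("web")))  # ['directory', 'directory-client', 'dns', 'tls-cert']
print_tree("web")
# A valid resolution order with dependencies before the things that need them.
print(list(TopologicalSorter(DEPENDS_ON).static_order()))
```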

First-Class Modeling

Clearly, one of the goals of IAC descriptions is to actually build and manage infrastructure. But it turns out there are all sorts of things you want to do with the description well before you instantiate the infrastructure. You might want to query the description to build network diagrams, understand interdependencies, or even build an inventory or bill of materials. We often find ourselves building Ansible inventory, switch configurations, DNS zones, and all sorts of other configuration artifacts. These artifacts may be installed into infrastructure that the description instantiates, but they may also be consumed in other ways. Allowing the artifacts to be consumed externally means you can avoid pre-commitment and focus on whichever part of the description you want to work on first. For example, you might use an existing network at first. Later the IAC description may replace that network, or perhaps it never will.
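
For example, once machines are modeled, artifacts such as an Ansible inventory or DNS records can be derived purely from the model, without creating anything. This minimal Python sketch assumes a hypothetical `ModeledMachine` record and emits an INI-format Ansible inventory (one group per role) and BIND-style A records. Addresses come from the 192.0.2.0/24 documentation range. It illustrates the idea only and is not Carthage’s artifact generator.

```python
from collections import defaultdict
from dataclasses import dataclass
from ipaddress import IPv4Address


@dataclass(frozen=True)
class ModeledMachine:
    """Hypothetical model entry; no VM or container is ever created here."""
    name: str
    address: IPv4Address
    roles: tuple[str, ...]


MODEL = [
    ModeledMachine("ns1", IPv4Address("192.0.2.10"), ("dns",)),
    ModeledMachine("web1", IPv4Address("192.0.2.20"), ("web",)),
    ModeledMachine("web2", IPv4Address("192.0.2.21"), ("web", "dns")),
]


def ansible_inventory(model: list[ModeledMachine]) -> str:
    """Render an INI-style Ansible inventory with one group per role."""
    groups: dict[str, list[ModeledMachine]] = defaultdict(list)
    for m in model:
        for role in m.roles:
            groups[role].append(m)
    lines: list[str] = []
    for role in sorted(groups):
        lines.append(f"[{role}]")
        lines.extend(f"{m.name} ansible_host={m.address}" for m in groups[role])
        lines.append("")  # blank line between groups
    return "\n".join(lines)


def zone_records(model: list[ModeledMachine]) -> str:
    """Render A records suitable for inclusion in a DNS zone file."""
    return "\n".join(f"{m.name}\tIN\tA\t{m.address}" for m in model)


print(ansible_inventory(MODEL))
print(zone_records(MODEL))
```

Both outputs are plain text, so they can be installed into infrastructure the description builds or handed to an existing network and DNS setup.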

As a result, Carthage separates modeling from instantiation. The model can generally be built and queried without interacting with clouds, VMs, or containers. We’ve actually found it useful to build Carthage layouts that can never be fully instantiated, for example because they never specify whether a machine should be realized as a container or a VM, or what kind of technology will realize a modeled network. This lets us develop roles before the machines that use them exist, or focus on how machines will interact and how the network will be laid out before worrying about the details of installing on specific hardware.
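
The split between modeling and instantiation can be sketched in a few lines of Python. `MachineModel`, `unrealized`, and `instantiate` are hypothetical names for illustration: the model records roles and leaves the realization (container, VM, hardware) optional, so queries work on a layout that cannot yet be instantiated, and only the instantiation step insists on the missing details.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MachineModel:
    """Hypothetical: what a machine is, independent of how it is realized."""
    name: str
    roles: list[str]
    # None means "not decided yet": container, VM, hardware, or cloud.
    realization: Optional[str] = None


def unrealized(layout: list[MachineModel]) -> list[str]:
    """Query the model without touching any cloud, hypervisor, or host."""
    return [m.name for m in layout if m.realization is None]


def instantiate(layout: list[MachineModel]) -> None:
    """Refuse to build anything until every machine has a realization."""
    missing = unrealized(layout)
    if missing:
        raise ValueError(f"cannot instantiate; no realization for: {missing}")
    for m in layout:
        # A real framework would dispatch to a container/VM/hardware backend.
        print(f"would create {m.name} as {m.realization}")


layout = [
    MachineModel("dns1", ["dns"]),
    MachineModel("web1", ["web"], realization="container"),
]
# Modeling and queries work even though dns1 cannot be instantiated yet.
print([m.name for m in layout if "dns" in m.roles])  # ['dns1']
print(unrealized(layout))  # ['dns1']
```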

Of all the differences between Carthage and other systems, the separation of modeling from instantiation is by far the one I value most.

A Tool for Experts

In his essay “In the Beginning… Was the Command Line”, Neal Stephenson points out that the tools experts need are not the same tools beginners need. His illustration of why a beginner might not be satisfied with a Hole Hawg drill caught my attention. Carthage is a tool for experts. Despite what cloud providers will tell you, IAC is not easy, and it is doubly hard once you start building reusable components. Trying to hide that complexity, or focusing on making things easy to get started, can make it harder for experts to efficiently solve the problems they face. When we have faced trade-offs between making Carthage easy to pick up and making it powerful for expert users, we have chosen to support the experts.

That said, Carthage today is harder to pick up than it needs to be. It’s a relatively new project with few external users so far. Our documentation and examples need improvement, just like those of every project at this level of maturity. Similarly, as the range of things people try to do with it expands, we will doubtless run into bugs that our current test cases don’t cover. As we address these issues, Carthage absolutely will become easier to learn and use than it is today.

Also, we’ve already had success building beginner-focused applications on top of Carthage. For our cyber training, we built web applications on top of Carthage that made rebuilding and exploring infrastructure easy. We’ve also had success using relatively well-understood tools like Ansible as integration and customization points for Carthage layouts. But in all of these cases, when the core layout had significant reusable components and significant networking complexity, only an IAC expert was going to be able to maintain and develop that layout.

What Carthage Can Do

Carthage has a number of capabilities today. One of Carthage’s strengths is its extensible design. Abstract interfaces make it easy to add new virtualization platforms, cloud services, and support for various ways of managing real hardware. This approach has been validated by incrementally adding support for virtualization architectures and cloud services. As development has progressed, adding new integrations continues to get faster because we are able to reuse existing infrastructure.
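
To illustrate the general pattern of extending a system through abstract interfaces, a pluggable backend design might be sketched with Python’s `abc` module and a registry. `Platform`, `register`, `PLATFORMS`, and `FakeContainerPlatform` are hypothetical names, not Carthage classes; a real backend would drive systemd-nspawn, Podman, libvirt, and so on rather than tracking names in a set.

```python
from abc import ABC, abstractmethod


class Platform(ABC):
    """Hypothetical abstract interface every virtualization backend implements."""

    @abstractmethod
    def create(self, name: str) -> str:
        """Realize a modeled machine; return a backend-specific handle."""

    @abstractmethod
    def destroy(self, handle: str) -> None:
        """Tear down what create() built."""


PLATFORMS: dict[str, type[Platform]] = {}


def register(kind: str):
    """Class decorator so new backends plug in without touching callers."""
    def wrap(cls: type[Platform]) -> type[Platform]:
        PLATFORMS[kind] = cls
        return cls
    return wrap


@register("container")
class FakeContainerPlatform(Platform):
    """Stand-in backend that only records which handles are 'running'."""

    def __init__(self) -> None:
        self.running: set[str] = set()

    def create(self, name: str) -> str:
        handle = f"ctr-{name}"
        self.running.add(handle)
        return handle

    def destroy(self, handle: str) -> None:
        self.running.discard(handle)


# Callers depend only on the abstract interface and a registry key.
backend = PLATFORMS["container"]()
h = backend.create("web1")
print(h)  # ctr-web1
backend.destroy(h)
```

Adding a new backend means writing one more subclass and registering it; code that consumes `Platform` does not change, which is the property that makes later integrations cheaper.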

Today, Carthage can:

  • Model machines
  • Model networks
  • Dynamically compose groupings of the above
  • Generate model-level artifacts, such as:
    • Ansible inventory
    • Various DNS integrations
    • Various switch configurations

Carthage has excellent facilities for working with the images on which VMs and containers can be based, although it does have a bit of a Debian/Ubuntu bias in how it thinks about images:

  • Building base images with a tool like debootstrap
  • Customizing these images
  • Converting them into VM images for KVM, VMware, and AWS
  • Building OCI images from scratch for Podman, Docker, and k8s
  • Adding layers to existing OCI images

When instantiating infrastructure, Carthage can work with:

  • systemd-nspawn containers
  • Podman (Docker support would be easy to add)
  • Libvirt
  • VMware
  • With the AWS plugin, EC2 VMs and networking

We have also looked at Oracle Cloud and, I believe, OpenStack, although that code has not been merged.

Future posts will talk about core Carthage concepts and how to use Carthage to build infrastructure.


Sam Hartman

October 2025

