This is the second in a series of blog posts introducing Carthage, an Infrastructure as Code framework I’ve been working on the last four years. In this post we’ll talk about how we use Carthage to build the Carthage container images. We absolutely could have just used a Containerfile to do this; in fact I recently removed a hybrid solution that produced an artifact and then used a Containerfile to turn it into an OCI image. The biggest reason we don’t use a Containerfile is that we want to be able to reuse the same infrastructure (installed software and configuration) across multiple environments. For example CarthageServerRole, a reusable Carthage component that install Carthage itself is used in several places:

  1. on raw hardware when we’re using Carthage to drive a hypervisor
  2. As part of image building pipelines to build AMIs for Amazon Web Services
  3. Installed onto AWS instances built from the Debian AMI where we cannot use custom AMIs
  4. Installed onto KVM VMs
  5. As part of building the Carthage container images

So the biggest thing Carthage gives us is uniformity in how we set up infrastructure. We’ve found a number of disadvantages of Containerfiles as well:

  1. Containerfiles mix the disadvantages of imperative and declarative formats. Like a declarative format they have no explicit control logic. It seems like that would be good for introspecting and reasoning about Containers. But all you get is the base image and a set of commands to build a container. For reasoning about common things like whether a container has a particular vulnerability or can be distributed under a particular license, that’s not very useful. So we don’t get much valuable introspection out of the declarative aspects, and all too often we see Containerfiles generated by Makefiles or other multi-level build-systems to get more logic or control flow.

  2. Containerfiles have limited facility for doing things outside the container. The disadvantage of this is that you end up installing all the software you need to build the container into the container itself (or having a multi-level build system). But for example if I want to use Ansible to configure a container, the easiest way to do that is to actually install Ansible into the container itself, even though Ansible has a large dependency chain most of which we won’t need in the container. Yes, Ansible does have a number of connection methods including one for Buildah, but by the point you’re using that, you’re already using a multi-level build system and aren’t really just using a Containerfile.

Okay, so since we’re not going to just use a Containerfile, what do we do instead? We produce a CarthageLayout. A CarthageLayout is an object in the Carthage modeling language. The modeling language looks a lot like Python—in fact it’s even implemented using Python metaclasses and uses the Python parser. However, there are some key semantic differences and it may help to think of the modeling language as its own thing. Carthage layouts are typically contained in Carthage plugins. For example, the oci_images plugin is our focus today. Most of the work in that plugin is in layout.py, and the layout begins here:

class layout(CarthageLayout):
    add_provider(ConfigLayout)
    add_provider(carthage.ansible.ansible_log, str(_dir/"ansible.log"))

The add_provider calls are special, and we’ll discuss them in a future post. For now, think of them as assignments in a more complex namespace than simple identifiers. But the heart of this layout is the CarthageImage class:

    class CarthageImage(PodmanImageModel, carthage_base.CarthageServerRole):
        base_image = injector_access('from_scratch_debian')
        oci_image_tag = 'localhost/carthage:latest'
        oci_image_command = ['/bin/systemd']

Most of the work of our image is done by inheritance. We inherit from the CarthageServerRole from the carthage_base plugin collection. A role is a reusable set of infrastructure that can be attached directly to a MachineModel. By inheriting from this role, we request the installation of the Carthage software. The role also supports copying in various dependencies; for example when Carthage is used to manage a cluster of machines, the layout corresponding to the cluster can automatically be copied to all nodes in the cluster. We do not need this feature to build the container image. The CarthageImage class sets its base image. Currently we are using our own base Debian image that we build with debootstrap and then import as a container image. In the fairly near future, we’ll change that to:

        base_image = ‘debian:bookworm’

That will simply use the Debian image from Dockerhub. We are building our own base image for historical reasons and need to confirm that everything works before switching over. By setting oci_image_tag we specify where in the local images the resulting image will be stored. We also specify that this image boots systemd. We actually do want to do a bit of work on top of CarthageServerRole specific to the container image. To do that we use a Carthage feature called a Customization. There are various types of customization. For example MachineCustomization runs a set of tasks on a Machine that is booted and on the network. When building images, the most common type of customization is a FilesystemCustomization. For these, we have access to the filesystem, and we have some way of running a command in the context of the filesystem. We don’t boot the filesystem as a machine unless we need to. (We might if the filesystem is a kvm VM or AWS instance for example). Carthage collects all the customizations in a role or image model. In the case of container image classes like PodmanImageModel, each customization is applied as an individual layer in the resulting container image.

Roles and customizations are both reusable infrastructure. Roles typically contain customizations. Roles operate at the modeling layer; you might introspect a machine’s model or an image’s model to see what functionality (roles) it provides. In contrast, customizations operate at the implementation layer. They do specific things like move files around, apply Ansible roles or similar.

Let’s take a look at the customization applied for the Carthage container image (full code):


        class customize_for_oci(FilesystemCustomization):

            @setup_task("Remove Software")
            async def remove_software(self):
                await self.run_command("apt", "-y", "purge",
                                       "exim4-base",
                                       )

            @setup_task("Install service")
            async def install_service(self):
               # installs and activates a systemd unit

Then to pull it all together, we simply run the layout:

sudo PYTHONPATH=$(pwd) python3 ./bin/carthage-runner ./oci_images build

In the next post, we will dig more into how to make infrastructure reusable.

For the past four years, I’ve been working on Carthage, a free-software Infrastructure as Code framework. We’ve finally reached a point where it makes sense to talk about Carthage and what it can do. This is the first in a series of blog posts to introduce Carthage, discuss what it can do and show how it works.

Why Another IAC Framework

It seems everywhere you look, there are products designed to support the IAC pattern. On the simple side, you could check a Containerfile into Git. Products like Terraform and Vagrant allow you to template cloud infrastructure and VMs. There are more commercial offerings than I can keep up with.

We were disappointed by what was out there when we started Carthage. Other products have improved, but for many of our applications we’re happy with what Carthage can build. The biggest challenge we ran into is that products wanted us to specify things at the wrong level. For some of our cyber training work we wanted to say things like “We want 3 blue teams, each with a couple defended networks, a red team, and some neutral infrastructure for red to exploit.” Yet the tools we were trying to use wanted to lay things out at the individual machine/container level. We found ourselves contemplating writing a program to generate input for some other IAC tool.

Things were worse for our internal testing. Sometimes we’d be shipping hardware to a customer. But sometimes we’d be virtualizing that build out in a lab. Sometimes we’d be doing a mixture. So we wanted to completely separate the descriptions of machines, networks, and software from any of the information about whether that was realized on hardware, VMs, containers, or a mixture.

Dimensional Breakdown

In discussing Carthage with Enrico Zini, he pointed me at Cognitive Dimensions of notation as a way to think about how Carthage approaches the IAC problem. I’m more interested in the idea of breaking down a design along the idea of dimensions that allow examining the design space than I am particular adherence to Green’s original dimensions.

Low Viscosity, High Abstraction Reuse

One of the guiding principles is that we want to be able to reuse different components at different scales and in different environments. These include being able to do things like:

  • Define an operation like “Update a Debian system” and apply that in several environments including as part of building a base VM or container image, applying to an independently managed machine, or applying to a micro service container that does not run services like ssh or systemd.

  • Defining a role like DNS server that can be applied to a dedicated machine only having that role, to a traditional server with multiple roles, or in a micro service environment.

  • Allowing people to write groups of functionality that can be useful in descriptions of a small number of machines, but can also be reused in large environments like modeling of cyber infrastructure to defend. In the small environments, things are simplified, but in larger environments integration like directories, authentication infrastructure and the like is needed.

  • Allow grouping of functionality at multiple levels. So far I have talked about grouping of software to be installed on a single machine or container. We also want to allow groups of containers (pods or otherwise), groups of machines, groups of networks, or even enclaves (think a model of an entire company or section of a company). Each kind of grouping needs to be parametric and reusable.

Hidden Dependencies

To accomplish these abstraction goals, dependencies need to be non-local. For example, a software role might need to integrate with a directory if a directory is present in the environment. When writing the role, no one is going to know which directory to use, nor whether a directory is present. Taking that as an explicit input into the role is error-prone when the role is combined into large abstract units (bigger roles or collections of machines). Instead it is better to have a non-local dependency, and to find the directory if it is available. We accomplish this using dependency injection.

In addition to being non-local, dependencies are sometimes hidden. It is very easy to overwhelm our cognitive capacity with even a fairly simple IAC description. An effective notation allows us to focus on the parts that matter when working with a particular part of the description. I’ve found hiding dependencies, especially indirect dependencies, to be essential in building complex descriptions.

Obviously, tools are required for examining these dependencies as part of debugging.

First Class Modeling

Clearly one of the goals of IAC descriptions is to actually build and manage infrastructure. It turns out that there are all sorts of things you want to do with the description well before you instantiate the infrastructure. You might want to query the description to build network diagrams, understand interdependencies, or even build inventory/bill of materials. We often find ourselves building Ansible inventory, switch configurations, DNS zones, and all sorts of configuration artifacts. These artifacts may be installed into infrastructure that is instantiated by the description, but they may be consumed in other ways. Allowing the artifacts to be consumed externally means that you can avoid pre-commitment and focus on whatever part of the description you originally want to work on. You may use an existing network at first. Later the IAC description may replace that, or perhaps it never will.

As a result, Carthage separates modeling from instantiation. The model can generally be built and queried without needing to interact with clouds, VMs, or containers. We’ve actually found it useful to build Carthage layouts that cannot ever be fully instantiated, for example because they never specify details like whether a model should be instantiated on a container or VM, or what kind of technology will realize a modeled network. This allows developing roles before the machines that use them or focusing on how machines will interact and how the network will be laid out before the details of installing on specific hardware.

The modeling separation is by far the difference I value most between Carthage and other systems.

A Tool for Experts.

In Neal Stephenson’s essay “In the Beginning… Was the Command Line”, Stephenson points out that the kind of tools experts need are not the same tools that beginners need. The illustration of why a beginner might not be satisfied with a Hole Hog drill caught my attention. Carthage is a tool for experts. Despite what cloud providers will tell you, IAC is not easy. Doubly so when you start making reusable components. Trying to hide that or focus on making things easy to get started can make it harder for experts to efficiently solve the problems they are facing. When we have faced trade offs between making Carthage easy to pick up and making it powerful for expert users, we have chosen to support the experts.

That said, Carthage today is harder to pick up than it needs to be. It’s a relatively new project with few external users as of this time. Our documentation and examples need improvement, just like every project at this level of maturity. Similarly, as the set of things people try to do expand, we will doubtless run into bugs that our current test cases don’t cover. So Carthage absolutely will get easier to learn and use than it is today.

Also, we’ve already had success building beginner-focused applications on top of Carthage. For our cyber training, we built web applications on top of Carthage that made rebuilding and exploring infrastructure easy. We’ve had success using relatively understood tools like Ansible as integration and customization points for Carthage layouts. But in all these cases, when the core layout had significant reusable components and significant complexity in the networking, only an IAC expert was going to be able to maintain and develop that layout.

What Carthage can do.

Carthage has a number of capabilities today. One of Carthage’s strengths is its extensible design. Abstract interfaces make it easy to add new virtualization platforms, cloud services, and support for various ways of managing real hardware. This approach has been validated by incrementally adding support for virtualization architectures and cloud services. As development has progressed, adding new integrations continues to get faster because we are able to reuse existing infrastructure.

Today, Carthage can model:

  • Machines
  • Networks
  • Dynamically compose groupings of the above
  • Generate model level artifacts
    • Ansible inventory
    • Various DNS integrations
    • Various switch configurations

Carthage has excellent facilities for dealing with images on which VMs and Containers can be based, although it does have a bit of a Debian/Ubuntu bias in how it thinks about images:

  • Building base images from a tool like debootstrap
  • Customizing these images
  • Converting into VM images for kvm, VMware, and AWS
  • Building from scratch OCI images for Podman, Docker and k8s
  • Adding layers to existing OCI images

When instantiating infrastructure, Carthage can work with:

  • systemd nspawn containers
  • Podman (Docker would be easy)
  • Libvirt
  • VMware
  • With the AWS plugin, EC2 VMs and networking

We have also looked at Oracle Cloud and I believe Openstack, although that code is not merged.

Future posts will talk about core Carthage concepts and how to use Carthage to build infrastructure.

Profile

Sam Hartman

October 2025

S M T W T F S
   1234
567891011
12131415161718
192021222324 25
262728293031 

Syndicate

RSS Atom

Most Popular Tags

Style Credit

Expand Cut Tags

No cut tags
Page generated May. 21st, 2026 05:17 am
Powered by Dreamwidth Studios