An ongoing expression of fascination burnout


Why using packages makes sense in a configuration management world

I woke up this morning to a discussion on Twitter between two of my favorite internet people, Andrew Shafer and R.I. Pienaar.

I know Andrew from his previous job at Reductive Labs (now Puppet Labs), and I love what he has to say. (I really liked his DevOps Cafe episode, in particular because it changed my opinion about “commitments” in Agile contexts.)

R.I. is a force of nature – his blog is great and full of years of hard-earned wisdom, and his mcollective project is something I can’t wait to roll out.

The discussion centered around the question that I’ll paraphrase:

In the era of configuration management tools like Chef and Puppet, what value do packages provide? What are the pros and cons of packaging?

[Screenshot: R.I.’s question on Twitter]

[Screenshot: littleidea’s question on Twitter]

It was clear that both of them were feeling the pinch of expressing themselves in 140 characters. It’s a topic I’m pretty passionate about, after 15+ years fighting to keep systems under control, so I figured I’d write up my take on it.

Packages vs wget && tar

What are the pros and cons at the utility level?

(As dogma-free and objective as I can be, of course…)

The cost of building packages

Let’s start with the only downside I can think of to having to build packages – it’s an extra step, and takes some time.

Packaging your own code is easy – you solve it once, and then have something like Hudson or BuildBot take it from there. However, packaging upstream code that’s not in your distro is a pain in the butt. That’s a given.

Both of these get worse if (like me) you’re stuck running multiple distributions. Right now we have to build .rpm, .deb and Solaris packages.

Depending on what language you’re using, there might be some tools that help package things the right way. For Debian+Perl, for example, dh-make-perl is getting to the point of being awesome and very usable.

One way to get packages for upstream stuff that’s not very painful is with a tool like CheckInstall – you do the equivalent of a make install one time in a sandbox, and that gives you a package you can install at will and get all the benefits I’ll elaborate on below.
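For reference, a CheckInstall run boils down to one command. The sketch below wraps it in a small shell function; the function name and the DRY_RUN switch are my own illustration, and while -D/-R, --pkgname, --pkgversion and --default are CheckInstall options, check the man page for the version you actually have.

```shell
# Hypothetical helper: build a .deb or .rpm from an upstream "make install".
# Run it as root from an already-configured, already-built source tree.
# With DRY_RUN=1 it only prints the command it would run.
package_with_checkinstall() {
    local format="$1" name="$2" version="$3" type_flag
    case "$format" in
        deb) type_flag="-D" ;;   # Debian package
        rpm) type_flag="-R" ;;   # RPM package
        *) echo "unknown package format: $format" >&2; return 1 ;;
    esac
    # --default accepts CheckInstall's defaults instead of prompting;
    # the trailing words are the install command CheckInstall watches.
    set -- checkinstall "$type_flag" --default \
        "--pkgname=$name" "--pkgversion=$version" make install
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, DRY_RUN=1 package_with_checkinstall deb mytool 1.0 (mytool being a made-up name) just prints the command; without DRY_RUN it builds and installs the package.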

No matter what, it’s an extra step.

@AshBerlin points out that when you’re managing upstream software, this is a cost you have to take on for every version they release, not just a one-time cost.

There is no question that it’s ‘easier’ to do a make install than to build a package (every time the software is updated) and get that installed.

So here’s why that’s worth putting up with.

The value of building packages

Redundant, Version-aware Repositories

What can go wrong with code that looks like this? (Arbitrary link I happened to have in my .bash_history)

$ wget http://yuilibrary.com/downloads/yuicompressor/yuicompressor-2.4.2.zip
  1. What happens when the version changes? You have to update your configuration. This can be a good or bad thing, but in some cases you really want $latest to be installed, rather than the hard-coded version someone supplied the last time they edited the configuration manifest.
  2. What happens when the yuilibrary folks change their (arbitrary) download URIs?
  3. What happens if http://yuilibrary.com is down the next time you want to do a build?
  4. What if you want to be shipping yuicompressor-2.4.1.zip? Is that still available for download?

When you want to install a package, the more reliable you can make the process, the better. Upgrading all the servers in a cluster? You want all of them to be upgraded. Trying to bring a new server online? You want to be able to do that with a very low probability of anything going wrong. (Especially in the age of automated infrastructure, the more you can trust your deploys, the closer to “Just in Time” you can bring servers up, and that saves you money.)

The “repo” model provided by at least the Red Hat and Debian packaging systems handles all of these cases really, really well.

  • You can provide a list of repositories, and an attempt to install a package will try each one until it finds one that works
  • It’s a trivial sysadmin task to have several repos with the same content available. Each one doesn’t even have to be fancy and “HA”.
  • It’s 100% predictable (and an implementation detail you don’t have to worry about) what will happen when you say you either want a specific version of a package, or the latest version.
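Here’s roughly what “try the next repository” looks like if you hand-roll it for tarballs. This is a sketch, assuming curl for downloads and a list of mirror base URLs you maintain yourself; fetch_with_fallback and FETCH are hypothetical names, and note that it still knows nothing about versions or signatures.

```shell
# Default download command, called as: $FETCH OUTPUT_FILE URL
# (-f fails on HTTP errors, -sS is quiet but still shows errors, -L follows redirects)
curl_fetch() {
    curl -fsSL -o "$1" "$2"
}
FETCH="${FETCH:-curl_fetch}"

# fetch_with_fallback FILE OUTPUT MIRROR...
# Try each mirror base URL in order until one of them serves FILE.
fetch_with_fallback() {
    local file="$1" out="$2" mirror
    shift 2
    for mirror in "$@"; do
        if "$FETCH" "$out" "$mirror/$file"; then
            echo "fetched $file from $mirror"
            return 0
        fi
        echo "mirror failed: $mirror" >&2
    done
    echo "no mirror could provide $file" >&2
    return 1
}
```

Every line of this is something apt and yum already do for you, and they also answer “which version?” on top of it.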

Built-in “tripwire”-like functionality

dpkg and RPM (the package managers underneath Apt and Yum) both keep checksums of every file they install. So at any time you can ask “have any files that are supposed to be managed by packages been modified?”

This is a useful thing to know for security reasons, of course. However, it’s even more important for helping people adapt their behaviors.

In an environment that’s moving from “not configuration managed” to “configuration managed”, where the status quo has been “modify the files on production servers”, it’s great to be able to get a Nagios alert that one of the servers is now out of configuration, check the sums, and find out exactly which file(s) were modified, and when.

(If you couple that with a nice ‘everyone logs in as their user and sudo’s when needed’ policy, you can find out exactly who and when, as well.)
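The underlying idea is simple. On Debian, dpkg keeps a list of MD5 sums per package under /var/lib/dpkg/info/, and the real checking is done by tools like debsums -c (or dpkg --verify on newer dpkg releases) and rpm -Va on Red Hat. The function below is only a sketch of that check, using plain md5sum against a dpkg-style list; changed_files is a hypothetical name.

```shell
# changed_files ROOT SUMS
# SUMS holds dpkg-style lines: "<md5>  <path relative to ROOT>".
# Prints every listed path that was modified or has gone missing.
changed_files() {
    local root="$1" sums="$2"
    # The list is redirected into the subshell, so a relative $sums is opened
    # before we cd into $root. --quiet suppresses the "OK" lines, leaving
    # "path: FAILED" (or "path: FAILED open or read" for missing files).
    (cd "$root" && md5sum -c --quiet -) < "$sums" 2>/dev/null \
        | sed -e 's/: FAILED.*$//'
}
```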

The package manager knows how files got on your system

Which files your average make install spews onto your system is usually predictable, but certainly not always.

This is useful for two cases:

Troubleshooting, and knowing where to make changes when you find the problem

The server keeps throwing 500 errors. Why? Ah, an untrapped exception in /some/file/deep/on/the/system. Ok, I can fix that. Where do I need to go fix that? dpkg -S <filename> tells me the exact package responsible for that file.

$ dpkg -S /usr/bin/factor
    coreutils: /usr/bin/factor
$ dpkg -S /usr/bin/facter
    facter: /usr/bin/facter
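Since we build both .rpm and .deb packages, it helps to have one command for “who owns this file?”. A minimal sketch, assuming only that dpkg -S (Debian) or rpm -qf (Red Hat) is available; owner_of is a hypothetical helper.

```shell
# owner_of PATH
# Ask whichever package manager is installed which package owns PATH.
# dpkg -S prints "package: /path"; rpm -qf prints the owning package's name.
owner_of() {
    local path="$1"
    if command -v dpkg >/dev/null 2>&1; then
        dpkg -S "$path"
    elif command -v rpm >/dev/null 2>&1; then
        rpm -qf "$path"
    else
        echo "no dpkg or rpm found" >&2
        return 1
    fi
}
```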

That’s one area in particular where configuration management systems can add an extra layer of value. R.I. actually wrote a tool that helps you discover which Puppet module is responsible for configuring a given resource on the server. (localconfig.yaml parser) So if the answer is not “it’s something shipped with a package” but “it’s a config file that Puppet wrote from your module called my_module”, you can easily find and fix it.

Uninstalling

I’ve had some concrete problems with this: I did an upgrade, and cruft left over from the previous version interfered with the new one. For dynamic languages that build up large library trees, this can be particularly nasty, since default search paths might end up including remnants of an old version.

When the package manager knows the location of every file, it can rip them out as happily as it put them in.
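A toy version of that bookkeeping (dpkg, for example, keeps a per-package file list under /var/lib/dpkg/info/): record every path at install time, and uninstalling becomes “delete exactly those paths”. install_tracked and uninstall_tracked are hypothetical helpers for illustration, not how any real package manager is implemented.

```shell
# install_tracked STAGE DEST MANIFEST
# Copy every file from a staged tree into DEST and append each installed
# path to MANIFEST.
install_tracked() {
    local stage="$1" dest="$2" manifest="$3" rel
    (cd "$stage" && find . -type f) | while IFS= read -r rel; do
        rel="${rel#./}"
        mkdir -p "$dest/$(dirname "$rel")"
        cp -p "$stage/$rel" "$dest/$rel"
        echo "$dest/$rel" >> "$manifest"
    done
}

# uninstall_tracked MANIFEST
# Remove exactly the recorded files (and nothing else), then the manifest.
uninstall_tracked() {
    local manifest="$1" file
    while IFS= read -r file; do
        rm -f -- "$file"
    done < "$manifest"
    rm -f -- "$manifest"
}
```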

Dependencies are built in

At the level of configuration management, I really care about the application I’m configuring.

I want to run lighttpd. I tell the configuration manager to install it. I don’t want to have to do a research project to find all the supporting libraries required for it. Also, I really don’t want to track down the -dev versions of all of those libraries by hand.

This is especially important for upgrades – if an application starts using a new library after an upgrade (or depends on a newer version of a library when upgrading) that’s all handled (and expressible) at the package layer.
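Part of what the package layer does for you here is put things in a safe install order. The sketch below uses tsort from coreutils to show just that step; the lighttpd dependency list is made up for illustration, and real package managers also handle versions, conflicts and alternatives.

```shell
# Hypothetical dependency data: each line is "DEPENDENCY DEPENDENT".
lighttpd_deps() {
    cat <<'EOF'
libc6 libpcre3
libc6 zlib1g
libpcre3 lighttpd
zlib1g lighttpd
EOF
}

# tsort prints a topological order: every package appears after everything
# it depends on, which is an order that is safe to install in.
install_order() {
    lighttpd_deps | tsort
}
```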

Discovery of available updates is built in

This is one of the most compelling reasons.

If you’ve got a system with 9 tarball’d packages,

  • What versions are currently installed?
  • What software has updates available?

It’s bad enough if you’re installing software you know about, but I assume we’re going to be using a distribution at some point. You also REALLY want to know about updates your distribution is providing, right? Especially when it’s things like critical kernel issues, openssl problems – anything that can be remotely triggered, at the very least.

Knowing what versions are available can be easily automated. You can use things like

to handle this, and even do things like automatically install security patches if you’d like.

Since you need to solve this ‘sysadmin problem’ at large anyway, why not leverage the tools and practices you build around this to learn about your own software, as well? It’s just as valuable to know that there’s a new build of apache2 available from the core Debian repo as it is to know that our_custom_app, which we expected to be at version 2.6 everywhere, is pinging because host25 is still running 2.5.
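If you did have to track this by hand for tarballs, it might look like the sketch below: lines of “name installed available” in, a list of stale packages out. The input format and function names are hypothetical; GNU sort -V approximates version ordering, while dpkg --compare-versions implements Debian’s exact rules.

```shell
# is_newer A B: true if version B sorts strictly after version A.
is_newer() {
    [ "$1" != "$2" ] &&
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)" = "$2" ]
}

# report_updates: read "NAME INSTALLED AVAILABLE" lines on stdin and print
# the packages whose available version is newer than the installed one.
report_updates() {
    local name installed available
    while read -r name installed available; do
        if is_newer "$installed" "$available"; then
            echo "$name: $installed -> $available"
        fi
    done
}
```

Feeding it “our_custom_app 2.5 2.6” from host25 would flag exactly the case described above.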

Cryptographic signatures are built in

Packages give you a way to trust that the bits you’re about to install are the ones you should be installing.

In this time of huge malware infestations, attempts to trojan even things like the Linux kernel, and a large black market for “owned boxes”, you have every reason not to assume that the software you download is clean.

Most places you can download tarballs also post checksums of what those files should be. There are two problems with this:

  1. If you can’t trust the place the download is hosted, why would you trust the sum?
  2. That adds a lot of complexity to the automated installer.

You go from

  • Download the package
  • Decompress, configure, make

To

  • Download the package
  • Scrape for the latest checksums
  • Verify the checksums
  • THEN install the packages
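To make that extra work concrete, here is a minimal sketch of the verify step for a .tar.gz, assuming you already have the expected SHA-256 sum in hand (and trust it, which is problem 1). The function names are hypothetical; a detached signature checked with gpg --verify would be the stronger version of the same step.

```shell
# verify_tarball FILE EXPECTED_SHA256
# sha256sum -c reads "<hash>  <file>" lines; --status prints nothing and
# reports the result only through the exit code.
verify_tarball() {
    local tarball="$1" expected="$2"
    echo "$expected  $tarball" | sha256sum -c --status -
}

# install_if_verified FILE EXPECTED_SHA256 DEST
# Refuse to unpack anything whose checksum doesn't match.
install_if_verified() {
    local tarball="$1" expected="$2" dest="$3"
    if ! verify_tarball "$tarball" "$expected"; then
        echo "checksum mismatch for $tarball; refusing to install" >&2
        return 1
    fi
    mkdir -p "$dest" && tar -xzf "$tarball" -C "$dest"
}
```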

Packages solve both problems.

  1. They distribute the public key separately from the hosting of the packages. Unless you are already owned, you can verify that a package was signed by the person who holds the matching private key.
  2. Checking signatures is built in. Your package install will fail if something goes wrong.

Binary-identical code on every server

There are lots of things that can go wrong when you try to build a package from source.

  • You may not have some development libraries you need. Great, now you’re stuck managing those explicitly in configuration management as well.
  • This also means every server has to have the full stack of software needed to build your software.
  • Compilation may fail, especially if the package was updated since the last time you tried to build it.
  • Things that varied from build-to-build need to be accounted for when troubleshooting. “Huh, redis on host25 keeps crashing. Do we need to rebuild it?”

In general, it’s very nice to be able to completely decouple the tasks of

  • Creating a ‘build’ of your software and stack
  • Deploying a ‘build’ of your software and stack
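One payoff of deploying a single build everywhere is that “is host25 running something different?” becomes a checksum comparison. A sketch, assuming you’ve already collected “host sha256” lines for a binary from each server (how you gather them, by ssh or mcollective or otherwise, is left out); odd_hosts_out is a hypothetical name.

```shell
# odd_hosts_out: read "HOST SHA256" lines on stdin and print the hosts whose
# checksum differs from the most common one.
odd_hosts_out() {
    local report majority
    report=$(cat)
    # Count each checksum and keep the most frequent one.
    majority=$(echo "$report" | awk '{print $2}' | sort | uniq -c \
        | sort -rn | awk 'NR == 1 {print $2}')
    echo "$report" | awk -v m="$majority" '$2 != m {print $1}'
}
```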

Environment Support

One reason we like packages and repos is that it lets us define a configuration as:

  • A set of which packages
  • A set of configurations of those packages

Then when we’re working in a topic or feature branch, we can create a repository just for that branch (this is also automated), and the repo configuration is the only thing that needs to be modified in the configuration management code.

Also, because a repo only needs to contain a subset of all the packages you use, this lets us “stack” them.

  • Prefer my project repo (which only has my project-sensitive packages getting built into it, the stuff I’m modifying)
  • Fall back to the production repo for anything else you need
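The stacking rule itself is just “the first repository that has it wins”. In real setups you’d express it with apt pinning or yum’s priorities plugin; in this sketch plain directories stand in for repos, and resolve_package is a hypothetical helper.

```shell
# resolve_package NAME REPO...
# Return the package from the first repo (in priority order) that has it.
resolve_package() {
    local pkg="$1" repo
    shift
    for repo in "$@"; do
        if [ -e "$repo/$pkg" ]; then
            echo "$repo/$pkg"
            return 0
        fi
    done
    echo "package not found in any repo: $pkg" >&2
    return 1
}
```

Usage would be along the lines of resolve_package our_custom_app "$PROJECT_REPO" "$PROD_REPO", where both variables are placeholders for your branch repo and production repo.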

The End

Packages may add some pain and complexity up front to the install process, but they add a tremendous amount of value to the “lifecycle management” of your applications. Most of the hell we go through as people running servers doesn’t happen the first month after we set them up; it’s months 6, 18, and 24 that are the real problem. And it’s those problems (“servers are graveyards of state”, etc.) that make Configuration Management the right thing to do in the first place.

Use them. Love them.

Deploying self-contained Perl Dancer applications

I’m really enjoying using Perl’s Dancer for building lightweight web applications. It’s heavily inspired by Ruby’s Sinatra framework, but clearly Perlish.

The only thing I’ve been bothered by so far is getting my applications from my development environment out to production. Getting an app up and handling web requests is pretty easy, but shipping the software to the remote system has always bothered me.

Update: I wanted to clarify that Dancer itself is very lightweight and has as close to zero dependencies as is reasonable to have.

However, the Dancer apps that I’ve been building tend to require a lot of pretty fresh CPAN goodness (Task::Plack, Moose, DBIx::Class, AnyEvent, and more). That’s a problem if, like me, you want to avoid running CPAN as root to install modules system-wide: it makes systems harder to define with something like Puppet, and it can cause weird interaction problems between multiple applications on the same machine that use the same modules.

Dancer builds you a nice starter application container when you run dancer -a – it’s made to look like a Perl module, with a Makefile.PL and everything. This initially excited me, because I could just turn it into a Debian package with something like dh-make-perl. Here’s the problem – when you run perl Makefile.PL && make dist, none of your non-Perl-module assets make the trip. I’m not really interested in deploying applications that don’t have templates, CSS, JavaScript, or images.

From what I can tell in the docs and on IRC, most people solve this by just checking out their application from their version control system on the production box.

This is more or less this idea:

[Diagram: installing an app]

If you’re like me, and living life with hundreds or thousands of servers, that approach doesn’t really work. It also doesn’t solve the first problem above of how to handle all the dependencies (or dependency clashes on a single machine.)

If you’re running lots of servers, you’ll end up with this problem. You install your first app on a single box or two, and it’s running along fine. Then the people come, and you need more horsepower. Time to build a new box.

[Diagram: adding a new server]

Oops! DBIx::Class isn’t passing tests right now. I guess we can’t deploy a new server. Or, more subtly, everything installs, but when you add your app to the load balancer, something is “wacky” on half of the web requests. Pleh.

You really don’t want to be in that trap if you’re trying to auto-scale your app on, say, EC2.

Ok, so we don’t want to install from CPAN on the fly.

I really like brian d foy, and he’s got a strategy to handle this problem: run your own CPAN!

This is actually a pretty good idea, and in some environments, I’d be using it right now.

That turns the model to this:

[Diagram: building from minicpan]

We have our own minicpan to use to buffer the volatility of the CPAN. Upgrades can happen to minicpan when we want and need them to. If the minicpan didn’t change, we can install our application on as many servers as we want, and trust they’ll be getting the same code.

There’s still a big problem with this: what if you have multiple apps with different dependencies? You can’t use CPAN’s “new hotness” for one simple app that could really benefit from it without worrying about whether all your other applications will work with all that new code. So we’ve given ourselves a ‘buffer’ between our usage and CPAN’s potential volatility, but we haven’t bought much independence for our applications.

Brian has started working on extending minicpan to handle multiple minicpans.

However, there is another approach, which brings some other nice features as well.

Enter Shipwright.

Shipwright lets you keep a local, version-controlled copy of all the source (from CPAN or otherwise) that your application needs. It keeps the information about where all that source “came from”, be it CPAN or a local file, so you can keep it as up to date as you want to, and when you want to. It nicely decouples the “application building and packaging” problem (“make me a new package”) from the “application maintenance” problem (“update some components.”)

So now we’ve got a version controlled “CPAN Cache” per application we’re managing.

[Diagram: building with shipwright]

The other things I really like about Shipwright are:

  1. It doesn’t just handle the CPAN problem – it also makes your code into a little self-contained unit which can be dropped into any directory on the target system.
  2. You can tweak any module’s “build process”. (As I take advantage of below.) If the CPAN installer doesn’t work the way you want or need, you can do some pre/post-install hacking. Again – in a nice, version controlled and repeatable way.
  3. You can ship autoconf style applications along with it. Want to also deploy a patched version of nginx, or a redis server? You can do that here. (I wouldn’t, but you can.)

Essentially, all I’m saying is “use shipwright” – but there are a few tricks to make it work for Dancer applications.

MANIFEST

First, you’ll need to make sure all the files you care about in your Dancer app are going to be included at all. This means getting them in the MANIFEST file. I just did a simple find . -type f > MANIFEST and cleaned out the entries I didn’t need or already had. If you’re doing this a lot, or modifying the file contents of your applications a lot, I’m sure there’s a more elegant approach.
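If you’d rather not hand-edit the output of find every time, a small function can do the filtering. This is a sketch: the exclusion list is an assumption about what a Dancer app directory contains, and write_manifest is a hypothetical name. The standard Perl route is make manifest, driven by a MANIFEST.SKIP file.

```shell
# write_manifest APPDIR
# Regenerate APPDIR/MANIFEST from the files on disk, skipping version-control
# and build debris, stripping the leading "./", and sorting the result.
write_manifest() {
    local appdir="$1" files
    files=$(cd "$appdir" && find . -type f \
        ! -path './.git/*' ! -path './blib/*' \
        ! -name MANIFEST ! -name Makefile ! -name 'MYMETA.*' \
        ! -name pm_to_blib ! -name '*.tar.gz' \
        | sed -e 's|^\./||') || return 1
    # MANIFEST conventionally lists itself; drop blank lines, sort bytewise.
    { printf '%s\n' "$files"; echo MANIFEST; } \
        | grep -v '^$' | LC_ALL=C sort > "$appdir/MANIFEST"
}
```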

shipwright build file

One of the nice things about shipwright is that it allows you to tune up a build script for everything you’re installing.

Even though the Dancer packages now contain all your files, they still don’t know where to get installed.

Normally the scripts/MyApp/build contents look like this:

install: %%MAKE%% install

Adding one simple extra step gives us a copy of all the module’s assets rooted off our package’s ‘/www’ path.

install: %%MAKE%% install ; cp -av . %%INSTALL_BASE%%/www

Walkthrough

First, you’ll need to install Shipwright. I am in love with the mighty combination of local::lib and cpanm, so I’d recommend starting there.

Build your Dancer app

Once you’ve fixed the MANIFEST file as described above, you need to build a distribution of your Dancer app.

$ cd ../MyApp
$ perl Makefile.PL
$ make dist

Prepare the ‘vessel’

I’m going to be doing all the work in a directory called ~/work/shipwright. I’m also using the git Shipwright backend here, but it works with svn, a plain filesystem, and other options as well.

# you might need to mkdir -p "$HOME/work/shipwright/" first.
$ export APPNAME="MyApp"
$ export SHIPWRIGHT_SHIPYARD="git:file:///$HOME/work/shipwright/$APPNAME-vessel.git"
$ shipwright create

Ok, now you’ve got the vessel. It’s time to load it full of CPAN’y goodness. Since this is a tutorial for Dancer, I’ll include some of the basics I like to have when deploying Dancer apps.

Fill the vessel with software

I’m using --no-follow here because I had some errors trying to follow dependencies on my internal applications that I also install via distribution file. If you’re only loading CPAN modules from your Dancer app, you can take this off.

$ shipwright import cpan:Template cpan:Dancer cpan:YAML::XS cpan:Task::Plack
# use the full path and the right version number of your tarball here
# ($HOME rather than ~, since the shell won't expand ~ after "file:")
$ shipwright import file:$HOME/work/$APPNAME/$APPNAME-0.004.tar.gz --no-follow
# REPEAT importing for any of your other in-house modules/code
$ cd ~/work/shipwright
$ git clone $APPNAME-vessel.git
$ cd $APPNAME-vessel
$ vi scripts/$APPNAME/build
# change the install line from:
#       install: %%MAKE%% install
# to
#       install: %%MAKE%% install ; cp -av . %%INSTALL_BASE%%/www
$ git add scripts/$APPNAME/build
$ git commit -m "tweaked build script for $APPNAME"
$ git push origin master

Build the vessel

Cool. Now we’ve got a self-contained, versioned repository. Time to build it.

$ ./bin/shipwright-builder --install-base ~/work/shipwright/$APPNAME --force

The --force is because some modules don’t pass tests. Shipwright has a way to go with dependency management (or I’m doing something wrong): if I’ve installed a module into the ‘vessel’, sometimes other modules that depend on it can’t use it at build/test time.

Now you’ve got a directory (~/work/shipwright/$APPNAME) which can be deployed repeatably on your servers. You can wrap it up in a Debian or Red Hat package if you’d like, tar it, rsync it, BitTorrent it – up to you.

Maintaining the Vessel

When you build a new version of your Dancer app, all you have to do is update the vessel, then build.

$ shipwright relocate $APPNAME file:~/....new.tar.gz
$ shipwright update $APPNAME
$ cd $APPNAME-vessel
$ git pull
$ rm -rf ~/work/shipwright/$APPNAME && ./bin/shipwright-builder --install-base ~/work/shipwright/$APPNAME --force

Using the Vessel

Shipwright has some nice features to set up all the environment variables needed so you can use your app. All you have to do is source the appropriate script, like so:

# set up your environment so the '$APPNAME' libraries and binaries are in your path
$ . /opt/yourstuff/$APPNAME/tools/shipwright-source-bash /opt/yourstuff/$APPNAME/

What’s cool is you can do the same thing from SysV-style init scripts. Let’s say you’re launching this as a FastCGI application. Your startup script can look like this example. The magic line is source $APP_BASE..., which uses the Shipwright shell config to set the variables used by the rest of the script.

#!/bin/bash

# Set this to your application's name; $APPNAME from the walkthrough
# isn't set when init runs this script.
NAME="MyApp"
APP_BASE="/opt/mt/$NAME"
source "$APP_BASE/tools/shipwright-source-bash" "$APP_BASE"
APP_BIN="$APP_BASE/bin"
APP_WWW="$APP_BASE/www"
APP_PSGI="$APP_WWW/app.psgi"
FCGI_LISTEN=127.0.0.1:55900
DAEMON="$APP_BIN/plackup"

# Defaults
#RUN="no"
OPTIONS="-s FCGI --listen $FCGI_LISTEN -E production --app $APP_PSGI"

# Absolute path, so start and stop find the same file whatever the
# current directory is.
PIDFILE="/var/run/$NAME.pid"

[ -f /lib/lsb/init-functions ] && . /lib/lsb/init-functions

start()
{
    log_daemon_msg "Starting plack server" "$NAME"
    # -b backgrounds plackup, -m writes its PID to $PIDFILE.
    # $OPTIONS is unquoted on purpose so it splits into separate arguments.
    start-stop-daemon -b -m --start --quiet --pidfile "$PIDFILE" --exec "$DAEMON" -- $OPTIONS
    if [ $? -ne 0 ]; then
        log_end_msg 1
        exit 1
    else
        log_end_msg 0
    fi
}

# signal stop|reload CONTINUE_ON_FAILURE
# The second argument is 0 for a plain stop/reload and 1 for restart.
signal()
{
    if [ "$1" = "stop" ]; then
        SIGNAL="TERM"
        log_daemon_msg "Stopping plack server" "$NAME"
    elif [ "$1" = "reload" ]; then
        SIGNAL="HUP"
        log_daemon_msg "Reloading plack server" "$NAME"
    else
        echo "ERR: wrong parameter given to signal()"
        exit 1
    fi
    if [ -f "$PIDFILE" ]; then
        start-stop-daemon --stop --signal $SIGNAL --quiet --pidfile "$PIDFILE"
        if [ $? -eq 0 ]; then
            log_end_msg 0
        else
            # The first signal failed; fall back to KILL.
            SIGNAL="KILL"
            start-stop-daemon --stop --signal $SIGNAL --quiet --pidfile "$PIDFILE"
            if [ $? -ne 0 ]; then
                log_end_msg 1
                # Plain stop/reload give up here; restart carries on to start().
                [ "$2" != 0 ] || exit 0
            else
                rm -f "$PIDFILE"
                log_end_msg 0
            fi
        fi
        # Drop the PID file once the process was stopped or killed, so a
        # later stop doesn't signal a stale PID.
        if [ "$1" = "stop" ] || [ "$SIGNAL" = "KILL" ]; then
            rm -f "$PIDFILE"
        fi
    else
        log_end_msg 0
    fi
}

case "$1" in
    start)
        start
        ;;

    force-start)
        start
        ;;

    stop)
        signal stop 0
        ;;

    force-stop)
        signal stop 0
        ;;

    reload)
        signal reload 0
        ;;

    force-reload|restart)
        signal stop 1
        sleep 2
        start
        ;;

    *)
        echo "Usage: /etc/init.d/$NAME {start|force-start|stop|force-stop|reload|restart|force-reload}"
        exit 1
        ;;
esac

exit 0

Conclusion

Dancer is great. Shipwright is great. CPAN is great, but I want a buffer from all that awesomeness.
