Software Installation / Deployment

for host in `cat hosts`; do ssh root@$host runcmd; done # Dear God Please Fire Me.

Problem Statement

A systems administrator has a task: Make some change on 1000 hosts.

The Wrong Way(s)

The young admin gets right to work: logs into each host and makes the change.

This is obviously the wrong way.

    1. It takes forever.

    2. It's done by hand on each host, so it may be done wrong somewhere.

    3. You miss hosts that are down.

    4. You miss hosts that aren't up yet (the host that gets installed tomorrow).

If you're in middle management, and you see a manager praising someone for working this way ("look at all the hard work he's doing"), fire that manager. Really. And then go talk to the admin in question. Admins shouldn't ever work hard: it takes away valuable Slashdot-reading time.

The older admin runs the for loop in the subheading at the top of the page. This fixes problem 2, but keeps problems 1, 3, and 4. It does leave a lot more admin time free for reading Slashdot. This is state-of-the-art in many places, but there are better ways.

The Slightly Less Wrong Way

You have a CMDB somewhere, right? At least finance has one. Get access to it, and write a simple program to do queries and output matching hosts to standard out. Write a pssh script (or use func) that takes a list of hosts from standard in and runs the command provided on the command line on each host, in parallel. This takes a LOT less time, thus solving problems 1 and 2 above.

(This is more or less the right approach to take for ad-hoc audit jobs: "Which hosts do we have that are affected by this BIOS bug?" type queries. pssh is about as good a solution as you'll get for this.)
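For a rough idea of what such a runner looks like if you'd rather not depend on pssh or func, here is a minimal sketch. The function names (read_hosts, ssh_runner, run_everywhere, failed_hosts) are made up for illustration, and it assumes key-based ssh already works to every host. It is a sketch of the idea, not a replacement for pssh.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def read_hosts(lines):
    """Return host names from an iterable of lines, skipping blanks and # comments."""
    hosts = []
    for line in lines:
        name = line.split("#", 1)[0].strip()
        if name:
            hosts.append(name)
    return hosts


def ssh_runner(host, command, timeout=60):
    """Run command on host over ssh; return (host, exit_code, output)."""
    # BatchMode stops ssh from hanging on a password prompt.
    argv = ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=10", host, command]
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Treat "could not even run ssh" and "took too long" as failures.
        return host, 255, str(exc)
    return host, proc.returncode, proc.stdout + proc.stderr


def run_everywhere(hosts, command, runner=ssh_runner, workers=32):
    """Run command on every host concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda host: runner(host, command), hosts))


def failed_hosts(results):
    """Hosts whose command exited non-zero (down, unreachable, or broken)."""
    return [host for host, code, _ in results if code != 0]
```

Wired to standard in, the call would be something like run_everywhere(read_hosts(sys.stdin), " ".join(sys.argv[1:])). The runner is a parameter so you can swap ssh for something else, or for a fake when testing.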

This is the BladeLogic model. It is Still Wrong!

Your admin spent less time on the task, so that's good. If you bought BladeLogic or another comparable product, you have a report that shows you which hosts you missed because they were down. That allows you to solve problem 3. But there's still problem 4.

So you spend $$, and buy something that does Discovery! And now you can run Audit Rules against your hosts. And so when the host comes on the network and doesn't have the new configuration, you'll Detect It. And thus be able to Fix It. Just think of all the work you'll automatically queue up for your admins!

The problem is: Your thinking about the problem is still backwards. The admin's job isn't to look busy for your entertainment, or to close tickets. The admin's job is to keep the systems running, so that you can use them to do your job.

You're thinking about it wrong.

Revisited problem statement: There is a configuration change that needs to be applied to 1000 hosts.

There's the crux of the issue. It's not a task that needs to be done, it's a state that needs to be changed.

Once you come to this conclusion, the immediate reaction is to use some software written around this concept: Puppet, CFEngine, Chef, or something else in this family. However, that's not required.

You see, your operating system of choice already has some sane system to manage configuration changes. Red Hat calls it YUM. Debian calls it apt-get. Solaris calls it pkg-get and lets a third party (Blastwave) distribute it.

Packages

    1. Build a package that embodies the configuration change. This could just be a package that runs an install script to edit /etc/sysctl.conf, or it could drop files in a magic directory. But the point is, it's a package.

    2. Use your package management tools to install that package on 1000 hosts.

This approach (turning it into a package) takes care of issue 4. Sane package management tools deal with the other three issues.
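One thing the install script in step 1 has to get right: packages get reinstalled and upgraded, so the script must be safe to run more than once. Here is a minimal sketch of the /etc/sysctl.conf case. The function name set_sysctl is made up, and how you call it from your package's post-install hook is left to your packaging format.

```python
def set_sysctl(conf_text, key, value):
    """Return conf_text with exactly one `key = value` line.

    Running it twice gives the same result as running it once.
    """
    wanted = "%s = %s" % (key, value)
    out = []
    seen = False
    for line in conf_text.splitlines():
        stripped = line.strip()
        is_setting = (
            stripped
            and not stripped.startswith(("#", ";"))
            and "=" in stripped
        )
        if is_setting and stripped.split("=", 1)[0].strip() == key:
            if not seen:
                out.append(wanted)  # replace the first occurrence in place
                seen = True
            continue  # drop any later duplicates of the same key
        out.append(line)
    if not seen:
        out.append(wanted)  # key was absent: append it
    return "\n".join(out) + "\n"
```

The function works on text rather than on the file itself, which keeps the edit easy to test before it ever touches a real host.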

Software

I'm busy building a simple CMDB that's integrated with a package management system. Hopefully I'll be able to share, someday. But it's not a hard thing for you to build.

CMDB

Start with a database

    • Hosts

      • Owners

      • Configuration

      • Interfaces/Location information/whatever else you find helpful

    • Groups

      • A set of hosts, for any arbitrary reason.

      • Has packages.

So with these simple relationships, you can generate package information. You know what groups a host is in. You know what packages are associated with that group: Generate a list of packages that are on that host.
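As a rough illustration of how little is needed, here is a minimal sketch using SQLite. The table and column names (hosts, host_groups, group_members, group_packages) are made up for this example, and your real CMDB will carry more columns than this.

```python
import sqlite3

SCHEMA = """
CREATE TABLE hosts (name TEXT PRIMARY KEY, owner TEXT);
CREATE TABLE host_groups (name TEXT PRIMARY KEY);
CREATE TABLE group_members (
    grp  TEXT REFERENCES host_groups(name),
    host TEXT REFERENCES hosts(name),
    PRIMARY KEY (grp, host)
);
CREATE TABLE group_packages (
    grp     TEXT REFERENCES host_groups(name),
    package TEXT,
    PRIMARY KEY (grp, package)
);
"""


def open_cmdb(path=":memory:"):
    """Open (and initialise) a toy CMDB; the default lives only in memory."""
    db = sqlite3.connect(path)
    db.executescript(SCHEMA)
    return db


def packages_for_host(db, host):
    """Every package attached to any group the host belongs to, sorted, no repeats."""
    rows = db.execute(
        """
        SELECT DISTINCT gp.package
        FROM group_members AS gm
        JOIN group_packages AS gp ON gp.grp = gm.grp
        WHERE gm.host = ?
        ORDER BY gp.package
        """,
        (host,),
    )
    return [row[0] for row in rows]
```

A host in both an "all" group and a "web" group gets the union of both groups' packages, each listed once.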

Now: Add some perl-template-toolkit, and generate a platform package (.rpm, .pkg) that has dependencies on those other packages. Call it something based off of the hostname, and version it based off a timestamp.

Now, when you update the host, the update software finds a more recent version of a package, so it updates that, and you pull in the new packages you wanted on that host.
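Here is a minimal sketch of that generator for the RPM case. It uses Python's string.Template as a stand-in for perl-template-toolkit. The "host-" name prefix, the License value, and the function name render_host_spec are assumptions for illustration, and the spec is the bare minimum, so expect to adjust it for your own build setup.

```python
import time
from string import Template

# An empty meta-package: no files, only dependencies.
SPEC = Template("""\
Name: host-$host
Version: $version
Release: 1
Summary: Packages that belong on $host
License: Internal
BuildArch: noarch
$requires

%description
Generated from the CMDB. Do not edit by hand.

%files
""")


def render_host_spec(host, packages, now=None):
    """Render a spec for the per-host package.

    The version is a UTC timestamp, so every regeneration looks newer
    than the last one to the update software.
    """
    version = time.strftime("%Y%m%d%H%M%S", time.gmtime(now))
    requires = "\n".join("Requires: %s" % name for name in sorted(set(packages)))
    return SPEC.substitute(host=host, version=version, requires=requires)
```

Feed it the package list from the CMDB, build the result with your usual tooling, and drop it in the repository the host updates from.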

Easy Packages

The next tool you need is something to make it easy to create packages. Remember, I'm telling you that making a package is the admin's primary way to run commands on a system. So it had better be Darn Easy to make a package.

So make it so. Any admin can create a directory tree. Create a tool that turns that into a package. Add a spot to add metadata (perhaps a directory in the base of the tree called "meta"?) and you're most of the way there.
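Here is a minimal sketch of the front half of such a tool: read the "meta" directory, treat everything else in the tree as payload, and roll the payload into a tarball. The layout (one file per field, meta/name and meta/version) and the function names are assumptions for illustration. Handing the result to your platform's package builder is deliberately left out.

```python
import os
import tarfile


def read_package_tree(root):
    """Return name, version, and install paths for the tree rooted at root."""
    meta = {}
    for field in ("name", "version"):
        with open(os.path.join(root, "meta", field)) as handle:
            meta[field] = handle.read().strip()

    files = []
    for dirpath, dirnames, filenames in os.walk(root):
        if dirpath == root:
            # The top-level meta directory describes the package; it is not payload.
            dirnames[:] = [d for d in dirnames if d != "meta"]
        for filename in filenames:
            relative = os.path.relpath(os.path.join(dirpath, filename), root)
            files.append("/" + relative.replace(os.sep, "/"))
    meta["files"] = sorted(files)
    return meta


def write_payload(root, dest):
    """Write the payload files into a gzipped tarball at dest; return the metadata."""
    info = read_package_tree(root)
    with tarfile.open(dest, "w:gz") as tar:
        for path in info["files"]:
            relative = path.lstrip("/")
            tar.add(os.path.join(root, relative), arcname=relative)
    return info
```

The admin's whole job is then: make a directory that looks like the filesystem, put the file where it should land, fill in two small metadata files, run the tool.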

Next Steps

Since you've started building a CMDB: Continue. Add some data elements to make it easy to Manage Configuration.

    • Processes

      • Bound to a host. Inherits attributes from

    • Clusters

      • A collection of processes that are configured similarly.

And now your admin should have a lot more time to spend reading Slashdot. Which is really what you're hiring them for.

Copyright 2011, Doug Kilpatrick and "American Dave" Kline