warden

warden is a daemon that manages throw-away Linux containers. The containers can be used as scratch spaces to run arbitrary scripts as an unprivileged user, jailing not only the file system but also the process table and network interface. The network interface for every container is a virtual ethernet pair that lives in an isolated subnet. The host can be configured to NAT traffic from containers to the outside world, and to perform any kind of filtering and monitoring on that traffic using iptables.

Concepts

  • host: the machine running warden and the containers it spawns.

Containers

Every container boots into a fresh base system. This is done by using one or more read-only base file systems that contain just enough to boot a container. The combination of these file systems with a writable per-container file system is perceived by the container as a single, fully writable file system. This union of file systems is created using aufs. Because all write operations on the resulting file system are persisted in the writable scratch space, all containers can share the same read-only base file systems.
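
Conceptually, the union is a mount with one writable branch stacked on top of the read-only base branches. The following is a minimal sketch of such a mount, with hypothetical paths; the exact options warden passes may differ:

```ruby
# Hypothetical layering for a single container; paths are illustrative.
branches = [
  "/var/warden/instance-1234/rootfs=rw", # per-container writable layer (topmost)
  "root/lxc/001-apt/rootfs=ro",          # read-only base layers, top to bottom
  "root/lxc/000-base/rootfs=ro",
].join(":")

# aufs stacks branches left to right, leftmost on top; writes land in the rw branch.
system("mount", "-t", "aufs", "-o", "br=#{branches}",
       "none", "/var/warden/instance-1234/union") or raise "aufs mount failed"
```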

Base file structure

The directory structure used to build these scratch spaces is located in the root/ directory of this repository. Its contents look like this:

root/lxc
├── 000-base
├── 001-apt
├── ...
└── 0??-xxx

The read-only file systems are sorted alphabetically when layering one over the other, so the root of all unions becomes 000-base, next in line is 001-apt, and so forth. Note that directories starting with a . are not included in this list.

These directories do not hold the file system they represent directly, but contain a setup.rb script which is responsible for setting up its part of the file system. When this script is run, it creates a rootfs and a union directory. The rootfs directories of the directories in root/ that sort alphabetically before the directory where the script is run are layered together with this rootfs directory and mounted in the union directory. Note that all previous rootfs directories are read-only, and writes caused by the setup script end up in the rootfs directory. For instance: when a directory 002-something is added to root/, and its setup.rb runs, the union directory will layer, in order: 000-base/rootfs (read-only), 001-apt/rootfs (read-only), and 002-something/rootfs (read/write).
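
A small sketch of that selection rule, using the directory names from the example above:

```ruby
require "pathname"

# Returns the rootfs directories that are layered (read-only) under the
# given directory's own rootfs, per the alphabetical rule described above.
def layers_for(dir, root = "root/lxc")
  name = File.basename(dir)
  Pathname(root).children
                .select(&:directory?)
                .reject { |d| d.basename.to_s.start_with?(".") }
                .sort
                .take_while { |d| d.basename.to_s < name }
                .map { |d| d + "rootfs" }
end

# For the hypothetical 002-something this yields
# root/lxc/000-base/rootfs and root/lxc/001-apt/rootfs.
p layers_for("root/lxc/002-something")
```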

Creating a new container

The skeleton for new containers is located in the .instance-skeleton directory and contains a series of scripts and templates to set up its file system. New containers do not use this skeleton directly. Rather, the skeleton is copied to a container-specific directory in the same root (e.g. .instance-my_container) before setting it up.

After copying, the setup.rb in the container-specific directory is run and creates a number of configuration files and scripts. These scripts are used to start and stop the container. Container-specific configuration, such as its name and IP address, is pulled from environment variables by setup.rb and cannot be changed later.
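
For illustration, driving a container-specific setup.rb could look like the sketch below. The variable names are hypothetical; the README does not specify which variables warden actually sets:

```ruby
# Hypothetical driver for a container-specific setup.rb. CONTAINER_NAME and
# CONTAINER_IP are illustrative names, not warden's documented interface.
env = {
  "CONTAINER_NAME" => "my_container",
  "CONTAINER_IP"   => "10.254.0.2",
}
system(env, "ruby", "root/lxc/.instance-my_container/setup.rb") or raise "setup failed"
```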

Talking with warden

warden runs on EventMachine. It uses a line-based JSON protocol to communicate with the outside world, and does so over a Unix socket, which is located at /tmp/warden.sock by default. The daemon responds to a number of verbs that either create new containers, modify container state, or run scripts inside a container (a sketch of a client session follows the list below). The verbs that warden responds to are:

  • create: This creates a new container. In the future this verb may accept an optional configuration parameter. This command returns the handle (or name) of the container, which is used to identify it. The handle is equal to the hexadecimal representation of the container's IP address, as acquired from the pool of unused network addresses in the configured subnet.

  • spawn <handle> <script>: Run the Bash script <script> in the context of the container identified by <handle>. This command returns a job identifier that can be used to reap its exit status at some point in the future. Also, the connection that issued the command may go away and reconnect later while still being able to reap the job.

  • link <handle> <job_id>: Try to reap the script identified by <job_id> running in the container identified by <handle>. While the script is still executing, this command blocks the connection. When the script finishes, or has already finished, this command returns a 3-element tuple containing, in order: the integer exit status, the path to the captured STDOUT, and the path to the captured STDERR. These elements may be nil when they cannot be determined.

  • limit <handle> (mem|disk) [<value>]: Set or get resource limits for the container identified by <handle>. The following resources can be limited:

    • The memory limit is specified in number of bytes. It is enforced using the control group associated with the container. When a container exceeds this limit, one or more of its processes will be killed by the kernel. Additionally, the warden will be notified that an OOM happened and it subsequently tears down the container.
    • The disk space limit is specified as a number of blocks. It is enforced by means of a disk space quota for the user associated with the container. When a container exceeds this limit, the warden will be notified and it subsequently tears down the container.
  • net <handle> in: Forward a port on the external interface of the host to the container identified by <handle>. The port number is the same on the outside as it is on the inside of the container. This command returns the mapped port number.

  • net <handle> out <address[/mask][:port]>: Allow traffic from the container identified by <handle> to the network address specified by <address>. The address may optionally contain a mask to allow a network of addresses, and a port to only allow traffic to that specific port.

  • stop <handle>: Stop processes running inside the container identified by the specified handle. Because all processes are taken down, unfinished scripts will likely terminate without an exit status being available.

  • destroy <handle>: Stop processes and destroy all resources associated with the container identified by the specified handle. Because everything related to the container is destroyed, artifacts from running an earlier script should be copied out before calling destroy.
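
The following is a minimal sketch of a client session. It assumes each request is a JSON array of the verb and its arguments, one per line, and that each reply is a single JSON-encoded line; the exact payload shape is an assumption, not warden's documented format:

```ruby
require "socket"
require "json"

# Minimal line-based JSON client for the warden Unix socket.
class WardenClient
  def initialize(path = "/tmp/warden.sock")
    @sock = UNIXSocket.new(path)
  end

  def call(*request)
    @sock.write(request.to_json + "\n") # one request per line
    JSON.parse(@sock.gets)              # one reply per line
  end
end

client = WardenClient.new
handle = client.call("create")                       # handle of the new container
job_id = client.call("spawn", handle, "echo hello")  # job identifier for later reaping
status, stdout_path, stderr_path = client.call("link", handle, job_id)
client.call("destroy", handle)
```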

Lifecycle management

Since warden treats containers as ephemeral, it includes logic to clean up containers once they can no longer be used. The lifecycle of a container is tied to the client connections that reference it. When a client creates a container and subsequently disconnects, the container is implicitly destroyed. However, when the handle of that container is also used by another client, it is only destroyed once both connections have disconnected. This is done by means of a connection-oriented reference count: whenever the set of connections referencing a container becomes empty, it is destroyed. There is no difference between a container being destroyed manually or automatically.
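
A conceptual model of that reference count, not warden's actual code:

```ruby
require "set"

class Container
  def initialize(handle)
    @handle = handle
    @connections = Set.new
  end

  # Called when a connection creates this container or references its handle.
  def register(connection)
    @connections.add(connection)
  end

  # Called when a referencing connection goes away; the last one out
  # triggers destruction, exactly as an explicit destroy would.
  def unregister(connection)
    @connections.delete(connection)
    destroy! if @connections.empty?
  end

  def destroy!
    # stop processes, unmount the union, return the subnet to the pool, ...
  end
end
```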

Networking

Every container is assigned an individual /30 subnet. Containers use a virtual ethernet pair for networking. The host side of the pair is assigned the <network>+1 IP address, while the container side is assigned the <network>+2 IP address. The subnets are allocated from a pool of available subnets, which can be configured via the pool_start_address and pool_size parameters under the network configuration key. The pool consists of pool_size subnets, starting at pool_start_address.
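
Each /30 spans four addresses: the network address, the host end, the container end, and broadcast. A worked sketch with hypothetical pool parameters:

```ruby
require "socket"
require "ipaddr"

# Assumes a pool_start_address of 10.254.0.0 and a pool_size of 256;
# both values are illustrative.
start     = IPAddr.new("10.254.0.0").to_i
pool_size = 256

# Consecutive /30 subnets, 4 addresses apart.
subnets = (0...pool_size).map { |i| IPAddr.new(start + i * 4, Socket::AF_INET) }

network      = subnets.first.to_i
host_ip      = IPAddr.new(network + 1, Socket::AF_INET) # 10.254.0.1, host end
container_ip = IPAddr.new(network + 2, Socket::AF_INET) # 10.254.0.2, container end
handle       = container_ip.to_i.to_s(16)               # hex of the IP, e.g. "afe0002"
```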

A private IP range that is frequently left unused, and therefore a good candidate for this pool, is 172.16.0.0/12.

System prerequisites

warden has only been tested on Ubuntu 10.04 with a backported kernel, but it should also work on later Ubuntu versions. Before running warden, the following packages need to be installed:

  • linux-image-server-lts-backport-natty
  • debootstrap

Make sure that no cgroup file system is mounted; warden mounts this file system itself with a specific set of options.
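
A quick way to check this (the third field of each /proc/mounts line is the file system type):

```ruby
# Abort early if a cgroup file system is already mounted.
if File.readlines("/proc/mounts").any? { |line| line.split[2] == "cgroup" }
  abort "unmount existing cgroup file systems before starting warden"
end
```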

Other dependencies can be compiled and installed by running rake setup.

Hacking

The packaged tests create and destroy actual containers, so the system prerequisites above need to be in place. The tests must be run as root.

Setting up the base system is done by running setup.rb in the root/lxc/ directory. This script loops alphabetically over the subdirectories that do not start with a . and runs each one's setup.rb.
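
A sketch of that loop:

```ruby
# Run each layer's setup.rb in alphabetical order, skipping hidden directories.
Dir.chdir("root/lxc") do
  Dir.entries(".")
     .select { |entry| File.directory?(entry) && !entry.start_with?(".") }
     .sort
     .each do |dir|
       system("ruby", File.join(dir, "setup.rb")) or raise "#{dir}: setup failed"
     end
end
```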

Quickly creating a container to see if the (changed) configuration works can be done using the create.sh script in the root/lxc/ directory.