Merge pull request #19856 from moxiegirl/carry-close-19240
Updating for CAS changes and new select a driver section
|
@ -10,184 +10,203 @@ parent = "engine_driver"
|
|||
|
||||
# Docker and AUFS in practice
|
||||
|
||||
AUFS was the first storage driver in use with Docker. As a result, it has a long and close history with Docker, is very stable, has a lot of real-world deployments, and has strong community support. AUFS has several features that make it a good choice for Docker. These features enable:
|
||||
AUFS was the first storage driver in use with Docker. As a result, it has a
|
||||
long and close history with Docker, is very stable, has a lot of real-world
|
||||
deployments, and has strong community support. AUFS has several features that
|
||||
make it a good choice for Docker. These features enable:
|
||||
|
||||
- Fast container startup times.
|
||||
- Efficient use of storage.
|
||||
- Efficient use of memory.
|
||||
|
||||
Despite its capabilities and long history with Docker, some Linux distributions do not support AUFS. This is usually because AUFS is not included in the mainline (upstream) Linux kernel.
|
||||
Despite its capabilities and long history with Docker, some Linux distributions
|
||||
do not support AUFS. This is usually because AUFS is not included in the
|
||||
mainline (upstream) Linux kernel.
|
||||
|
||||
The following sections examine some AUFS features and how they relate to Docker.
|
||||
The following sections examine some AUFS features and how they relate to
|
||||
Docker.
|
||||
|
||||
## Image layering and sharing with AUFS
|
||||
|
||||
AUFS is a *unification filesystem*. This means that it takes multiple directories on a single Linux host, stacks them on top of each other, and provides a single unified view. To achieve this, AUFS uses *union mount*.
|
||||
AUFS is a *unification filesystem*. This means that it takes multiple
|
||||
directories on a single Linux host, stacks them on top of each other, and
|
||||
provides a single unified view. To achieve this, AUFS uses a *union mount*.
|
||||
|
||||
AUFS stacks multiple directories and exposes them as a unified view through a single mount point. All of the directories in the stack, as well as the union mount point, must all exist on the same Linux host. AUFS refers to each directory that it stacks as a *branch*.
|
||||
AUFS stacks multiple directories and exposes them as a unified view through a
|
||||
single mount point. All of the directories in the stack, as well as the union
|
||||
mount point, must all exist on the same Linux host. AUFS refers to each
|
||||
directory that it stacks as a *branch*.
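
The stacking behaviour can be illustrated without AUFS itself. The sketch below is only a simulation: plain directories stand in for branches, and the `union_lookup` function and the directory names are hypothetical, not part of AUFS or Docker. It resolves a name by checking branches from top to bottom, which is roughly how a union view decides which copy of a file is visible.

```bash
#!/usr/bin/env bash
# Simulation only: plain directories stand in for AUFS branches.
workdir=$(mktemp -d)
mkdir -p "$workdir/base" "$workdir/middle" "$workdir/top"

echo "from base"   > "$workdir/base/file1"
echo "from base"   > "$workdir/base/file2"
echo "from middle" > "$workdir/middle/file2"   # shadows base/file2

# Branches listed from top (checked first) to bottom (checked last).
branches=("$workdir/top" "$workdir/middle" "$workdir/base")

# union_lookup NAME: print the path of the first branch that holds NAME.
union_lookup() {
  local name=$1 branch
  for branch in "${branches[@]}"; do
    if [ -e "$branch/$name" ]; then
      echo "$branch/$name"
      return 0
    fi
  done
  return 1   # NAME is not present in any branch
}

cat "$(union_lookup file1)"   # only the base branch has file1
cat "$(union_lookup file2)"   # the middle branch copy wins over base
```

A real union mount does this resolution inside the kernel; the loop here only mirrors the top-to-bottom ordering.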
|
||||
|
||||
Within Docker, AUFS union mounts enable image layering. The AUFS storage driver implements Docker image layers using this union mount system. AUFS branches correspond to Docker image layers. The diagram below shows a Docker container based on the `ubuntu:latest` image.
|
||||
Within Docker, AUFS union mounts enable image layering. The AUFS storage driver
|
||||
implements Docker image layers using this union mount system. AUFS branches
|
||||
correspond to Docker image layers. The diagram below shows a Docker container
|
||||
based on the `ubuntu:latest` image.
|
||||
|
||||
![](images/aufs_layers.jpg)
|
||||
|
||||
This diagram shows the relationship between the Docker image layers and the AUFS branches (directories) in `/var/lib/docker/aufs`. Each image layer and the container layer correspond to an AUFS branch (directory) in the Docker host's local storage area. The union mount point gives the unified view of all layers.
|
||||
This diagram shows that each image layer, and the container layer, is
|
||||
represented in the Docker host's filesystem as a directory under
|
||||
`/var/lib/docker/`. The union mount point provides the unified view of all
|
||||
layers. As of Docker 1.10, image layer IDs do not correspond to the names of
|
||||
the directories that contain their data.
|
||||
|
||||
AUFS also supports copy-on-write (CoW) technology. Not all storage drivers do.
|
||||
AUFS also supports copy-on-write (CoW) technology. Not all storage drivers
|
||||
do.
|
||||
|
||||
## Container reads and writes with AUFS
|
||||
|
||||
Docker leverages AUFS CoW technology to enable image sharing and minimize the use of disk space. AUFS works at the file level. This means that all AUFS CoW operations copy entire files - even if only a small part of the file is being modified. This behavior can have a noticeable impact on container performance, especially if the files being copied are large, below a lot of image layers, or the CoW operation must search a deep directory tree.
|
||||
Docker leverages AUFS CoW technology to enable image sharing and minimize the
|
||||
use of disk space. AUFS works at the file level. This means that all AUFS CoW
|
||||
operations copy entire files - even if only a small part of the file is being
|
||||
modified. This behavior can have a noticeable impact on container performance,
|
||||
especially if the files being copied are large, below a lot of image layers,
|
||||
or the CoW operation must search a deep directory tree.
|
||||
|
||||
Consider, for example, an application running in a container that needs to add a single new value to a large key-value store (file). If this is the first time the file is modified it does not yet exist in the container's top writable layer. So, the CoW must *copy up* the file from the underlying image. The AUFS storage driver searches each image layer for the file. The search order is from top to bottom. When it is found, the entire file is *copied up* to the container's top writable layer. From there, it can be opened and modified.
|
||||
|
||||
Larger files obviously take longer to *copy up* than smaller files, and files that exist in lower image layers take longer to locate than those in higher layers. However, a *copy up* operation only occurs once per file on any given container. Subsequent reads and writes happen against the file's copy already *copied-up* to the container's top layer.
|
||||
Consider, for example, an application running in a container that needs to add a
|
||||
single new value to a large key-value store (file). If this is the first time
|
||||
the file is modified, it does not yet exist in the container's top writable
|
||||
layer. So, the CoW must *copy up* the file from the underlying image. The AUFS
|
||||
storage driver searches each image layer for the file. The search order is from
|
||||
top to bottom. When it is found, the entire file is *copied up* to the
|
||||
container's top writable layer. From there, it can be opened and modified.
|
||||
|
||||
Larger files obviously take longer to *copy up* than smaller files, and files
|
||||
that exist in lower image layers take longer to locate than those in higher
|
||||
layers. However, a *copy up* operation only occurs once per file on any given
|
||||
container. Subsequent reads and writes happen against the file's copy already
|
||||
*copied-up* to the container's top layer.
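
The copy-up behaviour described above can be sketched with ordinary directories. This is a simulation under stated assumptions, not the AUFS implementation: the `append_line` helper, the layer directory names, and the `store.db` file are all hypothetical. It shows the two properties the text relies on: the whole file is copied on the first write, and later writes reuse the copy.

```bash
#!/usr/bin/env bash
# Simulation only: shows the copy-up idea with ordinary directories.
workdir=$(mktemp -d)
mkdir -p "$workdir/image-base" "$workdir/image-top" "$workdir/container"
echo "key1=value1" > "$workdir/image-base/store.db"

# Read-only image layers, searched from top to bottom.
image_layers=("$workdir/image-top" "$workdir/image-base")
container_layer="$workdir/container"

# append_line NAME LINE: copy NAME up on the first write, then modify the copy.
append_line() {
  local name=$1 line=$2 layer
  if [ ! -e "$container_layer/$name" ]; then
    for layer in "${image_layers[@]}"; do
      if [ -e "$layer/$name" ]; then
        cp -p "$layer/$name" "$container_layer/$name"   # the entire file is copied
        echo "copied up $name from $(basename "$layer")" >&2
        break
      fi
    done
  fi
  echo "$line" >> "$container_layer/$name"
}

append_line store.db "key2=value2"   # first write: triggers the copy-up
append_line store.db "key3=value3"   # later write: goes straight to the copy
```

The image layer's copy of `store.db` is never modified; only the copy in the simulated container layer grows.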
|
||||
|
||||
## Deleting files with the AUFS storage driver
|
||||
|
||||
The AUFS storage driver deletes a file from a container by placing a *whiteout
|
||||
file* in the container's top layer. The whiteout file effectively obscures the
|
||||
existence of the file in the image's lower, read-only layers. The simplified
|
||||
existence of the file in the read-only image layers below. The simplified
|
||||
diagram below shows a container based on an image with three image layers.
|
||||
|
||||
![](images/aufs_delete.jpg)
|
||||
|
||||
`file3` was deleted from the container, so the AUFS storage driver placed
|
||||
a whiteout file in the container's top layer. This whiteout file effectively
|
||||
"deletes" `file3` from the container by obscuring any of the original file's
|
||||
existence in the image's read-only base layer. Of course, the file could have
|
||||
been in any of the other layers instead or in addition depending on how the
|
||||
layers are built.
|
||||
"deletes" `file3` from the container by obscuring any of the original file's
|
||||
existence in the image's read-only layers. This works the same no matter which
|
||||
of the image's read-only layers the file exists in.
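
The whiteout mechanism can be sketched as a small simulation. The `.wh.` prefix follows the naming AUFS conventionally uses for whiteout files; everything else here (the `delete_file` and `is_visible` helpers, the directory layout) is hypothetical and only illustrates the idea that a marker in the top layer hides a file without touching the read-only layers.

```bash
#!/usr/bin/env bash
# Simulation only: ordinary directories stand in for AUFS layers.
workdir=$(mktemp -d)
mkdir -p "$workdir/image" "$workdir/container"
echo "data" > "$workdir/image/file3"

# Layers from top (container) to bottom (image).
layers=("$workdir/container" "$workdir/image")

# delete_file NAME: drop any copy in the top layer and leave a whiteout marker.
delete_file() {
  rm -f "$workdir/container/$1"
  : > "$workdir/container/.wh.$1"
}

# is_visible NAME: succeed if NAME is visible in the simulated union view.
is_visible() {
  local name=$1 layer
  for layer in "${layers[@]}"; do
    if [ -e "$layer/.wh.$name" ]; then
      return 1          # a whiteout hides every copy in the layers below
    fi
    if [ -e "$layer/$name" ]; then
      return 0
    fi
  done
  return 1
}

is_visible file3 && echo "before delete: file3 is visible"
delete_file file3
is_visible file3 || echo "after delete: file3 is hidden"
[ -e "$workdir/image/file3" ] && echo "image layer still holds file3"
```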
|
||||
|
||||
## Configure Docker with AUFS
|
||||
|
||||
You can only use the AUFS storage driver on Linux systems with AUFS installed. Use the following command to determine if your system supports AUFS.
|
||||
You can only use the AUFS storage driver on Linux systems with AUFS installed.
|
||||
Use the following command to determine if your system supports AUFS.
|
||||
|
||||
```bash
|
||||
$ grep aufs /proc/filesystems
|
||||
nodev aufs
|
||||
```
|
||||
$ grep aufs /proc/filesystems
|
||||
nodev aufs
|
||||
|
||||
This output indicates the system supports AUFS. Once you've verified your
|
||||
This output indicates the system supports AUFS. Once you've verified your
|
||||
system supports AUFS, you must instruct the Docker daemon to use it. You do
|
||||
this from the command line with the `docker daemon` command:
|
||||
|
||||
```bash
|
||||
$ sudo docker daemon --storage-driver=aufs &
|
||||
```
|
||||
$ sudo docker daemon --storage-driver=aufs &
|
||||
|
||||
|
||||
Alternatively, you can edit the Docker config file and add the
|
||||
`--storage-driver=aufs` option to the `DOCKER_OPTS` line.
|
||||
|
||||
```bash
|
||||
# Use DOCKER_OPTS to modify the daemon startup options.
|
||||
DOCKER_OPTS="--storage-driver=aufs"
|
||||
```
|
||||
# Use DOCKER_OPTS to modify the daemon startup options.
|
||||
DOCKER_OPTS="--storage-driver=aufs"
|
||||
|
||||
Once your daemon is running, verify the storage driver with the `docker info` command.
|
||||
Once your daemon is running, verify the storage driver with the `docker info`
|
||||
command.
|
||||
|
||||
```bash
|
||||
$ sudo docker info
|
||||
Containers: 1
|
||||
Images: 4
|
||||
Storage Driver: aufs
|
||||
Root Dir: /var/lib/docker/aufs
|
||||
Backing Filesystem: extfs
|
||||
Dirs: 6
|
||||
Dirperm1 Supported: false
|
||||
Execution Driver: native-0.2
|
||||
...output truncated...
|
||||
```
|
||||
$ sudo docker info
|
||||
Containers: 1
|
||||
Images: 4
|
||||
Storage Driver: aufs
|
||||
Root Dir: /var/lib/docker/aufs
|
||||
Backing Filesystem: extfs
|
||||
Dirs: 6
|
||||
Dirperm1 Supported: false
|
||||
Execution Driver: native-0.2
|
||||
...output truncated...
|
||||
|
||||
The output above shows that the Docker daemon is running the AUFS storage driver on top of an existing ext4 backing filesystem.
|
||||
The output above shows that the Docker daemon is running the AUFS storage
|
||||
driver on top of an existing `ext4` backing filesystem (reported by `docker info` as `extfs`).
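
The two checks in this section (kernel support and the active storage driver) can be scripted. The following is a minimal sketch, not an official tool: the helper names `has_filesystem` and `storage_driver` are hypothetical, and the sample text is copied from the `docker info` output shown above so that the script runs without a Docker daemon.

```bash
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not Docker commands.

# has_filesystem NAME [FILE]: succeed if NAME is listed in FILE
# (defaults to /proc/filesystems, as in the grep example above).
has_filesystem() {
  local name=$1 file=${2:-/proc/filesystems}
  awk -v fs="$name" '$NF == fs { found = 1 } END { exit !found }' "$file" 2>/dev/null
}

# storage_driver: read `docker info` text on stdin and print the driver name.
storage_driver() {
  sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p' | head -n 1
}

# Sample text copied from the `docker info` output shown earlier.
sample_info='Containers: 1
Images: 4
Storage Driver: aufs
Root Dir: /var/lib/docker/aufs'

driver=$(printf '%s\n' "$sample_info" | storage_driver)
echo "storage driver: $driver"

if has_filesystem aufs; then
  echo "kernel lists aufs"
else
  echo "kernel does not list aufs"
fi
```

On a real host, the sample text would be replaced by piping the output of `sudo docker info` into `storage_driver`.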
|
||||
|
||||
## Local storage and AUFS
|
||||
|
||||
As the `docker daemon` runs with the AUFS driver, the driver stores images and containers within the Docker host's local storage area in the `/var/lib/docker/aufs` directory.
|
||||
As the `docker daemon` runs with the AUFS driver, the driver stores images and
|
||||
containers within the Docker host's local storage area under
|
||||
`/var/lib/docker/aufs/`.
|
||||
|
||||
### Images
|
||||
|
||||
Image layers and their contents are stored under
|
||||
`/var/lib/docker/aufs/diff/<image-id>` directory. The contents of an image
|
||||
layer in this location include all the files and directories belonging to that
|
||||
image layer.
|
||||
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, image layer IDs do
|
||||
not correspond to directory names.
|
||||
|
||||
The `/var/lib/docker/aufs/layers/` directory contains metadata about how image
|
||||
layers are stacked. This directory contains one file for every image or
|
||||
container layer on the Docker host. Inside each file are the names of the image layers
|
||||
that exist below it. The diagram below shows an image with 4 layers.
|
||||
container layer on the Docker host (though file names no longer match image
|
||||
layer IDs). Inside each file are the names of the directories that exist below
|
||||
it in the stack.
|
||||
|
||||
![](images/aufs_metadata.jpg)
|
||||
The command below shows the contents of a metadata file in
|
||||
`/var/lib/docker/aufs/layers/` that lists the three directories that are
|
||||
stacked below it in the union mount. Remember, these directory names do not map
|
||||
to image layer IDs with Docker 1.10 and higher.
|
||||
|
||||
Inspecting the contents of the file relating to the top layer of the image
|
||||
shows the three image layers below it. They are listed in the order they are
|
||||
stacked.
|
||||
|
||||
```bash
|
||||
$ cat /var/lib/docker/aufs/layers/91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
|
||||
|
||||
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
|
||||
|
||||
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
|
||||
|
||||
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
|
||||
```
|
||||
$ cat /var/lib/docker/aufs/layers/91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
|
||||
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
|
||||
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
|
||||
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
|
||||
|
||||
The base layer in an image has no image layers below it, so its file is empty.
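
The layout of these metadata files can be reproduced in a scratch directory to see how a stack is read back. This is a sketch under stated assumptions: the short names `layer-a` to `layer-d` and the helpers `print_stack` and `count_below` are hypothetical stand-ins, and a temporary directory replaces `/var/lib/docker/aufs/layers/`.

```bash
#!/usr/bin/env bash
# Simulation only: a temp directory stands in for /var/lib/docker/aufs/layers/.
layers_dir=$(mktemp -d)

# Each file lists the directories stacked below it, nearest first.
printf '%s\n' layer-c layer-b layer-a > "$layers_dir/layer-d"
printf '%s\n' layer-b layer-a         > "$layers_dir/layer-c"
printf '%s\n' layer-a                 > "$layers_dir/layer-b"
: > "$layers_dir/layer-a"             # base layer: nothing below, so the file is empty

# print_stack NAME: show NAME followed by everything stacked below it.
print_stack() {
  local name=$1
  echo "$name"
  cat "$layers_dir/$name"
}

# count_below NAME: number of layers stacked below NAME.
count_below() {
  wc -l < "$layers_dir/$1" | tr -d ' '
}

print_stack layer-d
echo "layers below layer-d: $(count_below layer-d)"
echo "layers below layer-a: $(count_below layer-a)"
```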
|
||||
|
||||
### Containers
|
||||
|
||||
Running containers are mounted at locations in the
|
||||
`/var/lib/docker/aufs/mnt/<container-id>` directory. This is the AUFS union
|
||||
mount point that exposes the container and all underlying image layers as a
|
||||
single unified view. If a container is not running, its directory still exists
|
||||
but is empty. This is because containers are only mounted when they are running.
|
||||
Running containers are mounted below `/var/lib/docker/aufs/mnt/<container-id>`.
|
||||
This is where the AUFS union mount point that exposes the container and all
|
||||
underlying image layers as a single unified view exists. If a container is not
|
||||
running, it still has a directory here but it is empty. This is because AUFS
|
||||
only mounts a container when it is running. With Docker 1.10 and higher,
|
||||
container IDs no longer correspond to directory names under
|
||||
`/var/lib/docker/aufs/mnt/`.
|
||||
|
||||
Container metadata and various config files that are placed into the running
|
||||
container are stored in `/var/lib/containers/<container-id>`. Files in this
|
||||
directory exist for all containers on the system, including ones that are
|
||||
stopped. However, when a container is running the container's log files are also
|
||||
in this directory.
|
||||
|
||||
A container's thin writable layer is stored under
|
||||
`/var/lib/docker/aufs/diff/<container-id>`. This directory is stacked by AUFS as
|
||||
the container's top writable layer and is where all changes to the container are
|
||||
stored. The directory exists even if the container is stopped. This means that
|
||||
restarting a container will not lose changes made to it. Once a container is
|
||||
deleted this directory is deleted.
|
||||
|
||||
Information about which image layers are stacked below a container's top
|
||||
writable layer is stored in the following file
|
||||
`/var/lib/docker/aufs/layers/<container-id>`. The command below shows that the
|
||||
container with ID `b41a6e5a508d` has 4 image layers below it:
|
||||
|
||||
```bash
|
||||
$ cat /var/lib/docker/aufs/layers/b41a6e5a508dfa02607199dfe51ed9345a675c977f2cafe8ef3e4b0b5773404e-init
|
||||
91e54dfb11794fad694460162bf0cb0a4fa710cfa3f60979c177d920813e267c
|
||||
d74508fb6632491cea586a1fd7d748dfc5274cd6fdfedee309ecdcbc2bf5cb82
|
||||
c22013c8472965aa5b62559f2b540cd440716ef149756e7b958a1b2aba421e87
|
||||
d3a1f33e8a5a513092f01bb7eb1c2abf4d711e5105390a3fe1ae2248cfde1391
|
||||
```
|
||||
|
||||
The image layers are shown in order. In the output above, the layer starting
|
||||
with image ID "d3a1..." is the image's base layer. The image layer starting
|
||||
with "91e5..." is the image's topmost layer.
|
||||
container are stored in `/var/lib/docker/containers/<container-id>`. Files in
|
||||
this directory exist for all containers on the system, including ones that are
|
||||
stopped. However, when a container is running the container's log files are
|
||||
also in this directory.
|
||||
|
||||
A container's thin writable layer is stored in a directory under
|
||||
`/var/lib/docker/aufs/diff/`. With Docker 1.10 and higher, container IDs no
|
||||
longer correspond to directory names. However, the container's thin writable
|
||||
layer still exists under here and is stacked by AUFS as the top writable layer
|
||||
and is where all changes to the container are stored. The directory exists even
|
||||
if the container is stopped. This means that restarting a container will not
|
||||
lose changes made to it. Once a container is deleted, its thin writable layer
|
||||
in this directory is deleted.
|
||||
|
||||
## AUFS and Docker performance
|
||||
|
||||
To summarize some of the performance-related aspects already mentioned:
|
||||
|
||||
- The AUFS storage driver is a good choice for PaaS and other similar use-cases where container density is important. This is because AUFS efficiently shares images between multiple running containers, enabling fast container start times and minimal use of disk space.
|
||||
- The AUFS storage driver is a good choice for PaaS and other similar use-cases
|
||||
where container density is important. This is because AUFS efficiently shares
|
||||
images between multiple running containers, enabling fast container start times
|
||||
and minimal use of disk space.
|
||||
|
||||
- The underlying mechanics of how AUFS shares files between image layers and containers use the system's page cache very efficiently.
|
||||
- The underlying mechanics of how AUFS shares files between image layers and
|
||||
containers use the system's page cache very efficiently.
|
||||
|
||||
- The AUFS storage driver can introduce significant latencies into container write performance. This is because the first time a container writes to any file, the file has to be located and copied into the container's top writable layer. These latencies increase and are compounded when these files exist below many image layers and the files themselves are large.
|
||||
- The AUFS storage driver can introduce significant latencies into container
|
||||
write performance. This is because the first time a container writes to any
|
||||
file, the file has to be located and copied into the container's top writable
|
||||
layer. These latencies increase and are compounded when these files exist below
|
||||
many image layers and the files themselves are large.
|
||||
|
||||
One final point. Data volumes provide the best and most predictable performance.
|
||||
This is because they bypass the storage driver and do not incur any of the
|
||||
potential overheads introduced by thin provisioning and copy-on-write. For this
|
||||
reason, you may want to place heavy write workloads on data volumes.
|
||||
One final point. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you may want to place heavy write workloads on
|
||||
data volumes.
|
||||
|
||||
## Related information
|
||||
|
||||
|
|
|
@ -13,126 +13,118 @@ parent = "engine_driver"
|
|||
Btrfs is a next generation copy-on-write filesystem that supports many advanced
|
||||
storage technologies that make it a good fit for Docker. Btrfs is included in
|
||||
the mainline Linux kernel and its on-disk format is now considered stable.
|
||||
However, many of its features are still under heavy development and users should
|
||||
consider it a fast-moving target.
|
||||
However, many of its features are still under heavy development and users
|
||||
should consider it a fast-moving target.
|
||||
|
||||
Docker's `btrfs` storage driver leverages many Btrfs features for image and
|
||||
container management. Among these features are thin provisioning, copy-on-write,
|
||||
and snapshotting.
|
||||
container management. Among these features are thin provisioning,
|
||||
copy-on-write, and snapshotting.
|
||||
|
||||
This article refers to Docker's Btrfs storage driver as `btrfs` and the overall Btrfs filesystem as Btrfs.
|
||||
This article refers to Docker's Btrfs storage driver as `btrfs` and the overall
|
||||
Btrfs filesystem as Btrfs.
|
||||
|
||||
>**Note**: The [Commercially Supported Docker Engine (CS-Engine)](https://www.docker.com/compatibility-maintenance) does not currently support the `btrfs` storage driver.
|
||||
|
||||
## The future of Btrfs
|
||||
|
||||
Btrfs has long been hailed as the future of Linux filesystems. With full support in the mainline Linux kernel, a stable on-disk format, and active development with a focus on stability, this is now becoming more of a reality.
|
||||
Btrfs has long been hailed as the future of Linux filesystems. With full
|
||||
support in the mainline Linux kernel, a stable on-disk format, and active
|
||||
development with a focus on stability, this is now becoming more of a reality.
|
||||
|
||||
As far as Docker on the Linux platform goes, many people see the `btrfs` storage driver as a potential long-term replacement for the `devicemapper` storage driver. However, at the time of writing, the `devicemapper` storage driver should be considered safer, more stable, and more *production ready*. You should only consider the `btrfs` driver for production deployments if you understand it well and have existing experience with Btrfs.
|
||||
As far as Docker on the Linux platform goes, many people see the `btrfs`
|
||||
storage driver as a potential long-term replacement for the `devicemapper`
|
||||
storage driver. However, at the time of writing, the `devicemapper` storage
|
||||
driver should be considered safer, more stable, and more *production ready*.
|
||||
You should only consider the `btrfs` driver for production deployments if you
|
||||
understand it well and have existing experience with Btrfs.
|
||||
|
||||
## Image layering and sharing with Btrfs
|
||||
|
||||
Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk components of image and container layers. Btrfs subvolumes look and feel like a normal Unix filesystem. As such, they can have their own internal directory structure that hooks into the wider Unix filesystem.
|
||||
Docker leverages Btrfs *subvolumes* and *snapshots* for managing the on-disk
|
||||
components of image and container layers. Btrfs subvolumes look and feel like
|
||||
a normal Unix filesystem. As such, they can have their own internal directory
|
||||
structure that hooks into the wider Unix filesystem.
|
||||
|
||||
Subvolumes are natively copy-on-write and have space allocated to them on-demand
|
||||
from an underlying storage pool. They can also be nested and snapped. The
|
||||
diagram below shows 4 subvolumes. 'Subvolume 2' and 'Subvolume 3' are nested,
|
||||
whereas 'Subvolume 4' shows its own internal directory tree.
|
||||
Subvolumes are natively copy-on-write and have space allocated to them
|
||||
on-demand from an underlying storage pool. They can also be nested and snapped.
|
||||
The diagram below shows 4 subvolumes. 'Subvolume 2' and 'Subvolume 3' are
|
||||
nested, whereas 'Subvolume 4' shows its own internal directory tree.
|
||||
|
||||
![](images/btfs_subvolume.jpg)
|
||||
|
||||
Snapshots are a point-in-time read-write copy of an entire subvolume. They exist directly below the subvolume they were created from. You can create snapshots of snapshots as shown in the diagram below.
|
||||
Snapshots are a point-in-time read-write copy of an entire subvolume. They
|
||||
exist directly below the subvolume they were created from. You can create
|
||||
snapshots of snapshots as shown in the diagram below.
|
||||
|
||||
![](images/btfs_snapshots.jpg)
|
||||
|
||||
Btrfs allocates space to subvolumes and snapshots on demand from an underlying pool of storage. The unit of allocation is referred to as a *chunk* and *chunks* are normally ~1GB in size.
|
||||
Btrfs allocates space to subvolumes and snapshots on demand from an underlying
|
||||
pool of storage. The unit of allocation is referred to as a *chunk*, and
|
||||
*chunks* are normally ~1GB in size.
|
||||
|
||||
Snapshots are first-class citizens in a Btrfs filesystem. This means that they look, feel, and operate just like regular subvolumes. The technology required to create them is built directly into the Btrfs filesystem thanks to its native copy-on-write design. This means that Btrfs snapshots are space efficient with little or no performance overhead. The diagram below shows a subvolume and its snapshot sharing the same data.
|
||||
Snapshots are first-class citizens in a Btrfs filesystem. This means that they
|
||||
look, feel, and operate just like regular subvolumes. The technology required
|
||||
to create them is built directly into the Btrfs filesystem thanks to its
|
||||
native copy-on-write design. This means that Btrfs snapshots are space
|
||||
efficient with little or no performance overhead. The diagram below shows a
|
||||
subvolume and its snapshot sharing the same data.
|
||||
|
||||
![](images/btfs_pool.jpg)
|
||||
|
||||
Docker's `btrfs` storage driver stores every image layer and container in its own Btrfs subvolume or snapshot. The base layer of an image is stored as a subvolume whereas child image layers and containers are stored as snapshots. This is shown in the diagram below.
|
||||
Docker's `btrfs` storage driver stores every image layer and container in its
|
||||
own Btrfs subvolume or snapshot. The base layer of an image is stored as a
|
||||
subvolume whereas child image layers and containers are stored as snapshots.
|
||||
This is shown in the diagram below.
|
||||
|
||||
![](images/btfs_container_layer.jpg)
|
||||
|
||||
The high-level process for creating images and containers on Docker hosts running the `btrfs` driver is as follows:
|
||||
The high-level process for creating images and containers on Docker hosts
|
||||
running the `btrfs` driver is as follows:
|
||||
|
||||
1. The image's base layer is stored in a Btrfs subvolume under
|
||||
1. The image's base layer is stored in a Btrfs *subvolume* under
|
||||
`/var/lib/docker/btrfs/subvolumes`.
|
||||
|
||||
The image ID is used as the subvolume name. E.g., a base layer with image ID
|
||||
"f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b" will be
|
||||
stored in
|
||||
`/var/lib/docker/btrfs/subvolumes/f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b`
|
||||
2. Subsequent image layers are stored as a Btrfs *snapshot* of the parent
|
||||
layer's subvolume or snapshot.
|
||||
|
||||
2. Subsequent image layers are stored as a Btrfs snapshot of the parent layer's subvolume or snapshot.
|
||||
|
||||
The diagram below shows a three-layer image. The base layer is a subvolume. Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of Layer 1's snapshot.
|
||||
The diagram below shows a three-layer image. The base layer is a subvolume.
|
||||
Layer 1 is a snapshot of the base layer's subvolume. Layer 2 is a snapshot of
|
||||
Layer 1's snapshot.
|
||||
|
||||
![](images/btfs_constructs.jpg)
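
The subvolume and snapshot chain described above can be approximated by hand with the `btrfs` command-line tool. This is only an illustration: Docker's `btrfs` driver manages these objects itself, and the mount point `/mnt/btrfs-demo`, the `BTRFS_ROOT` and `DRY_RUN` variables, the `run` helper, and the names `base`, `layer1`, `layer2`, and `container` are all hypothetical. By default the script only prints the commands; running them for real would need root privileges and an existing Btrfs filesystem.

```bash
#!/usr/bin/env bash
# Illustration only: mirrors the base -> layer -> container chain with the btrfs CLI.
root=${BTRFS_ROOT:-/mnt/btrfs-demo}   # hypothetical Btrfs mount point
DRY_RUN=${DRY_RUN:-1}                 # 1 = print commands, anything else = execute

# run CMD...: print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Base layer: a plain subvolume.
run btrfs subvolume create "$root/base"

# Child layers: each is a snapshot of its parent.
run btrfs subvolume snapshot "$root/base"   "$root/layer1"
run btrfs subvolume snapshot "$root/layer1" "$root/layer2"

# A container: a snapshot of the image's top layer.
run btrfs subvolume snapshot "$root/layer2" "$root/container"
```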
|
||||
|
||||
As of Docker 1.10, image layer IDs no longer correspond to directory names
|
||||
under `/var/lib/docker/`.
|
||||
|
||||
## Image and container on-disk constructs
|
||||
|
||||
Image layers and containers are visible in the Docker host's filesystem at
|
||||
`/var/lib/docker/btrfs/subvolumes/<image-id> OR <container-id>`. Directories for
|
||||
`/var/lib/docker/btrfs/subvolumes/`. However, as previously stated, directory
|
||||
names no longer correspond to image layer IDs. That said, directories for
|
||||
containers are present even for containers with a stopped status. This is
|
||||
because the `btrfs` storage driver mounts a default, top-level subvolume at
|
||||
`/var/lib/docker/subvolumes`. All other subvolumes and snapshots exist below
|
||||
that as Btrfs filesystem objects and not as individual mounts.
|
||||
|
||||
The following example shows a single Docker image with four image layers.
|
||||
|
||||
```bash
|
||||
$ sudo docker images -a
|
||||
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
|
||||
ubuntu latest 0a17decee413 2 weeks ago 188.3 MB
|
||||
<none> <none> 3c9a9d7cc6a2 2 weeks ago 188.3 MB
|
||||
<none> <none> eeb7cb91b09d 2 weeks ago 188.3 MB
|
||||
<none> <none> f9a9f253f610 2 weeks ago 188.1 MB
|
||||
```
|
||||
|
||||
Each image layer exists as a Btrfs subvolume or snapshot with the same name as its image ID as illustrated by the `btrfs subvolume list` command shown below:
|
||||
|
||||
```bash
|
||||
$ sudo btrfs subvolume list /var/lib/docker
|
||||
ID 257 gen 9 top level 5 path btrfs/subvolumes/f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b
|
||||
ID 258 gen 10 top level 5 path btrfs/subvolumes/eeb7cb91b09d5de9edb2798301aeedf50848eacc2123e98538f9d014f80f243c
|
||||
ID 260 gen 11 top level 5 path btrfs/subvolumes/3c9a9d7cc6a235eb2de58ca9ef3551c67ae42a991933ba4958d207b29142902b
|
||||
ID 261 gen 12 top level 5 path btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751
|
||||
```
|
||||
|
||||
Under the `/var/lib/docker/btrfs/subvolumes` directory, each of these subvolumes and snapshots is visible as a normal Unix directory:
|
||||
|
||||
```bash
|
||||
$ ls -l /var/lib/docker/btrfs/subvolumes/
|
||||
total 0
|
||||
drwxr-xr-x 1 root root 132 Oct 16 14:44 0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751
|
||||
drwxr-xr-x 1 root root 132 Oct 16 14:44 3c9a9d7cc6a235eb2de58ca9ef3551c67ae42a991933ba4958d207b29142902b
|
||||
drwxr-xr-x 1 root root 132 Oct 16 14:44 eeb7cb91b09d5de9edb2798301aeedf50848eacc2123e98538f9d014f80f243c
|
||||
drwxr-xr-x 1 root root 132 Oct 16 14:44 f9a9f253f6105141e0f8e091a6bcdb19e3f27af949842db93acba9048ed2410b
|
||||
```
|
||||
|
||||
Because Btrfs works at the filesystem level and not the block level, each image
|
||||
and container layer can be browsed in the filesystem using normal Unix commands.
|
||||
The example below shows a truncated output of an `ls -l` command against the
|
||||
image's top layer:
|
||||
and container layer can be browsed in the filesystem using normal Unix
|
||||
commands. The example below shows a truncated output of an `ls -l` command against an
|
||||
image layer:
|
||||
|
||||
```bash
|
||||
$ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/
|
||||
total 0
|
||||
drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 boot
|
||||
drwxr-xr-x 1 root root 882 Oct 9 08:38 dev
|
||||
drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 home
|
||||
...output truncated...
|
||||
```
|
||||
$ ls -l /var/lib/docker/btrfs/subvolumes/0a17decee4139b0de68478f149cc16346f5e711c5ae3bb969895f22dd6723751/
|
||||
total 0
|
||||
drwxr-xr-x 1 root root 1372 Oct 9 08:39 bin
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 boot
|
||||
drwxr-xr-x 1 root root 882 Oct 9 08:38 dev
|
||||
drwxr-xr-x 1 root root 2040 Oct 12 17:27 etc
|
||||
drwxr-xr-x 1 root root 0 Apr 10 2014 home
|
||||
...output truncated...
|
||||
|
||||
## Container reads and writes with Btrfs
|
||||
|
||||
A container is a space-efficient snapshot of an image. Metadata in the snapshot
|
||||
points to the actual data blocks in the storage pool. This is the same as with a
|
||||
subvolume. Therefore, reads performed against a snapshot are essentially the
|
||||
points to the actual data blocks in the storage pool. This is the same as with
|
||||
a subvolume. Therefore, reads performed against a snapshot are essentially the
|
||||
same as reads performed against a subvolume. As a result, no performance
|
||||
overhead is incurred from the Btrfs driver.
|
||||
|
||||
|
@ -145,28 +137,34 @@ new files to a container's snapshot operate at native Btrfs speeds.
|
|||
Updating an existing file in a container causes a copy-on-write operation
|
||||
(technically *redirect-on-write*). The driver leaves the original data and
|
||||
allocates new space to the snapshot. The updated data is written to this new
|
||||
space. Then, the driver updates the filesystem metadata in the snapshot to point
|
||||
to this new data. The original data is preserved in-place for subvolumes and
|
||||
snapshots further up the tree. This behavior is native to copy-on-write
|
||||
space. Then, the driver updates the filesystem metadata in the snapshot to
|
||||
point to this new data. The original data is preserved in-place for subvolumes
|
||||
and snapshots further up the tree. This behavior is native to copy-on-write
|
||||
filesystems like Btrfs and incurs very little overhead.
|
||||
|
||||
With Btrfs, writing and updating lots of small files can result in slow performance. More on this later.
|
||||
With Btrfs, writing and updating lots of small files can result in slow
|
||||
performance. More on this later.
|
||||
|
||||
## Configuring Docker with Btrfs
|
||||
|
||||
The `btrfs` storage driver only operates on a Docker host where `/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure shows how to configure Btrfs on Ubuntu 14.04 LTS.
|
||||
The `btrfs` storage driver only operates on a Docker host where
|
||||
`/var/lib/docker` is mounted as a Btrfs filesystem. The following procedure
|
||||
shows how to configure Btrfs on Ubuntu 14.04 LTS.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
If you have already used the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.
|
||||
If you have already used the Docker daemon on your Docker host and have images
|
||||
you want to keep, `push` them to Docker Hub or your private Docker Trusted
|
||||
Registry before attempting this procedure.
|
||||
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev/xvdb`. The device identifier may be different in your environment and you should substitute your own values throughout the procedure.
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at
|
||||
`/dev/xvdb`. The device identifier may be different in your environment and you
|
||||
should substitute your own values throughout the procedure.
|
||||
|
||||
The procedure also assumes your kernel has the appropriate Btrfs modules loaded. To verify this, use the following command:
|
||||
The procedure also assumes your kernel has the appropriate Btrfs modules
|
||||
loaded. To verify this, use the following command:
|
||||
|
||||
```bash
|
||||
$ cat /proc/filesystems | grep btrfs
|
||||
```
|
||||
$ cat /proc/filesystems | grep btrfs
|
||||
|
||||
### Configure Btrfs on Ubuntu 14.04 LTS
|
||||
|
||||
|
@ -181,7 +179,9 @@ Assuming your system meets the prerequisites, do the following:
|
|||
|
||||
2. Create the Btrfs storage pool.
|
||||
|
||||
Btrfs storage pools are created with the `mkfs.btrfs` command. Passing multiple devices to the `mkfs.btrfs` command creates a pool across all of those devices. Here you create a pool with a single device at `/dev/xvdb`.
|
||||
Btrfs storage pools are created with the `mkfs.btrfs` command. Passing
|
||||
multiple devices to the `mkfs.btrfs` command creates a pool across all of those
|
||||
devices. Here you create a pool with a single device at `/dev/xvdb`.
|
||||
|
||||
$ sudo mkfs.btrfs -f /dev/xvdb
|
||||
WARNING! - Btrfs v3.12 IS EXPERIMENTAL
|
||||
|
@ -199,7 +199,8 @@ Assuming your system meets the prerequisites, do the following:
|
|||
noted earlier, Btrfs is not currently recommended for production deployments
|
||||
unless you already have extensive experience.
|
||||
|
||||
3. If it does not already exist, create a directory for the Docker host's local storage area at `/var/lib/docker`.
|
||||
3. If it does not already exist, create a directory for the Docker host's local
|
||||
storage area at `/var/lib/docker`.
|
||||
|
||||
$ sudo mkdir /var/lib/docker
|
||||
|
||||
|
@ -210,7 +211,10 @@ Assuming your system meets the prerequisites, do the following:
|
|||
$ sudo blkid /dev/xvdb
|
||||
/dev/xvdb: UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" UUID_SUB="c3927a64-4454-4eef-95c2-a7d44ac0cf27" TYPE="btrfs"
|
||||
|
||||
b. Create a `/etc/fstab` entry to automatically mount `/var/lib/docker` each time the system boots.
|
||||
b. Create an `/etc/fstab` entry to automatically mount `/var/lib/docker`
|
||||
each time the system boots. Either of the following lines will work, just
|
||||
remember to substitute the UUID value with the value obtained from the previous
|
||||
command.
|
||||
|
||||
/dev/xvdb /var/lib/docker btrfs defaults 0 0
|
||||
UUID="a0ed851e-158b-4120-8416-c9b072c8cf47" /var/lib/docker btrfs defaults 0 0
|
||||
|
@ -223,10 +227,11 @@ Assuming your system meets the prerequisites, do the following:
|
|||
<output truncated>
|
||||
/dev/xvdb on /var/lib/docker type btrfs (rw)
|
||||
|
||||
The last line in the output above shows the `/dev/xvdb` mounted at `/var/lib/docker` as Btrfs.
|
||||
The last line in the output above shows `/dev/xvdb` mounted at
|
||||
`/var/lib/docker` as Btrfs.
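The same check can be scripted. The following is a minimal sketch, not part of the original procedure: `is_btrfs_mount` is a hypothetical helper name, and it assumes input in the `/proc/mounts` column layout (device, mount point, filesystem type, options).

```bash
# Hypothetical helper: succeeds if the mount point given as $1 is listed
# with filesystem type "btrfs" in /proc/mounts-formatted input on stdin.
is_btrfs_mount() {
    awk -v mp="$1" '$2 == mp && $3 == "btrfs" { found = 1 } END { exit !found }'
}

# Sample input used here instead of the live /proc/mounts.
sample='/dev/xvda1 / ext4 rw 0 0
/dev/xvdb /var/lib/docker btrfs rw 0 0'

if printf '%s\n' "$sample" | is_btrfs_mount /var/lib/docker; then
    echo "/var/lib/docker is on Btrfs"
fi
```

On a real host the function would be fed the live table, for example `is_btrfs_mount /var/lib/docker < /proc/mounts`.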
|
||||
|
||||
|
||||
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon should automatically load with the `btrfs` storage driver.
|
||||
Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon
|
||||
should automatically load with the `btrfs` storage driver.
|
||||
|
||||
1. Start the Docker daemon.
|
||||
|
||||
|
@ -236,9 +241,10 @@ Now that you have a Btrfs filesystem mounted at `/var/lib/docker`, the daemon sh
|
|||
The procedure for starting the Docker daemon may differ depending on the
|
||||
Linux distribution you are using.
|
||||
|
||||
You can start the Docker daemon with the `btrfs` storage driver by passing
|
||||
the `--storage-driver=btrfs` flag to the `docker daemon` command or you can
|
||||
add the `DOCKER_OPTS` line to the Docker config file.
|
||||
You can force the Docker daemon to start with the `btrfs` storage
|
||||
driver by either passing the `--storage-driver=btrfs` flag to the `docker
|
||||
daemon` command at startup, or by adding it to the `DOCKER_OPTS` line in the Docker config
|
||||
file.
|
||||
|
||||
2. Verify the storage driver with the `docker info` command.
|
||||
|
||||
|
@ -252,25 +258,54 @@ Your Docker host is now configured to use the `btrfs` storage driver.
|
|||
|
||||
## Btrfs and Docker performance
|
||||
|
||||
There are several factors that influence Docker's performance under the `btrfs` storage driver.
|
||||
There are several factors that influence Docker's performance under the `btrfs`
|
||||
storage driver.
|
||||
|
||||
- **Page caching**. Btrfs does not support page cache sharing. This means that *n* containers accessing the same file require *n* copies to be cached. As a result, the `btrfs` driver may not be the best choice for PaaS and other high density container use cases.
|
||||
- **Page caching**. Btrfs does not support page cache sharing. This means that
|
||||
*n* containers accessing the same file require *n* copies to be cached. As a
|
||||
result, the `btrfs` driver may not be the best choice for PaaS and other high
|
||||
density container use cases.
|
||||
|
||||
- **Small writes**. Containers performing lots of small writes (including Docker hosts that start and stop many containers) can lead to poor use of Btrfs chunks. This can ultimately lead to out-of-space conditions on your Docker host and stop it working. This is currently a major drawback to using current versions of Btrfs.
|
||||
- **Small writes**. Containers performing lots of small writes (including
|
||||
Docker hosts that start and stop many containers) can lead to poor use of Btrfs
|
||||
chunks. This can ultimately lead to out-of-space conditions on your Docker
|
||||
host and stop it working. This is currently a major drawback to using current
|
||||
versions of Btrfs.
|
||||
|
||||
If you use the `btrfs` storage driver, closely monitor the free space on your Btrfs filesystem using the `btrfs filesys show` command. Do not trust the output of normal Unix commands such as `df`; always use the Btrfs native commands.
|
||||
If you use the `btrfs` storage driver, closely monitor the free space on
|
||||
your Btrfs filesystem using the `btrfs filesys show` command. Do not trust the
|
||||
output of normal Unix commands such as `df`; always use the Btrfs native
|
||||
commands.
|
||||
|
||||
- **Sequential writes**. Btrfs writes data to disk via a journaling technique. This can impact sequential writes, where performance can drop by up to half.
|
||||
- **Sequential writes**. Btrfs writes data to disk via a journaling technique.
|
||||
This can impact sequential writes, where performance can drop by up to half.
|
||||
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write filesystems like Btrfs. Many small random writes can compound this issue. It can manifest as CPU spikes on Docker hosts using SSD media and head thrashing on Docker hosts using spinning media. Both of these result in poor performance.
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
|
||||
filesystems like Btrfs. Many small random writes can compound this issue. It
|
||||
can manifest as CPU spikes on Docker hosts using SSD media and head thrashing
|
||||
on Docker hosts using spinning media. Both of these result in poor performance.
|
||||
|
||||
Recent versions of Btrfs allow you to specify `autodefrag` as a mount option. This mode attempts to detect random writes and defragment them. You should perform your own tests before enabling this option on your Docker hosts. Some tests have shown this option has a negative performance impact on Docker hosts performing lots of small writes (including systems that start and stop many containers).
|
||||
Recent versions of Btrfs allow you to specify `autodefrag` as a mount
|
||||
option. This mode attempts to detect random writes and defragment them. You
|
||||
should perform your own tests before enabling this option on your Docker hosts.
|
||||
Some tests have shown this option has a negative performance impact on Docker
|
||||
hosts performing lots of small writes (including systems that start and stop
|
||||
many containers).
|
||||
|
||||
- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media. To enable these, mount with the `-o ssd` mount option. These optimizations include enhanced SSD write performance by avoiding things like *seek optimizations* that have no use on SSD media.
|
||||
- **Solid State Devices (SSD)**. Btrfs has native optimizations for SSD media.
|
||||
To enable these, mount with the `-o ssd` mount option. These optimizations
|
||||
include enhanced SSD write performance by avoiding things like *seek
|
||||
optimizations* that have no use on SSD media.
|
||||
|
||||
Btrfs also supports the TRIM/Discard primitives. However, mounting with the `-o discard` mount option can cause performance issues. Therefore, it is recommended you perform your own tests before using this option.
|
||||
Btrfs also supports the TRIM/Discard primitives. However, mounting with the
|
||||
`-o discard` mount option can cause performance issues. Therefore, it is
|
||||
recommended you perform your own tests before using this option.
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
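The mount options mentioned in the list above (`autodefrag`, `ssd`, `discard`) go in the options field of the `/etc/fstab` entry for `/var/lib/docker`. The sketch below shows one way to generate such a line; `btrfs_fstab_line` is a hypothetical helper, and the choice of options is an assumption you should validate with your own tests, as the text advises.

```bash
# Hypothetical helper: print an /etc/fstab line that mounts the Btrfs
# filesystem with the given UUID at /var/lib/docker.
# $1 = filesystem UUID (from `sudo blkid <device>`)
# $2 = comma-separated mount options (falls back to "defaults")
btrfs_fstab_line() {
    printf 'UUID=%s /var/lib/docker btrfs %s 0 0\n' "$1" "${2:-defaults}"
}

# Example: enable the SSD optimizations on top of the defaults.
btrfs_fstab_line a0ed851e-158b-4120-8416-c9b072c8cf47 defaults,ssd
```

The helper only prints the line; review it before appending it to `/etc/fstab`.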
|
||||
|
||||
## Related Information
|
||||
|
||||
|
|
|
@ -51,56 +51,84 @@ Device Mapper technology works at the block level rather than the file level.
|
|||
This means that `devicemapper` storage driver's thin provisioning and
|
||||
copy-on-write operations work with blocks rather than entire files.
|
||||
|
||||
>**Note**: Snapshots are also referred to as *thin devices* or *virtual devices*. They all mean the same thing in the context of the `devicemapper` storage driver.
|
||||
>**Note**: Snapshots are also referred to as *thin devices* or *virtual
|
||||
>devices*. They all mean the same thing in the context of the `devicemapper`
|
||||
>storage driver.
|
||||
|
||||
With the `devicemapper` the high level process for creating images is as follows:
|
||||
With `devicemapper` the high level process for creating images is as follows:
|
||||
|
||||
1. The `devicemapper` storage driver creates a thin pool.
|
||||
|
||||
The pool is created from block devices or loop mounted sparse files (more on this later).
|
||||
The pool is created from block devices or loop mounted sparse files (more
|
||||
on this later).
|
||||
|
||||
2. Next it creates a *base device*.
|
||||
|
||||
A base device is a thin device with a filesystem. You can see which filesystem is in use by running the `docker info` command and checking the `Backing filesystem` value.
|
||||
A base device is a thin device with a filesystem. You can see which
|
||||
filesystem is in use by running the `docker info` command and checking the
|
||||
`Backing filesystem` value.
|
||||
|
||||
3. Each new image (and image layer) is a snapshot of this base device.
|
||||
|
||||
These are thin provisioned copy-on-write snapshots. This means that they are initially empty and only consume space from the pool when data is written to them.
|
||||
These are thin provisioned copy-on-write snapshots. This means that they
|
||||
are initially empty and only consume space from the pool when data is written
|
||||
to them.
|
||||
|
||||
With `devicemapper`, container layers are snapshots of the image they are created from. Just as with images, container snapshots are thin provisioned copy-on-write snapshots. The container snapshot stores all updates to the container. The `devicemapper` allocates space to them on-demand from the pool as and when data is written to the container.
|
||||
With `devicemapper`, container layers are snapshots of the image they are
|
||||
created from. Just as with images, container snapshots are thin provisioned
|
||||
copy-on-write snapshots. The container snapshot stores all updates to the
|
||||
container. The `devicemapper` allocates space to them on-demand from the pool
|
||||
as and when data is written to the container.
|
||||
|
||||
The high level diagram below shows a thin pool with a base device and two images.
|
||||
The high level diagram below shows a thin pool with a base device and two
|
||||
images.
|
||||
|
||||
![](images/base_device.jpg)
|
||||
|
||||
If you look closely at the diagram you'll see that it's snapshots all the way down. Each image layer is a snapshot of the layer below it. The lowest layer of each image is a snapshot of the base device that exists in the pool. This base device is a `Device Mapper` artifact and not a Docker image layer.
|
||||
If you look closely at the diagram you'll see that it's snapshots all the way
|
||||
down. Each image layer is a snapshot of the layer below it. The lowest layer of
|
||||
each image is a snapshot of the base device that exists in the pool. This
|
||||
base device is a `Device Mapper` artifact and not a Docker image layer.
|
||||
|
||||
A container is a snapshot of the image it is created from. The diagram below shows two containers - one based on the Ubuntu image and the other based on the Busybox image.
|
||||
A container is a snapshot of the image it is created from. The diagram below
|
||||
shows two containers - one based on the Ubuntu image and the other based on the
|
||||
Busybox image.
|
||||
|
||||
![](images/two_dm_container.jpg)
|
||||
|
||||
|
||||
## Reads with the devicemapper
|
||||
|
||||
Let's look at how reads and writes occur using the `devicemapper` storage driver. The diagram below shows the high level process for reading a single block (`0x44f`) in an example container.
|
||||
Let's look at how reads and writes occur using the `devicemapper` storage
|
||||
driver. The diagram below shows the high level process for reading a single
|
||||
block (`0x44f`) in an example container.
|
||||
|
||||
![](images/dm_container.jpg)
|
||||
|
||||
1. An application makes a read request for block 0x44f in the container.
|
||||
1. An application makes a read request for block `0x44f` in the container.
|
||||
|
||||
Because the container is a thin snapshot of an image it does not have the data. Instead, it has a pointer (PTR) to where the data is stored in the image snapshot lower down in the image stack.
|
||||
Because the container is a thin snapshot of an image it does not have the
|
||||
data. Instead, it has a pointer (PTR) to where the data is stored in the image
|
||||
snapshot lower down in the image stack.
|
||||
|
||||
2. The storage driver follows the pointer to block `0xf33` in the snapshot relating to image layer `a005...`.
|
||||
2. The storage driver follows the pointer to block `0xf33` in the snapshot
|
||||
relating to image layer `a005...`.
|
||||
|
||||
3. The `devicemapper` copies the contents of block `0xf33` from the image snapshot to memory in the container.
|
||||
3. The `devicemapper` copies the contents of block `0xf33` from the image
|
||||
snapshot to memory in the container.
|
||||
|
||||
4. The storage driver returns the data to the requesting application.
|
||||
|
||||
### Write examples
|
||||
|
||||
With the `devicemapper` driver, writing new data to a container is accomplished by an *allocate-on-demand* operation. Updating existing data uses a copy-on-write operation. Because Device Mapper is a block-based technology these operations occur at the block level.
|
||||
With the `devicemapper` driver, writing new data to a container is accomplished
|
||||
by an *allocate-on-demand* operation. Updating existing data uses a
|
||||
copy-on-write operation. Because Device Mapper is a block-based technology
|
||||
these operations occur at the block level.
|
||||
|
||||
For example, when making a small change to a large file in a container, the `devicemapper` storage driver does not copy the entire file. It only copies the blocks to be modified. Each block is 64KB.
|
||||
For example, when making a small change to a large file in a container, the
|
||||
`devicemapper` storage driver does not copy the entire file. It only copies the
|
||||
blocks to be modified. Each block is 64KB.
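To make the block-level behavior concrete, the sketch below computes which 64KB blocks a write touches. It is only arithmetic on the block size quoted above; `blocks_touched` is a hypothetical helper and does not query Device Mapper.

```bash
BLOCK_SIZE=65536   # 64KB, the block size quoted in the text

# Hypothetical helper: print the first and last block index touched by a
# write of $2 bytes (assumed > 0) starting at byte offset $1.
blocks_touched() {
    local offset=$1 length=$2
    local first=$(( offset / BLOCK_SIZE ))
    local last=$(( (offset + length - 1) / BLOCK_SIZE ))
    echo "$first $last"
}

# A 4KB change at the 1GB offset of a large file stays inside one block,
# so only that block needs to be copied.
blocks_touched 1073741824 4096
```

A write that straddles a block boundary reports two different indices, which means two blocks are copied.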
|
||||
|
||||
#### Writing new data
|
||||
|
||||
|
@ -108,9 +136,11 @@ To write 56KB of new data to a container:
|
|||
|
||||
1. An application makes a request to write 56KB of new data to the container.
|
||||
|
||||
2. The allocate-on-demand operation allocates a single new 64KB block to the containers snapshot.
|
||||
2. The allocate-on-demand operation allocates a single new 64KB block to the
|
||||
container's snapshot.
|
||||
|
||||
If the write operation is larger than 64KB, multiple new blocks are allocated to the container snapshot.
|
||||
If the write operation is larger than 64KB, multiple new blocks are
|
||||
allocated to the container's snapshot.
|
||||
|
||||
3. The data is written to the newly allocated block.
|
||||
|
||||
|
@ -122,7 +152,8 @@ To modify existing data for the first time:
|
|||
|
||||
2. A copy-on-write operation locates the blocks that need updating.
|
||||
|
||||
3. The operation allocates new blocks to the container snapshot and copies the data into those blocks.
|
||||
3. The operation allocates new empty blocks to the container snapshot and
|
||||
copies the data into those blocks.
|
||||
|
||||
4. The modified data is written into the newly allocated blocks.
|
||||
|
||||
|
@ -133,7 +164,8 @@ to the application's read and write operations.
|
|||
## Configuring Docker with Device Mapper
|
||||
|
||||
The `devicemapper` is the default Docker storage driver on some Linux
|
||||
distributions. This includes RHEL and most of its forks. Currently, the following distributions support the driver:
|
||||
distributions. This includes RHEL and most of its forks. Currently, the
|
||||
following distributions support the driver:
|
||||
|
||||
* RHEL/CentOS/Fedora
|
||||
* Ubuntu 12.04
|
||||
|
@ -142,9 +174,9 @@ distributions. This includes RHEL and most of its forks. Currently, the followin
|
|||
|
||||
Docker hosts running the `devicemapper` storage driver default to a
|
||||
configuration mode known as `loop-lvm`. This mode uses sparse files to build
|
||||
the thin pool used by image and container snapshots. The mode is designed to work out-of-the-box
|
||||
with no additional configuration. However, production deployments should not run
|
||||
under `loop-lvm` mode.
|
||||
the thin pool used by image and container snapshots. The mode is designed to
|
||||
work out-of-the-box with no additional configuration. However, production
|
||||
deployments should not run under `loop-lvm` mode.
|
||||
|
||||
You can detect the mode by viewing the output of the `docker info` command:
|
||||
|
||||
|
@ -161,56 +193,84 @@ You can detect the mode by viewing the `docker info` command:
|
|||
Library Version: 1.02.93-RHEL7 (2015-01-28)
|
||||
...
|
||||
|
||||
The output above shows a Docker host running with the `devicemapper` storage driver operating in `loop-lvm` mode. This is indicated by the fact that the `Data loop file` and a `Metadata loop file` are on files under `/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse files.
|
||||
The output above shows a Docker host running with the `devicemapper` storage
|
||||
driver operating in `loop-lvm` mode. This is indicated by the fact that the
|
||||
`Data loop file` and `Metadata loop file` entries point to files under
|
||||
`/var/lib/docker/devicemapper/devicemapper`. These are loopback mounted sparse
|
||||
files.
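This check can be automated. The following is a rough sketch that assumes the host already uses the `devicemapper` driver and that the presence of a `Data loop file` line is a sufficient signal; `dm_mode` is a hypothetical helper name, and the sample input is illustrative rather than captured output.

```bash
# Hypothetical helper: read `docker info` output on stdin and print the
# devicemapper mode, based on whether a "Data loop file" line is present.
dm_mode() {
    if grep -q 'Data loop file'; then
        echo "loop-lvm"
    else
        echo "direct-lvm"
    fi
}

# Illustrative sample input. On a real host: sudo docker info | dm_mode
printf '%s\n' \
    ' Data file: /dev/loop0' \
    ' Data loop file: /var/lib/docker/devicemapper/devicemapper/data' | dm_mode
```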
|
||||
|
||||
### Configure direct-lvm mode for production
|
||||
|
||||
The preferred configuration for production deployments is `direct-lvm`. This
|
||||
mode uses block devices to create the thin pool. The following procedure shows
|
||||
you how to configure a Docker host to use the `devicemapper` storage driver in a
|
||||
`direct-lvm` configuration.
|
||||
you how to configure a Docker host to use the `devicemapper` storage driver in
|
||||
a `direct-lvm` configuration.
|
||||
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them to Docker Hub or your private Docker Trusted Registry before attempting this procedure.
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host
|
||||
> and have images you want to keep, `push` them to Docker Hub or your private
|
||||
> Docker Trusted Registry before attempting this procedure.
|
||||
|
||||
The procedure below will create a 90GB data volume and 4GB metadata volume to use as backing for the storage pool. It assumes that you have a spare block device at `/dev/xvdf` with enough free space to complete the task. The device identifier and volume sizes may be different in your environment and you should substitute your own values throughout the procedure. The procedure also assumes that the Docker daemon is in the `stopped` state.
|
||||
The procedure below will create a 90GB data volume and 4GB metadata volume to
|
||||
use as backing for the storage pool. It assumes that you have a spare block
|
||||
device at `/dev/xvdf` with enough free space to complete the task. The device
|
||||
identifier and volume sizes may be different in your environment and you
|
||||
should substitute your own values throughout the procedure. The procedure also
|
||||
assumes that the Docker daemon is in the `stopped` state.
|
||||
|
||||
1. Log in to the Docker host you want to configure and stop the Docker daemon.
|
||||
|
||||
2. If it exists, delete your existing image store by removing the `/var/lib/docker` directory.
|
||||
2. If it exists, delete your existing image store by removing the
|
||||
`/var/lib/docker` directory.
|
||||
|
||||
$ sudo rm -rf /var/lib/docker
|
||||
|
||||
3. Create an LVM physical volume (PV) on your spare block device using the `pvcreate` command.
|
||||
3. Create an LVM physical volume (PV) on your spare block device using the
|
||||
`pvcreate` command.
|
||||
|
||||
$ sudo pvcreate /dev/xvdf
|
||||
Physical volume `/dev/xvdf` successfully created
|
||||
|
||||
The device identifier may be different on your system. Remember to substitute your value in the command above.
|
||||
The device identifier may be different on your system. Remember to
|
||||
substitute your value in the command above.
|
||||
|
||||
4. Create a new volume group (VG) called `vg-docker` using the PV created in the previous step.
|
||||
4. Create a new volume group (VG) called `vg-docker` using the PV created in
|
||||
the previous step.
|
||||
|
||||
$ sudo vgcreate vg-docker /dev/xvdf
|
||||
Volume group `vg-docker` successfully created
|
||||
|
||||
5. Create a new 90GB logical volume (LV) called `data` from space in the `vg-docker` volume group.
|
||||
5. Create a new 90GB logical volume (LV) called `data` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
$ sudo lvcreate -L 90G -n data vg-docker
|
||||
Logical volume `data` created.
|
||||
|
||||
The command creates an LVM logical volume called `data` and an associated block device file at `/dev/vg-docker/data`. In a later step, you instruct the `devicemapper` storage driver to use this block device to store image and container data.
|
||||
The command creates an LVM logical volume called `data` and an associated
|
||||
block device file at `/dev/vg-docker/data`. In a later step, you instruct the
|
||||
`devicemapper` storage driver to use this block device to store image and
|
||||
container data.
|
||||
|
||||
If you receive a signature detection warning, make sure you are working on the correct devices before continuing. Signature warnings indicate that the device you're working on is currently in use by LVM or has been used by LVM in the past.
|
||||
If you receive a signature detection warning, make sure you are working on
|
||||
the correct devices before continuing. Signature warnings indicate that the
|
||||
device you're working on is currently in use by LVM or has been used by LVM in
|
||||
the past.
|
||||
|
||||
6. Create a new logical volume (LV) called `metadata` from space in the `vg-docker` volume group.
|
||||
6. Create a new logical volume (LV) called `metadata` from space in the
|
||||
`vg-docker` volume group.
|
||||
|
||||
$ sudo lvcreate -L 4G -n metadata vg-docker
|
||||
Logical volume `metadata` created.
|
||||
|
||||
This creates an LVM logical volume called `metadata` and an associated block device file at `/dev/vg-docker/metadata`. In the next step you instruct the `devicemapper` storage driver to use this block device to store image and container metadata.
|
||||
This creates an LVM logical volume called `metadata` and an associated
|
||||
block device file at `/dev/vg-docker/metadata`. In the next step you instruct
|
||||
the `devicemapper` storage driver to use this block device to store image and
|
||||
container metadata.
|
||||
|
||||
5. Start the Docker daemon with the `devicemapper` storage driver and the `--storage-opt` flags.
|
||||
7. Start the Docker daemon with the `devicemapper` storage driver and the
|
||||
`--storage-opt` flags.
|
||||
|
||||
The `data` and `metadata` devices that you pass to the `--storage-opt` options were created in the previous steps.
|
||||
The `data` and `metadata` devices that you pass to the `--storage-opt`
|
||||
options were created in the previous steps.
|
||||
|
||||
$ sudo docker daemon --storage-driver=devicemapper --storage-opt dm.datadev=/dev/vg-docker/data --storage-opt dm.metadatadev=/dev/vg-docker/metadata &
|
||||
[1] 2163
|
||||
|
@ -221,11 +281,12 @@ The procedure below will create a 90GB data volume and 4GB metadata volume to us
|
|||
INFO[0027] Daemon has completed initialization
|
||||
INFO[0027] Docker daemon commit=0a8c2e3 execdriver=native-0.2 graphdriver=devicemapper version=1.8.2
|
||||
|
||||
It is also possible to set the `--storage-driver` and `--storage-opt` flags in
|
||||
the Docker config file and start the daemon normally using the `service` or
|
||||
`systemd` commands.
|
||||
It is also possible to set the `--storage-driver` and `--storage-opt` flags
|
||||
in the Docker config file and start the daemon normally using the `service` or
|
||||
`systemctl` commands.
|
||||
|
||||
6. Use the `docker info` command to verify that the daemon is using `data` and `metadata` devices you created.
|
||||
8. Use the `docker info` command to verify that the daemon is using the `data` and
|
||||
`metadata` devices you created.
|
||||
|
||||
$ sudo docker info
|
||||
INFO[0180] GET /v1.20/info
|
||||
|
@ -239,11 +300,14 @@ The procedure below will create a 90GB data volume and 4GB metadata volume to us
|
|||
Metadata file: /dev/vg-docker/metadata
|
||||
[...]
|
||||
|
||||
The output of the command above shows the storage driver as `devicemapper`. The last two lines also confirm that the correct devices are being used for the `Data file` and the `Metadata file`.
|
||||
The output of the command above shows the storage driver as `devicemapper`.
|
||||
The last two lines also confirm that the correct devices are being used for
|
||||
the `Data file` and the `Metadata file`.
|
||||
|
||||
### Examine devicemapper structures on the host
|
||||
|
||||
You can use the `lsblk` command to see the device files created above and the `pool` that the `devicemapper` storage driver creates on top of them.
|
||||
You can use the `lsblk` command to see the device files created above and the
|
||||
`pool` that the `devicemapper` storage driver creates on top of them.
|
||||
|
||||
$ sudo lsblk
|
||||
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
|
||||
|
@ -255,11 +319,14 @@ You can use the `lsblk` command to see the device files created above and the `p
|
|||
└─vg--docker-metadata 253:1 0 4G 0 lvm
|
||||
└─docker-202:1-1032-pool 253:2 0 10G 0 dm
|
||||
|
||||
The diagram below shows the image from prior examples updated with the detail from the `lsblk` command above.
|
||||
The diagram below shows the image from prior examples updated with the detail
|
||||
from the `lsblk` command above.
|
||||
|
||||
![](http://farm1.staticflickr.com/703/22116692899_0471e5e160_b.jpg)
|
||||
|
||||
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data` and `metadata` devices created earlier. The `devicemapper` constructs the pool name as follows:
|
||||
In the diagram, the pool is named `Docker-202:1-1032-pool` and spans the `data`
|
||||
and `metadata` devices created earlier. The `devicemapper` constructs the pool
|
||||
name as follows:
|
||||
|
||||
```
|
||||
Docker-MAJ:MIN-INO-pool
|
||||
|
@ -268,41 +335,74 @@ Docker-MAJ:MIN-INO-pool
|
|||
`MAJ`, `MIN` and `INO` refer to the major and minor device numbers and inode.
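As an illustration of this naming scheme, the sketch below splits a pool name back into its three fields using shell parameter expansion. `parse_pool_name` is a hypothetical helper, and the input is the pool name shown in the `lsblk` output above.

```bash
# Hypothetical helper: split a pool name of the form
# docker-MAJ:MIN-INO-pool into its three numeric fields.
parse_pool_name() {
    local name=$1 rest maj min ino
    rest=${name#*-}        # drop the leading "docker-" (or "Docker-")
    rest=${rest%-pool}     # drop the trailing "-pool"
    maj=${rest%%:*}        # text before the colon
    rest=${rest#*:}
    min=${rest%%-*}        # text between the colon and the next dash
    ino=${rest#*-}         # remainder is the inode number
    echo "major=$maj minor=$min inode=$ino"
}

parse_pool_name docker-202:1-1032-pool
# prints: major=202 minor=1 inode=1032
```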
|
||||
|
||||
Because Device Mapper operates at the block level it is more difficult to see
|
||||
diffs between image layers and containers. However, there are two key
|
||||
directories. The `/var/lib/docker/devicemapper/mnt` directory contains the mount
|
||||
points for images and containers. The `/var/lib/docker/devicemapper/metadata`
|
||||
directory contains one file for every image and container snapshot. The files
|
||||
contain metadata about each snapshot in JSON format.
|
||||
diffs between image layers and containers. Docker 1.10 and later no longer
|
||||
matches image layer IDs with directory names in `/var/lib/docker`. However,
|
||||
there are two key directories. The `/var/lib/docker/devicemapper/mnt` directory
|
||||
contains the mount points for image and container layers. The
|
||||
`/var/lib/docker/devicemapper/metadata` directory contains one file for every
|
||||
image layer and container snapshot. The files contain metadata about each
|
||||
snapshot in JSON format.
|
||||
|
||||
## Device Mapper and Docker performance
|
||||
|
||||
It is important to understand the impact that allocate-on-demand and copy-on-write operations can have on overall container performance.
|
||||
It is important to understand the impact that allocate-on-demand and
|
||||
copy-on-write operations can have on overall container performance.
|
||||
|
||||
### Allocate-on-demand performance impact
|
||||
|
||||
The `devicemapper` storage driver allocates new blocks to a container via an allocate-on-demand operation. This means that each time an app writes to somewhere new inside a container, one or more empty blocks has to be located from the pool and mapped into the container.
|
||||
The `devicemapper` storage driver allocates new blocks to a container via an
|
||||
allocate-on-demand operation. This means that each time an app writes to
|
||||
somewhere new inside a container, one or more empty blocks have to be located
|
||||
from the pool and mapped into the container.
|
||||
|
||||
All blocks are 64KB. A write that uses less than 64KB still results in a single 64KB block being allocated. Writing more than 64KB of data uses multiple 64KB blocks. This can impact container performance, especially in containers that perform lots of small writes. However, once a block is allocated to a container subsequent reads and writes can operate directly on that block.
|
||||
All blocks are 64KB. A write that uses less than 64KB still results in a single
|
||||
64KB block being allocated. Writing more than 64KB of data uses multiple 64KB
|
||||
blocks. This can impact container performance, especially in containers that
|
||||
perform lots of small writes. However, once a block is allocated to a container,
|
||||
subsequent reads and writes can operate directly on that block.
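
As a rough illustration of the arithmetic, the following Python sketch computes how many 64KB blocks a write to new space would need. It is a simplified model, assuming each write starts on a block boundary and lands entirely in unallocated space. The function name `blocks_allocated` is made up for this example.

```python
BLOCK_SIZE = 64 * 1024  # 64KB, the block size stated above


def blocks_allocated(write_size):
    """Number of 64KB blocks needed for a write of `write_size` bytes.

    Simplified model: the write starts on a block boundary and touches
    only space that has not been allocated yet.
    """
    if write_size <= 0:
        return 0
    return -(-write_size // BLOCK_SIZE)  # ceiling division


for size in (4 * 1024, 64 * 1024, 100 * 1024):
    blocks = blocks_allocated(size)
    print(f"{size} bytes written: {blocks} block(s), "
          f"{blocks * BLOCK_SIZE} bytes allocated")
```

A 4KB write still occupies a whole 64KB block in this model, which is why workloads with many small writes to new locations are the ones most affected.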
|
||||
|
||||
### Copy-on-write performance impact
|
||||
|
||||
Each time a container updates existing data for the first time, the `devicemapper` storage driver has to perform a copy-on-write operation. This copies the data from the image snapshot to the container's snapshot. This process can have a noticeable impact on container performance.
|
||||
Each time a container updates existing data for the first time, the
|
||||
`devicemapper` storage driver has to perform a copy-on-write operation. This
|
||||
copies the data from the image snapshot to the container's snapshot. This
|
||||
process can have a noticeable impact on container performance.
|
||||
|
||||
All copy-on-write operations have a 64KB granularity. As a results, updating 32KB of a 1GB file causes the driver to copy a single 64KB block into the container's snapshot. This has obvious performance advantages over file-level copy-on-write operations which would require copying the entire 1GB file into the container layer.
|
||||
All copy-on-write operations have a 64KB granularity. As a result, updating
|
||||
32KB of a 1GB file causes the driver to copy a single 64KB block into the
|
||||
container's snapshot. This has obvious performance advantages over file-level
|
||||
copy-on-write operations which would require copying the entire 1GB file into
|
||||
the container layer.
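
The difference between the two granularities can be sketched in Python. This is a simplified model, not the driver's implementation: it assumes every touched block is being updated for the first time, and the function names are hypothetical.

```python
BLOCK_SIZE = 64 * 1024  # copy-on-write granularity stated above
GB = 1024 ** 3


def block_cow_bytes(offset, length):
    """Bytes copied by block-level copy-on-write for one update.

    Every 64KB block that the byte range [offset, offset + length)
    touches is copied in full.
    """
    if length <= 0:
        return 0
    first = offset // BLOCK_SIZE
    last = (offset + length - 1) // BLOCK_SIZE
    return (last - first + 1) * BLOCK_SIZE


def file_cow_bytes(file_size):
    """Bytes copied by file-level copy-on-write: the entire file."""
    return file_size


update = 32 * 1024
print(block_cow_bytes(0, update))          # update inside a single block
print(block_cow_bytes(48 * 1024, update))  # update straddling two blocks
print(file_cow_bytes(1 * GB))              # the whole 1GB file
```

Note that an update which is not aligned to a block boundary can touch two blocks, so the copy can be 128KB rather than 64KB. Either figure is still far smaller than the 1GB copied at file level.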
|
||||
|
||||
In practice, however, containers that perform lots of small block writes (<64KB) can perform worse with `devicemapper` than with AUFS.
|
||||
In practice, however, containers that perform lots of small block writes
|
||||
(<64KB) can perform worse with `devicemapper` than with AUFS.
|
||||
|
||||
### Other device mapper performance considerations
|
||||
|
||||
There are several other things that impact the performance of the `devicemapper` storage driver..
|
||||
There are several other things that impact the performance of the
|
||||
`devicemapper` storage driver.
|
||||
|
||||
- **The mode.** The default mode for Docker running the `devicemapper` storage driver is `loop-lvm`. This mode uses sparse files and suffers from poor performance. It is **not recommended for production**. The recommended mode for production environments is `direct-lvm` where the storage driver writes directly to raw block devices.
|
||||
- **The mode.** The default mode for Docker running the `devicemapper` storage
|
||||
driver is `loop-lvm`. This mode uses sparse files and suffers from poor
|
||||
performance. It is **not recommended for production**. The recommended mode for
|
||||
production environments is `direct-lvm` where the storage driver writes
|
||||
directly to raw block devices.
|
||||
|
||||
- **High speed storage.** For best performance you should place the `Data file` and `Metadata file` on high speed storage such as SSD. This can be direct attached storage or from a SAN or NAS array.
|
||||
- **High speed storage.** For best performance you should place the `Data file`
|
||||
and `Metadata file` on high speed storage such as SSD. This can be direct
|
||||
attached storage or from a SAN or NAS array.
|
||||
|
||||
- **Memory usage.** `devicemapper` is not the most memory efficient Docker storage driver. Launching *n* copies of the same container loads *n* copies of its files into memory. This can have a memory impact on your Docker host. As a result, the `devicemapper` storage driver may not be the best choice for PaaS and other high density use cases.
|
||||
- **Memory usage.** `devicemapper` is not the most memory efficient Docker
|
||||
storage driver. Launching *n* copies of the same container loads *n* copies of
|
||||
its files into memory. This can have a memory impact on your Docker host. As a
|
||||
result, the `devicemapper` storage driver may not be the best choice for PaaS
|
||||
and other high density use cases.
|
||||
|
||||
One final point, data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
|
||||
One final point: data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on
|
||||
data volumes.
|
||||
|
||||
## Related Information
|
||||
|
||||
|
|
Binary data
docs/userguide/storagedriver/images/aufs_layers.jpg
Before Width: | Height: | Size: 78 KiB After Width: | Height: | Size: 81 KiB |
Binary data
docs/userguide/storagedriver/images/btfs_constructs.jpg
Before Width: | Height: | Size: 47 KiB After Width: | Height: | Size: 62 KiB |
Binary data
docs/userguide/storagedriver/images/btfs_container_layer.jpg
Before Width: | Height: | Size: 51 KiB After Width: | Height: | Size: 66 KiB |
After Width: | Height: | Size: 136 KiB |
After Width: | Height: | Size: 103 KiB |
Binary data
docs/userguide/storagedriver/images/saving-space.jpg
Before Width: | Height: | Size: 43 KiB After Width: | Height: | Size: 56 KiB |
Binary data
docs/userguide/storagedriver/images/shared-uuid.jpg
Before Width: | Height: | Size: 59 KiB After Width: | Height: | Size: 246 KiB |
|
@ -13,25 +13,159 @@ weight = -2
|
|||
# Understand images, containers, and storage drivers
|
||||
|
||||
To use storage drivers effectively, you must understand how Docker builds and
|
||||
stores images. Then, you need an understanding of how these images are used in containers. Finally, you'll need a short introduction to the technologies that enable both images and container operations.
|
||||
stores images. Then, you need an understanding of how these images are used by
|
||||
containers. Finally, you'll need a short introduction to the technologies that
|
||||
enable both images and container operations.
|
||||
|
||||
## Images and containers rely on layers
|
||||
## Images and layers
|
||||
|
||||
Each Docker image references a list of read-only layers that represent filesystem differences. Layers are stacked on top of each other to form a base for a container's root filesystem. The diagram below shows the Ubuntu 15.04 image comprising 4 stacked image layers.
|
||||
Each Docker image references a list of read-only layers that represent
|
||||
filesystem differences. Layers are stacked on top of each other to form a base
|
||||
for a container's root filesystem. The diagram below shows the Ubuntu 15.04
|
||||
image comprising 4 stacked image layers.
|
||||
|
||||
![](images/image-layers.jpg)
|
||||
|
||||
When you make a change inside a container by, for example, adding a new file to a container created from Ubuntu 15.04 image, you add a new layer on top of the underlying stack. This change creates a new writable layer containing the newly added file on top of the image layers. Each image layer is stored by a cryptographic hash over its contents and multiple images can share the same layers. The diagram below shows a container running the Ubuntu 15.04 image.
|
||||
The Docker storage driver is responsible for stacking these layers and
|
||||
providing a single unified view.
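
The idea of a unified view over stacked layers can be modeled in a few lines of Python. This is only a conceptual sketch: real storage drivers work on directories or block devices, and the layer contents below are invented for illustration. File deletion is not modeled.

```python
from collections import ChainMap

# Hypothetical image layers, lowest first. Each maps a path to its contents.
layers = [
    {"/etc/os-release": "Ubuntu 15.04", "/etc/motd": "welcome"},
    {"/etc/motd": "welcome (patched)"},
    {"/usr/bin/tool": "binary"},
]

# ChainMap searches its first mapping first, so the top layer must come
# first: reverse the list to put the highest layer in front.
unified = ChainMap(*reversed(layers))

print(unified["/etc/os-release"])  # only present in the lowest layer
print(unified["/etc/motd"])        # the higher layer hides the lower copy
print(sorted(unified))             # every path visible in the unified view
```

When the same path exists in more than one layer, the copy in the highest layer is the one that shows through.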
|
||||
|
||||
When you create a new container, you add a new, thin, writable layer on top of
|
||||
the underlying stack. This layer is often called the "container layer". All
|
||||
changes made to the running container - such as writing new files, modifying
|
||||
existing files, and deleting files - are written to this thin writable
|
||||
container layer. The diagram below shows a container based on the Ubuntu 15.04
|
||||
image.
|
||||
|
||||
![](images/container-layers.jpg)
|
||||
|
||||
The major difference between a container and an image is this writable layer. All writes to the container that add new or modifying existing data are stored in this writable layer. When the container is deleted the writeable layer is also deleted. The image remains unchanged.
|
||||
### Content addressable storage
|
||||
|
||||
Because each container has its own thin writable container layer and all data is stored this container layer, this means that multiple containers can share access to the same underlying image and yet have their own data state. The diagram below shows multiple containers sharing the same Ubuntu 15.04 image.
|
||||
Docker 1.10 introduced a new content addressable storage model. This is a
|
||||
completely new way to address image and layer data on disk. Previously, image
|
||||
and layer data was referenced and stored using a randomly generated UUID. In
|
||||
the new model this is replaced by a secure *content hash*.
|
||||
|
||||
The new model improves security, provides a built-in way to avoid ID
|
||||
collisions, and guarantees data integrity after pull, push, load, and save
|
||||
operations. It also enables better sharing of layers by allowing many images to
|
||||
freely share their layers even if they didn’t come from the same build.
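
The contrast between the two kinds of identifier can be sketched in Python. This shows the general principle only: the way Docker actually derives image and layer IDs is more involved and is not reproduced here. The function names are hypothetical.

```python
import hashlib
import uuid


def content_id(data):
    """Identifier derived from the content itself (a SHA-256 digest)."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def random_id():
    """Identifier unrelated to the content (a random UUID)."""
    return uuid.uuid4().hex


layer = b"example layer contents"

# Identical content always produces the identical ID, on any host.
print(content_id(layer) == content_id(layer))

# Any change to the content changes the ID, which is what makes
# integrity checks after a pull or load possible.
print(content_id(layer) == content_id(layer + b"!"))

# Random IDs carry no information about the content they label.
print(random_id(), random_id())
```

Because the ID is a function of the content, two images built independently can recognize that they hold the same layer and share it.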
|
||||
|
||||
The diagram below shows an updated version of the previous diagram,
|
||||
highlighting the changes implemented by Docker 1.10.
|
||||
|
||||
![](images/container-layers-cas.jpg)
|
||||
|
||||
As can be seen, all image layer IDs are cryptographic hashes, whereas the
|
||||
container ID is still a randomly generated UUID.
|
||||
|
||||
There are several things to note regarding the new model. These include:
|
||||
|
||||
1. Migration of existing images
|
||||
2. Image and layer filesystem structures
|
||||
|
||||
Existing images, those created and pulled by earlier versions of Docker, need
|
||||
to be migrated before they can be used with the new model. This migration
|
||||
involves calculating new secure checksums and is performed automatically the
|
||||
first time you start an updated Docker daemon. After the migration is complete,
|
||||
all images and tags will have brand new secure IDs.
|
||||
|
||||
Although the migration is automatic and transparent, it is computationally
|
||||
intensive. This means it can take time if you have lots of image data.
|
||||
During this time your Docker daemon will not respond to other requests.
|
||||
|
||||
A migration tool exists that allows you to migrate existing images to the new
|
||||
format before upgrading your Docker daemon. This means that upgraded Docker
|
||||
daemons do not need to perform the migration in-band, and therefore avoid any
|
||||
associated downtime. It also provides a way to manually migrate existing images
|
||||
so that they can be distributed to other Docker daemons in your environment
|
||||
that are already running the latest versions of Docker.
|
||||
|
||||
The migration tool is provided by Docker, Inc., and runs as a container. You
|
||||
can download it from [https://github.com/docker/v1.10-migrator/releases](https://github.com/docker/v1.10-migrator/releases).
|
||||
|
||||
While running the "migrator" image you need to expose your Docker host's data
|
||||
directory to the container. If you are using the default Docker data path, the
|
||||
command to run the container will look like this:
|
||||
|
||||
$ sudo docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
|
||||
|
||||
If you use the `devicemapper` storage driver, you will need to include the
|
||||
`--privileged` option so that the container has access to your storage devices.
|
||||
|
||||
#### Migration example
|
||||
|
||||
The following example shows the migration tool in use on a Docker host running
|
||||
version 1.9.1 of the Docker daemon and the AUFS storage driver. The Docker host
|
||||
is running on a **t2.micro** AWS EC2 instance with 1 vCPU, 1GB RAM, and a
|
||||
single 8GB general purpose SSD EBS volume. The Docker data directory
|
||||
(`/var/lib/docker`) was consuming 2GB of space.
|
||||
|
||||
$ docker images
|
||||
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
|
||||
jenkins latest 285c9f0f9d3d 17 hours ago 708.5 MB
|
||||
mysql latest d39c3fa09ced 8 days ago 360.3 MB
|
||||
mongo latest a74137af4532 13 days ago 317.4 MB
|
||||
postgres latest 9aae83d4127f 13 days ago 270.7 MB
|
||||
redis latest 8bccd73928d9 2 weeks ago 151.3 MB
|
||||
centos latest c8a648134623 4 weeks ago 196.6 MB
|
||||
ubuntu 15.04 c8be1ac8145a 7 weeks ago 131.3 MB
|
||||
|
||||
$ du -hs /var/lib/docker
|
||||
2.0G /var/lib/docker
|
||||
|
||||
$ time docker run --rm -v /var/lib/docker:/var/lib/docker docker/v1.10-migrator
|
||||
Unable to find image 'docker/v1.10-migrator:latest' locally
|
||||
latest: Pulling from docker/v1.10-migrator
|
||||
ed1f33c5883d: Pull complete
|
||||
b3ca410aa2c1: Pull complete
|
||||
2b9c6ed9099e: Pull complete
|
||||
dce7e318b173: Pull complete
|
||||
Digest: sha256:bd2b245d5d22dd94ec4a8417a9b81bb5e90b171031c6e216484db3fe300c2097
|
||||
Status: Downloaded newer image for docker/v1.10-migrator:latest
|
||||
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d from /var/lib/docker/aufs/diff/01e70da302a553ba13485ad020a0d77dbb47575a31c4f48221137bb08f45878d"
|
||||
time="2016-01-27T12:31:06Z" level=debug msg="Assembling tar data for 07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d from /var/lib/docker/aufs/diff/07ac220aeeef9febf1ac16a9d1a4eff7ef3c8cbf5ed0be6b6f4c35952ed7920d"
|
||||
<snip>
|
||||
time="2016-01-27T12:32:00Z" level=debug msg="layer dbacfa057b30b1feaf15937c28bd8ca0d6c634fc311ccc35bd8d56d017595d5b took 10.80 seconds"
|
||||
|
||||
real 0m59.583s
|
||||
user 0m0.046s
|
||||
sys 0m0.008s
|
||||
|
||||
The Unix `time` command is prepended to the `docker run` command to produce timings
|
||||
for the operation. As can be seen, migrating the 7 images
|
||||
comprising 2GB of disk space took approximately 1 minute. However, this
|
||||
included the time taken to pull the `docker/v1.10-migrator` image
|
||||
(approximately 3.5 seconds). The same operation on an m4.10xlarge EC2 instance
|
||||
with 40 vCPUs, 160GB RAM and an 8GB provisioned IOPS EBS volume resulted in the
|
||||
following improved timings:
|
||||
|
||||
real 0m9.871s
|
||||
user 0m0.094s
|
||||
sys 0m0.021s
|
||||
|
||||
This shows that the migration operation is affected by the hardware spec of the
|
||||
machine performing the migration.
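
To put the two runs side by side, the following Python sketch converts the reported `real` times into an approximate migration rate. It treats the 2GB reported by `du -hs` as exactly 2048 MB, which is an approximation, and it does not subtract the image pull time, so the figures are indicative only.

```python
DATA_MB = 2 * 1024  # 2GB of Docker data, treated as 2048 MB (approximation)

# "real" wall-clock seconds reported by `time` in the two runs above.
timings = {"t2.micro": 59.583, "m4.10xlarge": 9.871}

for instance, seconds in timings.items():
    print(f"{instance}: {DATA_MB / seconds:.1f} MB/s")

speedup = timings["t2.micro"] / timings["m4.10xlarge"]
print(f"speedup: {speedup:.1f}x")
```

On these numbers the larger instance completes the migration roughly six times faster.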
|
||||
|
||||
## Containers and layers
|
||||
|
||||
The major difference between a container and an image is the top writable
|
||||
layer. All writes to the container that add new or modify existing data are
|
||||
stored in this writable layer. When the container is deleted the writable layer
|
||||
is also deleted. The underlying image remains unchanged.
|
||||
|
||||
Because each container has its own thin writable container layer, and all
|
||||
changes are stored in this container layer, multiple containers
|
||||
can share access to the same underlying image and yet have their own data
|
||||
state. The diagram below shows multiple containers sharing the same Ubuntu
|
||||
15.04 image.
|
||||
|
||||
![](images/sharing-layers.jpg)
|
||||
|
||||
A storage driver is responsible for enabling and managing both the image layers and the writeable container layer. How a storage driver accomplishes these behaviors can vary. Two key technologies behind Docker image and container management are stackable image layers and copy-on-write (CoW).
|
||||
The Docker storage driver is responsible for enabling and managing both the
|
||||
image layers and the writable container layer. How a storage driver
|
||||
accomplishes these tasks can vary between drivers. Two key technologies behind Docker
|
||||
image and container management are stackable image layers and copy-on-write
|
||||
(CoW).
|
||||
|
||||
|
||||
## The copy-on-write strategy
|
||||
|
@ -40,24 +174,29 @@ Sharing is a good way to optimize resources. People do this instinctively in
|
|||
daily life. For example, twins Jane and Joseph taking an Algebra class at
|
||||
different times from different teachers can share the same exercise book by
|
||||
passing it between each other. Now, suppose Jane gets an assignment to complete
|
||||
the homework on page 11 in the book. At that point, Jane copies page 11, completes the homework, and hands in her copy. The original exercise book is unchanged and only Jane has a copy of the changed page 11.
|
||||
the homework on page 11 in the book. At that point, Jane copies page 11,
|
||||
completes the homework, and hands in her copy. The original exercise book is
|
||||
unchanged and only Jane has a copy of the changed page 11.
|
||||
|
||||
Copy-on-write is a similar strategy of sharing and copying. In this strategy,
|
||||
system processes that need the same data share the same instance of that data
|
||||
rather than having their own copy. At some point, if one process needs to modify
|
||||
or write to the data, only then does the operating system make a copy of the
|
||||
data for that process to use. Only the process that needs to write has access to
|
||||
the data copy. All the other processes continue to use the original data.
|
||||
rather than having their own copy. At some point, if one process needs to
|
||||
modify or write to the data, only then does the operating system make a copy of
|
||||
the data for that process to use. Only the process that needs to write has
|
||||
access to the data copy. All the other processes continue to use the original
|
||||
data.
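
The exercise book analogy translates directly into a small Python sketch. The class `SharedBook` and its methods are invented for this illustration. It models the general copy-on-write idea, not any particular operating system or storage driver.

```python
class SharedBook:
    """A copy-on-write view over a shared list of pages."""

    def __init__(self, pages):
        self._shared = pages  # the shared original, never modified here
        self._copies = {}     # private copies, made only on first write

    def read(self, page):
        # Prefer the private copy if one exists, else read the original.
        if page in self._copies:
            return self._copies[page]
        return self._shared[page]

    def annotate(self, page, note):
        # Copy the page the first time it is written, then change the copy.
        if page not in self._copies:
            self._copies[page] = self._shared[page]
        self._copies[page] += " " + note

    def copied_pages(self):
        return sorted(self._copies)


book = [f"page {n}" for n in range(1, 21)]
jane = SharedBook(book)
joseph = SharedBook(book)

jane.annotate(10, "(homework done)")  # index 10 holds page 11

print(jane.read(10))        # Jane sees her modified copy
print(joseph.read(10))      # Joseph still sees the original page
print(jane.copied_pages())  # only one page was ever copied
```

Both readers share all twenty pages until one of them writes. After the write, only the single modified page exists twice.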
|
||||
|
||||
Docker uses a copy-on-write technology with both images and containers. This CoW
|
||||
strategy optimizes both image disk space usage and the performance of container
|
||||
start times. The next sections look at how copy-on-write is leveraged with
|
||||
images and containers thru sharing and copying.
|
||||
Docker uses a copy-on-write technology with both images and containers. This
|
||||
CoW strategy optimizes both image disk space usage and the performance of
|
||||
container start times. The next sections look at how copy-on-write is leveraged
|
||||
with images and containers through sharing and copying.
|
||||
|
||||
### Sharing promotes smaller images
|
||||
|
||||
This section looks at image layers and copy-on-write technology. All image and container layers exist inside the Docker host's *local storage area* and are managed by the storage driver. It is a location on the host's
|
||||
filesystem.
|
||||
This section looks at image layers and copy-on-write technology. All image and
|
||||
container layers exist inside the Docker host's *local storage area* and are
|
||||
managed by the storage driver. On Linux-based Docker hosts this is usually
|
||||
located under `/var/lib/docker/`.
|
||||
|
||||
The Docker client reports on image layers when instructed to pull and push
|
||||
images with `docker pull` and `docker push`. The command below pulls the
|
||||
|
@ -65,38 +204,85 @@ images with `docker pull` and `docker push`. The command below pulls the
|
|||
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
6e6a100fa147: Pull complete
|
||||
13c0c663a321: Pull complete
|
||||
2bd276ed39d5: Pull complete
|
||||
013f3d01d247: Pull complete
|
||||
Digest: sha256:c7ecf33cef00ae34b131605c31486c91f5fd9a76315d075db2afd39d1ccdf3ed
|
||||
1ba8ac955b97: Pull complete
|
||||
f157c4e5ede7: Pull complete
|
||||
0b7e98f84c4c: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
From the output, you'll see that the command actually pulls 4 image layers.
|
||||
Each of the above lines lists an image layer and its UUID. The combination of
|
||||
these four layers makes up the `ubuntu:15.04` Docker image.
|
||||
Each of the above lines lists an image layer and its UUID or cryptographic
|
||||
hash. The combination of these four layers makes up the `ubuntu:15.04` Docker
|
||||
image.
|
||||
|
||||
The image layers are stored in the Docker host's local storage area. Typically,
|
||||
the local storage area is in the host's `/var/lib/docker` directory. Depending
|
||||
on which storage driver the local storage area may be in a different location. You can list the layers in the local storage area. The following example shows the storage as it appears under the AUFS storage driver:
|
||||
Each of these layers is stored in its own directory inside the Docker host's
|
||||
local storage area.
|
||||
|
||||
$ sudo ls /var/lib/docker/aufs/layers
|
||||
013f3d01d24738964bb7101fa83a926181d600ebecca7206dced59669e6e6778 2bd276ed39d5fcfd3d00ce0a190beeea508332f5aec3c6a125cc619a3fdbade6
|
||||
13c0c663a321cd83a97f4ce1ecbaf17c2ba166527c3b06daaefe30695c5fcb8c 6e6a100fa147e6db53b684c8516e3e2588b160fd4898b6265545d5d4edb6796d
|
||||
Versions of Docker prior to 1.10 stored each layer in a directory with the same
|
||||
name as the image layer ID. However, this is not the case for images pulled
|
||||
with Docker version 1.10 and later. For example, the command below shows an
|
||||
image being pulled from Docker Hub, followed by a directory listing on a host
|
||||
running version 1.9.1 of the Docker Engine.
|
||||
|
||||
If you `pull` another image that shares some of the same image layers as the `ubuntu:15.04` image, the Docker daemon recognize this, and only pull the layers it hasn't already stored. After the second pull, the two images will share any common image layers.
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
47984b517ca9: Pull complete
|
||||
df6e891a3ea9: Pull complete
|
||||
e65155041eed: Pull complete
|
||||
c8be1ac8145a: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
You can illustrate this now for yourself. Starting the `ubuntu:15.04` image that
|
||||
you just pulled, make a change to it, and build a new image based on the change.
|
||||
One way to do this is using a Dockerfile and the `docker build` command.
|
||||
$ ls /var/lib/docker/aufs/layers
|
||||
47984b517ca9ca0312aced5c9698753ffa964c2015f2a5f18e5efa9848cf30e2
|
||||
c8be1ac8145a6e59a55667f573883749ad66eaeef92b4df17e5ea1260e2d7356
|
||||
df6e891a3ea9cdce2a388a2cf1b1711629557454fd120abd5be6d32329a0e0ac
|
||||
e65155041eed7ec58dea78d90286048055ca75d41ea893c7246e794389ecf203
|
||||
|
||||
1. In an empty directory, create a simple `Dockerfile` that starts with the ubuntu:15.04 image.
|
||||
Notice how the four directories match up with the layer IDs of the downloaded
|
||||
image. Now compare this with the same operations performed on a host running
|
||||
version 1.10 of the Docker Engine.
|
||||
|
||||
$ docker pull ubuntu:15.04
|
||||
15.04: Pulling from library/ubuntu
|
||||
1ba8ac955b97: Pull complete
|
||||
f157c4e5ede7: Pull complete
|
||||
0b7e98f84c4c: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:5e279a9df07990286cce22e1b0f5b0490629ca6d187698746ae5e28e604a640e
|
||||
Status: Downloaded newer image for ubuntu:15.04
|
||||
|
||||
$ ls /var/lib/docker/aufs/layers/
|
||||
1d6674ff835b10f76e354806e16b950f91a191d3b471236609ab13a930275e24
|
||||
5dbb0cbe0148cf447b9464a358c1587be586058d9a4c9ce079320265e2bb94e7
|
||||
bef7199f2ed8e86fa4ada1309cfad3089e0542fec8894690529e4c04a7ca2d73
|
||||
ebf814eccfe98f2704660ca1d844e4348db3b5ccc637eb905d4818fbfb00a06a
|
||||
|
||||
See how the four directories do not match up with the image layer IDs pulled in
|
||||
the previous step.
|
||||
|
||||
Despite the differences between image management before and after version 1.10,
|
||||
all versions of Docker still allow images to share layers. For example, if you
|
||||
`pull` an image that shares some of the same image layers as an image that has
|
||||
already been pulled, the Docker daemon recognizes this, and only pulls the
|
||||
layers it doesn't already have stored locally. After the second pull, the two
|
||||
images will share any common image layers.
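
The decision the daemon makes during a pull can be approximated with a simple membership check. The layer names below are placeholders rather than real layer IDs, and `layers_to_pull` is a hypothetical helper, not a Docker API.

```python
def layers_to_pull(image_layers, local_layers):
    """Return the layers of an image that are not stored locally, in order."""
    return [layer for layer in image_layers if layer not in local_layers]


# Placeholder names standing in for layer IDs already on the host.
local = {"base-1", "base-2", "base-3", "base-4"}

# A second image that reuses the four base layers and adds one of its own.
new_image = ["base-1", "base-2", "base-3", "base-4", "app-1"]

missing = layers_to_pull(new_image, local)
print(missing)  # only the layer that is not already stored

# After the pull, the new layer is stored locally as well.
local.update(missing)
print(layers_to_pull(new_image, local))
```

Only the one unshared layer is transferred. A repeat pull of the same image has nothing left to download.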
|
||||
|
||||
You can illustrate this now for yourself. Starting with the `ubuntu:15.04`
|
||||
image that you just pulled, make a change to it, and build a new image based on
|
||||
the change. One way to do this is using a `Dockerfile` and the `docker build`
|
||||
command.
|
||||
|
||||
1. In an empty directory, create a simple `Dockerfile` that starts with the
|
||||
   ubuntu:15.04 image.
|
||||
|
||||
FROM ubuntu:15.04
|
||||
|
||||
2. Add a new file called "newfile" in the image's `/tmp` directory with the text "Hello world" in it.
|
||||
2. Add a new file called "newfile" in the image's `/tmp` directory with the
|
||||
   text "Hello world" in it.
|
||||
|
||||
When you are done, the `Dockerfile` contains two lines:
|
||||
When you are done, the `Dockerfile` contains two lines:
|
||||
|
||||
FROM ubuntu:15.04
|
||||
|
||||
|
@ -104,78 +290,125 @@ One way to do this is using a Dockerfile and the `docker build` command.
|
|||
|
||||
3. Save and close the file.
|
||||
|
||||
2. From a terminal in the same folder as your Dockerfile, run the following command:
|
||||
4. From a terminal in the same folder as your `Dockerfile`, run the following
|
||||
   command:
|
||||
|
||||
$ docker build -t changed-ubuntu .
|
||||
Sending build context to Docker daemon 2.048 kB
|
||||
Step 0 : FROM ubuntu:15.04
|
||||
---> 013f3d01d247
|
||||
Step 1 : RUN echo "Hello world" > /tmp/newfile
|
||||
---> Running in 2023460815df
|
||||
---> 03b964f68d06
|
||||
Removing intermediate container 2023460815df
|
||||
Successfully built 03b964f68d06
|
||||
Step 1 : FROM ubuntu:15.04
|
||||
---> 3f7bcee56709
|
||||
Step 2 : RUN echo "Hello world" > /tmp/newfile
|
||||
---> Running in d14acd6fad4e
|
||||
---> 94e6b7d2c720
|
||||
Removing intermediate container d14acd6fad4e
|
||||
Successfully built 94e6b7d2c720
|
||||
|
||||
> **Note:** The period (.) at the end of the above command is important. It tells the `docker build` command to use the current working directory as its build context.
|
||||
> **Note:** The period (.) at the end of the above command is important. It
|
||||
> tells the `docker build` command to use the current working directory as
|
||||
> its build context.
|
||||
|
||||
The output above shows a new image with image ID `03b964f68d06`.
|
||||
The output above shows a new image with image ID `94e6b7d2c720`.
|
||||
|
||||
3. Run the `docker images` command to verify the new image is in the Docker host's local storage area.
|
||||
5. Run the `docker images` command to verify the new `changed-ubuntu` image is
|
||||
   in the Docker host's local storage area.
|
||||
|
||||
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
|
||||
changed-ubuntu latest 03b964f68d06 33 seconds ago 131.4 MB
|
||||
ubuntu
|
||||
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
|
||||
changed-ubuntu latest 94e6b7d2c720 33 seconds ago 131.4 MB
|
||||
ubuntu 15.04 3f7bcee56709 6 weeks ago 131.3 MB
|
||||
|
||||
4. Run the `docker history` command to see which image layers were used to create the new `changed-ubuntu` image.
|
||||
6. Run the `docker history` command to see which image layers were used to
|
||||
   create the new `changed-ubuntu` image.
|
||||
|
||||
$ docker history changed-ubuntu
|
||||
IMAGE CREATED CREATED BY SIZE COMMENT
|
||||
03b964f68d06 About a minute ago /bin/sh -c echo "Hello world" > /tmp/newfile 12 B
|
||||
013f3d01d247 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B
|
||||
<missing> 6 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.879 kB
|
||||
<missing> 6 weeks ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 701 B
|
||||
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:49710b44e2ae0edef4 131.4 MB
|
||||
IMAGE CREATED CREATED BY SIZE COMMENT
|
||||
94e6b7d2c720 2 minutes ago /bin/sh -c echo "Hello world" > /tmp/newfile 12 B
|
||||
3f7bcee56709 6 weeks ago /bin/sh -c #(nop) CMD ["/bin/bash"] 0 B
|
||||
<missing> 6 weeks ago /bin/sh -c sed -i 's/^#\s*\(deb.*universe\)$/ 1.879 kB
|
||||
<missing> 6 weeks ago /bin/sh -c echo '#!/bin/sh' > /usr/sbin/polic 701 B
|
||||
<missing> 6 weeks ago /bin/sh -c #(nop) ADD file:8e4943cd86e9b2ca13 131.3 MB
|
||||
|
||||
The `docker history` output shows the new `03b964f68d06` image layer at the
|
||||
top. You know that the `03b964f68d06` layer was added because it was created
|
||||
by the `echo "Hello world" > /tmp/newfile` command in your `Dockerfile`.
|
||||
The 4 image layers below it are the exact same image layers the make up the
|
||||
ubuntu:15.04 image as their UUIDs match.
|
||||
The `docker history` output shows the new `94e6b7d2c720` image layer at the
|
||||
top. You know that this is the new image layer added because it was created
|
||||
by the `echo "Hello world" > /tmp/newfile` command in your `Dockerfile`.
|
||||
The 4 image layers below it are the exact same image layers
|
||||
that make up the `ubuntu:15.04` image.
|
||||
|
||||
Notice the new `changed-ubuntu` image does not have its own copies of every layer. As can be seen in the diagram below, the new image is sharing it's four underlying layers with the `ubuntu:15.04` image.
|
||||
> **Note:** Under the content addressable storage model introduced with Docker
|
||||
> 1.10, image history data is no longer stored in a config file with each image
|
||||
> layer. It is now stored as a string of text in a single config file that
|
||||
> relates to the overall image. This can result in some image layers showing as
|
||||
> "missing" in the output of the `docker history` command. This is normal
|
||||
> behaviour and can be ignored.
|
||||
>
|
||||
> You may hear images like these referred to as *flat images*.
|
||||
|
||||
Notice the new `changed-ubuntu` image does not have its own copies of every
|
||||
layer. As can be seen in the diagram below, the new image is sharing its four
|
||||
underlying layers with the `ubuntu:15.04` image.
|
||||
|
||||
![](images/saving-space.jpg)
|
||||
|
||||
The `docker history` command also shows the size of each image layer. The `03b964f68d06` is only consuming 13 Bytes of disk space. Because all of the layers below it already exist on the Docker host and are shared with the `ubuntu15:04` image, this means the entire `changed-ubuntu` image only consumes 13 Bytes of disk space.
|
||||
The `docker history` command also shows the size of each image layer. As you
|
||||
can see, the `94e6b7d2c720` layer is only consuming 12 Bytes of disk space.
|
||||
This means that the `changed-ubuntu` image we just created is only consuming an
|
||||
additional 12 Bytes of disk space on the Docker host - all layers below the
|
||||
`94e6b7d2c720` layer already exist on the Docker host and are shared by other
|
||||
images.
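
The space saving described above can be checked with a little arithmetic. The sketch below is only an illustration: the layer names are hypothetical labels (the real layers show as `<missing>` in `docker history`), and the byte counts assume that the sizes in the output use decimal units.

```python
# Hypothetical layer labels mapped to the sizes shown by `docker history`,
# converted to bytes on the assumption that kB and MB are decimal units.
layer_sizes = {
    "add-rootfs": 131_300_000,
    "policy-rc.d": 701,
    "sed-universe": 1_879,
    "cmd-bash": 0,
    "echo-newfile": 12,
}

# Each image is an ordered list of layers, base layer first.
images = {
    "ubuntu:15.04": ["add-rootfs", "policy-rc.d", "sed-universe", "cmd-bash"],
    "changed-ubuntu": [
        "add-rootfs", "policy-rc.d", "sed-universe", "cmd-bash", "echo-newfile",
    ],
}


def disk_usage(image_names):
    """Sum the size of every distinct layer used by the given images."""
    unique_layers = set()
    for name in image_names:
        unique_layers.update(images[name])
    return sum(layer_sizes[layer] for layer in unique_layers)


base_only = disk_usage(["ubuntu:15.04"])
both = disk_usage(["ubuntu:15.04", "changed-ubuntu"])
extra_bytes = both - base_only
print(extra_bytes)
```

Because the four lower layers are counted once no matter how many images use them, the only additional cost of `changed-ubuntu` in this model is its own top layer.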
|
||||
|
||||
This sharing of image layers is what makes Docker images and containers so space
|
||||
efficient.
|
||||
This sharing of image layers is what makes Docker images and containers so
|
||||
space efficient.
|
||||
|
||||
### Copying makes containers efficient
|
||||
|
||||
You learned earlier that a container a Docker image with a thin writable, container layer added. The diagram below shows the layers of a container based on the `ubuntu:15.04` image:
|
||||
You learned earlier that a container is a Docker image with a thin, writable
|
||||
container layer added. The diagram below shows the layers of a container based
|
||||
on the `ubuntu:15.04` image:
|
||||
|
||||
![](images/container-layers.jpg)
|
||||
![](images/container-layers-cas.jpg)
|
||||
|
||||
All writes made to a container are stored in the thin writable container layer. The other layers are read-only (RO) image layers and can't be changed. This means that multiple containers can safely share a single underlying image. The diagram below shows multiple containers sharing a single copy of the `ubuntu:15.04` image. Each container has its own thin RW layer, but they all share a single instance of the ubuntu:15.04 image:
|
||||
All writes made to a container are stored in the thin writable container layer.
|
||||
The other layers are read-only (RO) image layers and can't be changed. This
|
||||
means that multiple containers can safely share a single underlying image. The
|
||||
diagram below shows multiple containers sharing a single copy of the
|
||||
`ubuntu:15.04` image. Each container has its own thin RW layer, but they all
|
||||
share a single instance of the `ubuntu:15.04` image:
|
||||
|
||||
![](images/sharing-layers.jpg)
|
||||
|
||||
When a write operation occurs in a container, Docker uses the storage driver to perform a copy-on-write operation. The type of operation depends on the storage driver. For AUFS and OverlayFS storage drivers the copy-on-write operation is pretty much as follows:
|
||||
When an existing file in a container is modified, Docker uses the storage
|
||||
driver to perform a copy-on-write operation. The specifics of the operation depend
|
||||
on the storage driver. For the AUFS and OverlayFS storage drivers, the
|
||||
copy-on-write operation works roughly as follows:
|
||||
|
||||
* Search through the layers for the file to update. The process starts at the top, newest layer and works down to the base layer one-at-a-time.
|
||||
* Perform a "copy-up" operation on the first copy of the file that is found. A "copy up" copies the file up to the container's own thin writable layer.
|
||||
* Search through the image layers for the file to update. The process starts
|
||||
at the top, newest layer and works down to the base layer one layer at a
|
||||
time.
|
||||
* Perform a "copy-up" operation on the first copy of the file that is found. A
|
||||
"copy up" copies the file up to the container's own thin writable layer.
|
||||
* Modify the *copy of the file* in the container's thin writable layer.
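
The three steps above can be sketched as a toy model. This is not Docker's or AUFS's actual code: layers are plain dictionaries, and the names `find_layer` and `write_file` are made up for the illustration.

```python
# Toy model of copy-on-write across stacked layers (not Docker's real code).
# Each layer is a dict that maps a file path to its contents.


def find_layer(path, image_layers):
    """Search the image layers from the top (newest) down to the base."""
    for index in range(len(image_layers) - 1, -1, -1):
        if path in image_layers[index]:
            return index
    return None


def write_file(path, data, container_layer, image_layers):
    """Modify a file and report whether a copy-up was needed first."""
    copied_up = False
    if path not in container_layer:
        index = find_layer(path, image_layers)
        if index is not None:
            # Copy-up: the whole file is copied into the writable layer.
            container_layer[path] = image_layers[index][path]
            copied_up = True
    # The change is applied to the container's own copy only.
    container_layer[path] = data
    return copied_up


# image_layers[0] is the base layer; the last entry is the top image layer.
image_layers = [
    {"/etc/hosts": "base"},
    {"/tmp/newfile": "Hello world\n"},
]
container_layer = {}

first = write_file("/tmp/newfile", "changed\n", container_layer, image_layers)
second = write_file("/tmp/newfile", "changed again\n", container_layer, image_layers)
print(first, second)
```

In this model only the first write triggers a copy-up, and the read-only image layers are never modified, which is why several containers can share them safely.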
|
||||
|
||||
BTFS, ZFS, and other drivers handle the copy-on-write differently. You can read more about the methods of these drivers later in their detailed descriptions.
|
||||
Btrfs, ZFS, and other drivers handle the copy-on-write differently. You can
|
||||
read more about the methods of these drivers later in their detailed
|
||||
descriptions.
|
||||
|
||||
Containers that write a lot of data will consume more space than containers that do not. This is because most write operations consume new space in the containers thin writable top layer. If your container needs to write a lot of data, you can use a data volume.
|
||||
Containers that write a lot of data will consume more space than containers
|
||||
that do not. This is because most write operations consume new space in the
|
||||
container's thin writable top layer. If your container needs to write a lot of
|
||||
data, you should consider using a data volume.
|
||||
|
||||
A copy-up operation can incur a noticeable performance overhead. This overhead is different depending on which storage driver is in use. However, large files, lots of layers, and deep directory trees can make the impact more noticeable. Fortunately, the operation only occurs the first time any particular file is modified. Subsequent modifications to the same file do not cause a copy-up operation and can operate directly on the file's existing copy already present in container layer.
|
||||
A copy-up operation can incur a noticeable performance overhead. This overhead
|
||||
is different depending on which storage driver is in use. However, large files,
|
||||
lots of layers, and deep directory trees can make the impact more noticeable.
|
||||
Fortunately, the operation only occurs the first time any particular file is
|
||||
modified. Subsequent modifications to the same file do not cause a copy-up
|
||||
operation and can operate directly on the file's existing copy already present
|
||||
in the container layer.
|
||||
|
||||
Let's see what happens if we spin up 5 containers based on our `changed-ubuntu` image we built earlier:
|
||||
Let's see what happens if we spin up 5 containers based on our `changed-ubuntu`
|
||||
image we built earlier:
|
||||
|
||||
1. From a terminal on your Docker host, run the following `docker run` command 5 times.
|
||||
1. From a terminal on your Docker host, run the following `docker run` command
|
||||
5 times.
|
||||
|
||||
$ docker run -dit changed-ubuntu bash
|
||||
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
|
||||
|
@ -188,28 +421,38 @@ Let's see what happens if we spin up 5 containers based on our `changed-ubuntu`
|
|||
$ docker run -dit changed-ubuntu bash
|
||||
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
|
||||
|
||||
This launches 5 containers based on the `changed-ubuntu` image. As the container is created, Docker adds a writable layer and assigns it a UUID. This is the value returned from the `docker run` command.
|
||||
This launches 5 containers based on the `changed-ubuntu` image. As each
|
||||
container is created, Docker adds a writable layer and assigns it a random
|
||||
UUID. This is the value returned from the `docker run` command.
|
||||
|
||||
2. Run the `docker ps` command to verify the 5 containers are running.
|
||||
|
||||
$ docker ps
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
0ad25d06bdf6 changed-ubuntu "bash" About a minute ago Up About a minute stoic_ptolemy
|
||||
8eb24b3b2d24 changed-ubuntu "bash" About a minute ago Up About a minute pensive_bartik
|
||||
a651680bd6c2 changed-ubuntu "bash" 2 minutes ago Up 2 minutes hopeful_turing
|
||||
9280e777d109 changed-ubuntu "bash" 2 minutes ago Up 2 minutes backstabbing_mahavira
|
||||
75bab0d54f3c changed-ubuntu "bash" 2 minutes ago Up 2 minutes boring_pasteur
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
0ad25d06bdf6 changed-ubuntu "bash" About a minute ago Up About a minute stoic_ptolemy
|
||||
8eb24b3b2d24 changed-ubuntu "bash" About a minute ago Up About a minute pensive_bartik
|
||||
a651680bd6c2 changed-ubuntu "bash" 2 minutes ago Up 2 minutes hopeful_turing
|
||||
9280e777d109 changed-ubuntu "bash" 2 minutes ago Up 2 minutes backstabbing_mahavira
|
||||
75bab0d54f3c changed-ubuntu "bash" 2 minutes ago Up 2 minutes boring_pasteur
|
||||
|
||||
The output above shows 5 running containers, all sharing the `changed-ubuntu` image. Each `CONTAINER ID` is derived from the UUID when creating each container.
|
||||
The output above shows 5 running containers, all sharing the
|
||||
`changed-ubuntu` image. Each `CONTAINER ID` is a truncated form of the UUID
|
||||
assigned when each container was created.
|
||||
|
||||
3. List the contents of the local storage area.
|
||||
|
||||
$ sudo ls containers
|
||||
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef 9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
|
||||
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4 a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
|
||||
$ sudo ls /var/lib/docker/containers
|
||||
0ad25d06bdf6fca0dedc38301b2aff7478b3e1ce3d1acd676573bba57cb1cfef
|
||||
9280e777d109e2eb4b13ab211553516124a3d4d4280a0edfc7abf75c59024d47
|
||||
75bab0d54f3cf193cfdc3a86483466363f442fba30859f7dcd1b816b6ede82d4
|
||||
a651680bd6c2ef64902e154eeb8a064b85c9abf08ac46f922ad8dfc11bb5cd8a
|
||||
8eb24b3b2d246f225b24f2fca39625aaad71689c392a7b552b78baf264647373
|
||||
|
||||
Docker's copy-on-write strategy not only reduces the amount of space consumed by containers, it also reduces the time required to start a container. At start time, Docker only has to create the thin writable layer for each container. The diagram below shows these 5 containers sharing a single read-only (RO) copy of the `changed-ubuntu` image.
|
||||
Docker's copy-on-write strategy not only reduces the amount of space consumed
|
||||
by containers, it also reduces the time required to start a container. At start
|
||||
time, Docker only has to create the thin writable layer for each container.
|
||||
The diagram below shows these 5 containers sharing a single read-only (RO)
|
||||
copy of the `changed-ubuntu` image.
|
||||
|
||||
![](images/shared-uuid.jpg)
|
||||
|
||||
|
@ -219,18 +462,30 @@ significantly increased.
|
|||
|
||||
## Data volumes and the storage driver
|
||||
|
||||
When a container is deleted, any data written to the container that is not stored in a *data volume* is deleted along with the container. A data volume is directory or file that is mounted directly into a container.
|
||||
When a container is deleted, any data written to the container that is not
|
||||
stored in a *data volume* is deleted along with the container.
|
||||
|
||||
Data volumes are not controlled by the storage driver. Reads and writes to data
|
||||
volumes bypass the storage driver and operate at native host speeds. You can mount any number of data volumes into a container. Multiple containers can also share one or more data volumes.
|
||||
A data volume is a directory or file in the Docker host's filesystem that is
|
||||
mounted directly into a container. Data volumes are not controlled by the
|
||||
storage driver. Reads and writes to data volumes bypass the storage driver and
|
||||
operate at native host speeds. You can mount any number of data volumes into a
|
||||
container. Multiple containers can also share one or more data volumes.
|
||||
|
||||
The diagram below shows a single Docker host running two containers. Each container exists inside of its own address space within the Docker host's local storage area. There is also a single shared data volume located at `/data` on the Docker host. This is mounted directly into both containers.
|
||||
The diagram below shows a single Docker host running two containers. Each
|
||||
container exists inside of its own address space within the Docker host's local
|
||||
storage area (`/var/lib/docker/...`). There is also a single shared data
|
||||
volume located at `/data` on the Docker host. This is mounted directly into
|
||||
both containers.
|
||||
|
||||
![](images/shared-volume.jpg)
|
||||
|
||||
The data volume resides outside of the local storage area on the Docker host further reinforcing its independence from the storage driver's control. When a container is deleted, any data stored in shared data volumes persists on the Docker host.
|
||||
Data volumes reside outside of the local storage area on the Docker host,
|
||||
further reinforcing their independence from the storage driver's control. When
|
||||
a container is deleted, any data stored in data volumes persists on the Docker
|
||||
host.
|
||||
|
||||
For detailed information about data volumes [Managing data in containers](https://docs.docker.com/userguide/dockervolumes/).
|
||||
For detailed information about data volumes, see
|
||||
[Managing data in containers](https://docs.docker.com/userguide/dockervolumes/).
|
||||
|
||||
## Related information
|
||||
|
||||
|
|
|
@ -10,47 +10,83 @@ parent = "engine_driver"
|
|||
|
||||
# Docker and OverlayFS in practice
|
||||
|
||||
OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison to AUFS, OverlayFS:
|
||||
OverlayFS is a modern *union filesystem* that is similar to AUFS. In comparison
|
||||
to AUFS, OverlayFS:
|
||||
|
||||
* has a simpler design
|
||||
* has been in the mainline Linux kernel since version 3.18
|
||||
* is potentially faster
|
||||
|
||||
As a result, OverlayFS is rapidly gaining popularity in the Docker community and is seen by many as a natural successor to AUFS. As promising as OverlayFS is, it is still relatively young. Therefore caution should be taken before using it in production Docker environments.
|
||||
As a result, OverlayFS is rapidly gaining popularity in the Docker community
|
||||
and is seen by many as a natural successor to AUFS. As promising as OverlayFS
|
||||
is, it is still relatively young. Therefore caution should be taken before
|
||||
using it in production Docker environments.
|
||||
|
||||
Docker's `overlay` storage driver leverages several OverlayFS features to build and manage the on-disk structures of images and containers.
|
||||
|
||||
>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel module* was renamed from "overlayfs" to "overlay". As a result you may see the two terms used interchangeably in some documentation. However, this document uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer to Docker's storage-driver.
|
||||
Docker's `overlay` storage driver leverages several OverlayFS features to build
|
||||
and manage the on-disk structures of images and containers.
|
||||
|
||||
>**Note**: Since it was merged into the mainline kernel, the OverlayFS *kernel
|
||||
>module* was renamed from "overlayfs" to "overlay". As a result you may see the
|
||||
> two terms used interchangeably in some documentation. However, this document
|
||||
> uses "OverlayFS" to refer to the overall filesystem, and `overlay` to refer
|
||||
> to Docker's storage-driver.
|
||||
|
||||
## Image layering and sharing with OverlayFS
|
||||
|
||||
OverlayFS takes two directories on a single Linux host, layers one on top of the other, and provides a single unified view. These directories are often referred to as *layers* and the technology used to layer them is known as a *union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and "upperdir" for the top layer. The unified view is exposed through its own directory called "merged".
|
||||
OverlayFS takes two directories on a single Linux host, layers one on top of
|
||||
the other, and provides a single unified view. These directories are often
|
||||
referred to as *layers* and the technology used to layer them is known as a
|
||||
*union mount*. The OverlayFS terminology is "lowerdir" for the bottom layer and
|
||||
"upperdir" for the top layer. The unified view is exposed through its own
|
||||
directory called "merged".
|
||||
|
||||
The diagram below shows how a Docker image and a Docker container are layered. The image layer is the "lowerdir" and the container layer is the "upperdir". The unified view is exposed through a directory called "merged" which is effectively the containers mount point. The diagram shows how Docker constructs map to OverlayFS constructs.
|
||||
The diagram below shows how a Docker image and a Docker container are layered.
|
||||
The image layer is the "lowerdir" and the container layer is the "upperdir".
|
||||
The unified view is exposed through a directory called "merged" which is
|
||||
effectively the container's mount point. The diagram shows how Docker constructs
|
||||
map to OverlayFS constructs.
|
||||
|
||||
![](images/overlay_constructs.jpg)
|
||||
|
||||
Notice how the image layer and container layer can contain the same files. When this happens, the files in the container layer ("upperdir") are dominant and obscure the existence of the same files in the image layer ("lowerdir"). The container mount ("merged") presents the unified view.
|
||||
Notice how the image layer and container layer can contain the same files. When
|
||||
this happens, the files in the container layer ("upperdir") are dominant and
|
||||
obscure the existence of the same files in the image layer ("lowerdir"). The
|
||||
container mount ("merged") presents the unified view.
|
||||
|
||||
OverlayFS only works with two layers. This means that multi-layered images cannot be implemented as multiple OverlayFS layers. Instead, each image layer is implemented as its own directory under `/var/lib/docker/overlay`. Hard links are then used as a space-efficient way to reference data shared with lower layers. The diagram below shows a four-layer image and how it is represented in the Docker host's filesystem.
|
||||
OverlayFS only works with two layers. This means that multi-layered images
|
||||
cannot be implemented as multiple OverlayFS layers. Instead, each image layer
|
||||
is implemented as its own directory under `/var/lib/docker/overlay`.
|
||||
Hard links are then used as a space-efficient way to reference data shared with
|
||||
lower layers. As of Docker 1.10, image layer IDs no longer correspond to
|
||||
directory names in `/var/lib/docker/`.
|
||||
|
||||
![](images/overlay_constructs2.jpg)
|
||||
|
||||
To create a container, the `overlay` driver combines the directory representing the image's top layer plus a new directory for the container. The image's top layer is the "lowerdir" in the overlay and read-only. The new directory for the container is the "upperdir" and is writable.
|
||||
To create a container, the `overlay` driver combines the directory representing
|
||||
the image's top layer plus a new directory for the container. The image's top
|
||||
layer is the "lowerdir" in the overlay and read-only. The new directory for the
|
||||
container is the "upperdir" and is writable.
|
||||
|
||||
## Example: Image and container on-disk constructs
|
||||
|
||||
The following `docker images -a` command shows a Docker host with a single image. As can be seen, the image consists of four layers.
|
||||
The following `docker pull` command shows a Docker host downloading a
|
||||
Docker image comprising four layers.
|
||||
|
||||
$ docker images -a
|
||||
REPOSITORY TAG IMAGE ID CREATED VIRTUAL SIZE
|
||||
ubuntu latest 1d073211c498 7 days ago 187.9 MB
|
||||
<none> <none> 5a4526e952f0 7 days ago 187.9 MB
|
||||
<none> <none> 99fcaefe76ef 7 days ago 187.9 MB
|
||||
<none> <none> c63fb41c2213 7 days ago 187.7 MB
|
||||
$ sudo docker pull ubuntu
|
||||
Using default tag: latest
|
||||
latest: Pulling from library/ubuntu
|
||||
8387d9ff0016: Pull complete
|
||||
3b52deaaf0ed: Pull complete
|
||||
4bd501fad6de: Pull complete
|
||||
a3ed95caeb02: Pull complete
|
||||
Digest: sha256:457b05828bdb5dcc044d93d042863fba3f2158ae249a6db5ae3934307c757c54
|
||||
Status: Downloaded newer image for ubuntu:latest
|
||||
|
||||
Below, the command's output illustrates that each of the four image layers has it's own directory under `/var/lib/docker/overlay/`.
|
||||
Each image layer has its own directory under `/var/lib/docker/overlay/`. This
|
||||
is where the contents of each image layer are stored.
|
||||
|
||||
The output of the command below shows the four directories that store the
|
||||
contents of each image layer just pulled. However, as can be seen, the image
|
||||
layer IDs do not match the directory names in `/var/lib/docker/overlay`. This
|
||||
is normal behavior in Docker 1.10 and later.
|
||||
|
||||
$ ls -l /var/lib/docker/overlay/
|
||||
total 24
|
||||
|
@ -59,35 +95,42 @@ Below, the command's output illustrates that each of the four image layers has i
|
|||
drwx------ 5 root root 4096 Oct 28 11:06 99fcaefe76ef1aa4077b90a413af57fd17d19dce4e50d7964a273aae67055235
|
||||
drwx------ 3 root root 4096 Oct 28 11:01 c63fb41c2213f511f12f294dd729b9903a64d88f098c20d2350905ac1fdbcbba
|
||||
|
||||
Each directory is named after the image layer IDs in the previous `docker images -a` command. The image layer directories contain the files unique to that layer as well as hard links to the data that is shared with lower layers. This allows for efficient use of disk space.
|
||||
The image layer directories contain the files unique to that layer as well as
|
||||
hard links to the data that is shared with lower layers. This allows for
|
||||
efficient use of disk space.
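
To see why hard links save space, the following sketch creates a file in one directory and hard-links it into another. The directory and file names are hypothetical, and the example only demonstrates hard links in general; it does not reproduce the layout the `overlay` driver creates.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Two stand-in "layer" directories (hypothetical names).
    lower = os.path.join(root, "lower-layer")
    upper = os.path.join(root, "upper-layer")
    os.mkdir(lower)
    os.mkdir(upper)

    original = os.path.join(lower, "shared.txt")
    with open(original, "w") as handle:
        handle.write("data shared between layers\n")

    # A hard link is a second directory entry that points at the same inode,
    # so the file's data is stored on disk only once.
    linked = os.path.join(upper, "shared.txt")
    os.link(original, linked)

    same_inode = os.stat(original).st_ino == os.stat(linked).st_ino
    link_count = os.stat(original).st_nlink

print(same_inode, link_count)
```

On a filesystem that supports hard links, both paths report the same inode and the link count is 2, so the second "layer" costs a directory entry rather than a second copy of the data.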
|
||||
|
||||
The following `docker ps` command shows the same Docker host running a single container. The container ID is "73de7176c223".
|
||||
Containers also exist on-disk in the Docker host's filesystem under
|
||||
`/var/lib/docker/overlay/`. If you inspect the directory relating to a running
|
||||
container using the `ls -l` command, you find the following file and
|
||||
directories.
|
||||
|
||||
$ docker ps
|
||||
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
|
||||
73de7176c223 ubuntu "bash" 2 days ago Up 2 days stupefied_nobel
|
||||
|
||||
This container exists on-disk in the Docker host's filesystem under `/var/lib/docker/overlay/73de7176c223...`. If you inspect this directory using the `ls -l` command you find the following file and directories.
|
||||
|
||||
$ ls -l /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e
|
||||
$ ls -l /var/lib/docker/overlay/<directory-of-running-container>
|
||||
total 16
|
||||
-rw-r--r-- 1 root root 64 Oct 28 11:06 lower-id
|
||||
drwxr-xr-x 1 root root 4096 Oct 28 11:06 merged
|
||||
drwxr-xr-x 4 root root 4096 Oct 28 11:06 upper
|
||||
drwx------ 3 root root 4096 Oct 28 11:06 work
|
||||
|
||||
These four filesystem objects are all artifacts of OverlayFS. The "lower-id" file contains the ID of the top layer of the image the container is based on. This is used by OverlayFS as the "lowerdir".
|
||||
These four filesystem objects are all artefacts of OverlayFS. The "lower-id"
|
||||
file contains the ID of the top layer of the image the container is based on.
|
||||
This is used by OverlayFS as the "lowerdir".
|
||||
|
||||
$ cat /var/lib/docker/overlay/73de7176c223a6c82fd46c48c5f152f2c8a7e49ecb795a7197c3bb795c4d879e/lower-id
|
||||
1d073211c498fd5022699b46a936b4e4bdacb04f637ad64d3475f558783f5c3e
|
||||
|
||||
The "upper" directory is the containers read-write layer. Any changes made to the container are written to this directory.
|
||||
The "upper" directory is the container's read-write layer. Any changes made to
|
||||
the container are written to this directory.
|
||||
|
||||
The "merged" directory is effectively the containers mount point. This is where the unified view of the image ("lowerdir") and container ("upperdir") is exposed. Any changes written to the container are immediately reflected in this directory.
|
||||
The "merged" directory is effectively the container's mount point. This is where
|
||||
the unified view of the image ("lowerdir") and container ("upperdir") is
|
||||
exposed. Any changes written to the container are immediately reflected in this
|
||||
directory.
|
||||
|
||||
The "work" directory is required for OverlayFS to function. It is used for things such as *copy_up* operations.
|
||||
The "work" directory is required for OverlayFS to function. It is used for
|
||||
things such as *copy_up* operations.
|
||||
|
||||
You can verify all of these constructs from the output of the `mount` command. (Ellipses and line breaks are used in the output below to enhance readability.)
|
||||
You can verify all of these constructs from the output of the `mount` command.
|
||||
(Ellipses and line breaks are used in the output below to enhance readability.)
|
||||
|
||||
$ mount | grep overlay
|
||||
overlay on /var/lib/docker/overlay/73de7176c223.../merged
|
||||
|
@ -95,39 +138,73 @@ You can verify all of these constructs from the output of the `mount` command. (
|
|||
upperdir=/var/lib/docker/overlay/73de7176c223.../upper,
|
||||
workdir=/var/lib/docker/overlay/73de7176c223.../work)
|
||||
|
||||
The output reflects the overlay is mounted as read-write ("rw").
|
||||
The output reflects that the overlay is mounted as read-write ("rw").
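
If you want to check these constructs from a script, a line of `mount` output can be split into its parts. The sketch below assumes the usual `<source> on <target> type <fstype> (<options>)` layout; the sample line and its `/example/...` paths are invented for the illustration and are not taken from a real Docker host.

```python
def parse_mount_line(line):
    """Split one line of `mount` output into target, fstype and options."""
    _source, rest = line.split(" on ", 1)
    target, rest = rest.split(" type ", 1)
    fstype, options = rest.split(" (", 1)
    parsed = {}
    for option in options.rstrip(")").split(","):
        # Flags such as "rw" have no value; key=value options do.
        key, _, value = option.partition("=")
        parsed[key] = value
    return target, fstype, parsed


# Hypothetical sample line with placeholder paths.
sample = (
    "overlay on /example/merged type overlay "
    "(rw,relatime,lowerdir=/example/lower,"
    "upperdir=/example/upper,workdir=/example/work)"
)

target, fstype, options = parse_mount_line(sample)
print(target, fstype, "rw" in options, options["upperdir"])
```

The parser is deliberately naive: it would mis-split paths that contain commas or the strings " on " or " type ", so treat it as a reading aid rather than a robust tool.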
|
||||
|
||||
## Container reads and writes with overlay
|
||||
|
||||
Consider three scenarios where a container opens a file for read access with overlay.
|
||||
Consider three scenarios where a container opens a file for read access with
|
||||
overlay.
|
||||
|
||||
- **The file does not exist in the container layer**. If a container opens a file for read access and the file does not already exist in the container ("upperdir") it is read from the image ("lowerdir"). This should incur very little performance overhead.
|
||||
- **The file does not exist in the container layer**. If a container opens a
|
||||
file for read access and the file does not already exist in the container
|
||||
("upperdir") it is read from the image ("lowerdir"). This should incur very
|
||||
little performance overhead.
|
||||
|
||||
- **The file only exists in the container layer**. If a container opens a file for read access and the file exists in the container ("upperdir") and not in the image ("lowerdir"), it is read directly from the container.
|
||||
- **The file only exists in the container layer**. If a container opens a file
|
||||
for read access and the file exists in the container ("upperdir") and not in
|
||||
the image ("lowerdir"), it is read directly from the container.
|
||||
|
||||
- **The file exists in the container layer and the image layer**. If a container opens a file for read access and the file exists in the image layer and the container layer, the file's version in the container layer is read. This is because files in the container layer ("upperdir") obscure files with the same name in the image layer ("lowerdir").
|
||||
- **The file exists in the container layer and the image layer**. If a
|
||||
container opens a file for read access and the file exists in the image layer
|
||||
and the container layer, the file's version in the container layer is read.
|
||||
This is because files in the container layer ("upperdir") obscure files with
|
||||
the same name in the image layer ("lowerdir").
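
The three read scenarios reduce to a simple lookup order, sketched below as a toy model. The dictionaries stand in for the "upperdir" and "lowerdir" directories, and `read_file` is a name invented for this illustration, not an OverlayFS or Docker function.

```python
def read_file(path, upperdir, lowerdir):
    """Return the contents a container would see for path in this toy model."""
    if path in upperdir:
        # The container layer wins, whether or not the image also has the file.
        return upperdir[path]
    if path in lowerdir:
        # Otherwise the file is read straight from the image layer.
        return lowerdir[path]
    raise FileNotFoundError(path)


lowerdir = {"/etc/hostname": "from image", "/etc/motd": "image motd"}
upperdir = {"/etc/motd": "container motd", "/tmp/log": "container only"}

image_only = read_file("/etc/hostname", upperdir, lowerdir)
container_only = read_file("/tmp/log", upperdir, lowerdir)
in_both = read_file("/etc/motd", upperdir, lowerdir)
print(image_only, container_only, in_both, sep=" | ")
```

The order of the two checks is the whole point: because "upperdir" is consulted first, a file there hides a file with the same name in "lowerdir".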
|
||||
|
||||
Consider some scenarios where files in a container are modified.
|
||||
|
||||
- **Writing to a file for the first time**. The first time a container writes to an existing file, that file does not exist in the container ("upperdir"). The `overlay` driver performs a *copy_up* operation to copy the file from the image ("lowerdir") to the container ("upperdir"). The container then writes the changes to the new copy of the file in the container layer.
|
||||
- **Writing to a file for the first time**. The first time a container writes
|
||||
to an existing file, that file does not exist in the container ("upperdir").
|
||||
The `overlay` driver performs a *copy_up* operation to copy the file from the
|
||||
image ("lowerdir") to the container ("upperdir"). The container then writes the
|
||||
changes to the new copy of the file in the container layer.
|
||||
|
||||
However, OverlayFS works at the file level not the block level. This means that all OverlayFS copy-up operations copy entire files, even if the file is very large and only a small part of it is being modified. This can have a noticeable impact on container write performance. However, two things are worth noting:
|
||||
However, OverlayFS works at the file level, not the block level. This means
|
||||
that all OverlayFS copy-up operations copy entire files, even if the file is
|
||||
very large and only a small part of it is being modified. This can have a
|
||||
noticeable impact on container write performance. However, two things are
|
||||
worth noting:
|
||||
|
||||
* The copy_up operation only occurs the first time any given file is written to. Subsequent writes to the same file will operate against the copy of the file already copied up to the container.
|
||||
* The copy_up operation only occurs the first time any given file is
|
||||
written to. Subsequent writes to the same file will operate against the copy of
|
||||
the file already copied up to the container.
|
||||
|
||||
* OverlayFS only works with two layers. This means that performance should be better than AUFS which can suffer noticeable latencies when searching for files in images with many layers.
|
||||
* OverlayFS only works with two layers. This means that performance should
|
||||
be better than AUFS which can suffer noticeable latencies when searching for
|
||||
files in images with many layers.
|
||||
|
||||
- **Deleting files and directories**. When files are deleted within a container a *whiteout* file is created in the containers "upperdir". The version of the file in the image layer ("lowerdir") is not deleted. However, the whiteout file in the container obscures it.
|
||||
- **Deleting files and directories**. When files are deleted within a container
|
||||
a *whiteout* file is created in the container's "upperdir". The version of the
|
||||
file in the image layer ("lowerdir") is not deleted. However, the whiteout file
|
||||
in the container obscures it.
|
||||
|
||||
Deleting a directory in a container results in *opaque directory* being created in the "upperdir". This has the same effect as a whiteout file and effectively masks the existence of the directory in the image's "lowerdir".
|
||||
Deleting a directory in a container results in an *opaque directory* being
|
||||
created in the "upperdir". This has the same effect as a whiteout file and
|
||||
effectively masks the existence of the directory in the image's "lowerdir".
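
The write and delete scenarios above can be modelled in a few lines. This is a simplified sketch, not how OverlayFS is implemented: the `ToyOverlay` class and the `WHITEOUT` marker are invented for the illustration, files are plain strings, and directories are not modelled at all.

```python
WHITEOUT = object()  # stand-in marker; not how OverlayFS stores whiteouts on disk


class ToyOverlay:
    """Two-layer model: a read-only lowerdir and a writable upperdir."""

    def __init__(self, lowerdir):
        self.lowerdir = dict(lowerdir)  # image layer, never modified
        self.upperdir = {}              # container layer

    def exists(self, path):
        if path in self.upperdir:
            return self.upperdir[path] is not WHITEOUT
        return path in self.lowerdir

    def read(self, path):
        if not self.exists(path):
            raise FileNotFoundError(path)
        if path in self.upperdir:
            return self.upperdir[path]
        return self.lowerdir[path]

    def append(self, path, data):
        if not self.exists(path):
            raise FileNotFoundError(path)
        if path not in self.upperdir:
            # copy_up: the entire file is copied, however small the change.
            self.upperdir[path] = self.lowerdir[path]
        self.upperdir[path] += data

    def delete(self, path):
        if not self.exists(path):
            raise FileNotFoundError(path)
        if path in self.lowerdir:
            # The image copy stays; a whiteout in upperdir hides it.
            self.upperdir[path] = WHITEOUT
        else:
            del self.upperdir[path]


overlay = ToyOverlay({"/etc/motd": "hello"})
overlay.append("/etc/motd", " world")
after_write = overlay.read("/etc/motd")
overlay.delete("/etc/motd")
visible_after_delete = overlay.exists("/etc/motd")
print(after_write, visible_after_delete, overlay.lowerdir["/etc/motd"])
```

Note that the image's copy of the file is untouched after both the write and the delete; only the container layer changes.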
|
||||
|
||||
## Configure Docker with the overlay storage driver
|
||||
|
||||
To configure Docker to use the overlay storage driver your Docker host must be running version 3.18 of the Linux kernel (preferably newer) with the overlay kernel module loaded. OverlayFS can operate on top of most supported Linux filesystems. However, ext4 is currently recommended for use in production environments.
|
||||
To configure Docker to use the overlay storage driver your Docker host must be
|
||||
running version 3.18 of the Linux kernel (preferably newer) with the overlay
|
||||
kernel module loaded. OverlayFS can operate on top of most supported Linux
|
||||
filesystems. However, ext4 is currently recommended for use in production
|
||||
environments.
|
||||
|
||||
The following procedure shows you how to configure your Docker host to use OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.
|
||||
The following procedure shows you how to configure your Docker host to use
|
||||
OverlayFS. The procedure assumes that the Docker daemon is in a stopped state.
|
||||
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host and have images you want to keep, `push` them Docker Hub or your private Docker Trusted Registry before attempting this procedure.
|
||||
> **Caution:** If you have already run the Docker daemon on your Docker host
|
||||
> and have images you want to keep, `push` them to Docker Hub or your private
|
||||
> Docker Trusted Registry before attempting this procedure.
|
||||
|
||||
1. If it is running, stop the Docker `daemon`.
|
||||
|
||||
|
@ -163,28 +240,60 @@ The following procedure shows you how to configure your Docker host to use Overl
|
|||
Backing Filesystem: extfs
|
||||
<output truncated>
|
||||
|
||||
Notice that the *Backing filesystem* in the output above is showing as `extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is recommended for production use cases.
|
||||
Notice that the *Backing filesystem* in the output above is showing as
|
||||
`extfs`. Multiple backing filesystems are supported but `extfs` (ext4) is
|
||||
recommended for production use cases.
|
||||
|
||||
Your Docker host is now using the `overlay` storage driver. If you run the `mount` command, you'll find Docker has automatically created the `overlay` mount with the required "lowerdir", "upperdir", "merged" and "workdir" constructs.
|
||||
Your Docker host is now using the `overlay` storage driver. If you run the
|
||||
`mount` command, you'll find Docker has automatically created the `overlay`
|
||||
mount with the required "lowerdir", "upperdir", "merged" and "workdir"
|
||||
constructs.
|
||||
|
||||
## OverlayFS and Docker Performance
|
||||
|
||||
As a general rule, the `overlay` driver should be fast. Almost certainly faster than `aufs` and `devicemapper`. In certain circumstances it may also be faster than `btrfs`. That said, there are a few things to be aware of relative to the performance of Docker using the `overlay` storage driver.
|
||||
As a general rule, the `overlay` driver should be fast, almost certainly faster
|
||||
than `aufs` and `devicemapper`. In certain circumstances it may also be faster
|
||||
than `btrfs`. That said, there are a few things to be aware of relative to the
|
||||
performance of Docker using the `overlay` storage driver.
|
||||
|
||||
- **Page Caching**. OverlayFS supports page cache sharing. This means multiple containers accessing the same file can share a single page cache entry (or entries). This makes the `overlay` driver efficient with memory and a good option for PaaS and other high density use cases.
|
||||
- **Page Caching**. OverlayFS supports page cache sharing. This means multiple
|
||||
containers accessing the same file can share a single page cache entry (or
|
||||
entries). This makes the `overlay` driver efficient with memory and a good
|
||||
option for PaaS and other high density use cases.
|
||||
|
||||
- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any time a container writes to a file for the first time. This can insert latency into the write operation — especially if the file being copied up is large. However, once the file has been copied up, all subsequent writes to that file occur without the need for further copy-up operations.
|
||||
- **copy_up**. As with AUFS, OverlayFS has to perform copy-up operations any
|
||||
time a container writes to a file for the first time. This can insert latency
|
||||
into the write operation — especially if the file being copied up is
|
||||
large. However, once the file has been copied up, all subsequent writes to that
|
||||
file occur without the need for further copy-up operations.
|
||||
|
||||
The OverlayFS copy_up operation should be faster than the same operation with AUFS. This is because AUFS supports more layers than OverlayFS and it is possible to incur far larger latencies if searching through many AUFS layers.
|
||||
The OverlayFS copy_up operation should be faster than the same operation
|
||||
with AUFS. This is because AUFS supports more layers than OverlayFS and it is
|
||||
possible to incur far larger latencies if searching through many AUFS layers.
|
||||
|
||||
- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards. This can result in certain OverlayFS operations breaking POSIX standards. One such operation is the *copy-up* operation. Therefore, using `yum` inside of a container on a Docker host using the `overlay` storage driver is unlikely to work without implementing workarounds.
|
||||
- **RPMs and Yum**. OverlayFS only implements a subset of the POSIX standards.
|
||||
This can result in certain OverlayFS operations breaking POSIX standards. One
|
||||
such operation is the *copy-up* operation. Therefore, using `yum` inside of a
|
||||
container on a Docker host using the `overlay` storage driver is unlikely to
|
||||
work without implementing workarounds.
|
||||
|
||||
- **Inode limits**. Use of the `overlay` storage driver can cause excessive inode consumption. This is especially so as the number of images and containers on the Docker host grows. A Docker host with a large number of images and lots of started and stopped containers can quickly run out of inodes.
|
||||
- **Inode limits**. Use of the `overlay` storage driver can cause excessive
|
||||
inode consumption. This is especially so as the number of images and containers
|
||||
on the Docker host grows. A Docker host with a large number of images and lots
|
||||
of started and stopped containers can quickly run out of inodes.
|
||||
|
||||
Unfortunately you can only specify the number of inodes in a filesystem at the time of creation. For this reason, you may wish to consider putting `/var/lib/docker` on a separate device with its own filesystem or manually specifying the number of inodes when creating the filesystem.
|
||||
Unfortunately you can only specify the number of inodes in a filesystem at the
|
||||
time of creation. For this reason, you may wish to consider putting
|
||||
`/var/lib/docker` on a separate device with its own filesystem, or manually
|
||||
specifying the number of inodes when creating the filesystem.
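
To keep an eye on inode consumption, the output of `df` can be inspected. The following is a minimal sketch, assuming a `df` that supports the `-P` and `-i` options together (as GNU coreutils does). `inode_use_percent` is a hypothetical helper name, and the device and inode count in the comments are placeholders only.

```shell
# Sketch: print the inode usage percentage from `df -Pi`-style output on stdin.
# inode_use_percent is a hypothetical helper, not a Docker or coreutils command.
inode_use_percent() {
    # Line 1 is the header; on line 2 the fifth column is IUse% (for example "42%").
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Typical use on a Docker host:
#   df -Pi /var/lib/docker | inode_use_percent
#
# When creating a dedicated ext4 filesystem, the inode count can be set explicitly.
# The device and the count below are placeholders, not recommendations:
#   sudo mkfs.ext4 -N 4000000 /dev/xvdb
```

The helper reads from standard input so that it can be tried against saved `df` output as well as a live host.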
|
||||
|
||||
The following generic performance best practices also apply to OverlayFS.
|
||||
|
||||
- **Solid State Devices (SSD)**. For best performance it is always a good idea to use fast storage media such as solid state devices (SSD).
|
||||
- **Solid State Devices (SSD)**. For best performance it is always a good idea
|
||||
to use fast storage media such as solid state devices (SSD).
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
|
||||
|
|
|
@ -12,15 +12,27 @@ weight = -1
|
|||
# Select a storage driver
|
||||
|
||||
This page describes Docker's storage driver feature. It lists the storage
|
||||
driver's that Docker supports and the basic commands associated with managing them. Finally, this page provides guidance on choosing a storage driver.
|
||||
drivers that Docker supports and the basic commands associated with managing
|
||||
them. Finally, this page provides guidance on choosing a storage driver.
|
||||
|
||||
The material on this page is intended for readers who already have an [understanding of the storage driver technology](imagesandcontainers.md).
|
||||
The material on this page is intended for readers who already have an
|
||||
[understanding of the storage driver technology](imagesandcontainers.md).
|
||||
|
||||
## A pluggable storage driver architecture
|
||||
|
||||
The Docker has a pluggable storage driver architecture. This gives you the flexibility to "plug in" the storage driver is best for your environment and use-case. Each Docker storage driver is based on a Linux filesystem or volume manager. Further, each storage driver is free to implement the management of image layers and the container layer in it's own unique way. This means some storage drivers perform better than others in different circumstances.
|
||||
Docker has a pluggable storage driver architecture. This gives you the
|
||||
flexibility to "plug in" the storage driver that is best for your environment
|
||||
and use-case. Each Docker storage driver is based on a Linux filesystem or
|
||||
volume manager. Further, each storage driver is free to implement the
|
||||
management of image layers and the container layer in its own unique way. This
|
||||
means some storage drivers perform better than others in different
|
||||
circumstances.
|
||||
|
||||
Once you decide which driver is best, you set this driver on the Docker daemon at start time. As a result, the Docker daemon can only run one storage driver, and all containers created by that daemon instance use the same storage driver. The table below shows the supported storage driver technologies and their driver names:
|
||||
Once you decide which driver is best, you set this driver on the Docker daemon
|
||||
at start time. As a result, the Docker daemon can only run one storage driver,
|
||||
and all containers created by that daemon instance use the same storage driver.
|
||||
The table below shows the supported storage driver technologies and their
|
||||
driver names:
|
||||
|
||||
|Technology |Storage driver name |
|
||||
|--------------|---------------------|
|
||||
|
@ -31,7 +43,8 @@ Once you decide which driver is best, you set this driver on the Docker daemon a
|
|||
|VFS* |`vfs` |
|
||||
|ZFS |`zfs` |
|
||||
|
||||
To find out which storage driver is set on the daemon , you use the `docker info` command:
|
||||
To find out which storage driver is set on the daemon, you use the
|
||||
`docker info` command:
|
||||
|
||||
$ docker info
|
||||
Containers: 0
|
||||
|
@ -44,9 +57,19 @@ To find out which storage driver is set on the daemon , you use the `docker info
|
|||
Operating System: Ubuntu 15.04
|
||||
... output truncated ...
|
||||
|
||||
The `info` subcommand reveals that the Docker daemon is using the `overlay` storage driver with a `Backing Filesystem` value of `extfs`. The `extfs` value means that the `overlay` storage driver is operating on top of an existing (ext) filesystem. The backing filesystem refers to the filesystem that was used to create the Docker host's local storage area under `/var/lib/docker`.
|
||||
The `info` subcommand reveals that the Docker daemon is using the `overlay`
|
||||
storage driver with a `Backing Filesystem` value of `extfs`. The `extfs` value
|
||||
means that the `overlay` storage driver is operating on top of an existing
|
||||
(ext) filesystem. The backing filesystem refers to the filesystem that was used
|
||||
to create the Docker host's local storage area under `/var/lib/docker`.
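
When only the driver name is needed, for example in a script, the relevant line can be filtered out of the `docker info` output. This is a minimal sketch, assuming the output contains a `Storage Driver:` line as described above; `storage_driver_from_info` is a hypothetical helper name.

```shell
# Sketch: print the value of the "Storage Driver" line from `docker info` output on stdin.
# storage_driver_from_info is a hypothetical helper, not a Docker subcommand.
storage_driver_from_info() {
    # Strip the label (and any leading spaces) and print only matching lines.
    sed -n 's/^ *Storage Driver: *//p'
}

# Typical use:
#   docker info | storage_driver_from_info
```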
|
||||
|
||||
Which storage driver you use, in part, depends on the backing filesystem you plan to use for your Docker host's local storage area. Some storage drivers can operate on top of different backing filesystems. However, other storage drivers require the backing filesystem to be the same as the storage driver. For example, the `btrfs` storage driver can only work with a `btrfs` backing filesystem. The following table lists each storage driver and whether it must match the host's backing file system:
|
||||
Which storage driver you use, in part, depends on the backing filesystem you
|
||||
plan to use for your Docker host's local storage area. Some storage drivers can
|
||||
operate on top of different backing filesystems. However, other storage
|
||||
drivers require the backing filesystem to be the same as the storage driver.
|
||||
For example, the `btrfs` storage driver can only work with a Btrfs backing filesystem. The
|
||||
following table lists each storage driver and whether it must match the host's
|
||||
backing filesystem:
|
||||
|
||||
|Storage driver |Must match backing filesystem |
|
||||
|---------------|------------------------------|
|
||||
|
@ -58,9 +81,12 @@ Which storage driver you use, in part, depends on the backing filesystem you pla
|
|||
|zfs |Yes |
|
||||
|
||||
|
||||
You can set the storage driver by passing the `--storage-driver=<name>` option to the `docker daemon` command line or by setting the option on the `DOCKER_OPTS` line in `/etc/default/docker` file.
|
||||
You can set the storage driver by passing the `--storage-driver=<name>` option
|
||||
to the `docker daemon` command line, or by setting the option on the
|
||||
`DOCKER_OPTS` line in the `/etc/default/docker` file.
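
As an illustration of the second option, a minimal `/etc/default/docker` fragment might look like the following. This is a sketch that assumes your distribution's init scripts read `DOCKER_OPTS` from this file; the choice of `devicemapper` is only an example.

```shell
# /etc/default/docker (fragment)
# Example only: substitute the driver name that suits your host.
DOCKER_OPTS="--storage-driver=devicemapper"
```

The daemon reads its options when it starts, so restart it after editing the file.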
|
||||
|
||||
The following command shows how to start the Docker daemon with the `devicemapper` storage driver using the `docker daemon` command:
|
||||
The following command shows how to start the Docker daemon with the
|
||||
`devicemapper` storage driver using the `docker daemon` command:
|
||||
|
||||
$ docker daemon --storage-driver=devicemapper &
|
||||
|
||||
|
@ -90,25 +116,82 @@ The following command shows how to start the Docker daemon with the `devicemappe
|
|||
Operating System: Ubuntu 15.04
|
||||
<output truncated>
|
||||
|
||||
Your choice of storage driver can affect the performance of your containerized applications. So it's important to understand the different storage driver options available and select the right one for your application. Later, in this page you'll find some advice for choosing an appropriate driver.
|
||||
Your choice of storage driver can affect the performance of your containerized
|
||||
applications. So it's important to understand the different storage driver
|
||||
options available and select the right one for your application. Later in this
|
||||
page you'll find some advice for choosing an appropriate driver.
|
||||
|
||||
## Shared storage systems and the storage driver
|
||||
|
||||
Many enterprises consume storage from shared storage systems such as SAN and NAS arrays. These often provide increased performance and availability, as well as advanced features such as thin provisioning, deduplication and compression.
|
||||
Many enterprises consume storage from shared storage systems such as SAN and
|
||||
NAS arrays. These often provide increased performance and availability, as well
|
||||
as advanced features such as thin provisioning, deduplication and compression.
|
||||
|
||||
The Docker storage driver and data volumes can both operate on top of storage provided by shared storage systems. This allows Docker to leverage the increased performance and availability these systems provide. However, Docker does not integrate with these underlying systems.
|
||||
The Docker storage driver and data volumes can both operate on top of storage
|
||||
provided by shared storage systems. This allows Docker to leverage the
|
||||
increased performance and availability these systems provide. However, Docker
|
||||
does not integrate with these underlying systems.
|
||||
|
||||
Remember that each Docker storage driver is based on a Linux filesystem or volume manager. Be sure to follow existing best practices for operating your storage driver (filesystem or volume manager) on top of your shared storage system. For example, if using the ZFS storage driver on top of *XYZ* shared storage system, be sure to follow best practices for operating ZFS filesystems on top of XYZ shared storage system.
|
||||
Remember that each Docker storage driver is based on a Linux filesystem or
|
||||
volume manager. Be sure to follow existing best practices for operating your
|
||||
storage driver (filesystem or volume manager) on top of your shared storage
|
||||
system. For example, if using the ZFS storage driver on top of *XYZ* shared
|
||||
storage system, be sure to follow best practices for operating ZFS filesystems
|
||||
on top of XYZ shared storage system.
|
||||
|
||||
## Which storage driver should you choose?
|
||||
|
||||
As you might expect, the answer to this question is "it depends". While there are some clear cases where one particular storage driver outperforms other for certain workloads, you should factor all of the following into your decision:
|
||||
Several factors influence the selection of a storage driver. However, these two
|
||||
facts must be kept in mind:
|
||||
|
||||
Choose a storage driver that you and your team/organization are comfortable with. Consider how much experience you have with a particular storage driver. There is no substitute for experience and it is rarely a good idea to try something brand new in production. That's what labs and laptops are for!
|
||||
1. No single driver is well suited to every use-case
|
||||
2. Storage drivers are improving and evolving all of the time
|
||||
|
||||
If your Docker infrastructure is under support contracts, choose an option that will get you good support. You probably don't want to go with a solution that your support partners have little or no experience with.
|
||||
With these factors in mind, the following points, coupled with the table below,
|
||||
should provide some guidance.
|
||||
|
||||
Whichever driver you choose, make sure it has strong community support and momentum. This is important because storage driver development in the Docker project relies on the community as much as the Docker staff to thrive.
|
||||
### Stability
|
||||
For the most stable and hassle-free Docker experience, you should consider the
|
||||
following:
|
||||
|
||||
- **Use the default storage driver for your distribution**. When Docker
|
||||
installs, it chooses a default storage driver based on the configuration of
|
||||
your system. Stability is an important factor influencing which storage driver
|
||||
is used by default. Straying from this default may increase your chances of
|
||||
encountering bugs and nuances.
|
||||
- **Follow the configuration specified on the CS Engine
|
||||
[compatibility matrix](https://www.docker.com/compatibility-maintenance)**. The
|
||||
CS Engine is the commercially supported version of the Docker Engine. Its
|
||||
code-base is identical to the open source Engine, but it has a limited set of
|
||||
supported configurations. These *supported configurations* use the most stable
|
||||
and mature storage drivers. Straying from these configurations may also
|
||||
increase your chances of encountering bugs and nuances.
|
||||
|
||||
### Experience and expertise
|
||||
|
||||
Choose a storage driver that you and your team/organization have experience
|
||||
with. For example, if you use RHEL or one of its downstream forks, you may
|
||||
already have experience with LVM and Device Mapper. If so, you may wish to use
|
||||
the `devicemapper` driver.
|
||||
|
||||
If you do not feel you have expertise with any of the storage drivers supported
|
||||
by Docker, and you want an easy-to-use stable Docker experience, you should
|
||||
consider using the default driver installed by your distribution's Docker
|
||||
package.
|
||||
|
||||
### Future-proofing
|
||||
|
||||
Many people consider OverlayFS to be the future of the Docker storage driver.
|
||||
However, it is less mature, and potentially less stable than some of the more
|
||||
mature drivers such as `aufs` and `devicemapper`. For this reason, you should
|
||||
use the OverlayFS driver with caution and expect to encounter more bugs and
|
||||
nuances than if you were using a more mature driver.
|
||||
|
||||
The following diagram lists each storage driver and provides insight into some
|
||||
of their pros and cons. When selecting which storage driver to use, consider
|
||||
the guidance offered by the table below along with the points mentioned above.
|
||||
|
||||
![](images/driver-pros-cons.png)
|
||||
|
||||
|
||||
## Related information
|
||||
|
|
|
@ -10,13 +10,24 @@ parent = "engine_driver"
|
|||
|
||||
# Docker and ZFS in practice
|
||||
|
||||
ZFS is a next generation filesystem that supports many advanced storage technologies such as volume management, snapshots, checksumming, compression and deduplication, replication and more.
|
||||
ZFS is a next generation filesystem that supports many advanced storage
|
||||
technologies such as volume management, snapshots, checksumming, compression
|
||||
and deduplication, replication and more.
|
||||
|
||||
It was created by Sun Microsystems (now Oracle Corporation) and is open sourced under the CDDL license. Due to licensing incompatibilities between the CDDL and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However, the ZFS On Linux (ZoL) project provides an out-of-tree kernel module and userspace tools which can be installed separately.
|
||||
It was created by Sun Microsystems (now Oracle Corporation) and is open sourced
|
||||
under the CDDL license. Due to licensing incompatibilities between the CDDL
|
||||
and GPL, ZFS cannot be shipped as part of the mainline Linux kernel. However,
|
||||
the ZFS on Linux (ZoL) project provides an out-of-tree kernel module and
|
||||
userspace tools which can be installed separately.
|
||||
|
||||
The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in time it is not recommended to use the `zfs` Docker storage driver for production use unless you have substantial experience with ZFS on Linux.
|
||||
The ZFS on Linux (ZoL) port is healthy and maturing. However, at this point in
|
||||
time it is not recommended to use the `zfs` Docker storage driver for
|
||||
production use unless you have substantial experience with ZFS on Linux.
|
||||
|
||||
> **Note:** There is also a FUSE implementation of ZFS on the Linux platform. This should work with Docker but is not recommended. The native ZFS driver (ZoL) is more tested, more performant, and is more widely used. The remainder of this document will relate to the native ZoL port.
|
||||
> **Note:** There is also a FUSE implementation of ZFS on the Linux platform.
|
||||
> This should work with Docker but is not recommended. The native ZFS driver
|
||||
> (ZoL) is better tested, more performant, and more widely used. The remainder
|
||||
> of this document will relate to the native ZoL port.
|
||||
|
||||
|
||||
## Image layering and sharing with ZFS
|
||||
|
@ -27,53 +38,96 @@ The Docker `zfs` storage driver makes extensive use of three ZFS datasets:
|
|||
- snapshots
|
||||
- clones
|
||||
|
||||
ZFS filesystems are thinly provisioned and have space allocated to them from a ZFS pool (zpool) via allocate on demand operations. Snapshots and clones are space-efficient point-in-time copies of ZFS filesystems. Snapshots are read-only. Clones are read-write. Clones can only be created from snapshots. This simple relationship is shown in the diagram below.
|
||||
ZFS filesystems are thinly provisioned and have space allocated to them from a
|
||||
ZFS pool (zpool) via allocate-on-demand operations. Snapshots and clones are
|
||||
space-efficient point-in-time copies of ZFS filesystems. Snapshots are
|
||||
read-only. Clones are read-write. Clones can only be created from snapshots.
|
||||
This simple relationship is shown in the diagram below.
|
||||
|
||||
![](images/zfs_clones.jpg)
|
||||
|
||||
The solid line in the diagram shows the process flow for creating a clone. Step 1 creates a snapshot of the filesystem, and step two creates the clone from the snapshot. The dashed line shows the relationship between the clone and the filesystem, via the snapshot. All three ZFS datasets draw space form the same underlying zpool.
|
||||
The solid line in the diagram shows the process flow for creating a clone. Step
|
||||
1 creates a snapshot of the filesystem, and step 2 creates the clone from
|
||||
the snapshot. The dashed line shows the relationship between the clone and the
|
||||
filesystem, via the snapshot. All three ZFS datasets draw space from the same
|
||||
underlying zpool.
|
||||
|
||||
On Docker hosts using the `zfs` storage driver, the base layer of an image is a ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the layer below it. A container is a ZFS clone based on a ZFS Snapshot of the top layer of the image it's created from. All ZFS datasets draw their space from a common zpool. The diagram below shows how this is put together with a running container based on a two-layer image.
|
||||
On Docker hosts using the `zfs` storage driver, the base layer of an image is a
|
||||
ZFS filesystem. Each child layer is a ZFS clone based on a ZFS snapshot of the
|
||||
layer below it. A container is a ZFS clone based on a ZFS snapshot of the top
|
||||
layer of the image it's created from. All ZFS datasets draw their space from a
|
||||
common zpool. The diagram below shows how this is put together with a running
|
||||
container based on a two-layer image.
|
||||
|
||||
![](images/zfs_zpool.jpg)
|
||||
|
||||
The following process explains how images are layered and containers created. The process is based on the diagram above.
|
||||
The following process explains how images are layered and containers created.
|
||||
The process is based on the diagram above.
|
||||
|
||||
1. The base layer of the image exists on the Docker host as a ZFS filesystem.
|
||||
|
||||
This filesystem consumes space from the zpool used to create the Docker host's local storage area at `/var/lib/docker`.
|
||||
This filesystem consumes space from the zpool used to create the Docker
|
||||
host's local storage area at `/var/lib/docker`.
|
||||
|
||||
2. Additional image layers are clones of the dataset hosting the image layer directly below it.
|
||||
2. Additional image layers are clones of the dataset hosting the image layer
|
||||
directly below it.
|
||||
|
||||
In the diagram, "Layer 1" is added by making a ZFS snapshot of the base layer and then creating a clone from that snapshot. The clone is writable and consumes space on-demand from the zpool. The snapshot is read-only, maintaining the base layer as an immutable object.
|
||||
In the diagram, "Layer 1" is added by making a ZFS snapshot of the base
|
||||
layer and then creating a clone from that snapshot. The clone is writable and
|
||||
consumes space on-demand from the zpool. The snapshot is read-only, maintaining
|
||||
the base layer as an immutable object.
|
||||
|
||||
3. When the container is launched, a read-write layer is added above the image.
|
||||
|
||||
In the diagram above, the container's read-write layer is created by making a snapshot of the top layer of the image (Layer 1) and creating a clone from that snapshot.
|
||||
In the diagram above, the container's read-write layer is created by making
|
||||
a snapshot of the top layer of the image (Layer 1) and creating a clone from
|
||||
that snapshot.
|
||||
|
||||
As changes are made to the container, space is allocated to it from the zpool via allocate-on-demand operations. By default, ZFS will allocate space in blocks of 128K.
|
||||
As changes are made to the container, space is allocated to it from the
|
||||
zpool via allocate-on-demand operations. By default, ZFS will allocate space in
|
||||
blocks of 128K.
|
||||
|
||||
This process of creating child layers and containers from *read-only* snapshots allows images to be maintained as immutable objects.
|
||||
This process of creating child layers and containers from *read-only* snapshots
|
||||
allows images to be maintained as immutable objects.
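
The snapshot-then-clone sequence described above can be sketched with the standard `zfs snapshot` and `zfs clone` commands. The example below only prints the commands instead of running them. `layer_commands`, the dataset names, and the `@snap` snapshot name are hypothetical and chosen for illustration; Docker picks its own dataset names.

```shell
# Sketch: print (rather than run) the ZFS commands that derive one dataset from another.
# layer_commands and all dataset and snapshot names here are hypothetical.
layer_commands() {
    parent=$1
    child=$2
    echo "zfs snapshot ${parent}@snap"         # read-only, point-in-time copy of the parent
    echo "zfs clone ${parent}@snap ${child}"   # writable clone created from that snapshot
}

# Base layer -> Layer 1 -> container, following the diagram.
layer_commands zpool-docker/base zpool-docker/layer1
layer_commands zpool-docker/layer1 zpool-docker/container
```

Printing the commands keeps the example safe to run on a host without ZFS, and makes the order of operations (snapshot first, then clone) easy to see.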
|
||||
|
||||
## Container reads and writes with ZFS
|
||||
|
||||
Container reads with the `zfs` storage driver are very simple. A newly launched container is based on a ZFS clone. This clone initially shares all of its data with the dataset it was created from. This means that read operations with the `zfs` storage driver are fast – even if the data being read was copied into the container yet. This sharing of data blocks is shown in the diagram below.
|
||||
Container reads with the `zfs` storage driver are very simple. A newly launched
|
||||
container is based on a ZFS clone. This clone initially shares all of its data
|
||||
with the dataset it was created from. This means that read operations with the
|
||||
`zfs` storage driver are fast &ndash; even if the data being read has not been
|
||||
copied into the container yet. This sharing of data blocks is shown in the
|
||||
diagram below.
|
||||
|
||||
![](images/zpool_blocks.jpg)
|
||||
|
||||
Writing new data to a container is accomplished via an allocate-on-demand operation. Every time a new area of the container needs writing to, a new block is allocated from the zpool. This means that containers consume additional space as new data is written to them. New space is allocated to the container (ZFS Clone) from the underlying zpool.
|
||||
Writing new data to a container is accomplished via an allocate-on-demand
|
||||
operation. Every time a new area of the container needs writing to, a new block
|
||||
is allocated from the zpool. This means that containers consume additional
|
||||
space as new data is written to them. New space is allocated to the container
|
||||
(ZFS Clone) from the underlying zpool.
|
||||
|
||||
Updating *existing data* in a container is accomplished by allocating new blocks to the containers clone and storing the changed data in those new blocks. The original are unchanged, allowing the underlying image dataset to remain immutable. This is the same as writing to a normal ZFS filesystem and is an implementation of copy-on-write semantics.
|
||||
Updating *existing data* in a container is accomplished by allocating new
|
||||
blocks to the container's clone and storing the changed data in those new
|
||||
blocks. The original blocks are unchanged, allowing the underlying image
|
||||
dataset to remain immutable. This is the same as writing to a normal ZFS
|
||||
filesystem and is an implementation of copy-on-write semantics.
|
||||
|
||||
## Configure Docker with the ZFS storage driver
|
||||
|
||||
The `zfs` storage driver is only supported on a Docker host where `/var/lib/docker` is mounted as a ZFS filesystem. This section shows you how to install and configure native ZFS on Linux (ZoL) on an Ubuntu 14.04 system.
|
||||
The `zfs` storage driver is only supported on a Docker host where
|
||||
`/var/lib/docker` is mounted as a ZFS filesystem. This section shows you how to
|
||||
install and configure native ZFS on Linux (ZoL) on an Ubuntu 14.04 system.
|
||||
|
||||
### Prerequisites
|
||||
|
||||
If you have already used the Docker daemon on your Docker host and have images you want to keep, `push` them Docker Hub or your private Docker Trusted Registry before attempting this procedure.
|
||||
If you have already used the Docker daemon on your Docker host and have images
|
||||
you want to keep, `push` them to Docker Hub or your private Docker Trusted
|
||||
Registry before attempting this procedure.
|
||||
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev/xvdb`. The device identifier may be be different in your environment and you should substitute your own values throughout the procedure.
|
||||
Stop the Docker daemon. Then, ensure that you have a spare block device at
|
||||
`/dev/xvdb`. The device identifier may be different in your environment, and
|
||||
you should substitute your own values throughout the procedure.
|
||||
|
||||
### Install ZFS on Ubuntu 14.04 LTS
|
||||
|
||||
|
@ -98,7 +152,8 @@ Stop the Docker daemon. Then, ensure that you have a spare block device at `/dev
|
|||
gpg: imported: 1 (RSA: 1)
|
||||
OK
|
||||
|
||||
3. Get the latest package lists for all registered repositories and package archives.
|
||||
3. Get the latest package lists for all registered repositories and package
|
||||
archives.
|
||||
|
||||
$ sudo apt-get update
|
||||
Ign http://us-west-2.ec2.archive.ubuntu.com trusty InRelease
|
||||
|
@ -156,7 +211,8 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
|
|||
zpool-docker 93.5K 3.84G 19K /zpool-docker
|
||||
zpool-docker/docker 19K 3.84G 19K /var/lib/docker
|
||||
|
||||
Now that you have a ZFS filesystem mounted to `/var/lib/docker`, the daemon should automatically load with the `zfs` storage driver.
|
||||
Now that you have a ZFS filesystem mounted to `/var/lib/docker`, the daemon
|
||||
should automatically load with the `zfs` storage driver.
|
||||
|
||||
5. Start the Docker daemon.
|
||||
|
||||
|
@ -165,9 +221,9 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
|
|||
|
||||
The procedure for starting the Docker daemon may differ depending on the
|
||||
Linux distribution you are using. It is possible to force the Docker daemon
|
||||
to start with the `zfs` storage driver by passing the `--storage-driver=zfs`
|
||||
flag to the `docker daemon` command, or to the `DOCKER_OPTS` line in the
|
||||
Docker config file.
|
||||
to start with the `zfs` storage driver by passing the
|
||||
`--storage-driver=zfs` flag to the `docker daemon` command, or to the
|
||||
`DOCKER_OPTS` line in the Docker config file.
|
||||
|
||||
6. Verify that the daemon is using the `zfs` storage driver.
|
||||
|
||||
|
@ -186,33 +242,55 @@ Once ZFS is installed and loaded, you're ready to configure ZFS for Docker.
|
|||
[...]
|
||||
|
||||
The output of the command above shows that the Docker daemon is using the
|
||||
`zfs` storage driver and that the parent dataset is the `zpool-docker/docker`
|
||||
filesystem created earlier.
|
||||
`zfs` storage driver and that the parent dataset is the
|
||||
`zpool-docker/docker` filesystem created earlier.
|
||||
|
||||
Your Docker host is now using ZFS to store and manage its images and containers.
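
As a quick check after the procedure above, the driver name can be extracted from the `docker info` output. The following is a minimal sketch, not part of the original procedure: the helper name `storage_driver` is hypothetical, and it assumes that `docker info` reports the driver on a line beginning with `Storage Driver:`.

```shell
# Hypothetical helper: read `docker info` output on stdin and print
# the value found on the "Storage Driver:" line.
storage_driver() {
    sed -n 's/^[[:space:]]*Storage Driver:[[:space:]]*//p' | head -n 1
}

# Usage on a Docker host (requires a running daemon):
#   sudo docker info 2>/dev/null | storage_driver
```

If the printed value is anything other than `zfs`, revisit the mount point of `/var/lib/docker` before starting containers.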
|
||||
|
||||
## ZFS and Docker performance
|
||||
|
||||
There are several factors that influence the performance of Docker using the `zfs` storage driver.
|
||||
There are several factors that influence the performance of Docker using the
|
||||
`zfs` storage driver.
|
||||
|
||||
- **Memory**. Memory has a major impact on ZFS performance. This goes back to the fact that ZFS was originally designed for use on big Sun Solaris servers with large amounts of memory. Keep this in mind when sizing your Docker hosts.
|
||||
- **Memory**. Memory has a major impact on ZFS performance. This goes back to
|
||||
the fact that ZFS was originally designed for use on big Sun Solaris servers
|
||||
with large amounts of memory. Keep this in mind when sizing your Docker hosts.
|
||||
|
||||
- **ZFS Features**. Using ZFS features, such as deduplication, can significantly increase the amount
|
||||
of memory ZFS uses. For memory consumption and performance reasons it is
|
||||
recommended to turn off ZFS deduplication. However, deduplication at other
|
||||
layers in the stack (such as SAN or NAS arrays) can still be used as these do
|
||||
not impact ZFS memory usage and performance. If using SAN, NAS or other hardware
|
||||
RAID technologies you should continue to follow existing best practices for
|
||||
using them with ZFS.
|
||||
- **ZFS Features**. Using ZFS features, such as deduplication, can
|
||||
significantly increase the amount of memory ZFS uses. For memory consumption
|
||||
and performance reasons it is recommended to turn off ZFS deduplication.
|
||||
However, deduplication at other layers in the stack (such as SAN or NAS arrays)
|
||||
can still be used as these do not impact ZFS memory usage and performance. If
|
||||
using SAN, NAS or other hardware RAID technologies you should continue to
|
||||
follow existing best practices for using them with ZFS.
|
||||
|
||||
* **ZFS Caching**. ZFS caches disk blocks in a memory structure called the adaptive replacement cache (ARC). The *Single Copy ARC* feature of ZFS allows a single cached copy of a block to be shared by multiple clones of a filesystem. This means that multiple running containers can share a single copy of a cached block. This means that ZFS is a good option for PaaS and other high density use cases.
|
||||
- **ZFS Caching**. ZFS caches disk blocks in a memory structure called the
|
||||
adaptive replacement cache (ARC). The *Single Copy ARC* feature of ZFS allows a
|
||||
single cached copy of a block to be shared by multiple clones of a filesystem.
|
||||
This means that multiple running containers can share a single copy of a cached
|
||||
block. This means that ZFS is a good option for PaaS and other high density use
|
||||
cases.
|
||||
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write filesystems like ZFS. However, ZFS writes in 128K blocks and allocates *slabs* (multiple 128K blocks) to CoW operations in an attempt to reduce fragmentation. The ZFS intent log (ZIL) and the coalescing of writes (delayed writes) also help to reduce fragmentation.
|
||||
- **Fragmentation**. Fragmentation is a natural byproduct of copy-on-write
|
||||
filesystems like ZFS. However, ZFS writes in 128K blocks and allocates *slabs*
|
||||
(multiple 128K blocks) to CoW operations in an attempt to reduce fragmentation.
|
||||
The ZFS intent log (ZIL) and the coalescing of writes (delayed writes) also
|
||||
help to reduce fragmentation.
|
||||
|
||||
- **Use the native ZFS driver for Linux**. Although the Docker `zfs` storage driver supports the ZFS FUSE implementation, it is not recommended when high performance is required. The native ZFS on Linux driver tends to perform better than the FUSE implementation.
|
||||
- **Use the native ZFS driver for Linux**. Although the Docker `zfs` storage
|
||||
driver supports the ZFS FUSE implementation, it is not recommended when high
|
||||
performance is required. The native ZFS on Linux driver tends to perform better
|
||||
than the FUSE implementation.
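
The deduplication advice above can be sketched as a small shell helper. This is an illustrative sketch only: the function name `ensure_dedup_off` is hypothetical, the dataset name in the usage comment is taken from the earlier procedure, and the function assumes it runs with sufficient privileges to call `zfs`.

```shell
# Hypothetical helper: turn ZFS deduplication off on a dataset,
# but only if it is not already off.
ensure_dedup_off() {
    dataset="$1"
    # -H: scripted output without headers; -o value: print only the value.
    current=$(zfs get -H -o value dedup "$dataset") || return 1
    if [ "$current" != "off" ]; then
        zfs set dedup=off "$dataset"
    fi
}

# Usage (run as root or through sudo):
#   ensure_dedup_off zpool-docker/docker
```

Changing the property only affects data written afterwards; blocks that were already deduplicated stay that way until they are rewritten.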
|
||||
|
||||
The following generic performance best practices also apply to ZFS.
|
||||
|
||||
- **Use of SSD**. For best performance it is always a good idea to use fast storage media such as solid state devices (SSD). However, if you only have a limited amount of SSD storage available it is recommended to place the ZIL on SSD.
|
||||
- **Use of SSD**. For best performance it is always a good idea to use fast
|
||||
storage media such as solid state devices (SSD). However, if you only have a
|
||||
limited amount of SSD storage available it is recommended to place the ZIL on
|
||||
SSD.
|
||||
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable performance. This is because they bypass the storage driver and do not incur any of the potential overheads introduced by thin provisioning and copy-on-write. For this reason, you may want to place heavy write workloads on data volumes.
|
||||
- **Use Data Volumes**. Data volumes provide the best and most predictable
|
||||
performance. This is because they bypass the storage driver and do not incur
|
||||
any of the potential overheads introduced by thin provisioning and
|
||||
copy-on-write. For this reason, you should place heavy write workloads on data
|
||||
volumes.
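
To illustrate the data volume recommendation above, the sketch below starts a container with an anonymous data volume so that writes under the given path bypass the storage driver. The wrapper name `run_with_data_volume`, the image name, and the path are placeholders chosen for illustration; only the `docker run -d -v` invocation itself comes from the Docker CLI.

```shell
# Hypothetical wrapper: start a detached container with an anonymous
# data volume mounted at the given container path.
run_with_data_volume() {
    image="$1"
    volume_path="$2"
    shift 2
    # Any remaining arguments are passed through as the container command.
    docker run -d -v "$volume_path" "$image" "$@"
}

# Usage (image and path are placeholders):
#   run_with_data_volume ubuntu /data sleep 3600
```

A write-heavy application would then be configured to keep its data under the volume path (`/data` in the usage line) rather than in the container's writable layer.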
|
||||
|
|