Allows people to create out-of-process graphdrivers that can be used
with Docker.
Extensions must be started before Docker otherwise Docker will fail to
start.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This adds a data struct in the aufs driver for including more
information about active mounts along with their reference count.
Signed-off-by: Michael Crosby <crosbymichael@gmail.com>
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
Windows: add support for images stored in alternate location.
Signed-off-by: Stefan J. Wernli <swernli@microsoft.com>
Commit e27c904 added a wrong and misleading comment
to GetMetadata(). Fix it using the wording from
commit 407a626 which introduced GetMetadata().
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
These functions are not part of the graphdriver.Driver
interface and should therefore be private.
Also, remove comments added by commit e27c904 as they are
* pretty obvious
* no longer required by golint
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
Ploop graph driver provides its own ext4 filesystem to every
container. It so happens that ext4 root comes with lost+found
directory, causing failures from DriverTestCreateEmpty() and
DriverTestCreateBase() tests on ploop.
While I am not yet ready to submit ploop graph driver for review,
this change looks simple enough to push.
Note that filtering is done without any additional allocations,
as described in https://github.com/golang/go/wiki/SliceTricks.
[v2: added a comment about lost+found]
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
TL;DR: check for IsExist(err) after a failed MkdirAll() is both
redundant and wrong -- so two reasons to remove it.
Quoting MkdirAll documentation:
> MkdirAll creates a directory named path, along with any necessary
> parents, and returns nil, or else returns an error. If path
> is already a directory, MkdirAll does nothing and returns nil.
This means two things:
1. If a directory to be created already exists, no error is returned.
2. If the error returned is IsExist (EEXIST), it means there exists
a non-directory with the same name as MkdirAll need to use for
directory. Example: we want to MkdirAll("a/b"), but file "a"
(or "a/b") already exists, so MkdirAll fails.
The above is a theory, based on quoted documentation and my UNIX
knowledge.
3. In practice, though, current MkdirAll implementation [1] returns
ENOTDIR in most of cases described in #2, with the exception when
there is a race between MkdirAll and someone else creating the
last component of MkdirAll argument as a file. In this very case
MkdirAll() will indeed return EEXIST.
Because of #1, IsExist check after MkdirAll is not needed.
Because of #2 and #3, ignoring IsExist error is just plain wrong,
as directory we require is not created. It's cleaner to report
the error now.
Note this error is all over the tree, I guess due to copy-paste,
or trying to follow the same usage pattern as for Mkdir(),
or some not quite correct examples on the Internet.
[v2: a separate aufs commit is merged into this one]
[1] https://github.com/golang/go/blob/f9ed2f75/src/os/path.go
Signed-off-by: Kir Kolyshkin <kir@openvz.org>
The `ApplyDiff` function takes a tar archive stream that is
automagically decompressed later. This was causing a double
decompression, and when the layer was empty, that causes an early EOF.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
The ZFS driver should raise proper errors when the ZFS utility is
missing or when there's no zfs partition active on the system. Raising the
proper errors make possible to silently ignore the ZFS storage
driver when no default storage driver is specified.
Previous to this commit it was no longer possible to start the
docker daemon in that way:
docker -d --storage-opt dm.loopdatasize=2GB
The above command resulted in an exit error because the ZFS driver
tried to use the storage options.
Signed-off-by: Flavio Castelli <fcastelli@suse.com>
Current default basesize is 10G. Change it to 100G. Reason being that for
some people 10G is turning out to be too small and we don't have capabilities
to grow it dyamically.
This is just overcommitting and no real space is allocated till container
actually writes data. And this is no different then fs based graphdrivers
where virtual size of a container root is unlimited.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Replaced github.com/docker/libcontainer with
github.com/opencontainers/runc/libcontaier.
Also I moved AppArmor profile generation to docker.
Main idea of this update is to fix mounting cgroups inside containers.
After updating docker on CI we can even remove dind.
Signed-off-by: Alexander Morozov <lk4d4@docker.com>
The ability to save and verify base device UUID (#13896) introduced a
situation where the initialization would panic when removing the device
returns EBUSY.
Functions `verifyBaseDeviceUUID` and `saveBaseDeviceUUID` now take the
lock on the `DeviceSet`, which solves the problem as `removeDevice`
assumes it owns the lock.
Signed-off-by: Arnaud Porterie <arnaud.porterie@docker.com>
Often it happens that docker is not able to shutdown/remove the thin
pool it created because some device has leaked into some mount name
space. That means device is in use and that means pool can't be removed.
Docker will leave pool as it is and exit. Later when user starts the
docker, it finds pool is already there and docker uses it. But docker
does not know it is same pool which is using the loop devices. Now
docker thinks loop devices are not being used. That means it does not
display the data correctly in "docker info", giving user wrong information.
This patch tries to detect if loop devices as created by docker are
being used for pool and fills in the right details in "docker info".
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
DeviceMapper must be explicitly selected because the Docker binary might not be linked to the right devmapper library.
With this change, Docker fails fast if the driver detection finds the devicemapper directory but the driver is not the default option.
The option `override_udev_sync_check` doesn't make sense anymore, since the user must be explicit to select devicemapper, so it's being removed.
Docker fails to use devicemapper only if Docker has been built statically unless the option was explicit.
Signed-off-by: David Calavera <david.calavera@gmail.com>
Export metadata for container and image in docker-inspect when overlay
graphdriver is in use. Right now it is done only for devicemapper graph
driver.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
It is easy for one to use docker for a while, shut it down and restart
docker with different set of storage options for device mapper driver
which will effectively change the thin pool. That means any of the
metadata stored in /var/lib/docker/devicemapper/metadata/ is not valid
for the new pool and user will run into various kind of issues like
container not found in the pool etc.
Users think that their images or containers are lost but it might just
be the case of configuration issue. People might use wrong metadata
with wrong pool.
To detect such situations, save UUID of base image and once docker
starts later, query and compare the UUID of base image with the
stored one. If they don't match, fail the initialization with the
error that UUID failed to match.
That way user will be forced to cleanup /var/lib/docker/ directory
and start docker again.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Export image/container metadata stored in graph driver. Right now 3 fields
DeviceId, DeviceSize and DeviceName are being exported from devicemapper.
Other graph drivers can export fields as they see fit.
This data can be used to mount the thin device outside of docker and tools
can look into image/container and do some kind of inspection.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Previously the cache was only updated once on startup, because the graph
code only check for filesystems on startup. However this breaks the API as it
was supposed and so unit tests.
Fixes#13142
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
The docker graph call driver.Exists() on initialisation for each filesystem in
the graph. This results will results in a lot `zfs get all` commands. To reduce
this, retrieve all descend filesystem at startup and cache it for later checks
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
instead of let zfs automaticly mount datasets, mount them on demand using mount(2).
This speed up this graph driver in 2 ways:
- less zfs processes needed to start a container
- /proc/mounts get smaller, so zfs userspace tools has less to read (which can
a significant amount of data as the number of layer grows)
This ways it can be also ensured that the correct mountpoint is always used.
Signed-off-by: Jörg Thalheim <joerg@higgsboson.tk>
Right now devicemapper mounts thin device using online discards by default
and passes mount option "discard". Generally people discourage usage of
online discards as they can be a drain on performance. Instead it is
recommended to use fstrim once in a while to reclaim the space.
In case of containers, we recommend to keep data volumes separate. So
there might not be lot of rm, unlink operations going on and there might
not be lot of space being freed by containers. So it might not matter
much if we don't reclaim that free space in pool.
User can still pass mount option explicitly using dm.mountopt=discard to
enable discards if they would like to.
So this is more like setting the containers by default for better performance
instead of better space efficiency in pool. And user can change the behavior
if they don't like default behavior.
Reported-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
If device is being reactivated before it could go away and deferred
deactivation is scheduled on it, cancel it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
This will help with debugging as one could just do "docker info" and figure
out of deferred removal is enabled or not.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Provide a new command line knob dm.deferred_device_removal which will enable
deferred device deactivation if driver and library support it.
This patch also checks for library support and driver version.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Before this, a storage driver would be defaulted to based on the
priority list, and only print a warning if there is state from other
drivers.
This meant a reordering of priority list would "break" users in an
upgrade of docker, such that there images in the prior driver's state
were now invisible.
With this change, prior state is scanned, and if present that driver is
preferred.
As such, we can reorder the priority list, and after an upgrade,
existing installs with prior drivers can have a contiguous experience,
while fresh installs may default to a driver in the new priority list.
Ref: https://github.com/docker/docker/pull/11962#issuecomment-88274858
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Signed-off-by: Megan Kostick <mkostick@us.ibm.com>
Alphabetize FSMagic list to make more human-readable.
Signed-off-by: Megan Kostick <mkostick@us.ibm.com>
This provides an override for forcing the daemon to still attempt
running the devicemapper driver even when udev sync is not supported.
Intended to be a very clear impairment for those choosing to use it. If
udev sync is false, there will still be an error in the daemon logs,
even when the override is in place. The docs have an explicit WARNING.
Including link to the docs for users that encounter this daemon error
during an upgrade.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Right now we try device removal at the interval of 10ms and keep on trying
till either device is removed or 10 seconds are over. That means if device
is busy, we will try 1000 times in those 10 seconds.
Sounds too high a frequency of deivce removal retrial. All the logs are
filled easily. I think it is a good idea to slow down a bit and retry at
the interval of 100ms instead of 10ms.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
During device removal, we are first waiting for device to close() in a tight
loop for 10 seconds. I am not sure why do we need it. First of all we come
here once the umount() is successful so device should be free. For some reason
of device is temporarily busy, then removeDevice() logic retries device removal
logic in a loop for 10 seconds and that should cover it. Can't see why one
more 10 seoncds loop is required before attempting device removal.
One loop should be able to cover all the temporary device busy conditions and
if condition is not temporary then 10 seconds loop is not going to help anyway.
So instead of two loops of 10 seconds each, I am converting it to a single
loop of 20 seconds. May be 10 second loop is good enough but for now I am
keeping it 20 seconds to avoid any regressions.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently in device removal path (device deactivation), we wait
for 10 seconds for devive to actually go away. waitRemove().
In current code this is not required. If dm removal task has completed
and one has done the wait on udev cookie, then device is gone and there
is no need to write another loop to wait for device removal.
This patch removes the waitRemove() which waits for 10 seconds after
device removal. This seems unnecessary.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
devmapper graph driver retries device removal 1000 times in case of failure
and if this fills up console with 1000 messages (when daemon is running in
debug mode). So remove these debug messages.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
There are issues with libdm logging. Right now if docker daemon is run
in debug mode, logging by libdm is too verbose. And if a device can't
be removed, thousands of messages fill the console and one can not see
what's going on.
This patch removes devicemapper.LogInitVerbose() call as that call will
only work if docker was not registering its own log handler with libdm.
For some reason docker registers one with libdm and libdm hands over
all the messages to docker (including debug ones). And now it is up to
devmapper backend to figure out which ones should go to console and
which ones should not.
So by default log only fatal messages from libdm. One can easily modify
the code to change it for debugging purposes.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
It's about time to let folks not hit 'vfs', when 'overlay' is supported
on their kernel. Especially now that v3.18.y is a long-term kernel.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Automatically detect support for aufs `dirperm1` option and apply it.
`dirperm1` tells aufs to check the permission bits of the directory on the
topmost branch and ignore the permission bits on all lower branches.
It can be used to fix aufs' permission bug (i.e., upper layer having
broader mask than the lower layer).
More information about the bug can be found at https://github.com/docker/docker/issues/783
`dirperm1` man page is at: http://aufs.sourceforge.net/aufs3/man.html
Signed-off-by: Daniel, Dao Quang Minh <dqminh89@gmail.com>
We removed it, because upstream removed it. But now it will be coming
back, so work with it either way.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
They say we should only use the BTRFS_LIB_VERSION
They will no longer support this since it had to be managed manually
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
In several cases graphdriver were just returning the low-level syscall
error and that was making it all the way up to the daemon logs and in
many cases was difficult to tell it was even coming from the graphdriver
at all.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
daemon/volumes.go
This SetFileCon call made no sense, it was changing the labels of any
directory mounted into the containers SELinux label. If it came from me,
then I apologize since it is a huge bug.
The Volumes Mount code should optionally do this, but it should not always
happen, and should never happen on a --privileged container.
The change to
daemon/graphdriver/vfs/driver.go, is a simplification since this it not
a relabel, it is only a setting of the shared label for docker volumes.
Docker-DCO-1.1-Signed-off-by: Dan Walsh <dwalsh@redhat.com> (github: rhatdan)
when initializing the devmapper driver, attempt to sync udev and device
mapper. If udev sync is not supported, print a warning. Eventually we'll
likely bail here to avoid unpredictable behavior for users.
Signed-off-by: Vincent Batts <vbatts@redhat.com>
Fixes#9960
This adds the output of a "Backing Filesystem:" entry to `docker info`
to overlay, aufs, and devicemapper graphdrivers. The default list
includes a fairly complete list of common filesystem names from
linux/include/uapi/linux/magic.h, but if the backing filesystem is not
recognized, the code will simply show "<unknown>"
Docker-DCO-1.1-Signed-off-by: Phil Estes <estesp@linux.vnet.ibm.com>
There are a couple of drivers that swallow errors that may occur in
their Put() implementation.
This changes the signature of (*Driver).Put for all the drivers implemented.
Signed-off-by: Vincent Batts <vbatts@hashbangbash.com>
Presenly the "Data file:" shows either the loopback _file_ or the block device.
With this, the "Data file:" will always show the device, and if it is a
loopback, then there will additionally be a "Data loop file:".
(Same for "Metadata file:")
Signed-off-by: Vincent Batts <vbatts@redhat.com>
`uint64(buf.Type)` on i686 is ffffffff9123683e on i686 due to sign extension, so it cannot be compared with `FsMagic(0x9123683E)`
Signed-off-by: Andrii Melnykov <andy.melnikov@gmail.com>
If .dockerignore mentions either then the client will send them to the
daemon but the daemon will erase them after the Dockerfile has been parsed
to simulate them never being sent in the first place.
an events test kept failing for me so I tried to fix that too
Closes#8330
Signed-off-by: Doug Davis <dug@us.ibm.com>
syscall.Unmount failed sometimes when user interrupted exporting,
for example a Ctrl-C, or pipe to commands which closed the pipe early,
like "docker export <container_name> | file -"; this syscall.Unmount
could sometimes return EBUSY and didn't actually umount the filesystem;
which would cause a following export command fail to mount;
change to lazy Unmount with MNT_DETACH can fix the problem, this is
the same behavior as in Shutdown;
```text
time="2015-01-03T21:27:26Z" level=error msg="Warning: error unmounting device
34a3e77cdbca17ceffd0636aee0415bb412996adb12360bfe2585ce30467fa8e: device or resource busy"
```
```
$ docker export thirsty_ardinghelli | file -
/dev/stdin: POSIX tar archive
time="2015-01-03T21:58:17Z" level=fatal msg="write /dev/stdout: broken pipe"
$ docker export thirsty_ardinghelli
time="2015-01-03T21:54:33Z" level=fatal msg="Error: thirsty_ardinghelli: Error getting container
34a3e77cdbca17ceffd0636aee0415bb412996adb12360bfe2585ce30467fa8e from driver devicemapper:
Error mounting '/dev/mapper/docker-253:0-3148372-34a3e77cdbca17ceffd0636aee0415bb412996adb12360bfe2585ce30467fa8e'
on '/var/lib/docker/devicemapper/mnt/34a3e77cdbca17ceffd0636aee0415bb412996adb12360bfe2585ce30467fa8e': device or resource busy"
```
Signed-off-by: Derek Che <drc@yahoo-inc.com>
To avoid an expensive call to archive.ChangesDirs() which walks two directory
trees and compares every entry, archive.ApplyLayer() has been extended to
also return the size of the layer changes.
Docker-DCO-1.1-Signed-off-by: Josh Hawn <josh.hawn@docker.com> (github: jlhawn)
Use transaction logic during device deletion and do rollback if transaction
is not complete. Following is the sequence of events.
- Open transaction and save to metafile
- Delete device from pool
- Delete device metadata file from disk
- Close Transaction
If docker crashes without closing transaction then rollback will take
place upon next docker start.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Finally this patch uses the notion of transaction for device or snapshot
device creation.
Following is sequence of event.
- Open a trasaction and save details in a file.
- Create a new device/snapshot device
- If a new device id is used, refresh transaction with new device id details.
- Create device metadata file
- Close transaction.
If docker crashes anywhere in between without closing transaction, then
upon next start, docker will figure out that there was a pending transaction
and it will roll back transaction. That is it will do following.
- Delete Device from pool
- Delete device metadata file
- Remove transaction file to mark no transaction is pending.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Finally, we seem to have all the bits to keep track of all used device
Ids and find a free device Id to use when creating a new device. Start
using it.
Ideally we should completely move away from retry logic when pool returns
-EEXISTS. For now I have retained that logic and I simply output a warning.
When things are stable, we should be able to get rid of it.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Open code createDevice() and createSnapDevice() and move all the logic
in the caller.
This is a sheer code reorganization so that all device Id allocation
logic is in one function. That way in case of erros, one can easily
cleanup and mark device Id free again. (Later patches benefit from
it).
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now we are accessing devices.NextDeviceId directly and also
incrementing it at various places.
Instead provide a helper function which is responsile for
incrementing NextDeviceId and return next deviceId.
This is just code structuring. This will help later once we
convert this function to find a free device Id and it goes
through a bitmap of used/free device Ids.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
When docker starts, build a used/free Device Id map from the per
device meta files we already have. These meta files have the data
which device Ids are in use. Parse these files and mark device as
used in the map.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently devicemapper backend does not keep track of used device Ids in
the pool. It tries a device Id and if that device Id exists in pool, it
tries with a different Id and keeps on doing this in a loop till it succeeds.
This worked fine so far but now we are moving to transaction based
device creation and deletion. We will keep deviceId information in
transaction which will be rolled back if docker crashed before transaction
was complete.
If we store a deviceId in transaction and later figure out it already
existed in pool and docker crashed, then we will rollback and remove
that existing device Id from pool (which we should not have).
That means, we should know free device Id in pool in advance before
we put that device Id in transaction.
Hence this patch creates a bitmap (one bit each for a deviceId), and
sets the bit if device Id is used otherwise resets it. This patch
is just preparing the ground right now. Actual usage will follow
in later patches.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Right now setupBaseImage() uses deleteDevice() to delete uninitialized
base image while rest of the code uses DeleteDevice(). Change it and
use a common function everywhere for the sake of uniformity.
I can't see what harm can be done by doing little extra locking done
by DeleteDevice().
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Very soon we will have the notion of an open transaction and keep its
details in a metafile.
When a new transaction is opened, we allocate a new transaction Id,
do the device creation/deletion and then we will close the transaction.
I thought that OpenTransactionId better represents the semantics of
transaction Id associated with an open transaction instead of NewtransactionId.
This patch just does the renaming. No functionality change.
I have also introduced a structure "Transaction" which will keep all
the details associated with a transaction. Later patches will add more
fields in this structure.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
Currently new transaction Id is created using allocateTransactionId()
function. This function takes NewTransactionId and bumps up by one
to create NewTransactionId.
I think ideally we should be bumping up devices.TransactionId by 1
to come up with NewTransactionId. Because idea is that devices.TransactionId
contains the current pool transaction Id and to come up with a new
transaction Id bump it up by one.
Current code is not wrong as we are keeping NewTransactionId and
TransactionId in sync. But it will be more direct if we look at
devices.TransactionId to come up with NewTransactionId. That way
we don't have to even initialize NewTransactionId during startup
as first time somebody wants to do a transaction, it will be
allocated fresh.
So simplify the code a bit. No functionality change.
Signed-off-by: Vivek Goyal <vgoyal@redhat.com>