It's a common scenario for admins and/or monitoring applications to
mount in the daemon root dir into a container. When doing so all mounts
get coppied into the container, often with private references.
This can prevent removal of a container due to the various mounts that
must be configured before a container is started (for example, for
shared /dev/shm, or secrets) being leaked into another namespace,
usually with private references.
This is particularly problematic on older kernels (e.g. RHEL < 7.4)
where a mount may be active in another namespace and attempting to
remove a mountpoint which is active in another namespace fails.
This change moves all container resource mounts into a common directory
so that the directory can be made unbindable.
What this does is prevents sub-mounts of this new directory from leaking
into other namespaces when mounted with `rbind`... which is how all
binds are handled for containers.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This is a fix to the following issue:
$ docker run --tmpfs /dev/shm busybox sh
docker: Error response from daemon: linux mounts: Duplicate mount point '/dev/shm'.
In current code (daemon.createSpec()), tmpfs mount from --tmpfs is added
to list of mounts (`ms`), when the mount from IpcMounts() is added.
While IpcMounts() is checking for existing mounts first, it does that
by using container.HasMountFor() function which only checks container.Mounts
but not container.Tmpfs.
Ultimately, the solution is to get rid of container.Tmpfs (moving its
data to container.Mounts). Current workaround is to add checking
of container.Tmpfs into container.HasMountFor().
A unit test case is included.
Unfortunately we can't call daemon.createSpec() from a unit test,
as the code relies a lot on various daemon structures to be initialized
properly, and it is hard to achieve. Therefore, we minimally mimick
the code flow of daemon.createSpec() -- barely enough to reproduce
the issue.
https://github.com/moby/moby/issues/35455
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The code in question looks up mounts two times: first by using
HasMountFor(), and then directly by looking in container.MountPoints.
There is no need to do it twice.
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
The shutdown timeout for containers in insufficient on Windows. If the daemon is shutting down, and a container takes longer than expected to shut down, this can cause the container to remain in a bad state after restart, and never be able to start again. Increasing the timeout makes this less likely to occur.
Signed-off-by: Darren Stahl <darst@microsoft.com>
This enables docker cp and ADD/COPY docker build support for LCOW.
Originally, the graphdriver.Get() interface returned a local path
to the container root filesystem. This does not work for LCOW, so
the Get() method now returns an interface that LCOW implements to
support copying to and from the container.
Signed-off-by: Akash Gupta <akagup@microsoft.com>
Use strongly typed errors to set HTTP status codes.
Error interfaces are defined in the api/errors package and errors
returned from controllers are checked against these interfaces.
Errors can be wraeped in a pkg/errors.Causer, as long as somewhere in the
line of causes one of the interfaces is implemented. The special error
interfaces take precedence over Causer, meaning if both Causer and one
of the new error interfaces are implemented, the Causer is not
traversed.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
Since the commit d88fe447df ("Add support for sharing /dev/shm/ and
/dev/mqueue between containers") container's /dev/shm is mounted on the
host first, then bind-mounted inside the container. This is done that
way in order to be able to share this container's IPC namespace
(and the /dev/shm mount point) with another container.
Unfortunately, this functionality breaks container checkpoint/restore
(even if IPC is not shared). Since /dev/shm is an external mount, its
contents is not saved by `criu checkpoint`, and so upon restore any
application that tries to access data under /dev/shm is severily
disappointed (which usually results in a fatal crash).
This commit solves the issue by introducing new IPC modes for containers
(in addition to 'host' and 'container:ID'). The new modes are:
- 'shareable': enables sharing this container's IPC with others
(this used to be the implicit default);
- 'private': disables sharing this container's IPC.
In 'private' mode, container's /dev/shm is truly mounted inside the
container, without any bind-mounting from the host, which solves the
issue.
While at it, let's also implement 'none' mode. The motivation, as
eloquently put by Justin Cormack, is:
> I wondered a while back about having a none shm mode, as currently it is
> not possible to have a totally unwriteable container as there is always
> a /dev/shm writeable mount. It is a bit of a niche case (and clearly
> should never be allowed to be daemon default) but it would be trivial to
> add now so maybe we should...
...so here's yet yet another mode:
- 'none': no /dev/shm mount inside the container (though it still
has its own private IPC namespace).
Now, to ultimately solve the abovementioned checkpoint/restore issue, we'd
need to make 'private' the default mode, but unfortunately it breaks the
backward compatibility. So, let's make the default container IPC mode
per-daemon configurable (with the built-in default set to 'shareable'
for now). The default can be changed either via a daemon CLI option
(--default-shm-mode) or a daemon.json configuration file parameter
of the same name.
Note one can only set either 'shareable' or 'private' IPC modes as a
daemon default (i.e. in this context 'host', 'container', or 'none'
do not make much sense).
Some other changes this patch introduces are:
1. A mount for /dev/shm is added to default OCI Linux spec.
2. IpcMode.Valid() is simplified to remove duplicated code that parsed
'container:ID' form. Note the old version used to check that ID does
not contain a semicolon -- this is no longer the case (tests are
modified accordingly). The motivation is we should either do a
proper check for container ID validity, or don't check it at all
(since it is checked in other places anyway). I chose the latter.
3. IpcMode.Container() is modified to not return container ID if the
mode value does not start with "container:", unifying the check to
be the same as in IpcMode.IsContainer().
3. IPC mode unit tests (runconfig/hostconfig_test.go) are modified
to add checks for newly added values.
[v2: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-51345997]
[v3: addressed review at https://github.com/moby/moby/pull/34087#pullrequestreview-53902833]
[v4: addressed the case of upgrading from older daemon, in this case
container.HostConfig.IpcMode is unset and this is valid]
[v5: document old and new IpcMode values in api/swagger.yaml]
[v6: add the 'none' mode, changelog entry to docs/api/version-history.md]
Signed-off-by: Kir Kolyshkin <kolyshkin@gmail.com>
Replicate relevant mutations to the in-memory ACID store. Readers will
then be able to query container state without locking.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
The Solaris version (previously daemon/inspect_solaris.go) was
apparently missing some fields that should be available on that
platform.
Signed-off-by: Fabio Kung <fabio.kung@gmail.com>
Doing a chown/chmod automatically can cause `EPERM` in some cases (e.g.
with an NFS mount). Currently Docker will always call chown+chmod on a
volume path unless `:nocopy` is passed in, but we don't need to make
these calls if the perms and ownership already match and potentially
avoid an uneccessary `EPERM`.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
This fix tries to address the issue raised in 31032 where it was
not possible to specify `--cpus` for `docker update`.
This fix adds `--cpus` support for `docker update`. In case both
`--cpus` and `--cpu-period/--cpu-quota` have been specified,
an error will be returned.
Related docs has been updated.
Integration tests have been added.
This fix fixes 31032.
This fix is related to 27921, 27958.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: John Howard <jhoward@microsoft.com>
Working directory processing was handled differently for Hyper-V and Windows-Server containers, as annotated in the builder documentation (updated in this PR). For Hyper-V containers, the working directory set by WORKDIR was not created. This PR makes Hyper-V containers work the same as Windows Server containers (and the same as Linux).
Example (only applies to Hyper-V containers, so not reproducible under CI environment)
Dockerfile:
FROM microsoft/nanoserver
WORKDIR c:\installer
ENV GOROOT=c:\installer
ADD go.exe .
RUN go --help
Running on Windows Server 2016, using docker master without this change, but with daemon set to --exec-opt isolation=hyperv as it would be for Client operating systems.
PS E:\go\src\github.com\docker\docker> dockerd -g c:\control --exec-opt isolation=hyperv
time="2017-02-01T15:48:09.657286100-08:00" level=info msg="Windows default isolation mode: hyperv"
time="2017-02-01T15:48:09.662720900-08:00" level=info msg="[graphdriver] using prior storage driver: windowsfilter"
time="2017-02-01T15:48:10.011588000-08:00" level=info msg="Graph migration to content-addressability took 0.00 seconds"
time="2017-02-01T15:48:10.016655800-08:00" level=info msg="Loading containers: start."
time="2017-02-01T15:48:10.460820000-08:00" level=info msg="Loading containers: done."
time="2017-02-01T15:48:10.509859600-08:00" level=info msg="Daemon has completed initialization"
time="2017-02-01T15:48:10.509859600-08:00" level=info msg="Docker daemon" commit=3c64061 graphdriver=windowsfilter version=1.14.0-dev
First with no explicit isolation:
PS E:\docker\build\unifyworkdir> docker build --no-cache .
Sending build context to Docker daemon 10.1 MB
Step 1/5 : FROM microsoft/nanoserver
---> 89b8556cb9ca
Step 2/5 : WORKDIR c:\installer
---> 7e0f41d08204
Removing intermediate container 236c7802042a
Step 3/5 : ENV GOROOT c:\installer
---> Running in 8ea5237183c1
---> 394b70435261
Removing intermediate container 8ea5237183c1
Step 4/5 : ADD go.exe .
---> e47401a1745c
Removing intermediate container 88dcc28e74b1
Step 5/5 : RUN go --help
---> Running in efe90e1b6b8b
container efe90e1b6b8b76586abc5c1dc0e2797b75adc26517c48733d90651e767c8463b encountered an error during CreateProcess: failure in a Windows system call: The directory name is invalid. (0x10b) extra info: {"ApplicationName":"","CommandLine":"cmd /S /C go --help","User":"","WorkingDirectory":"C:\\installer","Environment":{"GOROOT":"c:\\installer"},"EmulateConsole":false,"CreateStdInPipe":true,"CreateStdOutPipe":true,"CreateStdErrPipe":true,"ConsoleSize":[0,0]}
PS E:\docker\build\unifyworkdir>
Then forcing process isolation:
PS E:\docker\build\unifyworkdir> docker build --isolation=process --no-cache .
Sending build context to Docker daemon 10.1 MB
Step 1/5 : FROM microsoft/nanoserver
---> 89b8556cb9ca
Step 2/5 : WORKDIR c:\installer
---> 350c955980c8
Removing intermediate container 8339c1e9250c
Step 3/5 : ENV GOROOT c:\installer
---> Running in bde511c5e3e0
---> b8820063b5b6
Removing intermediate container bde511c5e3e0
Step 4/5 : ADD go.exe .
---> e4ac32f8902b
Removing intermediate container d586e8492eda
Step 5/5 : RUN go --help
---> Running in 9e1aa235af5f
Cannot mkdir: C:\installer is not a directory
PS E:\docker\build\unifyworkdir>
Now compare the same results after this PR. Again, first with no explicit isolation (defaulting to Hyper-V containers as that's what the daemon it set to) - note it now succeeds 😄
PS E:\docker\build\unifyworkdir> docker build --no-cache .
Sending build context to Docker daemon 10.1 MB
Step 1/5 : FROM microsoft/nanoserver
---> 89b8556cb9ca
Step 2/5 : WORKDIR c:\installer
---> 4f319f301c69
Removing intermediate container 61b9c0b1ff6f
Step 3/5 : ENV GOROOT c:\installer
---> Running in c464a1d612d8
---> 96a26ab9a7b5
Removing intermediate container c464a1d612d8
Step 4/5 : ADD go.exe .
---> 0290d61faf57
Removing intermediate container dc5a085fffe3
Step 5/5 : RUN go --help
---> Running in 60bd56042ff8
Go is a tool for managing Go source code.
Usage:
go command [arguments]
The commands are:
build compile packages and dependencies
clean remove object files
doc show documentation for package or symbol
env print Go environment information
fix run go tool fix on packages
fmt run gofmt on package sources
generate generate Go files by processing source
get download and install packages and dependencies
install compile and install packages and dependencies
list list packages
run compile and run Go program
test test packages
tool run specified go tool
version print Go version
vet run go tool vet on packages
Use "go help [command]" for more information about a command.
Additional help topics:
c calling between Go and C
buildmode description of build modes
filetype file types
gopath GOPATH environment variable
environment environment variables
importpath import path syntax
packages description of package lists
testflag description of testing flags
testfunc description of testing functions
Use "go help [topic]" for more information about that topic.
The command 'cmd /S /C go --help' returned a non-zero code: 2
And the same with forcing process isolation. Also works 😄
PS E:\docker\build\unifyworkdir> docker build --isolation=process --no-cache .
Sending build context to Docker daemon 10.1 MB
Step 1/5 : FROM microsoft/nanoserver
---> 89b8556cb9ca
Step 2/5 : WORKDIR c:\installer
---> f423b9cc3e78
Removing intermediate container 41330c88893d
Step 3/5 : ENV GOROOT c:\installer
---> Running in 0b99a2d7bf19
---> e051144bf8ec
Removing intermediate container 0b99a2d7bf19
Step 4/5 : ADD go.exe .
---> 7072e32b7c37
Removing intermediate container a7a97aa37fd1
Step 5/5 : RUN go --help
---> Running in 7097438a54e5
Go is a tool for managing Go source code.
Usage:
go command [arguments]
The commands are:
build compile packages and dependencies
clean remove object files
doc show documentation for package or symbol
env print Go environment information
fix run go tool fix on packages
fmt run gofmt on package sources
generate generate Go files by processing source
get download and install packages and dependencies
install compile and install packages and dependencies
list list packages
run compile and run Go program
test test packages
tool run specified go tool
version print Go version
vet run go tool vet on packages
Use "go help [command]" for more information about a command.
Additional help topics:
c calling between Go and C
buildmode description of build modes
filetype file types
gopath GOPATH environment variable
environment environment variables
importpath import path syntax
packages description of package lists
testflag description of testing flags
testfunc description of testing functions
Use "go help [topic]" for more information about that topic.
The command 'cmd /S /C go --help' returned a non-zero code: 2
PS E:\docker\build\unifyworkdir>
This fix fixes issue raised in 29492 where it was not
possible to specify a default `--default-shm-size` in daemon
configuration for each `docker run``.
The flag `--default-shm-size` which is reloadable, has been
added to the daemon configuation.
Related docs has been updated.
This fix fixes 29492.
Signed-off-by: Yong Tang <yong.tang.github@outlook.com>
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
use secret store interface instead of embedded secret data into container
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
Fixes#22564
When an error occurs on mount, there should not be any call later to
unmount. This can throw off refcounting in the underlying driver
unexpectedly.
Consider these two cases:
```
$ docker run -v foo:/bar busybox true
```
```
$ docker run -v foo:/bar -w /foo busybox true
```
In the first case, if mounting `foo` fails, the volume driver will not
get a call to unmount (this is the incorrect behavior).
In the second case, the volume driver will not get a call to unmount
(correct behavior).
This occurs because in the first case, `/bar` does not exist in the
container, and as such there is no call to `volume.Mount()` during the
`create` phase. It will error out during the `start` phase.
In the second case `/bar` is created before dealing with the volume
because of the `-w`. Because of this, when the volume is being setup
docker will try to copy the image path contents in the volume, in which
case it will attempt to mount the volume and fail. This happens during
the `create` phase. This makes it so the container will not be created
(or at least fully created) and the user gets the error on `create`
instead of `start`. The error handling is different in these two phases.
Changed to only send `unmount` if the volume is mounted.
While investigating the cause of the reported issue I found some odd
behavior in unmount calls so I've cleaned those up a bit here as well.
Signed-off-by: Brian Goff <cpuguy83@gmail.com>
- use /secrets for swarm secret create route
- do not specify omitempty for secret and secret reference
- simplify lookup for secret ids
- do not use pointer for secret grpc conversion
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
- fix lint issues
- use errors pkg for wrapping errors
- cleanup on error when setting up secrets mount
- fix erroneous import
- remove unneeded switch for secret reference mode
- return single mount for secrets instead of slice
Signed-off-by: Evan Hazlett <ejhazlett@gmail.com>
"--restart" and "--rm" are conflict options, if a container is started
with AutoRemove flag, we should forbid the update action for its Restart
Policy.
Signed-off-by: Zhang Wei <zhangwei555@huawei.com>
The memory should always be smaller than memoryswap,
we should error out with message that user know how
to do rather than just an invalid argument error if
user update the memory limit bigger than already set
memory swap.
Signed-off-by: Lei Jitang <leijitang@huawei.com>