We can now keep the part of the policy implemented by upcalls to
userspace in the same place as the part of the policy that's handled
entirely in the kernel. This will become more useful in the future
(e.g., bug 930258).
This is more complicated than I'd like it to be, because we don't have
a good way to combine a specific trap function's knowledge that we want
to get a crash dump with the SIGSYS handler's copy of the unprocessed
signal info (which breakpad wants). The bpf_dsl interface requires a
specific trap function type (via the TrapRegistry superclass), so even
if we implement our own registry we can't change what's passed to it.
Normally we could use thread-local storage to get around that, but it's
not async signal safe.
As a result there is an imperfect compromise: the trap function returns
a failure with ENOSYS, Chromium's SIGSYS handler writes it into the
context, our SIGSYS handler reads it back out and uses a copy of
the original signal context for the crash dump. Other error codes
(and returning ENOSYS via the seccomp-bpf policy itself) are handled
normally.
This completely rewrites SandboxFilter.cpp and removes SandboxAssembler.
System calls are now loosely grouped by what they do, now that order
doesn't matter, and most of the intersection the content and media
plugin whitelists is moved into a common superclass. Hopefully this
improves the readability and comprehensibility of the syscall policies.
Also, the macros that take the syscall name are gone, because a plain
case label usually suffices now (the CASES_FOR_thing macros are a little
unsightly, but they're relatively simple), and at one point we saw
strange macro expansion issues with system header files that #define'd
some syscall names.
The signal handling is not migrated yet, so Trap() actions can't be used
yet; the next patch will take care of that, and to keep the intermediate
state working there's a minimal shim.
Bonus fix: non-const global variables use the "g" prefix; "s" is for
static class members and static variables in a function (where the
default is to allocate a separate copy per instance/activation).
This needs more unit tests for the various pieces of what's going on
here (LinuxCapabilities, SandboxChroot, UnshareUserNamespace()) but
that's nontrivial due to needing a single-threaded process -- and
currently they can't be run on Mozilla's CI anyway due to needing user
namespaces, and local testing can just try using GMP and manually
inspecting the child process. So that will be a followup.
Using the equivalent of release assertions in the patch after this one
is easier to justify if I can't come up with vaguely legitimate reasons
why they might fail; this detects the ones I thought of.
This means that B2G plugin-container must (dynamically) link against
libmozsandbox in order to call into it before initializing Binder.
(Desktop Linux plugin-container already contains the sandbox code.)
See bug, and comment at top of SandboxInfo.cpp, for rationale.
Bonus fix: reword comment about nested namespace limit; the exact limit
is 33 (not counting the root) but doesn't particularly matter.
Allow read to whole filesystem until chrome:// and file:// URLs are brokered through another process.
Except $HOME/Library in which we allow only access to profile add-ons subdir.
Add level 2, which allows read only from $HOME and /tmp (while still restricting $HOME/Library.
Change default back to 1.
Currently, only user namespace support is detected. This is targeted at
desktop, where (1) user namespace creation is effectively a prerequisite
for unsharing any other namespace, and (2) any kernel with user
namespace support almost certainly has all the others.
Bonus fix: remove extra copy of sandbox flag key names in about:support;
if JS property iteration order ever ceases to follow creation order, the
table rows could be permuted, but this doesn't really matter.