gecko-dev/toolkit/crashreporter/breakpad-client
Gabriele Svelto 21790b6674 Bug 1624467 - Fix a race in child process crash generation which could lead to a full browser crash r=KrisWright
In bug 1614933 we discovered a potential deadlock in Windows-specific code
could cause the main process to get stuck waiting for a child process to send
out its annotations after it crashed. As it turns out this flaw was also
present in the Linux and macOS versions but was not visible because on those
platforms we used non-blocking writes when writing out annotations and we
didn't check if they were actually written out (see bug 1666383). Since the
child process would not stop it couldn't possible deadlock. However, the
relevant code is still racy: if the child process' pipe would start rejecting
writes early, the child process could race past the main process leading to
the crash. The sequence of events in the racy case would be the following:

- We hit an exception in the child process, we enter the exception handler and
  signal the main process to write a minidump
- The crash generation thread in the main process is woken up, writes the
  minidump then signals the child process it can continue
- The child process writes out the crash annotations then exits
- At this stage the crash generation thread in the main process should have
  picked up the annotations and stored the crash report in the pid-to-minidump
  table. But let's assume the scheduler doesn't wake up this thread for now.
- The main thread in the main process wakes up in response to the child
  process shutdown, it will try to grab the minidump
- The minidump is not available yet because the crash generation client thread
  hasn't run yet and there's nothing preventing the main thread to race past it:
  the main process crashes trying to access a NULL pointer

To this issue the act of transfering the exception-time annotations is decoupled
from the act of writing the minidump in both the main process and the child
process. This way when the child process is forced to wait for the main process
to act on the minidump before the crash annotations are written out, and by the
time the child process quits the main process has reliably acquired the lock to
the pid-to-minidump table so that the race can't happen anymore.

Note: to implement the second change the child process exception handler
should execute the minidump callback after it has request the creation of a
minidump. For some reason this was implemented in breakpad only for the
in-process crash case, not the out-of-process one. I modified the relevant
code in the exception handler to invoke the callback in both cases.

Differential Revision: https://phabricator.services.mozilla.com/D97964
2020-12-01 17:26:55 +00:00
..
linux Bug 1624467 - Fix a race in child process crash generation which could lead to a full browser crash r=KrisWright 2020-12-01 17:26:55 +00:00
mac Bug 1624467 - Fix a race in child process crash generation which could lead to a full browser crash r=KrisWright 2020-12-01 17:26:55 +00:00
windows
minidump_file_writer-inl.h
minidump_file_writer.cc Bug 1660516 - Crash pings sometimes have null bytes after module names. r=gsvelto 2020-10-28 09:56:53 +00:00
minidump_file_writer.h Bug 1660516 - Crash pings sometimes have null bytes after module names. r=gsvelto 2020-10-28 09:56:53 +00:00
minidump_file_writer_unittest.cc
moz.build Bug 1588710 - Do not fail on stack protector on some asm chromium & breakpad sandboxing code r=mhentges 2020-11-27 10:14:01 +00:00