I don't think this will fully resolve the shutdown hangs associated
with this bug. The underlying problem is that writing the startup
cache during shutdown can take a long time. This seems like A Bad Thing,
and I have a mild leaning toward the thought that we shouldn't write the
startup cache at all during shutdown. However, I do think that trading
shutdown time regressions for startup time wins is probably good, so
I don't feel super comfortable making that change.
However, this change at least fixes the problem where we A) block the main
thread waiting for the write to finish, and B) are more likely to hit a
shutdown hang, since we concentrate all of the hang time in one phase,
rather than it being spread across the shutdown timeline. This is likely
the major change that the regressing bug introduced.
The unfortunate consequence is that if we try to GetBuffer during shutdown,
we will now read out of the omnijar. However, it should overall be better
since we'll be waiting on a (hopefully) small number of small reads rather
than a large write.
Differential Revision: https://phabricator.services.mozilla.com/D86054
This is similar to the first patch in the stack, but instead of freeing the
entries on write, we free them when we send them to the parent to be written.
Differential Revision: https://phabricator.services.mozilla.com/D83401
Prior to this patch, we were sending a boolean from InitContentChild (which
creates our StartupCache IPC actors) indicating whether we wanted to collect
new entries from a given process or not. This was so that we wouldn't accept
PutBuffer requests in these processes, since collecting them in one process
would be enough, and we don't want to waste memory. However, we actually
want the cache to be available before we can even get that IPC constructor
to the child process, so there's a window where we accept new entries
no matter what. This patch changes this by sending a boolean argument via
the command line indicating that we want to disable the Startupcache in this
process entirely. We send this when we didn't load a StartupCache off disk,
as this should be the only circumstance in which we're actually collecting
a substantial number of entries in content processes.
Differential Revision: https://phabricator.services.mozilla.com/D83400
On first run, when the cache is empty, we collect a lot of freshly allocated
buffers for StartupCache entries. These entries are generally not requested
again after being put in the StartupCache, so they are generally safe to free
(as ownership of them was given to us by contract in PutBuffer.) This tends
to free up >10MB.
Differential Revision: https://phabricator.services.mozilla.com/D83399
Keys can take up a nontrivial chunk of memory. This change ensures that the
memory cost is not per-process. In order to do this we had to switch from
nsCStrings to MaybeOwnedCharPtrs pointing to null-terminated char sequences.
This meant fiddling around with the code that loads / stores to disk and
making it a little bit uglier, rather than adding another method to the
IOBuffers classes to support our weird maybe-shared string type.
Differential Revision: https://phabricator.services.mozilla.com/D83398
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
The overall goal of this patch is to make the StartupCache accessible anywhere.
There's two main pieces to that equation:
1. Allowing it to be accessed off main thread, which means modifying the
mutex usage to ensure that all data accessed from non-main threads is
protected.
2. Allowing it to be accessed out of the chrome process, which means passing
a handle to a shared cache buffer down to child processes.
Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll
hope that the comments and the code are sufficient to explain what's going on
there.
Number 2 has some decisions to be made:
- The first decision was to pass a handle to a frozen chunk of memory down to
all child processes, rather than passing a handle to an actual file. There's
two reasons for this: 1) since we want to compress the underlying file on
disk, giving that file to child processes would mean they have to decompress
it themselves, eating CPU time. 2) since they would have to decompress it
themselves, they would have to allocate the memory for the decompressed
buffers, meaning they cannot all simply share one big decompressed buffer.
- The drawback of this decision is that we have to load and decompress the
buffer up front, before we spawn any child processes. We attempt to
mitigate this by keeping track of all the entries that child processes
access, and only including those in the frozen decompressed shared buffer.
- We base our implementation of this approach off of the shared preferences
implementation. Hopefully I got all of the pieces to fit together
correctly. They seem to work in local testing and on try, but I think
they require a set of experienced eyes looking carefully at them.
- Another decision was whether to send the handles to the buffers over IPC or
via command line. We went with the command line approach, because the startup
cache would need to be accessed very early on in order to ensure we do not
read from any omnijars, and we could not make that work via IPC.
- Unfortunately this means adding another hard-coded FD, similar to
kPrefMapFileDescriptor. It seems like at the very least we need to rope all
of these together into one place, but I think that should be filed as a
follow-up?
Lastly, because this patch is a bit of a monster to review - first, thank you
for looking at it, and second, the reason we're invested in this is because we
saw a >10% improvement in cold startup times on reference hardware, with a p
value less than 0.01. It's still not abundantly clear how reference hardware
numbers translate to numbers on release, and they certainly don't translate
well to Nightly numbers, but it's enough to convince me that it's worth some
effort.
Depends on D78584
Differential Revision: https://phabricator.services.mozilla.com/D77635
We need to be able to init StartupCache before the Omnijar in order to cache
all of the Omnijar contents we access. This patch implements that.
Depends on D77632
Differential Revision: https://phabricator.services.mozilla.com/D77633
We would like to be able to defer opening the omnijar files until after startup
if the StartupCache has already been populated. Opening the omnijar files takes
a nontrivial time, at least on Windows, and almost everything in the omnijar
should be fairly compressible, and thus makes sense to live in the StartupCache.
See the last patch in this series for a little more discussion on numbers, but
tl;dr: we saw a 12% improvement in time to about:home being finished on reference
hardware with these changes together with the changes from the descendant patches.
Differential Revision: https://phabricator.services.mozilla.com/D77632
Prior to this patch, the startupcache created its own mWriteThread off which it
wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called
at shutdown, to do the shutdown write if there was any reason to do so, and
from a timer that is re-initialized after every addition to the startup cache,
to run 60s after the last change to the cache.
It then joined that write thread on the main thread (in other words, blocks
on that off-main-thread write completing from the main thread) when:
- xpcom-shutdown fired
- the startupcache itself gets destroyed
- someone calls any of:
* HasEntry
* GetBuffer
* PutBuffer
* InvalidateCache
This patch removes the separate write thread, and instead dispatches a task to
the background task queue, indicating it can block. The task is started in
the same circumstances where we previously used to write (timer from the last
PutBuffer call, and shutdown if necessary).
To ensure it cannot be trying to use the data it writes out (mTable) from
the other thread while that data changes on the main thread, we use a mutex.
The task locks the mutex before starting, and unlocks when finished.
Enumerating the cases that we used to block on joining the thread:
In terms of application shutdown, we expect the background task queue to
either finish the write task, or fail to run it if it hasn't started it yet.
In the FastStartup case, we check if a write was necessary; if so, we
attempt to gain the lock without waiting. If we're successful, the write has
not yet started, and we instead run the write on the main thread. Otherwise,
we retry gaining the lock, blocking this time, thus guaranteeing the
off-the-main-thread write completes.
The task keeps a reference to the startupcache object, so it cannot be
destroyed while the task is pending.
Because the write does not modify `mTable`, and neither does `HasEntry`,
we do not need to do anything there.
In the `GetBuffer` case, we do not modify the table unless we have to read
the entry off disk (memmapped into `mCacheData`). This can only happen if
`mCacheData.initialized()` returns true, and we specifically call
`mCacheData.reset()` before firing off the write task to avoid this.
`mCacheData` is only re-initialized if someone calls `LoadArchive()`,
which can only happen from `Init()` (which is guaranteed not to run
again because this is a singleton), or `InvalidateCache()`, where we lock
the mutex (see below). So this is safe - but we assert on the lock to try
and avoid people breaking this chain of assumptions in the future.
When `PutBuffer` is called, we try to lock the mutex - but if locking fails
(ie the background thread is writing), we simply fail to store the entry
in the startupcache. In practice, this should be rare - it'd happen if
new calls to PutBuffer happen while writing during shutdown (when really,
we don't care) or when it's been 60 seconds since the last PutBuffer so
we started writing the startupcache.
When InvalidateCache is called, we lock the mutex - we shouldn't try to
write while invalidating, or invalidate while writing. This may be slow,
but in practice nothing should call `InvalidateCache` except developer
restarts or the `-purgecaches` commandline flag, so it shouldn't
matter a great deal.
Differential Revision: https://phabricator.services.mozilla.com/D70413
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Adds a new startup cache info service to expose some of the internal
state of the startup cache.
Differential Revision: https://phabricator.services.mozilla.com/D69298
--HG--
extra : moz-landing-system : lando
Please double check that I am using this correctly. I believe we are
seeing the crash in the linked bug because we are not handling hardware
faults when reading from the memory mapped file. This patch just wraps
all accesses in the MMAP_FAULT_HANDLER_ macros.
Depends on D53042
Differential Revision: https://phabricator.services.mozilla.com/D53043
--HG--
rename : modules/libjar/MmapFaultHandler.cpp => mozglue/misc/MmapFaultHandler.cpp
rename : modules/libjar/MmapFaultHandler.h => mozglue/misc/MmapFaultHandler.h
extra : moz-landing-system : lando
This is not, AFAICT, causing crashes - it is just an oversight noticed while
investigating the crash in the bug.
Differential Revision: https://phabricator.services.mozilla.com/D53042
--HG--
extra : moz-landing-system : lando