gecko-dev/startupcache/StartupCache.h

/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */
#ifndef StartupCache_h_
#define StartupCache_h_
#include <climits>  // INT_MAX
#include <utility>
#include "nsClassHashtable.h"
#include "nsComponentManagerUtils.h"
#include "nsTArray.h"
#include "nsTStringHasher.h" // mozilla::DefaultHasher<nsCString>
#include "nsZipArchive.h"
#include "nsITimer.h"
#include "nsIMemoryReporter.h"
#include "nsIObserverService.h"
#include "nsIObserver.h"
#include "nsIObjectOutputStream.h"
#include "nsIFile.h"
#include "mozilla/Attributes.h"
// Bug 1550108 - Change StartupCache format from zip to custom r=froydnj
//
// I am not aware of anything that depends on StartupCache being a zip file,
// and since I want to use lz4 compression because inflate is showing up quite
// a lot in profiles, it's simplest to just use a custom format. This loosely
// mimics the ScriptPreloader code, with a few diversions:
//
// - Obviously the contents of the cache are compressed. I used lz4 for this
//   as I hit the same file size as deflate at a compression level of 1, which
//   is what the StartupCache was using previously, while decompressing an
//   order of magnitude faster. Seemed like the most conservative change to
//   make. I think it's worth investigating what the impact of slower algs
//   with higher ratios would be, but for right now I settled on this. We'd
//   probably want to look at zstd next.
// - I use streaming compression for this via lz4frame. This is not strictly
//   necessary, but has the benefit of not requiring as much memory for large
//   buffers, as well as giving us a built-in checksum, rather than relying on
//   the much slower CRC that we were doing with the zip-based approach.
// - I coded the serialization of the headers inline, since I had to jump back
//   to add the offset and compressed size, which would make the nice
//   Code(...) method for the ScriptPreloader stuff rather more complex. Open
//   to cleaner solutions, but moving it out just felt like extra hoops for
//   the reader to jump through to understand without the benefit of being
//   more concise.
//
// Differential Revision: https://phabricator.services.mozilla.com/D34652
// 2019-10-04 23:44:59 +03:00
#include "mozilla/AutoMemMap.h"
#include "mozilla/Compression.h"
// Bug 1627075 - OMT and OMP StartupCache access r=froydnj
//
// The overall goal of this patch is to make the StartupCache accessible
// anywhere. There's two main pieces to that equation:
//
// 1. Allowing it to be accessed off main thread, which means modifying the
//    mutex usage to ensure that all data accessed from non-main threads is
//    protected.
// 2. Allowing it to be accessed out of the chrome process, which means
//    passing a handle to a shared cache buffer down to child processes.
//
// Number 1 is somewhat fiddly, but it's all generally straightforward work.
// I'll hope that the comments and the code are sufficient to explain what's
// going on there.
//
// Number 2 has some decisions to be made:
//
// - The first decision was to pass a handle to a frozen chunk of memory down
//   to all child processes, rather than passing a handle to an actual file.
//   There's two reasons for this: 1) since we want to compress the underlying
//   file on disk, giving that file to child processes would mean they have to
//   decompress it themselves, eating CPU time. 2) since they would have to
//   decompress it themselves, they would have to allocate the memory for the
//   decompressed buffers, meaning they cannot all simply share one big
//   decompressed buffer.
// - The drawback of this decision is that we have to load and decompress the
//   buffer up front, before we spawn any child processes. We attempt to
//   mitigate this by keeping track of all the entries that child processes
//   access, and only including those in the frozen decompressed shared
//   buffer.
// - We base our implementation of this approach off of the shared preferences
//   implementation. Hopefully I got all of the pieces to fit together
//   correctly. They seem to work in local testing and on try, but I think
//   they require a set of experienced eyes looking carefully at them.
// - Another decision was whether to send the handles to the buffers over IPC
//   or via command line. We went with the command line approach, because the
//   startup cache would need to be accessed very early on in order to ensure
//   we do not read from any omnijars, and we could not make that work via
//   IPC.
// - Unfortunately this means adding another hard-coded FD, similar to
//   kPrefMapFileDescriptor. It seems like at the very least we need to rope
//   all of these together into one place, but I think that should be filed as
//   a follow-up?
//
// Lastly, because this patch is a bit of a monster to review - first, thank
// you for looking at it, and second, the reason we're invested in this is
// because we saw a >10% improvement in cold startup times on reference
// hardware, with a p value less than 0.01. It's still not abundantly clear
// how reference hardware numbers translate to numbers on release, and they
// certainly don't translate well to Nightly numbers, but it's enough to
// convince me that it's worth some effort.
//
// Depends on D78584
// Differential Revision: https://phabricator.services.mozilla.com/D77635
// 2020-07-08 05:46:11 +03:00
#include "mozilla/EnumSet.h"
#include "mozilla/MemoryReporting.h"
// Bug 1614795 - use the background task queue for startupcache writes,
// r=dthayer,decoder
//
// Prior to this patch, the startupcache created its own mWriteThread off
// which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which
// got called at shutdown, to do the shutdown write if there was any reason to
// do so, and from a timer that is re-initialized after every addition to the
// startup cache, to run 60s after the last change to the cache. It then
// joined that write thread on the main thread (in other words, blocks on that
// off-main-thread write completing from the main thread) when:
//
// - xpcom-shutdown fired
// - the startupcache itself gets destroyed
// - someone calls any of:
//   * HasEntry
//   * GetBuffer
//   * PutBuffer
//   * InvalidateCache
//
// This patch removes the separate write thread, and instead dispatches a task
// to the background task queue, indicating it can block. The task is started
// in the same circumstances where we previously used to write (timer from the
// last PutBuffer call, and shutdown if necessary). To ensure it cannot be
// trying to use the data it writes out (mTable) from the other thread while
// that data changes on the main thread, we use a mutex. The task locks the
// mutex before starting, and unlocks when finished.
//
// Enumerating the cases that we used to block on joining the thread:
//
// In terms of application shutdown, we expect the background task queue to
// either finish the write task, or fail to run it if it hasn't started it
// yet. In the FastStartup case, we check if a write was necessary; if so, we
// attempt to gain the lock without waiting. If we're successful, the write
// has not yet started, and we instead run the write on the main thread.
// Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing
// the off-the-main-thread write completes.
//
// The task keeps a reference to the startupcache object, so it cannot be
// destroyed while the task is pending.
//
// Because the write does not modify `mTable`, and neither does `HasEntry`, we
// do not need to do anything there.
//
// In the `GetBuffer` case, we do not modify the table unless we have to read
// the entry off disk (memmapped into `mCacheData`). This can only happen if
// `mCacheData.initialized()` returns true, and we specifically call
// `mCacheData.reset()` before firing off the write task to avoid this.
// `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which
// can only happen from `Init()` (which is guaranteed not to run again because
// this is a singleton), or `InvalidateCache()`, where we lock the mutex (see
// below). So this is safe - but we assert on the lock to try and avoid people
// breaking this chain of assumptions in the future.
//
// When `PutBuffer` is called, we try to lock the mutex - but if locking fails
// (ie the background thread is writing), we simply fail to store the entry in
// the startupcache. In practice, this should be rare - it'd happen if new
// calls to PutBuffer happen while writing during shutdown (when really, we
// don't care) or when it's been 60 seconds since the last PutBuffer so we
// started writing the startupcache.
//
// When InvalidateCache is called, we lock the mutex - we shouldn't try to
// write while invalidating, or invalidate while writing. This may be slow,
// but in practice nothing should call `InvalidateCache` except developer
// restarts or the `-purgecaches` commandline flag, so it shouldn't matter a
// great deal.
//
// Differential Revision: https://phabricator.services.mozilla.com/D70413
// 2020-04-15 23:43:44 +03:00
#include "mozilla/Mutex.h"
#include "mozilla/Omnijar.h"
#include "mozilla/Result.h"
#include "mozilla/UniquePtr.h"
#include "mozilla/UniquePtrExtensions.h"
/**
* The StartupCache is a persistent cache of simple key-value pairs,
* where the keys are null-terminated c-strings and the values are
* arbitrary data, passed as a (char*, size) tuple.
*
* Clients should use the GetSingleton() static method to access the cache. It
* will be available from the end of XPCOM init (NS_InitXPCOM3 in
* XPCOMInit.cpp), until XPCOM shutdown begins. The GetSingleton() method will
* return null if the cache is unavailable. The cache is only provided for
* libxul builds -- it will fail to link in non-libxul builds. The XPCOM
* interface is provided only to allow compiled-code tests; clients should avoid
* using it.
*
* The API provided is very simple: GetBuffer() returns a buffer that was
* previously stored in the cache (if any), and PutBuffer() inserts a buffer
* into the cache. GetBuffer returns a new buffer, and the caller must take
* ownership of it. PutBuffer will assert if the client attempts to insert a
* buffer with the same name as an existing entry. The cache makes a copy of the
* passed-in buffer, so the client retains ownership.
*
* InvalidateCache() may be called if a client suspects data corruption
* or wishes to invalidate for any other reason. This will remove all existing
* cache data. Additionally, the static method IgnoreDiskCache() can be called
* if it is believed that the on-disk cache file is itself corrupt. This call
* implicitly calls InvalidateCache (if the singleton has been initialized) to
* ensure any data already read from disk is discarded. The cache will not load
* data from the disk file until a successful write occurs.
*
* Finally, getDebugObjectOutputStream() allows debug code to wrap an
* objectstream with a debug objectstream, to check for multiply-referenced
* objects. These will generally fail to deserialize correctly, unless they are
* stateless singletons or the client maintains their own object data map for
* deserialization.
*
* Writes before the final-ui-startup notification are placed in an intermediate
* cache in memory, then written out to disk at a later time, to get writes off
* the startup path. In any case, clients should not rely on being able to
* GetBuffer() data that is written to the cache, since it may not have been
* written to disk or another client may have invalidated the cache. In other
* words, it should be used as a cache only, and not a reliable persistent
* store.
*
* Some utility functions are provided in StartupCacheUtils. These functions
* wrap the buffers into object streams, which may be useful for serializing
* objects. Note the above caution about multiply-referenced objects, though --
* the streams are just as 'dumb' as the underlying buffers about
* multiply-referenced objects. They just provide some convenience in writing
* out data.
*/
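// The contract described in the comment above can be illustrated with a
// small stand-alone sketch. ToyCache and its method signatures are
// hypothetical and exist only for this example; they are not the real
// StartupCache API, which is declared elsewhere in this header. The sketch
// only models three documented behaviors: PutBuffer copies its input,
// GetBuffer hands back a new buffer owned by the caller, and
// InvalidateCache drops everything.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for the documented cache contract.
class ToyCache {
 public:
  // Copies aSize bytes from aData, so the caller keeps ownership of aData.
  // Returns false if the key already exists (the comment above says the
  // real cache asserts in that situation).
  bool PutBuffer(const char* aKey, const char* aData, size_t aSize) {
    return mTable.try_emplace(aKey, aData, aData + aSize).second;
  }

  // Returns a freshly allocated copy that the caller owns, or nullptr if
  // there is no entry for aKey.
  std::unique_ptr<char[]> GetBuffer(const char* aKey, size_t* aSize) const {
    auto it = mTable.find(aKey);
    if (it == mTable.end()) {
      return nullptr;
    }
    *aSize = it->second.size();
    auto out = std::make_unique<char[]>(*aSize);
    std::copy(it->second.begin(), it->second.end(), out.get());
    return out;
  }

  // Removes all entries, as InvalidateCache() is described to do.
  void InvalidateCache() { mTable.clear(); }

 private:
  std::map<std::string, std::vector<char>> mTable;
};
```

// Because a real cache may be invalidated or not yet written to disk, a
// caller of such an API should always be prepared for GetBuffer to miss.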
namespace mozilla {
namespace dom {
class ContentParent;
}
namespace ipc {
class GeckoChildProcessHost;
} // namespace ipc
namespace scache {
class StartupCacheChild;
#ifdef XP_UNIX
// Please see bug 1440207 about improving the problem of random fixed FDs,
// which the addition of the below constant exacerbates.
static const int kStartupCacheFd = 11;
#endif
// We use INT_MAX here just to simplify the sorting - we want to push
// unrequested entries to the back, and have requested entries in the order
// they came in.
static const int kStartupCacheEntryNotRequested = INT_MAX;
static const int kStartupcacheEntryNotInSharedData = -1;
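// A minimal sketch of why INT_MAX simplifies the sorting mentioned above:
// with the sentinel as the "not requested" value, a plain ascending sort on
// the request order puts requested entries first, in the order they came
// in, and pushes unrequested ones to the back. ToyEntry, kNotRequested and
// SortByRequestOrder are hypothetical names for this illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <climits>
#include <string>
#include <vector>

// Mirrors the sentinel above under a hypothetical name.
constexpr int kNotRequested = INT_MAX;

struct ToyEntry {
  std::string key;
  int requestedOrder;
};

// Ascending sort on requestedOrder. std::stable_sort keeps unrequested
// entries (all equal to kNotRequested) in their original relative order.
inline void SortByRequestOrder(std::vector<ToyEntry>& aEntries) {
  std::stable_sort(aEntries.begin(), aEntries.end(),
                   [](const ToyEntry& a, const ToyEntry& b) {
                     return a.requestedOrder < b.requestedOrder;
                   });
}
```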
// StartupCache entries can be backed by a buffer which they allocate as
// soon as they are requested, into which they decompress the contents out
// of the memory mapped file, *or* they can be backed by a contiguous buffer
// which we allocate up front and decompress into, in order to share it with
// child processes. This class is a helper class to hold a buffer which the
// entry itself may or may not own.
//
// Side note: it may be appropriate for StartupCache entries to never own
// their underlying buffers. We explicitly work to ensure that anything the
// StartupCache returns to a caller survives for the lifetime of the
// application, so it may be preferable to have a set of large contiguous
// buffers which we allocate on demand, and fill up with cache entry contents,
// but at that point we're basically implementing our own hacky pseudo-malloc,
// for relatively uncertain performance gains. For the time being, we just
// keep the existing model unchanged.
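class MaybeOwnedCharPtr {
 public:
  char* mPtr;
  bool mOwned;  // true if the destructor should delete[] mPtr

  ~MaybeOwnedCharPtr() {
    if (mOwned) {
      delete[] mPtr;
    }
  }

  // Copy operations are declared here but not defined in this header.
  MaybeOwnedCharPtr(const MaybeOwnedCharPtr& other);
  MaybeOwnedCharPtr& operator=(const MaybeOwnedCharPtr& other);

  // Moving transfers the pointer and its ownership flag; the source is left
  // null and non-owning.
  MaybeOwnedCharPtr(MaybeOwnedCharPtr&& other)
      : mPtr(std::exchange(other.mPtr, nullptr)),
        mOwned(std::exchange(other.mOwned, false)) {}

  // Move assignment swaps, so the previous buffer (if owned) is released
  // when `other` is destroyed.
  MaybeOwnedCharPtr& operator=(MaybeOwnedCharPtr&& other) {
    std::swap(mPtr, other.mPtr);
    std::swap(mOwned, other.mOwned);
    return *this;
  }

  // Note: this clears the pointer without freeing it, even if it was owned.
  MaybeOwnedCharPtr& operator=(decltype(nullptr)) {
    mPtr = nullptr;
    mOwned = false;
    return *this;
  }

  explicit operator bool() const { return !!mPtr; }
  char* get() { return mPtr; }

  // Borrows aBytes; the buffer is not freed by this object.
  explicit MaybeOwnedCharPtr(char* aBytes) : mPtr(aBytes), mOwned(false) {}
  // Takes ownership of an existing heap buffer.
  explicit MaybeOwnedCharPtr(UniquePtr<char[]>&& aBytes)
      : mPtr(aBytes.release()), mOwned(true) {}
  // Allocates and owns a new buffer of `size` bytes.
  explicit MaybeOwnedCharPtr(size_t size)
      : mPtr(new char[size]), mOwned(true) {}
};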
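// The two backing modes described above (a buffer the entry allocates
// itself versus a slice of a larger shared buffer) can be sketched with a
// simplified stand-alone analogue. ToyMaybeOwned is a hypothetical class
// written for this example; it copies the owned/borrowed idea but is not
// the class above, and it deletes its copy operations, which is a choice
// made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

// Simplified analogue: a char pointer that may or may not own its buffer.
class ToyMaybeOwned {
 public:
  // Borrowed: points into memory owned by someone else.
  explicit ToyMaybeOwned(char* aBorrowed) : mPtr(aBorrowed), mOwned(false) {}
  // Owned: allocates a private buffer of aSize bytes.
  explicit ToyMaybeOwned(size_t aSize)
      : mPtr(new char[aSize]), mOwned(true) {}

  // Moving hands over pointer and ownership; the source becomes empty.
  ToyMaybeOwned(ToyMaybeOwned&& aOther) noexcept
      : mPtr(std::exchange(aOther.mPtr, nullptr)),
        mOwned(std::exchange(aOther.mOwned, false)) {}

  ToyMaybeOwned(const ToyMaybeOwned&) = delete;
  ToyMaybeOwned& operator=(const ToyMaybeOwned&) = delete;

  // Only an owning instance frees its buffer.
  ~ToyMaybeOwned() {
    if (mOwned) {
      delete[] mPtr;
    }
  }

  char* get() const { return mPtr; }
  bool owned() const { return mOwned; }

 private:
  char* mPtr;
  bool mOwned;
};
```

// A borrowed instance would typically wrap an offset into the shared
// buffer, while an owned instance holds a per-entry allocation. Passing a
// literal 0 to the constructor is ambiguous between the two overloads, so
// the size should be given as a size_t.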
enum class StartupCacheEntryFlags {
  Shared,
  RequestedByChild,
  AddedThisSession,
};
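// A sketch of how a scoped enum like the one above can be combined into a
// per-entry set of flags. This header includes mozilla/EnumSet.h, so the
// real code presumably uses that type, but that is an assumption; the
// sketch below uses std::bitset instead, and ToyEntryFlags and ToyFlagSet
// are hypothetical names (including the extra Count enumerator).

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

enum class ToyEntryFlags { Shared, RequestedByChild, AddedThisSession, Count };

// Minimal flag set keyed by the enum's underlying value.
class ToyFlagSet {
 public:
  ToyFlagSet& operator+=(ToyEntryFlags aFlag) {
    mBits.set(Index(aFlag));
    return *this;
  }
  ToyFlagSet& operator-=(ToyEntryFlags aFlag) {
    mBits.reset(Index(aFlag));
    return *this;
  }
  bool contains(ToyEntryFlags aFlag) const { return mBits.test(Index(aFlag)); }

 private:
  static size_t Index(ToyEntryFlags aFlag) {
    return static_cast<size_t>(aFlag);
  }
  std::bitset<static_cast<size_t>(ToyEntryFlags::Count)> mBits;
};
```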
struct StartupCacheEntry {
MaybeOwnedCharPtr mData;
uint32_t mOffset;
uint32_t mCompressedSize;
uint32_t mUncompressedSize;
int32_t mSharedDataOffset;
int32_t mHeaderOffsetInFile;
int32_t mRequestedOrder;
EnumSet<StartupCacheEntryFlags> mFlags;
// Constructs an entry from an offset and sizes; no buffer is held yet
// (mData starts out null).
MOZ_IMPLICIT StartupCacheEntry(uint32_t aOffset, uint32_t aCompressedSize,
uint32_t aUncompressedSize,
EnumSet<StartupCacheEntryFlags> aFlags)
: mData(nullptr),
mOffset(aOffset),
mCompressedSize(aCompressedSize),
mUncompressedSize(aUncompressedSize),
mSharedDataOffset(kStartupcacheEntryNotInSharedData),
mHeaderOffsetInFile(0),
mRequestedOrder(kStartupCacheEntryNotRequested),
mFlags(aFlags) {}
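// Constructs an entry that takes ownership of an in-memory buffer of
// aLength bytes; the offset and compressed size start at 0.
StartupCacheEntry(UniquePtr<char[]> aData, size_t aLength,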
int32_t aRequestedOrder,
EnumSet<StartupCacheEntryFlags> aFlags)
: mData(std::move(aData)),
mOffset(0),
mCompressedSize(0),
mUncompressedSize(aLength),
mSharedDataOffset(kStartupcacheEntryNotInSharedData),
mHeaderOffsetInFile(0),
mRequestedOrder(aRequestedOrder),
mFlags(aFlags) {}
struct Comparator {
using Value = std::pair<const nsCString*, StartupCacheEntry*>;
bool Equals(const Value& a, const Value& b) const {
// This is a bit ugly. Here and below, just note that we want entries
// with the RequestedByChild flag to be sorted before any other entries,
// because we're going to want to decompress them and send them down to
// child processes pretty early during startup.
return a.second->mFlags.contains(
StartupCacheEntryFlags::RequestedByChild) ==
b.second->mFlags.contains(
StartupCacheEntryFlags::RequestedByChild) &&
a.second->mRequestedOrder == b.second->mRequestedOrder;
}
bool LessThan(const Value& a, const Value& b) const {
bool requestedByChildA =
a.second->mFlags.contains(StartupCacheEntryFlags::RequestedByChild);
bool requestedByChildB =
b.second->mFlags.contains(StartupCacheEntryFlags::RequestedByChild);
if (requestedByChildA == requestedByChildB) {
return a.second->mRequestedOrder < b.second->mRequestedOrder;
} else {
return requestedByChildA;
}
}
};
};
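The ordering that `Comparator` defines can be tried in isolation. The following is a minimal sketch, not the Gecko implementation: `SketchEntry`, `SketchLessThan`, and `SketchSortedIds` are hypothetical stand-ins, and `std::sort` takes the place of whatever sort the real code applies to its `Value` pairs. It only shows that entries flagged as requested by a child sort first, and that ties are broken by request order.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for StartupCacheEntry: only the fields the
// comparator looks at are modelled here.
struct SketchEntry {
  std::string id;
  bool requestedByChild;   // models mFlags.contains(RequestedByChild)
  int32_t requestedOrder;  // models mRequestedOrder
};

// Same rule as Comparator::LessThan: child-requested entries come first;
// within each group, entries are ordered by when they were requested.
inline bool SketchLessThan(const SketchEntry& a, const SketchEntry& b) {
  if (a.requestedByChild == b.requestedByChild) {
    return a.requestedOrder < b.requestedOrder;
  }
  return a.requestedByChild;
}

// Sorts a copy of the entries and returns their ids in the resulting order.
inline std::vector<std::string> SketchSortedIds(
    std::vector<SketchEntry> entries) {
  std::sort(entries.begin(), entries.end(), SketchLessThan);
  std::vector<std::string> ids;
  ids.reserve(entries.size());
  for (const auto& entry : entries) {
    ids.push_back(entry.id);
  }
  return ids;
}
```

Because the predicate returns `false` for two entries with equal flag and equal order, it satisfies the strict weak ordering that `std::sort` requires.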
// We don't want to refcount StartupCache, and ObserverService wants to
// refcount its listeners, so we'll let it refcount this instead.
class StartupCacheListener final : public nsIObserver {
~StartupCacheListener() = default;
NS_DECL_THREADSAFE_ISUPPORTS
NS_DECL_NSIOBSERVER
};
// This mirrors a bit of logic in the script preloader. Basically, there's
// certainly some overhead in child processes sending us lists of requested
// startup cache items, so we want to limit that. Accordingly, we only
// request to be notified of requested cache items for the first occurrence
// of each process type, enumerated below.
enum class ProcessType : uint8_t {
Uninitialized,
Parent,
Web,
Extension,
PrivilegedAbout,
};
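The "first occurrence of each process type" rule described in the comment above can be sketched as follows. This is an illustration under stated assumptions, not the bookkeeping StartupCache actually uses: `SketchProcessType` mirrors the enum only so the block compiles on its own, and `FirstOccurrenceTracker` and `ShouldRequestNotifications` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Mirror of the ProcessType enum, redeclared so the sketch is self-contained.
enum class SketchProcessType : uint8_t {
  Uninitialized,
  Parent,
  Web,
  Extension,
  PrivilegedAbout,
};

// Hypothetical helper: remembers which process types have already been seen.
class FirstOccurrenceTracker {
 public:
  // Returns true only the first time a given process type is passed in.
  // Uninitialized never qualifies.
  bool ShouldRequestNotifications(SketchProcessType aType) {
    if (aType == SketchProcessType::Uninitialized) {
      return false;
    }
    const auto index = static_cast<std::size_t>(aType);
    if (mSeen[index]) {
      return false;
    }
    mSeen[index] = true;
    return true;
  }

 private:
  // One slot per enumerator, value-initialized to false.
  std::array<bool, 5> mSeen{};
};
```

With this shape, only one process per type pays the cost of reporting its requested cache items; later processes of the same type are skipped.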
class StartupCache : public nsIMemoryReporter {
friend class StartupCacheListener;
friend class StartupCacheChild;
public:
using Table = HashMap<nsCString, StartupCacheEntry>;
NS_DECL_THREADSAFE_ISUPPORTS
NS_DECL_NSIMEMORYREPORTER
// StartupCache methods. See above comments for a more detailed description.
// Returns true if the archive has an entry for the buffer, false otherwise.
bool HasEntry(const char* id);
// Returns a buffer that was previously stored. The caller does not take
// ownership.
nsresult GetBuffer(const char* id, const char** outbuf, uint32_t* length);
// Stores a buffer. Caller yields ownership.
nsresult PutBuffer(const char* id, UniquePtr<char[]>&& inbuf, uint32_t length,
bool isFromChildProcess = false);
void InvalidateCache();
// For use during shutdown - this will write the startupcache's data
// to disk if the timer hasn't already gone off.
void MaybeInitShutdownWrite();
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. 
In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// For use during shutdown - ensure we complete the shutdown write
// before shutdown, even in the FastShutdown case.
void EnsureShutdownWriteComplete();
// Signal that data should not be loaded from the cache file.
static void IgnoreDiskCache();
// In DEBUG builds, returns a stream that will attempt to check for
// and disallow multiple writes of the same object.
nsresult GetDebugObjectOutputStream(nsIObjectOutputStream* aStream,
nsIObjectOutputStream** outStream);
static ProcessType GetChildProcessType(const nsACString& remoteType);
static StartupCache* GetSingleton();
// This will get the StartupCache up and running to get cached entries, but
// it won't init some of the deferred things which require later services
// to be up and running.
static nsresult PartialInitSingleton(nsIFile* aProfileLocalDir);
// If the startup cache singleton exists (initialized via
// PartialInitSingleton), this will ensure that all of the ancillary
// requirements of the startup cache are met.
static nsresult FullyInitSingleton();
static nsresult InitChildSingleton(char* aScacheHandleStr,
char* aScacheSizeStr);
static void DeleteSingleton();
static void InitContentChild(dom::ContentParent& parent);
void AddStartupCacheCmdLineArgs(ipc::GeckoChildProcessHost& procHost,
std::vector<std::string>& aExtraOpts);
nsresult ParseStartupCacheCmdLineArgs(char* aScacheHandleStr,
char* aScacheSizeStr);
// This measures all the heap memory used by the StartupCache, i.e. it
// excludes the mapping.
size_t HeapSizeOfIncludingThis(mozilla::MallocSizeOf mallocSizeOf) const;
bool ShouldCompactCache();
nsresult ResetStartupWriteTimer();
bool StartupWriteComplete();
private:
StartupCache();
virtual ~StartupCache();
friend class StartupCacheInfo;
Result<Ok, nsresult> LoadArchive();
nsresult PartialInit(nsIFile* aProfileLocalDir);
nsresult FullyInit();
nsresult InitChild(StartupCacheChild* cacheChild);
// Removes the cache file.
void InvalidateCacheImpl(bool memoryOnly = false);
nsresult ResetStartupWriteTimerCheckingReadCount();
nsresult ResetStartupWriteTimerImpl();
// Returns a file pointer for the cache file with the given name in the
// current profile.
Result<nsCOMPtr<nsIFile>, nsresult> GetCacheFile(const nsAString& suffix);
// Opens the cache file for reading.
Result<Ok, nsresult> OpenCache();
// Writes the cache to disk.
Result<Ok, nsresult> WriteToDisk();
Result<Ok, nsresult> DecompressEntry(StartupCacheEntry& aEntry);
Result<Ok, nsresult> LoadEntriesOffDisk();
Result<Ok, nsresult> LoadEntriesFromSharedMemory();
void WaitOnPrefetchThread();
void StartPrefetchMemoryThread();
static void WriteTimeout(nsITimer* aTimer, void* aClosure);
static void SendEntriesTimeout(nsITimer* aTimer, void* aClosure);
void MaybeWriteOffMainThread();
static void ThreadedPrefetch(void* aClosure);
EnumSet<ProcessType> mInitializedProcesses{};
nsCString mContentStartupFinishedTopic;
Table mTable;
// Owns references to the contents of tables which have been invalidated.
// In theory this grows forever if the cache is continually filled and then
// invalidated, but this should not happen in practice.
nsTArray<decltype(mTable)> mOldTables;
nsCOMPtr<nsIFile> mFile;
loader::AutoMemMap mCacheData;
loader::AutoMemMap mSharedData;
UniqueFileHandle mSharedDataHandle;
// This lock must protect a few members of the StartupCache. Essentially,
// we want to protect everything accessed by GetBuffer and PutBuffer. This
// includes:
// - mTable
// - mCacheData
// - mDecompressionContext
// - mCurTableReferenced
// - mOldTables
// - mWrittenOnce
// - gIgnoreDiskCache
// - mFile
// - mWriteTimer
// - mStartupWriteInitiated
mutable Mutex mLock;
nsCOMPtr<nsIObserverService> mObserverService;
Bug 1207245 - part 6 - rename nsRefPtr<T> to RefPtr<T>; r=ehsan; a=Tomcat The bulk of this commit was generated with a script, executed at the top level of a typical source code checkout. The only non-machine-generated part was modifying MFBT's moz.build to reflect the new naming. CLOSED TREE makes big refactorings like this a piece of cake. # The main substitution. find . -name '*.cpp' -o -name '*.cc' -o -name '*.h' -o -name '*.mm' -o -name '*.idl'| \ xargs perl -p -i -e ' s/nsRefPtr\.h/RefPtr\.h/g; # handle includes s/nsRefPtr ?</RefPtr</g; # handle declarations and variables ' # Handle a special friend declaration in gfx/layers/AtomicRefCountedWithFinalize.h. perl -p -i -e 's/::nsRefPtr;/::RefPtr;/' gfx/layers/AtomicRefCountedWithFinalize.h # Handle nsRefPtr.h itself, a couple places that define constructors # from nsRefPtr, and code generators specially. We do this here, rather # than indiscriminantly s/nsRefPtr/RefPtr/, because that would rename # things like nsRefPtrHashtable. perl -p -i -e 's/nsRefPtr/RefPtr/g' \ mfbt/nsRefPtr.h \ xpcom/glue/nsCOMPtr.h \ xpcom/base/OwningNonNull.h \ ipc/ipdl/ipdl/lower.py \ ipc/ipdl/ipdl/builtin.py \ dom/bindings/Codegen.py \ python/lldbutils/lldbutils/utils.py # In our indiscriminate substitution above, we renamed # nsRefPtrGetterAddRefs, the class behind getter_AddRefs. Fix that up. find . -name '*.cpp' -o -name '*.h' -o -name '*.idl' | \ xargs perl -p -i -e 's/nsRefPtrGetterAddRefs/RefPtrGetterAddRefs/g' if [ -d .git ]; then git mv mfbt/nsRefPtr.h mfbt/RefPtr.h else hg mv mfbt/nsRefPtr.h mfbt/RefPtr.h fi --HG-- rename : mfbt/nsRefPtr.h => mfbt/RefPtr.h
2015-10-18 08:24:48 +03:00
RefPtr<StartupCacheListener> mListener;
nsCOMPtr<nsITimer> mWriteTimer;
nsCOMPtr<nsITimer> mSendEntriesTimer;
Atomic<bool> mDirty;
Atomic<bool> mWrittenOnce;
bool mCurTableReferenced;
bool mLoaded;
bool mFullyInitialized;
uint32_t mRequestedCount;
uint32_t mPrefetchSize;
uint32_t mSharedDataSize;
size_t mCacheEntriesBaseOffset;
static StaticRefPtr<StartupCache> gStartupCache;
static bool gShutdownInitiated;
static bool gIgnoreDiskCache;
static bool gFoundDiskCacheOnInit;
Atomic<StartupCacheChild*> mChildActor;
PRThread* mPrefetchThread;
UniquePtr<Compression::LZ4FrameDecompressionContext> mDecompressionContext;
#ifdef DEBUG
nsTHashtable<nsISupportsHashKey> mWriteObjectMap;
#endif
};
// This debug outputstream attempts to detect if clients are writing multiple
// references to the same object. We only support that if that object
// is a singleton.
#ifdef DEBUG
class StartupCacheDebugOutputStream final : public nsIObjectOutputStream {
  ~StartupCacheDebugOutputStream() = default;
  NS_DECL_ISUPPORTS
  NS_DECL_NSIOBJECTOUTPUTSTREAM
  StartupCacheDebugOutputStream(nsIObjectOutputStream* binaryStream,
                                nsTHashtable<nsISupportsHashKey>* objectMap)
      : mBinaryStream(binaryStream), mObjectMap(objectMap) {}
  NS_FORWARD_SAFE_NSIBINARYOUTPUTSTREAM(mBinaryStream)
  NS_FORWARD_SAFE_NSIOUTPUTSTREAM(mBinaryStream)
  bool CheckReferences(nsISupports* aObject);
  nsCOMPtr<nsIObjectOutputStream> mBinaryStream;
  nsTHashtable<nsISupportsHashKey>* mObjectMap;
};
#endif // DEBUG
} // namespace scache
} // namespace mozilla
#endif // StartupCache_h_
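The comment on `StartupCacheDebugOutputStream` says it tries to detect clients writing multiple references to the same object, which is supported only for singletons. A minimal sketch of that kind of check is shown here. `ReferenceChecker` and its `aIsSingleton` parameter are hypothetical, and `std::unordered_set` stands in for `nsTHashtable<nsISupportsHashKey>`; how the real stream decides that an object is a singleton is not shown in this header.

```cpp
#include <unordered_set>

// Hypothetical helper illustrating the duplicate-reference check; it is
// not the real StartupCacheDebugOutputStream.
class ReferenceChecker {
 public:
  // Returns true if writing aObject is acceptable: either the caller says
  // it is a singleton, or this is the first time the object is seen.
  bool CheckReferences(const void* aObject, bool aIsSingleton) {
    if (aIsSingleton) {
      return true;
    }
    // insert() reports via .second whether the pointer was newly added.
    return mSeen.insert(aObject).second;
  }

 private:
  std::unordered_set<const void*> mSeen;
};
```

A second write of the same non-singleton pointer returns false, which a debug build could turn into an assertion.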