gecko-dev/startupcache/StartupCache.cpp

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

947 строки
30 KiB
C++
Исходник Обычный вид История

/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
2012-05-21 15:12:37 +04:00
/* This Source Code Form is subject to the terms of the Mozilla Public
* License, v. 2.0. If a copy of the MPL was not distributed with this
* file, You can obtain one at http://mozilla.org/MPL/2.0/. */
#include "prio.h"
#include "PLDHashTable.h"
#include "mozilla/IOInterposer.h"
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
#include "mozilla/AutoMemMap.h"
#include "mozilla/IOBuffers.h"
#include "mozilla/MemoryReporting.h"
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
#include "mozilla/MemUtils.h"
#include "mozilla/MmapFaultHandler.h"
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
#include "mozilla/ResultExtensions.h"
#include "mozilla/scache/StartupCache.h"
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
#include "mozilla/ScopeExit.h"
#include "nsClassHashtable.h"
#include "nsComponentManagerUtils.h"
#include "nsCRT.h"
#include "nsDirectoryServiceUtils.h"
#include "nsIClassInfo.h"
#include "nsIFile.h"
#include "nsIObserver.h"
#include "nsIOutputStream.h"
#include "nsISupports.h"
#include "nsITimer.h"
#include "mozilla/Omnijar.h"
#include "prenv.h"
#include "mozilla/Telemetry.h"
#include "nsThreadUtils.h"
#include "nsXULAppAPI.h"
#include "nsIProtocolHandler.h"
#include "GeckoProfiler.h"
#include "nsAppRunner.h"
#include "xpcpublic.h"
#ifdef MOZ_BACKGROUNDTASKS
# include "mozilla/BackgroundTasks.h"
#endif
#if defined(XP_WIN)
# include <windows.h>
#endif
#ifdef IS_BIG_ENDIAN
# define SC_ENDIAN "big"
#else
# define SC_ENDIAN "little"
#endif
#if PR_BYTES_PER_WORD == 4
# define SC_WORDSIZE "4"
#else
# define SC_WORDSIZE "8"
#endif
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
using namespace mozilla::Compression;
namespace mozilla {
namespace scache {
MOZ_DEFINE_MALLOC_SIZE_OF(StartupCacheMallocSizeOf)
NS_IMETHODIMP
StartupCache::CollectReports(nsIHandleReportCallback* aHandleReport,
nsISupports* aData, bool aAnonymize) {
MOZ_COLLECT_REPORT(
"explicit/startup-cache/mapping", KIND_NONHEAP, UNITS_BYTES,
mCacheData.nonHeapSizeOfExcludingThis(),
"Memory used to hold the mapping of the startup cache from file. "
"This memory is likely to be swapped out shortly after start-up.");
MOZ_COLLECT_REPORT("explicit/startup-cache/data", KIND_HEAP, UNITS_BYTES,
HeapSizeOfIncludingThis(StartupCacheMallocSizeOf),
"Memory used by the startup cache for things other than "
"the file mapping.");
return NS_OK;
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
static const uint8_t MAGIC[] = "startupcache0002";
// This is a heuristic value for how much to reserve for mTable to avoid
// rehashing. This is not a hard limit in release builds, but it is in
// debug builds as it should be stable. If we exceed this number we should
// just increase it.
static const size_t STARTUP_CACHE_RESERVE_CAPACITY = 450;
// This is a hard limit which we will assert on, to ensure that we don't
// have some bug causing runaway cache growth.
static const size_t STARTUP_CACHE_MAX_CAPACITY = 5000;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
// Not const because we change it for gtests.
static uint8_t STARTUP_CACHE_WRITE_TIMEOUT = 60;
#define STARTUP_CACHE_NAME "startupCache." SC_WORDSIZE "." SC_ENDIAN
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
static inline Result<Ok, nsresult> Write(PRFileDesc* fd, const void* data,
int32_t len) {
if (PR_Write(fd, data, len) != len) {
return Err(NS_ERROR_FAILURE);
}
return Ok();
}
static inline Result<Ok, nsresult> Seek(PRFileDesc* fd, int32_t offset) {
if (PR_Seek(fd, offset, PR_SEEK_SET) == -1) {
return Err(NS_ERROR_FAILURE);
}
return Ok();
}
static nsresult MapLZ4ErrorToNsresult(size_t aError) {
return NS_ERROR_FAILURE;
}
StartupCache* StartupCache::GetSingletonNoInit() {
return StartupCache::gStartupCache;
}
StartupCache* StartupCache::GetSingleton() {
#ifdef MOZ_BACKGROUNDTASKS
if (BackgroundTasks::IsBackgroundTaskMode()) {
return nullptr;
}
#endif
if (!gStartupCache) {
if (!XRE_IsParentProcess()) {
return nullptr;
}
#ifdef MOZ_DISABLE_STARTUPCACHE
return nullptr;
#else
StartupCache::InitSingleton();
#endif
}
return StartupCache::gStartupCache;
}
void StartupCache::DeleteSingleton() { StartupCache::gStartupCache = nullptr; }
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
nsresult StartupCache::InitSingleton() {
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
nsresult rv;
StartupCache::gStartupCache = new StartupCache();
rv = StartupCache::gStartupCache->Init();
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
if (NS_FAILED(rv)) {
StartupCache::gStartupCache = nullptr;
}
return rv;
}
StaticRefPtr<StartupCache> StartupCache::gStartupCache;
bool StartupCache::gShutdownInitiated;
bool StartupCache::gIgnoreDiskCache;
bool StartupCache::gFoundDiskCacheOnInit;
NS_IMPL_ISUPPORTS(StartupCache, nsIMemoryReporter)
StartupCache::StartupCache()
: mTableLock("StartupCache::mTableLock"),
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
mDirty(false),
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
mWrittenOnce(false),
mCurTableReferenced(false),
mRequestedCount(0),
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
mCacheEntriesBaseOffset(0),
mPrefetchThread(nullptr) {}
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
StartupCache::~StartupCache() { UnregisterWeakMemoryReporter(this); }
nsresult StartupCache::Init() {
// workaround for bug 653936
nsCOMPtr<nsIProtocolHandler> jarInitializer(
do_GetService(NS_NETWORK_PROTOCOL_CONTRACTID_PREFIX "jar"));
nsresult rv;
if (mozilla::RunningGTest()) {
STARTUP_CACHE_WRITE_TIMEOUT = 3;
}
// This allows to override the startup cache filename
// which is useful from xpcshell, when there is no ProfLDS directory to keep
// cache in.
char* env = PR_GetEnv("MOZ_STARTUP_CACHE");
if (env && *env) {
rv = NS_NewLocalFile(NS_ConvertUTF8toUTF16(env), false,
getter_AddRefs(mFile));
} else {
nsCOMPtr<nsIFile> file;
rv = NS_GetSpecialDirectory("ProfLDS", getter_AddRefs(file));
if (NS_FAILED(rv)) {
// return silently, this will fail in mochitests's xpcshell process.
return rv;
}
rv = file->AppendNative("startupCache"_ns);
NS_ENSURE_SUCCESS(rv, rv);
// Try to create the directory if it's not there yet
rv = file->Create(nsIFile::DIRECTORY_TYPE, 0777);
if (NS_FAILED(rv) && rv != NS_ERROR_FILE_ALREADY_EXISTS) return rv;
rv = file->AppendNative(nsLiteralCString(STARTUP_CACHE_NAME));
NS_ENSURE_SUCCESS(rv, rv);
mFile = file;
}
NS_ENSURE_TRUE(mFile, NS_ERROR_UNEXPECTED);
mObserverService = do_GetService("@mozilla.org/observer-service;1");
if (!mObserverService) {
NS_WARNING("Could not get observerService.");
return NS_ERROR_UNEXPECTED;
}
mListener = new StartupCacheListener();
rv = mObserverService->AddObserver(mListener, NS_XPCOM_SHUTDOWN_OBSERVER_ID,
false);
NS_ENSURE_SUCCESS(rv, rv);
rv = mObserverService->AddObserver(mListener, "startupcache-invalidate",
false);
NS_ENSURE_SUCCESS(rv, rv);
rv = mObserverService->AddObserver(mListener, "intl:app-locales-changed",
false);
NS_ENSURE_SUCCESS(rv, rv);
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
auto result = LoadArchive();
rv = result.isErr() ? result.unwrapErr() : NS_OK;
gFoundDiskCacheOnInit = rv != NS_ERROR_FILE_NOT_FOUND;
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
// Sometimes we don't have a cache yet, that's ok.
// If it's corrupted, just remove it and start over.
if (gIgnoreDiskCache || (NS_FAILED(rv) && rv != NS_ERROR_FILE_NOT_FOUND)) {
NS_WARNING("Failed to load startupcache file correctly, removing!");
InvalidateCache();
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
}
RegisterWeakMemoryReporter(this);
mDecompressionContext = MakeUnique<LZ4FrameDecompressionContext>(true);
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
return NS_OK;
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
}
void StartupCache::StartPrefetchMemoryThread() {
// XXX: It would be great for this to not create its own thread, unfortunately
// there doesn't seem to be an existing thread that makes sense for this, so
// barring a coordinated global scheduling system this is the best we get.
mPrefetchThread = PR_CreateThread(
PR_USER_THREAD, StartupCache::ThreadedPrefetch, this, PR_PRIORITY_NORMAL,
PR_GLOBAL_THREAD, PR_JOINABLE_THREAD, 256 * 1024);
}
/**
* LoadArchive can only be called from the main thread.
*/
Result<Ok, nsresult> StartupCache::LoadArchive() {
MOZ_ASSERT(NS_IsMainThread(), "Can only load startup cache on main thread");
if (gIgnoreDiskCache) return Err(NS_ERROR_FAILURE);
MOZ_TRY(mCacheData.init(mFile));
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
auto size = mCacheData.size();
if (CanPrefetchMemory()) {
StartPrefetchMemoryThread();
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
uint32_t headerSize;
if (size < sizeof(MAGIC) + sizeof(headerSize)) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return Err(NS_ERROR_UNEXPECTED);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-09-28 01:15:35 +03:00
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
auto data = mCacheData.get<uint8_t>();
auto end = data + size;
MMAP_FAULT_HANDLER_BEGIN_BUFFER(data.get(), size)
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
if (memcmp(MAGIC, data.get(), sizeof(MAGIC))) {
return Err(NS_ERROR_UNEXPECTED);
}
data += sizeof(MAGIC);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-09-28 01:15:35 +03:00
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
headerSize = LittleEndian::readUint32(data.get());
data += sizeof(headerSize);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-09-28 01:15:35 +03:00
if (headerSize > end - data) {
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-02 06:39:46 +03:00
MOZ_ASSERT(false, "StartupCache file is corrupt.");
return Err(NS_ERROR_UNEXPECTED);
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
Range<uint8_t> header(data, data + headerSize);
data += headerSize;
mCacheEntriesBaseOffset = sizeof(MAGIC) + sizeof(headerSize) + headerSize;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
{
if (!mTable.reserve(STARTUP_CACHE_RESERVE_CAPACITY)) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return Err(NS_ERROR_UNEXPECTED);
}
auto cleanup = MakeScopeExit([&]() {
WaitOnPrefetchThread();
mTable.clear();
mCacheData.reset();
});
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
loader::InputBuffer buf(header);
uint32_t currentOffset = 0;
while (!buf.finished()) {
uint32_t offset = 0;
uint32_t compressedSize = 0;
uint32_t uncompressedSize = 0;
nsCString key;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
buf.codeUint32(offset);
buf.codeUint32(compressedSize);
buf.codeUint32(uncompressedSize);
buf.codeString(key);
if (offset + compressedSize > end - data) {
MOZ_ASSERT(false, "StartupCache file is corrupt.");
return Err(NS_ERROR_UNEXPECTED);
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
// Make sure offsets match what we'd expect based on script ordering and
// size, as a basic sanity check.
if (offset != currentOffset) {
return Err(NS_ERROR_UNEXPECTED);
}
currentOffset += compressedSize;
// We could use mTable.putNew if we knew the file we're loading weren't
// corrupt. However, we don't know that, so check if the key already
// exists. If it does, we know the file must be corrupt.
decltype(mTable)::AddPtr p = mTable.lookupForAdd(key);
if (p) {
return Err(NS_ERROR_UNEXPECTED);
}
if (!mTable.add(
p, key,
StartupCacheEntry(offset, compressedSize, uncompressedSize))) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return Err(NS_ERROR_UNEXPECTED);
}
}
if (buf.error()) {
return Err(NS_ERROR_UNEXPECTED);
}
cleanup.release();
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
}
MMAP_FAULT_HANDLER_CATCH(Err(NS_ERROR_UNEXPECTED))
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return Ok();
}
bool StartupCache::HasEntry(const char* id) {
AUTO_PROFILER_LABEL("StartupCache::HasEntry", OTHER);
MOZ_ASSERT(NS_IsMainThread(), "Startup cache only available on main thread");
return mTable.has(nsDependentCString(id));
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
nsresult StartupCache::GetBuffer(const char* id, const char** outbuf,
uint32_t* length)
MOZ_NO_THREAD_SAFETY_ANALYSIS {
AUTO_PROFILER_LABEL("StartupCache::GetBuffer", OTHER);
NS_ASSERTION(NS_IsMainThread(),
"Startup cache only available on main thread");
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
Telemetry::LABELS_STARTUP_CACHE_REQUESTS label =
Telemetry::LABELS_STARTUP_CACHE_REQUESTS::Miss;
auto telemetry =
MakeScopeExit([&label] { Telemetry::AccumulateCategorical(label); });
decltype(mTable)::Ptr p = mTable.lookup(nsDependentCString(id));
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
if (!p) {
return NS_ERROR_NOT_AVAILABLE;
}
auto& value = p->value();
if (value.mData) {
label = Telemetry::LABELS_STARTUP_CACHE_REQUESTS::HitMemory;
} else {
if (!mCacheData.initialized()) {
return NS_ERROR_NOT_AVAILABLE;
}
#ifdef DEBUG
// It should be impossible for a write to be pending here. This is because
// we just checked mCacheData.initialized(), and this is reset before
// writing to the cache. It's not re-initialized unless we call
// LoadArchive(), either from Init() (which must have already happened) or
// InvalidateCache(). InvalidateCache() locks the mutex, so a write can't be
// happening. Really, we want to MOZ_ASSERT(!mTableLock.IsLocked()) here,
// but there is no such method. So we hack around by attempting to gain the
// lock. This should always succeed; if it fails, someone's broken the
// assumptions.
if (!mTableLock.TryLock()) {
MOZ_ASSERT(false, "Could not gain mTableLock - should never happen!");
return NS_ERROR_NOT_AVAILABLE;
}
mTableLock.Unlock();
#endif
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
size_t totalRead = 0;
size_t totalWritten = 0;
Span<const char> compressed = Span(
mCacheData.get<char>().get() + mCacheEntriesBaseOffset + value.mOffset,
value.mCompressedSize);
value.mData = MakeUnique<char[]>(value.mUncompressedSize);
Span<char> uncompressed = Span(value.mData.get(), value.mUncompressedSize);
MMAP_FAULT_HANDLER_BEGIN_BUFFER(uncompressed.Elements(),
uncompressed.Length())
bool finished = false;
while (!finished) {
auto result = mDecompressionContext->Decompress(
uncompressed.From(totalWritten), compressed.From(totalRead));
if (NS_WARN_IF(result.isErr())) {
value.mData = nullptr;
InvalidateCache();
return NS_ERROR_FAILURE;
}
auto decompressionResult = result.unwrap();
totalRead += decompressionResult.mSizeRead;
totalWritten += decompressionResult.mSizeWritten;
finished = decompressionResult.mFinished;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-09-28 01:15:35 +03:00
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
MMAP_FAULT_HANDLER_CATCH(NS_ERROR_FAILURE)
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
label = Telemetry::LABELS_STARTUP_CACHE_REQUESTS::HitDisk;
}
if (!value.mRequested) {
value.mRequested = true;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
value.mRequestedOrder = ++mRequestedCount;
MOZ_ASSERT(mRequestedCount <= mTable.count(),
"Somehow we requested more StartupCache items than exist.");
ResetStartupWriteTimerCheckingReadCount();
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
}
// Track that something holds a reference into mTable, so we know to hold
// onto it in case the cache is invalidated.
mCurTableReferenced = true;
*outbuf = value.mData.get();
*length = value.mUncompressedSize;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return NS_OK;
}
// Makes a copy of the buffer, client retains ownership of inbuf.
nsresult StartupCache::PutBuffer(const char* id, UniquePtr<char[]>&& inbuf,
uint32_t len) MOZ_NO_THREAD_SAFETY_ANALYSIS {
NS_ASSERTION(NS_IsMainThread(),
"Startup cache only available on main thread");
if (StartupCache::gShutdownInitiated) {
return NS_ERROR_NOT_AVAILABLE;
}
bool exists = mTable.has(nsDependentCString(id));
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
if (exists) {
NS_WARNING("Existing entry in StartupCache.");
// Double-caching is undesirable but not an error.
return NS_OK;
}
// Try to gain the table write lock. If the background task to write the
// cache is running, this will fail.
if (!mTableLock.TryLock()) {
return NS_ERROR_NOT_AVAILABLE;
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
}
auto lockGuard = MakeScopeExit([&] {
mTableLock.AssertCurrentThreadOwns();
mTableLock.Unlock();
});
// putNew returns false on alloc failure - in the very unlikely event we hit
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
// that and aren't going to crash elsewhere, there's no reason we need to
// crash here.
if (mTable.putNew(nsCString(id), StartupCacheEntry(std::move(inbuf), len,
++mRequestedCount))) {
return ResetStartupWriteTimer();
}
MOZ_DIAGNOSTIC_ASSERT(mTable.count() < STARTUP_CACHE_MAX_CAPACITY,
"Too many StartupCache entries.");
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return NS_OK;
}
size_t StartupCache::HeapSizeOfIncludingThis(
mozilla::MallocSizeOf aMallocSizeOf) const {
// This function could measure more members, but they haven't been found by
// DMD to be significant. They can be added later if necessary.
size_t n = aMallocSizeOf(this);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
n += mTable.shallowSizeOfExcludingThis(aMallocSizeOf);
for (auto iter = mTable.iter(); !iter.done(); iter.next()) {
if (iter.get().value().mData) {
n += aMallocSizeOf(iter.get().value().mData.get());
}
n += iter.get().key().SizeOfExcludingThisIfUnshared(aMallocSizeOf);
}
return n;
}
/**
* WriteToDisk writes the cache out to disk. Callers of WriteToDisk need to call
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
* WaitOnWriteComplete to make sure there isn't a write
* happening on another thread
*/
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
Result<Ok, nsresult> StartupCache::WriteToDisk() {
mTableLock.AssertCurrentThreadOwns();
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
if (!mDirty || mWrittenOnce) {
return Ok();
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
if (!mFile) {
return Err(NS_ERROR_UNEXPECTED);
}
AutoFDClose fd;
MOZ_TRY(mFile->OpenNSPRFileDesc(PR_WRONLY | PR_CREATE_FILE | PR_TRUNCATE,
0644, &fd.rwget()));
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-07 20:03:28 +03:00
nsTArray<std::pair<const nsCString*, StartupCacheEntry*>> entries;
for (auto iter = mTable.iter(); !iter.done(); iter.next()) {
if (iter.get().value().mRequested) {
entries.AppendElement(
std::make_pair(&iter.get().key(), &iter.get().value()));
}
}
if (entries.IsEmpty()) {
return Ok();
}
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
entries.Sort(StartupCacheEntry::Comparator());
loader::OutputBuffer buf;
for (auto& e : entries) {
auto key = e.first;
auto value = e.second;
auto uncompressedSize = value->mUncompressedSize;
// Set the mHeaderOffsetInFile so we can go back and edit the offset.
value->mHeaderOffsetInFile = buf.cursor();
// Write a 0 offset/compressed size as a placeholder until we get the real
// offset after compressing.
buf.codeUint32(0);
buf.codeUint32(0);
buf.codeUint32(uncompressedSize);
buf.codeString(*key);
}
uint8_t headerSize[4];
LittleEndian::writeUint32(headerSize, buf.cursor());
MOZ_TRY(Write(fd, MAGIC, sizeof(MAGIC)));
MOZ_TRY(Write(fd, headerSize, sizeof(headerSize)));
size_t headerStart = sizeof(MAGIC) + sizeof(headerSize);
size_t dataStart = headerStart + buf.cursor();
MOZ_TRY(Seek(fd, dataStart));
size_t offset = 0;
const size_t chunkSize = 1024 * 16;
LZ4FrameCompressionContext ctx(6, /* aCompressionLevel */
chunkSize, /* aReadBufLen */
true, /* aChecksum */
true); /* aStableSrc */
size_t writeBufLen = ctx.GetRequiredWriteBufferLength();
auto writeBuffer = MakeUnique<char[]>(writeBufLen);
auto writeSpan = Span(writeBuffer.get(), writeBufLen);
for (auto& e : entries) {
auto value = e.second;
value->mOffset = offset;
Span<const char> result;
MOZ_TRY_VAR(result,
ctx.BeginCompressing(writeSpan).mapErr(MapLZ4ErrorToNsresult));
MOZ_TRY(Write(fd, result.Elements(), result.Length()));
offset += result.Length();
for (size_t i = 0; i < value->mUncompressedSize; i += chunkSize) {
size_t size = std::min(chunkSize, value->mUncompressedSize - i);
char* uncompressed = value->mData.get() + i;
MOZ_TRY_VAR(result, ctx.ContinueCompressing(Span(uncompressed, size))
.mapErr(MapLZ4ErrorToNsresult));
MOZ_TRY(Write(fd, result.Elements(), result.Length()));
offset += result.Length();
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
}
MOZ_TRY_VAR(result, ctx.EndCompressing().mapErr(MapLZ4ErrorToNsresult));
MOZ_TRY(Write(fd, result.Elements(), result.Length()));
offset += result.Length();
value->mCompressedSize = offset - value->mOffset;
MOZ_TRY(Seek(fd, dataStart + offset));
}
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-01 20:55:38 +03:00
for (auto& e : entries) {
auto value = e.second;
uint8_t* headerEntry = buf.Get() + value->mHeaderOffsetInFile;
LittleEndian::writeUint32(headerEntry, value->mOffset);
LittleEndian::writeUint32(headerEntry + sizeof(value->mOffset),
value->mCompressedSize);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
}
MOZ_TRY(Seek(fd, headerStart));
MOZ_TRY(Write(fd, buf.Get(), buf.cursor()));
mDirty = false;
mWrittenOnce = true;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return Ok();
}
void StartupCache::InvalidateCache(bool memoryOnly) {
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-08 05:46:11 +03:00
WaitOnPrefetchThread();
// Ensure we're not writing using mTable...
MutexAutoLock unlock(mTableLock);
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
mWrittenOnce = false;
if (memoryOnly) {
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// This should only be called in tests.
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
auto writeResult = WriteToDisk();
if (NS_WARN_IF(writeResult.isErr())) {
gIgnoreDiskCache = true;
return;
}
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-09-28 01:15:35 +03:00
}
if (mCurTableReferenced) {
// There should be no way for this assert to fail other than a user manually
// sending startupcache-invalidate messages through the Browser Toolbox. If
// something knowingly invalidates the cache, the event can be counted with
// mAllowedInvalidationsCount.
MOZ_DIAGNOSTIC_ASSERT(
xpc::IsInAutomation() ||
// The allowed invalidations can grow faster than the old tables, so
// guard against incorrect unsigned subtraction.
mAllowedInvalidationsCount > mOldTables.Length() ||
// Now perform the real check.
mOldTables.Length() - mAllowedInvalidationsCount < 10,
"Startup cache invalidated too many times.");
mOldTables.AppendElement(std::move(mTable));
mCurTableReferenced = false;
} else {
mTable.clear();
Bug 1627075 - OMT and OMP StartupCache access r=froydnj The overall goal of this patch is to make the StartupCache accessible anywhere. There's two main pieces to that equation: 1. Allowing it to be accessed off main thread, which means modifying the mutex usage to ensure that all data accessed from non-main threads is protected. 2. Allowing it to be accessed out of the chrome process, which means passing a handle to a shared cache buffer down to child processes. Number 1 is somewhat fiddly, but it's all generally straightforward work. I'll hope that the comments and the code are sufficient to explain what's going on there. Number 2 has some decisions to be made: - The first decision was to pass a handle to a frozen chunk of memory down to all child processes, rather than passing a handle to an actual file. There's two reasons for this: 1) since we want to compress the underlying file on disk, giving that file to child processes would mean they have to decompress it themselves, eating CPU time. 2) since they would have to decompress it themselves, they would have to allocate the memory for the decompressed buffers, meaning they cannot all simply share one big decompressed buffer. - The drawback of this decision is that we have to load and decompress the buffer up front, before we spawn any child processes. We attempt to mitigate this by keeping track of all the entries that child processes access, and only including those in the frozen decompressed shared buffer. - We base our implementation of this approach off of the shared preferences implementation. Hopefully I got all of the pieces to fit together correctly. They seem to work in local testing and on try, but I think they require a set of experienced eyes looking carefully at them. - Another decision was whether to send the handles to the buffers over IPC or via command line. We went with the command line approach, because the startup cache would need to be accessed very early on in order to ensure we do not read from any omnijars, and we could not make that work via IPC. - Unfortunately this means adding another hard-coded FD, similar to kPrefMapFileDescriptor. It seems like at the very least we need to rope all of these together into one place, but I think that should be filed as a follow-up? Lastly, because this patch is a bit of a monster to review - first, thank you for looking at it, and second, the reason we're invested in this is because we saw a >10% improvement in cold startup times on reference hardware, with a p value less than 0.01. It's still not abundantly clear how reference hardware numbers translate to numbers on release, and they certainly don't translate well to Nightly numbers, but it's enough to convince me that it's worth some effort. Depends on D78584 Differential Revision: https://phabricator.services.mozilla.com/D77635
2020-07-01 20:55:38 +03:00
}
mRequestedCount = 0;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
if (!memoryOnly) {
mCacheData.reset();
nsresult rv = mFile->Remove(false);
if (NS_FAILED(rv) && rv != NS_ERROR_FILE_NOT_FOUND) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
gIgnoreDiskCache = true;
return;
}
}
gIgnoreDiskCache = false;
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
auto result = LoadArchive();
if (NS_WARN_IF(result.isErr())) {
gIgnoreDiskCache = true;
}
}
void StartupCache::CountAllowedInvalidation() { mAllowedInvalidationsCount++; }
void StartupCache::MaybeInitShutdownWrite() {
if (mTimer) {
mTimer->Cancel();
}
gShutdownInitiated = true;
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
MaybeWriteOffMainThread();
}
void StartupCache::EnsureShutdownWriteComplete() {
// If we've already written or there's nothing to write,
// we don't need to do anything. This is the common case.
if (mWrittenOnce || (mCacheData.initialized() && !ShouldCompactCache())) {
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
return;
}
// Otherwise, ensure the write happens. The timer should have been cancelled
// already in MaybeInitShutdownWrite.
if (!mTableLock.TryLock()) {
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// Uh oh, we're writing away from the main thread. Wait to gain the lock,
// to ensure the write completes.
mTableLock.Lock();
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
} else {
// We got the lock. Keep the following in sync with
// MaybeWriteOffMainThread:
WaitOnPrefetchThread();
mDirty = true;
mCacheData.reset();
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// Most of this should be redundant given MaybeWriteOffMainThread should
// have run before now.
auto writeResult = WriteToDisk();
Unused << NS_WARN_IF(writeResult.isErr());
// We've had the lock, and `WriteToDisk()` sets mWrittenOnce and mDirty
// when done, and checks for them when starting, so we don't need to do
// anything else.
}
mTableLock.Unlock();
}
void StartupCache::IgnoreDiskCache() {
gIgnoreDiskCache = true;
if (gStartupCache) gStartupCache->InvalidateCache();
}
void StartupCache::WaitOnPrefetchThread() {
if (!mPrefetchThread || mPrefetchThread == PR_GetCurrentThread()) return;
PR_JoinThread(mPrefetchThread);
mPrefetchThread = nullptr;
}
void StartupCache::ThreadedPrefetch(void* aClosure) {
AUTO_PROFILER_REGISTER_THREAD("StartupCache");
NS_SetCurrentThreadName("StartupCache");
mozilla::IOInterposer::RegisterCurrentThread();
StartupCache* startupCacheObj = static_cast<StartupCache*>(aClosure);
uint8_t* buf = startupCacheObj->mCacheData.get<uint8_t>().get();
size_t size = startupCacheObj->mCacheData.size();
MMAP_FAULT_HANDLER_BEGIN_BUFFER(buf, size)
PrefetchMemory(buf, size);
MMAP_FAULT_HANDLER_CATCH()
mozilla::IOInterposer::UnregisterCurrentThread();
}
bool StartupCache::ShouldCompactCache() {
// If we've requested less than 4/5 of the startup cache, then we should
// probably compact it down. This can happen quite easily after the first run,
// which seems to request quite a few more things than subsequent runs.
CheckedInt<uint32_t> threshold = CheckedInt<uint32_t>(mTable.count()) * 4 / 5;
MOZ_RELEASE_ASSERT(threshold.isValid(), "Runaway StartupCache size");
return mRequestedCount < threshold.value();
}
/*
* The write-thread is spawned on a timeout(which is reset with every write).
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
* This can avoid a slow shutdown.
*/
void StartupCache::WriteTimeout(nsITimer* aTimer, void* aClosure) {
/*
* It is safe to use the pointer passed in aClosure to reference the
* StartupCache object because the timer's lifetime is tightly coupled to
* the lifetime of the StartupCache object; this timer is canceled in the
* StartupCache destructor, guaranteeing that this function runs if and only
* if the StartupCache object is valid.
*/
StartupCache* startupCacheObj = static_cast<StartupCache*>(aClosure);
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
startupCacheObj->MaybeWriteOffMainThread();
}
/*
* See StartupCache::WriteTimeout above - this is just the non-static body.
*/
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
void StartupCache::MaybeWriteOffMainThread() {
if (mWrittenOnce) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return;
}
if (mCacheData.initialized() && !ShouldCompactCache()) {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
return;
}
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// Keep this code in sync with EnsureShutdownWriteComplete.
WaitOnPrefetchThread();
mDirty = true;
mCacheData.reset();
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
RefPtr<StartupCache> self = this;
nsCOMPtr<nsIRunnable> runnable =
NS_NewRunnableFunction("StartupCache::Write", [self]() mutable {
MutexAutoLock unlock(self->mTableLock);
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
auto result = self->WriteToDisk();
Unused << NS_WARN_IF(result.isErr());
});
NS_DispatchBackgroundTask(runnable.forget(), NS_DISPATCH_EVENT_MAY_BLOCK);
}
// We don't want to refcount StartupCache, so we'll just
// hold a ref to this and pass it to observerService instead.
NS_IMPL_ISUPPORTS(StartupCacheListener, nsIObserver)
nsresult StartupCacheListener::Observe(nsISupports* subject, const char* topic,
const char16_t* data) {
StartupCache* sc = StartupCache::GetSingleton();
if (!sc) return NS_OK;
if (strcmp(topic, NS_XPCOM_SHUTDOWN_OBSERVER_ID) == 0) {
// Do not leave the thread running past xpcom shutdown
sc->WaitOnPrefetchThread();
StartupCache::gShutdownInitiated = true;
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// Note that we don't do anything special for the background write
// task; we expect the threadpool to finish running any tasks already
// posted to it prior to shutdown. FastShutdown will call
// EnsureShutdownWriteComplete() to ensure any pending writes happen
// in that case.
} else if (strcmp(topic, "startupcache-invalidate") == 0) {
sc->InvalidateCache(data && nsCRT::strcmp(data, u"memoryOnly") == 0);
} else if (strcmp(topic, "intl:app-locales-changed") == 0) {
// Live language switching invalidates the startup cache due to the history
// sidebar retaining localized strings in its internal SQL query. This
// should be a relatively rare event, but a user could do it an arbitrary
// number of times.
sc->CountAllowedInvalidation();
}
return NS_OK;
}
nsresult StartupCache::GetDebugObjectOutputStream(
nsIObjectOutputStream* aStream, nsIObjectOutputStream** aOutStream) {
NS_ENSURE_ARG_POINTER(aStream);
#ifdef DEBUG
auto* stream = new StartupCacheDebugOutputStream(aStream, &mWriteObjectMap);
NS_ADDREF(*aOutStream = stream);
#else
NS_ADDREF(*aOutStream = aStream);
#endif
return NS_OK;
}
nsresult StartupCache::ResetStartupWriteTimerCheckingReadCount() {
nsresult rv = NS_OK;
if (!mTimer)
mTimer = NS_NewTimer();
else
rv = mTimer->Cancel();
NS_ENSURE_SUCCESS(rv, rv);
// Wait for the specified timeout, then write out the cache.
mTimer->InitWithNamedFuncCallback(
StartupCache::WriteTimeout, this, STARTUP_CACHE_WRITE_TIMEOUT * 1000,
nsITimer::TYPE_ONE_SHOT, "StartupCache::WriteTimeout");
return NS_OK;
}
nsresult StartupCache::ResetStartupWriteTimer() {
Bug 1550108 - Change StartupCache format from zip to custom r=froydnj I am not aware of anything that depends on StartupCache being a zip file, and since I want to use lz4 compression because inflate is showing up quite a lot in profiles, it's simplest to just use a custom format. This loosely mimicks the ScriptPreloader code, with a few diversions: - Obviously the contents of the cache are compressed. I used lz4 for this as I hit the same file size as deflate at a compression level of 1, which is what the StartupCache was using previously, while decompressing an order of magnitude faster. Seemed like the most conservative change to make. I think it's worth investigating what the impact of slower algs with higher ratios would be, but for right now I settled on this. We'd probably want to look at zstd next. - I use streaming compression for this via lz4frame. This is not strictly necessary, but has the benefit of not requiring as much memory for large buffers, as well as giving us a built-in checksum, rather than relying on the much slower CRC that we were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to jump back to add the offset and compressed size, which would make the nice Code(...) method for the ScriptPreloader stuff rather more complex. Open to cleaner solutions, but moving it out just felt like extra hoops for the reader to jump through to understand without the benefit of being more concise. Differential Revision: https://phabricator.services.mozilla.com/D34652 --HG-- extra : moz-landing-system : lando
2019-10-04 23:44:59 +03:00
mDirty = true;
nsresult rv = NS_OK;
if (!mTimer)
mTimer = NS_NewTimer();
else
rv = mTimer->Cancel();
NS_ENSURE_SUCCESS(rv, rv);
// Wait for the specified timeout, then write out the cache.
mTimer->InitWithNamedFuncCallback(
StartupCache::WriteTimeout, this, STARTUP_CACHE_WRITE_TIMEOUT * 1000,
nsITimer::TYPE_ONE_SHOT, "StartupCache::WriteTimeout");
return NS_OK;
}
Bug 1614795 - use the background task queue for startupcache writes, r=dthayer,decoder Prior to this patch, the startupcache created its own mWriteThread off which it wrote to disk. It's initialized by MaybeSpawnWriteThread, which got called at shutdown, to do the shutdown write if there was any reason to do so, and from a timer that is re-initialized after every addition to the startup cache, to run 60s after the last change to the cache. It then joined that write thread on the main thread (in other words, blocks on that off-main-thread write completing from the main thread) when: - xpcom-shutdown fired - the startupcache itself gets destroyed - someone calls any of: * HasEntry * GetBuffer * PutBuffer * InvalidateCache This patch removes the separate write thread, and instead dispatches a task to the background task queue, indicating it can block. The task is started in the same circumstances where we previously used to write (timer from the last PutBuffer call, and shutdown if necessary). To ensure it cannot be trying to use the data it writes out (mTable) from the other thread while that data changes on the main thread, we use a mutex. The task locks the mutex before starting, and unlocks when finished. Enumerating the cases that we used to block on joining the thread: In terms of application shutdown, we expect the background task queue to either finish the write task, or fail to run it if it hasn't started it yet. In the FastStartup case, we check if a write was necessary; if so, we attempt to gain the lock without waiting. If we're successful, the write has not yet started, and we instead run the write on the main thread. Otherwise, we retry gaining the lock, blocking this time, thus guaranteeing the off-the-main-thread write completes. The task keeps a reference to the startupcache object, so it cannot be destroyed while the task is pending. Because the write does not modify `mTable`, and neither does `HasEntry`, we do not need to do anything there. In the `GetBuffer` case, we do not modify the table unless we have to read the entry off disk (memmapped into `mCacheData`). This can only happen if `mCacheData.initialized()` returns true, and we specifically call `mCacheData.reset()` before firing off the write task to avoid this. `mCacheData` is only re-initialized if someone calls `LoadArchive()`, which can only happen from `Init()` (which is guaranteed not to run again because this is a singleton), or `InvalidateCache()`, where we lock the mutex (see below). So this is safe - but we assert on the lock to try and avoid people breaking this chain of assumptions in the future. When `PutBuffer` is called, we try to lock the mutex - but if locking fails (ie the background thread is writing), we simply fail to store the entry in the startupcache. In practice, this should be rare - it'd happen if new calls to PutBuffer happen while writing during shutdown (when really, we don't care) or when it's been 60 seconds since the last PutBuffer so we started writing the startupcache. When InvalidateCache is called, we lock the mutex - we shouldn't try to write while invalidating, or invalidate while writing. This may be slow, but in practice nothing should call `InvalidateCache` except developer restarts or the `-purgecaches` commandline flag, so it shouldn't matter a great deal. Differential Revision: https://phabricator.services.mozilla.com/D70413 --HG-- extra : moz-landing-system : lando
2020-04-15 23:43:44 +03:00
// Used only in tests:
bool StartupCache::StartupWriteComplete() {
// Need to have written to disk and not added new things since;
return !mDirty && mWrittenOnce;
}
// StartupCacheDebugOutputStream implementation
#ifdef DEBUG
NS_IMPL_ISUPPORTS(StartupCacheDebugOutputStream, nsIObjectOutputStream,
nsIBinaryOutputStream, nsIOutputStream)
bool StartupCacheDebugOutputStream::CheckReferences(nsISupports* aObject) {
nsresult rv;
nsCOMPtr<nsIClassInfo> classInfo = do_QueryInterface(aObject);
if (!classInfo) {
NS_ERROR("aObject must implement nsIClassInfo");
return false;
}
uint32_t flags;
rv = classInfo->GetFlags(&flags);
NS_ENSURE_SUCCESS(rv, false);
if (flags & nsIClassInfo::SINGLETON) return true;
bool inserted = mObjectMap->EnsureInserted(aObject);
if (!inserted) {
NS_ERROR(
"non-singleton aObject is referenced multiple times in this"
"serialization, we don't support that.");
}
return inserted;
}
// nsIObjectOutputStream implementation
nsresult StartupCacheDebugOutputStream::WriteObject(nsISupports* aObject,
bool aIsStrongRef) {
nsCOMPtr<nsISupports> rootObject(do_QueryInterface(aObject));
NS_ASSERTION(rootObject.get() == aObject,
"bad call to WriteObject -- call WriteCompoundObject!");
bool check = CheckReferences(aObject);
NS_ENSURE_TRUE(check, NS_ERROR_FAILURE);
return mBinaryStream->WriteObject(aObject, aIsStrongRef);
}
nsresult StartupCacheDebugOutputStream::WriteSingleRefObject(
nsISupports* aObject) {
nsCOMPtr<nsISupports> rootObject(do_QueryInterface(aObject));
NS_ASSERTION(rootObject.get() == aObject,
"bad call to WriteSingleRefObject -- call WriteCompoundObject!");
bool check = CheckReferences(aObject);
NS_ENSURE_TRUE(check, NS_ERROR_FAILURE);
return mBinaryStream->WriteSingleRefObject(aObject);
}
nsresult StartupCacheDebugOutputStream::WriteCompoundObject(
nsISupports* aObject, const nsIID& aIID, bool aIsStrongRef) {
nsCOMPtr<nsISupports> rootObject(do_QueryInterface(aObject));
nsCOMPtr<nsISupports> roundtrip;
rootObject->QueryInterface(aIID, getter_AddRefs(roundtrip));
NS_ASSERTION(roundtrip.get() == aObject,
"bad aggregation or multiple inheritance detected by call to "
"WriteCompoundObject!");
bool check = CheckReferences(aObject);
NS_ENSURE_TRUE(check, NS_ERROR_FAILURE);
return mBinaryStream->WriteCompoundObject(aObject, aIID, aIsStrongRef);
}
nsresult StartupCacheDebugOutputStream::WriteID(nsID const& aID) {
return mBinaryStream->WriteID(aID);
}
char* StartupCacheDebugOutputStream::GetBuffer(uint32_t aLength,
uint32_t aAlignMask) {
return mBinaryStream->GetBuffer(aLength, aAlignMask);
}
void StartupCacheDebugOutputStream::PutBuffer(char* aBuffer, uint32_t aLength) {
mBinaryStream->PutBuffer(aBuffer, aLength);
}
#endif // DEBUG
} // namespace scache
} // namespace mozilla