gecko-dev/js/xpconnect/loader/AutoMemMap.h

Bug 1359653: Part 5 - Pre-load scripts needed during startup in a background thread. r=shu,erahm

One of the things that I've noticed in profiling startup overhead is that, even with the startup cache, we spend about 130ms just loading and decoding scripts from the startup cache on my machine. I think we should be able to do better than that by doing some of that work in the background for scripts that we know we'll need during startup.

With this change, we seem to consistently save about 3-5% on non-e10s startup overhead on talos. But there's a lot of room for tuning, and I think we can get considerable further improvement with a few ongoing tweaks.

Some notes about the approach:

- Setting up the off-thread compile is fairly expensive, since we need to create a global object, and many of its built-in prototype objects, for each compile. So in order for there to be a performance improvement for off-main-thread (OMT) compiles, the script has to be pretty large. Right now, the tipping point seems to be about 20K. There's currently no easy way to improve the per-compile setup overhead, but we should be able to combine the off-thread compiles for multiple smaller scripts into a single operation without any additional per-script overhead.

- The time we spend setting up scripts for OMT compile is almost entirely CPU-bound. That means that we have a chunk of about 20-50ms where we can safely schedule thread-safe IO work during early startup, so if we schedule some of our current synchronous IO operations on background threads during the script cache setup, we basically get them for free, and can probably increase the number of scripts we compile in the background.

- I went with an uncompressed mmap of the raw XDR data for a storage format. That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so compressing it might save some startup disk IO, but keeping it uncompressed simplifies much of the OMT, and even the main-thread, decoding process. More importantly:

- We currently don't use the startup cache in content processes, for a variety of reasons. However, with this approach, I think we can safely store the cached script data from a content process before we load any untrusted code into it, and then share mmapped startup cache data between all content processes. That should speed up content process startup *a lot*, and very likely save memory, too. And:

- If we're especially concerned about saving per-process memory, and we keep the cache data mapped for the lifetime of the JS runtime, I think that with some effort we can probably share the static string data from scripts between content processes, without any copying. Right now, it looks like for the main process, there's about 1.5MB of string-ish data in the XDR dumps. It's probably less for content processes, but if we could save 0.5MB per process this way, it might make it easier to increase the number of content processes we allow.

MozReview-Commit-ID: CVJahyNktKB

--HG--
extra : source : 1c7df945505930d2d86a076ee20807104324c8cc
extra : histedit_source : 75e193839edf727874f01b2a9f6852f6c1f087fb%2C3ce966d7dcf2bd0454a7d673d0467097456bd782
2017-05-06 22:24:22 +03:00
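The commit message suggests amortizing the fixed off-thread setup cost by combining several small scripts into one off-thread operation. As a rough illustration of only the grouping step, here is a hypothetical C++ sketch: `CachedScript`, `BatchPlan`, and `planOffThreadBatches` are invented names, the byte threshold is a parameter (the commit reports a tipping point of about 20K), and sending undersized leftovers back to the main thread is an assumption of this sketch, not something the commit describes. It does not use any real SpiderMonkey off-thread API.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record for a script found in the startup cache; only the size
// of its cached data matters for planning.
struct CachedScript {
    std::string url;
    size_t size;
};

// Hypothetical plan: which scripts go to off-thread jobs, and which are left
// for the main thread.
struct BatchPlan {
    std::vector<std::vector<CachedScript>> offThread;  // one entry per job
    std::vector<CachedScript> mainThread;              // undersized leftovers
};

// Greedily groups scripts so that every off-thread job carries at least
// |minBatchBytes| of data. A script at or above the threshold gets its own
// job; smaller scripts accumulate until their combined size reaches it.
// Whatever is still below the threshold at the end goes to the main thread.
BatchPlan planOffThreadBatches(const std::vector<CachedScript>& scripts,
                               size_t minBatchBytes) {
    assert(minBatchBytes > 0);  // a zero threshold would make batching moot
    BatchPlan plan;
    std::vector<CachedScript> pending;
    size_t pendingBytes = 0;
    for (const CachedScript& script : scripts) {
        if (script.size >= minBatchBytes) {
            plan.offThread.push_back(std::vector<CachedScript>{script});
            continue;
        }
        pending.push_back(script);
        pendingBytes += script.size;
        if (pendingBytes >= minBatchBytes) {
            plan.offThread.push_back(std::move(pending));
            pending.clear();  // a moved-from vector is valid but unspecified
            pendingBytes = 0;
        }
    }
    plan.mainThread = std::move(pending);
    return plan;
}
```

With a threshold of 20 * 1024 bytes (reading "20K" as KiB, which is itself an assumption), a 50,000-byte script is scheduled alone, scripts of 8,000, 9,000 and 7,000 bytes are grouped into one job, and a trailing 1,000-byte script stays on the main thread.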

/* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 4; -*- */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#ifndef loader_AutoMemMap_h
#define loader_AutoMemMap_h

#include "mozilla/FileUtils.h"
#include "mozilla/MemoryReporting.h"
#include "mozilla/RangedPtr.h"
#include "mozilla/Result.h"
#include "mozilla/ipc/FileDescriptor.h"
#include "nsIMemoryReporter.h"

#include <prio.h>

class nsIFile;

namespace mozilla {
namespace loader {

using mozilla::ipc::FileDescriptor;

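class AutoMemMap
{
public:
    AutoMemMap() = default;

    ~AutoMemMap();

    // Opens |file| with the given NSPR |flags| and |mode|, then maps it
    // with protection |prot|.
    Result<Ok, nsresult>
    init(nsIFile* file, int flags = PR_RDONLY, int mode = 0,
         PRFileMapProtect prot = PR_PROT_READONLY);

    // Maps the file referred to by an IPC file descriptor, such as one
    // sent by another process.
    Result<Ok, nsresult>
    init(const ipc::FileDescriptor& file);
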
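    bool initialized() { return addr; }

    uint32_t size() const { MOZ_ASSERT(fd); return size_; }

    // Returns the mapping as a RangedPtr, which checks (in debug builds)
    // that pointer arithmetic stays within the mapped range.
    template<typename T = void>
    const RangedPtr<T> get()
    {
        MOZ_ASSERT(addr);
        return { static_cast<T*>(addr), size_ };
    }

    template<typename T = void>
    const RangedPtr<T> get() const
    {
        MOZ_ASSERT(addr);
        return { static_cast<T*>(addr), size_ };
    }

    size_t nonHeapSizeOfExcludingThis() { return size_; }

    // Returns a FileDescriptor for the underlying file, e.g. to share the
    // mapping with another process.
    FileDescriptor cloneFileDescriptor();
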
private:
    Result<Ok, nsresult> initInternal(PRFileMapProtect prot = PR_PROT_READONLY);

    AutoFDClose fd;
    PRFileMap* fileMap = nullptr;

    uint32_t size_ = 0;
    void* addr = nullptr;

    AutoMemMap(const AutoMemMap&) = delete;
    void operator=(const AutoMemMap&) = delete;
};

} // namespace loader
} // namespace mozilla

#endif // loader_AutoMemMap_h
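The header above only declares the class. As a rough, self-contained illustration of the same ownership pattern (own a descriptor and a read-only mapping, release both in the destructor, forbid copies, hand out a typed pointer, duplicate the descriptor for sharing), here is a minimal sketch written against plain POSIX `open`/`fstat`/`mmap`/`dup` rather than NSPR. The class `PosixMemMap` and its methods are hypothetical; Gecko's version, judging by the `PRFileMap*` member and the `PRFileMapProtect` parameter, goes through NSPR's file-mapping API and reports errors as `Result<Ok, nsresult>`.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical POSIX analogue of AutoMemMap: owns a file descriptor and a
// read-only mapping of the whole file, and releases both in the destructor.
class PosixMemMap {
public:
    PosixMemMap() = default;
    ~PosixMemMap() { reset(); }

    // Like AutoMemMap, the mapping is not copyable.
    PosixMemMap(const PosixMemMap&) = delete;
    PosixMemMap& operator=(const PosixMemMap&) = delete;

    // Opens |path| read-only and maps it. Returns false on any error.
    bool init(const char* path) {
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) {
            return false;
        }
        return initFromFd(fd);
    }

    // Takes ownership of an open descriptor (e.g. one received from another
    // process) and maps the whole file read-only. Empty files are rejected
    // because mmap() does not accept a zero length.
    bool initFromFd(int fd) {
        reset();
        fd_ = fd;
        struct stat st;
        if (::fstat(fd_, &st) != 0 || st.st_size <= 0) {
            reset();
            return false;
        }
        size_ = static_cast<size_t>(st.st_size);
        void* p = ::mmap(nullptr, size_, PROT_READ, MAP_SHARED, fd_, 0);
        if (p == MAP_FAILED) {
            reset();
            return false;
        }
        addr_ = p;
        return true;
    }

    bool initialized() const { return addr_ != nullptr; }
    size_t size() const { return size_; }

    // Returns the start of the mapping as a read-only T*.
    template <typename T = void>
    const T* get() const {
        assert(addr_ != nullptr);
        return static_cast<const T*>(addr_);
    }

    // Returns a new descriptor for the same open file (the caller owns it),
    // or -1 if nothing is open or dup() fails.
    int cloneFd() const { return fd_ >= 0 ? ::dup(fd_) : -1; }

private:
    // Unmaps and closes whatever is currently held.
    void reset() {
        if (addr_ != nullptr) {
            ::munmap(addr_, size_);
            addr_ = nullptr;
        }
        if (fd_ >= 0) {
            ::close(fd_);
            fd_ = -1;
        }
        size_ = 0;
    }

    int fd_ = -1;
    size_t size_ = 0;
    void* addr_ = nullptr;
};
```

Here `cloneFd()` stands in for `cloneFileDescriptor()`: once the duplicate descriptor has been transferred to another process (for example over IPC), that process can call `initFromFd()` and map the same read-only file, which is the kind of sharing the commit message proposes for content processes.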