Currently the Gecko Profiler defines a moderate amount of stuff when
MOZ_GECKO_PROFILER is undefined. It also #includes various headers, including
JS ones. This is making it difficult to separate Gecko's media stack for
inclusion in Servo.
This patch greatly simplifies how things are exposed. The starting point is:
- GeckoProfiler.h can be #included unconditionally;
- everything else from the profiler must be guarded by MOZ_GECKO_PROFILER.
In practice this introduces way too many #ifdefs, so the patch loosens it by
adding no-op macros for a number of the most common operations.
The net result is that #ifdefs and macros are used a bit more, but almost
nothing is exposed in non-MOZ_GECKO_PROFILER builds (including
ProfilerMarkerPayload.h and GeckoProfiler.h), and understanding what is exposed
is much simpler than before.
Note also that in BHR, ThreadStackHelper is now entirely absent in
non-MOZ_GECKO_PROFILER builds.
(Path is actually r=froydnj.)
Bug 1400459 devirtualized nsIAtom so that it is no longer a subclass of
nsISupports. This means that nsAtom is now a better name for it than nsIAtom.
MozReview-Commit-ID: 91U22X2NydP
--HG--
rename : xpcom/ds/nsIAtom.h => xpcom/ds/nsAtom.h
extra : rebase_source : ac3e904a21b8b48e74534fff964f1623ee937c67
Adds a new directory provider key "XRESysExtDev" to be used by system extension
developers needing to load system extensions from a directory readable by
sandboxed content processes.
MozReview-Commit-ID: 4BKOZoPzCC3
--HG--
extra : rebase_source : 452db8d53a1f0248a080f858c48492978b5db808
Adds a new directory provider key "XRESysExtDev" to be used by system extension
developers needing to load system extensions from a directory readable by
sandboxed content processes.
MozReview-Commit-ID: 4BKOZoPzCC3
--HG--
extra : rebase_source : 8def79f66944f03943ea082dc3bbe746b7382010
We would like to be able to see if a given hang in BHR occurred
under high CPU load, as this is an indication that the hang is
of less use to us, since it's likely that the external CPU use
is more responsible for it.
The way this works is fairly simple. We get the system CPU usage
on a scale from 0 to 1, and we get the current process's CPU
usage, also on a scale from 0 to 1, and we subtract the latter
from the former. We then compare this value to a threshold, which
is 1 - (1 / p), where p is the number of (virtual) cores on the
machine. This threshold might need to be tuned, so that we
require an entire physical core in order to not annotate the hang,
but for now it seemed the most reasonable line in the sand.
I should note that this considers CPU usage in child or parent
processes as external. While we are responsible for that CPU usage,
it still indicates that the stack we receive from BHR is of little
value to us, since the source of the actual hang is external to
that stack.
MozReview-Commit-ID: JkG53zq1MdY
--HG--
extra : rebase_source : 16553a9b5eac0a73cd1619c6ee01fa177ca60e58
The FileLocation(nsIZipArchive*) constructor is declared, but not actually
implemented, so attempting to use it causes a linking error.
Additionally, when a FileLocation is created from an existing zip archive
(such as one from the zip cache or the Omnijar service), it's helpful to have
direct access to that archive rather than having to open a new copy, or infer
it from the path.
MozReview-Commit-ID: 2U14gAm0FYL
--HG--
extra : rebase_source : 97a420f5cc1a97a24debee3764989916be79bcaf
To mitigate risk for beta uplift, the logic here is limited to what we need for QueryDosDeviceW on Win7x64. A better long-term fix would be to combine this with the more general mov logic in the REX.W section.
MozReview-Commit-ID: BykQSYY61Ua
--HG--
extra : rebase_source : 84ad8ff28865cb04e21d6234a09b06202ca4d363
We should not be declaring forward declarations for nsString classes directly,
instead we should use nsStringFwd.h. This will make changing the underlying
types easier.
--HG--
extra : rebase_source : b2c7554e8632f078167ff2f609392e63a136c299
Revert revision f760842b14a2, 051b765ca8f2 and 01125b5142e5 since the original
bug that we run out of TLS slots on Windows is no longer showing up after
firefox55. It should have been fixed elsewhere, very likely in the rust part.
MozReview-Commit-ID: 9j5hFSGT3OE
MozStackWalk() is different on Windows to the other platforms. It has two extra
arguments, which can be used to walk the stack of a different thread.
This patch makes those differences clearer. Instead of having a single function
and forbidding those two arguments on non-Windows, it removes those arguments
from MozStackWalk, and splits off MozStackWalkThread() which retains them. This
also allows those arguments to have more appropriate types (HANDLE instead of
uintptr_t; CONTEXT* instead of than void*) and names (aContext instead of
aPlatformData).
The patch also removes unnecessary reinterpret_casts for the aClosure argument
at a couple of MozStackWalk() callsites.
--HG--
extra : rebase_source : 111ab7d6426d7be921facc2264f6db86c501d127
As well as the straightforward things, this lets us remove ReadSysFile and
WriteSysFile, which in turn lets us remove TestFileUtils.cpp.
--HG--
extra : rebase_source : fc90c05352e654ffc41009d8504a9c54f394fc3f
This patch gives some structure and order to the profiler's API.
It also renames AutoProfilerRegister as AutoProfilerRegisterThread, to match
profiler_register_thread().
PROFILER_MARKER is now just a trivial wrapper for profiler_add_marker(). This
patch removes it.
--HG--
extra : rebase_source : 9858f34763bb343757896a91ab7ad8bd8e56b076
This patch reduces the differences between builds where the profiler is enabled
and those where the profiler is disabled. It does this by removing numerous
MOZ_GECKO_PROFILER checks.
These changes have the following consequences.
- Various functions and classes are now defined in all builds, and so can be
used unconditionally: profiler_add_marker(), profiler_set_js_context(),
profiler_clear_js_context(), profiler_get_pseudo_stack(), AutoProfilerLabel.
(They are effectively no-ops in non-profiler builds, of course.)
- The no-op versions of PROFILER_* are now gone. The remaining versions are
almost no-ops when the profiler isn't built.
--HG--
extra : rebase_source : 8fb5e8757600210c2f77865694d25162f0b7698a
xpcom/build/PoisonIOInterposerBase.cpp:79:21 [-Wunused-member-function] unused member function 'Clear'
MozReview-Commit-ID: ITI1mZk0DTx
--HG--
extra : source : 77e0683d430bac85376b476a783862db7acc384c
extra : intermediate-source : 935b7d158e4a1d2727998b0c2e7a9122ebefa257
The code already handles this if the r32 is eax. This allows it to use the other 32-bit registers.
--HG--
extra : histedit_source : 1cff5b54640cc48a0574b0b4323ad909e8a7e7b2
One of the things that I've noticed in profiling startup overhead is that,
even with the startup cache, we spend about 130ms just loading and decoding
scripts from the startup cache on my machine.
I think we should be able to do better than that by doing some of that work in
the background for scripts that we know we'll need during startup. With this
change, we seem to consistently save about 3-5% on non-e10s startup overhead
on talos. But there's a lot of room for tuning, and I think we get some
considerable improvement with a few ongoing tweeks.
Some notes about the approach:
- Setting up the off-thread compile is fairly expensive, since we need to
create a global object, and a lot of its built-in prototype objects for each
compile. So in order for there to be a performance improvement for OMT
compiles, the script has to be pretty large. Right now, the tipping point
seems to be about 20K.
There's currently no easy way to improve the per-compile setup overhead, but
we should be able to combine the off-thread compiles for multiple smaller
scripts into a single operation without any additional per-script overhead.
- The time we spend setting up scripts for OMT compile is almost entirely
CPU-bound. That means that we have a chunk of about 20-50ms where we can
safely schedule thread-safe IO work during early startup, so if we schedule
some of our current synchronous IO operations on background threads during the
script cache setup, we basically get them for free, and can probably increase
the number of scripts we compile in the background.
- I went with an uncompressed mmap of the raw XDR data for a storage format.
That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so
compressing it might save some startup disk IO, but keeping it uncompressed
simplifies a lot of the OMT and even main thread decoding process, but, more
importantly:
- We currently don't use the startup cache in content processes, for a variety
of reasons. However, with this approach, I think we can safely store the
cached script data from a content process before we load any untrusted code
into it, and then share mmapped startup cache data between all content
processes. That should speed up content process startup *a lot*, and very
likely save memory, too. And:
- If we're especially concerned about saving per-process memory, and we keep
the cache data mapped for the lifetime of the JS runtime, I think that with
some effort we can probably share the static string data from scripts between
content processes, without any copying. Right now, it looks like for the main
process, there's about 1.5MB of string-ish data in the XDR dumps. It's
probably less for content processes, but if we could save .5MB per process
this way, it might make it easier to increase the number of content processes
we allow.
MozReview-Commit-ID: CVJahyNktKB
--HG--
extra : source : 1c7df945505930d2d86a076ee20807104324c8cc
extra : histedit_source : 75e193839edf727874f01b2a9f6852f6c1f087fb%2C3ce966d7dcf2bd0454a7d673d0467097456bd782
One of the things that I've noticed in profiling startup overhead is that,
even with the startup cache, we spend about 130ms just loading and decoding
scripts from the startup cache on my machine.
I think we should be able to do better than that by doing some of that work in
the background for scripts that we know we'll need during startup. With this
change, we seem to consistently save about 3-5% on non-e10s startup overhead
on talos. But there's a lot of room for tuning, and I think we get some
considerable improvement with a few ongoing tweeks.
Some notes about the approach:
- Setting up the off-thread compile is fairly expensive, since we need to
create a global object, and a lot of its built-in prototype objects for each
compile. So in order for there to be a performance improvement for OMT
compiles, the script has to be pretty large. Right now, the tipping point
seems to be about 20K.
There's currently no easy way to improve the per-compile setup overhead, but
we should be able to combine the off-thread compiles for multiple smaller
scripts into a single operation without any additional per-script overhead.
- The time we spend setting up scripts for OMT compile is almost entirely
CPU-bound. That means that we have a chunk of about 20-50ms where we can
safely schedule thread-safe IO work during early startup, so if we schedule
some of our current synchronous IO operations on background threads during the
script cache setup, we basically get them for free, and can probably increase
the number of scripts we compile in the background.
- I went with an uncompressed mmap of the raw XDR data for a storage format.
That currently occupies about 5MB of disk space. Gzipped, it's ~1.2MB, so
compressing it might save some startup disk IO, but keeping it uncompressed
simplifies a lot of the OMT and even main thread decoding process, but, more
importantly:
- We currently don't use the startup cache in content processes, for a variety
of reasons. However, with this approach, I think we can safely store the
cached script data from a content process before we load any untrusted code
into it, and then share mmapped startup cache data between all content
processes. That should speed up content process startup *a lot*, and very
likely save memory, too. And:
- If we're especially concerned about saving per-process memory, and we keep
the cache data mapped for the lifetime of the JS runtime, I think that with
some effort we can probably share the static string data from scripts between
content processes, without any copying. Right now, it looks like for the main
process, there's about 1.5MB of string-ish data in the XDR dumps. It's
probably less for content processes, but if we could save .5MB per process
this way, it might make it easier to increase the number of content processes
we allow.
MozReview-Commit-ID: CVJahyNktKB
--HG--
extra : rebase_source : 2ec24c8b0000f9187a9bf4a096ee8d93403d7ab2
extra : absorb_source : bb9d799d664a03941447a294ac43c54f334ef6f5
NS_SetCurrentThreadName() is added as an alternative to PR_SetCurrentThreadName()
inside libxul. The thread names are collected in the form of crash annotation to
be processed on socorro.
MozReview-Commit-ID: 4RpAWzTuvPs
Separate AbstractThread::InitTLS and
AbstractThread::InitMainThread. Init AbstractThread main thread when
init nsThreadManager. Init AbstractThread TLS for all content process
types because for plugin and gmp processes we are doing IPC even
without init XPCOM and for content process init XPCOM requires IPC.
MozReview-Commit-ID: DhLub23oZz8
--HG--
extra : rebase_source : 6e4bfa03ec69e1eb694924903f1fa5e7259cbba3
This tracks TlsAlloc() and TlsFree() calls on Windows for diagnosing crashes when a proces reaches
its limit (1088) for TLS slots. Tracking of TLS allocation is done by intercepting TlsAlloc() and
TlsFree() in kernel32.dll. After initialization, we start tracking the number of allocated TLS
slots. If the number of observed TLS allocations exceeds a high water mark, we record the stack
when TlsAlloc() is called, and the recorded stacks gets serialized in a JSON string ready for
crash annotation.
MozReview-Commit-ID: 5fHVr0eiMy5
This change moves us away from NSPR primitives for our primary
synchronization primitives. We're still using PRMonitor for
ReentrantMonitor, however.
The benefits of this change:
* Slightly faster, as we don't have to deal with some of NSPR's overhead;
* Smaller datatypes. On POSIX platforms in particular, PRLock is
enormous. PRCondVar also has some unnecessary overhead.
* Less dynamic memory allocation. Out of necessity, Mutex and CondVar
allocated the NSPR data structures they needed, which lead to
unnecessary checks for failure.
While sizeof(Mutex) and sizeof(CondVar) may get bigger, since they're
embedding structures now, the total memory usage should be less.
* Less NSPR usage. This shouldn't need any explanation.
PseudoContext::sampleContext() is always called immediately after
profiler_get_pseudo_stack(). This patch introduces profiler_set_js_context()
and profiler_clear_js_context(), which replace the profiler_get_pseudo_stack()
+ sampleContext() pairs. This takes us a step closer to not having to export
PseudoStack outside the profiler.
--HG--
extra : rebase_source : 8558d1600eafd395cc696d31f3d21fb52a1a74b0
This tracks TlsAlloc() and TlsFree() calls on Windows for diagnosing crashes when a proces reaches
its limit (1088) for TLS slots. Tracking of TLS allocation is done by intercepting TlsAlloc() and
TlsFree() in kernel32.dll. After initialization, we start tracking the number of allocated TLS
slots. If the number of observed TLS allocations exceeds a high water mark, we record the stack
when TlsAlloc() is called, and the recorded stacks gets serialized in a JSON string ready for
crash annotation.
MozReview-Commit-ID: 5fHVr0eiMy5
Bug 1343075 - 1a. Add TextEventDispatcherListener::GetIMEUpdatePreference; r=masayuki
Add a GetIMEUpdatePreference method to TextEventDispatcherListener to
optionally control which IME notifications are received by NotifyIME.
This patch also makes nsBaseWidget forward its GetIMEUpdatePreference
call to the widget's native TextEventDispatcherListener.
Bug 1343075 - 1b. Implement GetIMEUpdatePreference for all TextEventDispatcherListener; r=masayuki
This patch implements GetIMEUpdatePreference for all
TextEventDispatcherListener implementations, by moving previous
implementations of nsIWidget::GetIMEUpdatePreference.
Bug 1343075 - 2. Allow setting a PuppetWidget's native TextEventDispatcherListener; r=masayuki
In PuppetWidget, add getter and setter for the widget's native
TextEventDispatcherListener. This allows overriding of PuppetWidget's
default IME handling. For example, on Android, the PuppetWidget's native
TextEventDispatcherListener will communicate directly with Java IME code
in the main process.
Bug 1343075 - 3. Add AIDL interface for main process; r=rbarker
Add AIDL definition and implementation for an interface for the main
process that child processes can access.
Bug 1343075 - 4. Set Gecko thread JNIEnv for child process; r=snorp
Add a JNIEnv* parameter to XRE_SetAndroidChildFds, which is used to set
the Gecko thread JNIEnv for child processes. XRE_SetAndroidChildFds is
the only Android-specific entry point for child processes, so I think
it's the most logical place to initialize JNI.
Bug 1343075 - 5. Support multiple remote GeckoEditableChild; r=esawin
Support remote GeckoEditableChild instances that are created in the
content processes and connect to the parent process GeckoEditableParent
through binders.
Support having multiple GeckoEditableChild instances in GeckoEditable by
keeping track of which child is currently focused, and only allow
calls to/from the focused child by using access tokens.
Bug 1343075 - 6. Add method to get GeckoEditableParent instance; r=esawin
Add IProcessManager.getEditableParent, which a content process can call
to get the GeckoEditableParent instance that corresponds to a given
content process tab, from the main process.
Bug 1343075 - 7. Support GeckoEditableSupport in content processes; r=esawin
Support creating and running GeckoEditableSupport attached to a
PuppetWidget in content processes.
Because we don't know PuppetWidget's lifetime as well as nsWindow's,
when attached to PuppetWidget, we need to attach/detach our native
object on focus/blur, respectively.
Bug 1343075 - 8. Connect GeckoEditableSupport on PuppetWidget creation; r=esawin
Listen to the "tab-child-created" notification and attach our content
process GeckoEditableSupport to the new PuppetWidget.
Bug 1343075 - 9. Update auto-generated bindings; r=me
The profiler can currently handle nested calls to profiler_{init,shutdown}() --
only the first call to profiler_init() and the last call to profiler_shutdown()
do anything. And sure enough, we have the following.
- Outer init/shutdown pairs in XRE_main()/XRE_InitChildProcess() (via
GeckoProfilerInitRAII).
- Inner init/shutdown pairs in NS_InitXPCOM2()/NS_InitMinimalXPCOM() (both shut
down in ShutdownXPCOM()).
This is a bit silly, so the patch removes the inner pairs, and adds a
now-needed pair in XRE_XPCShellMain. This will allow gInitCount -- which tracks
the nesting depth -- to be removed in a future patch.
--HG--
extra : rebase_source : 7e8dc6ce81ce10269d2db6a7bf32852c396dba0e
Hook this into the browser via the XREAppData. This patch does not include the changes to Chromium source code.
--HG--
extra : rebase_source : 4d5637bcdbeae605b0b99e9192598d48f371b698
Added ASSERTions to nsWindowsDllInterceptor in case of a failed detour hook, with an exception for the RET opcode that appears in ImmReleaseContext. Added documentation about TestDllInterceptor.
--HG--
extra : rebase_source : a3c6fe0949f5503979a062bdaa5f35526ddee73b
This includes a near-jump CALL instruction in x64, which expands to a far-jump CALL with a 64-bit address as inline data. This requires us to abandon the method where we memcpy the code block into the trampoline and, instead, build the trampoline function as we go.
--HG--
extra : rebase_source : 7f90ce5ba1a82dff731aff1ac17117c684b7b2cf
Hook this into the browser via the XREAppData. This patch does not include the changes to Chromium source code.
--HG--
extra : rebase_source : e34e8b50101cc40ded26e80791052123b24c8243
extra : histedit_source : 69c9b2dc91546adbfdad03b5d43842809191ffb9
Added ASSERTions to nsWindowsDllInterceptor in case of a failed detour hook, with an exception for the RET opcode that appears in ImmReleaseContext. Added documentation about TestDllInterceptor.
--HG--
extra : rebase_source : 48a38a09a1feb63600e12eba997a83f646cd1595
extra : histedit_source : 566cec5c47c400402e2e4dfa0cdc6d53d82b0815
This includes a near-jump CALL instruction in x64, which expands to a far-jump CALL with a 64-bit address as inline data. This requires us to abandon the method where we memcpy the code block into the trampoline and, instead, build the trampoline function as we go.
--HG--
extra : rebase_source : f0362c4b8200ba3d05191fdd45c5783dccd444bc
extra : histedit_source : 3018adf0c7d5849f87adc5e5459acf9f0e56301c
Removing the exported symbols is straightforward enough.
The only wrinkle is that PeerConnectionCtx.cpp gets compiled with and
without MOZILLA_INTERNAL_API. When compiling without
MOZILLA_INTERNAL_API, mozilla::services::Get*Service was redirected to
the _external_* symbol variants. But as the _external_* symbols no
longer exist, PeerConnectionCtx.cpp's code no longer worked.
Fortunately, PeerConnectionCtx.cpp already contains a few #ifdef
MOZILLA_INTERNAL_API blocks to handle internal/external compilation;
fixing this newest issue was just a matter of extending existing blocks
and adding new ones. The key observation is that we never added any
observers when compiling without MOZILLA_INTERNAL_API, so we can #ifdef
out the removals of observers, as those would have no effect, and simply
skip getting the observer service for the observer additions if we're
compiling without MOZILLA_INTERNAL_API.
The GMPLoader code was in plugin-container so that it was covered by
Adobe's voucher of plugin-container, but that's no longer necessary.
MozReview-Commit-ID: 3VRBAohRI9I
--HG--
extra : rebase_source : 58a30855ade14af4c4b1420edabd3abb398f232e
Also remove the #includes of some unused header files.
MozReview-Commit-ID: 6mRoIazEA3j
--HG--
extra : rebase_source : 6f96d22543509bf09b684b0bfbfa624eafc58b94
SHGetKnownFolderPath() is available on Windows Vista+ so we no longer need to GetProcAddress("SHGetKnownFolderPath"). We can set _WIN32_WINNT 0x0600 to expose the SHGetKnownFolderPath function declaration in shlobj.h and call it directly.
Also remove a redundant #include <shlobj.h>.
MozReview-Commit-ID: AoAlrfvQ5AB
--HG--
extra : rebase_source : 2fa3a0d3d122ca31fb3369a43a03b6e2c5d5dec2
Now that XRE_LibFuzzerGetFuncs is not used from outside libxul, it can
be inlined in LibFuzzerRunner::Run, simplifying things a little more.
--HG--
extra : rebase_source : 7a09f37444bf02983b232c964d85e7a866221f48