Two improvements.
- More testing of markers by introducing and using a ProfilerMarkerPayload
subclass that keeps track of how many instances have been created, streamed,
and destroyed. This change would have caught the defect that was in the first
landing of part 3 of this bug.
- More testing of JSON output, to check for various substrings that are
expected to be present.
I think all of these assertions are now unnecessary.
MozReview-Commit-ID: 9EI195QsizN
--HG--
extra : rebase_source : 4f03ef02ba6680ee6cad22d5d3f347db7d70aa9c
extra : intermediate-source : 2b8d50fcb20fc0bd808707aae00d6cbcb4536bac
extra : source : 327c145ded03d39970351a9cc01492f0541d9149
- The profiler gives the JS engine a reference to the pseudo-stack via
SetContextProfiilngStack(). That function now takes a |PseudoStack*| instead
of a |ProfileEntry*| and pointer to the stack pointer.
- PseudoStack now has a |kMaxEntries| field, which is easier to work with than
|mozilla::ArrayLength(entries)|.
- AddressOfStackPointer() is no longer needed.
- The patch also neatens up the push operations significantly. PseudoStack now
has pushCppFrame() and pushJsFrame(), which nicely encapsulate the two main
cases. These delegate to the updated initCppFrame() and initJsFrame()
functions in ProfileEntry.
- Renames max_stck in testProfileStrings.cpp as peakStackPointer, which is a
clearer name.
- Removes a couple of checks from testProfileStrings.cpp. These checks made
sense when the pseudo-stack was accessed via raw manipulation, but are not
applicable now because we can't artificially limit the maximum stack size so
easily.
This includes renaming its fields to match SpiderMonkey naming conventions
instead of Gecko naming conventions.
This patch is just about moving the code. The next patch will change
SpiderMonkey to actually use PseudoStack directly.
--HG--
extra : rebase_source : 27e77ddf950201eb6bdba60003218056442cf7ab
I think all of these assertions are now unnecessary.
MozReview-Commit-ID: 9EI195QsizN
--HG--
extra : rebase_source : ddeb80dfc61ff843b6ba4b35f73d005ca060e429
extra : source : 327c145ded03d39970351a9cc01492f0541d9149
If NotifyObservers is called on a background thread, dispatch a runnable to
the main thread which will send the observer notification. This can result in
awkward orders of the notifications if the profiler functions are called in
quick succession on both the main thread and another thread, but I'm not sure
what to do about that.
MozReview-Commit-ID: GlkVwGTa2b4
--HG--
extra : rebase_source : 628ae91c3adef85c7fb07441b6573eda55e7ba48
extra : intermediate-source : 8c96ac2d762fbed161b6e577f845c1b58ec8bc17
extra : source : b8c27d6137a43ecd031f0b58c62752b7a9cefacb
ProfileEntry has |string|, which can be static or dynamic, and |dynamicString|.
If |string| is dynamic, the FRAME_LABEL_COPY flag must be set, and it will be
copied into profiler output.
But there is only one place that uses dynamic |string| values, in SpiderMonkey.
And that place doesn't use |dynamicString|. So this patch changes that place to
use an empty |string| and put the old dynamic |string| value in
|dynamicString|. This in turn removes the need for FRAME_LABEL_COPY.
One minor wrinkle is that when |dynamicString| is used the old code put a space
between |string| and |dynamicString|. The new code omits the space if |string|
is empty.
The patch also renames ProfileEntry::string as ProfileEntry::label_, which
better matches how it's used, and ProfileEntry::dynamicString as
ProfileEntry::dynamicString_ so the getter can be renamed dynamicString().
profiler_resume() mistakenly has the PSAutoLock outside the scope created
specially for it. This patch move the PSAutoLock inside the scope, making it
the same as profiler_pause().
I think all of these assertions are now unnecessary.
MozReview-Commit-ID: 9EI195QsizN
--HG--
extra : rebase_source : 2a9969dd9e48873f6ec333a5ddfa32b6d938de80
extra : histedit_source : ea4dab2b111e465d3a1e29996cc7f10eb8cdbb67%2C8ab59878e4b1b5a715e0738408c26ac3aa0501e7
If NotifyObservers is called on a background thread, dispatch a runnable to
the main thread which will send the observer notification. This can result in
awkward orders of the notifications if the profiler functions are called in
quick succession on both the main thread and another thread, but I'm not sure
what to do about that.
MozReview-Commit-ID: GlkVwGTa2b4
--HG--
extra : rebase_source : 30537a31cc828ad7828e41c2885d84a8299a5e70
extra : source : b8c27d6137a43ecd031f0b58c62752b7a9cefacb
extra : histedit_source : f1340ef14acde098de281b81d5ea89e386e7ab42
It's a software memory barrier, and not a very strong one. If the values it is
protecting are Atomic, that provides a stronger hardware memory barrier.
This patch removes it, and changes one of the values it was protecting from
|volatile| to Atomic. (The other value it was protecting was already Atomic.)
This patch performs a refactoring to the internals of the profiler in order to
expose a function, profiler_suspend_and_sample_thread, which can be called from a
background thread to suspend, sample the native stack, and then resume the
target passed-in thread.
The interface was designed to expose as few internals of the profiler as
possible, exposing only a single callback which accepts the list of program
counters and stack pointers collected during the backtrace.
A method `profiler_current_thread_id` was also added to get the thread_id of the
current thread, which can then be passed by another thread into
profiler_suspend_sample_thread to sample the stack of that thread.
This is implemented in two parts:
1) Splitting SamplerThread into two classes: Sampler, and SamplerThread.
Sampler was created to extract the core logic from SamplerThread which manages
unix signals on android and linux, as well as suspends the target thread on all
platforms. SamplerThread was then modified to subclass this type, adding the
extra methods and fields required for the creation and management of the actual
Sampler Thread.
Some work was done to ensure that the methods on Sampler would not require
ActivePS to be present, as we intend to sample threads when the profiler is not
active for the Background Hang Reporter.
2) Moving the Tick() logic into the TickController interface.
A TickController interface was added to platform which has 2 methods: Tick and
Backtrace. The Tick method replaces the previous Tick() static method, allowing
it to be overridden by a different consumer of SuspendAndSampleAndResumeThread,
while the Backtrace() method replaces the previous MergeStacksIntoProfile
method, allowing it to be overridden by different consumers of
DoNativeBacktrace.
This interface object is then used to wrap implementation specific data, such as
the ProfilerBuffer, and is threaded through the SuspendAndSampleAndResumeThread
and DoNativeBacktrace methods.
This change added 2 virtual calls to the SamplerThread's critical section, which
I believe should be a small enough overhead that it will not affect profiling
performance. These virtual calls could be avoided using templating, but I
decided that doing so would be unnecessary.
MozReview-Commit-ID: AT48xb2asgV
Given that this basically hogs a core per Firefox process, this code only kicks in if you explicitly select a sub-millisecond resolution.
--HG--
extra : rebase_source : 58ca6f8f6537bc4b809e1634ed177c5d47fd499c
Bug 1359000 moved these functions from GeckoProfiler.h to platform.cpp, which
allowed a lot of follow-up simplifications. But it hurt performance.
This patch moves them back to GeckoProfiler.h and makes them |inline| again.
This required adding a second TLS pointer, sPseudoStack. Comments in the patch
explain why.
--HG--
extra : rebase_source : 4198e32b9e251f4014bce890936f4f85dabeb8ab
This removes the need for PROFILER_LIKELY_MEMORY_CONSTRAINED.
The patch also removes PROFILE_JAVA, USE_FAULTY_LIB, CONFIG_CASE_1,
CONFIG_CASE_2 and replaces all their uses with GP_OS_linux or GP_OS_android.
Finally, the patch removes a bogus |defined(GP_OS_darwin)| condition in
platform-linux-lul.cpp.
--HG--
extra : rebase_source : 77d1c625d65ddf551ab8cd4b962ae48c1a54466c
Currently the profiler mostly uses an array of strings to represent which
features are available and in use. This patch changes the profiler core to use
a uint32_t bitfield, which is a much simpler and faster representation.
(nsProfiler and the profiler add-on still use the array of strings, alas.) The
new ProfilerFeature type defines the values in the bitfield.
One side-effect of this change is that profiler_feature_active() now can be
used to query all features. Previously it was just a subset.
Another side-effect is that profiler_get_available_features() no longer incorrectly
indicates support for Java and stack-walking when they aren't supported. (The
handling of task tracer support is unchanged, because the old code handled it
correctly.)
- Use PROFILER_ consistently as the prefix for macros in this file. (As opposed
to PROFILE_ or SAMPLE_ or SAMPLER_ or MOZ_ or PLATFORM_ or no prefix.)
- Split overly long macros across multiple lines.
- Fix some macro indenting.
We pause/unpause the profiler before/after some streaming operations. But these
pause/unpause pairs occur with gPSMutex locked, and ActivePS::IsPaused() also
requires that gPSMutex be locked. Therefore these pause/unpause pairs cannot
be observed, and so this patch removes them.
ProfilerMarker is simple enough that it's best to fully define it in
ProfilerMarker.h, without introducing a ProfilerMarker.cpp.
This requires moving STORE_SEQUENCER() into its own header, StoreSequencer.h.
As a result, the following types are no longer visible outside the profiler:
ProfilerMarker, ProfilerLinkedList, ProfilerMarkerLinkedList,
ProfilerSignalSafeLinkedList. (PseudoStack.h now contains the PseudoStack class
and nothing else.)
The patch also makes the following non-obvious changes.
- It changes ProfilerMarker::{mMarkerName,mPayload} to unique pointers, which
removes the need for an explicit ~ProfilerMarker().
- It removes ProfilerMarker::GetMarkerName(), because that method is only used
within ProfilerMarker itself.
--HG--
extra : rebase_source : 22bdfb1c9c30751253ed66352d7edd51d308152d
Because ProfilerMarkerPayload is the main type defined in these files, and
because the next patch is going to introduce ProfilerMarker.{h,cpp}, which
would be confusingly similar to the old names.
--HG--
rename : tools/profiler/core/ProfilerMarkers.cpp => tools/profiler/core/ProfilerMarkerPayload.cpp
rename : tools/profiler/public/ProfilerMarkers.h => tools/profiler/public/ProfilerMarkerPayload.h
extra : rebase_source : df22a2ab3867650348ae78fe959ff0366aff230b
None of the accesses to these fields occur in hot operations, so it's
reasonable to do them with gPSMutex held. As a result, mJSSampling doesn't need
to be Atomic<>, and mContext's lack of Atomic-ness is no longer a cause for
concern.
This required adding an extra field, mJSContext, to TickSample.
--HG--
extra : rebase_source : 1485de5e493cef655233507248006574d0ab6ebd
PseudoStack is misnamed: it contains the pseudo-stack plus other stuff that is
accessed via TLS.
This patch moves the non-pseudo-stack parts of PseudoStack into a new type
called RacyThreadInfo, which is a subclass of PseudoStack. The patch also
capitalizes the first letter of the names of methods that it moves.
This means that PseudoStack is now accurately named. Also non-pseudo-stack
parts are now no longer visible outside the profiler, which is nice.
--HG--
extra : rebase_source : c110acfb6d2a1527ed33cc073fab3fb188851b22
Currently a reference to each thread's PseudoStack is stored in tlsPseudoStack.
This patch changes the TLS reference to refer to the enclosing ThreadInfo
instead. This allows profiler_clear_js_context() to access the current thread's
ThreadInfo via TLs, rather than searching with FindLiveThreadInfo().
The patch also encapsulates the TLS within a new class called TLSInfo. This
class allows access to the PseudoStack without protection from gPSMutex, but
access to the enclosing ThreadInfo requires a PSLockRef. This maintains the
current access regime.
--HG--
extra : rebase_source : de9967f6c055061bb65930ffd02e369703b1362e
This possibly incurs an extra function call (depends on exactly how much inling
the compiler does). But it helps enormously with subsequent refactorings,
because PseudoStack (and related types) don't need to be visible in
GeckoProfiler.h, which is exported outside the profiler.
--HG--
extra : rebase_source : f2dc5952d7444dfe12e627e86e6d37632b283107
This patch moves the manual polling up into the preceding loops, which is a
better place for it.
--HG--
extra : rebase_source : c95932d8f66635b9ca435f30bae78585dd7e04ca
PS contains some state that is always valid, and some state that is only valid
when the profiler is active. This patch does the following.
- Splits PS into two parts for the two kinds of state: CorePS and ActivePS.
- Moves gPS (now split in two) into CorePS and ActivePS, as static instances,
to improve encapsulation. This required changing all the state getters and
setters into static methods.
- Existence tests for CorePS and ActivePS are now done via the Exists()
methods.
Advantages of this change:
- It's now clear which parts of the global state (most of it!) are valid only
when the profiler is active, and we don't have to invalidate those parts with
zero/null/false when the profiler stops.
- Better OOP: more use of constructors and destructors, and more |const| to
indicate what state is immutable.
- With the old code there were some places where the order of things required
care, but with the new code it's not possible to get the order wrong.
--HG--
extra : rebase_source : dba177acb41e4dc2103ace2212ab5ae1f7b418ce
TimeStamp::ProcessCreations()'s aIsInconsistent outparam is ignored by the
majority of its caller. This patch makes it optional. Notably, this makes
ProcessCreation() easier to use in a constructor's initializer list.
On x86_64-Linux, LUL currently can only unwind frames for which CFI unwind data
is available. This causes a noticeable number of junk samples in the profiler,
characterised by failures at transition points between JIT and native frame
sequences. This patch allows LUL to try recovering the previous frame using
frame pointer chasing in the case where CFI isn't present. This allows LUL to
unwind through or jump over interleaved JIT frames, because, respectively:
* The baseline JIT produces frame-pointerised code.
* If the profiler is enabled, IonMonkey doesn't produce frame-pointerised code,
but also doesn't change the frame pointer register value. It can use the
frame pointer if profiling is disabled, but that's irrelevant here.
The patch also adds counts of FP-recovered frames to LUL's statistics printing,
to make it possible to assess how often this feature is used.
--HG--
extra : rebase_source : eadc54393788693b0e3f8d5129d48aaaad143a0b