New features cpuallthreads, stacksallthreads, and markersallthreads now allow the user to selectively profile non-selected threads for more information.
The gtest GeckoProfiler.FeatureCombinations is modified to test combinations of up to 3 of a set of features, to lower its cost, which allows adding the new features.
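As a rough illustration of why capping the combination size lowers the cost, here is a minimal sketch that enumerates every set of at most N flags. `CollectCombinations` and `CombinationsUpTo` are hypothetical names, not the helpers used by the gtest, and features are assumed to be single-bit flags.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: appends `mask`, then every extension of `mask` that
// adds up to `remaining` more flags taken from features[start..].
void CollectCombinations(const std::vector<uint32_t>& features, size_t start,
                         size_t remaining, uint32_t mask,
                         std::vector<uint32_t>& out) {
  out.push_back(mask);
  if (remaining == 0) {
    return;
  }
  for (size_t i = start; i < features.size(); ++i) {
    // Only later indices are considered, so each set is produced once.
    CollectCombinations(features, i + 1, remaining - 1, mask | features[i],
                        out);
  }
}

// Hypothetical helper: all bitmasks made of at most `maxCount` features,
// including the empty set.
std::vector<uint32_t> CombinationsUpTo(const std::vector<uint32_t>& features,
                                       size_t maxCount) {
  std::vector<uint32_t> out;
  CollectCombinations(features, 0, maxCount, 0, out);
  return out;
}
```

With 5 flags and a cap of 3 this yields 26 masks instead of the 32 of a full power set; the gap grows quickly as more flags are added.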
Differential Revision: https://phabricator.services.mozilla.com/D130011
The user shouldn't set MOZ_PROFILER_SHUTDOWN to an empty string; it wouldn't work anyway.
So now there is an extra check for that, to avoid even attempting to write a profile when there is no actual filename.
Thanks to this, the parent process can now just re-set MOZ_PROFILER_SHUTDOWN to "" in its children, so that they won't try to output their own profile into the same file. (This used to mostly work, because the parent process was the last to write its profile; but anecdotal data shows this may not always be true.)
As a bonus optimization, this means that child processes don't waste time needlessly saving their profile to disk.
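A minimal sketch of the extra check, assuming the caller passes in the result of `getenv("MOZ_PROFILER_SHUTDOWN")`. `ShutdownProfileFilename` is a hypothetical name, not the function in the profiler.

```cpp
#include <optional>
#include <string>

// Hypothetical helper: returns the filename to write the shutdown profile
// to, or nothing when the variable is unset or set to an empty string.
std::optional<std::string> ShutdownProfileFilename(const char* envValue) {
  if (!envValue || envValue[0] == '\0') {
    // No actual filename, so don't even attempt to write a profile.
    return std::nullopt;
  }
  return std::string(envValue);
}
```

A child process whose variable was re-set to "" gets no filename, so it skips saving entirely.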
Differential Revision: https://phabricator.services.mozilla.com/D129237
Previously, DeserializeAfterKindAndStream would take a JSON writer and a thread id, using the thread id (if specified) to only output markers from that thread to the given writer.
Now, DeserializeAfterKindAndStream takes a lambda, which will be called with the thread id found in the marker. That lambda can either return null (nothing to output) or a pointer to a writer, in which case the marker is read and output to the given writer.
This makes DeserializeAfterKindAndStream more flexible, and will allow handling markers from different threads, each possibly being output to different writers.
Also, for simplicity the entry is now always fully read, so there is no need for the caller to do anything. The return bool is therefore unnecessary, and has been removed.
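The callback shape can be sketched as follows. This is not the real signature: `Marker`, `Writer` and `StreamMarkers` are stand-ins invented for illustration, with an `std::ostringstream` playing the role of the JSON writer.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Stand-ins (assumptions): a marker carries its thread id and a name.
struct Marker {
  int threadId;
  std::string name;
};
using Writer = std::ostringstream;

// Calls `writerForThread` with each marker's thread id; a null result means
// there is nothing to output, otherwise the marker goes to that writer.
template <typename WriterForThread>
void StreamMarkers(const std::vector<Marker>& markers,
                   WriterForThread&& writerForThread) {
  for (const Marker& marker : markers) {
    Writer* writer = writerForThread(marker.threadId);
    if (!writer) {
      continue;
    }
    *writer << marker.name << '\n';
  }
}
```

A caller can then route markers from different threads to different writers with a single lambda, e.g. `[&](int tid) -> Writer* { return tid == 1 ? &a : nullptr; }`.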
Differential Revision: https://phabricator.services.mozilla.com/D128433
`profiler_add_marker()` now checks if the marker's target thread is actively being profiled. This prevents markers that would be discarded anyway from taking CPU time to process and from using space in the profile buffer.
This means that `profiler_thread_is_being_profiled(ProfilerThreadId)` must be used when a marker is intended for another thread, i.e., when it uses the MarkerThreadId option.
(Note: since baseprofiler::profiler_thread_is_being_profiled(ProfilerThreadId) is not available, baseprofiler::AddMarker cannot prevent markers targeted at non-profiled threads; there are none yet anyway.)
Differential Revision: https://phabricator.services.mozilla.com/D128576
If the profiler is paused, then really, threads are not *being* profiled.
profiler_is_active_and_unpaused() was added, to help with non-MOZ_GECKO_PROFILER builds.
(Note: baseprofiler::profiler_thread_is_being_profiled(ProfilerThreadId) is not possible to implement, but it's not needed anyway.)
Differential Revision: https://phabricator.services.mozilla.com/D128707
platform-linux-android.cpp:199:9: error: variable 'r' set but not used [-Werror,-Wunused-but-set-variable]
int r = sem_init(&mMessage2, /* pshared */ 0, 0);
^
platform-linux-android.cpp:206:9: error: variable 'r' set but not used [-Werror,-Wunused-but-set-variable]
int r = sem_destroy(&mMessage2);
^
Differential Revision: https://phabricator.services.mozilla.com/D126459
This only adds the API and then adds the profiler payload to the buffer. The
deserialization and streaming will happen in the next patch.
Differential Revision: https://phabricator.services.mozilla.com/D124026
You can see the `mozilla::MarkerSchema` for the C++ counterpart. This Rust
struct simply wraps the C++ object and keeps the reference of it as RAII. This
heap allocates the inner C++ object but it's fine to do it here, because
we only create a MarkerSchema object at the end of a profiling session and it
happens once per marker type. It should be very rare.
Differential Revision: https://phabricator.services.mozilla.com/D124025
This is the first and simplest API for the markers. There will be two more
APIs in the following patches (add_text_marker and add_marker). You can see the
PROFILER_MARKER_UNTYPED macro for the C++ counterpart.
Differential Revision: https://phabricator.services.mozilla.com/D124022
These structs are needed for the marker APIs. We also have the same structs as
the C++ classes. See `mozilla::MarkerTiming` and `mozilla::MarkerOptions`.
Differential Revision: https://phabricator.services.mozilla.com/D124020
Because string contents could be split in two separate chunks, the default ProfilerStringView deserializer needed to concatenate it together in an off-chunk buffer.
But now thanks to ProfileBufferEntryReader::ReadSpans, it is possible to know if the contents are in a single memory area inside one chunk (which should be the vast majority of cases), in which case the ProfilerStringView can just reference it using its internal std::string_view, which saves managing a separate buffer and copying data into it.
However this can only be done safely when the span is correctly aligned for the character type, which may not be the case for char16_t strings that must be even-aligned.
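The alignment condition can be sketched like this. `CanReferenceInPlace` is a hypothetical name, and the real deserializer may check more than this.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical check: true if `byteLength` bytes at `data` can be viewed in
// place as an array of CharT, i.e. the address is suitably aligned and the
// length is a whole number of characters.
template <typename CharT>
bool CanReferenceInPlace(const void* data, size_t byteLength) {
  return reinterpret_cast<uintptr_t>(data) % alignof(CharT) == 0 &&
         byteLength % sizeof(CharT) == 0;
}
```

For `char` this always holds; for `char16_t` it fails at odd addresses, in which case the contents still need to be copied into a separate buffer.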
Differential Revision: https://phabricator.services.mozilla.com/D124430
Instead of always having to use `ReadBytes` to copy bytes out of the profile buffer into an external buffer, these functions provide one or two spans (pointer+size) pointing at the area, which can be used to look at the data without copying it, especially in the majority of cases where areas are fully inside one chunk and only need one span to address.
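To illustrate the one-or-two-spans idea, here is a simplified model with only two chunks, using `std::string_view` as the span type. `SpansFor` is a hypothetical function and does not reflect the real `ReadSpans` signature.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>

// Hypothetical model: an area of `length` bytes starts at `offset` in
// `firstChunk` and may continue at the start of `secondChunk`.
// Assumes offset <= firstChunk.size() and that the area fits in both chunks.
std::pair<std::string_view, std::string_view> SpansFor(
    std::string_view firstChunk, std::string_view secondChunk, size_t offset,
    size_t length) {
  const size_t inFirst = std::min(length, firstChunk.size() - offset);
  // The second span is empty when the area lies fully inside the first chunk.
  return {firstChunk.substr(offset, inFirst),
          secondChunk.substr(0, length - inFirst)};
}
```

Both spans point into the chunks themselves, so nothing is copied; a caller only needs to concatenate when the second span is non-empty.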
Differential Revision: https://phabricator.services.mozilla.com/D124429
`ProfilerStringView::Data()` would return a pointer to the start of the string, but there may not be a null terminator at the end!
To reduce the likelihood of misuses, that function has now been removed.
Instead, callers must now access the data through `AsSpan` or the `Span` conversion operator (which makes it easy to use with `NS_ConvertUTF16toUTF8` for example).
This was not an issue until now, because deserialized strings were always terminated when copied out of the profile buffer, but a following patch will add optimized code where the non-terminated string inside the buffer will be directly pointed at.
Differential Revision: https://phabricator.services.mozilla.com/D125027
When starting the profiler, also make a copy of the filter strings
converted to lower-case. This allows caseless comparisons to be made
against thread names without repeatedly converting the filters to
lower-case for each thread.
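A minimal sketch of the idea: the filters are lower-cased once at construction, and only the thread name is converted per comparison. `ThreadFilters` and `ToLowerAscii` are hypothetical names, and ASCII-only lower-casing with substring matching is an assumption.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// ASCII-only lower-casing (assumption: thread names and filters are ASCII).
std::string ToLowerAscii(std::string s) {
  std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
    return static_cast<char>(std::tolower(c));
  });
  return s;
}

class ThreadFilters {
 public:
  // Lower-case every filter once, when the profiler starts.
  explicit ThreadFilters(const std::vector<std::string>& filters) {
    for (const std::string& filter : filters) {
      mLowerFilters.push_back(ToLowerAscii(filter));
    }
  }

  // Caseless substring match against the pre-converted filters.
  bool Matches(const std::string& threadName) const {
    const std::string name = ToLowerAscii(threadName);
    return std::any_of(mLowerFilters.begin(), mLowerFilters.end(),
                       [&](const std::string& filter) {
                         return name.find(filter) != std::string::npos;
                       });
  }

 private:
  std::vector<std::string> mLowerFilters;
};
```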
Differential Revision: https://phabricator.services.mozilla.com/D123302
Instead of blindly outputting floating-point numbers of milliseconds, which leads to things like 363.03499999999997, times in ms are now converted to integer number of nanoseconds, stringified, and then manually adjusted to milliseconds again, so we get smaller and friendlier outputs like 363.035.
Eventually, bug 1726675 may change all times to integer number of nanoseconds anyway, but this patch is already helpful in reducing the output, and paves the way by separating the time-output functions from other number outputs.
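The conversion can be sketched as below. `FormatMilliseconds` is a hypothetical name, and the real output functions may round or trim differently.

```cpp
#include <cmath>
#include <cstdint>
#include <string>

// Hypothetical sketch: rounds a time in ms to whole nanoseconds, stringifies
// the integer, then re-inserts the decimal point six digits from the end and
// drops trailing zeros.
std::string FormatMilliseconds(double ms) {
  const int64_t ns = std::llround(ms * 1e6);
  const bool negative = ns < 0;
  std::string digits = std::to_string(negative ? -ns : ns);
  // Left-pad so there is at least one digit before the six fractional ones.
  if (digits.size() < 7) {
    digits.insert(0, 7 - digits.size(), '0');
  }
  std::string out = digits.substr(0, digits.size() - 6);
  std::string fraction = digits.substr(digits.size() - 6);
  while (!fraction.empty() && fraction.back() == '0') {
    fraction.pop_back();
  }
  if (!fraction.empty()) {
    out += '.';
    out += fraction;
  }
  if (negative) {
    out.insert(0, 1, '-');
  }
  return out;
}
```

Because the fractional part comes from an integer, the output never carries more than six decimals.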
Differential Revision: https://phabricator.services.mozilla.com/D123329
Sampling overheads are very rarely useful, yet they occupy some space during profiling and a lot of space in the final JSON profile.
So now they will only be recorded if the environment variable "MOZ_PROFILER_RECORD_OVERHEADS" is set to any non-empty value.
Differential Revision: https://phabricator.services.mozilla.com/D123303
This ensures our users will use the latest version of the frontend when
capturing 'cancel' network markers.
Depends on D123254
Differential Revision: https://phabricator.services.mozilla.com/D123255
Instead of copying the full stack from the previous sample when identical, the new ProfileBufferEntryKind::TimeBeforeSameSample + SameSample entry pair indicates that this is an identical sample. Later when producing the final JSON profile, we can just re-use the same sample identifier as before.
This effectively lowers the size of this kind of entry from hundreds of bytes, down to 20-30 bytes, which should help with capturing more samples in the same buffer size. And it also uses less CPU resources, since we don't need to find the previous stack and copy it.
We still need to perform a full copy at the start of a buffer chunk, to make sure there is always a full stack available in case older previous chunks have been destroyed.
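A simplified model of this scheme, where a "chunk" is just a fixed number of entries and a stack is a string. `SampleBuffer` and `Entry` are hypothetical types; the real buffer stores serialized entries, not structs.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical entry: either a full stack, or a small "same as previous
// sample" placeholder.
struct Entry {
  bool sameAsPrevious;
  std::string stack;  // Empty when sameAsPrevious is true.
};

class SampleBuffer {
 public:
  // Assumes entriesPerChunk > 0.
  explicit SampleBuffer(size_t entriesPerChunk)
      : mEntriesPerChunk(entriesPerChunk) {}

  void AddSample(const std::string& stack) {
    // The first entry of each chunk must be a full stack, in case older
    // chunks have been destroyed by the time the profile is produced.
    const bool atChunkStart = mEntries.size() % mEntriesPerChunk == 0;
    if (!atChunkStart && stack == mLastStack) {
      mEntries.push_back(Entry{true, std::string()});
    } else {
      mEntries.push_back(Entry{false, stack});
    }
    mLastStack = stack;
  }

  const std::vector<Entry>& Entries() const { return mEntries; }

 private:
  size_t mEntriesPerChunk;
  std::string mLastStack;
  std::vector<Entry> mEntries;
};
```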
Differential Revision: https://phabricator.services.mozilla.com/D122679
PROFILER_RUNTIME_STATS was broken recently. It's only used during development (mostly to benchmark new code), so it's not critical and no tests are needed.
Differential Revision: https://phabricator.services.mozilla.com/D122677
This hides the scProfilerMainThreadId detail, and makes for a safer API.
Also, ::profiler_init_main_thread_id() calls ::mozilla::baseprofiler::profiler_init_main_thread_id().
And in non-MOZ_GECKO_PROFILER builds, AUTO_PROFILER_INIT calls profiler_init_main_thread_id(), which makes other main-thread functions usable there (assuming profiler_current_thread_id works).
Differential Revision: https://phabricator.services.mozilla.com/D121695
This patch only shuffles source code around, so that all declarations in {,Base}ProfilerUtils.h are now implemented only in ProfilerUtils.cpp (instead of the different platform-*.cpp). The final generated code should be the same in MOZ_GECKO_PROFILER builds (the default on all our supported platforms).
This simplifies the headers and makes further changes easier.
In non-MOZ_GECKO_PROFILER builds: on supported platforms these functions are now fully defined; unsupported platforms should all have `getpid()`, but thread ids are null.
So now `profiler_current_process_id()` is available on all platforms, at all tier levels.
Differential Revision: https://phabricator.services.mozilla.com/D121051
Since ProfilerProcessId and ProfilerThreadId (and their NumberTypes) will potentially grow to 64 bits on some platforms (in a later patch), all code that uses them must be able to handle bigger types.
Differential Revision: https://phabricator.services.mozilla.com/D121049
When mfplat.dll loads msvp9dec_store.dll, it posts a task
to unload the module to the work queue even if msvp9dec_store.dll
is already loaded and mfplat.dll skips LoadLibrary. Therefore,
we cannot safely lock msvp9dec_store.dll by loading it as data.
The proposed fix is to skip processing the module.
Differential Revision: https://phabricator.services.mozilla.com/D121777
Previously ProfilingCategoryList.h was a central place for profiling
categories. But with this patch, profiling_categories.yaml becomes the
canonical place for it and the macro header file is being generated
automatically. This is needed to keep the profiling categories in sync between
Rust and C++.
Differential Revision: https://phabricator.services.mozilla.com/D120791