Bug 1382440 - Watch CPU usage in BHR r=froydnj

We would like to be able to see if a given hang in BHR occurred
under high CPU load, as this is an indication that the hang is
of less use to us, since it's likely that the external CPU use
is more responsible for it.

The way this works is fairly simple. We get the system CPU usage
on a scale from 0 to 1, and we get the current process's CPU
usage, also on a scale from 0 to 1, and we subtract the latter
from the former. We then compare this value to a threshold, which
is 1 - (1 / p), where p is the number of (virtual) cores on the
machine. This threshold might need to be tuned, so that we
require an entire physical core in order to not annotate the hang,
but for now it seemed the most reasonable line in the sand.

I should note that this considers CPU usage in child or parent
processes as external. While we are responsible for that CPU usage,
it still indicates that the stack we receive from BHR is of little
value to us, since the source of the actual hang is external to
that stack.

MozReview-Commit-ID: JkG53zq1MdY
--HG--
extra : rebase_source : 16553a9b5eac0a73cd1619c6ee01fa177ca60e58
2017-07-24 23:46:09 +03:00
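
The arithmetic described in the commit message can be sketched in isolation. This is a minimal illustration, not the code from the patch: the helper names `ExternalUsageThreshold`, `ExternalUsageRatio` and `IsExternalCPUHigh` are hypothetical, and the 0.5 floor is assumed to mirror `kTolerableExternalCPUUsageFloor` from the source file.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: the external-usage threshold 1 - (1 / p) for a
// machine with numCPUs (virtual) cores, floored at 0.5 (an assumption
// that mirrors kTolerableExternalCPUUsageFloor).
float ExternalUsageThreshold(int numCPUs) {
  const float kFloor = 0.5f;
  if (numCPUs <= 0) {
    // Unknown core count: a ratio in [0, 1] can never exceed 1.0.
    return 1.0f;
  }
  return std::max(1.0f - 1.0f / static_cast<float>(numCPUs), kFloor);
}

// Hypothetical helper: system usage minus this process's usage, both on
// a scale from 0 to 1, clamped so the result is never negative.
float ExternalUsageRatio(float globalUsage, float processUsage) {
  assert(globalUsage >= 0.0f && globalUsage <= 1.0f);
  assert(processUsage >= 0.0f && processUsage <= 1.0f);
  return std::max(0.0f, globalUsage - processUsage);
}

// Hypothetical helper: true when the hang would be annotated.
bool IsExternalCPUHigh(float globalUsage, float processUsage, int numCPUs) {
  return ExternalUsageRatio(globalUsage, processUsage) >
         ExternalUsageThreshold(numCPUs);
}
```

Under these assumptions, a 4-core machine has a threshold of 0.75, so a sample with the system fully busy and this process using one eighth of the total capacity (external ratio 0.875) would be annotated, while a sample at 75% system and 25% process usage (external ratio 0.5) would not.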
/* -*- Mode: C++; tab-width: 8; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
/* vim: set ts=8 sts=2 et sw=2 tw=80: */
/* This Source Code Form is subject to the terms of the Mozilla Public
 * License, v. 2.0. If a copy of the MPL was not distributed with this
 * file, You can obtain one at http://mozilla.org/MPL/2.0/. */

#include "mozilla/CPUUsageWatcher.h"

#include <algorithm> // std::max

#include "prsystem.h"

#ifdef XP_MACOSX
#include <sys/resource.h>
#include <mach/clock.h>
#include <mach/mach_host.h>
#endif

namespace mozilla {

#ifdef CPU_USAGE_WATCHER_ACTIVE

// Even if the machine only has one processor, tolerate up to 50%
// external CPU usage.
static const float kTolerableExternalCPUUsageFloor = 0.5f;

struct CPUStats {
  // The average CPU usage time, which can be summed across all cores in the
  // system, or averaged between them. Whichever it is, it needs to be in the
  // same units as updateTime.
  uint64_t usageTime;
  // A monotonically increasing value in the same units as usageTime, which can
  // be used to determine the percentage of active vs idle time
  uint64_t updateTime;
};

#ifdef XP_MACOSX

static const uint64_t kMicrosecondsPerSecond = 1000000LL;
static const uint64_t kNanosecondsPerMicrosecond = 1000LL;
static const uint64_t kCPUCheckInterval = kMicrosecondsPerSecond / 2LL;

uint64_t GetMicroseconds(timeval time) {
  return ((uint64_t)time.tv_sec) * kMicrosecondsPerSecond +
         (uint64_t)time.tv_usec;
}

uint64_t GetMicroseconds(mach_timespec_t time) {
  return ((uint64_t)time.tv_sec) * kMicrosecondsPerSecond +
         ((uint64_t)time.tv_nsec) / kNanosecondsPerMicrosecond;
}

Result<CPUStats, CPUUsageWatcherError>
GetProcessCPUStats(int32_t numCPUs) {
  CPUStats result = {};
  rusage usage;
  int32_t rusageResult = getrusage(RUSAGE_SELF, &usage);
  if (rusageResult == -1) {
    return Err(GetProcessTimesError);
  }
  result.usageTime = GetMicroseconds(usage.ru_utime) +
                     GetMicroseconds(usage.ru_stime);

  clock_serv_t realtimeClock;
  kern_return_t errorResult =
    host_get_clock_service(mach_host_self(), REALTIME_CLOCK, &realtimeClock);
  if (errorResult != KERN_SUCCESS) {
    return Err(GetProcessTimesError);
  }
  mach_timespec_t time;
  errorResult = clock_get_time(realtimeClock, &time);
  if (errorResult != KERN_SUCCESS) {
    return Err(GetProcessTimesError);
  }
  result.updateTime = GetMicroseconds(time);

  // getrusage will give us the sum of the values across all
  // of our cores. Divide by the number of CPUs to get an average.
  result.usageTime /= numCPUs;
  return result;
}

Result<CPUStats, CPUUsageWatcherError>
GetGlobalCPUStats() {
  CPUStats result = {};
  host_cpu_load_info_data_t loadInfo;
  mach_msg_type_number_t loadInfoCount = HOST_CPU_LOAD_INFO_COUNT;
  kern_return_t statsResult = host_statistics(mach_host_self(),
                                              HOST_CPU_LOAD_INFO,
                                              (host_info_t)&loadInfo,
                                              &loadInfoCount);
  if (statsResult != KERN_SUCCESS) {
    return Err(HostStatisticsError);
  }

  result.usageTime = loadInfo.cpu_ticks[CPU_STATE_USER] +
                     loadInfo.cpu_ticks[CPU_STATE_NICE] +
                     loadInfo.cpu_ticks[CPU_STATE_SYSTEM];
  result.updateTime = result.usageTime + loadInfo.cpu_ticks[CPU_STATE_IDLE];
  return result;
}

#endif // XP_MACOSX

#ifdef XP_WIN

// A FILETIME represents the number of 100-nanosecond ticks since 1/1/1601 UTC
static const uint64_t kFILETIMETicksPerSecond = 10000000;
static const uint64_t kCPUCheckInterval = kFILETIMETicksPerSecond / 2;

uint64_t
FiletimeToInteger(FILETIME filetime) {
  return ((uint64_t)filetime.dwLowDateTime) |
         ((uint64_t)filetime.dwHighDateTime << 32);
}

Result<CPUStats, CPUUsageWatcherError> GetProcessCPUStats(int32_t numCPUs) {
  CPUStats result = {};
  FILETIME creationFiletime;
  FILETIME exitFiletime;
  FILETIME kernelFiletime;
  FILETIME userFiletime;
  bool success = GetProcessTimes(GetCurrentProcess(),
                                 &creationFiletime,
                                 &exitFiletime,
                                 &kernelFiletime,
                                 &userFiletime);
  if (!success) {
    return Err(GetProcessTimesError);
  }

  result.usageTime = FiletimeToInteger(kernelFiletime) +
                     FiletimeToInteger(userFiletime);

  FILETIME nowFiletime;
  GetSystemTimeAsFileTime(&nowFiletime);
  result.updateTime = FiletimeToInteger(nowFiletime);

  result.usageTime /= numCPUs;

  return result;
}

Result<CPUStats, CPUUsageWatcherError>
GetGlobalCPUStats() {
  CPUStats result = {};
  FILETIME idleFiletime;
  FILETIME kernelFiletime;
  FILETIME userFiletime;
  bool success = GetSystemTimes(&idleFiletime,
                                &kernelFiletime,
                                &userFiletime);

  if (!success) {
    return Err(GetSystemTimesError);
  }

  // The kernel time reported by GetSystemTimes already includes idle time,
  // so kernel + user is the total elapsed CPU time. Subtract the idle time
  // to get the time actually spent busy.
  result.updateTime = FiletimeToInteger(kernelFiletime) +
                      FiletimeToInteger(userFiletime);
  result.usageTime = result.updateTime - FiletimeToInteger(idleFiletime);

  return result;
}

#endif // XP_WIN

Result<Ok, CPUUsageWatcherError>
CPUUsageWatcher::Init()
{
  mNumCPUs = PR_GetNumberOfProcessors();
  if (mNumCPUs <= 0) {
    mExternalUsageThreshold = 1.0f;
    return Err(GetNumberOfProcessorsError);
  }
  mExternalUsageThreshold = std::max(1.0f - 1.0f / (float)mNumCPUs,
                                     kTolerableExternalCPUUsageFloor);

  CPUStats processTimes;
  MOZ_TRY_VAR(processTimes, GetProcessCPUStats(mNumCPUs));
  mProcessUpdateTime = processTimes.updateTime;
  mProcessUsageTime = processTimes.usageTime;

  CPUStats globalTimes;
  MOZ_TRY_VAR(globalTimes, GetGlobalCPUStats());
  mGlobalUpdateTime = globalTimes.updateTime;
  mGlobalUsageTime = globalTimes.usageTime;

  mInitialized = true;

  CPUUsageWatcher* self = this;
  NS_DispatchToMainThread(
    NS_NewRunnableFunction("CPUUsageWatcher::Init",
                           [=]() { HangMonitor::RegisterAnnotator(*self); }));

  return Ok();
}

void
CPUUsageWatcher::Uninit()
{
  if (mInitialized) {
    HangMonitor::UnregisterAnnotator(*this);
  }
  mInitialized = false;
}

Result<Ok, CPUUsageWatcherError>
CPUUsageWatcher::CollectCPUUsage()
{
  if (!mInitialized) {
    return Ok();
  }

  mExternalUsageRatio = 0.0f;

  CPUStats processTimes;
  MOZ_TRY_VAR(processTimes, GetProcessCPUStats(mNumCPUs));
  CPUStats globalTimes;
  MOZ_TRY_VAR(globalTimes, GetGlobalCPUStats());

  uint64_t processUsageDelta = processTimes.usageTime - mProcessUsageTime;
  uint64_t processUpdateDelta = processTimes.updateTime - mProcessUpdateTime;
  float processUsageNormalized = processUsageDelta > 0 ?
    (float)processUsageDelta / (float)processUpdateDelta :
    0.0f;

  uint64_t globalUsageDelta = globalTimes.usageTime - mGlobalUsageTime;
  uint64_t globalUpdateDelta = globalTimes.updateTime - mGlobalUpdateTime;
  float globalUsageNormalized = globalUsageDelta > 0 ?
    (float)globalUsageDelta / (float)globalUpdateDelta :
    0.0f;

  mProcessUsageTime = processTimes.usageTime;
  mProcessUpdateTime = processTimes.updateTime;
  mGlobalUsageTime = globalTimes.usageTime;
  mGlobalUpdateTime = globalTimes.updateTime;

  mExternalUsageRatio = std::max(0.0f,
                                 globalUsageNormalized - processUsageNormalized);

  return Ok();
}

void
CPUUsageWatcher::AnnotateHang(HangMonitor::HangAnnotations& aAnnotations) {
  if (!mInitialized) {
    return;
  }

  if (mExternalUsageRatio > mExternalUsageThreshold) {
    aAnnotations.AddAnnotation(NS_LITERAL_STRING("ExternalCPUHigh"), true);
  }
}

#else // !CPU_USAGE_WATCHER_ACTIVE

Result<Ok, CPUUsageWatcherError>
CPUUsageWatcher::Init()
{
  return Ok();
}

void CPUUsageWatcher::Uninit() {}

Result<Ok, CPUUsageWatcherError>
CPUUsageWatcher::CollectCPUUsage()
{
  return Ok();
}

void CPUUsageWatcher::AnnotateHang(HangMonitor::HangAnnotations& aAnnotations) {}

#endif // CPU_USAGE_WATCHER_ACTIVE

} // namespace mozilla