Previously we were using the overall CPU idle percentage since the
machine started, instead of measuring the idle percentage over an
interval (like we were already doing with disk IO counters). Now we
measure the CPU usage over the same interval that we measure IO counters
and use that for determining if we are idle.
`fxrecorder` now starts a recording session via its `Recorder` and
requests `fxrunner` to start Firefox. After a delay, Firefox is shut
down and the recording stops.
Recorder and Splash mocks are also added for integration tests, but as
of yet they do nothing.
The `libfxrunner::osapi::process` module provides some abstractions
around dealing with processes (i.e., HANDLEs to processes) on Windows,
namely opening processes, terminating processes, and iterating the
child processes of another process.
Child iteration is accomplished by the win32 ProcessSnapshot API.
Unfortunately, the winapi crate is missing some type definitions
(`PSS_HANDLE_ENTRY` and friends), so they are included (as part of the
`process::detail` module, since they are not exposed as part of the
`process` API).
We need these capabilities because on Windows, `firefox.exe` by default
will start the launcher process, which then starts the main Firefox
process. We can ask the launcher to wait until the main process exits,
but terminating the launcher will not terminate the main process. (We
can also sidestep the launcher and start the main process directly by
changing prefs, but the launcher does important work, like pre-loading
DLLs that Firefox will use).
Therefore we need a way to enumate the child processes of the launcher
process, so we can terminate the main Firefox process, which is what
this API provides.
There is one caveat to the implementation of this API, and that is if we
call `PssCaptureSnapshot()` too soon on a process handle opened via
`OpenProcess()` we fail inside a call to `memcpy`:
```
[0x0] ntdll!memcpy + 0x92
[0x1] ntdll!PsspHandleDumper + 0x1a1
[0x2] ntdll!PsspWalkHandleTable + 0x21e
[0x3] ntdll!PsspCaptureHandleInformation + 0x238
[0x4] ntdll!PssNtCaptureSnapshot + 0x373
[0x5] KERNELBASE!PssCaptureSnapshot + 0x1e
[0x6] integration_tests_5e8edb1581cc8758!libfxrunner::osapi::process::ChildProcessIter::new + 0x9e
```
To work around this, we sleep for 500ms when calling
`ChildProcessIter::new()`, which in practice is called immediately after
`OpenProcess()`. Additionally, if we sleep for too short a duration,
e.g., 1ms, `ChildProcessIter::new()` ends up hanging forever inside the
tokio's thread parking implementation.
By their nature, winapi calls are not very rustic. They fall into three
common patterns:
1. Return a success vs non-success code (e.g. non-zero for success and
zero for failure) and set a specific error code via `SetLastError`.
2. Return either a pointer value that may be null if an error occurred.
In the error case, a specific error is set with `SetLastError`.
3. Returning an error code directly, with `ERROR_SUCCESS` representing
the success case.
To handle these APIs, three new utilities have been added:
1. `fn check_nonzero<T>(T) -> Result<T, std::io::Error>`, which handles
the success vs non-success case.
2. `fn check_nonnull<T>(*mut T) -> Result<*mut T, std::io::Error>`,
which handles pointer returning functions (EXCEPT functions that
return HANDLE -- those should use `Handle::TryFrom`, which compares
against `INVALID_HANDLE_VALUE` instead of null).
3. `fn check_success(DOWRD) -> Result<(), std::io::Error>`, which
handles functions that return error codes directly.
`Handle` is now only constructible through a `TryFrom` impl, which
checks against the invalid handle value and returns the last Windows
error if it matches.
This also corrects the error checking we were doing while acquiring a
handle to the logical C:\ drive, as the invalid handle value is not a
null pointer.
Every non-type item imported from winapi has been rewritten to be
relative to its containing module, which makes for a more reasonable
import block at the top of Windows-specific files for code that is
just as readable.
While updating the language, the main message from the recorder to the
runner (`Request`) is now an enum, instead of a struct containing an
enum as its only field.
The runner no longer exits when encountering an error after handling a
request that produced an error. This allows us to recover from, e.g.,
invalid requests, which are not fatal and should not cause the runner to
exit.
Requests are not intended to be served out of order. fxrunner must only
send a single new request followed by a corresponding resume request. We
now clean up the request directory after each resume request.
The runner now will cleanup resumed requests as they are completed
(either successfully or unsucessfully). Additionally, the
`FsRequestManager` will cleanup any requests that it cannot resume.
What was previously the `Taskcluster` struct is now the `Taskcluster`
trait and `FirefoxCi` struct, which implements that trait. This allows
us to only require testing the HTTP requests in our unit tests and
completely mock them out in integration tests, saving on test verbosity.
The `Taskcluster` trait is generic over an `Error` type, allowing for
easier mocking. The type that was `TaskclusterError` is now
`FirefoxCiError` (as it an implementation detail).
The `RunnerProtoError` is now generic over implementations of the
`Taskcluster` trait, again for easier mocking.
The RunnerProto only exposed only a single method: `handle_request`.
This interface has now been made a static function, replacing the
pattern of `RunnerProto::new().handle_request()` with
`RunnerProto::handle_request`.
Instead of the recorder constantly sending messages to the runner, it
now sends a single `Request` message and receives a series of responses
from the runner.
Tests will be fixed in a later commit.
Received profiles, prefs, and Firefox builds are not presently persisted
through restarts. This will be fixed in an upcoming patch.
The protocol docs no longer match what happens, so they've been removed
for now.
Instead of being generic over `S: Error + 'static`, the
`RunnerProtoError` type is now generic over `S: ShutdownProvider` and we
can use `S::Error` inside its definition, instead of instantiating it
from the `RunnerProto<S>` with `S::Error` directly.
This is a little cleaner and more readable, but does have a side effect
that requires `ShutdownProvider: Debug`.
This is in preparation for adding another generic parameter to
`RunnerProtoError`, making it generic over errors from the
`PerfProvider`. Without this change, it would require being generic over
both of the `PerfProvider`'s error types (`PerfProvider::DiskIoError`
and `PerfProvider::CpuTimeError`), making the types longer and messier.
We can now wait for the disk and CPU to settle out to acceptable levels
for our performance tests. LIke the `ShutdownProvider` API, the
`PerformanceProvider` API is implemented as a trait so that it can be
replaced for integration tests.
The `shutdown` module has been lightly refactored into an `osapi`
module. The error utilities (`WindowsError` and `get_last_error`) have
been refactored into the `osapi::error` module and the `Handle` API has
been refactored into the `osapi::handle` module.
The recorder now accepts prefs via the command-line, which will be sent
to the runner. If a profile was sent to the runner, that profile's
`user.js` will be appended with these new prefs. Otherwise, a new
profile will be used and these prefs will be written to that profile's
`user.js` instead.
The recorder can now specify a zipped Firefox profile for the runner to
use when running Firefox, which will be transferred from the recorder to
the runner.
The recorder will now request the runner download a specific build of
Firefox. A Taskcluster task ID for a build task is accepted by the
recorder on the command-line, which is used to determine the task to
download.
The runner now has an API for interacting with Taskcluster. Network
requests are run against a local server during unit tests so that they
do not require Internet access (or a valid Taskcluster setup) to run.
The interface between the `RunnerProto` and `RecorderProto` has been
more thoroughly tested for the handshake protocol. The infrastructure
added to accomplish this should make adding more tests easier as more of
the protocol is implemented.
The fxrunner has a `--skip-restart` flag that is only provided in debug
builds that allows it to skip doing an actual restart for local
development and testing.