The `libfxrunner::osapi::process` module provides some abstractions
around dealing with processes (i.e., HANDLEs to processes) on Windows,
namely opening processes, terminating processes, and iterating the
child processes of another process.
Child process iteration is accomplished via the Win32 ProcessSnapshot API.
Unfortunately, the winapi crate is missing some type definitions
(`PSS_HANDLE_ENTRY` and friends), so they are included (as part of the
`process::detail` module, since they are not exposed as part of the
`process` API).
We need these capabilities because on Windows, `firefox.exe` by default
will start the launcher process, which then starts the main Firefox
process. We can ask the launcher to wait until the main process exits,
but terminating the launcher will not terminate the main process. (We
can also sidestep the launcher and start the main process directly by
changing prefs, but the launcher does important work, like pre-loading
DLLs that Firefox will use).
Therefore we need a way to enumerate the child processes of the launcher
process, so we can terminate the main Firefox process, which is what
this API provides.
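To illustrate the end goal (though not the mechanism this module uses --
the module walks a ProcessSnapshot handle table, as described above),
here is a sketch of child-PID enumeration using the simpler Toolhelp32
API; the function name is ours:
```rust
use std::io;
use std::mem;

use winapi::um::handleapi::{CloseHandle, INVALID_HANDLE_VALUE};
use winapi::um::tlhelp32::{
    CreateToolhelp32Snapshot, Process32First, Process32Next, PROCESSENTRY32,
    TH32CS_SNAPPROCESS,
};

/// Enumerate the PIDs of the direct children of `parent_pid`.
fn child_pids(parent_pid: u32) -> io::Result<Vec<u32>> {
    unsafe {
        // Snapshot every process on the system.
        let snapshot = CreateToolhelp32Snapshot(TH32CS_SNAPPROCESS, 0);
        if snapshot == INVALID_HANDLE_VALUE {
            return Err(io::Error::last_os_error());
        }

        let mut entry: PROCESSENTRY32 = mem::zeroed();
        entry.dwSize = mem::size_of::<PROCESSENTRY32>() as u32;

        // Walk the snapshot, keeping entries whose parent matches.
        let mut children = Vec::new();
        if Process32First(snapshot, &mut entry) != 0 {
            loop {
                if entry.th32ParentProcessID == parent_pid {
                    children.push(entry.th32ProcessID);
                }
                if Process32Next(snapshot, &mut entry) == 0 {
                    break;
                }
            }
        }

        CloseHandle(snapshot);
        Ok(children)
    }
}
```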
There is one caveat to the implementation of this API: if we call
`PssCaptureSnapshot()` too soon on a process handle opened via
`OpenProcess()`, we fail inside a call to `memcpy`:
```
[0x0] ntdll!memcpy + 0x92
[0x1] ntdll!PsspHandleDumper + 0x1a1
[0x2] ntdll!PsspWalkHandleTable + 0x21e
[0x3] ntdll!PsspCaptureHandleInformation + 0x238
[0x4] ntdll!PssNtCaptureSnapshot + 0x373
[0x5] KERNELBASE!PssCaptureSnapshot + 0x1e
[0x6] integration_tests_5e8edb1581cc8758!libfxrunner::osapi::process::ChildProcessIter::new + 0x9e
```
To work around this, we sleep for 500ms when calling
`ChildProcessIter::new()`, which in practice happens immediately after
`OpenProcess()`. Note that if we sleep for too short a duration (e.g.,
1ms), `ChildProcessIter::new()` ends up hanging forever inside tokio's
thread-parking implementation.
By their nature, winapi calls are not very rustic. They fall into three
common patterns:
1. Return a success vs non-success code (e.g. non-zero for success and
zero for failure) and set a specific error code via `SetLastError`.
2. Return a pointer value that is null if an error occurred. In the
error case, a specific error code is set via `SetLastError`.
3. Return an error code directly, with `ERROR_SUCCESS` representing
the success case.
To handle these APIs, three new utilities have been added:
1. `fn check_nonzero<T>(T) -> Result<T, std::io::Error>`, which handles
the success vs non-success case.
2. `fn check_nonnull<T>(*mut T) -> Result<*mut T, std::io::Error>`,
which handles pointer returning functions (EXCEPT functions that
return HANDLE -- those should use `Handle::TryFrom`, which compares
against `INVALID_HANDLE_VALUE` instead of null).
3. `fn check_success(DWORD) -> Result<(), std::io::Error>`, which
handles functions that return error codes directly. (All three are
sketched below.)
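A minimal sketch of what these utilities can look like, assuming
`DWORD` is `u32` and using `std::io::Error` to recover the
`GetLastError` code; the exact trait bounds are our assumption:
```rust
use std::io;

/// Pattern 1: non-zero means success. Assumes zero is `T::default()`.
pub fn check_nonzero<T: Default + PartialEq>(value: T) -> Result<T, io::Error> {
    if value == T::default() {
        // io::Error::last_os_error() calls GetLastError() on Windows.
        Err(io::Error::last_os_error())
    } else {
        Ok(value)
    }
}

/// Pattern 2: a null pointer means failure.
pub fn check_nonnull<T>(ptr: *mut T) -> Result<*mut T, io::Error> {
    if ptr.is_null() {
        Err(io::Error::last_os_error())
    } else {
        Ok(ptr)
    }
}

/// Pattern 3: the error code is the return value itself; 0 is
/// ERROR_SUCCESS.
pub fn check_success(code: u32) -> Result<(), io::Error> {
    if code == 0 {
        Ok(())
    } else {
        Err(io::Error::from_raw_os_error(code as i32))
    }
}
```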
What was previously the `Taskcluster` struct is now the `Taskcluster`
trait and the `FirefoxCi` struct, which implements that trait. This
allows us to test the HTTP requests only in our unit tests and mock
them out completely in integration tests, saving on test verbosity.
The `Taskcluster` trait is generic over an `Error` type, allowing for
easier mocking. The type that was `TaskclusterError` is now
`FirefoxCiError` (as it is an implementation detail).
The `RunnerProtoError` is now generic over implementations of the
`Taskcluster` trait, again for easier mocking.
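A sketch of the shape this split could take, with the error expressed
as an associated type and a hypothetical download method (the real
method names may differ):
```rust
use std::path::{Path, PathBuf};

use async_trait::async_trait;

/// Hypothetical sketch: the trait the runner programs against.
#[async_trait]
pub trait Taskcluster {
    type Error: std::error::Error + 'static;

    /// Download a build artifact from the given task into
    /// `download_dir`, returning the path to the downloaded archive.
    async fn fetch_artifact(
        &mut self,
        task_id: &str,
        download_dir: &Path,
    ) -> Result<PathBuf, Self::Error>;
}

/// The production implementation talks to Firefox CI over HTTP;
/// integration tests substitute a mock implementing the same trait.
pub struct FirefoxCi {
    // HTTP client, queue base URL, etc.
}
```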
We can now wait for disk and CPU activity to settle to acceptable
levels for our performance tests. Like the `ShutdownProvider` API, the
`PerformanceProvider` API is implemented as a trait so that it can be
replaced for integration tests.
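A minimal sketch of what such a trait can look like; the method name
and error type are our assumptions:
```rust
use async_trait::async_trait;

/// Hypothetical sketch of the trait behind the PerformanceProvider API.
#[async_trait]
pub trait PerformanceProvider {
    type Error: std::error::Error + 'static;

    /// Resolve once CPU and disk activity have fallen below acceptable
    /// thresholds for a performance test.
    async fn wait_for_idle(&mut self) -> Result<(), Self::Error>;
}
```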
The recorder can now specify a zipped Firefox profile for the runner to
use when running Firefox, which will be transferred from the recorder to
the runner.
The runner now has an API for interacting with Taskcluster. Network
requests are run against a local server during unit tests so that they
do not require Internet access (or a valid Taskcluster setup) to run.
The main implementations of both fxrecorder and fxrunner have been split
into libraries which the binaries call into. This will enable
integration tests between the protocols.
The runner and recorder are now networked. The runner sets up a TCP
server and listens for incoming connections. The recorder connects to
the runner.
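A sketch of the connection setup with tokio; the port and function
names are assumptions:
```rust
use tokio::net::{TcpListener, TcpStream};

/// Runner side: bind a listener and accept a recorder connection.
async fn runner_accept() -> std::io::Result<TcpStream> {
    let listener = TcpListener::bind("0.0.0.0:8888").await?;
    let (stream, _addr) = listener.accept().await?;
    Ok(stream)
}

/// Recorder side: connect to the runner's address.
async fn recorder_connect(addr: &str) -> std::io::Result<TcpStream> {
    TcpStream::connect(addr).await
}
```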
The bulk of this change is implementing the datatypes for messages and
protocols.
The `Proto` struct represents a protocol, which provides a way to send
and receive typed messages across a TCP socket. The `RunnerProto` and
`RecorderProto` are specializations of this type that implement the
actual business logic of sending and receiving messages in a certain
order for `fxrunner` and `fxrecorder` respectively.
Messages are the actual data sent across the socket. They are encoded by
the Proto as length-prefixed JSON blobs for transfer. There are two
types of messages:
1. `RecorderMessage`, which is sent from `fxrecorder` to `fxrunner`.
2. `RunnerMessage`, which is sent from `fxrunner` to `fxrecorder`.
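A sketch of the length-prefixed JSON encoding described above, using
serde and tokio; the prefix width and endianness here are assumptions:
```rust
use serde::{de::DeserializeOwned, Serialize};
use tokio::io::{AsyncReadExt, AsyncWriteExt};
use tokio::net::TcpStream;

/// Serialize a message to JSON and write it with a u32 length prefix.
async fn send_message<M: Serialize>(
    stream: &mut TcpStream,
    message: &M,
) -> Result<(), Box<dyn std::error::Error>> {
    let body = serde_json::to_vec(message)?;
    stream.write_all(&(body.len() as u32).to_le_bytes()).await?;
    stream.write_all(&body).await?;
    Ok(())
}

/// Read a length prefix, then decode the JSON body that follows.
async fn recv_message<M: DeserializeOwned>(
    stream: &mut TcpStream,
) -> Result<M, Box<dyn std::error::Error>> {
    let mut len_buf = [0u8; 4];
    stream.read_exact(&mut len_buf).await?;
    let mut body = vec![0u8; u32::from_le_bytes(len_buf) as usize];
    stream.read_exact(&mut body).await?;
    Ok(serde_json::from_slice(&body)?)
}
```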
Each type of message implements the `Message` interface, which allows
discriminating between them with the `Message::kind()` method (and
imposes the serialization and deserialization constraints).
Message types are generated by the `impl_message!` and
`impl_message_inner!` macros to avoid writing boilerplate for each
message type.
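Illustratively, the trait and a generated implementation might look
like the following; the `Ready` message and the `&'static str` return
type of `kind()` are hypothetical:
```rust
use serde::{de::DeserializeOwned, Deserialize, Serialize};

/// The interface every message type implements.
pub trait Message: Serialize + DeserializeOwned {
    /// Discriminate between message types at runtime.
    fn kind(&self) -> &'static str;
}

/// What impl_message! might expand to for a single message type.
#[derive(Debug, Serialize, Deserialize)]
pub struct Ready {
    pub ready: bool,
}

impl Message for Ready {
    fn kind(&self) -> &'static str {
        "ready"
    }
}
```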
This reverts commit 11254300fd.
As it turns out, tarpc cannot do what we need, namely shut down the
server cleanly (to reboot) without causing errors on the client side,
and it requires a lot of state to be `Sync` (i.e., behind an
`Arc<Mutex<T>>` or similar) for requests that will only ever happen
sequentially.
We now use the `tarpc` crate to allow fxrecorder to communicate with
fxrunner. The protocol is defined in `libfxrecord::service` and
consists of a single method: `request_restart` (which is currently a
no-op).
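For reference, a sketch of how such a service can be declared with
tarpc's service attribute; the trait name here is illustrative:
```rust
/// Hypothetical service declaration; tarpc generates the client stub
/// and server trait from this.
#[tarpc::service]
pub trait FxRunnerService {
    /// Ask the runner to restart the machine (currently a no-op).
    async fn request_restart();
}
```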