12 KiB
Remote agent overall architecture
This document will cover the Remote Agent architecture by following the sequence of steps needed to start the agent, connect a client and debug a target.
Remote Agent startup
Everything starts with RemoteAgent
class.
This singleton handles command lines arguments (--remote-debugger and --remote-debugging-port)
to eventually start a server listening on the TCP port 9222 (or the one specified by the command line).
The browser target websocket URL will be printed to stdout.
To do that this component glue together three main high level components:
-
server/HTTPD
This is a copy of httpd.js, from /netwerk/ folder. This is a JS implementation of an http server. This will be used to implement the various http endpoints of CDP. There is a few static URL implemented byJSONHandler
and one dynamic URL per target. -
JSONHandler
This implements the following three static http endpoints:- /json/version: Returns information about the runtime as well as the url of the browser target websocket url.
- /json/list: Returns a list of all debuggable targets with, for each, their dynamic websocket URL. For now it only reports tabs, but will report workers and addons as soon as we support them. The main browser target is the only one target not listed here.
- /json/protocol: Returns a big dictionary describing the supported protocol. This is currently hard coded and returns the full CDP protocol schema, including APIs we don’t support. We have a future intention to fix this and report only what Firefox implements. You can connect to these websocket URL in order to debug things.
-
targets/Targets
This component is responsible of maintaining the list of all debuggable targets. For now it can be either:- The main browser target
A special target which allows to inspect the browser, but not any particular tab.
This is implemented by
targets/MainProcessTarget
and is instantiated on startup. - Tab targets
Each opened tab will have a related
targets/TabTarget
instantiated on their opening, or on server startup for already opened ones. Each target aims at focusing on one particular context. This context is typically running in one particular environment. This can be a particular process or thread. In the future, we will most likely support targets for workers and add-ons. All targets inherit fromtargets/Target
.
- The main browser target
A special target which allows to inspect the browser, but not any particular tab.
This is implemented by
Connecting to Websocket endpoints
Each target's websocket URL will be registered as a HTTP endpoint via server/HTTPD:registerPathHandler
.
(This registration is done from RemoteAgent:init
)
Once a HTTP request happens, server/HTTPD
will call the handle
method on the object passed to registerPathHandler
.
For static endpoints registered by JSONHandler
, this will call JSONHandler:handle
and return a JSON string as http body.
For target's endpoint, it is slightly more complicated as it requires a special handshake to morph the HTTP connection into a WebSocket one.
The WebSocket is then going to be long lived and be used to inspect the target over time.
When a request is made to a target URL, targets/Target:handle
is called and:
-
delegate the complex HTTP to WebSocket handshake operation to
server/WebSocketHandshake:upgrade
In return we retrieve a WebSocket object. -
hand over this WebSocket to
server/WebSocketTransport
and get a transport object in return. The transport implements a basic JSON stream over WebSocket. With that, you can send and receive JSON objects over a WebSocket connection. -
hand over the transport to a freshly instantiated
Connection
The Connection has two goals:- Interpret incoming CDP packets by reading the JSON object attribute (
id
,method
,params
andsessionId
) This is done inConnection:onPacket
. - Format outgoing CDP packets by writing the right JSON object for command response (
id
,result
andsessionId
) and events (method
,params
andsessionId
) - Redirect CDP packet from/to the right session. A connection may have more than one session attached to it.
- Interpret incoming CDP packets by reading the JSON object attribute (
-
instantiate the default session The session is specific to each target kind and all of them inherit from
session/Session
. For example, tabs targets usessession/TabSession
and the main browser target usessession/MainProcessSession
. Which session class is used is defined by the Target subclass’ constructor, which pass a session class reference to targets/Target:constructor. A session is mostly responsible of accommodating the eventual cross process/cross thread aspects of the target. The code we are currently describing (targets/Target:handle
) is running in the parent process. The session class receive CDP commands from the connection and first try to execute the Domain commands in the parent process. Then, if the target actually runs in some other context, the session tries to forward this command to this other context, which can be a thread or a process. Typically, thesessions/TabSession
forward the CDP command to the content process where the tab is running. It also redirects back the command response as well as Domain events from that process back to the parent process in order to forward them to the connection. Sessions will be using theDomains
class as an helper to manage a list of Domain implementations in a given context.
Debugging additional Targets
From a given connection you can know about the other potential targets.
You typically do that via Target.setDiscoverTargets()
, which will emit Target.targetCreated
events providing a target ID.
You may create a new session for the new target by handing the ID to Target.attachToTarget()
, which will return a session ID.
"Target" here is a reference to the CDP Domain implemented in domains/parent/Target.jsm
. That is different from targets/Target
class which is an implementation detail of the Remote Agent.
Then, there is two ways to communicate with the other targets:
-
Use
Target.sendMessageToTarget()
andTarget.receivedMessageFromTarget
You will manually send commands via theTarget.sendMessageToTarget()
command and receive command's response as well as events viaTarget.receivedMessageFromTarget
. In both cases, a session ID attribute is passed in the command or event arguments in order to select which additional target you are communicating with. -
Use
Target.attachToTarget({ flatten: true })
and includesessionId
in CDP packets This requires a special client, which will use thesessionId
returned byTarget.attachToTarget()
in order to spawn a distinct client instance. This client will re-use the same WebSocket connection, but every single CDP packet will contain an additionalsessionId
attribute. This helps distinguish packets which relate to the original target as well as the multiple additional targets you may attach to.
In both cases, Target.attachToTarget()
is special as it will spawn session/TabSession
for the tab you are attaching to.
This is the codepath creating non-default session. The default session is related to the target you originally connected to,
so that you don't need any ID for this one. When you want to debug more than one target over a single connection
you need additional sessions, which will have a unique ID.
Target.attachToTarget
will compute this ID and instantiate a new session bound to the given target.
This additional session will be managed by the Connection
class, which will then redirect CDP packets to the
right session when you are using flatten session.
Cross Process / Layers
Because targets may runs in different contexts, the remote agent code runs in different processes.
The main and startup code of the Remote agent code runs in the parent process.
The handling of the command line as well as all the HTTP and WebSocket work is all done in the parent process.
The browser target is also all implemented in the parent process.
But when it comes to a tab target, as the tab runs in the content process, we have to run code there as well.
Let's start from the sessions/TabSession
class, which has already been described.
We receive here JSON packets from the WebSocket connection and we are in the parent process.
In this class, we route the messages to the parent process domains first.
If there is no implementation of the domain or the particular method,
we forward the command to a session/ContentProcessSession
which runs in the tab's content process.
These two Session classes will interact with each other in order to forward back the returned value
of the method we just called, as well as piping back any event being sent by a Domain implemented in any
of the two processes.
Organizational chart of all the classes
┌─────────────────────────────────────────────────┐
│ │
1 ▼ │
┌───────────────┐ 1 ┌───────────────┐ 1..n┌───────────────┐
│ RemoteAgent │──────▶│ HttpServer │◀───────▶│ JsonHandler │
└───────────────┘ └───────────────┘ 1 └───────────────┘
│
│
│ 1 ┌────────────┐ 1
└───────────────▶│ Targets │◀─┐
└────────────┘ │
│ │
▼ 1..n │
┌────────────┐ │
┌─────────────────│ Target [1]│ │
│ └────────────┘ │
│ ▲ 1 │
▼ 1..n │ │
┌────────────┐ 1..n┌────────────┐ │
│ Connection │◀─────────▶│ Session [2]│──┘
└────────────┘ 1 └────────────┘
│ 1 ▲
│ │
▼ 1 ▼ 1
┌────────────────────┐ ┌────────────┐ 1..n┌────────────┐
│ WebSocketTransport │ │ Domains │──────────▶│ Domain [3]│
└────────────────────┘ └────────────┘ └────────────┘
[1] Target is inherited by TabTarget and MainProcessTarget. [2] Session is inherited by TabSession and MainProcessSession. [3] Domain is inherited by Log, Page, Browser, Target.... i.e. all domain implementations. From both domains/parent and domains/content folders.