======================
MIG Agent Architecture
======================

.. sectnum::
.. contents:: Table of Contents

Initialization process
----------------------

The agent tries to be as autonomous as possible. One of the goals is to minimize
reliance on configuration management tools to install a working agent. Therefore,
the agent generally attempts to install itself as a service on the system when it
is executed, and also supports optional automatic upgrades via the `mig-loader`_
companion program.

.. _mig-loader: loader.rst

As a portable binary, the agent needs to detect the type of operating system
and init method used by an endpoint. Depending on the endpoint,
different initialization methods are used.

If the ``installservice`` flag is set in the agent configuration file, the
agent will make the necessary changes to the platform to install itself as a
service. This is supported on Linux (systemd, upstart, SysV init), Darwin
(launchd), and Windows (Service Control Manager).

In this scenario, executing the agent directly causes it to check whether a
mig-agent service is already present, add it if missing, and start the service.
The process that was executed then exits, leaving the daemonized mig-agent
process running under the service manager.

Registration process
~~~~~~~~~~~~~~~~~~~~

The initialization process goes through several environment detection steps
that are used to select the proper init method. Once started, the agent
sends a heartbeat to the public relay and also stores that heartbeat in its
`run` directory. The location of the `run` directory is platform-specific:

* windows: C:\\mig
* darwin: /Library/Preferences/mig/
* linux: /var/lib/mig/

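The platform-specific locations can be captured in a small lookup. The sketch
below is illustrative only: ``RunDir`` is a hypothetical helper, not a function
from the MIG code base, and it maps only the three operating system names
listed above to the directories given in this section.

.. code:: go

    package main

    // RunDir returns the run directory documented for the given operating
    // system name ("windows", "darwin" or "linux"). Hypothetical helper;
    // the second return value is false for any other name.
    func RunDir(goos string) (string, bool) {
        switch goos {
        case "windows":
            return `C:\mig`, true
        case "darwin":
            return "/Library/Preferences/mig/", true
        case "linux":
            return "/var/lib/mig/", true
        }
        return "", false
    }

Passing ``runtime.GOOS`` to such a helper would select the directory for the
platform the binary is running on.
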
Below is a sample heartbeat message from a linux agent stored in
`/var/lib/mig/mig-agent.ok`.

.. code:: json

    {
        "destructiontime": "0001-01-01T00:00:00Z",
        "environment": {
            "arch": "amd64",
            "ident": "Red Hat Enterprise Linux Server release 6.5 (Santiago)",
            "init": "upstart"
        },
        "heartbeatts": "2014-07-31T14:00:20.00442837-07:00",
        "name": "someserver.example.net",
        "os": "linux",
        "pid": 26256,
        "queueloc": "linux.someserver.example.net.5hsa811oda",
        "starttime": "2014-07-30T21:34:48.525449401-07:00",
        "version": "201407310027+bcbdd94.prod"
    }

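A heartbeat file of this shape can be decoded with a small Go struct. This is
a minimal sketch: the ``Heartbeat`` type and ``ParseHeartbeat`` function are
hypothetical names chosen for illustration, and the JSON keys are taken from
the sample message rather than from the agent source.

.. code:: go

    package main

    import (
        "encoding/json"
        "time"
    )

    // Heartbeat mirrors a subset of the fields in the sample message.
    // Illustrative type only; not the definition used by MIG.
    type Heartbeat struct {
        Name        string    `json:"name"`
        OS          string    `json:"os"`
        PID         int       `json:"pid"`
        QueueLoc    string    `json:"queueloc"`
        Version     string    `json:"version"`
        StartTime   time.Time `json:"starttime"`
        HeartbeatTS time.Time `json:"heartbeatts"`
        Environment struct {
            Arch  string `json:"arch"`
            Ident string `json:"ident"`
            Init  string `json:"init"`
        } `json:"environment"`
    }

    // ParseHeartbeat decodes the JSON content of a heartbeat file.
    func ParseHeartbeat(data []byte) (Heartbeat, error) {
        var hb Heartbeat
        err := json.Unmarshal(data, &hb)
        return hb, err
    }

Reading `/var/lib/mig/mig-agent.ok` and passing its content to
``ParseHeartbeat`` would give a quick local check of when the agent last
reported in.
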
The agent periodically sends information about the OS configuration and its
environment to the scheduler. This includes the hostname of the system it is
running on, assigned IP addresses, AWS instance related information, and
more. This information can change on an endpoint while the agent is running.
For example, a new IP address could be assigned via DHCP.
The agent periodically checks the system; if changes to the environment
are detected, the heartbeat message is automatically updated to include
those changes. The frequency of environment checks can be controlled
through the ``refreshenv`` configuration option in the agent configuration
file, or the ``REFRESHENV`` variable in the agent built-in configuration.

Check-In mode
~~~~~~~~~~~~~

On infrastructure where running the agent as a permanent process is not
acceptable, the agent can be run as a cron job. When started with the flag
``-m agent-checkin``, the agent connects to the configured relay, retrieves
and runs outstanding commands, and exits after 10 seconds of inactivity.

Check-in mode can also be enabled through the ``checkin`` configuration
value in the agent configuration file.

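The "exit after a period of inactivity" behaviour can be sketched with a
``select`` over a command channel and a timer. This is not the agent's actual
implementation: ``RunUntilIdle``, its parameters, and the use of plain strings
as commands are assumptions made for the example.

.. code:: go

    package main

    import "time"

    // RunUntilIdle handles commands from cmds until none arrives for the
    // idle duration, or until the channel is closed. It returns the number
    // of commands handled. Hypothetical helper for illustration.
    func RunUntilIdle(cmds <-chan string, idle time.Duration, handle func(string)) int {
        handled := 0
        for {
            select {
            case cmd, ok := <-cmds:
                if !ok {
                    return handled // command source closed
                }
                handle(cmd)
                handled++
            case <-time.After(idle):
                // No command arrived within the idle window: stop.
                return handled
            }
        }
    }

The idle window restarts after every command, so a steady stream of work keeps
the loop alive while a quiet relay lets the process exit promptly.
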
Communication with modules
--------------------------

Upon processing of an action, the scheduler retrieves the list of agents to
send the action to. The action is then derived into multiple commands, one per
agent, and sent to the agents.

An agent receives a command from the scheduler on its personal AMQP queue (1).
It parses the command (2) and extracts all of the operations to perform.
Operations are passed to modules and executed in parallel (3). Rather than
maintaining a state of the running command, the agent creates a goroutine and a
channel tasked with receiving the results from the modules. Each module
publishes its results inside that channel (4). The result parsing goroutine
receives them, and when it has received all of them, populates the `results` (5)
array of the command with the results from each module, and sends the command
back to the scheduler (6).

While running, each module executes as a child process that communicates with
the agent over a pipe.

When the agent is done running the command, both the channel and the goroutine
are destroyed.

::

    +-------+   [ - - - - - - A G E N T - - - - - - - - - - - - ]
    |command|+---->(listener)
    +-------+          |(2)
        ^              V
        |(1)        (parser)
        |              +                [ m o d u l e s ]
    +---------+        |       (3)|----------> op1 +----------------+
    |SCHEDULER|+---+              |------------> op2 +--------------|
    |         |<---+              |--------------> op3 +------------|
    +---------+    |              +----------------> op4 +----------+
                   |                                                V(4)
                   |(6)                                        (receiver)
                   |                                                |
                   |                                                V(5)
                   +                                           (publisher)
               +-------+                                           /
               |results|<-----------------------------------------'
               +-------+

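The goroutine-and-channel pattern in steps (3) to (5) can be sketched as
follows. This is a simplified illustration rather than the agent's code:
operations are modelled as plain Go functions instead of child processes, and
``Result`` and ``RunOperations`` are hypothetical names.

.. code:: go

    package main

    import "sync"

    // Result carries the output of one operation and its position in the
    // operations list. Hypothetical stand-in for a module result.
    type Result struct {
        Index  int
        Output string
    }

    // RunOperations runs every operation in its own goroutine, collects
    // the results from a shared channel, and returns them so that
    // results[i] corresponds to ops[i].
    func RunOperations(ops []func() string) []string {
        resultChan := make(chan Result)
        var wg sync.WaitGroup

        for i, op := range ops {
            wg.Add(1)
            go func(i int, op func() string) {
                defer wg.Done()
                resultChan <- Result{Index: i, Output: op()}
            }(i, op)
        }

        // Close the channel once every operation has published its result.
        go func() {
            wg.Wait()
            close(resultChan)
        }()

        results := make([]string, len(ops))
        for r := range resultChan {
            results[r.Index] = r.Output
        }
        return results
    }

Because each result carries its index, the order in which operations finish
does not affect the order of the returned slice.
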
The command received by the agent is composed of a copy of the action described
previously, signed with the private key of a trusted investigator. It also
contains additional parameters that are specific to the targeted agent, such as
command processing timestamps, the name of the agent queue on the message broker,
the action and command unique IDs, and the status and results of the command.
Below is a command derived from the root password checking action and run on the
host named 'host1.example.net'.

.. code:: json

    {
        "id": 1.427392971126604e+18,
        "action": { ... SIGNED COPY OF THE ACTION ... },
        "agent": {
            "id": 1.4271760437936648e+18,
            "name": "host1.example.net",
            "queueloc": "linux.host1.example.net.981alsd19aos1984",
            "mode": "daemon",
            "version": "20150324+0d0f88c.prod"
        },
        "status": "success",
        "results": [
            {
                "foundanything": true,
                "success": true,
                "elements": {
                    "root_passwd_hashed_or_disabled": [
                        {
                            "file": "/etc/shadow",
                            "fileinfo": {
                                "lastmodified": "2015-02-07 01:51:07.17850601 +0000 UTC",
                                "mode": "----------",
                                "size": 1684
                            },
                            "search": {
                                "contents": [
                                    "root:(\\*|!|\\$(1|2a|5|6)\\$).+"
                                ],
                                "options": {
                                    "matchall": false,
                                    "matchlimit": 0,
                                    "maxdepth": 0
                                },
                                "paths": [
                                    "/etc"
                                ]
                            }
                        }
                    ]
                },
                "statistics": {
                    "exectime": "2.017849ms",
                    "filescount": 1,
                    "openfailed": 0,
                    "totalhits": 1
                },
                "errors": null
            }
        ],
        "starttime": "2015-03-26T18:02:51.126605Z",
        "finishtime": "2015-03-26T18:03:00.671232Z"
    }

The results of the command show that the file '/etc/shadow' has matched, and
thus ``foundanything`` is ``true``.

The invocation of the file module has completed successfully, which is
represented by ``results->0->success=true``. In our example, there is only one
operation in the ``action->operations`` array, so only one result is present.
When multiple operations are performed, each has its results listed in the
corresponding entry of the results array (operations[0] is in results[0],
operations[1] in results[1], etc.).

Finally, the agent has performed all operations in the operations array
successfully, and returned ``status=success``. Had a failure occurred in the
agent, the returned status would be one of "failed", "timeout" or "cancelled".

Command expiration & timeouts
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To prevent abuse of resources, agents will kill long-running modules after a
given period of time. That timeout can be configured in the agent configuration
file using the ``moduletimeout`` option.

The timeout represents the **maximum** execution time of a single operation. If
an action contains 3 operations, each operation gets its own timeout. But because
operations run in parallel in the agent, the maximum runtime of an action should
be very close to the value of ``moduletimeout``.

In a typical deployment, it is safe to increase ``moduletimeout`` to allow for
longer operations. A value of 20 minutes is usual. Make sure to fine tune this
to your environment, and get the approval of your ops team because mig-agent
may end up consuming resources (but never more than 50% of the cpu available on
a system).

Oftentimes, an investigator will want a timeout that is much shorter than the value
of ``moduletimeout``. In the MIG command line, the flag ``-e`` controls the
expiration. It defaults to 5 minutes but can be set to 30 seconds for simple
investigations. When that happens, the agent calculates an appropriate expiration
for the operations being run. If the expiration on the action is set to 30 seconds,
the agent will kill operations that run for more than 30 seconds.

If the expiration is larger than the value of ``moduletimeout`` (for example, 2
hours), then ``moduletimeout`` is used. Setting a long expiration may be useful to
allow agents that only check in periodically to pick up actions long after they
are launched. For example, an action can be created with a 24 hour
validity time; when an agent comes online it will receive the action, see that
it is still valid, execute it using ``moduletimeout`` as the maximum timeout value,
and return the results. This is useful to target an action at a group of agents that
may not all be online at the same time.

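The rule described above amounts to taking the smaller of the two limits. The
following is a minimal sketch under that reading: ``EffectiveTimeout`` is a
hypothetical helper, and it assumes the action expiration has already been
converted into the time remaining before the action expires.

.. code:: go

    package main

    import "time"

    // EffectiveTimeout returns the longest time a single operation may
    // run: the time left before the action expires, capped by the
    // configured moduletimeout. Hypothetical helper for illustration.
    func EffectiveTimeout(untilExpiration, moduleTimeout time.Duration) time.Duration {
        if untilExpiration < moduleTimeout {
            return untilExpiration
        }
        return moduleTimeout
    }

With a ``moduletimeout`` of 20 minutes, a 30 second expiration yields a 30
second limit, while a 2 hour expiration yields the 20 minute limit.
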
Agent/Modules message format
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

When running as a module, the agent accepts different classes of input on stdin,
as one-line JSON objects. The most common is the ``parameters`` class, but a
module may also receive a ``stop`` input indicating that it should stop its
execution immediately.
The format of module input messages is defined by ``modules.Message``.

.. code:: go

    // MessageClass identifies the type of message passed to a module.
    type MessageClass string

    // Message defines the input messages received by modules.
    type Message struct {
        Class      MessageClass // the type of message being passed to the module
        Parameters interface{}  // for the `parameters` class, the module parameters
    }

    const (
        MsgClassParameters MessageClass = "parameters"
        MsgClassStop       MessageClass = "stop"
        MsgClassPing       MessageClass = "ping"
        MsgClassLog        MessageClass = "log"
        MsgClassRegister   MessageClass = "register"
        MsgClassConfig     MessageClass = "config"
    )

When the agent receives a command to pass to a module for execution, it
extracts the operation parameters from ``Command.Action.Operations[N].Parameters``
and copies them into ``Message.Parameters``. It then sets ``Message.Class`` to
``modules.MsgClassParameters``, marshals the struct into JSON, and passes the
resulting ``[]byte`` to the module as an IO stream.

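The marshalling step can be sketched as below. The types are redeclared so the
example stands alone; the lowercase JSON keys (``class``, ``parameters``) and
the ``EncodeMessage`` helper are assumptions made for this illustration and
should be checked against the ``modules`` package before being relied upon.

.. code:: go

    package main

    import "encoding/json"

    // MessageClass and Message are local copies for this example only.
    type MessageClass string

    const (
        MsgClassParameters MessageClass = "parameters"
        MsgClassStop       MessageClass = "stop"
    )

    // The JSON tags below are assumed, not taken from this document.
    type Message struct {
        Class      MessageClass `json:"class"`
        Parameters interface{}  `json:"parameters,omitempty"`
    }

    // EncodeMessage builds a one-line JSON message, terminated by a
    // newline, ready to be written to a module's stdin.
    func EncodeMessage(class MessageClass, params interface{}) ([]byte, error) {
        line, err := json.Marshal(Message{Class: class, Parameters: params})
        if err != nil {
            return nil, err
        }
        return append(line, '\n'), nil
    }

A ``stop`` message carries no parameters, so with the assumed ``omitempty`` tag
it encodes to a single short line containing only the class.
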
Agent upgrade process via mig-loader
------------------------------------

MIG supports upgrading agents in the wild through the use of the companion
program mig-loader. Using mig-loader is optional; you don't need to use
mig-loader in your environment if you want to upgrade agents yourself.

The following is a high level diagram of how the loader interacts with the
API and the agent during the upgrade process. Note that this diagram focuses on
the agent being upgraded, but the same applies to any file in the manifest, such
as the certificates, agent configuration, or loader. In all cases, a change to
any of these files results in the loader respawning any running agent.

::

    /------ Endpoint ---------\
    Agent                Loader                 API
    +---+                +----+                 +--+
      |                    |                     |
      |                    | 1. request manifest |
      |                    |-------------------->|------+
      |                    |                     |      | 2. update loader
      | 3. valid  +--------|                     |      |    record in database
      | manifest  |        |                     |<-----+
      | sig?      +------->|                     |
      |                    |                     |
      | 4. does   +--------|                     |
      | current   |        |                     |
      | agent     |        |                     |
      | match?    +------->|                     |
      |                    |                     |
      |                    | 5. fetch new agent  |
      |                    |    or other files   |
      |                    |    from manifest    |
      |                    |    that don't match |
      |                    |-------------------->|
      |                    |                     |
      | 6. stage  +--------|                     |
      | agent on  |        |                     |
      | disk      +------->|                     |
      |                    |                     |
      | 7. agent  +--------|                     |
      | SHA256    |        |                     |
      | matches   |        |                     |
      | manifest? +------->|                     |
      |                    |                     |
      | 8. install agent   |                     |
      |<-------------------|                     |
      |                    |                     |
      | 9. stop old agent  |                     |
      |<-------------------|                     |
      |                    |                     |
      | 10. start new      |                     |
      |<-------------------|                     |
      |                    |                     |

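Step 7, comparing the staged file against the SHA256 listed in the manifest,
can be sketched with the Go standard library. ``MatchesSHA256`` is a
hypothetical helper, and the assumption that the manifest carries the digest as
a hexadecimal string is made only for this example.

.. code:: go

    package main

    import (
        "crypto/sha256"
        "encoding/hex"
        "strings"
    )

    // MatchesSHA256 reports whether data hashes to the hexadecimal SHA256
    // digest wantHex. The comparison ignores letter case and surrounding
    // whitespace in wantHex. Hypothetical helper for illustration.
    func MatchesSHA256(data []byte, wantHex string) bool {
        sum := sha256.Sum256(data)
        got := hex.EncodeToString(sum[:])
        return got == strings.ToLower(strings.TrimSpace(wantHex))
    }

A loader following this approach would read the staged file from disk, call the
helper with the manifest digest, and refuse to install the file on a mismatch.
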
For more information on how MIG loader can be used, see the relevant
documentation in `MIG Loader`_.

.. _`MIG Loader`: loader.rst