Avoid returning offline agent information to killDupAgents. In
circumstances where the returned data includes only one online agent
with several offline agents, we could end up sending a kill notice to
the only online agent.
There are a number of places in the agent where go-routines write to
Terminate to tell the agent to shut down. Depending on ordering, there
are circumstances where one go-routine will become blocked writing to
Terminate if there is no listener on the unbuffered channel, such as:
- publish is called, which picks up the publication lock
- while publish is running, another go-routine detects an error and
  notifies Terminate
- the agent stops listening on Terminate, and attempts to pick up the
  publication lock to try to exit, but blocks as it's held in publish
- publish detects a failure, and tries to write to Terminate before
  releasing the publication lock, but blocks as there is no listener
  on the channel anymore
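
The sequence above depends on a blocking send to an unbuffered channel.
As a minimal sketch of one common way to avoid that class of deadlock
(not necessarily the change made in this commit), a termination request
can be sent with a non-blocking select on a channel with a buffer of
one. The agent type and the requestTerminate helper are hypothetical
names used only for illustration.

```go
package main

import "fmt"

// agent is a hypothetical stand-in for the agent state; only the
// Terminate channel matters for this sketch.
type agent struct {
	Terminate chan string
}

// requestTerminate asks the agent to shut down without ever blocking
// the caller. It reports false when a request is already pending.
func (a *agent) requestTerminate(reason string) bool {
	select {
	case a.Terminate <- reason:
		return true
	default:
		// Buffer is full: a shutdown has already been requested.
		return false
	}
}

func main() {
	// A buffer of one lets the first request succeed even when no
	// goroutine is currently receiving from Terminate.
	a := &agent{Terminate: make(chan string, 1)}
	fmt.Println(a.requestTerminate("publish failed"))
	fmt.Println(a.requestTerminate("connection closed"))
	fmt.Println(<-a.Terminate)
}
```

With this shape, publish could report a failure while still holding the
publication lock, because the send returns immediately either way.
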
The previous agent queue location included the hostname which could
potentially change while the agent is running. Exclude dynamic values
and add an additional random component to the queue location.
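
A minimal sketch of building a queue location from a static prefix plus
a random component follows. The layout (prefix, a dot, 16 hex
characters) and the newQueueLoc name are assumptions for illustration;
the actual format used by the agent is not shown in this message.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
)

// newQueueLoc returns a queue location made of a static prefix and a
// random suffix, so that it does not depend on values such as the
// hostname that can change while the process is running.
func newQueueLoc(prefix string) (string, error) {
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return prefix + "." + hex.EncodeToString(buf), nil
}

func main() {
	// "linux" is a hypothetical static prefix.
	loc, err := newQueueLoc("linux")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(loc)
}
```
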
It's possible over the course of the process lifetime on a workstation
that address or other environment information can change. This adds
support for periodic refreshes of current environment information from
the system.
This commit also requires modifying the way we pass the Context around
the agent, from a copy to using a pointer so the changes made in the
refresh routine are visible.
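
To illustrate why the pointer matters, here is a minimal sketch in which
a refresh function updates a shared context. The Context and Env types
below are trimmed-down, hypothetical stand-ins, and the mutex is an
assumption added because a refresh routine would run concurrently with
readers.

```go
package main

import (
	"fmt"
	"sync"
)

// Env holds environment details that may change at runtime. The single
// Address field is a hypothetical simplification.
type Env struct {
	Address string
}

// Context is a hypothetical, trimmed-down agent context. The mutex
// guards env against concurrent access.
type Context struct {
	mu  sync.RWMutex
	env Env
}

// SetEnv replaces the stored environment information.
func (c *Context) SetEnv(e Env) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.env = e
}

// Env returns a snapshot of the current environment information.
func (c *Context) Env() Env {
	c.mu.RLock()
	defer c.mu.RUnlock()
	return c.env
}

// refresh takes *Context so the update lands in the shared value. With
// a Context passed by value, the change would be made to a copy and
// lost when the function returned.
func refresh(ctx *Context, lookup func() Env) {
	ctx.SetEnv(lookup())
}

func main() {
	ctx := &Context{}
	ctx.SetEnv(Env{Address: "192.0.2.10"})
	refresh(ctx, func() Env { return Env{Address: "192.0.2.20"} })
	fmt.Println(ctx.Env().Address)
}
```

In a long-running process, refresh would typically be called from a
loop driven by a time.Ticker.
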
Closes #205
This fixes lintian errors reported for deb packages, which cause the
package installation to fail in some cases where frontends to dpkg are
being used to install.
Changes primarily applied to loader/agent packages. deb-server package
was not modified as the target installs to /opt, which is reported as an
error by the linter. Since a change to the install path is required to
fix this, leave that target as is for now.
Closes #197
err was not being reset, which can lead to the cron installation error
being returned later in the daemonize function and cause the agent to
fail.
Closes #200
If the AMQP channel used in getCommands is closed due to an underlying
error in the AMQP connection, the goroutine exits. The heartbeat process
will also encounter the connection error during publish, and this will
result in termination of the agent.
Depending on the configured heartbeat time, this can take a while to
occur. Rather than wait for the heartbeat publish to fail, also send a
termination notice here if we are unable to receive messages from AMQP.
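
A minimal sketch of this behaviour follows, with a plain Go channel of
strings standing in for the AMQP delivery channel. The function
signature and the terminate channel are assumptions for illustration
and do not reflect the agent's actual code.

```go
package main

import "fmt"

// getCommands handles messages until msgs is closed. A closed channel
// means no further messages can be received, so instead of returning
// silently it asks the agent to terminate.
func getCommands(msgs <-chan string, terminate chan<- string, handle func(string)) {
	for m := range msgs {
		handle(m)
	}
	// The range loop only ends once msgs has been closed.
	terminate <- "command channel closed"
}

func main() {
	msgs := make(chan string, 2)
	// Buffered so the notice can be delivered without a waiting reader.
	terminate := make(chan string, 1)
	msgs <- "cmd-1"
	msgs <- "cmd-2"
	close(msgs) // simulates the connection error closing the channel

	getCommands(msgs, terminate, func(m string) { fmt.Println("handling", m) })
	fmt.Println("terminate:", <-terminate)
}
```
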
This just uses the mig logging framework. As a second part to this, we
need to implement rotation for file output in the logging framework.
Related to #204
Rather than have the installer create the launchd interval job, just do
it in the loader, which simplifies the installation process and makes
the job installation operate similarly to other mig components.
We need to make sure we are actively draining the DetectDupAgents
channel, otherwise routines writing here will end up blocking. We
check the ctx.Agent.KillDupAgents setting inside killDupAgents, so just
send the queue location here and it will be handled according to the
configuration value.
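
A minimal sketch of the draining pattern follows. The function
signatures, the enabled parameter (standing in for
ctx.Agent.KillDupAgents), and the kill callback are assumptions for
illustration only.

```go
package main

import "fmt"

// killDupAgents is a hypothetical stand-in that checks the setting
// itself, so callers can forward every queue location unconditionally.
func killDupAgents(queueLoc string, enabled bool, kill func(string)) {
	if !enabled {
		return
	}
	kill(queueLoc)
}

// drain always receives from detect, regardless of the setting, so
// goroutines sending on the channel are never left blocked.
func drain(detect <-chan string, enabled bool, kill func(string)) {
	for queueLoc := range detect {
		killDupAgents(queueLoc, enabled, kill)
	}
}

func main() {
	detect := make(chan string) // unbuffered, as senders would see it
	done := make(chan struct{})
	go func() {
		drain(detect, false, func(q string) { fmt.Println("kill", q) })
		close(done)
	}()
	// These sends complete even though killing is disabled.
	detect <- "queue-a"
	detect <- "queue-b"
	close(detect)
	<-done
	fmt.Println("senders were not blocked")
}
```
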
Ensure we set err to nil before we continue on an error condition to
avoid returning the error from the function. This issue was causing the
QueuesCleanup routine to exit under certain circumstances.
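
The pattern can be sketched as below, assuming a loop that shares one
err variable with the function's return value. cleanupQueues, errStale,
and the del callback are hypothetical names; the real QueuesCleanup code
is not shown in this message.

```go
package main

import (
	"errors"
	"fmt"
)

// errStale is a hypothetical error used to simulate a failed deletion.
var errStale = errors.New("queue could not be deleted")

// cleanupQueues tries to delete every queue. A failure on one queue is
// counted and skipped; err is reset before continuing so that a failure
// on the last queue is not returned to the caller.
func cleanupQueues(queues []string, del func(string) error) (skipped int, err error) {
	for _, q := range queues {
		err = del(q)
		if err != nil {
			skipped++
			err = nil // without this, a failure on the last queue is returned
			continue
		}
	}
	return skipped, err
}

func main() {
	del := func(q string) error {
		if q == "stale" {
			return errStale
		}
		return nil
	}
	skipped, err := cleanupQueues([]string{"a", "stale"}, del)
	fmt.Println(skipped, err)
}
```
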
The previous implementation had the potential to deadlock the function,
as it was possible to enter the wait loop before routines had been
incremented, since routines was being incremented inside the goroutine.
This would result in the function never exiting the loop comparing
completed to routines, as on the first iteration routines would be 0 and
completed would be 1.
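
The underlying rule is that the count of expected goroutines must be
updated before each goroutine starts. A minimal sketch using
sync.WaitGroup in place of the routines and completed counters follows;
this is a swapped-in technique for illustration, and runAll is a
hypothetical name rather than the function changed here.

```go
package main

import (
	"fmt"
	"sync"
)

// runAll runs every task in its own goroutine and returns the number
// that completed. wg.Add is called before the goroutine is started, so
// Wait can never observe a count that is missing a launched goroutine.
func runAll(tasks []func()) int {
	var wg sync.WaitGroup
	var mu sync.Mutex
	completed := 0
	for _, task := range tasks {
		wg.Add(1) // incremented outside the goroutine
		go func(t func()) {
			defer wg.Done()
			t()
			mu.Lock()
			completed++
			mu.Unlock()
		}(task)
	}
	wg.Wait()
	return completed
}

func main() {
	noop := func() {}
	fmt.Println(runAll([]func(){noop, noop, noop}))
}
```
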
Remove the service install flag from the agent as it's not really
necessary; the standard daemon invocation will achieve the same result.
Also add a check in loader runs to see if the agent is running even if
no changes were needed. If not, try to start the agent. This handles
cases where the loader doesn't need to make changes, but the agent is
not running for some reason; previously the loader would not attempt to
recover from this scenario.