Distributed Tracing in the C++ SDK
Azure has adopted W3C Distributed Tracing as a paradigm for correlating requests from clients across multiple services.
This document explains how the Azure C++ SDK implements distributed tracing, how client applications integrate with it, how service clients should integrate with it, and finally how the network pipeline and other functionality should integrate with it.
Tracing Overview
The Azure SDK for C++ Tracing APIs are modeled after the opentelemetry-cpp API surface defined in the OpenTelemetry Tracing Specification. Additional architectural information about OpenTelemetry can be found in OpenTelemetry Concepts.
There are three major components which the Azure SDK components interact with:
- TracerProvider - this is a factory which creates Tracer objects.
- Tracer - this is a factory which creates Span objects.
- Span - Span objects are the APIs which allow tracing an operation. Each span has a name, a type and a "status". Spans also contain "attributes" and "events" which describe an operation.
There is typically a single TracerProvider for each application, and for the Azure SDK, each service will have a Tracer implementation which creates Span objects for each service client.
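The relationship between the three components can be sketched with a few self-contained classes. This is only an illustration of the factory chain described above: the class and method names below are hypothetical stand-ins, not the actual Azure::Core::Tracing or opentelemetry-cpp types.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical stand-ins for the three roles; NOT the real SDK types.
class Span {
public:
  Span(std::string tracerName, std::string spanName)
      : m_tracerName(std::move(tracerName)), m_name(std::move(spanName))
  {
  }
  std::string const& Name() const { return m_name; }
  std::string const& TracerName() const { return m_tracerName; }

private:
  std::string m_tracerName;
  std::string m_name;
};

// One Tracer per service; it creates the Span objects for that service.
class Tracer {
public:
  explicit Tracer(std::string serviceName) : m_serviceName(std::move(serviceName)) {}
  std::shared_ptr<Span> CreateSpan(std::string const& spanName) const
  {
    return std::make_shared<Span>(m_serviceName, spanName);
  }

private:
  std::string m_serviceName;
};

// One TracerProvider per application; it creates Tracer objects.
class TracerProvider {
public:
  std::shared_ptr<Tracer> CreateTracer(std::string const& serviceName) const
  {
    return std::make_shared<Tracer>(serviceName);
  }
};
```

An application would hold one TracerProvider, ask it for a Tracer per service, and ask that Tracer for a Span per operation.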
A Span can be considered a "unit of work" for a service. Each service method (method which calls into the service) will have a single Span reflecting the client method which was called.
Spans are hierarchical and each span can have multiple children (each Span can only have a single parent). The typical way that this manifests itself during a service method call is:
- Service Method "MyServiceMethod" creates a span named "MyServiceMethod" and starts an HTTP request to communicate with the service.
  - The HTTP pipeline (specifically the RequestActivityPolicy) will create a child span under the service method span named "HTTP <verb> #0". This span reflects the HTTP call into the service.
  - If the HTTP call needs to be retried, the existing span will be closed and a new span named "HTTP <verb> #1" will be created for the retry.
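The span hierarchy for a retried request can be modelled in a few lines. The sketch below only simulates the naming and parenting described above; SpanRecord, HttpSpanName and TraceServiceCall are hypothetical names introduced for illustration and are not part of the SDK.

```cpp
#include <string>
#include <vector>

// Hypothetical record of a span: its name and the name of its parent
// (empty for the root span).
struct SpanRecord {
  std::string Name;
  std::string Parent;
};

// Builds the HTTP span name for a verb and a zero-based attempt number,
// for example "HTTP GET #0".
std::string HttpSpanName(std::string const& verb, int attempt)
{
  return "HTTP " + verb + " #" + std::to_string(attempt);
}

// Simulates a service method whose HTTP request needs `attempts` tries.
// Returns the spans in the order in which they would be created.
std::vector<SpanRecord> TraceServiceCall(
    std::string const& methodName,
    std::string const& verb,
    int attempts)
{
  std::vector<SpanRecord> spans;
  spans.push_back({methodName, ""}); // span for the service method itself
  for (int attempt = 0; attempt < attempts; ++attempt)
  {
    // Each attempt gets its own child span under the service method span.
    spans.push_back({HttpSpanName(verb, attempt), methodName});
  }
  return spans;
}
```

Calling TraceServiceCall("MyServiceMethod", "GET", 2) models one retry: it yields the method span followed by two sibling HTTP spans, both parented to the method span.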
Distributed Tracing Client Integration
Applications which wish to integrate Distributed Tracing are strongly encouraged to use the opentelemetry-cpp vcpkg package.
There are numerous examples on the OpenTelemetry web site which demonstrate how to integrate opentelemetry into a customer application and integrate the generated traces with Azure monitoring infrastructure such as Geneva Monitoring.
Following the examples from opentelemetry-cpp, the following can be used to establish an OpenTelemetry exporter which logs to the console or to an in-memory logger.
opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider>
CreateOpenTelemetryProvider()
{
#if USE_MEMORY_EXPORTER
  auto exporter = std::make_unique<opentelemetry::exporter::memory::InMemorySpanExporter>();
#else
  auto exporter = std::make_unique<opentelemetry::exporter::trace::OStreamSpanExporter>();
#endif

  // simple processor
  auto simple_processor = std::unique_ptr<opentelemetry::sdk::trace::SpanProcessor>(
      new opentelemetry::sdk::trace::SimpleSpanProcessor(std::move(exporter)));

  auto always_on_sampler = std::unique_ptr<opentelemetry::sdk::trace::AlwaysOnSampler>(
      new opentelemetry::sdk::trace::AlwaysOnSampler);

  auto resource_attributes = opentelemetry::sdk::resource::ResourceAttributes{
      {"service.name", "telemetryTest"}, {"service.instance.id", "instance-1"}};
  auto resource = opentelemetry::sdk::resource::Resource::Create(resource_attributes);

  // Create using SDK configurations as parameter
  return opentelemetry::nostd::shared_ptr<opentelemetry::trace::TracerProvider>(
      new opentelemetry::sdk::trace::TracerProvider(
          std::move(simple_processor), resource, std::move(always_on_sampler)));
}
Other exporters exist to export to Jaeger, Windows ETW and others.
Once the opentelemetry::trace::TracerProvider has been created, the client needs to create a new Azure::Core::Tracing::OpenTelemetry::OpenTelemetryProvider, which serves as the integration layer between OpenTelemetry and Azure Core:
std::shared_ptr<Azure::Core::Tracing::TracerProvider> traceProvider
= Azure::Core::Tracing::OpenTelemetry::OpenTelemetryProvider::Create(CreateOpenTelemetryProvider());
To finish the integration with Azure clients, there are two mechanisms to integrate OpenTelemetry into a client application:
- Azure::Core::Context integration.
- Service Client Options integration.
Integrate an OpenTelemetryProvider via the ApplicationContext
To integrate OpenTelemetry for all Azure Clients in the application, the customer can call Azure::Core::Context::ApplicationContext.SetTracerProvider to establish the tracer provider for the application.
Azure::Core::Context::ApplicationContext.SetTracerProvider(provider);
Integrate an OpenTelemetryProvider via Service ClientOptions
While using the ApplicationContext is the simplest mechanism for integrating OpenTelemetry with a customer application, there may be times the customer needs more flexibility when creating service clients.
To enable customers to further customize how tracing works, the application can set the Telemetry.TracingProvider field in the service client options, which will establish the tracer provider used by the service client.
auto tracerProvider(CreateOpenTelemetryProvider());
auto provider(Azure::Core::Tracing::OpenTelemetry::OpenTelemetryProvider::Create(tracerProvider));
ServiceClientOptions clientOptions;
clientOptions.Telemetry.TracingProvider = provider;
clientOptions.Telemetry.ApplicationId = "MyApplication";
ServiceClient myServiceClient(clientOptions);
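When both mechanisms are in play, a client has to decide which provider to use. This document does not state which setting wins, so the sketch below ASSUMES that a provider set in the client options takes precedence and that the application-wide provider is used only as a fallback. All types and the ResolveProvider function are hypothetical and exist only to illustrate that assumed lookup order.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a tracer provider.
struct TracerProvider {
  std::string Label;
};

// Hypothetical stand-in for the tracing-related part of the client options.
struct ClientOptions {
  std::shared_ptr<TracerProvider> TracingProvider; // empty when not set
};

// ASSUMPTION: a provider set on the client options wins; otherwise the
// application-wide provider (which may also be empty) is used.
std::shared_ptr<TracerProvider> ResolveProvider(
    ClientOptions const& options,
    std::shared_ptr<TracerProvider> const& applicationProvider)
{
  if (options.TracingProvider)
  {
    return options.TracingProvider;
  }
  return applicationProvider;
}
```

If neither provider is set, the result is empty, which a real client would presumably treat as "tracing disabled".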
Distributed Tracing Service Integration
There are two steps needed to integrate Distributed Tracing with a Service Client.
- Add a DiagnosticTracingFactory object to the ServiceClient object.
- Update each service method as follows:
  - Add a call to the CreateSpan method on the diagnostic tracing factory. This will create a new span for the client operation.
  - Call SetStatus on the created span when the service method successfully completes.
  - Wrap the client method code with a try/catch handler which catches exceptions and calls AddEvent with the value of the exception.
Add a DiagnosticTracingFactory to the ServiceClient class
To add a new DiagnosticTracingFactory to the client, simply add the class as a member:
Azure::Core::Tracing::_internal::TracingContextFactory m_tracingFactory;
And construct the new tracing factory in the service constructor:
explicit ServiceClient(ServiceClientOptions const& clientOptions = ServiceClientOptions{})
: m_tracingFactory(clientOptions, "Azure.Core.OpenTelemetry.Test.Service", PackageVersion::ToString())
Update Each Service Method
There are three methods of interest when updating the service method:
- DiagnosticTracingFactory::CreateSpan - This creates and returns a Span and Context object for the service method. The returned Context object must be used for subsequent service operations.
- Span::AddEvent(std::exception&) - This registers the exception with the distributed tracing infrastructure.
- Span::SetStatus - This sets the status of the operation in the trace.
  Azure::Response<std::string> ServiceMethod(
      std::string const&,
      Azure::Core::Context const& context = Azure::Core::Context{})
  {
    // Create a new context and span for this request.
    auto contextAndSpan = m_tracingFactory.CreateSpan("ServiceMethod", context);

    // contextAndSpan.Context is the new context for the operation.
    // contextAndSpan.Span is the new span for the operation.

    try
    {
      // <Call Into Service via an HTTP pipeline>
      Azure::Core::Http::Request requestToSend(
          HttpMethod::Get, Azure::Core::Url("<Service URL>"));

      std::unique_ptr<Azure::Core::Http::RawResponse> response
          = m_pipeline->Send(requestToSend, contextAndSpan.Context);
      contextAndSpan.Span.SetStatus(Azure::Core::Tracing::_internal::SpanStatus::Ok);
      return Azure::Response<std::string>("", std::move(response));
    }
    catch (std::exception const& ex)
    {
      // Register that the exception has happened and that the span is now in error.
      contextAndSpan.Span.AddEvent(ex);
      contextAndSpan.Span.SetStatus(Azure::Core::Tracing::_internal::SpanStatus::Error);
      throw;
    }

    // When contextAndSpan.Span goes out of scope, it ends the span, which will record it.
  }
};
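The control flow of a traced service method (set the status on success, record the exception on failure, end the span when it leaves scope) can be exercised without the SDK. The following is a minimal sketch under stated assumptions: ScopedSpan, its local SpanStatus enum and TracedOperation are hypothetical stand-ins that write to a plain vector, and only mirror the shape of the pattern rather than the SDK's implementation.

```cpp
#include <exception>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

enum class SpanStatus { Unset, Ok, Error };

// Hypothetical span which records what happened into a log and "ends"
// itself in its destructor, mimicking a span ending at scope exit.
class ScopedSpan {
public:
  ScopedSpan(std::string name, std::vector<std::string>& log)
      : m_name(std::move(name)), m_log(log)
  {
  }
  ScopedSpan(ScopedSpan const&) = delete;
  ScopedSpan& operator=(ScopedSpan const&) = delete;
  ~ScopedSpan() { m_log.push_back(m_name + " ended: " + StatusText()); }

  void SetStatus(SpanStatus status) { m_status = status; }
  void AddEvent(std::exception const& ex)
  {
    m_log.push_back(m_name + " exception: " + ex.what());
  }

private:
  std::string StatusText() const
  {
    switch (m_status)
    {
      case SpanStatus::Ok:
        return "Ok";
      case SpanStatus::Error:
        return "Error";
      default:
        return "Unset";
    }
  }

  std::string m_name;
  std::vector<std::string>& m_log;
  SpanStatus m_status = SpanStatus::Unset;
};

// Models a service method: the span lives for the whole call, and its
// status is set on both the success path and the failure path.
std::string TracedOperation(bool shouldFail, std::vector<std::string>& log)
{
  ScopedSpan span("ServiceMethod", log);
  try
  {
    if (shouldFail)
    {
      throw std::runtime_error("request failed");
    }
    span.SetStatus(SpanStatus::Ok);
    return "response";
  }
  catch (std::exception const& ex)
  {
    span.AddEvent(ex);
    span.SetStatus(SpanStatus::Error);
    throw; // the span ends during stack unwinding
  }
}
```

Because the span is a local object, it is ended on every exit path, including the rethrow, so no explicit "end" call is needed in the method body.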
Implementation Details
Distributed Tracing components
In order to maintain flexibility, the opentelemetry-cpp APIs are implemented in a separate package - azure-core-tracing-opentelemetry. This is consistent with how opentelemetry is distributed for the other Azure SDKs.
The Azure Core API surface interacts with a set of pure virtual base classes (aka "interfaces") in the Azure::Core::Tracing and Azure::Core::Tracing::_internal namespaces. These allow a level of separation between the Azure Core API surface and the OpenTelemetry API surface: an alternative tracing mechanism only needs to provide APIs consistent with the Azure::Core::Tracing APIs.
The azure-core-tracing-opentelemetry-cpp package implements a set of APIs in the Azure::Core::Tracing::OpenTelemetry and Azure::Core::Tracing::OpenTelemetry::_detail namespaces. These provide an Azure Core compatible API surface for distributed tracing.
The core service client interface is the DiagnosticTracingFactory class, which implements two APIs: CreateSpan and CreateSpanFromContext. CreateSpan is intended to be used by service methods which have direct access to a DiagnosticTracingFactory object; CreateSpanFromContext is intended to be used from code which does NOT have direct access to the DiagnosticTracingFactory.
The final significant piece of the distributed tracing infrastructure is the RequestActivityPolicy. This policy MUST be inserted into the HTTP pipeline AFTER the RetryPolicy. It is responsible for creating the span associated with the HTTP request, and it also propagates the W3C distributed tracing headers from the span into the HTTP request.
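The main W3C distributed tracing header is traceparent, which the W3C Trace Context specification defines as four dash-separated, lowercase hex fields: a 2-digit version ("00"), a 32-digit trace ID, a 16-digit parent (span) ID, and 2-digit trace flags. The sketch below formats such a value from raw bytes. ToHex and FormatTraceParent are hypothetical helper names for illustration and are not the functions used by RequestActivityPolicy.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string>

// Converts a byte buffer to lowercase hexadecimal text.
std::string ToHex(std::uint8_t const* bytes, std::size_t count)
{
  static char const digits[] = "0123456789abcdef";
  std::string out;
  out.reserve(count * 2);
  for (std::size_t i = 0; i < count; ++i)
  {
    out.push_back(digits[bytes[i] >> 4]);
    out.push_back(digits[bytes[i] & 0x0f]);
  }
  return out;
}

// Formats a version "00" traceparent header value:
//   00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
// Only the "sampled" flag (0x01) is modelled here.
std::string FormatTraceParent(
    std::array<std::uint8_t, 16> const& traceId,
    std::array<std::uint8_t, 8> const& spanId,
    bool sampled)
{
  return "00-" + ToHex(traceId.data(), traceId.size()) + "-"
      + ToHex(spanId.data(), spanId.size()) + (sampled ? "-01" : "-00");
}
```

A downstream service that receives this header can continue the same trace by reusing the trace ID and treating the span ID as the parent of its own spans.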
Generated traces
The Azure standards for distributed tracing are defined in Azure Distributed Tracing Conventions. The actual tracing elements generated by Azure services are defined in Azure Tracing Conventions YAML.
In summary, these are the traces and attributes which should be generated for Azure services:
Spans
The distributed tracing standards define the following traces:
Public APIs
All public APIs MUST create a span which describes the API. The name of the span MUST be the API name.
HTTP Calls
Each HTTP request sent to the service MUST create a span describing the request to the service.
The name of the span MUST be of the form HTTP <HTTP VERB> #<HTTP RETRY>.
Attributes
Generated traces have the following attributes:
Attribute Name | Semantics | Where Used |
---|---|---|
az.namespace | Namespace of the Azure service request | All spans. |
http.method | HTTP Method ("GET", "PUT", etc) | HTTP Spans. |
http.url | URL being retrieved (sanitized) | HTTP Spans. |
http.status_code | HTTP status code returned by the service | HTTP Spans. |
http.user_agent | The value of the User-Agent HTTP header sent to the service | HTTP Spans. |
requestId | The value of the x-ms-client-request-id header sent by the client | HTTP Spans. |
serviceRequestId | The value of the x-ms-request-id sent by the server | HTTP Spans. |
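As an illustration of how the attributes in the table fit together, the sketch below assembles the attribute set for one HTTP span as a plain string map. The function names are hypothetical. The sanitization shown (dropping the query string) is only an ASSUMPTION made for the example, since this document does not define what "sanitized" means.

```cpp
#include <map>
#include <string>

// ASSUMPTION: "sanitized" is modelled here by removing the query string.
// The real sanitization rules are not specified in this document.
std::string SanitizeUrl(std::string const& url)
{
  auto const queryStart = url.find('?');
  return queryStart == std::string::npos ? url : url.substr(0, queryStart);
}

// Builds the attributes listed in the table for a single HTTP span.
std::map<std::string, std::string> BuildHttpSpanAttributes(
    std::string const& azNamespace,
    std::string const& method,
    std::string const& url,
    int statusCode,
    std::string const& userAgent,
    std::string const& clientRequestId,
    std::string const& serviceRequestId)
{
  return {
      {"az.namespace", azNamespace},
      {"http.method", method},
      {"http.url", SanitizeUrl(url)},
      {"http.status_code", std::to_string(statusCode)},
      {"http.user_agent", userAgent},
      {"requestId", clientRequestId},
      {"serviceRequestId", serviceRequestId},
  };
}
```

The span for the public API itself would carry only az.namespace, since the remaining attributes apply to HTTP spans.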