Add article for fault tolerance

This commit is contained in:
xinchen 2016-11-20 22:17:43 -08:00
Parent 32ebffa9c0
Commit 2620fbc84e
5 changed files with 100 additions and 0 deletions


@@ -142,6 +142,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "articles", "articles", "{D9
docs\articles\building_application.md = docs\articles\building_application.md
docs\articles\contributing.md = docs\articles\contributing.md
docs\articles\device_to_iothub.md = docs\articles\device_to_iothub.md
docs\articles\fault_tolerance.md = docs\articles\fault_tolerance.md
docs\articles\hello_amqp.md = docs\articles\hello_amqp.md
docs\articles\installation.md = docs\articles\installation.md
docs\articles\listener.md = docs\articles\listener.md


@@ -107,6 +107,7 @@ Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "articles", "articles", "{56
docs\articles\building_application.md = docs\articles\building_application.md
docs\articles\contributing.md = docs\articles\contributing.md
docs\articles\device_to_iothub.md = docs\articles\device_to_iothub.md
docs\articles\fault_tolerance.md = docs\articles\fault_tolerance.md
docs\articles\hello_amqp.md = docs\articles\hello_amqp.md
docs\articles\installation.md = docs\articles\installation.md
docs\articles\listener.md = docs\articles\listener.md


@@ -253,3 +253,4 @@ Thread.Sleep is a hypothetical example of having blocking calls in the MessageCa
* [Listener](listener.md)
* [Serialization](serialization.md)
* [Buffer Management](buffer_management.md)
* [Fault Tolerance](fault_tolerance.md)


@@ -0,0 +1,95 @@
In messaging applications, fault tolerance is important. Communication can fail for many reasons,
such as network failures, temporary service unavailability, and planned or unplanned maintenance.
Applications must handle these errors and be able to recover from them.

Because error handling and reconnect logic are very application specific, the library does not provide
built-in retry or recovery logic. Instead, it provides several mechanisms that applications can use to
build fault tolerance:
* Exceptions and error conditions
* State and terminal error on AMQP objects
* Closed event on AMQP objects
## Exceptions
Applications should expect API calls to throw exceptions and should handle them. The library defines the
`AmqpException` type and throws it whenever appropriate. Other common exceptions, such as
`ArgumentException` and `TimeoutException`, can also be thrown.

When handling an `AmqpException`, the application should check its `Error` property, specifically the
`Condition` property of the `Error` object, and act accordingly. Refer to the AMQP specification for the
standard error conditions, and to the documentation of the remote peer for any extended error conditions
it defines.
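
As a sketch of what acting on the condition might look like, the helper below maps a few standard AMQP 1.0 condition strings to a recovery action. The `RecoveryAction` enum and `ErrorClassifier` class are hypothetical names, not part of the library, and the mapping itself is an assumption that each application should adjust. Conditions defined by a particular peer would need their own cases.

```csharp
using System;

// Hypothetical recovery actions; not part of the library.
public enum RecoveryAction { RecreateLink, RecreateConnection, RetryLater, Fail }

public static class ErrorClassifier
{
    // Maps an AMQP error condition string to an action.
    // The mapping is illustrative only; tune it for your peer and workload.
    public static RecoveryAction Classify(string condition)
    {
        switch (condition)
        {
            case "amqp:link:detach-forced":
            case "amqp:link:stolen":
                return RecoveryAction.RecreateLink;
            case "amqp:connection:forced":
            case "amqp:connection:framing-error":
                return RecoveryAction.RecreateConnection;
            case "amqp:resource-limit-exceeded":
            case "amqp:resource-locked":
                return RecoveryAction.RetryLater;
            default:
                // Unknown, peer-specific, or missing (null) condition.
                return RecoveryAction.Fail;
        }
    }
}
```

In a `catch (AmqpException ex)` block, this could be called with the string form of `ex.Error.Condition`, assuming that string is the AMQP condition symbol.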
## AmqpObject
When an `AmqpObject` (`Connection`, `Session`, `Link`) has transitioned to the closing/ending/detaching
state, any operation other than close causes an `AmqpException` with the error condition
"amqp:illegal-state" to be thrown.

The `AmqpObject.Error` property, if set, indicates the error condition under which the object was closed.
The `AmqpObject.IsClosed` property tells whether the object has been closed.
The `AmqpObject.Closed` event notifies subscribers when the object reaches the end state.
## Reconnect
Using the primitives described above, an application can build reconnect logic that best fits its
runtime environment. The following are typical ways to build such reconnect logic. All of them are
used in the [LongHaulTest](https://github.com/Azure/amqpnetlite/tree/master/test/LongHaulTest) project.
### Proactive State Check
Before calling any API, the application can check the `AmqpObject.IsClosed` property and recreate the
object if it is closed.

For example, the LongHaulTest uses this strategy to create the connection before calling the Send or
Receive methods. A similar approach is taken to ensure that a link is in place.
```
protected async Task EnsureConnectionAsync()
{
    // Create the connection only if it does not exist yet or has been closed.
    if (this.connection == null || this.connection.IsClosed)
    {
        Address address = this.role.Address;
        ConnectionFactory factory = new ConnectionFactory();
        // Accepts any server certificate; acceptable in a test, not in production.
        factory.SSL.RemoteCertificateValidationCallback = (a, b, c, e) => true;
        factory.AMQP.HostName = this.role.Args.Host ?? address.Host;
        factory.AMQP.ContainerId = "amqp-test" + this.id;
        this.connection = await factory.CreateAsync(address);
    }
}
```
### Handle Exceptions
Exceptions must be handled. In the exception handler, the application can either recreate the
communication objects itself or simply delegate the recreation to the Proactive State Check
strategy, if one is in place.
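
The delegation option can be sketched as a small retry loop: the handler only waits, and the next attempt starts with the state check, which recreates whatever was closed. The `Retry` class and its parameters are hypothetical and not part of the library. `ensureAsync` stands in for a method such as `EnsureConnectionAsync`, and catching every `Exception` is a simplification that a real application would likely narrow to `AmqpException` and `TimeoutException`.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper; not part of the library.
public static class Retry
{
    // Runs ensureAsync (the proactive state check) and then operationAsync
    // (for example a send or a receive). On failure it waits for 'delay' and
    // tries again, up to maxAttempts. Returns the attempt number that succeeded.
    public static async Task<int> RunAsync(
        Func<Task> ensureAsync,
        Func<Task> operationAsync,
        int maxAttempts,
        TimeSpan delay)
    {
        if (maxAttempts < 1)
        {
            throw new ArgumentOutOfRangeException(nameof(maxAttempts));
        }

        for (int attempt = 1; ; attempt++)
        {
            try
            {
                await ensureAsync();
                await operationAsync();
                return attempt;
            }
            catch (Exception) when (attempt < maxAttempts)
            {
                // Nothing is recreated here: the next iteration's ensureAsync
                // call does that. The last failure is not caught and propagates.
                await Task.Delay(delay);
            }
        }
    }
}
```

Because the helper awaits everything, it stays on the async API surface throughout.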
### Subscribe to Closed Event
An application can also subscribe to the Closed event and perform reconnect logic in the event handler.
The Closed event handler is invoked at most once: if the object is already closed when the handler is
subscribed, it is never invoked. The application can use the following logic to avoid this race
condition and ensure that the handler is always invoked. Note that with this logic the guarantee
becomes at least once, so the event handler must be idempotent.
```
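void SafeAddClosed(AmqpObject obj, ClosedCallback callback)
{
    obj.Closed += callback;
    // The object may have been closed before the subscription took effect;
    // in that case invoke the callback directly.
    if (obj.IsClosed)
    {
        callback(obj, obj.Error);
    }
}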
```
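
Since the handler may now run more than once, one way to make it idempotent is a thread-safe run-once guard. This is a minimal sketch; the `OnceGuard` class is a hypothetical helper, not part of the library.

```csharp
using System.Threading;

// Hypothetical helper; not part of the library.
public sealed class OnceGuard
{
    private int entered;

    // Returns true for the first caller only, even under concurrent calls.
    public bool TryEnter()
    {
        return Interlocked.Exchange(ref this.entered, 1) == 0;
    }
}
```

A Closed handler could start with `if (!guard.TryEnter()) return;`, using one guard instance per AMQP object, so that a second invocation becomes a no-op.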
If the application creates AMQP objects in an exception handler or a Closed event handler, it
**must not mix sync and async API calls**. Most async API calls complete when a response is received
from the peer, and execution then continues on the connection's frame pump (which runs on the I/O
thread). If a sync operation that also requires a response is performed on that thread, a deadlock
occurs. Most likely you will get a `TimeoutException`.

The application can combine the above approaches to achieve the best results. The LongHaulTest
project does this, which enables it to run for weeks against an Azure Service Bus queue and transfer
tens of millions of messages between the senders and the receivers.


@@ -18,6 +18,8 @@
href: listener.md
- name: Buffer Management
href: buffer_management.md
- name: Fault Tolerance
href: fault_tolerance.md
- name: Tracing
href: tracing.md
- name: Test Amqp Broker