Akka Stream library for Azure IoT Hub
Devis Lucato 8f3e401090 0.9.0 Release
* While streaming events from Azure IoT Hub, expose runtime information, to allow monitoring how many events are left to stream (see `MessageFromDevice.runtimeInfo`).
* Change IoTHub public API to make it easier to set streaming options (see `SourceOptions` model) and to increase consistency across the board.
* Allow streaming from the position stored in the checkpointing storage, without enabling checkpointing.
* Support Cassandra authentication when using Cassandra to store offset checkpoints (@knordstrom).
* Fix: when building pull-requests, disable tests requiring Travis CI secrets.
* Fix: rename `created` messages property to `received`.
* Allow injecting Configuration, e.g. to override settings stored in application.conf.
* Reduce cost of logging instrumentation.
* Added some syntactic sugar for the list of partitions to stream and the list of offsets to start from.
* Use SBT modules and add scripts to make it easier to run the included samples.
* Upgrade Scala from 2.12.0 to 2.12.1
* Upgrade internal dependencies, e.g. Akka and Azure SDKs.
2017-03-23 11:25:11 -07:00
project 0.8.0 release: new features: C2D Sink, Stream Close, Scala 2.12, msg ID, etc. 2017-01-06 18:03:56 -08:00
samples-java 0.9.0 Release 2017-03-23 11:25:11 -07:00
samples-scala 0.9.0 Release 2017-03-23 11:25:11 -07:00
src 0.9.0 Release 2017-03-23 11:25:11 -07:00
tools/devices-simulator Update message specs, rename “MessageType” to “MessageSchema”, and “created” to “received” 2017-03-13 20:16:32 -07:00
.gitignore Simplify API and extend streaming options [BRK] 2017-03-22 19:20:34 -07:00
.travis.yml Backports from 0.9 branch (scala 2.12.1 and code style) 2017-03-14 19:27:35 -07:00
CHECKPOINTING.md Add tests and finalize API naming 2017-03-23 00:35:44 -07:00
LICENSE Initial release 2016-09-30 16:39:57 -07:00
README.md Add tests and finalize API naming 2017-03-23 00:35:44 -07:00
build.sbt 0.9.0 Release 2017-03-23 11:25:11 -07:00
build.sh Backports from 0.9 branch (scala 2.12.1 and code style) 2017-03-14 19:27:35 -07:00
devices.json.enc Disable integration tests for PRs 2017-03-10 16:11:05 -08:00
run_java_samples.cmd Merge changes from master branch 2017-03-13 17:27:17 -07:00
run_java_samples.sh Merge changes from master branch 2017-03-13 17:27:17 -07:00
run_scala_samples.cmd Run demos from the root of the project 2017-03-10 15:42:06 -08:00
run_scala_samples.sh Run demos from the root of the project 2017-03-10 15:42:06 -08:00
setup-env-vars.bat Merge changes from master branch 2017-03-13 17:27:17 -07:00
setup-env-vars.ps1 Merge changes from master branch 2017-03-13 17:27:17 -07:00
setup-env-vars.sh merge master 2017-03-21 00:08:18 -07:00

README.md


IoTHubReact

IoTHub React is an Akka Stream library that can be used to read events from Azure IoT Hub, via a reactive stream with asynchronous back pressure, and to send commands to connected devices. Azure IoT Hub is a service used to connect thousands to millions of devices to the Azure cloud.

The library can be used both in Java and Scala, providing a fluent DSL for both programming languages, similarly to the approach used by Akka.

The following is a simple example showing how to use the library in Scala. A stream of incoming telemetry data is read, parsed and converted to a Temperature object, and then filtered based on the temperature value:

IoTHub().source()
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 100)
    .to(console)
    .run()

and the equivalent code in Java:

TypeReference<Temperature> type = new TypeReference<Temperature>() {};

new IoTHub().source()
    .map(m -> (Temperature) jsonParser.readValue(m.contentAsString(), type))
    .filter(x -> x.value > 100)
    .to(console())
    .run(streamMaterializer);

The following shows how to send a command to devices connected to Azure IoT Hub. For instance, when a device reports a high temperature, this sends it a command to "turn fan ON":

val turnFanOn = MessageToDevice("Turn fan ON")

// `hub` must be defined before it is used for both the source and the sink
val hub = IoTHub()

hub.source()
    .filter(MessageSchema("temperature"))
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 85)
    .map(t => turnFanOn.to(t.deviceId))
    .to(hub.sink())
    .run()

Streaming from IoT hub to any

An interesting example is reading telemetry data from Azure IoT Hub, and sending it to a Kafka topic, so that it can be consumed by other services downstream:

... 
import org.apache.kafka.common.serialization.StringSerializer
import org.apache.kafka.common.serialization.ByteArraySerializer
import org.apache.kafka.clients.producer.ProducerRecord
import akka.kafka.ProducerSettings
import akka.kafka.scaladsl.Producer

case class KafkaProducer(bootstrapServer: String)(implicit val system: ActorSystem) {

  protected val producerSettings = ProducerSettings(system, new ByteArraySerializer, new StringSerializer)
    .withBootstrapServers(bootstrapServer)

  def getSink() = Producer.plainSink(producerSettings)

  def packageMessage(elem: String, topic: String): ProducerRecord[Array[Byte], String] = {
    new ProducerRecord[Array[Byte], String](topic, elem)
  }
}
val kafkaProducer = KafkaProducer(bootstrapServer)
 
IoTHub().source()
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 100)
    .runWith(kafkaProducer.getSink())

Source options

IoT hub partitions

The library supports reading from a subset of partitions, to enable the development of distributed applications. Consider for instance the scenario of a client application deployed to multiple nodes, where each node processes independently a subset of the incoming telemetry.

val p1 = 0
val p2 = 3

IoTHub().source(Seq(p1, p2))
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 100)
    .to(console)
    .run()
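
As a rough sketch of the multi-node scenario, each node could compute its own share of partitions before calling `source`. The helper below is hypothetical and not part of the library; `nodeIndex`, `nodeCount` and `partitionCount` are assumed to come from your own deployment configuration.

```scala
// Hypothetical helper, not part of IoTHubReact.
object PartitionAssignment {

  // Round-robin assignment: node i of n gets partitions i, i + n, i + 2n, ...
  def partitionsFor(nodeIndex: Int, nodeCount: Int, partitionCount: Int): Seq[Int] = {
    require(nodeCount > 0, "nodeCount must be positive")
    require(nodeIndex >= 0 && nodeIndex < nodeCount, "nodeIndex out of range")
    (0 until partitionCount).filter(_ % nodeCount == nodeIndex)
  }
}
```

With 4 partitions and 2 nodes, node 0 would be assigned partitions 0 and 2, and node 1 partitions 1 and 3. The resulting sequence could then be passed to `IoTHub().source(...)` in the same way as `Seq(p1, p2)` in the example above.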

Starting point

Unless specified, the stream starts from the beginning of the data present in each partition. It's possible to start the stream from a given date and time too:

val start = java.time.Instant.now()

IoTHub().source(start)
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 100)
    .to(console)
    .run()
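
Starting from `Instant.now()` only yields messages received after the stream starts. To replay recent history instead, one option is to compute an earlier instant with the standard `java.time` API. `StartTime.hoursAgo` below is a hypothetical helper, shown only as a sketch.

```scala
import java.time.Instant
import java.time.temporal.ChronoUnit

// Hypothetical helper, not part of IoTHubReact.
object StartTime {

  // Returns the instant `hours` hours before `now` (defaults to the current time).
  def hoursAgo(hours: Long, now: Instant = Instant.now()): Instant =
    now.minus(hours, ChronoUnit.HOURS)
}
```

The value returned by `StartTime.hoursAgo(1)` could be passed to `IoTHub().source(...)` in place of `start` in the example above.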

Multiple options

IoTHub().source() provides a quick API to specify the start time or the partitions. To specify more options, you can use the SourceOptions class, combining multiple settings:

val options = SourceOptions()
  .partitions(0,2,3)
  .fromTime(java.time.Instant.now())
  .withRuntimeInfo()
  .savePosition()

IoTHub().source(options)
    .map(m => parse(m.contentAsString).extract[Temperature])
    .filter(_.value > 100)
    .to(console)
    .run()

Stream processing restart - saving the current position

The library provides a mechanism to restart the stream from a recent checkpoint, to be resilient to restarts and crashes. Checkpoints are saved automatically, at a configured frequency, to the configured storage. For instance, the stream position can be saved every 30 seconds and/or every 500 messages (the values are configurable), in a Cassandra table or in Azure blobs.

Currently the position is saved in a concurrent thread, delayed by time and/or count, depending on the configuration settings. With the current implementation, the saved position can be ahead of what your processing logic has actually completed. While it's possible to mitigate the risk via the configuration settings, at-least-once delivery cannot be guaranteed. We plan to support at-least-once soon, providing more control over the checkpointing logic.

For more information about the checkpointing feature, see CHECKPOINTING.md.
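
To make the at-least-once caveat concrete, the following sketch shows which messages would be skipped after a crash if the saved position were ahead of the processing logic. It is an illustration only: it models positions as consecutive sequence numbers and assumes the stream resumes right after the saved position, which simplifies how the library actually tracks offsets.

```scala
// Illustrative model only, not the library's checkpointing code.
object CheckpointGap {

  // Sequence numbers that are never processed if the application crashes
  // after `lastProcessed` while the checkpoint already points at `savedPosition`.
  def skippedOnRestart(savedPosition: Long, lastProcessed: Long): Seq[Long] =
    if (savedPosition > lastProcessed) (lastProcessed + 1) to savedPosition
    else Seq.empty
}
```

When the saved position is not ahead of the processing logic, nothing is skipped; some messages may instead be processed twice after a restart, which is the at-least-once behaviour.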

Build configuration

IoTHubReact is available in Maven Central for Scala 2.11 and 2.12. To import the library, add the following reference to your build.sbt file:

libraryDependencies += "com.microsoft.azure.iot" %% "iothub-react" % "0.9.0"

or this dependency to your pom.xml file when working with Maven:

<dependency>
    <groupId>com.microsoft.azure.iot</groupId>
    <artifactId>iothub-react_2.12</artifactId>
    <version>0.9.0</version>
</dependency>

IoTHubReact internally uses some libraries like Azure IoT SDK, Azure Storage SDK, Akka etc. If your project depends on these libraries too, you can override the versions by explicitly importing the packages in your build.sbt and pom.xml files. If you encounter an incompatibility with future versions of these dependencies, let us know by opening an issue or sending a PR.
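
For example, in sbt the version of a transitive dependency can usually be pinned with `dependencyOverrides`. The Akka Streams artifact and the version placeholder below are only illustrative; choose the artifact and version your project actually needs.

```scala
// build.sbt
libraryDependencies += "com.microsoft.azure.iot" %% "iothub-react" % "0.9.0"

// Force a specific version of a transitive dependency (placeholder version)
dependencyOverrides += "com.typesafe.akka" %% "akka-stream" % "<akka version>"
```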

IoTHub configuration

IoTHubReact uses a configuration file to fetch the parameters required to connect to Azure IoT Hub. The exact values to use can be found in the Azure Portal:

Properties required to receive Device-to-Cloud (D2C) messages:

  • hubName: see Endpoints > Messaging > Events > Event Hub-compatible name
  • hubEndpoint: see Endpoints > Messaging > Events > Event Hub-compatible endpoint
  • hubPartitions: see Endpoints > Messaging > Events > Partitions
  • accessPolicy: usually service, see Shared access policies
  • accessKey: see Shared access policies > key name > Primary key (it's a base64 encoded string)

Properties required to send Cloud-to-Device (C2D) commands:

  • accessHostName: see Shared access policies > key name > Connection string > HostName

The values should be stored in your application.conf resource (or equivalent). Optionally you can reference environment settings if you prefer, for example to hide sensitive data.

iothub-react {

  connection {
    hubName        = "<Event Hub compatible name>"
    hubEndpoint    = "<Event Hub compatible endpoint>"
    hubPartitions  = <the number of partitions in your IoT Hub>
    accessPolicy   = "<access policy name>"
    accessKey      = "<access policy key>"
    accessHostName = "<access host name>"
  }
  
  [... other settings...]
}

Example using environment settings:

iothub-react {

  connection {
    hubName        = ${?IOTHUB_EVENTHUB_NAME}
    hubEndpoint    = ${?IOTHUB_EVENTHUB_ENDPOINT}
    hubPartitions  = ${?IOTHUB_EVENTHUB_PARTITIONS}
    accessPolicy   = ${?IOTHUB_ACCESS_POLICY}
    accessKey      = ${?IOTHUB_ACCESS_KEY}
    accessHostName = ${?IOTHUB_ACCESS_HOSTNAME}
  }
  
  [... other settings...]
}

Note that the library will automatically use these exact environment variables, unless overridden in your configuration file (all the default settings are stored in reference.conf).

The logging level can be managed by overriding the Akka configuration, for example:

akka {
  # Options: OFF, ERROR, WARNING, INFO, DEBUG
  loglevel = "WARNING"
}

There are other settings, to tune performance and connection details:

  • streaming.consumerGroup: the consumer group used during the connection
  • streaming.receiverBatchSize: the number of messages retrieved on each call to Azure IoT hub. The default (and maximum) value is 999.
  • streaming.receiverTimeout: timeout applied to calls while retrieving messages. The default value is 3 seconds.
  • streaming.retrieveRuntimeInfo: when enabled, the messages returned by IoTHub.Source will contain some runtime information about the last message in each partition. You can use this information to calculate how many telemetry events remain to process.
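
As a sketch of the calculation mentioned in the last item, the remaining backlog can be estimated by comparing the position of the message being processed with the position of the last message in its partition. The `PartitionProgress` type and its field names are hypothetical and stand in for whatever the library's runtime information exposes.

```scala
// Hypothetical shape; the library's runtime information type may differ.
final case class PartitionProgress(partition: Int, currentSeqNumber: Long, lastSeqNumber: Long)

object Backlog {

  // Events still to be streamed from one partition (never negative).
  def remaining(p: PartitionProgress): Long =
    math.max(0L, p.lastSeqNumber - p.currentSeqNumber)

  // Events still to be streamed across all the given partitions.
  def totalRemaining(ps: Iterable[PartitionProgress]): Long =
    ps.map(remaining).sum
}
```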

The complete configuration reference (and default values) is available in reference.conf.
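
The settings listed above could be overridden in application.conf along these lines. This is a sketch: it assumes the `streaming` block sits under `iothub-react` next to `connection`, and that the timeout accepts a duration such as `3s`; check reference.conf for the exact keys and formats.

```
iothub-react {

  streaming {
    # Assumed placement and value formats, see reference.conf
    consumerGroup       = "$Default"
    receiverBatchSize   = 999
    receiverTimeout     = 3s
    retrieveRuntimeInfo = true
  }
}
```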

Samples

The project includes several demos in Java and Scala, showing some of the use cases and how the IoTHub React API works. All the demos require an instance of Azure IoT Hub, with some devices and messages.

  1. DisplayMessages [Java]: how to stream Azure IoT Hub within a Java application, filtering temperature values greater than 60C
  2. SendMessageToDevice [Java]: how to turn on a fan when a device reports a temperature higher than 22C
  3. AllMessagesFromBeginning [Scala]: simple example streaming all the events in the hub.
  4. OnlyRecentMessages [Scala]: stream all the events, starting from the current time.
  5. OnlyTwoPartitions [Scala]: shows how to stream events from a subset of partitions.
  6. MultipleDestinations [Scala]: shows how to read once and deliver events to multiple destinations.
  7. FilterByMessageSchema [Scala]: how to filter events by message schema. Note: the name of the schema must be set by the device using the $$MessageSchema message property. In the future this will be a system property, explicitly supported by Azure IoT SDK.
  8. FilterByDeviceID [Scala]: how to filter events by device ID. The device ID is automatically set by Azure IoT SDK.
  9. CloseStream [Scala]: shows how to close the stream.
  10. SendMessageToDevice [Scala]: shows the API to send messages to connected devices.
  11. PrintTemperature [Scala]: stream all Temperature events and print data to the console.
  12. Throughput [Scala]: stream all events and display statistics about the throughput.
  13. Throttling [Scala]: throttle the incoming stream to a defined speed of events/second.
  14. StoreOffsetsWhileStreaming [Scala]: demonstrates how the stream can be restarted without losing its position. The current position is stored in a Cassandra table (for the purpose of the demo we suggest running a Docker container, e.g. docker run -ip 9042:9042 --rm cassandra).
  15. StartFromStoredOffsetsButDontWriteNewOffsets [Scala]: shows how to use the saved checkpoints to start streaming from a known position, without changing the value in the storage. If the storage doesn't contain checkpoints, the stream starts from the beginning.
  16. StartFromStoredOffsetsIfAvailableOrByTimeOtherwise [Scala]: similar to the previous demo, with a fallback datetime when the storage doesn't contain checkpoints.
  17. StreamIncludingRuntimeInformation [Scala]: shows how runtime information works.
  18. SendMessageToDevice [Scala]: another example showing how to send 2 different messages to connected devices.

We provide a device simulator in the tools section, which helps simulate devices sending sample telemetry events.

When ready, you should either edit the application.conf configuration files (scala and java) with your credentials, or set the corresponding environment variables. Follow the instructions described in the previous section on how to set the correct values.

The root folder also includes scripts showing how to set the environment variables on Linux/MacOS and Windows (setup-env-vars.sh, setup-env-vars.bat and setup-env-vars.ps1).

The demos can be executed using the scripts included in the root folder: run_<language>_samples.sh on Linux/MacOS and run_<language>_samples.cmd on Windows, where <language> is java or scala.

Future work (MoSCoW)

  • M: device twins and device methods
  • M: support at-least-once when checkpointing
  • S: clustering awareness
  • C: redefine the streaming graph at runtime, e.g. add/remove partitions on the fly
  • C: reopen hub after closing (currently one creates a new instance)
  • W: asynchronicity by using EventHub SDK async APIs

Contributing

Contribution License Agreement

If you plan to contribute, we ask you to sign a CLA (Contribution License Agreement). A friendly bot will remind you about it when you submit a pull request.

Code style

If you are sending a pull request, please check the code style with IntelliJ IDEA, importing the settings from Codestyle.IntelliJ.xml.

Running the tests

You can use the included build.sh script to execute all the unit and functional tests in the suite. The functional tests require an existing Azure IoT Hub resource, which you should set up. For the tests to connect to your IoT Hub, configure your environment using the setup-env-vars.* scripts mentioned above.