Azure EventHubs + Apache Spark Connector
Branch | Status |
---|---|
master | |
2.1.x | |
2.0.x |
This is the source code for the Azure Event Hubs and Apache Spark Connector.
Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using complex algorithms expressed with high-level functions like `map`, `reduce`, `join`, and `window`. This data can then be pushed to filesystems, databases, or even back to Event Hubs.
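As a rough illustration of the high-level functions mentioned above, here is a minimal Spark Streaming sketch in Scala. It uses only standard Spark Streaming APIs with a socket text source as a stand-in; it does not use this connector's API. The object name, host, port, batch interval, and window sizes are assumptions chosen for illustration.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Hypothetical example app: counts words over a sliding window.
object WindowedWordCount {
  def main(args: Array[String]): Unit = {
    // local[2]: one thread for the receiver, one for processing (assumed local run)
    val conf = new SparkConf().setAppName("WindowedWordCount").setMaster("local[2]")
    val ssc = new StreamingContext(conf, Seconds(10)) // 10-second batches

    // Stand-in source; in a real application the events would come from Event Hubs.
    val lines = ssc.socketTextStream("localhost", 9999)

    val counts = lines
      .flatMap(_.split("\\s+"))       // split each line into words
      .filter(_.nonEmpty)
      .map(word => (word, 1))         // map: pair each word with a count of 1
      // reduce over a 30-second window that slides every 10 seconds
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

    counts.print() // print the first few counts of each batch to stdout

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The window and slide durations are multiples of the batch interval, as Spark Streaming requires.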
By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.
Latest Releases
Spark Version | Package Name | Package Version |
---|---|---|
Spark 2.1 | spark-streaming-eventhubs_2.11 | |
Spark 2.0 | spark-streaming-eventhubs_2.11 | |
Spark 1.6 | spark-streaming-eventhubs_2.10 |
Overview
The best place to start when using this library is to make sure you're acquainted with Azure Event Hubs and Apache Spark. You can read Azure Event Hubs documentation here, documentation for Spark Streaming here, and, last but not least, Structured Streaming here.
Using the Connector
Documentation for our connector, including a Getting Started guide, can be found here. Additionally, there are examples using this library here.
Further Assistance
If you need additional assistance, please don't hesitate to ask! Just open an issue, and one of the repo owners will get back to you ASAP. :) Feedback, feature requests, bug reports, etc. are all welcome!
Using the library
In general, you should not need to build this library yourself. If you'd like to help contribute (we'd love to have your help :) ), then building the source and running tests is certainly necessary. You can go to our Contributor's Guide for that information and more.
This library is available for use in Maven projects from the Maven Central Repository, and can be referenced using the following dependency declaration. Be sure to see the Latest Releases to find the package name and package version that works with your version of Apache Spark!
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>spark-streaming-eventhubs_[2.XX]</artifactId>
  <version>[LATEST]</version>
</dependency>
<!-- The correct artifactId and version can be found
     in the Latest Releases section above -->
SBT Dependency
// https://mvnrepository.com/artifact/com.microsoft.azure/spark-streaming-eventhubs_2.11
libraryDependencies += "com.microsoft.azure" % "spark-streaming-eventhubs_[2.XX]" % "[LATEST]"
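If your build already sets `scalaVersion`, the dependency can also be written with sbt's `%%` operator, which appends the Scala binary version to the artifact name for you. This is a sketch, not the project's documented setup: the Scala patch version below is an assumption, and `[LATEST]` is the same placeholder used above.

```scala
// build.sbt (sketch; the Scala patch version is an assumption)
scalaVersion := "2.11.8"

// %% appends the Scala binary version, so this resolves to
// spark-streaming-eventhubs_2.11 for any 2.11.x scalaVersion.
// Replace [LATEST] with a version from the Latest Releases section.
libraryDependencies += "com.microsoft.azure" %% "spark-streaming-eventhubs" % "[LATEST]"
```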
Getting the Staging Version
We also publish a staging version of the Spark-EventHubs connector on GitHub. To use the staging version of Spark-EventHubs, two things need to be added to your pom.xml. First, add a new repository like so:
<repository>
  <id>spark-eventhubs</id>
  <url>https://raw.github.com/Azure/spark-eventhubs/maven-repo/</url>
  <snapshots>
    <enabled>true</enabled>
    <updatePolicy>always</updatePolicy>
  </snapshots>
</repository>
Then add the following dependency declaration:
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>spark-streaming-eventhubs_[2.XX]</artifactId>
  <version>2.1.5-SNAPSHOT</version>
</dependency>
SBT Dependency
// https://mvnrepository.com/artifact/com.microsoft.azure/spark-streaming-eventhubs_2.11
libraryDependencies += "com.microsoft.azure" % "spark-streaming-eventhubs_2.11" % "2.1.5-SNAPSHOT"
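Because the staging artifacts are not on Maven Central, an sbt build will likely also need a resolver that points at the staging repository, mirroring the `<repository>` entry shown for Maven. The following is a sketch under that assumption; the resolver name is arbitrary and the URL is copied from the Maven snippet above.

```scala
// build.sbt (sketch): sbt counterpart of the Maven <repository> entry above
resolvers += "spark-eventhubs" at "https://raw.github.com/Azure/spark-eventhubs/maven-repo/"
```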
Build Prerequisites
In order to use the connector, you need to have:
- Java 1.8 SDK
- Maven 3.x
- Scala 2.11
More details on building from source and running tests can be found in our Contributor's Guide.
Build Command
mvn clean
mvn install
These commands build the spark-streaming-eventhubs jar and install it to your local Maven cache. Subsequently, you can build any Spark Streaming application that references this jar.