Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Azure Event Hubs + Apache Spark Connector

| Branch | Status       |
|--------|--------------|
| master | Build Status |
| 2.1.x  | Build Status |
| 2.0.x  | Build Status |

This is the source code for the Azure Event Hubs and Apache Spark Connector.

Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using complex algorithms expressed with high-level functions like map, reduce, join, and window. This data can then be pushed to filesystems, databases, or even back to Event Hubs.

By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.

Latest Releases

| Spark Version | Package Name                   | Package Version |
|---------------|--------------------------------|-----------------|
| Spark 2.1     | spark-streaming-eventhubs_2.11 | Maven Central   |
| Spark 2.0     | spark-streaming-eventhubs_2.11 | Maven Central   |
| Spark 1.6     | spark-streaming-eventhubs_2.10 | Maven Central   |

Change Log

Overview

If you are new to this library, start by getting acquainted with Azure Event Hubs and Apache Spark: read the Azure Event Hubs documentation, the Spark Streaming documentation, and, last but not least, the Structured Streaming documentation.

Using the Connector

Documentation for our connector, including a Getting Started guide, is in the docs folder. Examples that use this library are in the examples folder.

Further Assistance

If you need additional assistance, please don't hesitate to ask! Just open an issue, and one of the repo owners will get back to you as soon as possible. :) Feedback, feature requests, bug reports, etc. are all welcome!

Using the Library

In general, you should not need to build this library yourself. If you'd like to help contribute (we'd love to have your help :) ), then building the source and running tests is certainly necessary. You can go to our Contributor's Guide for that information and more.

This library is available for Maven projects from the Maven Central Repository and can be referenced with the following dependency declaration. Be sure to check Latest Releases for the package name and package version that work with your version of Apache Spark!

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>spark-streaming-eventhubs_[2.XX]</artifactId>
        <version>[LATEST]</version>
    </dependency>
	
    <!-- The correct artifactId and version can be found
         in the Latest Releases section above -->

SBT Dependency

    // https://mvnrepository.com/artifact/com.microsoft.azure/spark-streaming-eventhubs_2.11
    libraryDependencies += "com.microsoft.azure" % "spark-streaming-eventhubs_[2.XX]" % "[LATEST]"
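
If your build already sets `scalaVersion`, sbt can append the Scala binary suffix to the artifact name for you via `%%`. The following is a minimal sketch, assuming a Scala 2.11 project; the `2.11.8` patch version is an illustrative assumption, and `[LATEST]` remains a placeholder for the version listed under Latest Releases.

```scala
// build.sbt (sketch; the exact Scala patch version is an assumption)
scalaVersion := "2.11.8"

// %% appends the Scala binary version (_2.11 here) to the artifact name,
// so this resolves to spark-streaming-eventhubs_2.11.
// Replace [LATEST] with the version listed under Latest Releases.
libraryDependencies += "com.microsoft.azure" %% "spark-streaming-eventhubs" % "[LATEST]"
```

With a single `%`, as in the declaration above, the Scala suffix has to be written into the artifact name by hand.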

Getting the Staging Version

We also publish a staging version of the Spark-EventHubs connector on GitHub. To use the staging version, add two things to your pom.xml. First, add a new repository like so:

	<repository>
		<id>spark-eventhubs</id>
		<url>https://raw.github.com/Azure/spark-eventhubs/maven-repo/</url>
		<snapshots>
			<enabled>true</enabled>
			<updatePolicy>always</updatePolicy>
		</snapshots>
	</repository>

Then add the following dependency declaration:

    <dependency>
        <groupId>com.microsoft.azure</groupId>
        <artifactId>spark-streaming-eventhubs_[2.XX]</artifactId>
        <version>2.1.5-SNAPSHOT</version>
    </dependency>

SBT Dependency

    // https://mvnrepository.com/artifact/com.microsoft.azure/spark-streaming-eventhubs_2.11
    libraryDependencies += "com.microsoft.azure" % "spark-streaming-eventhubs_2.11" % "2.1.5-SNAPSHOT"
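
An sbt build will most likely also need the staging repository declared as a resolver, mirroring the `<repository>` entry shown for Maven. The following is a minimal sketch, assuming the same repository URL as in the pom.xml snippet; the resolver name is arbitrary.

```scala
// build.sbt (sketch): register the staging repository used in the pom.xml snippet.
// The name "spark-eventhubs" is only a label; the URL is the one given for Maven.
resolvers += "spark-eventhubs" at "https://raw.github.com/Azure/spark-eventhubs/maven-repo/"
```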

Build Prerequisites

In order to build the connector, you need to have:

  1. Java 1.8 SDK
  2. Maven 3.x
  3. Scala 2.11

More details on building from source and running tests can be found in our Contributor's Guide.

Build Command

    mvn clean
    mvn install

These commands build the spark-streaming-eventhubs jar and install it in your local Maven repository. Subsequently, you can build any Spark Streaming application that references this jar.
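
Maven projects resolve the locally installed jar automatically. For an sbt application, one option is to add the local Maven repository as a resolver. The following is a sketch, assuming the default local repository location (`~/.m2/repository`) and that the version you reference matches the one you just installed.

```scala
// build.sbt (sketch): let sbt resolve artifacts that `mvn install`
// placed in the local Maven repository (~/.m2/repository by default).
resolvers += Resolver.mavenLocal
```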