Enabling Continuous Data Processing with Apache Spark and Azure Event Hubs

Sreeram Garlapati da175a86a5 Change offset column from Long Type to String Type (#253 ) * Fix offset type in EventDataSchema * fix type in readme		2018-03-06 16:37:34 -08:00
.github	Library re-write (For Spark 2.3) (#229 )	2018-03-02 10:33:18 -08:00
core	Change offset column from Long Type to String Type (#253 )	2018-03-06 16:37:34 -08:00
docs	Change offset column from Long Type to String Type (#253 )	2018-03-06 16:37:34 -08:00
project	Codebase cleanup. Details in in description. (#174 )	2017-10-16 18:45:51 -07:00
.gitignore	Codebase cleanup. Details in in description. (#174 )	2017-10-16 18:45:51 -07:00
.scalafmt.conf	Codebase cleanup. Details in in description. (#174 )	2017-10-16 18:45:51 -07:00
.travis.yml	Update .travis.yml	2017-11-07 12:57:47 -08:00
LICENSE	Initial commit	2015-09-03 22:11:45 -07:00
README.md	Library re-write (For Spark 2.3) (#229 )	2018-03-02 10:33:18 -08:00
event-hubs_spark.png	initial repo clean up: first draft of new README, unfinished docs/README and CONTRIBUTING, removed javadocs folder	2017-10-10 09:53:29 -07:00
pom.xml	Update poms to 2.3.0-PREVIEW (#240 )	2018-03-02 15:00:19 -08:00
run_tests.sh	Library re-write (For Spark 2.3) (#229 )	2018-03-02 10:33:18 -08:00

README.md

Azure Event Hubs Connector for Apache Spark

Branch	Status
master

This is the source code for the Azure Event Hubs Connector for Apache Spark.

Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using complex algorithms expressed with high-level functions like map, reduce, join, and window. This data can then be pushed to filesystems, databases, or even back to Event Hubs.

By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.

Latest Releases

Spark

Spark Version	Package Name	Package Version
Spark 2.3	azure-eventhubs-spark_2.11
Spark 2.2	azure-eventhubs-spark_2.11

Databricks

Databricks Runtime Version	Package Name	Package Version
Databricks Runtime 4.X	azure-eventhubs-spark_2.11
Databricks Runtime 3.5	azure-eventhubs-spark_2.11

Roadmap

Planned changes can be found on our wiki.

Usage

Linking

For Scala/Java applications using SBT/Maven project definitions, link your application with the artifact below. Note: see Latest Releases to find the correct artifact for your version of Apache Spark (or Databricks).

groupId = com.microsoft.azure
artifactId = azure-eventhubs-spark_[2.XX]
version = 2.3.0
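As a minimal sketch, the coordinates above could be declared in an sbt build definition as follows. This assumes the Scala 2.11 artifact named under Latest Releases and the version 2.3.0 shown above; check Latest Releases for the version that matches your Spark or Databricks runtime before copying it.

```scala
// build.sbt (sketch): the artifact suffix _2.11 and the version 2.3.0 are
// taken from this README and may not match your Spark version.
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.3.0"
```

The Scala suffix is spelled out in the artifact name here, so a single `%` is used rather than `%%`, which would have sbt append the suffix itself.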

Documentation

Documentation for our connector can be found here. The integration guides there contain all the information you need to use this library.

If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first. You can read Azure Event Hubs documentation here, documentation for Spark Streaming here, and, last but not least, Structured Streaming here.

Further Assistance

If you need additional assistance, please don't hesitate to ask! General questions and discussion should happen on our Gitter chat. Please open an issue for bug reports and feature requests. Feedback of any kind is welcome!

Contributing

If you'd like to contribute (we'd love to have your help!), see our Contributor's Guide for more information.

Build Prerequisites

To build the connector from source, you need:

Java 1.8 SDK
Maven 3.x
Scala 2.11

More details on building from source and running tests can be found in our Contributor's Guide.

Build Command

# Builds the jar and runs all tests
mvn clean package

# Builds the jar, runs all tests, and installs the jar to your local Maven repository
mvn clean install