diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md index 317e3a80..22ccbb4d 100644 --- a/.github/CONTRIBUTING.md +++ b/.github/CONTRIBUTING.md @@ -8,11 +8,9 @@ see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq ## Getting Started -This library is relatively easy to build. To build and test this locally, -make sure you've done the following: +To build and test this locally, make sure you've done the following: - Java 1.8 SDK is installed -- Maven 3.x is installed -- Scala 2.11.X is installed +- [Maven 3.x](https://maven.apache.org/download.cgi) is installed (or [SBT version 1.x](https://www.scala-sbt.org/1.x/docs/index.html)) - A supported version of Apache Spark is installed (see [Latest Releases](/README.md#latest-releases) for supported versions). After that, cloning the code and running `mvn clean package` should successfully @@ -41,14 +39,14 @@ Then add the following dependency declaration: com.microsoft.azure azure-eventhubs-spark_[2.XX] - 2.1.6-SNAPSHOT + 2.3.0 ``` ### SBT Dependency // https://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs-spark_2.11 - libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.1.6-SNAPSHOT" + libraryDependencies += "com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.0" ## Filing Issues @@ -61,15 +59,15 @@ questions/concerns/comments, feel free to file an issue [here](https://github.co ## Pull Requests -### Required guidelines +### Required Guidelines When filing a pull request, the following must be true: - Tests have been added (if needed) to validate changes - scalafmt (using the `.scalafmt.conf` in the repo) must be used to style the code -- `mvn clean package` must run successfully +- `mvn clean test` must run successfully -### General guidelines +### General Guidelines If you would like to make changes to this library, **break up the change into small, logical, testable chunks, and organize your pull requests accordingly**. 
This makes @@ -85,7 +83,7 @@ following list is a good set of best practices! - There are a small number of commits that each have an informative message - A description of the changes the pull request makes is included, and a reference to the bug/issue the pull request fixes is included, if applicable -### Testing guidelines +### Testing Guidelines If you add code, make sure you add tests to validate your changes. Again, below is a list of best practices when contributing: diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md index e9d25791..7536d858 100644 --- a/.github/PULL_REQUEST_TEMPLATE.md +++ b/.github/PULL_REQUEST_TEMPLATE.md @@ -2,7 +2,7 @@ Thanks for contributing! We appreciate it :) For a Pull Request to be accepted, you must: - Run scalafmt on your code using the `.scalafmt.conf` present in this project -- All tests must pass when you run `mvn clean package` +- All tests must pass when you run `mvn clean test` Just in case, here are some tips that could prove useful when opening a pull request: - Read the [Contributor's Guide](CONTRIBUTING.md) diff --git a/README.md b/README.md index 9125f830..228803da 100644 --- a/README.md +++ b/README.md @@ -10,11 +10,11 @@ |------|-------------| |master|[![Build Status](https://travis-ci.org/Azure/azure-event-hubs-spark.svg?branch=master)](https://travis-ci.org/Azure/azure-event-hubs-spark)| -This is the source code for the Azure Event Hubs Connector for Apache Spark. +This is the source code of the Azure Event Hubs Connector for Apache Spark. Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications. Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using -complex algorithms expressed with high-level functions like ```map```, ```reduce```, ```join```, and ```window```. 
This data can then be pushed to +complex algorithms expressed with high-level functions like `map`, `reduce`, `join`, and `window`. This data can then be pushed to filesystems, databases, or even back to Event Hubs. By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users. @@ -24,14 +24,15 @@ By making Event Hubs and Spark easier to use together, we hope this connector ma #### Spark |Spark Version|Package Name|Package Version| |-------------|------------|----------------| -|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)| -|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)| +|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)| +|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)| +|Spark 2.1|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)| #### Databricks -|Databricks Runtime Version|Package Name|Package Version| +|Databricks Runtime Version|Artifact 
Id|Package Version| |-------------|------------|----------------| -|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)| -|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)| +|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)| +|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)| #### Roadmap @@ -45,16 +46,16 @@ For Scala/Java applications using SBT/Maven project definitions, link your appli **Note:** See [Latest Releases](#latest-releases) to find the correct artifiact for your version of Apache Spark (or Databricks)! groupId = com.microsoft.azure - artifactId = azure-eventhubs-spark_[2.XX] + artifactId = azure-eventhubs-spark_2.11 version = 2.3.0 ### Documentation Documentation for our connector can be found [here](docs/). The integration guides there contain all the information you need to use this library. 
-**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Azure Event Hubs +**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Event Hubs documentation [here](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs), documentation for Spark Streaming -[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, last but not least, Structured Streaming +[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, last but not least, Structured Streaming [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html). @@ -82,6 +83,6 @@ More details on building from source and running tests can be found in our [Cont // Builds jar and runs all tests mvn clean package - // Builds jar, runs all tests, and installs jar to your local maven cache + // Builds jar, runs all tests, and installs jar to your local Maven repository mvn clean install \ No newline at end of file diff --git a/core/src/main/scala/org/apache/spark/eventhubs/EventPosition.scala b/core/src/main/scala/org/apache/spark/eventhubs/EventPosition.scala index dcf2328b..bf5913be 100644 --- a/core/src/main/scala/org/apache/spark/eventhubs/EventPosition.scala +++ b/core/src/main/scala/org/apache/spark/eventhubs/EventPosition.scala @@ -69,7 +69,7 @@ object EventPosition { * @return An [[EventPosition]] instance. 
*/ def fromOffset(offset: String): EventPosition = { - EventPosition(offset = offset, isInclusive = true) + EventPosition(offset) } /** @@ -81,7 +81,7 @@ object EventPosition { */ def fromSequenceNumber(seqNo: SequenceNumber): EventPosition = { require(seqNo >= 0L, "Please pass a positive sequence number.") - EventPosition(seqNo = seqNo, isInclusive = true) + EventPosition(seqNo = seqNo) } /** diff --git a/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsRelationSuite.scala b/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsRelationSuite.scala index 9db52bf7..b297d252 100644 --- a/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsRelationSuite.scala +++ b/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsRelationSuite.scala @@ -55,7 +55,7 @@ class EventHubsRelationSuite extends QueryTest with BeforeAndAfter with SharedSQ .format("eventhubs") .options(ehConf.toMap) .load() - .selectExpr("CAST (body AS STRING)") + .select($"body" cast "string") } private def createPositions(seqNo: Long, ehName: String, partitionCount: Int) = { diff --git a/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsSinkSuite.scala b/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsSinkSuite.scala index 0f364688..ecceb08e 100644 --- a/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsSinkSuite.scala +++ b/core/src/test/scala/org/apache/spark/sql/eventhubs/EventHubsSinkSuite.scala @@ -62,7 +62,7 @@ class EventHubsSinkSuite extends StreamTest with SharedSQLContext { .format("eventhubs") .options(ehConf.toMap) .load() - .selectExpr("CAST (body AS STRING)") + .select($"body" cast "string") } private def createEventHubsWriter( diff --git a/docs/structured-streaming-eventhubs-integration.md b/docs/structured-streaming-eventhubs-integration.md index 0f931c3e..df590032 100644 --- a/docs/structured-streaming-eventhubs-integration.md +++ b/docs/structured-streaming-eventhubs-integration.md @@ -178,10 +178,7 @@ val df = spark 
.options(ehConf.toMap) .load() -// Select body column as a String -val eventhubs = df - .select("CAST (body AS STRING)") - .as[String] +val eventhubs = df.select($"body" cast "string") // Source with per partition starting positions and rate limiting. In this case, we'll start from // a sequence number for partition 0, enqueued time for partition 3, the end of stream @@ -215,7 +212,7 @@ val df = spark .format("eventhubs") .options(ehConf.toMap) .load() -df.select("CAST(body AS STRING)").as[String] +df.select($"body" cast "string") // start from same place across all partitions. end at the same place accross all partitions. val ehConf = EventHubsConf("VALID.CONNECTION.STRING")
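For reference, the dependency coordinates this patch moves to (groupId `com.microsoft.azure`, artifact `azure-eventhubs-spark_2.11` per the README table, version `2.3.0`) would be declared in a Maven POM roughly as follows; the XML markup of the snippet in CONTRIBUTING.md is assumed to follow the standard `<dependency>` layout:

```xml
<!-- Sketch of the updated Maven dependency; the _2.11 suffix is the
     Scala binary version shown in the README's release table -->
<dependency>
  <groupId>com.microsoft.azure</groupId>
  <artifactId>azure-eventhubs-spark_2.11</artifactId>
  <version>2.3.0</version>
</dependency>
```

The SBT equivalent is `libraryDependencies += "com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.0"`: the `%%` operator appends the project's Scala binary version (here `_2.11`) to the artifact name, and a single `%` precedes the version string.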