Minor cleanup and documentation edits (#270)
* Adjusting select expressions
* Contributor guide edits
* PR template edits
* README edits
* EventPosition cleanup
* README reflects support for 2.1 and 2.2
* Removing duplicate naming and unneeded comments
* Additional README and Contributing edits
This commit is contained in:
Parent: e87238f867
Commit: f8e3efe934
@ -8,11 +8,9 @@ see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq
## Getting Started

This library is relatively easy to build. To build and test this locally,
make sure you've done the following:
To build and test this locally, make sure you've done the following:

- Java 1.8 SDK is installed
- Maven 3.x is installed
- Scala 2.11.X is installed
- [Maven 3.x](https://maven.apache.org/download.cgi) is installed (or [SBT version 1.x](https://www.scala-sbt.org/1.x/docs/index.html))
- A supported version of Apache Spark is installed (see [Latest Releases](/README.md#latest-releases) for supported versions).

After that, cloning the code and running `mvn clean package` should successfully
@ -41,14 +39,14 @@ Then add the following dependency declaration:
<dependency>
<groupId>com.microsoft.azure</groupId>
<artifactId>azure-eventhubs-spark_[2.XX]</artifactId>
<version>2.1.6-SNAPSHOT</version>
<version>2.3.0</version>
</dependency>
```

### SBT Dependency

// https://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs-spark_2.11
libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.1.6-SNAPSHOT"
libraryDependencies += "com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.0"
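Note that the sbt line drops the `_2.11` suffix that appears in the Maven coordinates: sbt's `%%` operator appends the project's Scala binary version to the artifact name automatically. A minimal plain-Scala sketch of that name derivation (illustrative only; `crossArtifact` is a hypothetical helper, and sbt's real implementation differs):

```scala
// Illustrative sketch of what sbt's %% operator does to an artifact name:
// it appends an underscore plus the Scala binary version. "2.11" is an
// assumption matching the artifacts published in this README.
val scalaBinaryVersion = "2.11"

def crossArtifact(base: String): String = s"${base}_$scalaBinaryVersion"

val fullName = crossArtifact("azure-eventhubs-spark")
println(fullName) // azure-eventhubs-spark_2.11
```

This is why the Maven snippet above must name `azure-eventhubs-spark_[2.XX]` explicitly, while sbt resolves the suffix for you.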
## Filing Issues

@ -61,15 +59,15 @@ questions/concerns/comments, feel free to file an issue [here](https://github.co
## Pull Requests
### Required guidelines
### Required Guidelines

When filing a pull request, the following must be true:

- Tests have been added (if needed) to validate changes
- scalafmt (using the `.scalafmt.conf` in the repo) must be used to style the code
- `mvn clean package` must run successfully
- `mvn clean test` must run successfully

### General guidelines
### General Guidelines

If you would like to make changes to this library, **break up the change into small,
logical, testable chunks, and organize your pull requests accordingly**. This makes

@ -85,7 +83,7 @@ following list is a good set of best practices!
- There are a small number of commits that each have an informative message
- A description of the changes the pull request makes is included, and a reference to the bug/issue the pull request fixes is included, if applicable

### Testing guidelines
### Testing Guidelines

If you add code, make sure you add tests to validate your changes. Again, below is a
list of best practices when contributing:
@ -2,7 +2,7 @@ Thanks for contributing! We appreciate it :)
For a Pull Request to be accepted, you must:
- Run scalafmt on your code using the `.scalafmt.conf` present in this project
- All tests must pass when you run `mvn clean package`
- All tests must pass when you run `mvn clean test`

Just in case, here are some tips that could prove useful when opening a pull request:
- Read the [Contributor's Guide](CONTRIBUTING.md)
README.md (23 lines changed)
@ -10,11 +10,11 @@
|------|-------------|
|master|[![Build Status](https://travis-ci.org/Azure/azure-event-hubs-spark.svg?branch=master)](https://travis-ci.org/Azure/azure-event-hubs-spark)|

This is the source code for the Azure Event Hubs Connector for Apache Spark.
This is the source code of the Azure Event Hubs Connector for Apache Spark.

Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications.
Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using
complex algorithms expressed with high-level functions like ```map```, ```reduce```, ```join```, and ```window```. This data can then be pushed to
complex algorithms expressed with high-level functions like `map`, `reduce`, `join`, and `window`. This data can then be pushed to
filesystems, databases, or even back to Event Hubs.

By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.
@ -24,14 +24,15 @@ By making Event Hubs and Spark easier to use together, we hope this connector ma
#### Spark
|Spark Version|Package Name|Package Version|
|-------------|------------|----------------|
|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|
|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)|
|Spark 2.1|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)|

#### Databricks
|Databricks Runtime Version|Package Name|Package Version|
|Databricks Runtime Version|Artifact Id|Package Version|
|-------------|------------|----------------|
|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|
|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|

#### Roadmap
@ -45,16 +46,16 @@ For Scala/Java applications using SBT/Maven project definitions, link your appli
**Note:** See [Latest Releases](#latest-releases) to find the correct artifact for your version of Apache Spark (or Databricks)!

groupId = com.microsoft.azure
artifactId = azure-eventhubs-spark_[2.XX]</artifactId>
artifactId = azure-eventhubs-spark_2.11
version = 2.3.0

### Documentation

Documentation for our connector can be found [here](docs/). The integration guides there contain all the information you need to use this library.

**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Azure Event Hubs
**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Event Hubs
documentation [here](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs), documentation for Spark Streaming
[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, last but not least, Structured Streaming
[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, last but not least, Structured Streaming
[here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html).
### Further Assistance
@ -82,6 +83,6 @@ More details on building from source and running tests can be found in our [Cont
// Builds jar and runs all tests
mvn clean package

// Builds jar, runs all tests, and installs jar to your local maven cache
// Builds jar, runs all tests, and installs jar to your local maven repository
mvn clean install
@ -69,7 +69,7 @@ object EventPosition {
* @return An [[EventPosition]] instance.
*/
def fromOffset(offset: String): EventPosition = {
EventPosition(offset = offset, isInclusive = true)
EventPosition(offset)
}

/**
@ -81,7 +81,7 @@ object EventPosition {
*/
def fromSequenceNumber(seqNo: SequenceNumber): EventPosition = {
require(seqNo >= 0L, "Please pass a positive sequence number.")
EventPosition(seqNo = seqNo, isInclusive = true)
EventPosition(seqNo = seqNo)
}

/**
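The EventPosition changes above work because `isInclusive` already defaults to `true` in the case-class constructor, so passing it explicitly is redundant. A minimal self-contained sketch of the idea, using a hypothetical, simplified stand-in for the connector's case class (the real signature may differ):

```scala
// Hypothetical, simplified stand-in for the connector's EventPosition case class.
// The key point: with isInclusive defaulting to true, the explicit argument in
// EventPosition(offset = offset, isInclusive = true) can be dropped.
case class EventPosition(offset: String = null,
                         seqNo: Long = -1L,
                         isInclusive: Boolean = true)

val explicitForm   = EventPosition(offset = "42", isInclusive = true)
val simplifiedForm = EventPosition(offset = "42")
println(explicitForm == simplifiedForm) // true: case-class equality, identical defaults
```

Case-class structural equality guarantees the two forms construct indistinguishable values, which is why the cleanup is behavior-preserving.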
@ -55,7 +55,7 @@ class EventHubsRelationSuite extends QueryTest with BeforeAndAfter with SharedSQ
.format("eventhubs")
.options(ehConf.toMap)
.load()
.selectExpr("CAST (body AS STRING)")
.select($"body" cast "string")
}

private def createPositions(seqNo: Long, ehName: String, partitionCount: Int) = {
@ -62,7 +62,7 @@ class EventHubsSinkSuite extends StreamTest with SharedSQLContext {
.format("eventhubs")
.options(ehConf.toMap)
.load()
.selectExpr("CAST (body AS STRING)")
.select($"body" cast "string")
}

private def createEventHubsWriter(
@ -178,10 +178,7 @@ val df = spark
.options(ehConf.toMap)
.load()

// Select body column as a String
val eventhubs = df
.select("CAST (body AS STRING)")
.as[String]
val eventhubs = df.select($"body" cast "string")

// Source with per partition starting positions and rate limiting. In this case, we'll start from
// a sequence number for partition 0, enqueued time for partition 3, the end of stream
@ -215,7 +212,7 @@ val df = spark
.format("eventhubs")
.options(ehConf.toMap)
.load()
df.select("CAST(body AS STRING)").as[String]
df.select($"body" cast "string")

// start from same place across all partitions. end at the same place across all partitions.
val ehConf = EventHubsConf("VALID.CONNECTION.STRING")