Minor cleanup and documentation edits (#270)

* Adjusting select expressions

* Contributor guide edits

* PR template edits

* README edits

* EventPosition cleanup

* README reflects support for 2.1 and 2.2

* Removing duplicate naming and unneeded comments

* Additional README and Contributing edits
Sabee Grewal 2018-03-12 11:27:12 -07:00 committed by GitHub
Parent e87238f867
Commit f8e3efe934
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
7 changed files with 27 additions and 31 deletions

.github/CONTRIBUTING.md

@@ -8,11 +8,9 @@ see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq
 ## Getting Started
-This library is relatively easy to build. To build and test this locally,
-make sure you've done the following:
+To build and test this locally, make sure you've done the following:
 - Java 1.8 SDK is installed
-- Maven 3.x is installed
-- Scala 2.11.X is installed
+- [Maven 3.x](https://maven.apache.org/download.cgi) is installed (or [SBT version 1.x](https://www.scala-sbt.org/1.x/docs/index.html))
+- A supported version of Apache Spark is installed (see [Latest Releases](/README.md#latest-releases) for supported versions).
 After that, cloning the code and running `mvn clean package` should successfully
@@ -41,14 +39,14 @@ Then add the following dependency declaration:
 <dependency>
   <groupId>com.microsoft.azure</groupId>
   <artifactId>azure-eventhubs-spark_[2.XX]</artifactId>
-  <version>2.1.6-SNAPSHOT</version>
+  <version>2.3.0</version>
 </dependency>
 ```
 ### SBT Dependency
 // https://mvnrepository.com/artifact/com.microsoft.azure/azure-eventhubs-spark_2.11
-libraryDependencies += "com.microsoft.azure" % "azure-eventhubs-spark_2.11" % "2.1.6-SNAPSHOT"
+libraryDependencies += "com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.0"
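For a fuller picture, a minimal `build.sbt` around that dependency could look like the sketch below (the project name, Scala version, and Spark dependency are illustrative assumptions, not part of this guide):

```scala
// Minimal build.sbt sketch; only the azure-eventhubs-spark line comes from this guide.
name := "eventhubs-spark-app"  // hypothetical project name
scalaVersion := "2.11.12"      // any Scala 2.11.x matches the _2.11 artifact

libraryDependencies ++= Seq(
  "org.apache.spark"    %% "spark-sql"             % "2.3.0" % Provided, // assumed Spark version
  "com.microsoft.azure" %% "azure-eventhubs-spark" % "2.3.0"
)
```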
## Filing Issues
@@ -61,15 +59,15 @@ questions/concerns/comments, feel free to file an issue [here](https://github.co
 ## Pull Requests
-### Required guidelines
+### Required Guidelines
 When filing a pull request, the following must be true:
 - Tests have been added (if needed) to validate changes
 - scalafmt (using the `.scalafmt.conf` in the repo) must be used to style the code
-- `mvn clean package` must run successfully
+- `mvn clean test` must run successfully
-### General guidelines
+### General Guidelines
 If you would like to make changes to this library, **break up the change into small,
 logical, testable chunks, and organize your pull requests accordingly**. This makes
@@ -85,7 +83,7 @@ following list is a good set of best practices!
 - There are a small number of commits that each have an informative message
 - A description of the changes the pull request makes is included, and a reference to the bug/issue the pull request fixes is included, if applicable
-### Testing guidelines
+### Testing Guidelines
 If you add code, make sure you add tests to validate your changes. Again, below is a
 list of best practices when contributing:

.github/PULL_REQUEST_TEMPLATE.md

@@ -2,7 +2,7 @@ Thanks for contributing! We appreciate it :)
 For a Pull Request to be accepted, you must:
 - Run scalafmt on your code using the `.scalafmt.conf` present in this project
-- All tests must pass when you run `mvn clean package`
+- All tests must pass when you run `mvn clean test`
 Just in case, here are some tips that could prove useful when opening a pull request:
 - Read the [Contributor's Guide](CONTRIBUTING.md)

README.md

@@ -10,11 +10,11 @@
 |------|-------------|
 |master|[![Build Status](https://travis-ci.org/Azure/azure-event-hubs-spark.svg?branch=master)](https://travis-ci.org/Azure/azure-event-hubs-spark)|
-This is the source code for the Azure Event Hubs Connector for Apache Spark.
+This is the source code of the Azure Event Hubs Connector for Apache Spark.
 Azure Event Hubs is a highly scalable publish-subscribe service that can ingest millions of events per second and stream them into multiple applications.
 Spark Streaming and Structured Streaming are scalable and fault-tolerant stream processing engines that allow users to process huge amounts of data using
-complex algorithms expressed with high-level functions like ```map```, ```reduce```, ```join```, and ```window```. This data can then be pushed to
+complex algorithms expressed with high-level functions like `map`, `reduce`, `join`, and `window`. This data can then be pushed to
 filesystems, databases, or even back to Event Hubs.
 By making Event Hubs and Spark easier to use together, we hope this connector makes building scalable, fault-tolerant applications easier for our users.
@@ -24,14 +24,15 @@ By making Event Hubs and Spark easier to use together, we hope this connector ma
 #### Spark
 |Spark Version|Package Name|Package Version|
 |-------------|------------|----------------|
-|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
-|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
+|Spark 2.3|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|
+|Spark 2.2|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)|
+|Spark 2.1|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/maven-central/v/com.microsoft.azure/azure-eventhubs-spark_2.11/2.2.0-PREVIEW.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.2.0-PREVIEW%7Cjar)|
 #### Databricks
-|Databricks Runtime Version|Package Name|Package Version|
+|Databricks Runtime Version|Artifact Id|Package Version|
 |-------------|------------|----------------|
-|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
-|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11/badge.svg)](https://maven-badges.herokuapp.com/maven-central/com.microsoft.azure/azure-eventhubs-spark_2.11)|
+|Databricks Runtime 4.X|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|
+|Databricks Runtime 3.5|azure-eventhubs-spark_2.11|[![Maven Central](https://img.shields.io/badge/maven%20central-2.3.0-brightgreen.svg)](https://search.maven.org/#artifactdetails%7Ccom.microsoft.azure%7Cazure-eventhubs-spark_2.11%7C2.3.0%7Cjar)|
 #### Roadmap
@@ -45,16 +46,16 @@ For Scala/Java applications using SBT/Maven project definitions, link your appli
 **Note:** See [Latest Releases](#latest-releases) to find the correct artifact for your version of Apache Spark (or Databricks)!
 groupId = com.microsoft.azure
-artifactId = azure-eventhubs-spark_[2.XX]</artifactId>
+artifactId = azure-eventhubs-spark_2.11
+version = 2.3.0
 ### Documentation
 Documentation for our connector can be found [here](docs/). The integration guides there contain all the information you need to use this library.
-**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Azure Event Hubs
+**If you're new to Apache Spark and/or Event Hubs, then we highly recommend reading their documentation first.** You can read Event Hubs
 documentation [here](https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-what-is-event-hubs), documentation for Spark Streaming
-[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, the last but not least, Structured Streaming
+[here](https://spark.apache.org/docs/latest/streaming-programming-guide.html), and, last but not least, Structured Streaming
 [here](https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html).
 ### Further Assistance
### Further Assistance
@@ -82,6 +83,6 @@ More details on building from source and running tests can be found in our [Cont
 // Builds jar and runs all tests
 mvn clean package
-// Builds jar, runs all tests, and installs jar to your local maven cache
+// Builds jar, runs all tests, and installs jar to your local maven repository
 mvn clean install

EventPosition.scala

@@ -69,7 +69,7 @@ object EventPosition {
    * @return An [[EventPosition]] instance.
    */
   def fromOffset(offset: String): EventPosition = {
-    EventPosition(offset = offset, isInclusive = true)
+    EventPosition(offset)
   }
   /**
@@ -81,7 +81,7 @@ object EventPosition {
    */
   def fromSequenceNumber(seqNo: SequenceNumber): EventPosition = {
     require(seqNo >= 0L, "Please pass a positive sequence number.")
-    EventPosition(seqNo = seqNo, isInclusive = true)
+    EventPosition(seqNo = seqNo)
   }
   /**

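With `isInclusive = true` now the default, the shortened constructor calls above behave exactly as before. A rough usage sketch (assuming the `EventHubsConf.setStartingPosition` setter from this connector's 2.3.0 API):

```scala
import org.apache.spark.eventhubs.{ EventHubsConf, EventPosition }

// Start reading every partition from sequence number 400 (inclusive by default).
val ehConf = EventHubsConf("VALID.CONNECTION.STRING") // placeholder connection string
  .setStartingPosition(EventPosition.fromSequenceNumber(400L))
```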
EventHubsRelationSuite.scala

@@ -55,7 +55,7 @@ class EventHubsRelationSuite extends QueryTest with BeforeAndAfter with SharedSQ
       .format("eventhubs")
       .options(ehConf.toMap)
       .load()
-      .selectExpr("CAST (body AS STRING)")
+      .select($"body" cast "string")
   }
   private def createPositions(seqNo: Long, ehName: String, partitionCount: Int) = {

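The `selectExpr`-to-`select` change here (and in the sink suite below) is stylistic; both forms cast the binary `body` column to a string. A minimal, self-contained sketch of the equivalence (the toy DataFrame is hypothetical, not taken from the test suites):

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.master("local[*]").appName("cast-demo").getOrCreate()
import spark.implicits._ // brings the $"..." column interpolator into scope

// Hypothetical stand-in for an Event Hubs DataFrame with a binary `body` column.
val df = Seq("hello".getBytes, "world".getBytes).toDF("body")

// The two projections below are equivalent.
val viaExpr   = df.selectExpr("CAST(body AS STRING)")
val viaColumn = df.select($"body" cast "string")
viaColumn.show() // prints the decoded strings
```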
EventHubsSinkSuite.scala

@@ -62,7 +62,7 @@ class EventHubsSinkSuite extends StreamTest with SharedSQLContext {
       .format("eventhubs")
       .options(ehConf.toMap)
       .load()
-      .selectExpr("CAST (body AS STRING)")
+      .select($"body" cast "string")
   }
   private def createEventHubsWriter(

docs/ (Structured Streaming integration guide)

@@ -178,10 +178,7 @@ val df = spark
   .format("eventhubs")
   .options(ehConf.toMap)
   .load()
 // Select body column as a String
-val eventhubs = df
-  .select("CAST (body AS STRING)")
-  .as[String]
+val eventhubs = df.select($"body" cast "string")
 // Source with per partition starting positions and rate limiting. In this case, we'll start from
 // a sequence number for partition 0, enqueued time for partition 3, the end of stream
@@ -215,7 +212,7 @@ val df = spark
   .format("eventhubs")
   .options(ehConf.toMap)
   .load()
-df.select("CAST(body AS STRING)").as[String]
+df.select($"body" cast "string")
 // start from same place across all partitions. end at the same place across all partitions.
 val ehConf = EventHubsConf("VALID.CONNECTION.STRING")
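For context, the per-partition scenario described in the comment above could look roughly like this (a sketch: `NameAndPartition` and `setStartingPositions` are assumed from the connector's 2.3.0 API and do not appear in this diff):

```scala
import java.time.Instant
import org.apache.spark.eventhubs.{ EventHubsConf, EventPosition, NameAndPartition }

val name = "EVENT.HUB.NAME" // placeholder Event Hub name

// Partition 0 starts from a sequence number, partition 3 from an enqueued time.
val positions = Map(
  NameAndPartition(name, 0) -> EventPosition.fromSequenceNumber(400L),
  NameAndPartition(name, 3) -> EventPosition.fromEnqueuedTime(Instant.now)
)

val perPartitionConf = EventHubsConf("VALID.CONNECTION.STRING")
  .setStartingPositions(positions)
```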