Commit Graph

39 Commits

Author SHA1 Message Date
SJ 5c31041b6a
Fix a configuration bug (useExclusiveReceiver option) (#466) 2019-10-23 22:40:17 -07:00
SJ 33ee66e23a
Several bug fixes and improvements for 2.3.14 release (#465)
* [Bug fix] fix IO pipe issue that causes a blocking issue on send calls
* [Bug fix] fix broken TCP connection issue that causes prolonged delay in receiving events
* [Configuration] provide a configuration knob to specify whether an epoch receiver should be used. Default value is true.
* [Tracing] add a Spark task ID to log messages and emit latency data for receive calls
2019-10-23 09:44:13 -07:00
SJ d5ad0d6caa
Update version number for new release (2.3.13) (#451) 2019-07-26 13:29:19 -07:00
SJ 1ad742d461
Update version number for new release (2.3.12) (#446) 2019-05-10 15:13:49 -07:00
SJ d368d7c6b0
Update version numbers for new release (2.3.11 & 2.2.11) (#443) 2019-05-01 21:45:19 -07:00
SJ f5fe40d994
Prepare 2.3.10 & 2.2.10 release (#438) 2019-04-08 14:52:51 -07:00
SJ 420e4bd2a2
Prepare 2.3.9 & 2.2.9 release (#427) 2019-01-18 15:53:32 -08:00
SJ cdea9ae746
Prepare 2.3.8 & 2.2.8 release and update Java client SDK dependency (#425) 2019-01-15 19:47:23 -08:00
SJ 86fb6db24d
Prepare for 2.3.7 release (#420)
* Prepare for 2.3.7 release
* Enable ReceiverTimeout and PrefetchCount configuration
* Update Java client dependency
2019-01-05 18:25:23 -08:00
SJ 1c198ba39a
Prepare for 2.3.6 & 2.2.6 release (#407) 2018-11-08 09:06:05 -08:00
Sabee Grewal 1c42c2f45b
prep for 2.3.5 release (#399) 2018-10-19 11:53:46 -07:00
Sabee Grewal 7ceae44f3b
Receive calls one at a time to avoid overloading thread pool (#394)
* Receive calls one at a time to avoid overloading thread pool

* bumping to 1.2.0 of java client
2018-10-03 16:24:26 -07:00
Sabee Grewal 22d7936c61 2.3.4 and 2.2.4 release (#387) 2018-09-10 15:06:55 -07:00
Sabee Grewal 43717e0e90
Redo check cursor (#384)
* Redo check cursor

* Update to 2.3.4 snapshot
2018-09-07 14:32:41 -07:00
Sabee Grewal 0733567c2b
2.3.3 and 2.2.3 release (#381) 2018-09-04 20:20:05 -07:00
Fokko Driesprong 0fe7d8f5e0 Simplify configuration and creation of the client (#359)
* Don't read messages that are already pruned by EventHub

In the getPartitions method of the EventHubsRDD class we check if the
offsets are still valid. It is possible that the retention has kicked
in and the messages are no longer available on the bus.

For more info, refer to this issue:
https://github.com/Azure/azure-event-hubs-spark/issues/313

Did some minor refactoring:

- Made the clientFactory static so we don't need to pass this constructor
  around
- Changed the signature of allBoundedSeqNos from a Seq to a Map, since
  the partitionId is unique and later on in the code it is also converted
  to a map.
- Removed the trim method, since passing EventHub config keys to Spark
  does not do any harm. Without this change, the tests are failing since
  they are not being switched to the simulator.

* Remove the offset calculation

* Bump version to 2.3.3-SNAPSHOT

* Restore trimmed config when creating an EventHubsRDD
2018-07-29 14:19:17 -07:00
Sabee Grewal 3682b30e7d
Prep for 2.3.2 release (#354) 2018-07-05 11:46:32 -07:00
Sabee Grewal 29e707621a
Update poms to 2.3.2-SNAPSHOT (#304) 2018-03-29 15:09:44 -07:00
Sabee Grewal ca17f585a2
Updating to 2.3.1 (#288) 2018-03-22 20:30:20 -07:00
Sabee Grewal bc5c651c64 Updating poms to 2.3.0 (#260) 2018-03-07 14:26:35 -08:00
Sabee Grewal 7017ced4c2
Update poms to 2.3.0-PREVIEW (#240) 2018-03-02 15:00:19 -08:00
Sabee Grewal 73f486e535
Library re-write (For Spark 2.3) (#229)
* added EventHubsConf but haven't integrated it yet. build is stable!

* putting a pin in these EventHubsConf changes to focus on Spark 2.2

* WIP: implementation and tests complete. Need to fix issue related to Spark 2.2

* updating connector to work with Spark 2.2

* minor update to comments

* setting timeouts in EventHubClientWrapper

* change EventHubsConf.copy to EventHubsConf.clone

* temporarily disabling tests. progress tracker tests are being problematic and they are going to be removed in the next phase of cleanup

* driver-side translation added. dstream re-written. rdd re-written. configuration documentation added.

* EventHubsSource partial rewrite complete. Committing progress because I need to hit pause and fix a bug in an older version

* EventHubsSource re-write complete. Moving on to testing. Re-write was substantial, so I expect further changes will be needed as we fine tune the connector

* Fixed client, starting tests

* moving all client functionality into the client. added simulated eventhubs. going to start really reworking the tests now

* cleaned out old tests. updated code. everything is building, no tests yet.

* updated EventHubsConfSuite, all tests passing

* test utils set up, first RDD is passing

* adding RDD tests

* finalized sequence number support in eventhubsconf, dstream, and source

* basic stream tests done, moving to checkpointing tests

* finished DStream tests. moving to Source tests

* tests for EventHubsSourceOffset and JsonUtils

* removing excessive stack trace printing

* first few source tests. running into a cast exception due to EH Java Client, gonna take care of that now

* fixing how source handles EnqueueTime from EventData

* added maxSeqNoPerTrigger and corresponding source tests

* additional Source tests

* decoupled simulated client from simulated eventhubs. extended simulated eventhubs to allow sending events

* rdd, dstream, and source tests adapted to new simulated eventhubs

* adding AddEventHubsData integration tests. switching machines.

* modifying eventhubsclient to avoid false positive data loss reports

* additional structured streaming integration tests

* adding support for national and private clouds via setDomainName in EventHubsConf

* added final integration tests for struct streaming

* EnqueuedTime is converted to java.sql.Timestamp

* removing unused imports

* moving to eventhubs java client 1.0.0

* Remove isValid from EventHubsConf

* maxRatePerPartition refactoring

* Client refactoring - signature changes and removing unused methods

* EventHubsConf refactoring

* Common package is removed

* dropping default max rate

* Support for JavaRDD and JavaInputDStream

* Rename Position to EventPosition

* misc cleanup

* Support multiple simulated eventhubs at once

* remove sql containsProps and userDefinedKeys options

* parallelized all loops in EventHubsClient.translate

* removing unnecessary comments

* adding javadoc comments

* conn str builder tests

* EventHubsConf tests added

* Minor bug fixes and EventPosition serialization issue is fixed

* Simulated client is enabled in tests

* Moving non-util files out of utils package

* ClientWrapper fix

* Minor bug fixes in tests

* Moved to Spark 2.3, all tests passing

* EventPosition bug fix

* Receive until we do get null, only make API call for partition count once

* moving defaults into package.scala

* removing out of date docs, adding structured streaming integration guide

* spark streaming integration guide

* Removing old information from docs

* Updated PySpark docs

* updating doc name

* Updating minor issues in docs. Added experimental tag to four apis in eventhubsconf

* Adding support for batch styled queries in structured streaming

* Update struct streaming docs to reflect new batch query support

* docs/README formatting

* doc formatting

* add batch style query code sample in docs

* EventData: remove inclusive flag from public api. Starts are always inclusive, ends are always exclusive

* Updating public apis to take NameAndPartition instead of PartitionId

* Fixing javadoc issues in EventPosition

* updating readme

* updating templates for pull requests, issues, and contributing

* moving test resource to test directory

* renaming EventHubsClientWrapper to EventHubsClient

* fixing access issues in NameAndPartition

* reorganizing test resources

* Accommodating breaking changes in java client

* Additional tracing in translate method

* Client connection pooling and thread pooling first draft

* Minor bug fix to connection pool

* remove failOnDataLoss option

* Adding EventHubsSink

* Adding send functionality to TestUtils

* First batched writes passing

* More unit tests for EventHubsRelation and EventHubsSink

* Additional Sink tests

* Final Sink test updates

* Adding Sink documentation to integration guide

* Adding databricks docs

* removing concurrent jobs limit in spark streaming

* Check for EventData expiration each batch

* Rebase

* Adding preferred location in Spark Streaming and Struct Streaming

* concurrency bug fix in EventHubsClient

* Minor logging fix

* retry client create until successful

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update spark-streaming-eventhubs-integration.md

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update README.md

* add toString for simulated eventhubs

* Update structured-streaming-eventhubs-integration.md

* Update spark-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Updating docs - typo fixes and reorganizing

* fixing NPE in RDD

* Moving to proper Spark 2.3.0 release and Java client 1.0.0 release

* Enabling unit and integration tests in Travis

* Updating CONTRIBUTING.md

* additional traces in client pool
2018-03-02 10:33:18 -08:00
Sabee Grewal fe479bc9c3
Updating pom to work on Spark 2.2 support with updated package name (#208)
* updating pom to work on Spark 2.2 support with updated package name

* adding roadmap to README
2017-11-07 15:03:21 -08:00
Sabee Grewal 89f87b4d3e
Moving to Spark 2.2 and adding support for non-public clouds (#201)
* updates current tests - temporarily disabling struct stream tests

* adding support for URI in addition to namespace

* removing white space
2017-11-01 16:32:01 -07:00
Sabee Grewal 7d4645b91e Removing unused dependencies and plugins from pom.xml (#189)
* changing from java getters/setters to scala-styled ones

* removing unnecessary dependencies from pom.xml
2017-10-23 15:50:24 -07:00
Sabee Grewal 2795a40837 First phase of client consolidation (#182)
* updating poms and readme to 2.1.6-SNAPSHOT

* first phase of client consolidation
2017-10-20 16:13:35 -07:00
sabeegrewal 66849571d1 updating pom for 2.1.5 release 2017-10-17 11:24:25 -07:00
Nan Zhu c06e0fbc69 prepare for 2.2 release (#159)
* update version for 2.1.3

* [2.1.x] should update highest offset when wake up by notify() (#153)

* should update highest offset when wake up by notify()

* fix test

* making EventHubs API call for every partition each batch
2017-09-26 16:07:52 -07:00
Nan Zhu a204d3bb3b update version number (#148) 2017-09-19 13:35:07 -07:00
Nan Zhu b4f32b8cbb update version for 2.1.3 (#142) 2017-09-18 10:26:23 -07:00
Nan Zhu 207d9834e4 [2.1.x] optimize thread synchronization and show metrics caused by reading progress files (#124)
* fix flaky test

* remove duplicate code

* change sync order and add metrics

* update pom

* update version number

* change back pom

* test fix
2017-08-24 10:59:32 -07:00
Nan Zhu f3896776a3 release 2.1.2 and 2.0.8 (#113) 2017-07-31 12:44:15 -07:00
Nan Zhu 717b297ca8 [2.1.x] replace rest client with amqp one (#97)
* replace rest client with amqp one

* fix the failed tests
2017-06-26 09:39:24 -07:00
Nan Zhu 49a47512ae update pom version number (#88) 2017-05-25 09:06:52 -07:00
Nan Zhu ee7dcfe658 Structured Streaming Support of Azure Event Hubs (#77)
structured streaming support
2017-05-03 07:47:08 -07:00
Nan Zhu 41e91379c9 Update pom.xml 2017-03-28 08:34:00 -07:00
Nan Zhu b79cf97679 release of 2.0.4 (#52)
* ignore scalastyle output

* test eventhubs 0.12

* release of 2.0.4

* fix compilation error

* fix NPE

* further fix NPE

* fix classcastexception

* fix failed test cases

* include scalaj to jar file

* do not limit to use WASB

* upgrade to 0.13

* release note of 2.0.4

* longer waiting interval

* restendpoint (#18)
2017-03-28 07:50:47 -07:00
Nan Zhu 399e2167bf Update pom.xml 2017-01-27 14:56:24 -08:00
Nan Zhu d5d97c3524 Direct Stream (#20)
implement the direct dstream based integration of Spark Streaming and EventHubs
2017-01-04 09:05:45 -08:00
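
Commit 0fe7d8f5e0 above describes checking, in the getPartitions method of EventHubsRDD, whether the requested offsets are still valid after retention has pruned older messages. A minimal sketch of that idea in Python follows; it is not the connector's code. The names `OffsetRange`, `clamp_to_retained`, and `earliest_available` are hypothetical, and ranges are assumed to be half-open (inclusive start, exclusive end), matching the convention noted in commit 73f486e535.

```python
from typing import Dict, NamedTuple


class OffsetRange(NamedTuple):
    """Hypothetical half-open range [from_seq_no, until_seq_no) for one partition."""
    from_seq_no: int
    until_seq_no: int


def clamp_to_retained(
    requested: Dict[int, OffsetRange],
    earliest_available: Dict[int, int],
) -> Dict[int, OffsetRange]:
    """Skip sequence numbers that retention has already removed.

    `requested` maps partition id -> range the job wants to read.
    `earliest_available` maps partition id -> oldest sequence number
    still present (assumed to be obtained from the service elsewhere).
    """
    clamped = {}
    for partition_id, rng in requested.items():
        # Move the start forward if the original start was pruned.
        start = max(rng.from_seq_no, earliest_available[partition_id])
        # If the whole range was pruned, return an empty range instead
        # of one whose end precedes its start.
        end = max(rng.until_seq_no, start)
        clamped[partition_id] = OffsetRange(start, end)
    return clamped
```

Keying both inputs by partition id mirrors the commit's note that the partition id is unique, which is why a map is a natural fit here.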