Commit graph

27 commits

Author SHA1 Message Date
nyaghma 91fc221bce
update documentations (#618) 2022-05-03 18:02:33 -07:00
nyaghma 4f705671cb
updated the document about receiver disconnect error (#599) 2021-06-03 14:41:35 -07:00
nyaghma 234bd07cf8
Multi readers example (#540) 2020-09-30 18:57:11 -07:00
Sabee Grewal 73f486e535
Library re-write (For Spark 2.3) (#229)
* added EventHubsConf but haven't integrated it yet. build is stable!

* putting a pin in these EventHubsConf changes to focus on Spark 2.2

* WIP: implementation and tests complete. Need to fix issue related to Spark 2.2

* updating connector to work with Spark 2.2

* minor update to comments

* setting timeouts in EventHubClientWrapper

* change EventHubsConf.copy to EventHubsConf.clone

* temporarily disabling tests. progress tracker tests are being problematic and they are going to be removed in the next phase of cleanup

* driver-side translation added. dstream re-written. rdd re-written. configuration documentation added.

* EventHubsSource partial rewrite complete. Committing progress b/c need hit pause and fix a bug in an older version

* EventHubsSource re-write complete. Moving on to testing. Re-write was substantial, so I expect further changes will be needed as we fine tune the connector

* Fixed client, starting tests

* moving all client functionality into the client. added simulated eventhubs. gonna starting really reworking the tests now

* cleaned out old tests. updated code. everything is building, no tests yet.

* updated EventHubsConfSuite, all tests passing

* test utils set up, first RDD is passing

* adding RDD tests

* finalized sequence number support in eventhubsconf, dstream, and source

* basic stream tests done, moving to checkpointing tests

* finished DStream tests. moving to Source tests

* tests for EventHubsSourceOffset and JsonUtils

* removing excessive stack trace printing

* first few source tests. running into a cast exception due to EH Java Client, gonna take care of that now

* fixing how source handles EnqueueTime from EventData

* added maxSeqNoPerTrigger and corresponding source tests

* additional Source tests

* decoupled simulated client from simulated eventhubs. extended simulated eventhubs to allow sending events

* rdd, dstream, and source tests adapted to new simulated eventhubs

* adding AddEventHubsData integration tests. switching machines.

* modifying eventhubsclient to avoid false positive data loss reports

* additional structured streaming integration tests

* adding support for national and private clouds via setDomainName in EventHubsConf

* added final integration tests for struct streaming

* EnqueuedTime is converted to java.sql.Timestamp

* removing unused imports

* moving to eventhubs java client 1.0.0

* Remove isValid from EventHubsConf

* maxRatePerPartition refactoring

* Client refactoring - signature changes and removing unused methods

* EventHubsConf refactoring

* Common package is removed

* dropping default max rate

* Support for JavaRDD and JavaInputDStream

* Rename Position to EventPosition

* misc cleanup

* Support multiple simulated eventhubs at once

* remove sql containsProps and userDefinedKeys options

* parallelized all loops in EventHubsClient.translate

* removing unecessary comments

* adding javadoc comments

* conn str builder tests

* EventHubsConf tests added

* Minor bug fixes and EventPosition serialization issue is fixed

* Simulated client is enabled in tests

* Moving non-util files out of utils package

* ClientWrapper fix

* Minor bug fixes in tests

* Moved to Spark 2.3, all tests passing

* EventPosition bug fix

* Receive until we do get null, only make API call for partition count once

* moving defaults into package.scala

* removing out of date docs, adding structured streaming integration guide

* spark streaming integration guide

* Removing old information from docs

* Updated PySpark docs

* updating doc name

* Updating minor issues in docs. Added experimental tag to four apis in eventhubsconf

* Adding support for batch styled queries in structured streaming

* Update struct streaming docs to reflect new batch query support

* docs/README formatting

* doc fomratting

* add batch style query code sample in docs

* EventData: remove inclusive flag from public api. Starts are always inclusive, ends are always exclusive

* Updating public apis to take NameAndPartition instead of PartitionId

* Fixing javadoc issues in EventPosition

* updating readme

* updating templates for pull requests, issues, and contriubting

* moving test resource to test directory

* renaming EventHubsClientWrapper to EventHubsClient

* fixing access issues in NameandPartition

* reorganizing test resources

* Accomodating breaking changes in java client

* Additional tracing in translate method

* Client connection pooling and thread pooling first draft

* Minor bug fix to connection pool

* remove failOnDataLoss option

* Adding EventHubsSink

* Adding send functionality to TestUtils

* First batched writes passing

* More unit tests for EventHubsRelation and EventHubsSink

* Additional Sink tests

* Final Sink test updates

* Adding Sink documentation to integration guide

* Adding databricks docs

* remvoing concurrent jobs limit in spark streaming

* Check for EventData expiration each batch

* Rebase

* Adding preferred location in Spark Streaming and Struct Streaming

* concurrency bug fix in EVentHubsClient

* Minor logging fix

* retry client create until successful

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update spark-streaming-eventhubs-integration.md

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update README.md

* add toString for simulated eventhubs

* Update structured-streaming-eventhubs-integration.md

* Update spark-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Updating docs - typo fixes and reorganizing

* fixing NPE in RDD

* Moving to proper Spark 2.3.0 release and Java client 1.0.0 release

* Enabling unit and integration tests in Travis

* Updating CONTRIBUTING.md

* additional traces in client pool
2018-03-02 10:33:18 -08:00
Scott Lyons 63ec0f0dba Adding importable example for Databricks on Azure (#217) 2017-11-30 14:59:13 -08:00
Sabee Grewal fe479bc9c3
Updating pom to work on Spark 2.2 support with updated package name (#208)
* updating pom to work on Spark 2.2 support with updated package name

* adding roadmap to README
2017-11-07 15:03:21 -08:00
Sabee Grewal 89f87b4d3e
Moving to Spark 2.2 and adding support for non-public clouds (#201)
* updates current tests - temporarily disabling struct stream tests

* adding support for URI in addition to namespace

* removing white space
2017-11-01 16:32:01 -07:00
Sabee Grewal 156f9152f1 RateControlUtils clean up (#192)
* making naming less verbose

* more naming updates

* moving EventHubsUtils to eventhubs.common - going to add constants there and soon we'll support Spark core

* RateControlUtils cleanup

* additional cleanup

* updating examples to reflect new package location
2017-10-24 18:03:46 -07:00
Sabee Grewal 963a79e073 Cleaning EventHubUtil API (#191)
* Removed unused API and exposed the option to create a DStream from a single Event Hubs

* changing variable name for consistency

* removing dependency on EventHubs Namespace parameter

* namespace name does not need to be passed for DStream to be created

* require that only one namespace exists within ehParams

* updating examples to reflect new API
2017-10-24 12:32:59 -07:00
Sabee Grewal 2795a40837 First phase of client consolidation (#182)
* updating poms and readme to 2.1.6-SNAPSHOT

* first phase of client consolidation
2017-10-20 16:13:35 -07:00
sabeegrewal 66849571d1 updating pom for 2.1.5 release 2017-10-17 11:24:25 -07:00
Sabee Grewal 60931e35e4 Codebase cleanup. Details in in description. (#174) 2017-10-16 18:45:51 -07:00
sabeegrewal 784465dc5e more scalastyle fixes 2017-09-29 15:16:14 -07:00
sabeegrewal afb2fad2fb fixing scalastyle issues and build warnings 2017-09-29 15:12:37 -07:00
Nan Zhu b1e026758a update version number (#154)
* change version number to 2.1.4

* update to 2.1.5-SNAPSHOT
2017-09-21 09:49:16 -07:00
Nan Zhu c107b5c928 update version to 2.1.4-SNAPSHOT (#147) 2017-09-19 13:32:24 -07:00
Nan Zhu 506573d4ac update version for 2.1.3 2017-09-18 10:13:43 -07:00
Nan Zhu 207d9834e4 [2.1.x] optimize thread synchronization and show metrics caused by reading progress files (#124)
* fix flaky test

* remove duplicate code

* change sync order and add metrics

* update pom

* update version number

* change back pom

* test fix
2017-08-24 10:59:32 -07:00
Nan Zhu f3896776a3 release 2.1.2 and 2.0.8 (#113) 2017-07-31 12:44:15 -07:00
Nan Zhu 717b297ca8 [2.1.x] replace rest client with amqp one (#97)
* replace rest client with amqp one

* fix the failed tests
2017-06-26 09:39:24 -07:00
Nan Zhu 49a47512ae update pom version number (#88) 2017-05-25 09:06:52 -07:00
Nan Zhu ee7dcfe658 Structured Streaming Support of Azure Event Hubs (#77)
structured streaming support
2017-05-03 07:47:08 -07:00
Nan Zhu f17dd98a9a Update pom.xml 2017-03-28 08:34:15 -07:00
Nan Zhu b79cf97679 release of 2.0.4 (#52)
* ignore scalastyle output

* test eventhubs 0.12

* release of 2.0.4

* fix compilation error

* fix NPE

* further fix NPE

* fix classcastexception

* fix failed test cases

* include scalaj to jar file

* do not limit to use WASB

* upgrade to 0.13

* release note of 2.0.4

* longer waiting interval

* restendpoint (#18)
2017-03-28 07:50:47 -07:00
Nan Zhu 9bdbb2183b Update pom.xml 2017-01-27 14:56:45 -08:00
Nan Zhu 1f8c050238 working around several issues (#33)
This CR contains the working around for several issues we found in EventHub/Spark, specifically they are


1) Unreliable rest endpoint in EventHub -> fail the application if no response after retry
2) Spark checkpoint issue (https://issues.apache.org/jira/browse/SPARK-19278) -> disable cleanup for progress file but keep that for temp files
3) Too many files in EventHub client -> one-one mapping of Client/Receiver
2017-01-26 10:08:44 -08:00
Nan Zhu d5d97c3524 Direct Stream (#20)
implement the direct dstream based integration of Spark Streaming and EventHubs
2017-01-04 09:05:45 -08:00