* [Bug fix] fix an IO pipe issue that causes send calls to block
* [Bug fix] fix a broken TCP connection issue that causes prolonged delays in receiving events
* [Configuration] provide a configuration knob to specify whether an epoch receiver should be used. Default value is true.
* [Tracing] add a Spark task ID to log messages and emit latency data for receive calls
* Don't read messages that have already been pruned by EventHub
In the getPartitions method of the EventHubsRDD class we check whether the
offsets are still valid. It is possible that retention has kicked in and the
messages are no longer available on the bus (a rough sketch of this kind of
check follows the refactoring notes below).
For more info, refer to this issue:
https://github.com/Azure/azure-event-hubs-spark/issues/313
Did some minor refactoring:
- Made the clientFactory static so we don't need to pass it around via the
constructor
- Changed the signature of allBoundedSeqNos from a Seq to a Map, since the
partitionId is unique and the result is converted to a map later in the
code anyway
- Removed the trim method, since passing EventHub config keys to Spark
does no harm. Without this change, the tests fail because they are not
switched to the simulator.
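As a rough illustration of the check described above (not the connector's actual implementation), here is a minimal sketch; the `Client` trait, `earliestSeqNo`, and the `OffsetRange` shape are hypothetical stand-ins:

```scala
// Hypothetical sketch: clamp each partition's starting sequence number to the
// earliest sequence number EventHubs still retains, so we never try to read
// messages that retention has already pruned.
case class OffsetRange(partitionId: Int, fromSeqNo: Long, untilSeqNo: Long)

trait Client {
  // Earliest sequence number still available on the bus for a partition.
  def earliestSeqNo(partitionId: Int): Long
}

def clampToRetention(client: Client, ranges: Seq[OffsetRange]): Seq[OffsetRange] =
  ranges.map { range =>
    val earliest = client.earliestSeqNo(range.partitionId)
    if (range.fromSeqNo < earliest) {
      // Some requested messages were pruned; start at the earliest retained
      // sequence number (but never past the end of the range).
      range.copy(fromSeqNo = math.min(earliest, range.untilSeqNo))
    } else {
      range
    }
  }
```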
* Remove the offset calculation
* Bump version to 2.3.3-SNAPSHOT
* Restore trimmed config when creating an EventHubsRDD
* added EventHubsConf but haven't integrated it yet. build is stable!
* putting a pin in these EventHubsConf changes to focus on Spark 2.2
* WIP: implementation and tests complete. Need to fix issue related to Spark 2.2
* updating connector to work with Spark 2.2
* minor update to comments
* setting timeouts in EventHubClientWrapper
* change EventHubsConf.copy to EventHubsConf.clone
* temporarily disabling tests. progress tracker tests are being problematic and they are going to be removed in the next phase of cleanup
* driver-side translation added. dstream re-written. rdd re-written. configuration documentation added.
* EventHubsSource partial rewrite complete. Committing progress because I need to hit pause and fix a bug in an older version
* EventHubsSource re-write complete. Moving on to testing. Re-write was substantial, so I expect further changes will be needed as we fine tune the connector
* Fixed client, starting tests
* moving all client functionality into the client. added simulated eventhubs. going to start really reworking the tests now
* cleaned out old tests. updated code. everything is building, no tests yet.
* updated EventHubsConfSuite, all tests passing
* test utils set up, first RDD is passing
* adding RDD tests
* finalized sequence number support in eventhubsconf, dstream, and source
* basic stream tests done, moving to checkpointing tests
* finished DStream tests. moving to Source tests
* tests for EventHubsSourceOffset and JsonUtils
* removing excessive stack trace printing
* first few source tests. running into a cast exception due to the EH Java client; going to take care of that now
* fixing how source handles EnqueueTime from EventData
* added maxSeqNoPerTrigger and corresponding source tests
* additional Source tests
* decoupled simulated client from simulated eventhubs. extended simulated eventhubs to allow sending events
* rdd, dstream, and source tests adapted to new simulated eventhubs
* adding AddEventHubsData integration tests. switching machines.
* modifying eventhubsclient to avoid false positive data loss reports
* additional structured streaming integration tests
* adding support for national and private clouds via setDomainName in EventHubsConf
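A minimal sketch of the setDomainName support mentioned above, assuming the setter lives on EventHubsConf as that entry describes and the post-rewrite `org.apache.spark.eventhubs` package layout; the connection string is a placeholder and the domain shown is just an example for a national cloud:

```scala
import org.apache.spark.eventhubs.EventHubsConf

// The domain defaults to the public Azure cloud; national and private clouds
// can point the connector at their own service bus domain instead.
val ehConf = EventHubsConf("<EVENT_HUBS_CONNECTION_STRING>")
  .setDomainName("servicebus.chinacloudapi.cn") // example: Azure China
```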
* added final integration tests for struct streaming
* EnqueuedTime is converted to java.sql.Timestamp
* removing unused imports
* moving to eventhubs java client 1.0.0
* Remove isValid from EventHubsConf
* maxRatePerPartition refactoring
* Client refactoring - signature changes and removing unused methods
* EventHubsConf refactoring
* Common package is removed
* dropping default max rate
* Support for JavaRDD and JavaInputDStream
* Rename Position to EventPosition
* misc cleanup
* Support multiple simulated eventhubs at once
* remove sql containsProps and userDefinedKeys options
* parallelized all loops in EventHubsClient.translate
* removing unnecessary comments
* adding javadoc comments
* conn str builder tests
* EventHubsConf tests added
* Minor bug fixes and EventPosition serialization issue is fixed
* Simulated client is enabled in tests
* Moving non-util files out of utils package
* ClientWrapper fix
* Minor bug fixes in tests
* Moved to Spark 2.3, all tests passing
* EventPosition bug fix
* Receive until we get null; only make the API call for partition count once
* moving defaults into package.scala
* removing out of date docs, adding structured streaming integration guide
* spark streaming integration guide
* Removing old information from docs
* Updated PySpark docs
* updating doc name
* Fixing minor issues in docs. Added experimental tag to four APIs in EventHubsConf
* Adding support for batch-style queries in structured streaming
* Update struct streaming docs to reflect new batch query support
* docs/README formatting
* doc formatting
* add batch-style query code sample in docs
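For context on the batch-style query sample mentioned above, here is a minimal sketch of a bounded (batch) read, assuming the post-rewrite DataFrame reader (`format("eventhubs")` plus `EventHubsConf.toMap`); the connection string and hub name are placeholders:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf, EventPosition }
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("eventhubs-batch-sketch").getOrCreate()

val connectionString = ConnectionStringBuilder("<EVENT_HUBS_CONNECTION_STRING>")
  .setEventHubName("<EVENT_HUB_NAME>")
  .build

// Bounded, batch-style query: read everything between the start and end positions.
val ehConf = EventHubsConf(connectionString)
  .setStartingPosition(EventPosition.fromStartOfStream)
  .setEndingPosition(EventPosition.fromEndOfStream)

val df = spark.read
  .format("eventhubs")
  .options(ehConf.toMap)
  .load()

df.select("body", "sequenceNumber", "enqueuedTime").show()
```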
* EventData: remove inclusive flag from public api. Starts are always inclusive, ends are always exclusive
* Updating public apis to take NameAndPartition instead of PartitionId
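To illustrate the two entries above, a minimal sketch of per-partition positions keyed by NameAndPartition, assuming the `setStartingPositions`/`setEndingPositions` setters on EventHubsConf; names and sequence numbers are placeholders. Start positions are treated as inclusive and end positions as exclusive:

```scala
import org.apache.spark.eventhubs.{ EventHubsConf, EventPosition, NameAndPartition }

val name = "<EVENT_HUB_NAME>"

// Per-partition start positions (inclusive).
val startingPositions = Map(
  NameAndPartition(name, 0) -> EventPosition.fromSequenceNumber(100L),
  NameAndPartition(name, 1) -> EventPosition.fromStartOfStream
)

// Per-partition end positions (exclusive).
val endingPositions = Map(
  NameAndPartition(name, 0) -> EventPosition.fromSequenceNumber(200L)
)

val ehConf = EventHubsConf("<EVENT_HUBS_CONNECTION_STRING>")
  .setStartingPositions(startingPositions)
  .setEndingPositions(endingPositions)
```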
* Fixing javadoc issues in EventPosition
* updating readme
* updating templates for pull requests, issues, and contributing
* moving test resource to test directory
* renaming EventHubsClientWrapper to EventHubsClient
* fixing access issues in NameAndPartition
* reorganizing test resources
* Accommodating breaking changes in java client
* Additional tracing in translate method
* Client connection pooling and thread pooling first draft
* Minor bug fix to connection pool
* remove failOnDataLoss option
* Adding EventHubsSink
* Adding send functionality to TestUtils
* First batched writes passing
* More unit tests for EventHubsRelation and EventHubsSink
* Additional Sink tests
* Final Sink test updates
* Adding Sink documentation to integration guide
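To go with the Sink documentation above, a minimal write sketch, assuming the sink reads each row's payload from a `body` column; the connection string and hub name are placeholders:

```scala
import org.apache.spark.eventhubs.{ ConnectionStringBuilder, EventHubsConf }
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("eventhubs-sink-sketch").getOrCreate()
import spark.implicits._

val connectionString = ConnectionStringBuilder("<EVENT_HUBS_CONNECTION_STRING>")
  .setEventHubName("<EVENT_HUB_NAME>")
  .build

val ehConf = EventHubsConf(connectionString)

// Batch write: each row's `body` column becomes one event sent to the hub.
Seq("event-1", "event-2", "event-3")
  .toDF("body")
  .write
  .format("eventhubs")
  .options(ehConf.toMap)
  .save()
```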
* Adding databricks docs
* removing concurrent jobs limit in spark streaming
* Check for EventData expiration each batch
* Rebase
* Adding preferred location in Spark Streaming and Struct Streaming
* concurrency bug fix in EventHubsClient
* Minor logging fix
* retry client create until successful
* Update structured-streaming-eventhubs-integration.md
* Update azure_eventhubs_support.md
* Update spark-streaming-eventhubs-integration.md
* Update structured-streaming-eventhubs-integration.md
* Update azure_eventhubs_support.md
* Update README.md
* add toString for simulated eventhubs
* Update structured-streaming-eventhubs-integration.md
* Update spark-streaming-eventhubs-integration.md
* Update azure_eventhubs_support.md
* Updating docs - typo fixes and reorganizing
* fixing NPE in RDD
* Moving to proper Spark 2.3.0 release and Java client 1.0.0 release
* Enabling unit and integration tests in Travis
* Updating CONTRIBUTING.md
* additional traces in client pool
* update version for 2.1.3
* [2.1.x] should update highest offset when woken up by notify() (#153)
* should update highest offset when woken up by notify()
* fix test
* making an EventHubs API call for every partition each batch
* ignore scalastyle output
* test eventhubs 0.12
* release of 2.0.4
* fix compilation error
* fix NPE
* further fix NPE
* fix classcastexception
* fix failed test cases
* include scalaj to jar file
* do not limit to using WASB
* upgrade to 0.13
* release note of 2.0.4
* longer waiting interval
* REST endpoint (#18)