azure-event-hubs-spark/docs/misc
Sabee Grewal 73f486e535
Library re-write (For Spark 2.3) (#229)
* added EventHubsConf but haven't integrated it yet. build is stable!

* putting a pin in these EventHubsConf changes to focus on Spark 2.2

* WIP: implementation and tests complete. Need to fix issue related to Spark 2.2

* updating connector to work with Spark 2.2

* minor update to comments

* setting timeouts in EventHubClientWrapper

* change EventHubsConf.copy to EventHubsConf.clone

* temporarily disabling tests. progress tracker tests are being problematic and they are going to be removed in the next phase of cleanup

* driver-side translation added. dstream re-written. rdd re-written. configuration documentation added.

* EventHubsSource partial rewrite complete. Committing progress because I need to pause and fix a bug in an older version

* EventHubsSource re-write complete. Moving on to testing. Re-write was substantial, so I expect further changes will be needed as we fine tune the connector

* Fixed client, starting tests

* moving all client functionality into the client. added simulated eventhubs. going to start really reworking the tests now

* cleaned out old tests. updated code. everything is building, no tests yet.

* updated EventHubsConfSuite, all tests passing

* test utils set up, first RDD is passing

* adding RDD tests

* finalized sequence number support in eventhubsconf, dstream, and source

* basic stream tests done, moving to checkpointing tests

* finished DStream tests. moving to Source tests

* tests for EventHubsSourceOffset and JsonUtils

* removing excessive stack trace printing

* first few source tests. running into a cast exception due to EH Java Client, gonna take care of that now

* fixing how source handles EnqueueTime from EventData

* added maxSeqNoPerTrigger and corresponding source tests

* additional Source tests

* decoupled simulated client from simulated eventhubs. extended simulated eventhubs to allow sending events

* rdd, dstream, and source tests adapted to new simulated eventhubs

* adding AddEventHubsData integration tests. switching machines.

* modifying eventhubsclient to avoid false positive data loss reports

* additional structured streaming integration tests

* adding support for national and private clouds via setDomainName in EventHubsConf

* added final integration tests for struct streaming

* EnqueuedTime is converted to java.sql.Timestamp

* removing unused imports

* moving to eventhubs java client 1.0.0

* Remove isValid from EventHubsConf

* maxRatePerPartition refactoring

* Client refactoring - signature changes and removing unused methods

* EventHubsConf refactoring

* Common package is removed

* dropping default max rate

* Support for JavaRDD and JavaInputDStream

* Rename Position to EventPosition

* misc cleanup

* Support multiple simulated eventhubs at once

* remove sql containsProps and userDefinedKeys options

* parallelized all loops in EventHubsClient.translate

* removing unnecessary comments

* adding javadoc comments

* conn str builder tests

* EventHubsConf tests added

* Minor bug fixes and EventPosition serialization issue is fixed

* Simulated client is enabled in tests

* Moving non-util files out of utils package

* ClientWrapper fix

* Minor bug fixes in tests

* Moved to Spark 2.3, all tests passing

* EventPosition bug fix

* Receive until we do get null, only make API call for partition count once

* moving defaults into package.scala

* removing out of date docs, adding structured streaming integration guide

* spark streaming integration guide

* Removing old information from docs

* Updated PySpark docs

* updating doc name

* Updating minor issues in docs. Added experimental tag to four apis in eventhubsconf

* Adding support for batch styled queries in structured streaming

* Update struct streaming docs to reflect new batch query support

* docs/README formatting

* doc formatting

* add batch style query code sample in docs

* EventData: remove inclusive flag from public api. Starts are always inclusive, ends are always exclusive

* Updating public apis to take NameAndPartition instead of PartitionId

* Fixing javadoc issues in EventPosition

* updating readme

* updating templates for pull requests, issues, and contributing

* moving test resource to test directory

* renaming EventHubsClientWrapper to EventHubsClient

* fixing access issues in NameAndPartition

* reorganizing test resources

* Accommodating breaking changes in java client

* Additional tracing in translate method

* Client connection pooling and thread pooling first draft

* Minor bug fix to connection pool

* remove failOnDataLoss option

* Adding EventHubsSink

* Adding send functionality to TestUtils

* First batched writes passing

* More unit tests for EventHubsRelation and EventHubsSink

* Additional Sink tests

* Final Sink test updates

* Adding Sink documentation to integration guide

* Adding databricks docs

* removing concurrent jobs limit in spark streaming

* Check for EventData expiration each batch

* Rebase

* Adding preferred location in Spark Streaming and Struct Streaming

* concurrency bug fix in EventHubsClient

* Minor logging fix

* retry client create until successful

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update spark-streaming-eventhubs-integration.md

* Update structured-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Update README.md

* add toString for simulated eventhubs

* Update structured-streaming-eventhubs-integration.md

* Update spark-streaming-eventhubs-integration.md

* Update azure_eventhubs_support.md

* Updating docs - typo fixes and reorganizing

* fixing NPE in RDD

* Moving to proper Spark 2.3.0 release and Java client 1.0.0 release

* Enabling unit and integration tests in Travis

* Updating CONTRIBUTING.md

* additional traces in client pool
2018-03-02 10:33:18 -08:00
install_spark_on_windows.md Library re-write (For Spark 2.3) (#229) 2018-03-02 10:33:18 -08:00