From what we've seen, disposition errors are not link-ending errors. One of the more common ones is the service indicating that you are being throttled.
_Some_ sends will complete (and some will fail) but closing the link means we can't tell which are which.
The recovery algorithm could get into a cycle where two goroutines kept recovering, even though a new link was already in place. This can cause a lot of thrash, since each recovery of the link interrupts any in-progress sends, which causes even more recoveries, and so on.
This PR fixes that by checking the link ID that we are trying to recycle. If it doesn't match then the link must already have been recovered so we can no-op and early return.
This part wasn't the highest value for recovery compared to just skipping recovery altogether if the link had cycled, so for now I'll just remove it and replace it with a TODO.
- Adjusting tests to handle the link ID
- Splitting Recover() into an internal version and a public-facing one so I don't break the public API. (The internal one lets us pass in the expected link ID to avoid the expensive recovery.)
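
As a rough sketch of the link ID check described above (not the actual code in this PR): `sender`, `link`, and `recoverIfMatches` are hypothetical names, and the "rebuild" here just swaps in a new ID instead of creating a real AMQP link. An empty expected ID is assumed to mean "always recover", which is how the public `Recover()` keeps its signature in this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// link is a hypothetical stand-in for an AMQP sender link.
type link struct {
	id string
}

// sender is a hypothetical type that owns the current link.
type sender struct {
	mu       sync.Mutex
	current  *link
	rebuilds int // how many times the link was actually rebuilt
}

// recoverIfMatches rebuilds the link only if expectedID still names the
// current link. An empty expectedID always rebuilds (assumption for this
// sketch). It reports whether a rebuild happened.
func (s *sender) recoverIfMatches(expectedID string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()

	if expectedID != "" && s.current.id != expectedID {
		// Another goroutine already replaced the link: no-op and return early.
		return false
	}

	s.rebuilds++
	s.current = &link{id: fmt.Sprintf("link-%d", s.rebuilds)}
	return true
}

// Recover keeps the public signature and always recovers.
func (s *sender) Recover() {
	s.recoverIfMatches("")
}

func main() {
	s := &sender{current: &link{id: "link-0"}}
	failedID := s.current.id // both goroutines saw the failure on this link

	var wg sync.WaitGroup
	for i := 0; i < 2; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.recoverIfMatches(failedID)
		}()
	}
	wg.Wait()

	fmt.Println("rebuilds:", s.rebuilds) // prints "rebuilds: 1"
}
```

Whichever goroutine takes the lock first rebuilds the link; the second one sees a different ID and returns without interrupting in-progress sends on the new link.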
As part of #216 we added in some features that needed to be documented in the readme and changelogs.
There are no code changes in this; it's purely documentation.
As part of the work @jhendrixMSFT added for recovering sender links, we also wanted to allow the customer to cap the number of retries (currently it's infinite as long as the errors are considered retryable).
This PR allows that by exposing a new option on the event hub client that lets you specify the max retry count. It would be a small step to also allow configuring the underlying backoff policy details, but I've purposefully not done that so we can take some time to discuss a possible design for retry policies.
Fixes #216
* Enable race detection in integration tests
Move tests specific to race detection to their own source file under the
race build tag.
Removed error checking for concurrent test and updated comment.
* add missing target
* add missing test case reporting
* check for error on Recover()
The MemoryPersister would return an error and a "beginning of stream" Checkpoint if a checkpoint didn't yet exist for a partition. Any code that was expecting the error to take precedence would fail. This was causing the basic sample in the readme to fail even though nothing was actually wrong.
This PR removes the error and documents a bit more of the internals with some tests.
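
Roughly, the new behavior looks like the sketch below. This is a simplified, hypothetical persister, not the real `MemoryPersister`: the type and method names are made up for illustration, and the `"-1"` start-of-stream offset is an assumption of this sketch.

```go
package main

import (
	"fmt"
	"sync"
)

// startOfStream is the offset used for "beginning of stream" in this sketch
// (assumed value).
const startOfStream = "-1"

// checkpoint is a hypothetical, trimmed-down checkpoint.
type checkpoint struct {
	Offset string
}

// memoryPersister is a hypothetical in-memory checkpoint store.
type memoryPersister struct {
	mu     sync.Mutex
	values map[string]checkpoint
}

func newMemoryPersister() *memoryPersister {
	return &memoryPersister{values: make(map[string]checkpoint)}
}

// Write stores the checkpoint for a partition.
func (p *memoryPersister) Write(partitionID string, cp checkpoint) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.values[partitionID] = cp
}

// Read returns the stored checkpoint. A missing checkpoint is not an error:
// the caller gets a start-of-stream checkpoint and a nil error.
func (p *memoryPersister) Read(partitionID string) (checkpoint, error) {
	p.mu.Lock()
	defer p.mu.Unlock()

	cp, ok := p.values[partitionID]
	if !ok {
		return checkpoint{Offset: startOfStream}, nil
	}
	return cp, nil
}

func main() {
	p := newMemoryPersister()

	cp, err := p.Read("0")
	fmt.Println(cp.Offset, err) // prints "-1 <nil>"

	p.Write("0", checkpoint{Offset: "1024"})
	cp, err = p.Read("0")
	fmt.Println(cp.Offset, err) // prints "1024 <nil>"
}
```

Callers that check the error first no longer bail out on a partition that simply hasn't been checkpointed yet.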
I've added some tests to get better confidence around it (and to document what filters you do end up with). For the most part we can fall back to just using the same non-inclusive filter for everything offset-based, since offsets can't be negative.
Also, the tf file doesn't work with the latest Terraform, so I've adjusted it like the Service Bus one.
The error, as far as I can tell, is "accurate" but totally unneeded as the rest of the code can handle the checkpoint just fine.
This was blowing up the basic sample from the readme, even if you did everything correctly.