Commit Graph

94 Commits

Author SHA1 Message Date
sebastianburckhardt df48b307e9 address PR feedback 2022-11-02 08:22:46 -07:00
sebastianburckhardt 7de8a0d312 support authentication using token credentials. 2022-10-13 09:08:41 -07:00
Sebastian Burckhardt cf1e345881
Update to latest Azure Storage Tables client (#195)
* update to latest Azure Storage Tables client

* Apply suggestions from code review

Co-authored-by: Jacob Viau <javia@microsoft.com>

* use fail-fast when parsing records, add unit tests, and fix a number of bugs

* Add some .ConfigureAwait(false) to AzureBlobLoadPublisher

* address PR feedback.

Co-authored-by: Jacob Viau <javia@microsoft.com>
2022-10-12 15:29:40 -07:00
Sebastian Burckhardt 0333d908f0
support implicit state deletion (#192)
* support implicit state deletion for deleted entities

* update packages
2022-09-28 14:16:53 -07:00
Sebastian Burckhardt a69551ab27
Overhaul internal layers and single-host emulation (#194)
* temp commit

* fix stuff

* update retry logic

* update tests

* rename providers to layers

* add display names for pipeline stages

* minor updates to comments and remove unnecessary usings

* temporarily disable the size checking

* edit and reorder CI pipeline for troubleshooting

* do not run replay checker on the default host fixture

* more tinkering with order of tasks in CI pipeline

* update test projects to latest runtime and packages

* remove replay checker from in-memory tests since it does not actually run anyway

* temporarily remove ConcurrentTestsFaster.EachScenarioOnce since it has demonstrated hanging

* update pipeline durations

* reenable EachScenarioOnce with fixes

* remove excessive tests
2022-09-28 13:14:01 -07:00
Sebastian Burckhardt 38b75d791f
Update tooling to latest Functions version (#190)
* Update tooling for Hello samples

* update tooling in load generator and performance tests

* remove unneeded dependency on Microsoft.Azure.Functions.Extensions

* harden unit test logger
2022-08-16 11:32:21 -07:00
Sebastian Burckhardt 113fc4addc
Remove obsolete uses of ConnectionStringResolver (#171)
* replace obsolete resolver type.

* remove unnecessary usings

* undo changes due to mysterious CI failures, instead just disable warnings
2022-07-25 13:55:46 -07:00
Sebastian Burckhardt 0fa3f4b100
update packages (#175) 2022-06-21 12:21:49 -07:00
Sebastian Burckhardt 413a353e1c
fix race condition on OrchestrationRuntimeState with DurableTask.Core when committing work items (#161) 2022-05-26 06:30:37 -07:00
Sebastian Burckhardt 659873bf29
Fix cachedebugger test (#162)
* fix failing test caused by bug in cachedebugger

* adjust tolerance
2022-05-26 06:30:20 -07:00
Sebastian Burckhardt d53449a9ab
generalize placement for instance id. (#159) 2022-05-25 11:35:58 -07:00
Sebastian Burckhardt d18ba2c9d0
update package dependencies (#158) 2022-05-25 11:35:37 -07:00
Sebastian Burckhardt 344f9ff94f
Retain outbox to handle lost events (#130)
* draft commit

* revise so we can distinguish types of recovery and avoid sending too much stuff

* temp commit

* finish implementation of EH recovery

* update tracing for perf, clients, and load monitor

* fix eh checkpointing, and fix replay behavior

* update timeout handling for queries and requests

* fix timeouts for queries

* fix restart on lost events

* fix accidentally committed stuff

* make blob logger resilient against MemoryStream size overflow

* use storage account instead of storage emulator for CI
2022-04-05 08:03:13 -07:00
Sebastian Burckhardt 177cf161e0
Implement CacheSizeTracker and use FASTER v2 (#128)
* update to FASTER v2 and implement MemoryTracker

* update history size calculation in tests

* fix renamed variable

* work around incorrect status on read completion callback

* minor updates to tests

* adjust timeout for latest bits

* Recover fewer pages.

* handle one more transient timeout on storage calls.

* fix compaction

(cherry picked from commit acec2f0ac06c25e39bda3edcb4ab6a7a8b71d5bb)

* fix client issues causing hangs

(cherry picked from commit 6434c56033ee9fac995659764ab2aedd75fb0d94)

* update to latest prerelease

(cherry picked from commit 0ad283aaa9956022d1e5bf14b616961a59f071d6)

* add EH overlap to work around bug

* add tolerance to MemoryReduction test

* overhaul exception handling in query events

* remove deadlock risks in cancellation token handlers

* compaction must be categorized like scan

* load fewer pages on recovery

* trace event id string when receiving packet

* add condition to size check to account for concurrent modifications

* keep termination continuation direct for tracing and for debugging

* retry slow reads to work around FASTER hang issues

* update tracing

* instrument hang detection

* update timeout limits

* refactor cancellation

* make the faultinjector back off of startup after 2 failures

* overhaul shutdown sequence

* add hang detection to azure storage device

* add CompactThenFail test

* simplify checkpoint state machine as compaction now already moves BeginAddress

* fix bug in PartitionErrorHandler

* Arm assertions in release build also, and run EH CI tests on release

* extend hang detection limit to 90s and remove the slowest ScaleSmallScenarios test

* update compaction progress reporting

* recognize yet another type of transient storage error

* use release build and remove concurrent test from EH tests on CI

* implement thread count watcher

* fix bug in AzureStorageDevice.RemoveSegmentsAsync and add to tracking

* remove .ConfigureAwait(false)

* update PerformanceTests http bindings to support arbitrary prefixes with start/purge/query/count/await

* fix CI

* fix nullref in dispose
2022-03-09 16:40:18 -08:00
Sebastian Burckhardt 2391111762
Update to 2.6.1 (#125)
* update to version 2.6.1 of the extension

* update DurableTask.Core to 2.7.0
2022-02-15 19:50:37 -08:00
Sebastian Burckhardt 30abd4d516
Implement FASTER compaction and checkpoint testing mechanism (#121)
* implement FASTER log compaction state machine.

* fix test

* fix bug in ActivitiesState, must not overwrite ReportedLoad field

* improve tracing of outbox events

* improve event detail tracing

* improve tracing of storage operations

* improve checkpoint removal tracing

* fix state machine logic for removing obsolete checkpoints

* backport fix to EmitCurrentState

* fix batch.SendingEventId assignment: should not be gated by replay
2022-02-07 15:25:54 -08:00
Sebastian Burckhardt ba56fa4b23
use path prefix to solve atomicity of taskhub creation/deletion (#118) 2022-01-25 13:20:28 -08:00
Sebastian Burckhardt a9e6dbca23
Minor tracing changes and refactoring (#116)
* use explicit interface for IOrchestrationService, and add some eventhubs tracing

* remove configureawait and make tracing for received events more visible

* update offset provider

* two more tracing changes
2022-01-25 07:27:20 -08:00
Sebastian Burckhardt 94129ab595
fix missing await for task in ApplyToStore (#112)
* fix missing await for task in ApplyToStore

* increase timeout for test to fix CI test fail
2022-01-21 14:54:33 -08:00
Sebastian Burckhardt ab89349e3e
encapsulate orchestration service query API in interface, and fix query test that ignored continuation token (#114) 2022-01-21 09:55:42 -08:00
Sebastian Burckhardt 4e2b24fdc4
Fix hanging storage (#103)
* undo semantics change on WaitForOrchestration

* improve ping so it is useful for detecting hangs

* catch timeouts

* more tracing in EventHubsProcessor

* fix details in storage format error

* support error injection and replay checking in DF deployments

* reduce tracing detail of exceptions for warnings

* add replay condition to tracing of discarded work items

* fix replay checker to handle serialized tracked objects

* avoid propagation of hangs by adding termination wrapper on all awaits in FasterStorage

* remove instanceprotected assertions since they may not be local to a partition

* add check for termination condition to EH checkpoint saving, and add tracing to event processor host registration

* fix race condition in MemoryTransport
2022-01-11 04:53:10 -08:00
Sebastian Burckhardt 0499ba5c18
Add fault injection test, fix bugs (#101)
* implement fault injection tests

* fix tests

* update fault injection tests

* revise MemoryTransport failure handling

* fix serialization of RecoveryCompleted

* refactor concurrent tests into separate file

* revise tracing and timeouts in tests

* fix fault injection tests

* change timeout behavior of wait for orchestration to bring it in line with other backends

* fix bug that causes test to hang when recovering

* revise the recovery mechanism in MemoryTransport

* connect Partition.Assert to TestHooks so it can fail unit tests

* fix missing check for replaying when processing RecoveryCompleted event

* revise fault injector: add timeout to startup and do not inject lease renewals

* annotate tests with whether they support any transport

* adjust timeouts and update pipeline

* add labels to all the assertions and hook them up so they trip unit tests

* remove non-fixture query test from "AnyTransport" test category
2022-01-06 12:43:08 -08:00
Sebastian Burckhardt 885f7b9760
Implement replay checker (#99)
* fix aliasing bug

* implement replay checker

* undo unnecessary change

* add missing teeth: must fail unit tests on test hook errors

* fix bug, remove unnecessary field

* fix replay discrepancy on fields of outgoing messages

* fix race condition when updating creation timestamp during prefetch

* fix last commit

* fix false errors caused by json property order
2022-01-04 09:25:32 -08:00
Sebastian Burckhardt f542c93a01
fix warnings (#100) 2022-01-03 12:57:00 -08:00
Sebastian Burckhardt 9767b7e610 fix across.ps1 series 2021-12-21 14:18:03 -08:00
Sebastian Burckhardt 822e90381d
separate storage of singleton states from storage of instance and history states. (#30) 2021-12-21 08:04:44 -08:00
Sebastian Burckhardt 4dd79140ee
Update to latest versions of DF and DT (#89)
* update to latest published versions

* revert update of LoggingAbstractions
2021-11-17 09:49:29 -08:00
Sebastian Burckhardt 012b50cd35
Update performance tests (#88)
* fix small errors and add some tracing in performance tests

* edit perftests

* update concurrency default

* change throughput scale for word hash

* update series, and use public access blob for gutenberg

* update bank and fanoutfanin series

* limit number of words counted in each book to allow better load balancing

* fix error and select EP2 for alt

* update wordcount definitions for EP2 and EP3

* update wordcount definition for EP1

* add load generator app
2021-11-15 12:39:53 -08:00
Sebastian Burckhardt 45e22a23f5
Fix NullReferenceException during recovery, add unit test for recovery (#87)
* temp commit

* add simple recovery test

* remove unnecessary setting

* fix FASTER NullReferenceException during recovery caused by missing delta log
2021-11-12 13:37:35 -08:00
Sebastian Burckhardt bcb5f67c51
Improve performance of query results (#70)
* launch query and prefetch events earlier to improve latency

* add nicer query endpoint to perf tests

(cherry picked from commit 9851a1de91)

* add tracing for when a sender is starting to send an event.

* support use of multiple client channels; fix range of partitions; support pipelined return of query results

* use custom serialization to compress query results

* fix bug in test
2021-09-13 12:28:22 -07:00
Xiangfeng Zhu 6b3b3720af
Better Scheduling/Load Balancing for activities (#52)
* Add random scheduler

* add loadmonitor component

* add scheduling options & filehash tweaks

* keep partition prefix consistent with all the other tracing

* fix bug (failed to update estimated load when remote activities complete)

* Add aggressive scheduler mode

* fixed a typo

* basic load monitor

* checkpoint

* fix bug in ETW tracing

* change conditions for load monitor reporting

* use full tracing

* trace more information for OffloadCommandReceived

* change tracing of task messages to provide more consistent results

* several package updates, including DurableTask.Core 2.5.6

* update to account for tracing changes

* remove load monitor interval and add idle message

* add latency field to ActivityCompleted and RemoteActivityResultReceived

* added offload algorithms based on waiting time

* fix overflow in filehash

* fixes to loadmonitor logic, autoscaler, and temporary info sender

* fix missing file and update scale tracing

* change loadmonitor hosting so it moves less often

* implement overlay of pending commands on offload estimation

* improve precision of load estimation and distinguish between stationary and mobile.

* remove unnecessary left over line of code.

* replace push based algo with pull based

* fix so we don't pull more than mobile

* fix dumb mistake

* fix formula

* need more conservative default estimate for completion time, otherwise offload is too aggressive

* remove unnecessary constants.

* add tracing for RTT

* tweak parameters for RTT and smoothing

* batchworker tracing for senders

* use array instead of dictionary, fix concurrent modification exception

* revise sender for load monitor events to send only latest state, and to do rate limiting

* fix tracing

* turn off all loadmonitor activity when the parameter ActivityScheduler is not set LoadMonitor

* implemented "Static" setting for ActivityScheduler

* reduce severity of tracing in loadmonitor

* make filehash scale configurable

* update names, remove old algorithms

* improve terminology and implement solicitation

* fixes and simplifications

* minor updates

* simplify data structure for local activities

* use smarter concurrency defaults, same as for default backend

* fix tracing in LoadMonitor

* update filehash a bit to make it more uniform

* add test series for comparing local vs locavore

Co-authored-by: Sebastian Burckhardt <sburckha@microsoft.com>
2021-09-01 12:39:02 -07:00
Sebastian Burckhardt b8a4900960
revise time-based checkpointing so it happens only in idle moments; and do not take state checkpoints when shutting partitions down quickly. (#68) 2021-08-13 11:37:28 -07:00
Sebastian Burckhardt a2588349f1
several package updates, including DurableTask.Core 2.5.6 (#63) 2021-07-22 14:58:11 -07:00
Sebastian Burckhardt 5ea53e959c
Fix dll load problem and throw exception on 32 bit (#62)
* Go back to System.Threading.Channels version 4.7.1

* throw exception on 32bit process

* update to latest Microsoft.NET.Sdk.Functions

* use trace verbosity for unit tests
2021-07-21 15:15:54 -07:00
Sebastian Burckhardt e9a98fab8a fix warning 2021-06-22 17:04:01 -07:00
Sebastian Burckhardt f1cdd83122
Mitigations for dll load problem (#57)
* remove obsolete code for query support.

* add fast-fail for dll loading problem (#55), and go back to last functioning version.
2021-06-22 10:46:19 -07:00
Sebastian Burckhardt d10708ae0e
Update Packages (#54)
* Update package references.

* fix blob triggers to use new versions, and fix a typo
2021-06-18 07:07:21 -07:00
Sebastian Burckhardt 7af87877c8
fix replay bug in activity scheduler, and add configuration setting (#51) 2021-06-14 14:26:03 -07:00
Sebastian Burckhardt 1c0cc940dc
consolidate and reorganize performance tests, add scripts for automatic deployment, running, and results collection (#50) 2021-06-14 12:32:12 -07:00
Sebastian Burckhardt a946bfdf39 add very basic unit test for purge and list. 2021-06-03 15:24:56 -07:00
Sebastian Burckhardt 1672852ebc Use unique taskhubname for each run of DurableTask.Netherite.AzureFunctions.Test 2021-06-03 15:13:03 -07:00
Sebastian Burckhardt 244286ee91
introduce 'CacheOrchestrationCursor' setting, instead of using ExtendedSessionsEnabled (#43) 2021-05-12 07:10:05 -07:00
Sebastian Burckhardt b5ca7f9dfb
fix bug in scalemonitor, and add support for using blobs instead of tables (#42) 2021-05-11 15:57:24 -07:00
Sebastian Burckhardt 6f8ba021ed
Fix test configuration (#40)
* fix implementation of local file storage for FASTER

* fix test configuration in DurableTask.Netherite.AzureFunctions

* generalize the CI pipeline so it runs all tests again.
2021-05-07 09:39:51 -07:00
Sebastian Burckhardt fd281e7ce0 Add test for periodic orchestration; display exact value of next timer in load partition table 2021-04-23 15:31:47 -07:00
Sebastian Burckhardt fa47f71c35 Revert "update package versions (#34)"
This reverts commit 8ff9d1fb65.
2021-03-31 14:31:02 -07:00
Sebastian Burckhardt 8ff9d1fb65
update package versions (#34) 2021-03-31 12:03:20 -07:00
Sebastian Burckhardt 94edddc01d Merge branch 'main' of https://github.com/microsoft/durabletask-netherite into main 2021-03-31 09:55:27 -07:00
Sebastian Burckhardt 6b69465e1c
implement autoscaling and update to 2.4.3 (#24) 2021-03-31 09:53:23 -07:00
Sebastian Burckhardt 349d644eb8 fix warning 2021-03-31 07:30:13 -07:00