Commit Graph

69 Commits

Author SHA1 Message Date
Yuqi Wang 1232a3dfd6
Add slides for FrameworkController DeepDive (#80) 2022-12-08 14:16:20 +08:00
microsoft-github-policy-service[bot] 6a3d9c17db
Add Microsoft mandatory file (#78)
Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com>
2022-08-03 10:42:23 +08:00
Yuqi Wang d602cdcbab
Keep podGracefulDeletionTimeoutSec as Nullable (#74) 2022-02-23 18:38:16 +08:00
Yuqi Wang e6589162fb
Upgrade CRD to apiextensions.k8s.io/v1 to support k8s >= 1.22 (#72) 2022-01-17 11:08:28 +08:00
Yuqi Wang 2746a34ed6
Switch to go mod (#69) 2022-01-14 17:55:08 +08:00
Yuqi Wang 29d95df1ab
Add Third Party Controller Wrapper: AzureML Kubernetes Compute (#63) 2021-03-23 20:18:54 +08:00
Yuqi Wang 4b5707f53e
Expose Task History (#62) 2020-10-19 21:39:28 +08:00
Yuqi Wang 959722c429
Treat invalid Pod caused by network error as PodCreationUnknownError (#61) 2020-08-31 20:57:51 +08:00
Yuqi Wang a220bd321f
Expose and increase default sync concurrency (#60) 2020-08-28 19:58:15 +08:00
Yuqi Wang 29e115373a
Pause zero scale Framework instead of completing it (#59) 2020-08-11 19:46:34 +08:00
Yuqi Wang d67fc76595
Support Create ExecutionType: Just create without start (#58) 2020-08-10 12:20:02 +08:00
Yuqi Wang 7669288d1e
Refine Doc (#57) 2020-08-05 15:25:07 +08:00
Yuqi Wang c61269671d
Support Framework ScaleUp/ScaleDown with Strong Safety Guarantee (#56) 2020-08-03 14:00:49 +08:00
Yuqi Wang 896761acc2
Update HiveD TF example (#55) 2020-03-30 16:36:02 +08:00
Yuqi Wang dbcb0d117c
Update doc links for HiveD (#54) 2020-03-23 20:17:46 +08:00
Di Xu 40fb74d1d5
add FC_TASK_INDEX to label so can select pod uniquely (#53) 2020-02-14 18:33:00 -08:00
Yuqi Wang c4be168117
Enrich PodSpecError to early fail Pod (#52) 2020-01-17 16:32:53 +08:00
Yuqi Wang 7789e3e73f
Aware UID change during Update event and Sync (#51) 2020-01-13 13:57:28 +08:00
Scarlett Li 9d1822f4ae
Update README.md 2019-12-25 10:18:17 +08:00
Yuqi Wang 285ade0ea8
Remove deprecated Initializers in planning (#50) 2019-12-10 11:12:32 +08:00
Yuqi Wang 429fa5498e
Fix invalid json in log caused by fmt (MISSING) (#49) 2019-11-12 14:17:11 +08:00
Yuqi Wang b819592951
Update Doc for Framework Availability (#48) 2019-11-06 11:28:09 +08:00
Yuqi Wang e0ffdc3266
Add project badges (#47) 2019-10-30 21:14:32 +08:00
Yuqi Wang 452675ab59
Setup CI build (#46) 2019-10-30 20:40:28 +08:00
Yuqi Wang 707b7a9c97
Add PodNodeName to help track failures on node before PodIP is available (#45) 2019-10-29 17:13:04 +08:00
Yuqi Wang 8e4145176c
Support large scale Framework by LargeFrameworkCompression (#44) 2019-10-23 15:43:59 +08:00
Yuqi Wang 77ec4abbdc
Support PodGracefulDeletionTimeoutSec to tune Framework Consistency vs Availability (#43) 2019-09-19 17:54:25 +08:00
Yuqi Wang 42373169ca
Refine PodFailureSpec (#42) 2019-09-16 13:44:41 +08:00
Yuqi Wang df63d60c53
Support PodFailureSpec to classify and summarize Pod failures (#41) 2019-09-02 18:54:50 +08:00
Yuqi Wang 54a4554b69
Refine RetryDelaySec (#39) 2019-08-16 17:46:05 +08:00
Yuqi Wang e13b822eff
Remove unnecessary recoverFrameworkWorkItems (#38) 2019-08-12 13:42:49 +08:00
Yuqi Wang 4a771cd6c4
Support FrameworkCompletedRetainSec (#37) 2019-08-09 19:04:40 +08:00
Yuqi Wang 9298ab677c
Redefine FrameworkAttemptRunning and Record attempt running start time (#35)
This helps to measure pure running duration
2019-08-08 11:21:06 +08:00
Yuqi Wang 80492e5c53
Add TensorFlow Example to leverage HivedScheduler (#34) 2019-08-02 12:19:32 +08:00
Yuqi Wang d432b57875
Fill object TypeMeta/GroupVersionKind in case it is missed in history snapshot (#33) 2019-08-01 14:44:26 +08:00
Yuqi Wang 220a3df922
Update Doc (#32) 2019-07-31 17:29:26 +08:00
Yuqi Wang 1aa6e612e1
Support LogObjectSnapshot: Expose Framework and Pod History (#31) 2019-07-31 15:04:08 +08:00
Yuqi Wang 48f601bb39
Switch to klog (#30) 2019-07-26 20:09:05 +08:00
Yuqi Wang 2caad5b969
Upgrade to golang 1.12.6 (#29) 2019-07-18 15:58:23 +08:00
Yuqi Wang 243996c2c0
Refine Framework golang iteration (#28) 2019-07-17 15:37:44 +08:00
Yuqi Wang 157c3bfe0f
Still sync Task after FrameworkAttemptCompleted (#27) 2019-07-17 15:35:32 +08:00
Yuqi Wang 20f38add58
Revise TaskStatus after FrameworkAttemptCompleted (#26) 2019-07-17 15:32:31 +08:00
Yuqi Wang 9cf3c8881e
Make AttemptCompleted state is not necessary to have an associated instance (#25) 2019-07-17 15:27:38 +08:00
Yuqi Wang dbb98da159
Support Stop Framework (#24) 2019-07-17 15:24:19 +08:00
Yuqi Wang 440a3bf7fa
Update doc links (#23) 2019-07-17 15:21:23 +08:00
Yuqi Wang b4b3695cea
Revise Internal and External CompletionTypeAttribute to User and Platform (#22) 2019-07-17 15:18:57 +08:00
Yuqi Wang 0a7b9851a0
Support Pod Template Placeholders (#21) 2019-07-17 15:16:08 +08:00
Yuqi Wang a4b9eeb690
Consolidate slice append (#20) 2019-07-17 15:12:56 +08:00
Yuqi Wang 1fb9e251e7
Refine updateRemoteFrameworkStatus (#19) 2019-07-17 15:09:52 +08:00
Yuqi Wang 63422e0227
Fix fExpectedStatusInfos map race condition (#18) 2019-07-17 15:03:33 +08:00