Yuqi Wang | 1232a3dfd6 | Add slides for FrameworkController DeepDive (#80) | 2022-12-08 14:16:20 +08:00
microsoft-github-policy-service[bot] | 6a3d9c17db | Add Microsoft mandatory file (#78); Co-authored-by: microsoft-github-policy-service[bot] <77245923+microsoft-github-policy-service[bot]@users.noreply.github.com> | 2022-08-03 10:42:23 +08:00
Yuqi Wang | d602cdcbab | Keep podGracefulDeletionTimeoutSec as Nullable (#74) | 2022-02-23 18:38:16 +08:00
Yuqi Wang | e6589162fb | Upgrade CRD to apiextensions.k8s.io/v1 to support k8s >= 1.22 (#72) | 2022-01-17 11:08:28 +08:00
Yuqi Wang | 2746a34ed6 | Switch to go mod (#69) | 2022-01-14 17:55:08 +08:00
Yuqi Wang | 29d95df1ab | Add Third Party Controller Wrapper: AzureML Kubernetes Compute (#63) | 2021-03-23 20:18:54 +08:00
Yuqi Wang | 4b5707f53e | Expose Task History (#62) | 2020-10-19 21:39:28 +08:00
Yuqi Wang | 959722c429 | Treat invalid Pod caused by network error as PodCreationUnknownError (#61) | 2020-08-31 20:57:51 +08:00
Yuqi Wang | a220bd321f | Expose and increase default sync concurrency (#60) | 2020-08-28 19:58:15 +08:00
Yuqi Wang | 29e115373a | Pause zero scale Framework instead of completing it (#59) | 2020-08-11 19:46:34 +08:00
Yuqi Wang | d67fc76595 | Support Create ExecutionType: Just create without start (#58) | 2020-08-10 12:20:02 +08:00
Yuqi Wang | 7669288d1e | Refine Doc (#57) | 2020-08-05 15:25:07 +08:00
Yuqi Wang | c61269671d | Support Framework ScaleUp/ScaleDown with Strong Safety Guarantee (#56) | 2020-08-03 14:00:49 +08:00
Yuqi Wang | 896761acc2 | Update HiveD TF example (#55) | 2020-03-30 16:36:02 +08:00
Yuqi Wang | dbcb0d117c | Update doc links for HiveD (#54) | 2020-03-23 20:17:46 +08:00
Di Xu | 40fb74d1d5 | add FC_TASK_INDEX to label so can select pod uniquely (#53) | 2020-02-14 18:33:00 -08:00
Yuqi Wang | c4be168117 | Enrich PodSpecError to early fail Pod (#52) | 2020-01-17 16:32:53 +08:00
Yuqi Wang | 7789e3e73f | Aware UID change during Update event and Sync (#51) | 2020-01-13 13:57:28 +08:00
Scarlett Li | 9d1822f4ae | Update README.md | 2019-12-25 10:18:17 +08:00
Yuqi Wang | 285ade0ea8 | Remove deprecated Initializers in planning (#50) | 2019-12-10 11:12:32 +08:00
Yuqi Wang | 429fa5498e | Fix invalid json in log caused by fmt (MISSING) (#49) | 2019-11-12 14:17:11 +08:00
Yuqi Wang | b819592951 | Update Doc for Framework Availability (#48) | 2019-11-06 11:28:09 +08:00
Yuqi Wang | e0ffdc3266 | Add project badges (#47) | 2019-10-30 21:14:32 +08:00
Yuqi Wang | 452675ab59 | Setup CI build (#46) | 2019-10-30 20:40:28 +08:00
Yuqi Wang | 707b7a9c97 | Add PodNodeName to help track failures on node before PodIP is available (#45) | 2019-10-29 17:13:04 +08:00
Yuqi Wang | 8e4145176c | Support large scale Framework by LargeFrameworkCompression (#44) | 2019-10-23 15:43:59 +08:00
Yuqi Wang | 77ec4abbdc | Support PodGracefulDeletionTimeoutSec to tune Framework Consistency vs Availability (#43) | 2019-09-19 17:54:25 +08:00
Yuqi Wang | 42373169ca | Refine PodFailureSpec (#42) | 2019-09-16 13:44:41 +08:00
Yuqi Wang | df63d60c53 | Support PodFailureSpec to classify and summarize Pod failures (#41) | 2019-09-02 18:54:50 +08:00
Yuqi Wang | 54a4554b69 | Refine RetryDelaySec (#39) | 2019-08-16 17:46:05 +08:00
Yuqi Wang | e13b822eff | Remove unnecessary recoverFrameworkWorkItems (#38) | 2019-08-12 13:42:49 +08:00
Yuqi Wang | 4a771cd6c4 | Support FrameworkCompletedRetainSec (#37) | 2019-08-09 19:04:40 +08:00
Yuqi Wang | 9298ab677c | Redefine FrameworkAttemptRunning and Record attempt running start time (#35); This helps to measure pure running duration | 2019-08-08 11:21:06 +08:00
Yuqi Wang | 80492e5c53 | Add TensorFlow Example to leverage HivedScheduler (#34) | 2019-08-02 12:19:32 +08:00
Yuqi Wang | d432b57875 | Fill object TypeMeta/GroupVersionKind in case it is missed in history snapshot (#33) | 2019-08-01 14:44:26 +08:00
Yuqi Wang | 220a3df922 | Update Doc (#32) | 2019-07-31 17:29:26 +08:00
Yuqi Wang | 1aa6e612e1 | Support LogObjectSnapshot: Expose Framework and Pod History (#31) | 2019-07-31 15:04:08 +08:00
Yuqi Wang | 48f601bb39 | Switch to klog (#30) | 2019-07-26 20:09:05 +08:00
Yuqi Wang | 2caad5b969 | Upgrade to golang 1.12.6 (#29) | 2019-07-18 15:58:23 +08:00
Yuqi Wang | 243996c2c0 | Refine Framework golang iteration (#28) | 2019-07-17 15:37:44 +08:00
Yuqi Wang | 157c3bfe0f | Still sync Task after FrameworkAttemptCompleted (#27) | 2019-07-17 15:35:32 +08:00
Yuqi Wang | 20f38add58 | Revise TaskStatus after FrameworkAttemptCompleted (#26) | 2019-07-17 15:32:31 +08:00
Yuqi Wang | 9cf3c8881e | Make AttemptCompleted state is not necessary to have an associated instance (#25) | 2019-07-17 15:27:38 +08:00
Yuqi Wang | dbb98da159 | Support Stop Framework (#24) | 2019-07-17 15:24:19 +08:00
Yuqi Wang | 440a3bf7fa | Update doc links (#23) | 2019-07-17 15:21:23 +08:00
Yuqi Wang | b4b3695cea | Revise Internal and External CompletionTypeAttribute to User and Platform (#22) | 2019-07-17 15:18:57 +08:00
Yuqi Wang | 0a7b9851a0 | Support Pod Template Placeholders (#21) | 2019-07-17 15:16:08 +08:00
Yuqi Wang | a4b9eeb690 | Consolidate slice append (#20) | 2019-07-17 15:12:56 +08:00
Yuqi Wang | 1fb9e251e7 | Refine updateRemoteFrameworkStatus (#19) | 2019-07-17 15:09:52 +08:00
Yuqi Wang | 63422e0227 | Fix fExpectedStatusInfos map race condition (#18) | 2019-07-17 15:03:33 +08:00