Yuqi Wang
d602cdcbab
Keep podGracefulDeletionTimeoutSec as Nullable ( #74 )
2022-02-23 18:38:16 +08:00
Yuqi Wang
e6589162fb
Upgrade CRD to apiextensions.k8s.io/v1 to support k8s >= 1.22 ( #72 )
2022-01-17 11:08:28 +08:00
Yuqi Wang
4b5707f53e
Expose Task History ( #62 )
2020-10-19 21:39:28 +08:00
Yuqi Wang
959722c429
Treat invalid Pod caused by network error as PodCreationUnknownError ( #61 )
2020-08-31 20:57:51 +08:00
Yuqi Wang
a220bd321f
Expose and increase default sync concurrency ( #60 )
2020-08-28 19:58:15 +08:00
Yuqi Wang
29e115373a
Pause zero scale Framework instead of completing it ( #59 )
2020-08-11 19:46:34 +08:00
Yuqi Wang
d67fc76595
Support Create ExecutionType: Just create without start ( #58 )
2020-08-10 12:20:02 +08:00
Yuqi Wang
c61269671d
Support Framework ScaleUp/ScaleDown with Strong Safety Guarantee ( #56 )
2020-08-03 14:00:49 +08:00
Di Xu
40fb74d1d5
add FC_TASK_INDEX to label so can select pod uniquely ( #53 )
2020-02-14 18:33:00 -08:00
Yuqi Wang
c4be168117
Enrich PodSpecError to early fail Pod ( #52 )
2020-01-17 16:32:53 +08:00
Yuqi Wang
7789e3e73f
Aware UID change during Update event and Sync ( #51 )
2020-01-13 13:57:28 +08:00
Yuqi Wang
285ade0ea8
Remove deprecated Initializers in planning ( #50 )
2019-12-10 11:12:32 +08:00
Yuqi Wang
429fa5498e
Fix invalid json in log caused by fmt (MISSING) ( #49 )
2019-11-12 14:17:11 +08:00
Yuqi Wang
707b7a9c97
Add PodNodeName to help track failures on node before PodIP is available ( #45 )
2019-10-29 17:13:04 +08:00
Yuqi Wang
8e4145176c
Support large scale Framework by LargeFrameworkCompression ( #44 )
2019-10-23 15:43:59 +08:00
Yuqi Wang
77ec4abbdc
Support PodGracefulDeletionTimeoutSec to tune Framework Consistency vs Availability ( #43 )
2019-09-19 17:54:25 +08:00
Yuqi Wang
42373169ca
Refine PodFailureSpec ( #42 )
2019-09-16 13:44:41 +08:00
Yuqi Wang
df63d60c53
Support PodFailureSpec to classify and summarize Pod failures ( #41 )
2019-09-02 18:54:50 +08:00
Yuqi Wang
54a4554b69
Refine RetryDelaySec ( #39 )
2019-08-16 17:46:05 +08:00
Yuqi Wang
e13b822eff
Remove unnecessary recoverFrameworkWorkItems ( #38 )
2019-08-12 13:42:49 +08:00
Yuqi Wang
4a771cd6c4
Support FrameworkCompletedRetainSec ( #37 )
2019-08-09 19:04:40 +08:00
Yuqi Wang
9298ab677c
Redefine FrameworkAttemptRunning and Record attempt running start time ( #35 )
This helps to measure pure running duration
2019-08-08 11:21:06 +08:00
Yuqi Wang
d432b57875
Fill object TypeMeta/GroupVersionKind in case it is missed in history snapshot ( #33 )
2019-08-01 14:44:26 +08:00
Yuqi Wang
1aa6e612e1
Support LogObjectSnapshot: Expose Framework and Pod History ( #31 )
2019-07-31 15:04:08 +08:00
Yuqi Wang
48f601bb39
Switch to klog ( #30 )
2019-07-26 20:09:05 +08:00
Yuqi Wang
2caad5b969
Upgrade to golang 1.12.6 ( #29 )
2019-07-18 15:58:23 +08:00
Yuqi Wang
243996c2c0
Refine Framework golang iteration ( #28 )
2019-07-17 15:37:44 +08:00
Yuqi Wang
157c3bfe0f
Still sync Task after FrameworkAttemptCompleted ( #27 )
2019-07-17 15:35:32 +08:00
Yuqi Wang
20f38add58
Revise TaskStatus after FrameworkAttemptCompleted ( #26 )
2019-07-17 15:32:31 +08:00
Yuqi Wang
9cf3c8881e
Make AttemptCompleted state is not necessary to have an associated instance ( #25 )
2019-07-17 15:27:38 +08:00
Yuqi Wang
dbb98da159
Support Stop Framework ( #24 )
2019-07-17 15:24:19 +08:00
Yuqi Wang
b4b3695cea
Revise Internal and External CompletionTypeAttribute to User and Platform ( #22 )
2019-07-17 15:18:57 +08:00
Yuqi Wang
0a7b9851a0
Support Pod Template Placeholders ( #21 )
2019-07-17 15:16:08 +08:00
Yuqi Wang
a4b9eeb690
Consolidate slice append ( #20 )
2019-07-17 15:12:56 +08:00
Yuqi Wang
1fb9e251e7
Refine updateRemoteFrameworkStatus ( #19 )
2019-07-17 15:09:52 +08:00
Yuqi Wang
63422e0227
Fix fExpectedStatusInfos map race condition ( #18 )
2019-07-17 15:03:33 +08:00
Yuqi Wang
9ab3eadace
Fix LogLines ( #17 )
2019-07-17 14:59:07 +08:00
Yuqi Wang
3654ac11ce
Upgrade to kubernetes-1.14.2 ( #16 )
2019-07-17 14:47:01 +08:00
Yuqi Wang
8adcef25f6
Add FrameworkAttemptPreparing State ( #12 )
2019-02-19 19:28:02 +08:00
Yuqi Wang
0c93ab3733
Refine CompletionPolicy comment and log ( #11 )
2019-02-15 13:58:46 +08:00
Yuqi Wang
7e3eaa0c21
Fix TaskComplete may transition to TaskAttemptCompleted ( #10 )
2019-02-14 19:06:27 +08:00
Yuqi Wang
3420ae0e67
[BREAKING CHANGE]: Refine AnnotationKey, LabelKey and EnvName ( #6 )
1. Change "POD_NAMESPACE" to "FRAMEWORK_NAMESPACE"
2. Prefix "FC_" for all FrameworkController Predefined AnnotationKeys, LabelKeys and EnvNames
3. Prefix "FB_" and uppercase TaskRoleName for all FrameworkBarrier EnvNames
2019-01-17 17:41:46 +08:00
Yuqi Wang
07c2a6c058
Refine Doc and Example ( #3 )
2018-12-17 21:32:22 +08:00
Yuqi Wang
94a1680339
Support FrameworkBarrier for GangExecution and Add Distributed TensorFlow Training Example ( #2 )
1. Support FrameworkBarrier for GangExecution
2. Add Distributed TensorFlow Training Example
2018-11-23 14:53:04 +08:00
Yuqi Wang
75dea76860
Initial FrameworkController: General-Purpose Kubernetes Pod Controller
2018-10-22 08:34:54 +00:00