Commit Graph

3995 Commits

Author SHA1 Message Date
Yuge Zhang 1bc5c348f4
Fix typo in history API error message (#4401) 2020-04-27 14:39:44 +08:00
YundongYe 972109f418
[Doc] Remove yarn content from deployment doc (#4447) 2020-04-26 10:22:48 +08:00
Zhiyuan He 72268d9328
fix job retry url (#4442)
* fix

* fix
2020-04-26 10:06:21 +08:00
YundongYe 14016f8be1
[cluster-object-model] Disable yarn value in cluster-type (#4445) 2020-04-24 17:07:59 +08:00
YundongYe 3ab8ae0479
Fix swagger issue. (#4430) 2020-04-24 15:24:52 +08:00
YundongYe 54453e4f45
[build] Disable yarn option in pai_build.py. (#4443)
* Disable Yarn build option and remove hadoop binary building source code

* Update doc
2020-04-24 13:08:01 +08:00
YundongYe db46dc0e2e
[Jenkins]Remove Yarn Verion's Test (#4440) 2020-04-24 10:37:13 +08:00
Yifan Xiong aaf3ac80d4
Fix Azure File issues in storage (#4438)
Fix Azure File issues in storage.
2020-04-24 10:34:14 +08:00
Yifan Xiong 4e4124c1f0
[Docs] Update nullable fields in job api (#4437)
Update nullable fields in job api.
2020-04-23 16:23:51 +08:00
Yifan Xiong 9951624c73
[Rest Server] Add rate limit for RESTful API (#4418)
Add rate limit for RESTful API, defaults to:
* 600 requests per ip every 1 min
* 60 list job requests per user every 1 min
* 60 submit job requests per user every 1 hour
2020-04-22 13:07:13 +08:00
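Note on the rate-limit defaults in commit 9951624c73: the commit message gives the default limits but not how the rest server enforces them. As a rough illustration only, the three defaults can be modelled with a fixed-window counter. This is a minimal sketch under stated assumptions, not the code from the PR: the rule names, the `FixedWindowLimiter` class, and the choice of a fixed-window strategy are all hypothetical. Only the numbers come from the commit message.

```python
import time

# Defaults quoted from the commit message: (max requests, window in seconds).
# The rule names are hypothetical labels chosen for this sketch.
DEFAULT_LIMITS = {
    "api_per_ip": (600, 60),             # 600 requests per ip every 1 min
    "list_job_per_user": (60, 60),       # 60 list job requests per user every 1 min
    "submit_job_per_user": (60, 3600),   # 60 submit job requests per user every 1 hour
}


class FixedWindowLimiter:
    """Hypothetical fixed-window counter; not the rest-server implementation."""

    def __init__(self, limits=None, clock=time.monotonic):
        self.limits = dict(DEFAULT_LIMITS if limits is None else limits)
        self.clock = clock
        # Maps (rule, key) to [window_start, request_count].
        self.windows = {}

    def allow(self, rule, key):
        """Return True if the request fits in the current window for this key."""
        max_requests, window_seconds = self.limits[rule]
        now = self.clock()
        state = self.windows.get((rule, key))
        # Open a fresh window on first use or once the old window has expired.
        if state is None or now - state[0] >= window_seconds:
            state = [now, 0]
            self.windows[(rule, key)] = state
        if state[1] >= max_requests:
            return False
        state[1] += 1
        return True
```

With these defaults, the 61st `allow("list_job_per_user", "alice")` call inside one minute is rejected, while a different user or a later window is accepted. A fixed window is the simplest strategy to show; the actual middleware may use a different one.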
Binyang2014 ae23350d7b
[watchdog] change default mem limit (#4413) 2020-04-22 10:02:19 +08:00
Yuqing Yang 8b9632926d
validate rest server swagger (#4404)
* add swagger validation to travis

* fix validation error in swagger (#4405)
2020-04-20 10:47:59 +08:00
Yifan Xiong 7b1247d087
[Rest Server] Fix log path length (#4400)
Fix log path length, set job name length limit to 255.

Closes #4391.
2020-04-17 16:40:56 +08:00
Yuqing Yang 13bea1e062
refine rest swagger and codes (#4355)
* fix empty error

* hosted on https://swagger

* remove /api/v2/jobs/{}/ssh

* update GET token and POST basic/login

* [Docs] Update job and virtual cluster api docs (#4363)

* Update job and virtual cluster api docs

Update job and virtual cluster api docs.

* Update security option

Update security option.

* Update operation id

Update operation id.

* bind authn api to v2 and update doc  (#4368)

* Add document of auth api.

* Fix description

* update swagger

* Fix api's method

* Fix api/v2/auth to api/v2/authn

* [Restserver] Refine group api (#4371)

* Update Group API

* Fix issue

* Change update api from put to patch

* Add patchExtension into put api.

* issue fix.

* typo fix

* Fix travis

* fix typo

* Update token APIs and k8s proxy APIs (#4375)

* update example for ForbiddenUserError

* add get /api/v2/info

* add v2 router

* fix

* update

Co-authored-by: yiyione <yiyi_one@foxmail.com>
Co-authored-by: Yi Yi <yiyi@microsoft.com>

* update group api doc (#4374)

* update group api doc

* Set patch's default value to false

* Pack group entry into

* Pack group entry into

* Add job attempt schema in api doc (#4376)

* Refine user api (#4379)

* Update user api doc. (#4383)

* Check parameter in joi (#4388)

* update token to tokens (#4389)

Co-authored-by: yiyione <yiyi_one@foxmail.com>

* fix typo

* update (#4398)

Co-authored-by: yiyione <yiyi_one@foxmail.com>

* basic  logout api (#4399)

* Add length limit for job name

Add length limit for job name.

Co-authored-by: Yifan Xiong <yifan.xiong@microsoft.com>
Co-authored-by: YundongYe <8675883+ydye@users.noreply.github.com>
Co-authored-by: yiyione <307402594@qq.com>
Co-authored-by: yiyione <yiyi_one@foxmail.com>
Co-authored-by: Yi Yi <yiyi@microsoft.com>
Co-authored-by: Mingliang Tao <mintao@microsoft.com>
2020-04-17 15:42:12 +08:00
Mingliang Tao e7db965ae2
Fix abnormal job list bug in home page (#4397) 2020-04-17 14:03:59 +08:00
Yifan Xiong 8063bf6509
[Rest Server] Update job and virtual cluster api (#4381)
* Add bearer auth for job and vc api
* Migrate sku types api in vc
* Add auth token in webportal
2020-04-17 13:08:06 +08:00
Yifan Xiong 5165b5cbf9
[Rest Server] Add minimal priority when priority class disabled (#4396)
* Add minimal priority when priority class disabled

Add minimal priority when priority class disabled.

* Skip yarn version

Skip yarn version.
2020-04-17 10:41:37 +08:00
Binyang2014 6d31509563
[deploy] update doc about removing weave-net(#4390) 2020-04-17 09:06:35 +08:00
Mingliang Tao f403cdf8f3
Refine gpu chart (#4386) 2020-04-16 17:32:04 +08:00
Mingliang Tao 30d6eb73af
Change title from Utilization to Allocation (#4382) 2020-04-14 12:32:09 +08:00
Mingliang Tao 4a7ea4697a
Move chevron arrow to left (#4380) 2020-04-14 12:31:53 +08:00
Binyang2014 4e0121f8e3
Update runtime and errorSpec (#4378)
Bump runtime version to v0.1.1
Update errorSpec to handle ecc error
2020-04-10 15:37:58 +08:00
Yifan Xiong 0b8c35896d
Change gpuType(s) to skuType(s) (#4362)
Change gpuType(s) to skuType(s).
2020-04-09 17:12:20 +08:00
Zhiyuan He 763a3b498f
Disable `default` vc option in edit user dialog (#4360) 2020-04-08 13:08:14 +08:00
YundongYe 06f8c51969
[pylon] Update openssl-1.1.0f's url and fix GitPython version in dev-box (#4367)
* Update openssl-1.1.0f's url
* Fix GitPython to 2.x.x
2020-04-07 15:45:38 +08:00
Yuqi Wang ed66e057a6
Update HiveD Image (#4366) 2020-04-07 11:07:04 +08:00
Binyang2014 4fb34fe5cc
[log-manager] Fix return too much log issue (#4358) 2020-04-03 13:25:37 +08:00
fan yang 9838419c6c
add a link to architecture in Readme (#4353) 2020-04-02 14:53:12 +08:00
Yifan Xiong 19b8ef3bca
[Rest Server] Support CPU cell in hived scheduler (#4342)
* Support CPU cell in hived scheduler
* Update resource pre-check
2020-04-02 14:40:07 +08:00
Yifan Xiong 4a2aceadbe
[Rest Server] Update virtual cluster metrics using scheduler api (#4329)
* Update virtual cluster metrics using scheduler api
* Update node gpu metics using scheduler api
* Check virtual cluster quota using hived scheduler api during job submission
* Update virtual cluster metrics on webportal
2020-04-01 19:08:14 +08:00
Zhiyuan He c0ef775d2a
Upgrade Twisted to 20.3.0 (#4347) 2020-04-01 11:22:36 +08:00
yiyione e178ea6e99
[CI] Update python to 3.8 for lint document (#4345)
* Update python to 3.8 for lint document

* fix

Co-authored-by: Yi Yi <yiyi@microsoft.com>
2020-04-01 10:59:15 +08:00
Zhiyuan He 51a6e1fa16
Add a notice if user still wants to use yarn version (#4340)
* init

* fix

* fix
2020-03-31 10:10:40 +08:00
Binyang2014 a0b18ff94b
Bump node-exporter version to 0.18.1 (#4341) 2020-03-31 09:36:27 +08:00
Yuqi Wang 165142afed
Update HiveD Image and add exitspec for ContainerGpuDeviceNotFoundError (#4343) 2020-03-30 21:29:28 +08:00
Yifan Xiong 0631768ebb
Enable priority class by default (#4334)
Enable priority class by default.
2020-03-27 18:58:16 +08:00
Binyang2014 561149b8fe
[Rest Server] fix fetch job status failed (#4333) 2020-03-27 16:13:57 +08:00
Zhiyuan He f8365cd1d3
Update default docker image list (#4328)
* fix

* fix

* lint

* fix
2020-03-26 17:02:49 +08:00
Mingliang Tao f33c4c03b7
Remove out-dated marketplace (#4324) 2020-03-26 15:12:53 +08:00
yiyione 4742bd3fd1
[VS Code] Remove VS Code extension code (#4317)
* Remove vscode extension code

* fix broken links

* update broken links

* update

Co-authored-by: Yi Yi <yiyi@microsoft.com>
2020-03-26 14:50:02 +08:00
Yuqi Wang abb68009d4
Update HiveD Image (#4320) 2020-03-24 12:22:21 +08:00
Binyang2014 dfc7b86390
[kube-runtime] remove kube-runtime (#4311)
* remove kube-runtime

* add runtime tag

* add doc

* fix doc

* more changes

* rename to openpai-runtime

* fix build issue

* change version
2020-03-23 20:48:37 +08:00
Yuqi Wang 3a91564935
[Hived]: Move Hived from OpenPAI to dedicated repo (#4319)
* [Hived]: Move Hived from OpenPAI to dedicated repo

Moved to https://github.com/microsoft/hivedscheduler
2020-03-23 20:17:53 +08:00
Zhiyuan He 1b9635b2f3
fix (#4318) 2020-03-23 19:46:21 +08:00
Zhiyuan He 62106ff7b5
init (#4313) 2020-03-23 12:45:58 +08:00
Yuqi Wang e30100559a
[Hived]: Add Feature Demo Doc (#4235)
* [Hived]: Add Feature Demo Doc

* [Hived]: Add Feature Demo Doc

* [HiveD] refine doc (#4241)

* refine readme.md

* refine readme.md

* minor

* resolve comments

* refine feature demo

* Refine Feature Demo Doc (#4309)

* Refine

Co-authored-by: Hanyu Zhao <zhaohanyu@pku.edu.cn>
2020-03-20 20:23:34 +08:00
Yifan Xiong cb851a5ec6
Migrate protocol spec (#4307)
* Migrate protocol spec

Migrate pai protocol spec to https://github.com/microsoft/openpai-protocol.

* Fix broken links

Fix broken links.
2020-03-20 17:05:42 +08:00
Yuqi Wang c83c7fcc71
Merge pull request #4304 from microsoft/del-es
Remove Elasticsearch
2020-03-20 15:10:33 +08:00
Binyang2014 bdb37c77e2
[webportal] Change storage representation (#4300)
* Change storage representation

* fix commets
2020-03-20 14:11:34 +08:00
Zhiyuan He 29f5ce61f1
fix CI in rest-server after removal of elasticsearch (#4305) 2020-03-20 13:01:59 +08:00