Yanjie Gao
d860338876
fix some broken links after refactoring and merging code ( #1358 )
* fix link of faq
* fix joblog link
* fix paictl-design.md
* fix doc link
* fix grafana
* fix cluster bootup link
* fix link at how to write service
* fix link of how to write configuration
2018-09-13 15:38:36 +08:00
Di Xu
ea2938d2f9
refactor job-exporter, watchdog UT and docs for folder ( #1355 )
2018-09-13 12:33:22 +08:00
Di Xu
7462274bcf
refactor prometheus's folder ( #1353 )
2018-09-13 11:15:18 +08:00
Di Xu
1d5fbce607
refactor {yarn,job}-exporter folder ( #1352 )
2018-09-13 10:44:54 +08:00
Di Xu
5f3bffe683
refactor alert manager folder ( #1351 )
2018-09-13 09:50:05 +08:00
ZhaoYu Dong
59d2773185
Merge branch 'zhaoyu/port-conflict-after-refactor' into folder-refactor
2018-09-12 21:31:01 +08:00
ZhaoYu Dong
36434e3a77
Merge branch 'folder-refactor' into zhaoyu/port-conflict-after-refactor
2018-09-12 21:29:27 +08:00
Ziming Miao
bef05fdb35
fix bugs in refactor ( #1340 )
* fix clean job for refactor
* fix hadoop-ai build for refactor
2018-09-12 20:26:54 +08:00
Ziming Miao
894c04ed7e
refactor docs ( #1344 )
2018-09-12 20:25:49 +08:00
George Cheng
92e9d4fa66
Folder refactor: fix bad links of web portal and REST server ( #1345 )
2018-09-12 18:43:29 +08:00
George Cheng
9a07c8ef11
Webportal: folder refactor ( #1343 )
* Webportal: folder refactor
* Fix wrong path in Travis CI
* Refactor deploy
2018-09-12 18:31:44 +08:00
George Cheng
706443c335
REST server: folder refactor ( #1339 )
* REST server: folder refactor
* Update .travis.yml
* Fix doc links
* Refactor deployment scripts
2018-09-12 18:30:50 +08:00
Di Xu
a949680ddf
refactor watchdog folder ( #1342 )
2018-09-12 17:46:38 +08:00
Di Xu
a92ad3c37a
update build doc ( #1341 )
2018-09-12 17:29:26 +08:00
FAREAST\canwan
6d49a3177c
Merge remote-tracking branch 'origin/master' into folder-refactor
2018-09-12 16:20:32 +08:00
ZhaoYu Dong
199013eb04
Merge branch 'folder-refactor' of github.com:Microsoft/pai into zhaoyu/port-conflict
2018-09-12 15:38:11 +08:00
George Cheng
13d35df853
Add feedback with current version ( #1289 )
* REST server/Webportal: Bump version to v0.8.0
* Add feedback button with version
2018-09-12 14:34:45 +08:00
FAREAST\canwan
f048ddcec6
Merge remote-tracking branch 'origin/master' into folder-refactor
# Conflicts:
# deployment/k8sPaiLibrary/maintainconf/deploy.yaml
2018-09-12 12:47:37 +08:00
CathyWang0329
a17f154876
Generate ssh key at rest server and use webhdfs instead of hdfs binary ( #1302 )
* change hdfs library to webhdfs in docker container script
* update user auth when calling webhdfs
* remove npm keypair; use ssh-keygen instead
* change to use ssh-keygen
* check whether the hdfs library exists when running an MPI job
* update getJobSshInfo to new ssh store pattern
* add get ssh info UT
* remove debug info
* add more log to webhdfs
* fix travis build error
2018-09-12 12:40:58 +08:00
ZhaoYu Dong
ebce1e5f4c
Merge branch 'folder-refactor' of github.com:Microsoft/pai into zhaoyu/port-conflict
2018-09-12 11:34:16 +08:00
ZhaoYu Dong
0d3d4892a9
change the hdfs ports to avoid conflicts
2018-09-12 11:31:47 +08:00
FAREAST\canwan
18b40188c8
update pai_build command
2018-09-12 11:30:06 +08:00
YundongYe
9bd586aad6
[Docker] GET DOCKER ROOT DIR FROM SHELL ( #1334 )
2018-09-12 11:28:51 +08:00
FAREAST\canwan
83411fe4ed
fix typo
2018-09-12 11:20:00 +08:00
ZhaoYu Dong
c59d640caf
Zhaoyu/disk cleaner/clean docker cache ( #1292 )
* add cleaner utils
* add executor
* model definition
* add test
* add unit test script
* change path
* add worker test case
* executor testcases
* add testcase main
* fix testcase
* change the testcases
* fix executor
* stop using daemon
* remove stop condition
* refine cleaner
* change scripts
* change script path
* add executor terminate
* executor exception
* setup logger
* add test logger
* fix test log
* refine cache clean action
* refine clean action
* add script tests
* fix script testcase
* change exception log
* simplify cleaner
* add test cases
* fix typo
* fix testcase
* terminate running command when timeout and refine per the review comments
* extend the wait time to 5 seconds when terminating workers
2018-09-12 11:08:47 +08:00
Liu Dongqing
e149410dd4
Support ssh auth with private key file to enable to deploy pai to AWS… ( #1308 )
* Support ssh auth with private key file to enable deploying pai to AWS EC2 (#1295 )
* Check the ssh-key-filename as it is optional (#1295 )
* Check the ssh-key-filename is not None (#1295 )
* Rename configuration key-filename to keyfile-path; make keyfile path optional to pass CI
* Use k8s secret to pass ssh key to watchdog
* Create k8s secret automatically; fixed the quick start config
* Fixed the watchdog keyfile path config name
* Fixed the format of the watchdog yaml template
2018-09-12 10:18:41 +08:00
Di Xu
4d4fedc27c
add cluster name to email subject ( #1303 )
2018-09-12 09:06:01 +08:00
Di Xu
b8bace834e
emit pai service resource usage ( #1330 )
2018-09-12 07:52:02 +08:00
FAREAST\canwan
090701319e
Merge remote-tracking branch 'origin/master' into folder-refactor
2018-09-11 18:54:49 +08:00
Di Xu
ab2123fbb4
make prometheus/alert_manager/node_exporter tolerate pressures ( #1317 )
2018-09-11 18:08:05 +08:00
FAREAST\canwan
6e29174edd
Merge remote-tracking branch 'origin/yuye/bug_fix' into canwan/resolve-conflict
# Conflicts:
# deployment/k8sPaiLibrary/maintainconf/deploy.yaml
2018-09-11 16:23:36 +08:00
yuye@microsoft.com
c8972c9427
GET DOCKER ROOT DIR FROM SHELL
2018-09-11 15:50:37 +08:00
yuye@microsoft.com
5fd66f34c7
GET DOCKER ROOT DIR FROM SHELL
2018-09-11 15:44:28 +08:00
FAREAST\canwan
1c52645dde
add component.dep to hadoop-run
2018-09-11 15:38:04 +08:00
FAREAST\canwan
4a114f877e
resolve comment
2018-09-11 11:32:49 +08:00
FAREAST\canwan
482178010f
Merge branch 'master' into canwan/resolve-conflict
2018-09-11 10:26:56 +08:00
YundongYe
45fb9385db
Add support for random docker root dir for kubelet ( #1230 )
2018-09-11 09:48:17 +08:00
Can Wang
06e5e53370
Merge branch 'master' into canwan/resolve-conflict
2018-09-10 23:29:37 +08:00
FAREAST\canwan
d46bfdf140
remove previous pai-build
2018-09-10 19:42:46 +08:00
FAREAST\canwan
edd23e703e
fix kubernetes-cleanup
2018-09-10 17:48:18 +08:00
Di Xu
b6e0784aa8
change email template ( #1282 )
2018-09-10 15:40:27 +08:00
YundongYe
2ec8f9a6b3
A new sub-command for configuration operation: paictl config ( #1263 )
2018-09-10 14:42:14 +08:00
FAREAST\canwan
c5153d02a8
fix bug in dev-box.dockerfile
2018-09-10 13:54:43 +08:00
FAREAST\canwan
559c026f3b
update document link
2018-09-10 13:34:53 +08:00
FAREAST\canwan
e430930267
Merge remote-tracking branch 'origin/master' into canwan/update-jenkins
# Conflicts:
# Jenkinsfile
# deployment/k8sPaiLibrary/maintainconf/clean.yaml
# deployment/k8sPaiLibrary/maintaintool/kubernetes-cleanup.sh
# docs/pai-management/doc/add-service.md
# docs/pai-management/doc/cluster-bootup.md
# docs/pai-management/doc/how-to-write-pai-configuration.md
# pai-management/bootstrap/alert-manager/delete.sh
# pai-management/bootstrap/alert-manager/refresh.sh.template
# pai-management/bootstrap/alert-manager/service.yaml
# pai-management/bootstrap/alert-manager/stop.sh
# pai-management/bootstrap/drivers/node-label.sh.template
# pai-management/bootstrap/hadoop-jobhistory/node-label.sh.template
# pai-management/bootstrap/hadoop-node-manager/hadoop-node-manager-delete/delete-data.sh
# pai-management/container-setup.sh
# pai-management/k8sPaiLibrary/maintaintool/kubernetes-cleanup.sh
# pai-management/k8sPaiLibrary/template/kubernetes-cleanup.sh.template
# pai-management/paiLibrary/paiBuild/build_center.py
# pai-management/paiLibrary/paiBuild/hadoop_ai_build.py
# paictl.py
# prometheus/doc/exporter-for-other-services.md
# src/dev-box/build/container-setup.sh
# src/dev-box/build/dev-box.dockerfile
# src/drivers/deploy/node-label.sh.template
# src/hadoop-ai/build/build-pre.sh
# src/hadoop-jobhistory/deploy/node-label.sh.template
# src/hadoop-name-node/deploy/node-label.sh.template
# src/hadoop-node-manager/deploy/hadoop-node-manager-delete/delete-data.sh
# src/hadoop-node-manager/deploy/node-label.sh.template
# src/hadoop-resource-manager/deploy/node-label.sh.template
# src/zookeeper/deploy/node-label.sh.template
2018-09-10 13:07:56 +08:00
Hao Yuan
a59f05fe32
waiting for rest-server to be ready ( #1311 )
2018-09-10 13:05:22 +08:00
George Cheng
62f53a2f5a
Webportal: disable JS optimize in debug mode ( #1267 )
2018-09-09 23:28:53 +08:00
Can Wang
042324c68e
change script running path
2018-09-09 20:06:05 +08:00
Can Wang
8674ab52de
fix pai_build bug
2018-09-09 19:42:51 +08:00
Can Wang
2fdd14474f
fix bug in paictl
2018-09-09 19:18:02 +08:00