Commit Graph

2561 Commits

Author SHA1 Message Date
Yanjie Gao d860338876
fix part of broken link when refactor and merge code (#1358)
* fix link of faq

* fix joblog link

* fix paictl-design.md

* fix doc link

* fix grafana

* fix cluster bootup link

* fix link at how to write service

* fix link of how to write configuration
2018-09-13 15:38:36 +08:00
Di Xu ea2938d2f9
refactor job-exporter, watchdog UT and docs for folder (#1355) 2018-09-13 12:33:22 +08:00
Di Xu 7462274bcf
refactor prometheus's folder (#1353) 2018-09-13 11:15:18 +08:00
Di Xu 1d5fbce607
refactor {yarn,job}-exporter folder (#1352) 2018-09-13 10:44:54 +08:00
Di Xu 5f3bffe683
refactor alert manager folder (#1351) 2018-09-13 09:50:05 +08:00
ZhaoYu Dong 59d2773185 Merge branch 'zhaoyu/port-conflict-after-refactor' into folder-refactor 2018-09-12 21:31:01 +08:00
ZhaoYu Dong 36434e3a77 Merge branch 'folder-refactor' into zhaoyu/port-conflict-after-refactor 2018-09-12 21:29:27 +08:00
Ziming Miao bef05fdb35
fix bugs in refactor (#1340)
* fix clean job for refactor

* fix hadoop-ai build for refactor
2018-09-12 20:26:54 +08:00
Ziming Miao 894c04ed7e
refactor docs (#1344) 2018-09-12 20:25:49 +08:00
George Cheng 92e9d4fa66
Folder refactor: fix bad links of web portal and REST server (#1345) 2018-09-12 18:43:29 +08:00
George Cheng 9a07c8ef11
Webportal: folder refactor (#1343)
* Webportal: folder refactor

* Fix wrong path in Travis cI

* Refactor deploy
2018-09-12 18:31:44 +08:00
George Cheng 706443c335
REST server: folder refactor (#1339)
* REST server: folder refactor

* Update .travis.yml

* Fix doc links

* Refactor deployment scripts
2018-09-12 18:30:50 +08:00
Di Xu a949680ddf
refactor watchdog folder (#1342) 2018-09-12 17:46:38 +08:00
Di Xu a92ad3c37a
update build doc (#1341) 2018-09-12 17:29:26 +08:00
FAREAST\canwan 6d49a3177c Merge remote-tracking branch 'origin/master' into folder-refactor 2018-09-12 16:20:32 +08:00
ZhaoYu Dong 199013eb04 Merge branch 'folder-refactor' of github.com:Microsoft/pai into zhaoyu/port-conflict 2018-09-12 15:38:11 +08:00
George Cheng 13d35df853
Add feedback with current version (#1289)
* REST server/Webportal: Bump version to v0.8.0

* Add feedback button with version
2018-09-12 14:34:45 +08:00
FAREAST\canwan f048ddcec6 Merge remote-tracking branch 'origin/master' into folder-refactor
# Conflicts:
#	deployment/k8sPaiLibrary/maintainconf/deploy.yaml
2018-09-12 12:47:37 +08:00
CathyWang0329 a17f154876
Generate ssh key at rest server and use webhdfs instead of hdfs binary (#1302)
* change hdfs library to webhdfs in docker container script

* update user auth when calling webhdfs

* remove npm keypair use ssh-keygen instead

* change to use ssh-keygen

* add check hdfs library exists or not when running MPI job

* update getJobSshInfo to new ssh store pattern

* add get ssh info UT

* remove debug info

* add more log to webhdfs

* fix travis build error
2018-09-12 12:40:58 +08:00
ZhaoYu Dong ebce1e5f4c Merge branch 'folder-refactor' of github.com:Microsoft/pai into zhaoyu/port-conflict 2018-09-12 11:34:16 +08:00
ZhaoYu Dong 0d3d4892a9 change the hdfs ports to avoid conflicts 2018-09-12 11:31:47 +08:00
FAREAST\canwan 18b40188c8 update pai_build command 2018-09-12 11:30:06 +08:00
YundongYe 9bd586aad6
[Docker] GET DOCKER ROOT DIR FROM SHELL (#1334) 2018-09-12 11:28:51 +08:00
FAREAST\canwan 83411fe4ed fix typo error 2018-09-12 11:20:00 +08:00
ZhaoYu Dong c59d640caf
Zhaoyu/disk cleaner/clean docker cache (#1292)
* add cleaner utils

* add executor

* model definition

* add test

* add unit test script

* change path

* add worker test case

* executor testcases

* add testcase main

* fix testcase

* change the testcases

* fix executor

* stop using daemon

* remove stop condition

* refine cleaner

* change scripts

* change script path

* add executor terminate

* executor exception

* setup logger

* add test logger

* fix test log

* refine cache clean action

* refine clean action

* add script tests

* fix script testcase

* change exception log

* simplify cleaner

* add test cases

* fix typo

* fix testcase

* terminate running command when timeout and refine per the review comments

* extend the wait time to 5 seconds when terminating workers
2018-09-12 11:08:47 +08:00
Liu Dongqing e149410dd4 Support ssh auth with private key file to enable to deploy pai to AWS… (#1308)
* Support ssh auth with private key file to enable to deploy pai to AWS EC2 (#1295)

* Check the ssh-key-filename as it is optional (#1295)

* Check the ssh-key-filename is not None (#1295)

* Rename configuration key-filename to keyfile-path; Make keyfile path option to pass CI

* Use k8s secret to pass ssh key to watchdog

* Create k8s secret automatically; fixed the quick start config

* Fixed the watchdog keyfile path config name

* Fixed the format of the watchdog yaml template
2018-09-12 10:18:41 +08:00
Di Xu 4d4fedc27c
add cluster name to email subject (#1303) 2018-09-12 09:06:01 +08:00
Di Xu b8bace834e
emit pai service resource usage (#1330) 2018-09-12 07:52:02 +08:00
FAREAST\canwan 090701319e Merge remote-tracking branch 'origin/master' into folder-refactor 2018-09-11 18:54:49 +08:00
Di Xu ab2123fbb4
make prometheus/alert_manager/node_exporter tolerate pressures (#1317) 2018-09-11 18:08:05 +08:00
FAREAST\canwan 6e29174edd Merge remote-tracking branch 'origin/yuye/bug_fix' into canwan/resolve-conflict
# Conflicts:
#	deployment/k8sPaiLibrary/maintainconf/deploy.yaml
2018-09-11 16:23:36 +08:00
yuye@microsoft.com c8972c9427 GET DOCKER ROOT DIR FROM SHELL 2018-09-11 15:50:37 +08:00
yuye@microsoft.com 5fd66f34c7 GET DOCKER ROOT DIR FROM SHELL 2018-09-11 15:44:28 +08:00
FAREAST\canwan 1c52645dde add component.dep to hadoop-run 2018-09-11 15:38:04 +08:00
FAREAST\canwan 4a114f877e resolve comment 2018-09-11 11:32:49 +08:00
FAREAST\canwan 482178010f Merge branch 'master' into canwan/resolve-conflict 2018-09-11 10:26:56 +08:00
YundongYe 45fb9385db
Add support for random docker root dir for kubelet (#1230) 2018-09-11 09:48:17 +08:00
Can Wang 06e5e53370 Merge branch 'master' into canwan/resolve-conflict 2018-09-10 23:29:37 +08:00
FAREAST\canwan d46bfdf140 remove previous pai-build 2018-09-10 19:42:46 +08:00
FAREAST\canwan edd23e703e fix kubernetes-cleanup 2018-09-10 17:48:18 +08:00
Di Xu b6e0784aa8
change email template (#1282) 2018-09-10 15:40:27 +08:00
YundongYe 2ec8f9a6b3
A new sub-command for configuration operation: paictl config (#1263) 2018-09-10 14:42:14 +08:00
FAREAST\canwan c5153d02a8 fix bug in dev-box.dockerfile 2018-09-10 13:54:43 +08:00
FAREAST\canwan 559c026f3b update document link 2018-09-10 13:34:53 +08:00
FAREAST\canwan e430930267 Merge remote-tracking branch 'origin/master' into canwan/update-jenkins
# Conflicts:
#	Jenkinsfile
#	deployment/k8sPaiLibrary/maintainconf/clean.yaml
#	deployment/k8sPaiLibrary/maintaintool/kubernetes-cleanup.sh
#	docs/pai-management/doc/add-service.md
#	docs/pai-management/doc/cluster-bootup.md
#	docs/pai-management/doc/how-to-write-pai-configuration.md
#	pai-management/bootstrap/alert-manager/delete.sh
#	pai-management/bootstrap/alert-manager/refresh.sh.template
#	pai-management/bootstrap/alert-manager/service.yaml
#	pai-management/bootstrap/alert-manager/stop.sh
#	pai-management/bootstrap/drivers/node-label.sh.template
#	pai-management/bootstrap/hadoop-jobhistory/node-label.sh.template
#	pai-management/bootstrap/hadoop-node-manager/hadoop-node-manager-delete/delete-data.sh
#	pai-management/container-setup.sh
#	pai-management/k8sPaiLibrary/maintaintool/kubernetes-cleanup.sh
#	pai-management/k8sPaiLibrary/template/kubernetes-cleanup.sh.template
#	pai-management/paiLibrary/paiBuild/build_center.py
#	pai-management/paiLibrary/paiBuild/hadoop_ai_build.py
#	paictl.py
#	prometheus/doc/exporter-for-other-services.md
#	src/dev-box/build/container-setup.sh
#	src/dev-box/build/dev-box.dockerfile
#	src/drivers/deploy/node-label.sh.template
#	src/hadoop-ai/build/build-pre.sh
#	src/hadoop-jobhistory/deploy/node-label.sh.template
#	src/hadoop-name-node/deploy/node-label.sh.template
#	src/hadoop-node-manager/deploy/hadoop-node-manager-delete/delete-data.sh
#	src/hadoop-node-manager/deploy/node-label.sh.template
#	src/hadoop-resource-manager/deploy/node-label.sh.template
#	src/zookeeper/deploy/node-label.sh.template
2018-09-10 13:07:56 +08:00
Hao Yuan a59f05fe32
waiting for rest-server to be ready (#1311) 2018-09-10 13:05:22 +08:00
George Cheng 62f53a2f5a
Webportal: disable JS optimize in debug mode (#1267) 2018-09-09 23:28:53 +08:00
Can Wang 042324c68e change script running path 2018-09-09 20:06:05 +08:00
Can Wang 8674ab52de fix pai_build bug 2018-09-09 19:42:51 +08:00
Can Wang 2fdd14474f fix bug in paictl 2018-09-09 19:18:02 +08:00