mirror of https://github.com/microsoft/pai.git
fix part of broken link when refactor and merge code (#1358)
* fix link of faq
* fix joblog link
* fix paictl-design.md
* fix doc link
* fix grafana
* fix cluster bootup link
* fix link at how to write service
* fix link of how to write configuration
Parent
ea2938d2f9
Commit
d860338876
@@ -18,11 +18,11 @@ A: Please check [job_log.md](./job_log.md)'s introduction.
### Q: To improve the cluster usage, user would like to see a VC can use up all cluster resource if others don’t use it.
A: By default, a VC can use up all cluster resource if others don’t use it. OpenPAI use [capacity scheduler](https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html) of YARN for resource allocation. maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster. This provides a means to limit how much excess capacity a queue can use. Default value of -1 implies a queue can use complete capacity of the cluster. [OpenPAI capacity scheduler](../pai-management/bootstrap/hadoop-resource-manager/hadoop-resource-manager-configuration/capacity-scheduler.xml.template) not set this item and there is no limit.
A: By default, a VC can use up all cluster resource if others don’t use it. OpenPAI use [capacity scheduler](https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html) of YARN for resource allocation. maximum-capacity defines a limit beyond which a queue cannot use the capacity of the cluster. This provides a means to limit how much excess capacity a queue can use. Default value of -1 implies a queue can use complete capacity of the cluster. [OpenPAI capacity scheduler](../src/hadoop-resource-manager/deploy/hadoop-resource-manager-configuration/capacity-scheduler.xml.template) not set this item and there is no limit.
### Q: To ensure one user cannot occupy excessive resource, operator would like to set a quota constraint for individual users.
A: OpenPAI use capacity scheduler of YARN for resource allocation. User can configure the items "[yarn.scheduler.capacity.root.{{ queueName }}.user-limit-factor, yarn.scheduler.capacity.root.{{ queueName }}.minimum-user-limit-percent](https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html)" to control the user's resource quota. These configuration items are in this file [capacity-scheduler.xml.template](../pai-management/bootstrap/hadoop-resource-manager/hadoop-resource-manager-configuration/capacity-scheduler.xml.template) of OpenPAI.
A: OpenPAI use capacity scheduler of YARN for resource allocation. User can configure the items "[yarn.scheduler.capacity.root.{{ queueName }}.user-limit-factor, yarn.scheduler.capacity.root.{{ queueName }}.minimum-user-limit-percent](https://hadoop.apache.org/docs/r1.2.1/capacity_scheduler.html)" to control the user's resource quota. These configuration items are in this file [capacity-scheduler.xml.template](../src/hadoop-resource-manager/deploy/hadoop-resource-manager-configuration/capacity-scheduler.xml.template) of OpenPAI.
```xml
<property>
@@ -58,7 +58,7 @@ Each queue enforces a limit on the percentage of resources allocated to a user a
### Q: How to configure virtual cluster capacity?
A: Please refer [configure virtual cluster capacity](../pai-management/doc/how-to-write-pai-configuration.md#configure_vc_capacity)
A: Please refer [configure virtual cluster capacity](./pai-management/doc/how-to-write-pai-configuration.md#configure_vc_capacity)
### Q: How to use private docker registry job image when submitting an OpenPAI job?
@@ -151,7 +151,7 @@ If the Framework retried many times, check other attempts by searching the Frame
#### Note:
- History job number limit: Currently, OpenPAI's yarn only store 1000 jobs' logs. User maybe can not find some old job's logs. For example, frequently retried job logs.
- Job container log rotation: In order to prevent the historical log from being too large, OpenPAI configure the [docker log rotation](https://docs.docker.com/config/containers/logging/json-file/) by file [docker-daemon.json](../pai-management/k8sPaiLibrary/maintaintool/docker-daemon.json).
- Job container log rotation: In order to prevent the historical log from being too large, OpenPAI configure the [docker log rotation](https://docs.docker.com/config/containers/logging/json-file/) by file [docker-daemon.json](../deployment/k8sPaiLibrary/maintaintool/docker-daemon.json).
The default configuration:
- "max-size": "100m": The maximum size of the log before it is rolled. A positive integer plus a modifier representing the unit of measure (k, m, or g). Defaults to -1 (unlimited).,
- "max-file": "10": The maximum number of log files that can be present. If rolling the logs creates excess files, the oldest file is removed. Only effective when max-size is also set. A positive integer. Defaults to 1.
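
As a rough illustration of what these two defaults imply together, the sketch below estimates the upper bound on json-file log space per container (`max-size` multiplied by `max-file`). The helper names `parse_size` and `max_log_bytes` are hypothetical, and treating the k/m/g modifiers as powers of 1024 is an assumption made for this example rather than something stated in the text above.

```python
# Assumption: k, m and g are treated as 1024-based multipliers.
UNITS = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def parse_size(text):
    """Convert a size string such as '100m' into a number of bytes."""
    text = text.strip().lower()
    if text and text[-1] in UNITS:
        return int(text[:-1]) * UNITS[text[-1]]
    # No modifier: interpret the value as plain bytes.
    return int(text)


def max_log_bytes(max_size, max_file):
    """Upper bound on log space for one container: file size limit times file count."""
    return parse_size(max_size) * int(max_file)


if __name__ == "__main__":
    # Values taken from the default configuration listed above.
    print(max_log_bytes("100m", "10"))
```

Under these assumptions the defaults cap each container at roughly 1000 MiB of rotated logs.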
@@ -190,8 +190,8 @@ This configuration consists of 7 parts.
[Deployment](https://kubernetes.io/docs/concepts/workloads/controllers/deployment/)
- example
- [Grafana](../bootstrap/grafana/grafana.yaml.template)
- [prometheus](../bootstrap/prometheus/prometheus-deployment.yaml.template)
- [Grafana](../../../bootstrap/grafana/grafana.yaml.template)
- [prometheus](../../../src/prometheus/deploy/prometheus-deployment.yaml.template)
[Job](https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/)
- Benefits
@@ -255,4 +255,4 @@ Delete service
```
./paictl.py service stop -p /path/to/configuration/dir -n hbase
```
```
@@ -394,7 +394,7 @@ http://<master>:9090
```From OpenPAI watchdog```:
[OpenPAI watchdog](../../prometheus/doc/watchdog-metrics.md)
[OpenPAI watchdog](../../alerting/watchdog-metrics.md)
- Log
@@ -416,7 +416,7 @@ hadoop:
| Configuration Property | File | Meaning |
| --- | --- | --- |
| ```custom-hadoop-binary-path```|services-configuration.yaml| Please set a path here for paictl to build [hadoop-ai](../../hadoop-ai).|
| ```custom-hadoop-binary-path```|services-configuration.yaml| Please set a path here for paictl to build [hadoop-ai](../../../src/hadoop-ai).|
| ```virtualClusters```|services-configuration.yaml| Hadoop queue setting. Each VC will be assigned with (capacity / total_capacity * 100%) of resources. paictl will create the 'default' VC with 0 capacity, if it is not been specified. paictl will split resources to each VC evenly if the total capacity is 0. The capacity of each VC will be set to 0 if it is a negative number.|
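
The `virtualClusters` rules in the table row above can be sketched as follows. This is only one possible reading of that description, not paictl's actual code: the function name `vc_shares` and the order in which the rules are applied (clamp negatives, add `default`, then normalize) are assumptions made for illustration.

```python
def vc_shares(virtual_clusters):
    """Return {vc_name: percent_of_resources} for a {vc_name: capacity} mapping."""
    # A negative capacity is treated as 0.
    caps = {name: max(0.0, float(cap)) for name, cap in virtual_clusters.items()}
    # The 'default' VC is created with 0 capacity when it is not specified.
    caps.setdefault("default", 0.0)

    total = sum(caps.values())
    if total == 0:
        # Total capacity is 0: split resources evenly across all VCs.
        even = 100.0 / len(caps)
        return {name: even for name in caps}

    # Otherwise each VC gets capacity / total_capacity * 100%.
    return {name: cap * 100.0 / total for name, cap in caps.items()}


if __name__ == "__main__":
    print(vc_shares({"default": 1, "vc1": 3}))
    print(vc_shares({"vc1": 0}))
```

The capacities act as relative weights here, so they need not sum to 100.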
### how to check
@@ -564,4 +564,4 @@ The `paictl` tool sets the following default values in the 4 configuration files
| ```docker registry``` | The docker registry is set to `docker.io`, and the docker namespace is set to `openpai`. In another word, all PAI service images will be pulled from `docker.io/openpai` (see [this link](https://hub.docker.com/r/openpai/) on DockerHub for the details of all images). |
| ```Cluster id``` | Cluster id is set to `pai-example` |
| ```REST server's admin user``` | REST server's admin user is set to `admin`, and its password is set to `admin-password` |
| ```VC``` | There is only one VC in the system, `default`, which has 100% of the resource capacity. |
| ```VC``` | There is only one VC in the system, `default`, which has 100% of the resource capacity. |
@@ -57,23 +57,23 @@ User could customize Hadoop startup configuration at OpenPAI's [folder / file](.
## Configure Zookeeper <a name="zookeeper"></a>
User could customize [Zookeeper](https://zookeeper.apache.org/) at OpenPAI's [folder / file](../src/zookeeper/zoo.cfg)
User could customize [Zookeeper](https://zookeeper.apache.org/) at OpenPAI's [folder / file](../../../src/zookeeper/zoo.cfg)
User could customize [Zookeeper](https://zookeeper.apache.org/) startup configuration at OpenPAI's [folder / file](../../../src/zookeeper/deploy/zookeeper.yaml.template)
## Configure Prometheus / Exporter <a name="prometheus"></a>
User could customize [Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) at OpenPAI's [folder / file](../bootstrap/prometheus/prometheus-configmap.yaml.template)
User could customize [Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) at OpenPAI's [folder / file](../../../src/prometheus/prometheus-configmap.yaml.template)
User could customize [Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) startup configuration at OpenPAI's [folder / file](../bootstrap/prometheus/prometheus-deployment.yaml.template)
User could customize [Prometheus](https://prometheus.io/docs/prometheus/latest/configuration/configuration/) startup configuration at OpenPAI's [folder / file](../../../src/prometheus/prometheus-deployment.yaml.template)
User could customize [Node-exporter](https://github.com/prometheus/node_exporter) startup configuration at OpenPAI's [folder / file](../bootstrap/node-exporter/node-exporter.yaml.template)
User could customize [Node-exporter](https://github.com/prometheus/node_exporter) startup configuration at OpenPAI's [folder / file](../../../src/node-exporter/node-exporter.yaml.template)
## Configure Alert Manager <a name="alertmanager"></a>
User could customize [Alert Manager](https://prometheus.io/docs/alerting/alertmanager/) at OpenPAI's [folder / file](../bootstrap/alert-manager/alert-configmap.yaml.template). Please refer to [doc](../../prometheus/doc/alert-manager.md#configuration) for more info.
User could customize [Alert Manager](https://prometheus.io/docs/alerting/alertmanager/) at OpenPAI's [folder / file](../../../src/alert-manager/alert-configmap.yaml.template). Please refer to [doc](../../alerting/alert-manager.md#configuration) for more info.
User could customize [Alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) at OpenPAI's [folder / file](../../prometheus/prometheus-alert)
User could customize [Alerting rules](https://prometheus.io/docs/prometheus/latest/configuration/alerting_rules/) at OpenPAI's [folder / file](../../../src/prometheus/deploy/alerting/prometheus-alert)
## Configure Grafana <a name="grafana"></a>
@@ -13,7 +13,7 @@
- Besides generation the cluster-configuration from the machine-list, the main responsibilities are boot and clean kubernetes-cluster.
- k8s-bootup:
- According to the different role of the node in kubernetes-cluster, paictl will prepare different package for remote working. The package content of different role is defined in the [deploy.yaml](../../../deployment/k8sPaiLibrary/maintainconf/deploy.yaml)
- According to the different role of the node in kubernetes-cluster, paictl will prepare different package for remote working. The package content of different role is defined in the [deploy.yaml](../../deployment/k8sPaiLibrary/maintainconf/deploy.yaml)
- paictl will send the package to each machine through paramiko, and execute corresponding script on the remote machine.
- After the k8s cluster is bootup, paictl will install kubectl for you to manage the cluster.
@@ -42,4 +42,4 @@
</div>
- paictl will solve the start order of the services according to the configuration in the service.yaml
- paictl will solve the start order of the services according to the configuration in the service.yaml