Move Prometheus/Grafana config to separate file

- Move grafana admin login info to credentials
- Update documentation for Prometheus/Grafana integration
- Resolves #205
This commit is contained in:
Fred Park 2018-06-08 13:17:40 -07:00
Parent 449e621a66
Commit 612e2a50e5
No key found matching this signature
GPG key ID: 3C4D545F457737EB
25 changed files with 929 additions and 248 deletions

View file

@ -4,6 +4,8 @@
[![Image Layers](https://images.microbadger.com/badges/image/alfpark/batch-shipyard:latest-cli.svg)](http://microbadger.com/images/alfpark/batch-shipyard)
# Batch Shipyard
<img src="https://azurebatchshipyard.blob.core.windows.net/github/README-dash.png" alt="dashboard" width="1024" />
[Batch Shipyard](https://github.com/Azure/batch-shipyard) is a tool to help
provision and execute container-based batch processing and HPC workloads on
[Azure Batch](https://azure.microsoft.com/services/batch/) compute
@ -23,21 +25,17 @@ in Azure, independent of any integrated Azure Batch functionality.
Azure Batch compute nodes
* Automated deployment of required Docker and/or Singularity images to
compute nodes
* Accelerated Docker and Singularity image deployment at scale to compute
pools consisting of a large number of VMs via private peer-to-peer
distribution of container images among the compute nodes
* Mixed mode support for Docker and Singularity: run your Docker and
Singularity containers within the same job, side-by-side or even concurrently
* Comprehensive data movement support: move data easily between locally
accessible storage systems, remote filesystems, Azure Blob or File Storage,
and compute nodes
* Support for Docker Registries including
[Azure Container Registry](https://azure.microsoft.com/services/container-registry/)
and other Internet-accessible public and private registries
* Support for the [Singularity Hub](https://singularity-hub.org/) Container
Registry
* Support for serverless execution binding with
[Azure Functions](http://batch-shipyard.readthedocs.io/en/latest/60-batch-shipyard-site-extension/)
* Support for Docker Registries including
[Azure Container Registry](https://azure.microsoft.com/services/container-registry/),
other Internet-accessible public and private registries, and the
[Singularity Hub](https://singularity-hub.org/) Container Registry
* [Standalone Remote Filesystem Provisioning](http://batch-shipyard.readthedocs.io/en/latest/65-batch-shipyard-remote-fs/)
with integration to auto-link these filesystems to compute nodes with
support for [NFS](https://en.wikipedia.org/wiki/Network_File_System) and
@ -49,6 +47,10 @@ via [blobfuse](https://github.com/Azure/azure-storage-fuse),
[GlusterFS](https://www.gluster.org/) provisioned directly on compute nodes
(which can act as a distributed local file system/cache), and custom Linux
mount support (fstab)
* Automated, integrated
[resource monitoring](http://batch-shipyard.readthedocs.io/en/latest/66-batch-shipyard-resource-monitoring/)
with [Prometheus](https://prometheus.io/) and [Grafana](https://grafana.com/)
for Batch pools and RemoteFS storage clusters
* Seamless integration with Azure Batch job, task and file concepts along with
full pass-through of the
[Azure Batch API](https://azure.microsoft.com/documentation/articles/batch-api-basics/)
@ -89,6 +91,9 @@ optional creation of SSH tunneling scripts to Docker Hosts on compute nodes
on compliant Windows compute node pools with the ability to activate
[Azure Hybrid Use Benefit](https://azure.microsoft.com/pricing/hybrid-benefit/)
if applicable
* Accelerated Docker and Singularity image deployment at scale to compute
pools consisting of a large number of VMs via private peer-to-peer
distribution of container images among the compute nodes
## Installation
### Azure Cloud Shell

View file

@ -126,44 +126,3 @@ global_resources:
include:
- '*.bin'
path: /another/local/path/dir
monitoring:
location: <Azure region, e.g., eastus>
resource_group: my-prom-server-rg
hostname_prefix: prom
ssh:
username: shipyard
ssh_public_key: /path/to/rsa/publickey.pub
ssh_public_key_data: ssh-rsa ...
ssh_private_key: /path/to/rsa/privatekey
generated_file_export_path: null
public_ip:
enabled: true
static: false
virtual_network:
name: myvnet
resource_group: my-vnet-resource-group
existing_ok: false
address_space: 10.0.0.0/16
subnet:
name: my-server-subnet
address_prefix: 10.0.0.0/24
network_security:
ssh:
- '*'
grafana:
- '*'
vm_size: STANDARD_D2_V2
accelerated_networking: false
services:
resource_polling_interval: 15
lets_encrypt:
enabled: true
use_staging_environment: true
prometheus:
port: 9090
scrape_interval: 10s
grafana:
admin:
user: admin
password: admin
additional_dashboards: []

View file

@ -99,3 +99,9 @@ credentials:
filename: some/path/token.cache
credentials_secret_id: https://<vault_name>.vault.azure.net/secrets/<secret_id>
uri: https://<vault_name>.vault.azure.net/
# monitoring credentials
monitoring:
grafana:
admin:
username: grafana_username
password: grafana_user_password

View file

@ -0,0 +1,48 @@
monitoring:
location: <Azure region, e.g., eastus>
resource_group: my-prom-server-rg
hostname_prefix: prom
ssh:
username: shipyard
ssh_public_key: /path/to/rsa/publickey.pub
ssh_public_key_data: ssh-rsa ...
ssh_private_key: /path/to/rsa/privatekey
generated_file_export_path: null
public_ip:
enabled: true
static: false
virtual_network:
name: myvnet
resource_group: my-vnet-resource-group
existing_ok: false
address_space: 10.0.0.0/16
subnet:
name: my-server-subnet
address_prefix: 10.0.0.0/24
network_security:
ssh:
- '*'
grafana:
- 1.2.3.0/24
- 2.3.4.5
prometheus:
- 2.3.4.5
custom_inbound_rules:
myrule:
destination_port_range: 5000-5001
protocol: '*'
source_address_prefix:
- 1.2.3.4
- 5.6.7.0/24
vm_size: STANDARD_D2_V2
accelerated_networking: false
services:
resource_polling_interval: 15
lets_encrypt:
enabled: true
use_staging_environment: true
prometheus:
port: 9090
scrape_interval: 10s
grafana:
additional_dashboards: null

View file

@ -4152,7 +4152,7 @@ def action_monitor_add(table_client, config, poolid, fscluster):
:param list fscluster: list of fs clusters to monitor
"""
if util.is_none_or_empty(poolid) and util.is_none_or_empty(fscluster):
logger.error('no resources specified')
logger.error('no monitoring resources specified to add')
return
# ensure that we are operating in AAD mode for batch
if util.is_not_empty(poolid):
@ -4205,6 +4205,14 @@ def action_monitor_remove(table_client, config, all, poolid, fscluster):
if not all and util.is_not_empty(poolid):
bc = settings.credentials_batch(config)
_check_for_batch_aad(bc, 'remove pool monitors')
if (not all and util.is_none_or_empty(poolid) and
util.is_none_or_empty(fscluster)):
logger.error('no monitoring resources specified to remove')
return
if all and (util.is_not_empty(poolid) or util.is_not_empty(fscluster)):
raise ValueError(
'cannot specify --all with specific monitoring resources to '
'remove')
storage.remove_resources_from_monitoring(
table_client, config, all, poolid, fscluster)

View file

@ -256,3 +256,13 @@ def parse_secret_ids(client, config):
'invalid'.format(secid))
settings.set_credentials_registry_password(
config, reg, False, password)
# monitoring passwords
secid = settings.credentials_grafana_admin_password_secret_id(config)
if secid is not None:
logger.debug('fetching Grafana admin password from keyvault')
password = get_secret(client, secid)
if util.is_none_or_empty(password):
raise ValueError(
'Grafana admin password retrieved for secret id {} is '
'invalid'.format(secid))
settings.set_credentials_grafana_admin_password(config, password)

View file

@ -4164,7 +4164,7 @@ def monitoring_prometheus_settings(config):
conf = {}
port = None
else:
port = str(_kv_read(conf, 'port', default=9090))
port = str(_kv_read(conf, 'port'))
return PrometheusMonitoringSettings(
port=port,
scrape_interval=_kv_read_checked(
@ -4172,6 +4172,33 @@ def monitoring_prometheus_settings(config):
)
def credentials_grafana_admin_password_secret_id(config):
# type: (dict) -> str
"""Get Grafana admin password KeyVault Secret Id
:param dict config: configuration object
:rtype: str
:return: keyvault secret id
"""
try:
secid = config[
'credentials']['monitoring']['grafana']['admin'][
'password_keyvault_secret_id']
if util.is_none_or_empty(secid):
raise KeyError()
except KeyError:
return None
return secid
def set_credentials_grafana_admin_password(config, pw):
# type: (dict, str) -> None
"""Set Grafana admin password
:param dict config: configuration object
:param str pw: password
"""
config['credentials']['monitoring']['grafana']['admin']['password'] = pw
def monitoring_grafana_settings(config):
# type: (dict) -> GrafanaMonitoringSettings
"""Get grafana monitoring settings
@ -4183,10 +4210,20 @@ def monitoring_grafana_settings(config):
conf = config['monitoring']['services']['grafana']
except KeyError:
conf = {}
admin = _kv_read_checked(conf, 'admin', default={})
try:
gaconf = config['credentials']['monitoring']['grafana']
except KeyError:
gaconf = {}
admin = _kv_read_checked(gaconf, 'admin', default={})
admin_user = _kv_read_checked(admin, 'username')
if util.is_none_or_empty(admin_user):
raise ValueError('Grafana admin user is invalid')
admin_password = _kv_read_checked(admin, 'password')
if util.is_none_or_empty(admin_password):
raise ValueError('Grafana admin password is invalid')
return GrafanaMonitoringSettings(
admin_user=_kv_read_checked(admin, 'user', default='admin'),
admin_password=_kv_read_checked(admin, 'password', default='admin'),
admin_user=admin_user,
admin_password=admin_password,
additional_dashboards=_kv_read_checked(conf, 'additional_dashboards'),
)

View file

@ -57,6 +57,7 @@ class ConfigType(enum.Enum):
Pool = 3,
Jobs = 4,
RemoteFS = 5,
Monitor = 6,
# global defines
@ -82,6 +83,10 @@ _SCHEMAS = {
'name': 'RemoteFS',
'schema': pathlib.Path(_ROOT_PATH, 'schemas/fs.yaml'),
},
ConfigType.Monitor: {
'name': 'Monitor',
'schema': pathlib.Path(_ROOT_PATH, 'schemas/monitor.yaml'),
},
}
# configure loggers

View file

@ -94,8 +94,8 @@ remove them with the following commands:
## <a name="ludicrous"></a>Ludicrous Speed Quickstart
Pre-jump checklist:
* Fresh Linux machine with network access
* `git` is installed
* Linux, Mac or WSL machine with network access
* `git` and Python3 are installed
* Comfortable with the Linux command line
* Have an active Azure subscription
* Understand how to use the Azure Portal
@ -112,9 +112,9 @@ Execute jump:
git clone https://github.com/Azure/batch-shipyard.git
cd batch-shipyard
./install.sh
nano recipes/TensorFlow-CPU/config/credentials.yaml
# edit required properties in file and save
export SHIPYARD_CONFIGDIR=recipes/TensorFlow-CPU/config
nano $SHIPYARD_CONFIGDIR/credentials.yaml
# edit required properties in file and save
./shipyard pool add
./shipyard jobs add --tail stdout.txt
```

View file

@ -19,6 +19,10 @@ Batch Shipyard jobs and tasks configuration
Batch Shipyard remote filesystem configuration. This configuration is
entirely optional unless using the remote filesystem capabilities of
Batch Shipyard.
6. [Monitoring](16-batch-shipyard-configuration-monitor.md) -
Batch Shipyard resource monitoring configuration. This configuration is
entirely optional unless using the resource monitoring capabilities of
Batch Shipyard.
Note that all potential properties are described here and that specifying
all such properties may result in invalid configuration as some properties

View file

@ -99,6 +99,12 @@ credentials:
filename: some/path/token.cache
credentials_secret_id: https://<vault_name>.vault.azure.net/secrets/<secret_id>
uri: https://<vault_name>.vault.azure.net/
monitoring:
grafana:
admin:
username: grafana_username
password: grafana_user_password
password_keyvault_secret_id: https://<vault_name>.vault.azure.net/secrets/<secret_id>
```
## Details
@ -231,13 +237,15 @@ public repositories on Docker Hub or Singularity Hub. However, this is
required if pulling from authenticated private registries such as a secured
Azure Container Registry or private repositories on Docker Hub.
* (optional) `hub` defines the login property to Docker Hub. This is only
required for private repos on Docker Hub.
* (optional) `username` username to log in to Docker Hub
* (optional) `password` password associated with the username
* (optional) `password_keyvault_secret_id` property can be used to
reference an Azure KeyVault secret id. Batch Shipyard will contact the
specified KeyVault and replace the `password` value as returned by
Azure KeyVault.
required for private repos on Docker Hub.
* (required) `username` username to log in to Docker Hub
* (required unless `password_keyvault_secret_id` is specified)
`password` password associated with the username
* (required unless `password` is specified)
`password_keyvault_secret_id` property can be used to
reference an Azure KeyVault secret id. Batch Shipyard will contact
the specified KeyVault and replace the `password` value as returned
by Azure KeyVault.
* (optional) `myserver-myorg.azurecr.io` is an example property that
defines a private container registry to connect to. This is an example to
connect to the [Azure Container Registry service](https://azure.microsoft.com/services/container-registry/).
@ -247,12 +255,14 @@ Azure Container Registry or private repositories on Docker Hub.
`global_resources`:`additional_registries`:`docker`,
`global_resources`:`additional_registries`:`singularity` in the global
configuration.
* (optional) `username` username to log in to this registry
* (optional) `password` password associated with this username
* (optional) `password_keyvault_secret_id` property can be used to
reference an Azure KeyVault secret id. Batch Shipyard will contact the
specified KeyVault and replace the `password` value as returned by
Azure KeyVault.
* (required) `username` username to log in to this registry
* (required unless `password_keyvault_secret_id` is specified)
`password` password associated with the username
* (required unless `password` is specified)
`password_keyvault_secret_id` property can be used to
reference an Azure KeyVault secret id. Batch Shipyard will contact
the specified KeyVault and replace the `password` value as returned
by Azure KeyVault.
### Management: `management`
* (optional) The `management` property defines the required members for
@ -284,6 +294,19 @@ Please refer to the
for more information regarding `*_keyvault_secret_id` properties and how
they are used for credential management with Azure KeyVault.
### Resource Monitoring: `monitoring`
* (optional) `grafana` configures the Grafana login for the resource
monitoring virtual machine
* (required) `admin` is the administrator login
* (required) `username` is the administrator login username
* (required unless `password_keyvault_secret_id` is specified)
`password` is the administrator login password
* (required unless `password` is specified)
`password_keyvault_secret_id` property can be used to
reference an Azure KeyVault secret id. Batch Shipyard will contact
the specified KeyVault and replace the `password` value as returned
by Azure KeyVault.
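For example, a minimal sketch of a credentials snippet that sources the
Grafana admin password from KeyVault instead of storing it inline (the vault
name and secret id are placeholders):
```yaml
credentials:
  monitoring:
    grafana:
      admin:
        username: grafana_username
        # replaced at runtime with the secret value fetched from KeyVault
        password_keyvault_secret_id: https://<vault_name>.vault.azure.net/secrets/<secret_id>
```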
## <a name="non-public"></a>Non-Public Azure Regions
To connect to non-public Azure regions, you will need to ensure that your
credentials configuration is populated with the correct `authority_url` and

View file

@ -128,8 +128,7 @@ pool_specification:
cadvisor:
enabled: false
port: 8080
options:
- -docker_only
options: []
```
The `pool_specification` property has the following members:

View file

@ -102,6 +102,11 @@ remote_fs:
- p10-disk1b
filesystem: btrfs
raid_level: 0
prometheus:
node_exporter:
enabled: false
port: 9100
options: []
```
## Details
@ -370,13 +375,32 @@ The number of entries in this map must match the `vm_count`.
to expand the number of disks in the array in the future, you must
use `btrfs` as the filesystem. At least two disks per virtual
machine are required for RAID-0.
* (optional) `prometheus` properties are to control if collectors for metrics
to export to [Prometheus](https://prometheus.io/) monitoring are enabled.
Note that none of the exporters have their ports exposed to the internet by
default. This means that the Prometheus instance itself must reside
on, or be peered with, the virtual network that the storage cluster is in.
This ensures that external parties cannot scrape exporter metrics from
storage cluster VMs.
* (optional) `node_exporter` contains options for the
[Node Exporter](https://github.com/prometheus/node_exporter) metrics
exporter.
* (optional) `enabled` property enables or disables this exporter.
Default is `false`.
* (optional) `port` is the port Prometheus connects to for scraping.
This is the internal port on the storage cluster VM.
* (optional) `options` is a list of options to pass to the
node exporter instance running on all nodes. The following
collectors are force disabled, in addition to others disabled by
default: textfile, wifi, xfs, zfs. The nfs collector is enabled
automatically if the file server is NFS.
## Remote Filesystems with Batch Shipyard Guide
Please see the [full guide](65-batch-shipyard-remote-fs.md) for information
on how this feature works in Batch Shipyard.
## Full template
A full template of a credentials file can be found
A full template of a RemoteFS configuration file can be found
[here](https://github.com/Azure/batch-shipyard/tree/master/config_templates).
Note that these templates cannot be used as-is and must be modified to fit
your scenario.

View file

@ -0,0 +1,187 @@
# Batch Shipyard Resource Monitoring Configuration
This page contains in-depth details on how to configure the resource
monitoring configuration file for Batch Shipyard.
## Schema
The monitoring schema is as follows:
```yaml
monitoring:
location: <Azure region, e.g., eastus>
resource_group: my-prom-server-rg
hostname_prefix: prom
ssh:
username: shipyard
ssh_public_key: /path/to/rsa/publickey.pub
ssh_public_key_data: ssh-rsa ...
ssh_private_key: /path/to/rsa/privatekey
generated_file_export_path: null
public_ip:
enabled: true
static: false
virtual_network:
name: myvnet
resource_group: my-vnet-resource-group
existing_ok: false
address_space: 10.0.0.0/16
subnet:
name: my-server-subnet
address_prefix: 10.0.0.0/24
network_security:
ssh:
- '*'
grafana:
- 1.2.3.0/24
- 2.3.4.5
prometheus:
- 2.3.4.5
vm_size: STANDARD_D2_V2
accelerated_networking: false
services:
resource_polling_interval: 15
lets_encrypt:
enabled: true
use_staging_environment: true
prometheus:
port: 9090
scrape_interval: 10s
grafana:
additional_dashboards: null
```
The `monitoring` property has the following members:
* (required) `location` is the Azure region name for the resources, e.g.,
`eastus` or `northeurope`. The `location` specified must be in the same
region as your Azure Batch account if monitoring compute pools and in the
same region as your storage clusters if monitoring RemoteFS resources.
* (required) `resource_group` is the resource group to use for the
monitoring resource.
* (required) `hostname_prefix` is the DNS label prefix to apply to each
virtual machine and resource allocated for the monitoring resource. It should
be unique.
* (required) `ssh` is the SSH admin user to create on the machine. This is not
optional in this configuration as it is in the pool specification. If you are
running Batch Shipyard on Windows, please refer to
[these instructions](85-batch-shipyard-ssh-docker-tunnel.md#ssh-keygen)
on how to generate an SSH keypair for use with Batch Shipyard.
* (required) `username` is the admin user to create on all virtual machines
* (optional) `ssh_public_key` is the path to a pre-existing ssh public
key to use. If this is not specified, an RSA public/private key pair will
be generated for use in your current working directory (with a
non-colliding name for auto-generated SSH keys for compute pools, i.e.,
`id_rsa_shipyard_remotefs`). On Windows only, if this option is not
specified, the SSH keys are not auto-generated (unless `ssh-keygen.exe`
can be invoked in the current working directory or is in `%PATH%`).
This option cannot be specified with `ssh_public_key_data`.
* (optional) `ssh_public_key_data` is the raw RSA public key data in
OpenSSH format, e.g., a string starting with `ssh-rsa ...`. Only one
key may be specified. This option cannot be specified with
`ssh_public_key`.
* (optional) `ssh_private_key` is the path to an existing SSH private key
to use against either `ssh_public_key` or `ssh_public_key_data` for
connecting to the resource monitoring virtual machine and performing
operations that require SSH. This option should only be
specified if either `ssh_public_key` or `ssh_public_key_data` are
specified.
* (optional) `generated_file_export_path` is an optional path specifying
where to create the RSA public/private key pair.
* (optional) `public_ip` are public IP properties for the virtual machine.
* (optional) `enabled` designates if public IPs should be assigned. The
default is `true`. Note that if public IP is disabled, then you must
create an alternate means for accessing the resource monitor virtual
machine through a "jumpbox" on the virtual network. If this property
is set to `false` (disabled), then any action requiring SSH, or the
SSH command itself, will occur against the private IP address of the
virtual machine.
* (optional) `static` specifies whether static public IPs should be
assigned to each virtual machine allocated. The default is `false`, which
results in dynamic public IP addresses. A "static" FQDN will be provided
per virtual machine if public IPs are enabled, regardless of this
setting.
* (required) `virtual_network` is the virtual network to use for the
resource monitor.
* (required) `name` is the virtual network name
* (optional) `resource_group` is the resource group for the virtual
network. If this is not specified, the resource group name falls back
to the resource group specified in the resource monitor.
* (optional) `existing_ok` allows use of a pre-existing virtual network.
The default is `false`.
* (required if creating, optional otherwise) `address_space` is the
allowed address space for the virtual network.
* (required) `subnet` specifies the subnet properties. This subnet should
be exclusive to the resource monitor and cannot be shared with other
resources, including Batch compute nodes. Batch compute nodes and storage
clusters can co-exist on the same virtual network, but should be in
separate subnets.
* (required) `name` is the subnet name.
* (required) `address_prefix` is the subnet address prefix to use for
allocating the resource monitor virtual machine.
* (required) `network_security` defines the network security rules to apply
to the resource monitoring virtual machine.
* (required) `ssh` is the rule for which address prefixes to allow for
connecting to sshd port 22 on the virtual machine. In the example, `"*"`
allows any IP address to connect. This is an array property which allows
multiple address prefixes to be specified.
* (optional) `grafana` rule allows the Grafana HTTPS (443) server port to
be exposed to the specified address prefix. Multiple address prefixes
can be specified.
* (optional) `prometheus` rule allows the Prometheus server port to be
exposed to the specified address prefix. Multiple address prefixes
can be specified.
* (optional) `custom_inbound_rules` are custom inbound rules for other
services that you need to expose.
* (required) `<rule name>` is the name of the rule; the example uses
`myrule`. Each rule name should be unique.
* (required) `destination_port_range` is the port or port range on the
virtual machine that will be exposed, e.g., `5000-5001`. This can be a
single port and should be specified as a string (see the sketch after
this list).
* (required) `source_address_prefix` is an array of address
prefixes to allow.
* (required) `protocol` is the protocol to allow. Valid values are
`tcp`, `udp` and `*` (which means any protocol).
* (required) `vm_size` is the virtual machine instance size to use.
* (optional) `accelerated_networking` enables or disables
[accelerated networking](https://docs.microsoft.com/azure/virtual-network/create-vm-accelerated-networking-cli).
The default is `false` if not specified.
* (required) `services` defines the behavior of the services that run on
the monitoring resource virtual machine.
* (optional) `resource_polling_interval` is the polling interval in
seconds for monitored resource discovery. The default is `15` seconds.
* (optional) `lets_encrypt` defines options for enabling
[Let's Encrypt](https://letsencrypt.org/) on the
[nginx](https://www.nginx.com/) reverse proxy for TLS encryption. This
can only be enabled if the `public_ip` is enabled.
* (required) `enabled` controls if Let's Encrypt is enabled or not.
The default is `true`.
* (optional) `use_staging_environment` forces the certificate request
to happen against Let's Encrypt's staging servers. Although this
will still enable TLS encryption, the staging CA is not trusted, so
warnings will appear in most browsers when attempting to connect to
the service endpoints on the resource monitoring VM. This is useful
to ensure your configuration is correct before switching to a
production certificate. The default is `true`.
* (optional) `prometheus` configures the Prometheus server endpoint on the
resource monitoring VM. Note that it is not required to define this
section. If it is omitted, then the Prometheus server is not exposed.
* (optional) `port` is the port to use. If this value is omitted,
the Prometheus server is not exposed.
* (optional) `scrape_interval` is the collector scrape interval to
use. The default is `10s`. Note that valid values are Prometheus
[duration strings](https://prometheus.io/docs/prometheus/latest/configuration/configuration/#%3Cduration%3E).
* (optional) `grafana` configures the Grafana endpoint on the resource
monitoring VM.
* (optional) `additional_dashboards` is a dictionary of additional
Grafana dashboards to provision. The format of the dictionary is
`filename.json: URL`. For example,
`my_custom_dash.json: https://some.url`.
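To make two of the optional properties above concrete, below is a hedged
sketch of a custom inbound rule and an additional Grafana dashboard (the
rule name, ports, address prefixes, dashboard file name and URL are all
placeholders):
```yaml
network_security:
  # ... ssh/grafana/prometheus rules as above
  custom_inbound_rules:
    myrule:
      destination_port_range: 5000-5001
      protocol: tcp
      source_address_prefix:
        - 5.6.7.0/24
services:
  # ... other service settings as above
  grafana:
    additional_dashboards:
      my_custom_dash.json: https://some.url/my_custom_dash.json
```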
## Resource Monitoring with Batch Shipyard Guide
Please see the [full guide](66-batch-shipyard-resource-monitoring.md) for
information on how this feature works in Batch Shipyard.
## Full template
A full template of a resource monitoring configuration file can be found
[here](https://github.com/Azure/batch-shipyard/tree/master/config_templates).
Note that these templates cannot be used as-is and must be modified to fit
your scenario.

View file

@ -20,9 +20,9 @@ you can invoke as:
shipyard.cmd
```
If you installed manually (i.e., did not use the installer scripts), then
you will need to invoke the Python interpreter and pass the script as an
argument. For example:
If you installed manually (i.e., took the non-recommended installation path
and did not use the installer scripts), then you will need to invoke the
Python interpreter and pass the script as an argument. For example:
```
python3 shipyard.py
```
@ -55,6 +55,8 @@ shipyard <command> <subcommand> <options>
For instance:
```shell
shipyard pool add --configdir config
# or equivalent in Linux for this particular command
SHIPYARD_CONFIGDIR=config shipyard pool add
```
Would create a pool on the Batch account as specified in the config files
found in the `config` directory. Please note that `<options>` must be
@ -90,6 +92,7 @@ These options must be specified after the command and sub-command. These are:
--fs TEXT RemoteFS config file
--pool TEXT Pool config file
--jobs TEXT Jobs config file
--monitor TEXT Resource monitoring config file
--subscription-id TEXT Azure Subscription ID
--keyvault-uri TEXT Azure KeyVault URI
--keyvault-credentials-secret-id TEXT
@ -148,6 +151,8 @@ current working directory (i.e., `.`).
* `--jobs path/to/jobs.yaml` is required for job-related actions.
* `--fs path/to/fs.yaml` is required for fs-related actions and some pool
actions.
* `--monitor path/to/monitor.yaml` is required for resource monitoring
actions.
* `--subscription-id` is the Azure Subscription Id associated with the
Batch account or Remote file system resources. This is only required for
creating pools with a virtual network specification or with `fs` commands.
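For instance, a sketch of a resource monitoring invocation combining these
options, with all configuration files in a directory named `config` (paths
are placeholders):
```shell
# use the config directory for credentials/global config and
# explicitly point at the monitoring config file
shipyard monitor create --configdir config --monitor config/monitor.yaml
```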
@ -183,6 +188,7 @@ instead:
* `SHIPYARD_POOL_CONF` in lieu of `--pool`
* `SHIPYARD_JOBS_CONF` in lieu of `--jobs`
* `SHIPYARD_FS_CONF` in lieu of `--fs`
* `SHIPYARD_MONITOR_CONF` in lieu of `--monitor`
* `SHIPYARD_SUBSCRIPTION_ID` in lieu of `--subscription-id`
* `SHIPYARD_KEYVAULT_URI` in lieu of `--keyvault-uri`
* `SHIPYARD_KEYVAULT_CREDENTIALS_SECRET_ID` in lieu of
@ -198,8 +204,7 @@ instead:
* `SHIPYARD_AAD_CERT_THUMBPRINT` in lieu of `--aad-cert-thumbprint`
## Commands
`shipyard` (and `shipyard.py`) script contains the following top-level
commands:
`shipyard` has the following top-level commands:
```
account Batch account actions
cert Certificate actions
@ -209,6 +214,7 @@ commands:
jobs Jobs actions
keyvault KeyVault actions
misc Miscellaneous actions
monitor Monitoring actions
pool Pool actions
storage Storage actions
```
@ -384,8 +390,8 @@ storage cluster to perform actions against.
its subnets
* `--generate-from-prefix` will attempt to generate all resource names
using the naming conventions employed. This is helpful when there was an issue with
cluster deletion and the original virtual machine(s) resources can no
longer by enumerated. Note that OS disks and data disks cannot be
cluster creation/deletion and the original virtual machine(s) resources
cannot be enumerated. Note that OS disks and data disks cannot be
deleted with this option. Please use `fs disks del` to delete disks
that may have been used in the storage cluster.
* `--no-wait` does not wait for deletion completion. It is not recommended
@ -580,6 +586,55 @@ or has run the specified task
attempt to find a suitable TensorFlow image from Docker images in the
global resource list or will acquire one on demand for this command.
## `monitor` Command
The `monitor` command has the following sub-commands:
```
add Add a resource to monitor
create Create a monitoring resource
destroy Destroy a monitoring resource
list List all monitored resources
remove Remove a resource from monitoring
ssh Interactively login via SSH to monitoring...
start Starts a previously suspended monitoring...
suspend Suspend a monitoring resource
```
* `add` will add a resource to be monitored by an existing monitoring VM
* `--poolid` will add the specified Batch pool to monitoring
* `--remote-fs` will add the specified RemoteFS cluster to monitoring
* `create` will create a monitoring resource VM
* `destroy` will destroy a monitoring resource VM
* `--delete-resource-group` will delete the entire resource group that
contains the monitoring resource. Please take care when using this
option as every resource in the resource group is deleted, which may
include other resources that are not Batch Shipyard related.
* `--delete-virtual-network` will delete the virtual network and all of
its subnets
* `--generate-from-prefix` will attempt to generate all resource names
using the naming conventions employed. This is helpful when there was an
issue with monitoring creation/deletion and the original virtual machine
resources cannot be enumerated. Note that OS disks cannot be deleted with
this option. Please use an alternate means (e.g., the Azure Portal) to
delete disks that may have been used by the monitoring VM.
* `--no-wait` does not wait for deletion completion. It is not recommended
to use this parameter.
* `list` will list all monitored resources
* `remove` will remove a monitored resource from an existing monitoring VM
* `--all` will remove all resources that are currently monitored
* `--poolid` will remove the specified Batch pool from monitoring
* `--remote-fs` will remove the specified RemoteFS cluster from monitoring
* `ssh` will interactively log into the monitoring resource virtual machine
via SSH.
* `COMMAND` is an optional argument to specify the command to run. If your
command has switches, preface `COMMAND` with double dash as per POSIX
convention, e.g., `monitor ssh -- sudo docker ps -a`.
* `--tty` allocates a pseudo-terminal
* `start` will start a previously suspended monitoring VM
* `--no-wait` does not wait for the restart to complete. It is not
recommended to use this parameter.
* `suspend` suspends a monitoring VM
* `--no-wait` does not wait for the suspension to complete. It is not
recommended to use this parameter.
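As an illustrative end-to-end sequence (the pool id is a placeholder):
```shell
# provision the monitoring resource VM
shipyard monitor create
# begin monitoring an existing Batch pool
shipyard monitor add --poolid mypool
# enumerate monitored resources
shipyard monitor list
# stop monitoring the pool
shipyard monitor remove --poolid mypool
# tear down the monitoring resource VM
shipyard monitor destroy
```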
## `pool` Command
The `pool` command has the following sub-commands:
```

View file

@ -242,3 +242,8 @@ following:
`batch.node.ubuntu 16.04` as the `node_agent` value. You can view a
complete list of supported node agent sku ids with the `pool listskus`
command.
### ARM Image Retention Requirements
Ensure that the ARM image exists for the lifetime of any pool referencing
the custom image. Failure to do so can result in pool allocation failures
and/or resize failures.

View file

@ -65,6 +65,13 @@ to fit the desired number of target dedicated and low priority compute nodes.
Note that this calculation does not consider autoscale where the number of
nodes can exceed the specified targets.
### Forced Tunneling and User-Defined Routes
If you are redirecting Internet-bound traffic from the subnet back to
on-premises, then you may have to add
[user-defined routes](https://docs.microsoft.com/azure/virtual-network/virtual-networks-udr-overview)
to that subnet. Please follow the instructions in this
[document](https://docs.microsoft.com/azure/batch/batch-virtual-network#user-defined-routes-for-forced-tunneling).
## Network Security
Azure provides a resource called a Network Security Group that allows you
to define security rules to restrict inbound and outbound network traffic

View file

@ -69,7 +69,7 @@ fault domains of the GlusterFS servers
* Automatic volume mounting of remote filesystems into a Docker container
executed through Batch Shipyard
## Overview and Mental Model
## Mental Model
A Batch Shipyard provisioned remote filesystem is built on top of different
resources in Azure. These resources are from networking, storage and
compute. To more readily explain the concepts that form a Batch Shipyard
@ -174,10 +174,6 @@ explanation of each remote filesystem and storage cluster configuration
option. Please see [this page](20-batch-shipyard-usage.md) for documentation
on `fs` command usage.
You can find information regarding User Subscription Batch accounts and how
to create them at this
[blog post](https://docs.microsoft.com/azure/batch/batch-account-create-portal#user-subscription-mode).
## Sample Recipes
Sample recipes for RemoteFS storage clusters of NFS and GlusterFS types can
be found in the

View file

@ -0,0 +1,260 @@
# Resource Monitoring with Batch Shipyard
The focus of this article is to explain how to provision a resource monitor
for monitoring Batch pools and RemoteFS clusters.
<img src="https://azurebatchshipyard.blob.core.windows.net/github/66-container_metrics.png" alt="dashboard" width="1024" />
## Overview
For many scenarios, it is often desirable to have visibility into a set of
machines to gain insights through certain metrics over time. A global
monitoring resource is valuable to peer into per-machine and aggregate
metrics for Batch processing workloads as jobs are processed for measurements
such as CPU, memory and network usage. As Batch Shipyard's execution model
is based on containers, insights into container behavior are also desirable
in addition to host-level metrics.
Creating a monitoring system that can monitor ephemeral resources such
as Batch nodes that may autoscale up or down at any moment and across
disparate resources such as Batch pools and RemoteFS clusters can be
challenging. Securing these resources adds additional complexity.
Fortunately, Batch Shipyard has commands that can help set up such
monitoring resources quickly.
## Major Features
* Supports monitoring Azure Batch Pools and Batch Shipyard provisioned
storage clusters
* Automatic service discovery of compute nodes and RemoteFS VMs capable of
adding and removing monitored resources even through Batch pool
autoscale/resize and storage cluster resizes
* Automated installs of all required collectors and services on supported
resources, including Batch pools and RemoteFS VMs
* Fully automated setup of nginx reverse proxy to Grafana (and optionally
Prometheus server) with automatic provisioning of Let's Encrypt TLS
certificates for encrypted HTTP access
* Automatic setup of network security rules for exposed services
* Rich default dashboard for monitoring Batch Shipyard resources
out-of-the-box
* Support for monitoring resource VM suspension (deallocation) and restart
* Support for accelerated networking, boot diagnostics and serial console
access
* Automatic SSH keypair provisioning and setup
## Mental Model
A Batch Shipyard provisioned monitoring resource is built on top of different
resources in Azure. To more readily explain the concepts that form a Batch
Shipyard monitoring resource, let's start with a high-level conceptual
layout of all of the components and possible interacting actors.
```
+-------------+ +------------------------+
| | | |
| Azure Batch | | Azure Resource Manager |
| | | |
+---------^---+ +----^-------------------+
| |
| |
+-------------------------------------------------------------------------------------+
| | | |
| |-----------------------------------------------------| |
| | | | | |
| | --------------------------------------------------- | |
| | | | | | | +---------------------+ |
+---------+ | | | +-----------+ | MSI | MSI | | | +-----------------+ | |
| | | | | | | | | | | | | | | |
| Let's | | | | | Let's | +-+-----------+--+ | | | | Batch Shipyard | | |
| Encrypt <----------+ Encrypt | | | | | | | RemoteFS VM Y | | |
| CA | | | | | TLS Certs | | Batch Shipyard | | | | | | | |
| | | | | | | | Heimdall | | | | +---------------+ | | |
+---------+ | | | +----+------+ | | +------------> Node Exporter | | | |
| | | | +-------+--------+ | | | | +---------------+ | | |
| | | | | | | | | | | | |
| | | +-----v--+ | | | | | +------------+ | | |
| | | | | | | | | | | Private IP | | | |
| | | | nginx | +-----------+ | Automated | | | | | 10.2.0.4 | | | |
| | | | | | | | Service | | | | +------------+----+ | |
| | | +------+ | | Grafana | | Discovery | | | | Subnet C | |
+---------+ | | | | Port +-----> | | | | | | 10.2.0.0/24 | |
| +---------> 443 | | +--------+--+ | | | | +---------------------+ |
| Web | | | | +------+ | | | | | | |
| Browser | | | | | Port | | +--v-------v-----+ | | | +---------------------+ |
| +---------> 9090 +-----------> | | | | | +-----------------+ | |
+---------+ | | | +------+ | | Prometheus +--------+ | | | | | | |
| | | | | | | | | | | Azure Batch | | |
| | | +--------+ +---------+------+ | | | | Compute Node X | | |
| | | | | | | | | | |
| | | | | | | +---------------+ | | |
| | +-----------+------------+ | | | +----> Node Exporter | | | |
| | | Public IP | Private IP | +-----------------------+ | +----------+----+ | | |
| | | 1.2.3.4 | 10.0.0.4 | | | +----> cAdvisor | | | |
| | +-----------+------------+------------------------+ | | +----------+ | | |
| | Subnet A | | | | | |
| | 10.0.0.0/24 | | +------------+ | | |
| +-----------------------------------------------------+ | | Private IP | | | |
| | | 10.1.0.4 | | | |
| | +------------+----+ | |
| | Subnet B | |
| Virtual Network | 10.1.0.0/24 | |
| 10.0.0.0/8 +---------------------+ |
+-------------------------------------------------------------------------------------+
```
The base layer for all of the resources within a monitoring resource is
an Azure Virtual Network. This virtual network can be shared
amongst other network-level resources such as network interfaces. The virtual
network can be "partitioned" into sub-address spaces through the use of
subnets. In the example above, we have three subnets where
`Subnet A 10.0.0.0/24` hosts the resource monitor,
`Subnet B 10.1.0.0/24` contains a pool of Azure Batch compute nodes to
monitor, and `Subnet C 10.2.0.0/24` contains a Batch Shipyard RemoteFS
cluster to monitor. No resource in `Subnet B` or `Subnet C` is strictly
required for the Batch Shipyard monitoring resource to work, although you
will want either one or the other at the minimum so you have some resource
to monitor.
When provisioning Batch pools or RemoteFS storage clusters, you are able
to specify Prometheus-compatible collectors to install. If configured,
Batch Shipyard takes care of installing these packages on the resources,
which are then immediately ready to be scraped by the Prometheus server.
When the resource monitor virtual machine is created, the bootstrap
process automatically contacts the Let's Encrypt CA to provision TLS
certificates for nginx. Nginx is configured to reverse proxy requests to
Grafana over the standard HTTPS port (443) and, optionally, to the Prometheus
server on the specified port. Grafana is automatically provisioned with
the correct data source and a rich default dashboard for monitoring Batch
Shipyard resources. Internally, a Batch Shipyard process runs alongside
Grafana and the Prometheus server to enumerate any resources that have
been specified to monitor. The "Batch Shipyard Heimdall" container
encapsulates this functionality by either querying the Azure Batch service
or Azure Resource Manager endpoints for the requested resources to monitor.
No sensitive credentials are passed to the resource monitoring virtual
machine. Instead, Batch Shipyard Heimdall uses Azure MSI to authenticate
with Azure Active Directory with least user privilege (LUP) to enumerate the
specified resources to monitor. This information is then used to populate
Prometheus service discovery. Once the Prometheus server begins to scrape
metrics, then this data is available for visualization in Grafana.
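As a conceptual sketch only (Batch Shipyard's actual discovery mechanism and
label set are internal to Heimdall), Prometheus file-based service discovery
consumes target lists of roughly this shape, rewritten as monitored
resources come and go:
```yaml
# hypothetical file_sd target list; IPs mirror the diagram above
- targets:
    - 10.1.0.4:9100   # Node Exporter on an Azure Batch compute node
    - 10.1.0.4:8080   # cAdvisor on the same compute node
    - 10.2.0.4:9100   # Node Exporter on the RemoteFS VM
  labels:
    job: batch-shipyard   # hypothetical job label
```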
## Configuration
In order to enable resource monitoring, a few configuration changes must be
made. You must enable a resource or set of resources to be monitored and
then create the monitoring resource.
### Monitored Resource Configuration
Batch pools and RemoteFS storage clusters can be monitored. The sections
below explain the configuration required to enable each.
#### Pool Configuration
The following is a sample snippet for a Batch pool to be monitored. Note that
this configuration must be applied prior to creation.
```yaml
pool_specification:
# ... other settings
virtual_network:
# virtual network settings must be set
prometheus:
node_exporter:
enabled: true
cadvisor:
enabled: true
```
A `virtual_network` must be specified so the resource monitor can connect
to the compute nodes in the Batch pool. Please see the
[virtual network guide](64-batch-shipyard-byovnet.md) for more information.
The `prometheus` section enables the Prometheus-compatible collectors to
be automatically installed and configured. For Batch pools, two collectors
are available:
1. [Node Exporter](https://github.com/prometheus/node_exporter)
2. [cAdvisor](https://github.com/google/cadvisor)
It is recommended to enable both of these collectors if utilizing
resource monitoring with Batch pool targets. Other `prometheus` options and
more information can be found in the
[Pool configuration doc](13-batch-shipyard-configuration-pool.md).
#### RemoteFS Configuration
The following is a sample snippet for a RemoteFS storage cluster to be
monitored. Note that this configuration must be applied prior to creation.
```yaml
remote_fs:
# ... other settings
virtual_network:
# virtual network settings must be set
prometheus:
node_exporter:
enabled: true
```
The `prometheus` section enables the Prometheus-compatible collectors to
be automatically installed and configured. Only the
[Node Exporter](https://github.com/prometheus/node_exporter) collector is
currently available for RemoteFS clusters. Other `prometheus` options and
more information can be found in the
[RemoteFS configuration doc](15-batch-shipyard-configuration-fs.md).
### Resource Monitor Configuration
The resource monitoring virtual machine requires configuration to provision.
#### Credentials Configuration
Specifying the Grafana admin credentials is required in the credentials
configuration. Below is a sample:
```yaml
credentials:
# ... other settings
monitoring:
grafana:
admin:
username: admin
password: admin
```
Note that you can also use a KeyVault secret id for the `password` or store
the credentials entirely within KeyVault. Please see the
[credentials](11-batch-shipyard-configuration-credentials.md) configuration
guide for more information.
#### Monitor Configuration
The resource monitor must be configured according to the
[monitor configuration doc](16-batch-shipyard-configuration-monitor.md).
Please refer to that guide for a full explanation of each monitoring
configuration option.
## Usage Documentation
The workflow for standing up a monitoring resource is creation followed by
adding applicable resources to monitor. Below is an example, assuming
monitoring has been properly configured per the guidance in the prior
sections.
```shell
# create a resource monitor
shipyard monitor create
# note the FQDN emitted in the log at the end of the provisioning process
# create a Batch pool where work is to be performed
# this hypothetical pool id is mybatchpool
shipyard pool add
# add the Batch pool above as a resource to monitor
shipyard monitor add --poolid mybatchpool
```
After the monitor is added, you can point your web browser at the
monitoring resource FQDN emitted above. You can remove individual
resources to monitor with the command `shipyard monitor remove`.
Once you have no need for your monitoring resource, you can either suspend
it or remove it altogether.
```shell
# remove the prior Batch pool monitor
shipyard monitor remove --poolid mybatchpool
# destroy the monitoring resource entirely
shipyard monitor destroy
```
Please see [this page](20-batch-shipyard-usage.md) for in-depth documentation
on `monitor` command usage.

View file

@ -63,3 +63,7 @@ underlying VM and host drivers.
* Adding tasks to the same job across multiple, concurrent Batch Shipyard
invocations may result in failure if task ids for these jobs are
auto-generated.
### Monitoring Limitations
* Only Linux Batch pools and RemoteFS clusters can be monitored. Windows
Batch pools are not supported.

View file

@ -18,6 +18,7 @@ pages:
- Pool: 13-batch-shipyard-configuration-pool.md
- Jobs: 14-batch-shipyard-configuration-jobs.md
- RemoteFS: 15-batch-shipyard-configuration-fs.md
- Monitoring: 16-batch-shipyard-configuration-monitor.md
- CLI Commands and Usage: 20-batch-shipyard-usage.md
- Platform Image support: 25-batch-shipyard-platform-image-support.md
- In-Depth Feature Guides:
@ -27,6 +28,7 @@ pages:
- Custom Images for Host Compute Nodes: 63-batch-shipyard-custom-images.md
- Virtual Networks: 64-batch-shipyard-byovnet.md
- Remote Filesystems: 65-batch-shipyard-remote-fs.md
- Resource Monitoring: 66-batch-shipyard-resource-monitoring.md
- Data Movement: 70-batch-shipyard-data-movement.md
- Azure KeyVault for Credential Management: 74-batch-shipyard-azure-keyvault.md
- Credential Encryption: 75-batch-shipyard-credential-encryption.md

View file

@ -194,139 +194,3 @@ mapping:
path:
type: str
required: true
monitoring:
type: map
mapping:
location:
type: str
required: true
resource_group:
type: str
required: true
hostname_prefix:
type: str
required: true
ssh:
type: map
required: true
mapping:
username:
type: str
required: true
ssh_public_key:
type: str
ssh_public_key_data:
type: str
ssh_private_key:
type: str
generated_file_export_path:
type: str
public_ip:
type: map
mapping:
enabled:
type: bool
static:
type: bool
virtual_network:
type: map
required: true
mapping:
name:
type: str
required: true
resource_group:
type: str
existing_ok:
type: bool
address_space:
type: str
subnet:
type: map
mapping:
name:
type: str
required: true
address_prefix:
type: str
required: true
network_security:
type: map
required: true
mapping:
ssh:
type: seq
required: true
sequence:
- type: str
grafana:
type: seq
required: true
sequence:
- type: str
prometheus:
type: seq
sequence:
- type: str
custom_inbound_rules:
type: map
mapping:
regex;([a-zA-Z0-9]+):
type: map
mapping:
destination_port_range:
type: str
required: true
protocol:
type: str
enum: ['*', 'tcp', 'udp']
source_address_prefix:
type: seq
required: true
sequence:
- type: str
vm_size:
type: str
required: true
accelerated_networking:
type: bool
services:
type: map
mapping:
resource_polling_interval:
type: int
lets_encrypt:
type: map
mapping:
enabled:
type: bool
required: true
use_staging_environment:
type: bool
prometheus:
type: map
mapping:
port:
type: int
required: true
scrape_interval:
type: str
grafana:
type: map
mapping:
admin:
type: map
mapping:
user:
type: str
required: true
password:
type: str
required: true
additional_dashboards:
type: map
mapping:
regex;([a-zA-Z0-9]+\.json):
type: str
required: true

View file

@ -205,3 +205,20 @@ mapping:
type: str
uri:
type: str
monitoring:
type: map
mapping:
grafana:
type: map
mapping:
admin:
type: map
required: true
mapping:
username:
type: str
required: true
password:
type: str
password_keyvault_secret_id:
type: str

schemas/monitor.yaml Normal file
View file

@ -0,0 +1,130 @@
desc: Monitoring Configuration Schema
type: map
mapping:
monitoring:
type: map
mapping:
location:
type: str
required: true
resource_group:
type: str
required: true
hostname_prefix:
type: str
required: true
ssh:
type: map
required: true
mapping:
username:
type: str
required: true
ssh_public_key:
type: str
ssh_public_key_data:
type: str
ssh_private_key:
type: str
generated_file_export_path:
type: str
public_ip:
type: map
mapping:
enabled:
type: bool
static:
type: bool
virtual_network:
type: map
required: true
mapping:
name:
type: str
required: true
resource_group:
type: str
existing_ok:
type: bool
address_space:
type: str
subnet:
type: map
mapping:
name:
type: str
required: true
address_prefix:
type: str
required: true
network_security:
type: map
required: true
mapping:
ssh:
type: seq
required: true
sequence:
- type: str
grafana:
type: seq
required: true
sequence:
- type: str
prometheus:
type: seq
sequence:
- type: str
custom_inbound_rules:
type: map
mapping:
regex;([a-zA-Z0-9]+):
type: map
mapping:
destination_port_range:
type: str
required: true
protocol:
type: str
enum: ['*', 'tcp', 'udp']
source_address_prefix:
type: seq
required: true
sequence:
- type: str
vm_size:
type: str
required: true
accelerated_networking:
type: bool
services:
type: map
required: true
mapping:
resource_polling_interval:
type: int
lets_encrypt:
type: map
mapping:
enabled:
type: bool
required: true
use_staging_environment:
type: bool
prometheus:
type: map
mapping:
port:
type: int
scrape_interval:
type: str
grafana:
type: map
mapping:
additional_dashboards:
type: map
mapping:
regex;([a-zA-Z0-9]+\.json):
type: str
required: true

View file

@ -61,8 +61,11 @@ class CliContext(object):
self.yes = False
self.raw = None
self.config = None
self.conf_config = None
self.conf_pool = None
self.conf_jobs = None
self.conf_fs = None
self.conf_monitor = None
# clients
self.batch_mgmt_client = None
self.batch_client = None
@ -122,7 +125,8 @@ class CliContext(object):
self._set_global_cli_options()
self._init_keyvault_client()
self._init_config(
skip_global_config=False, skip_pool_config=True, fs_storage=True)
skip_global_config=False, skip_pool_config=True,
skip_monitor_config=True, fs_storage=True)
_, self.resource_client, self.compute_client, self.network_client, \
self.storage_mgmt_client, _, _ = \
convoy.clients.create_all_clients(self)
@ -130,8 +134,7 @@ class CliContext(object):
convoy.fleet.fetch_storage_account_keys_from_aad(
self.storage_mgmt_client, self.config, fs_storage=True)
self.blob_client, _ = convoy.clients.create_storage_clients()
self._cleanup_after_initialize(
skip_global_config=False, skip_pool_config=True)
self._cleanup_after_initialize()
def initialize_for_monitor(self):
# type: (CliContext) -> None
@ -142,7 +145,8 @@ class CliContext(object):
self._set_global_cli_options()
self._init_keyvault_client()
self._init_config(
skip_global_config=False, skip_pool_config=True, fs_storage=True)
skip_global_config=False, skip_pool_config=True,
skip_monitor_config=False, fs_storage=True)
self.auth_client, self.resource_client, self.compute_client, \
self.network_client, self.storage_mgmt_client, _, _ = \
convoy.clients.create_all_clients(self)
@ -151,8 +155,7 @@ class CliContext(object):
self.storage_mgmt_client, self.config, fs_storage=True)
self.blob_client, self.table_client = \
convoy.clients.create_storage_clients()
self._cleanup_after_initialize(
skip_global_config=False, skip_pool_config=True)
self._cleanup_after_initialize()
def initialize_for_keyvault(self):
# type: (CliContext) -> None
@ -163,9 +166,9 @@ class CliContext(object):
self._set_global_cli_options()
self._init_keyvault_client()
self._init_config(
skip_global_config=True, skip_pool_config=True, fs_storage=False)
self._cleanup_after_initialize(
skip_global_config=True, skip_pool_config=True)
skip_global_config=True, skip_pool_config=True,
skip_monitor_config=True, fs_storage=False)
self._cleanup_after_initialize()
def initialize_for_batch(self):
# type: (CliContext) -> None
@ -176,7 +179,8 @@ class CliContext(object):
self._set_global_cli_options()
self._init_keyvault_client()
self._init_config(
skip_global_config=False, skip_pool_config=False, fs_storage=False)
skip_global_config=False, skip_pool_config=False,
skip_monitor_config=True, fs_storage=False)
_, self.resource_client, self.compute_client, self.network_client, \
self.storage_mgmt_client, self.batch_mgmt_client, \
self.batch_client = \
@ -186,8 +190,7 @@ class CliContext(object):
self.storage_mgmt_client, self.config, fs_storage=False)
self.blob_client, self.table_client = \
convoy.clients.create_storage_clients()
self._cleanup_after_initialize(
skip_global_config=False, skip_pool_config=False)
self._cleanup_after_initialize()
def initialize_for_storage(self):
# type: (CliContext) -> None
@ -198,7 +201,8 @@ class CliContext(object):
self._set_global_cli_options()
self._init_keyvault_client()
self._init_config(
skip_global_config=False, skip_pool_config=False, fs_storage=False)
skip_global_config=False, skip_pool_config=False,
skip_monitor_config=True, fs_storage=False)
# inject storage account keys if via aad
_, _, _, _, self.storage_mgmt_client, _, _ = \
convoy.clients.create_all_clients(self)
@ -206,8 +210,7 @@ class CliContext(object):
self.storage_mgmt_client, self.config, fs_storage=False)
self.blob_client, self.table_client = \
convoy.clients.create_storage_clients()
self._cleanup_after_initialize(
skip_global_config=False, skip_pool_config=False)
self._cleanup_after_initialize()
def _set_global_cli_options(self):
# type: (CliContext) -> None
@ -224,22 +227,18 @@ class CliContext(object):
if self.verbose:
convoy.util.set_verbose_logger_handlers()
def _cleanup_after_initialize(
self, skip_global_config, skip_pool_config):
def _cleanup_after_initialize(self):
# type: (CliContext) -> None
"""Cleanup after initialize_for_* funcs
:param CliContext self: this
:param bool skip_global_config: skip global config
:param bool skip_pool_config: skip pool config
"""
# free conf objects
del self.conf_credentials
del self.conf_fs
if not skip_global_config:
del self.conf_config
if not skip_pool_config:
del self.conf_pool
del self.conf_jobs
del self.conf_config
del self.conf_pool
del self.conf_jobs
del self.conf_monitor
# free cli options
del self.verbose
del self.yes
@ -312,12 +311,13 @@ class CliContext(object):
def _init_config(
self, skip_global_config=False, skip_pool_config=False,
fs_storage=False):
# type: (CliContext, bool, bool, bool) -> None
skip_monitor_config=True, fs_storage=False):
# type: (CliContext, bool, bool, bool, bool) -> None
"""Initializes configuration of the context
:param CliContext self: this
:param bool skip_global_config: skip global config
:param bool skip_pool_config: skip pool config
:param bool skip_monitor_config: skip monitoring config
:param bool fs_storage: adjust storage settings for fs
"""
# reset config
@ -357,6 +357,16 @@ class CliContext(object):
self.conf_fs = CliContext.ensure_pathlib_conf(self.conf_fs)
convoy.validator.validate_config(
convoy.validator.ConfigType.RemoteFS, self.conf_fs)
# set/validate monitoring config
if not skip_monitor_config:
self.conf_monitor = self._form_conf_path(
self.conf_monitor, 'monitor')
if self.conf_monitor is None:
raise ValueError('monitor conf file was not specified')
self.conf_monitor = CliContext.ensure_pathlib_conf(
self.conf_monitor)
convoy.validator.validate_config(
convoy.validator.ConfigType.Monitor, self.conf_monitor)
# fetch credentials from keyvault, if conf file is missing
kvcreds = None
if self.conf_credentials is None or not self.conf_credentials.exists():
@ -405,6 +415,8 @@ class CliContext(object):
self.conf_jobs = CliContext.ensure_pathlib_conf(self.conf_jobs)
if self.conf_jobs.exists():
self._read_config_file(self.conf_jobs)
if not skip_monitor_config:
self._read_config_file(self.conf_monitor)
# adjust settings
convoy.fleet.initialize_globals(convoy.settings.verbose(self.config))
if not skip_global_config:
@ -728,6 +740,19 @@ def fs_option(f):
callback=callback)(f)
def monitor_option(f):
def callback(ctx, param, value):
clictx = ctx.ensure_object(CliContext)
clictx.conf_monitor = value
return value
return click.option(
'--monitor',
expose_value=False,
envvar='SHIPYARD_MONITOR_CONF',
help='Resource monitoring config file',
callback=callback)(f)
def _storage_cluster_id_argument(f):
def callback(ctx, param, value):
return value
@ -787,6 +812,7 @@ def fs_cluster_options(f):
def monitor_options(f):
f = monitor_option(f)
f = _azure_subscription_id_option(f)
return f