Add option to replace the request push infrastructure (Event Grid) with a pull infrastructure (Service Bus Queues). This is useful when scaling is not possible (due to cost constraints, for instance). Co-authored-by: Fabian Salamo <v-fsalam@microsoft.com>
README.md
API Platform Deployment
The API Platform may be deployed using shell scripts. Effort was made to separate core components for two reasons:
- Optional functionality can be chosen
- Connectivity issues can result in failure of service deployment. If this occurs, the failed and subsequent services may be deployed without starting over.
There are three types of files required for deployment:
- setup_env.sh
  - Variable-setting script, which is used for all deployment configuration.
  - This script MUST be edited prior to execution of any deployment scripts.
- deploy_infrastructure.sh
  - Master script that runs each deployment script in sequence.
- Individual component/feature deployment scripts.
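Because every deployment script depends on the values in setup_env.sh, it can help to fail early when a value was left unset. The following is a minimal sketch, not part of the provided scripts. The variable names in the usage comment are hypothetical, and the path to setup_env.sh is an assumption.

```shell
#!/usr/bin/env bash
# Prints the name of every listed variable that is unset or empty.
# Returns 1 if at least one variable is missing, 0 otherwise.
require_vars() {
  local name missing=0
  for name in "$@"; do
    # ${!name} is Bash indirect expansion: the value of the variable named by $name.
    if [ -z "${!name:-}" ]; then
      echo "$name"
      missing=1
    fi
  done
  return "$missing"
}

# Usage (hypothetical variable names; use the ones defined in your setup_env.sh):
#   source InfrastructureDeployment/setup_env.sh
#   require_vars SUBSCRIPTION_ID RESOURCE_GROUP LOCATION || exit 1
```

Running a check like this before any deployment script surfaces a missed edit immediately instead of partway through a deployment.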
Installation Process
Note: Requires Bash 4+.
To quickly get up and running, follow these steps.
- Edit the setup_env.sh file. This is where you configure the deployment.
- From the top-level directory, run the following script. Connection issues and service creation latencies may result in errors. The scripts are designed so that you can rerun them without recreating services that already exist. Some scripts also contain commented-out workarounds that may be of value.
bash InfrastructureDeployment/deploy_infrastructure.sh
- Secure the Istio Gateway. This is optional, but should be completed for production instances. All of these steps are covered in the Istio documentation, but are listed here for convenience. To secure the gateway, follow these steps:
- Get the ingress IP and ports of the Istio gateway:
kubectl get svc istio-ingressgateway -n istio-system
- Generate a root certificate and private key, which are used to sign the server certificate:
openssl req -x509 -sha256 -nodes -days 365 -newkey rsa:2048 -subj '/O=example Inc./CN=example.com' -keyout example.com.key -out example.com.crt
- Create a server certificate and private key signed by example.com.crt (replace httpbin.example.com and the organization):
openssl req -out httpbin.example.com.csr -newkey rsa:2048 -nodes -keyout httpbin.example.com.key -subj "/CN=httpbin.example.com/O=httpbin organization"
openssl x509 -req -days 365 -CA example.com.crt -CAkey example.com.key -set_serial 0 -in httpbin.example.com.csr -out httpbin.example.com.crt
- Create a Kubernetes secret to hold the server’s certificate and private key (the secret must be named istio-ingressgateway-certs in the istio-system namespace):
kubectl create -n istio-system secret tls istio-ingressgateway-certs --key httpbin.example.com.key --cert httpbin.example.com.crt
- Modify the default Istio gateway to use the HTTPS protocol (replace httpbin.example.com):
kubectl apply -f - <<EOF
apiVersion: networking.istio.io/v1alpha3
kind: Gateway
metadata:
  name: ai4e-gateway
spec:
  selector:
    istio: ingressgateway # use istio default ingress gateway
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE
      serverCertificate: /etc/istio/ingressgateway-certs/tls.crt
      privateKey: /etc/istio/ingressgateway-certs/tls.key
    hosts:
    - "*" # replace with API Management URL
EOF
- Deploy your APIs.
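The steps above rely on Bash 4+ and on the kubectl and openssl command-line tools. A preflight check along the following lines can catch a missing tool before a deployment starts. This is a sketch only and is not part of the provided scripts. The function names are hypothetical, and the inclusion of az in the usage comment assumes the deployment scripts call the Azure CLI.

```shell
#!/usr/bin/env bash
# Succeeds when the running shell is Bash 4 or newer.
bash_is_4_plus() {
  [ -n "${BASH_VERSINFO:-}" ] && [ "${BASH_VERSINFO[0]}" -ge 4 ]
}

# Prints the name of each tool (passed as arguments) that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Returns non-zero and reports on stderr if Bash is too old or a tool is missing.
preflight() {
  local missing
  bash_is_4_plus || { echo "Bash 4+ is required" >&2; return 1; }
  missing="$(missing_tools "$@")"
  if [ -n "$missing" ]; then
    # Join the one-per-line tool names with spaces for a single-line message.
    echo "Missing tools: ${missing//$'\n'/ }" >&2
    return 1
  fi
}

# Usage (az is an assumption; kubectl and openssl appear in the steps above):
#   preflight az kubectl openssl || exit 1
```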
Component and feature deployment scripts
Provided that prerequisites have been deployed, component deployment scripts may be run outside of the deploy_infrastructure.sh script. The scripts are typically executed in the following order.
deploy_prerequisites.sh
- Sets the current subscription.
- Creates the infrastructure resource group.
- Creates the Application Insights resource.
- Be sure to copy the instrumentation key.
- Creates the container registry used to house AKS service images.
- Creates the storage account for the required Azure Functions.
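The deployment scripts are described as safe to rerun without recreating services. One common way to get that behavior is to check for a resource before creating it. The sketch below illustrates the pattern for the resource group step. It is not taken from deploy_prerequisites.sh, and the variable names and default values are hypothetical. It uses the Azure CLI commands az group exists and az group create.

```shell
#!/usr/bin/env bash
# Runs the create command only when the check command fails.
# args: description check-command create-command
ensure() {
  local what="$1" check="$2" create="$3"
  if "$check"; then
    echo "exists: $what"
  else
    echo "creating: $what"
    "$create"
  fi
}

# Hypothetical names and defaults; real values would come from setup_env.sh.
RESOURCE_GROUP="${RESOURCE_GROUP:-example-rg}"
LOCATION="${LOCATION:-eastus}"

# az group exists prints "true" or "false".
group_exists() {
  [ "$(az group exists --name "$RESOURCE_GROUP")" = "true" ]
}

group_create() {
  az group create --name "$RESOURCE_GROUP" --location "$LOCATION" >/dev/null
}

# Usage:
#   ensure "resource group $RESOURCE_GROUP" group_exists group_create
```

The same wrapper can be reused for other resources by supplying a different pair of check and create functions.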
deploy_aks.sh
- Creates the resource group needed for AKS.
- Creates the service principal application for AKS.
- Be sure to copy the service principal details and keep them secure.
- Grants the service principal application pull rights on the container registry.
- Creates the AKS service.
- Adds configured node pools to the AKS cluster (CPU or GPU).
- Stores the AKS cluster's credentials in the local ~/.kube/config file.
- Enables the NVIDIA GPU plugin on AKS (GPU only).
customize_aks.sh
- Adds a Kubernetes Dashboard admin account so that an admin can access the dashboard.
- Installs Istio for AKS service mesh utilities and features.
- Downloads Istio.
- Applies the Istio manifest.
- Creates the istio-system namespace in AKS.
- Adds base routing rules.
- Creates an Azure Container Registry role so that AKS can pull the service images.
- Creates the Azure Container Registry AcrPull role assignment for AKS.
deploy_custom_metrics_adapter.sh
The Azure Kubernetes Metrics Adapter is used, in conjunction with Application Insights, to provide scaling on any metric logged to Application Insights.
- Deploys the Azure Kubernetes Metrics Adapter.
- Creates a service principal and secret for the metric adapter.
- Stores the service principal secret in AKS.
deploy_cache_prerequisites.sh
The Cache Manager is the task system used to manage long-running (async) service (API) requests. It is built from custom Azure Functions, Azure Redis Cache, and Azure Event Grid. Due to the complexity of the Cache Manager, the following script order must be maintained:
- deploy_cache_prerequisites.sh
- deploy_event_grid_topic.sh
- deploy_cache_manager.sh
- deploy_backend_webhook_function.sh (unless you are planning to use a TLS HTTPS gateway)
deploy_cache_prerequisites.sh does the following:
- Creates an Azure Redis Cache.
- Creates an Azure Function App Plan to host the execution of the functions.
deploy_event_grid_topic.sh
- Creates the Event Grid Topic.
deploy_cache_manager.sh
- Creates the Cache Manager Function App.
- Configures the Cache Manager Function App settings.
deploy_backend_webhook_function.sh
The backend webhook is an Azure Function that exposes an HTTPS URL and pushes the request to AKS. It is needed when a TLS HTTPS gateway is not used, which is common during development and testing.
- Creates the backend webhook Azure Function App.
- Configures the backend webhook Azure Function App settings.
deploy_request_reporter_function.sh
The request reporter stores and retrieves the number of requests a service is processing at any given moment. The request reporter works in conjunction with the API Framework. An API can be configured with a maximum number of concurrently processed requests. When that limit is reached, an async API puts backpressure on the async task system and a sync API returns a 503. The current number of requests is also logged to Application Insights, which can be used to scale the available service instances in AKS up or down via the Azure Kubernetes Metrics Adapter.
- Creates the request reporter Azure Function App.
- Configures the request reporter Azure Function App settings.
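The limit behavior described for the request reporter can be summarized as a small decision function. This is an illustrative sketch only and is not the API Framework's implementation. The function name and the action labels are hypothetical.

```shell
#!/usr/bin/env bash
# Prints the action for a new request given the current load.
# args: current-request-count max-request-count mode(sync|async)
admission_action() {
  local current="$1" max="$2" mode="$3"
  if [ "$current" -lt "$max" ]; then
    echo "process"      # capacity available: handle the request now
  elif [ "$mode" = "sync" ]; then
    echo "reject-503"   # sync API at its limit: the caller receives a 503
  else
    echo "defer"        # async API at its limit: the task stays queued (backpressure)
  fi
}

# Usage:
#   admission_action 5 5 sync
```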
deploy_task_process_logger_function.sh
The task process logger retrieves the number of tasks being processed and the number of tasks awaiting processing. Used in conjunction with Application Insights, these counts can drive scaling of the available service instances in AKS, up or down, via the Azure Kubernetes Metrics Adapter.
- Creates the task process logger Azure Function App.
- Configures the task process logger Azure Function App settings.
deploy_event_grid_subscription.sh
- Gets the backend webhook Azure Function's secret URL.
- Creates the Event Grid subscription for the backend webhook.
deploy_api_management.sh
- Gets the CacheConnectorGet Azure Function's secret URL.
- Configures the payload required to create an API Management service instance.
- Creates the API Management service instance.
- Creates the TaskManagement API in the API Management service instance.
- Creates the TaskManagement API GET operation.
- Creates the TaskManagement API GET operation's policy.
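Since a failed script and the scripts after it can be deployed without starting over, a small helper can list what remains to be run from a given point. The sketch below uses the order in which the scripts are described in this README, which is an assumption about what deploy_infrastructure.sh actually does. Scripts not described in this README are omitted, and the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Order as described in this README; deploy_infrastructure.sh may differ.
DEPLOY_ORDER=(
  deploy_prerequisites.sh
  deploy_aks.sh
  customize_aks.sh
  deploy_custom_metrics_adapter.sh
  deploy_cache_prerequisites.sh
  deploy_event_grid_topic.sh
  deploy_cache_manager.sh
  deploy_backend_webhook_function.sh
  deploy_request_reporter_function.sh
  deploy_task_process_logger_function.sh
  deploy_event_grid_subscription.sh
  deploy_api_management.sh
)

# Prints the given script and every script after it, one per line.
# Returns 1 if the script is not in DEPLOY_ORDER.
scripts_from() {
  local start="$1" found=1 script
  for script in "${DEPLOY_ORDER[@]}"; do
    if [ "$script" = "$start" ]; then
      found=0
    fi
    if [ "$found" -eq 0 ]; then
      echo "$script"
    fi
  done
  return "$found"
}

# Usage (assumes the scripts can be run individually from the top-level directory):
#   for script in $(scripts_from deploy_cache_manager.sh); do
#     bash "InfrastructureDeployment/$script" || break
#   done
```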
Infrastructure Management
The following is a collection of links and how-tos that will help with common infrastructure operations.
- General Troubleshooting
- Manually scaling an AKS cluster.
- The platform employs two forms of autoscaling by default: pod and cluster.
- Application Insights is used by AI for Earth images to ingest metrics and logs from Python and R into Azure Monitor. Azure Monitor integrates with AKS via Azure Monitor for containers.
- The Custom Metrics Adapter, if deployed, can be used to scale based on any metric passed to Application Insights. By default, the AI for Earth images send the number of denied API requests, which is based on the maximum_concurrent_requests parameter within the function decorator. To add additional metrics, apply a new CustomMetric to AKS, track the metric in your API, and apply an updated HorizontalPodAutoscaler that includes the new metric for each API service.
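For the manual scaling and autoscaling items above, the following sketch wraps a few standard az and kubectl commands. It is not part of the provided scripts. The function names are hypothetical, and the resource group, cluster, namespace, and deployment names are placeholders supplied by the caller. With DRY_RUN=1 (the default in this sketch) each command is printed rather than executed. Note that on clusters with more than one node pool, az aks scale also needs --nodepool-name.

```shell
#!/usr/bin/env bash
# With DRY_RUN=1 (the default here), commands are printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Manually sets the node count of an AKS cluster.
# args: resource-group cluster-name node-count
scale_cluster() {
  run az aks scale --resource-group "$1" --name "$2" --node-count "$3"
}

# Manually sets the replica count of one deployment (pod-level scaling).
# args: namespace deployment replicas
scale_deployment() {
  run kubectl scale deployment "$2" --namespace "$1" --replicas="$3"
}

# Lists HorizontalPodAutoscalers to compare current and target metric values.
list_autoscalers() {
  run kubectl get hpa --all-namespaces
}

# Usage:
#   scale_cluster my-rg my-aks 3
#   DRY_RUN=0 list_autoscalers
```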