Add Application Gateway Ingress Controller (#109)

This puts AGIC as the ingress controller on the Hub. Some consequences:

1. We get a real hostname (planetarycomputer-hub-test.microsoft.com, for
   test)
2. We can drop the `/compute` prefix. The Hub is served at `/hub`.

Note that this change *requires* the upgrade to JupyterHub to make use
of the `ingressClassName` attribute.
This commit is contained in:
Tom Augspurger 2024-05-24 17:39:44 -05:00 committed by GitHub
Parent db2daf3139
Commit 368053e1db
No known key found for this signature
GPG key ID: B5690EEEBB952194
15 changed files with 376 additions and 33 deletions


@ -122,6 +122,11 @@ $ az keyvault secret set --vault-name pc-deploy-secrets --name '<prefix>--<key-n
| pcc--velero-azure-client-id | Set in `velero_credentials.tpl` for backups / migrations |
| pcc--velero-azure-client-secret | Set in `velero_credentials.tpl` for backups / migrations |
```
az keyvault secret set --vault-name pc-deploy-secrets -n pcc-test-jupyterhub-proxy-secret-token --value (openssl rand -hex 32)
```
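The `openssl rand -hex 32` call above yields 32 random bytes encoded as 64 hex characters (the parenthesized command substitution is fish syntax; bash would use `$(...)`). As a rough equivalent, assuming Python is available where the secret is generated, the standard library's `secrets` module produces a value of the same shape. The helper name `make_proxy_secret_token` is hypothetical and not part of this repository.

```python
import secrets


def make_proxy_secret_token(num_bytes=32):
    """Return num_bytes of cryptographically strong randomness as a hex string."""
    return secrets.token_hex(num_bytes)


token = make_proxy_secret_token()
print(len(token))  # 64 hex characters for 32 bytes
```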
## Continuous deployment
This repository deploys to the staging environment on commits to `main`. We deploy to production on tags.
@ -186,6 +191,73 @@ We're able to customize the JupyterHub and jupyterlab UIs following the approach
To test changes to the templates locally, [install jupyterhub](https://jupyterhub.readthedocs.io/en/stable/installation-guide.html) and run it from the root of the project directory, which includes a `jupyterhub_config.py` file. Changes to the template files in `helm/chart/files/etc/jupyterhub/templates/` can be previewed at `localhost:8000`.
## Ingress
This setup uses [Application Gateway Ingress Controller][agic] to serve traffic
over HTTPS without directly exposing the Kubernetes cluster to the internet.
We've chosen to create and manage the Application Gateway *outside* of
Terraform. The Ingress Controller also modifies the gateway as Ingress
routes are added, which would cause ownership conflicts if Terraform managed it too.
```
set -x RESOURCE_GROUP ... # RG with the AKS cluster
set -x KEYVAULT_NAME ... # Keyvault with the TLS cert
set -x VNET_NAME ... # VNET with the AKS cluster
set -x SUBNET_NAME ... # Subnet with the AKS cluster
set -x APPGW_NAME ... # pick whatever
set -x PUBLIC_IP ... # the name of the public IP created by Terraform
set -x MI_NAME pcc-mi # The name of the managed identity created by Terraform
set -x CLUSTER_NAME ... # The name of the AKS cluster
# Derived variables
set -x MI_CLIENT_ID (az identity show -n $MI_NAME -g $RESOURCE_GROUP --query clientId -o tsv)
set -x MI_SCOPE (az identity show -g $RESOURCE_GROUP -n $MI_NAME --query id -o tsv)
set -x RG_ID (az group show -n $RESOURCE_GROUP --query id -o tsv)
```
With these variables set, we can create the Application Gateway and configure
it. If you're deploying from scratch, you'll need to do a `terraform apply`
first with `data.azurerm_application_gateway.pc_compute` disabled, along with
all references to it (e.g. in the AKS cluster).
```
# az keyvault network-rule add --subnet (az network vnet subnet show -n $SUBNET_NAME -g $RESOURCE_GROUP --vnet-name $VNET_NAME --query id -o tsv) -n $KEYVAULT_NAME
az network application-gateway create \
-n $APPGW_NAME \
-g $RESOURCE_GROUP \
--sku Standard_v2 \
--public-ip-address $PUBLIC_IP \
--vnet-name $VNET_NAME \
--subnet $SUBNET_NAME \
--priority 19500
set -x APPGW_ID (az network application-gateway show -n $APPGW_NAME -g $RESOURCE_GROUP -o tsv --query "id")
az role assignment create --role "Network Contributor" --scope (az group show -n $RESOURCE_GROUP --query id -o tsv) --assignee $MI_CLIENT_ID
az role assignment create --role Reader --scope $RG_ID --assignee 89ecce7c-7849-4802-9063-ee22b34609d1
az role assignment create --role Contributor --scope $APPGW_ID --assignee 89ecce7c-7849-4802-9063-ee22b34609d1
```
Now you can add the Ingress Controller to the AKS cluster:
```
terraform apply
```
Finally, we need to ensure that the managed identity has the necessary
permissions to manage the Application Gateway.
```
set -x INGRESS_MI (az aks show -g $RESOURCE_GROUP -n $CLUSTER_NAME --query addonProfiles.ingressApplicationGateway.identity.clientId -o tsv)
az role assignment create --role "Contributor" --scope $APPGW_ID --assignee $INGRESS_MI
az role assignment create --role "Owner" --scope "$MI_SCOPE" --assignee $INGRESS_MI
az keyvault set-policy --name pc-test-deploy-secrets --secret-permissions get --object-id (az identity show -n pcc-mi -g pcc-test-rg --query principalId -o tsv)
```
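The role assignments in this section all share the shape `az role assignment create --role ... --scope ... --assignee ...`. A minimal sketch, assuming you would rather generate those commands from a list than type each one; the function name `role_assignment_cmd` is hypothetical, and the angle-bracket values are placeholders for the IDs gathered in the steps above.

```python
import shlex


def role_assignment_cmd(role, scope, assignee):
    """Build the argument list for one `az role assignment create` call."""
    return [
        "az", "role", "assignment", "create",
        "--role", role,
        "--scope", scope,
        "--assignee", assignee,
    ]


# Placeholder values (assumptions); substitute the real IDs before running anything.
assignments = [
    ("Contributor", "<APPGW_ID>", "<INGRESS_MI>"),
    ("Owner", "<MI_SCOPE>", "<INGRESS_MI>"),
]

for role, scope, assignee in assignments:
    # shlex.join quotes each argument so the printed line is safe to paste into a POSIX shell.
    print(shlex.join(role_assignment_cmd(role, scope, assignee)))
```

The sketch only prints the commands; passing each list to `subprocess.run(..., check=True)` would execute them.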
## Additional References
Many of the concepts used here were learned in deployments at the [pangeo-cloud-federation](https://github.com/pangeo-data/pangeo-cloud-federation) and [2i2c pilot hubs](https://github.com/2i2c-org/pilot-hubs). Those might serve as additional references for how to deploy a Hub.
@ -220,3 +292,4 @@ Any use of third-party trademarks or logos are subject to those third-party's po
[deployment guide]: https://planetarycomputer.microsoft.com/docs/concepts/hub-deployment/
[sp]: https://docs.microsoft.com/en-us/azure/active-directory/develop/app-objects-and-service-principals
[hub-service]: https://jupyterhub.readthedocs.io/en/stable/reference/services.html
[agic]: https://learn.microsoft.com/en-us/azure/application-gateway/ingress-controller-overview


@ -13,9 +13,13 @@ daskhub:
nodeAffinity:
matchNodePurpose: "require"
ingress:
enabled: true
ingressClassName: "azure-application-gateway"
hub:
consecutiveFailureLimit: 0
baseUrl: "/compute/"
baseUrl: "/"
image:
name: pcccr.azurecr.io/jupyterhub/k8s-hub
tag: "2.0.0.post0"
@ -107,8 +111,8 @@ daskhub:
01-add-dask-gateway-values: |
# The daskhub helm chart doesn't correctly handle hub.baseUrl.
# DASK_GATEWAY__PUBLIC_ADDRESS set via terraform
c.KubeSpawner.environment["DASK_GATEWAY__ADDRESS"] = "http://proxy-http:8000/compute/services/dask-gateway/"
c.KubeSpawner.environment["DASK_GATEWAY__PUBLIC_ADDRESS"] = "https://${jupyterhub_host}/compute/services/dask-gateway/"
c.KubeSpawner.environment["DASK_GATEWAY__ADDRESS"] = "http://proxy-http:8000/services/dask-gateway/"
c.KubeSpawner.environment["DASK_GATEWAY__PUBLIC_ADDRESS"] = "https://${jupyterhub_host}/services/dask-gateway/"
templates: |
c.JupyterHub.template_paths.insert(0, "/etc/jupyterhub/templates")
pre_spawn_hook: |
@ -172,9 +176,9 @@ daskhub:
proxy:
https:
enabled: true
letsencrypt:
contactEmail: "taugspurger@microsoft.com"
enabled: false
# letsencrypt:
# contactEmail: "taugspurger@microsoft.com"
chp:
networkPolicy:
@ -272,11 +276,11 @@ daskhub:
dask-gateway:
gateway:
prefix: "/compute/services/dask-gateway"
prefix: "/services/dask-gateway"
auth:
jupyterhub:
apiToken: "{{ tf.jupyterhub_dask_gateway_token }}"
apiUrl: http://proxy-http:8000/compute/hub/api
apiUrl: http://proxy-http:8000/hub/api
affinity:
nodeAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
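
The hunks above drop the `/compute` prefix from every URL derived from `hub.baseUrl`. A small sketch of how those paths relate to the base URL, using the origin and paths that appear in this diff; the helper `hub_url` is hypothetical and not part of the chart.

```python
def hub_url(origin, base_url, path):
    """Join an origin, a JupyterHub base URL, and a relative path."""
    prefix = "/" + base_url.strip("/")
    if prefix == "/":
        # A base URL of "/" contributes no path segment.
        prefix = ""
    return f"{origin.rstrip('/')}{prefix}/{path.lstrip('/')}"


old = hub_url("http://proxy-http:8000", "/compute/", "hub/api")
new = hub_url("http://proxy-http:8000", "/", "hub/api")
print(old)  # http://proxy-http:8000/compute/hub/api
print(new)  # http://proxy-http:8000/hub/api
```

Every consumer of the old prefix (the Dask Gateway addresses, the gateway `prefix`, `apiUrl`, and the OAuth callback URL) has to change together, since they are written out as literal strings rather than derived from `baseUrl`.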

7
helm/tls.yaml Normal file

@ -0,0 +1,7 @@
daskhub:
  jupyterhub:
    ingress:
      enabled: true
      ingressClassName: "azure-application-gateway"
      tls:
        - secretName: "${secret_name}"
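
`helm/tls.yaml` is rendered by Terraform's `templatefile`, which fills in `${secret_name}`. To preview the result without running Terraform, one option is Python's `string.Template`, which uses the same `${name}` placeholder syntax for this simple case (it is not a general replacement for Terraform templating). The nesting in the YAML string below is an assumption about the file's layout, and the secret name is the test environment's value from this commit.

```python
from string import Template

# Assumed layout of helm/tls.yaml; only the placeholder syntax matters for this sketch.
TLS_TEMPLATE = """\
daskhub:
  jupyterhub:
    ingress:
      enabled: true
      ingressClassName: "azure-application-gateway"
      tls:
        - secretName: "${secret_name}"
"""

rendered = Template(TLS_TEMPLATE).substitute(secret_name="planetarycomputer-hub-test")
print(rendered)
```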

85
scripts/agic-cert-secret Executable file

@ -0,0 +1,85 @@
#!/usr/bin/env python3
import base64
import os
import subprocess

import azure.identity
import azure.keyvault.secrets
import kr8s
import argparse


def parse_args(args=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--keyvault-name", required=True)
    parser.add_argument("--namespace", required=True)
    parser.add_argument("--secret-name", required=True)
    return parser.parse_args(args)


def main(args=None):
    args = parse_args(args)
    keyvault_name = args.keyvault_name
    namespace = args.namespace
    secret_name = args.secret_name

    credential = azure.identity.DefaultAzureCredential()
    kv_client = azure.keyvault.secrets.SecretClient(
        f"https://{keyvault_name}.vault.azure.net", credential=credential
    )
    secret = kv_client.get_secret(secret_name)
    decoded = base64.b64decode(secret.value)

    pfx = "certificate.pfx"
    key = "private.key"
    crt = "certificate.crt"

    with open(pfx, "wb") as f:
        f.write(decoded)

    cmd = f"openssl pkcs12 -in {pfx} -nocerts -nodes -passin pass: | openssl rsa -out {key}"
    print("processing certificate")
    subprocess.run(cmd, shell=True)
    subprocess.run(
        [
            "openssl",
            "pkcs12",
            "-in",
            pfx,
            "-clcerts",
            "-nokeys",
            "-passin",
            "pass:",
            "-out",
            crt,
        ]
    )

    print("creating secret")
    subprocess.run(
        [
            "kubectl",
            "-n",
            namespace,
            "create",
            "secret",
            "tls",
            secret_name,
            "--cert",
            crt,
            "--key",
            key,
        ]
    )
    os.remove(pfx)
    os.remove(crt)
    os.remove(key)


if __name__ == "__main__":
    main()
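
The script above writes the PFX, key, and certificate into the current directory and does not check the exit codes of `openssl` or `kubectl`, so a failed step can go unnoticed and leave key material on disk. One possible variation, sketched under the assumption that `openssl` and `kubectl` are on `PATH`: build each command as an argument list, run it with `check=True`, and keep the files in a temporary directory that is removed even on failure. The function names are hypothetical; the `openssl` and `kubectl` flags are the ones the script already uses.

```python
import subprocess
import tempfile
from pathlib import Path


def build_commands(pfx, key, crt, namespace, secret_name):
    """Return the four argument lists the script needs, in execution order."""
    extract_key = ["openssl", "pkcs12", "-in", str(pfx), "-nocerts", "-nodes", "-passin", "pass:"]
    write_key = ["openssl", "rsa", "-out", str(key)]
    extract_crt = [
        "openssl", "pkcs12", "-in", str(pfx),
        "-clcerts", "-nokeys", "-passin", "pass:", "-out", str(crt),
    ]
    create_secret = [
        "kubectl", "-n", namespace, "create", "secret", "tls", secret_name,
        "--cert", str(crt), "--key", str(key),
    ]
    return extract_key, write_key, extract_crt, create_secret


def create_tls_secret(pfx_bytes, namespace, secret_name):
    """Convert a PFX blob to key + cert and create a Kubernetes TLS secret."""
    with tempfile.TemporaryDirectory() as tmp:
        work = Path(tmp)
        pfx = work / "certificate.pfx"
        key = work / "private.key"
        crt = work / "certificate.crt"
        pfx.write_bytes(pfx_bytes)
        extract_key, write_key, extract_crt, create_secret = build_commands(
            pfx, key, crt, namespace, secret_name
        )
        # Replaces the shell pipe: capture the first command's output, feed it to the second.
        pem = subprocess.run(extract_key, check=True, capture_output=True).stdout
        subprocess.run(write_key, input=pem, check=True)
        subprocess.run(extract_crt, check=True)
        subprocess.run(create_secret, check=True)


# Building the commands has no side effects, so they can be inspected before anything runs.
commands = build_commands(Path("certificate.pfx"), Path("private.key"), Path("certificate.crt"),
                          "test", "planetarycomputer-hub-test")
for command in commands:
    print(" ".join(command))
```

Avoiding `shell=True` also means the file names are never interpreted by a shell, which matters less here (they are constants) but keeps the pattern safe if they ever become inputs.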


@ -7,6 +7,12 @@ module "resources" {
# subscription = "Planetary Computer"
apim_resource_id = "/subscriptions/9da7523a-cb61-4c3e-b1d4-afa5fc6d2da9/resourceGroups/pc-manual-resources/providers/Microsoft.ApiManagement/service/planetarycomputer"
# TLS certs
certificate_kv = "pc-deploy-secrets"
certificate_kv_rg = "pc-manual-resources"
certificate_secret_name = "planetarycomputer-hub-staging"
pip_name = "pip-pcc-prod"
appgw_name = "appgw-pcc-prod"
# AKS ----------------------------------------------------------------------
kubernetes_version = null
@ -25,7 +31,7 @@ module "resources" {
# DaskHub ------------------------------------------------------------------
dns_label = "pccompute"
oauth_host = "planetarycomputer"
jupyterhub_host = "pccompute.westeurope.cloudapp.azure.com"
jupyterhub_host = "planetarycomputer-hub.microsoft.com"
user_placeholder_replicas = 1
stac_url = "https://planetarycomputer.microsoft.com/api/stac/v1/"


@ -46,7 +46,12 @@ resource "azurerm_kubernetes_cluster" "pc_compute" {
}
orchestrator_version = var.kubernetes_version
temporary_name_for_rotation = "tmpdefault"
temporary_name_for_rotation = "azlinuxpool"
upgrade_settings {
max_surge = "10%"
}
}
auto_scaler_profile {
@ -56,6 +61,15 @@ resource "azurerm_kubernetes_cluster" "pc_compute" {
scale_down_delay_after_add = "5m"
skip_nodes_with_system_pods = false # ensures system pods don't keep GPU nodes alive
}
ingress_application_gateway {
gateway_id = data.azurerm_application_gateway.pc_compute.id
}
key_vault_secrets_provider {
secret_rotation_enabled = true
}
identity {
type = "SystemAssigned"
}


@ -21,6 +21,7 @@ resource "helm_release" "dhub" {
"${templatefile("../../helm/chart/config.yaml", { oauth_host = var.oauth_host, jupyterhub_host = var.jupyterhub_host, namespace = var.environment, release = local.helm_release_name })}",
"${file("../../helm/jupyterhub_opencensus_monitor.yaml")}",
"${templatefile("../../helm/profiles.yaml", { python_image = var.python_image, r_image = var.r_image, gpu_pytorch_image = var.gpu_pytorch_image, gpu_tensorflow_image = var.gpu_tensorflow_image, qgis_image = var.qgis_image })}",
"${templatefile("../../helm/tls.yaml", { secret_name = var.certificate_secret_name })}"
# workaround https://github.com/hashicorp/terraform-provider-helm/issues/669
]
@ -41,7 +42,7 @@ resource "helm_release" "dhub" {
set {
name = "daskhub.jupyterhub.hub.config.GenericOAuthenticator.oauth_callback_url"
value = "https://${var.jupyterhub_host}/compute/hub/oauth_callback"
value = "https://${var.jupyterhub_host}/hub/oauth_callback"
}
set {
@ -92,7 +93,7 @@ resource "helm_release" "dhub" {
set {
name = "daskhub.jupyterhub.proxy.service.annotations.service\\.beta\\.kubernetes\\.io/azure-dns-label-name"
value = var.dns_label
value = "${var.dns_label}-direct"
}
set {
@ -115,13 +116,10 @@ resource "helm_release" "dhub" {
value = random_password.dask_gateway_api_token.result
}
set {
name = "daskhub.dask-gateway.traefik.service.annotations.service\\.beta\\.kubernetes\\.io/azure-dns-label-name"
value = "${var.dns_label}-dask"
}
}
data "azurerm_storage_account" "pc-compute" {
name = "${replace(local.prefix, "-", "")}storage"
resource_group_name = "${local.prefix}-shared-rg"
provider = azurerm.pc
}


@ -1,6 +1,19 @@
data "azurerm_key_vault" "deploy_secrets" {
name = var.pc_resources_kv
resource_group_name = var.pc_resources_rg
provider = azurerm.pc
}
data "azurerm_key_vault" "certificate" {
name = var.certificate_kv
resource_group_name = var.certificate_kv_rg
}
data "azurerm_key_vault" "test_deploy_secrets" {
name = "pc-test-deploy-secrets"
resource_group_name = "pc-test-manual-resources"
provider = azurerm.pct
}
# JupyterHub
@ -25,3 +38,26 @@ data "azurerm_key_vault_secret" "microsoft_defender_log_analytics_workspace_id"
name = "${local.stack_id}--microsoft-defender-log-analytics-workspace-id"
key_vault_id = data.azurerm_key_vault.deploy_secrets.id
}
data "azurerm_key_vault_certificate" "pccompute" {
name = "planetarycomputer-hub-test"
key_vault_id = data.azurerm_key_vault.test_deploy_secrets.id
}
data "azurerm_client_config" "current" {}
resource "azurerm_key_vault_access_policy" "secret-reader" {
key_vault_id = data.azurerm_key_vault.test_deploy_secrets.id
tenant_id = data.azurerm_client_config.current.tenant_id
object_id = azurerm_kubernetes_cluster.pc_compute.key_vault_secrets_provider[0].secret_identity[0].object_id
secret_permissions = [
"Get",
]
certificate_permissions = [
"Get",
]
key_permissions = [
"Get",
]
}

11
terraform/resources/mi.tf Normal file

@ -0,0 +1,11 @@
resource "azurerm_user_assigned_identity" "pc_compute" {
location = azurerm_resource_group.pc_compute.location
resource_group_name = azurerm_resource_group.pc_compute.name
name = "${local.stack_id}-mi"
}
resource "azurerm_role_assignment" "pccompute" {
scope = data.azurerm_key_vault.deploy_secrets.id
principal_id = azurerm_user_assigned_identity.pc_compute.principal_id
role_definition_name = "Key Vault Certificate User"
}


@ -2,6 +2,22 @@ provider "azurerm" {
features {}
}
provider "azurerm" {
subscription_id = "9da7523a-cb61-4c3e-b1d4-afa5fc6d2da9"
alias = "pc"
features {}
}
provider "azurerm" {
subscription_id = "a84a690d-585b-4c7c-80d9-851a48af5a50"
alias = "pct"
features {}
}
provider "helm" {
# https://dev.to/danielepolencic/getting-started-with-terraform-and-kubernetes-on-azure-aks-3l4d
kubernetes {
@ -58,7 +74,7 @@ terraform {
helm = {
source = "hashicorp/helm"
version = "2.6.0"
version = "2.13.2"
}
}


@ -157,6 +157,29 @@ variable "pc_resources_kv" {
description = "The Azure Key Vault name with pre-configured values."
}
variable "certificate_kv" {
type = string
description = "The name of the Key Vault with the public certificate used for TLS."
}
variable "certificate_kv_rg" {
type = string
description = "The name of the Resource Group with the Key Vault with the public certificate used for TLS."
}
variable "certificate_secret_name" {
type = string
}
variable "pip_name" {
type = string
}
variable "appgw_name" {
type = string
}
variable "user_placeholder_replicas" {
type = number
default = 0


@ -13,25 +13,30 @@ resource "azurerm_subnet" "node_subnet" {
address_prefixes = ["10.1.0.0/16"]
}
resource "azurerm_subnet" "ag_subnet" {
name = "${var.maybe_versioned_prefix}-ag-subnet"
virtual_network_name = azurerm_virtual_network.pc_compute.name
resource_group_name = azurerm_resource_group.pc_compute.name
address_prefixes = ["10.2.0.0/16"]
service_endpoints = ["Microsoft.KeyVault"]
}
resource "azurerm_network_security_group" "pc_compute" {
name = "${var.maybe_versioned_prefix}-security-group"
location = azurerm_resource_group.pc_compute.location
resource_group_name = azurerm_resource_group.pc_compute.name
security_rule {
name = "hub-rule"
priority = 100
direction = "Inbound"
access = "Allow"
protocol = "Tcp"
source_port_range = "*"
destination_port_ranges = ["80", "443"]
source_address_prefix = "*"
destination_address_prefix = "*"
}
}
resource "azurerm_subnet_network_security_group_association" "pc_compute" {
subnet_id = azurerm_subnet.node_subnet.id
network_security_group_id = azurerm_network_security_group.pc_compute.id
resource "azurerm_public_ip" "pc_compute" {
name = var.pip_name
resource_group_name = azurerm_resource_group.pc_compute.name
location = azurerm_resource_group.pc_compute.location
allocation_method = "Static"
sku = "Standard"
domain_name_label = var.dns_label
}
data "azurerm_application_gateway" "pc_compute" {
name = var.appgw_name
resource_group_name = azurerm_resource_group.pc_compute.name
}


@ -37,6 +37,7 @@ resource "azurerm_resource_group" "shared" {
tags = {
"ringValue" = "r1"
}
provider = azurerm.pc
}
resource "azurerm_storage_account" "pc-compute" {


@ -7,6 +7,12 @@ module "resources" {
# subscription = "Planetary Computer"
apim_resource_id = "/subscriptions/9da7523a-cb61-4c3e-b1d4-afa5fc6d2da9/resourceGroups/pc-manual-resources/providers/Microsoft.ApiManagement/service/planetarycomputer"
# TLS certs
certificate_kv = "pc-deploy-secrets"
certificate_kv_rg = "pc-manual-resources"
certificate_secret_name = "planetarycomputer-hub-staging"
pip_name = "pip-pcc-staging"
appgw_name = "appgw-pcc-staging"
# AKS ----------------------------------------------------------------------
kubernetes_version = null
@ -25,7 +31,7 @@ module "resources" {
# DaskHub ------------------------------------------------------------------
dns_label = "pcc-staging"
oauth_host = "planetarycomputer-staging"
jupyterhub_host = "pcc-staging.westeurope.cloudapp.azure.com"
jupyterhub_host = "planetarycomputer-hub-staging.microsoft.com"
user_placeholder_replicas = 0
stac_url = "https://planetarycomputer-staging.microsoft.com/api/stac/v1/"

58
terraform/test/main.tf Normal file

@ -0,0 +1,58 @@
module "resources" {
source = "../resources"
environment = "test"
region = "West Europe"
version_number = "2"
maybe_versioned_prefix = "pcc-test"
apim_resource_id = "/subscriptions/9da7523a-cb61-4c3e-b1d4-afa5fc6d2da9/resourceGroups/pc-manual-resources/providers/Microsoft.ApiManagement/service/planetarycomputer"
# TLS certs
certificate_kv = "pc-test-deploy-secrets"
certificate_kv_rg = "pc-test-manual-resources"
certificate_secret_name = "planetarycomputer-hub-test"
pip_name = "pccompute-public-ip"
appgw_name = "pccompute-appgateway"
# AKS ----------------------------------------------------------------------
kubernetes_version = null
aks_azure_active_directory_role_based_access_control = true
aks_automatic_channel_upgrade = "rapid"
# 2GiB of RAM, 1 CPU core
core_vm_size = "Standard_A4_v2"
core_os_disk_type = "Managed"
user_pool_min_count = 1
cpu_worker_pool_min_count = 0
# Logs ---------------------------------------------------------------------
workspace_id = "83dcaf36e047a90f"
# DaskHub ------------------------------------------------------------------
dns_label = "planetarycomputer-hub-test"
oauth_host = "planetarycomputer-staging"
jupyterhub_host = "planetarycomputer-hub-test.microsoft.com"
user_placeholder_replicas = 0
stac_url = "https://planetarycomputer-staging.microsoft.com/api/stac/v1/"
jupyterhub_singleuser_image_name = "pcccr.azurecr.io/planetary-computer/python"
jupyterhub_singleuser_image_tag = "2024.3.20.1"
python_image = "pcccr.azurecr.io/planetary-computer/python:2024.3.20.1"
r_image = "pcccr.azurecr.io/planetary-computer/r:2024.3.20.1"
gpu_pytorch_image = "pcccr.azurecr.io/planetary-computer/gpu-pytorch:2024.3.22.0"
gpu_tensorflow_image = "pcccr.azurecr.io/planetary-computer/gpu-tensorflow:2024.3.22.0"
qgis_image = "pcccr.azurecr.io/planetary-computer/qgis:2024.3.19.7"
}
terraform {
backend "azurerm" {
resource_group_name = "pc-test-manual-resources"
storage_account_name = "pctesttfstate"
container_name = "pcc"
key = "pcc.tfstate"
}
}
output "resources" {
value = module.resources
sensitive = true
}