# Deploy an Entire RP Development Service

## Prerequisites
- Your development environment is prepared according to the steps outlined in Prepare Your Dev Environment
- During the deployment, it's recommended to avoid editing files in your ARO-RP repository so that `git status` reports a clean working tree. Otherwise the `aro` container image will have a `-dirty` suffix, which can be problematic (see the check sketched below this list):
  - if the working tree becomes dirty during the process (e.g. because you create a temporary helper script to run some of the setup), you could end up with a different image tag pushed to the Azure container registry than the tag expected by the ARO deployer
  - with a dirty tag, it's not clear what's actually in the image
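
A minimal pre-flight sketch, assuming you run it from the ARO-RP repository root, that fails early when the tree is dirty and prints the short commit hash the image tag is derived from:

```bash
# Abort if the working tree is dirty, so the image tag will not get a -dirty suffix.
if [[ -n "$(git status --porcelain)" ]]; then
  echo "working tree is dirty; commit or stash your changes before deploying" >&2
  exit 1
fi
# This short hash is what the aro image tag is derived from.
echo "expected image tag: $(git rev-parse --short=7 HEAD)"
```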
## Deploying an int-like Development RP
1. Fetch the most up-to-date secrets, setting `SECRET_SA_ACCOUNT_NAME` to the name of the storage account containing your shared development environment secrets, e.g.:

   ```bash
   SECRET_SA_ACCOUNT_NAME=rharosecretsdev make secrets
   ```
1. Copy and tweak your environment file:

   ```bash
   cp env.example env
   vi env
   ```

   You don't need to change anything in the `env` file, unless you plan on using Hive to install the cluster. In that case, add the following Hive environment variables to your `env` file:

   ```bash
   export ARO_INSTALL_VIA_HIVE=true
   export ARO_ADOPT_BY_HIVE=true
   ```
1. Create a full environment file, which overrides some defaults from `./env` when sourced:

   ```bash
   cp env-int.example env-int
   vi env-int
   ```

   What to change in the `env-int` file (an illustrative snippet follows this list):
   - if using a public key separate from `~/.ssh/id_rsa.pub` (for SSH access to RP and Gateway VMSS instances), source it with `export SSH_PUBLIC_KEY=~/.ssh/id_separate.pub`
   - don't try to change the `$USER` prefix used there
   - set the tag of the `FLUENTBIT_IMAGE` value to match the default from `pkg/util/version/const.go`, e.g. `FLUENTBIT_IMAGE=${USER}aro.azurecr.io/fluentbit:1.9.10-cm20230426`
     - if you actually care about the Fluentbit image version, you need to change the default both in the `env-int` file and for the ARO Deployer, which is out of scope of this guide
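   A minimal illustrative `env-int` override block combining the values mentioned above (the SSH key path is only an example; keep the `$USER` prefix unchanged):

   ```bash
   # Example env-int overrides (values are examples from this guide)
   export SSH_PUBLIC_KEY=~/.ssh/id_separate.pub
   FLUENTBIT_IMAGE=${USER}aro.azurecr.io/fluentbit:1.9.10-cm20230426
   ```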
1. And finally source the env:

   ```bash
   . ./env-int
   ```
1. Generate the development RP configuration:

   ```bash
   make dev-config.yaml
   ```
1. Run `make deploy`. This will fail on the first attempt because AKS is not yet installed, so after the first failure, skip to the next steps to deploy the VPN Gateway and then AKS.

   NOTE: If the deployment fails with `InvalidResourceReference` because the RP Network Security Groups were not found, delete the "gateway-production-predeploy" deployment in the gateway resource group and re-run `make deploy`.

   NOTE: If the deployment fails with `A vault with the same name already exists in deleted state`, you will need to recover the keyvaults deleted by a previous deploy by running `az keyvault recover --name <KEYVAULT_NAME>` for each keyvault, and then re-run (a recovery sketch follows below).
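   A hedged recovery sketch for that keyvault case, assuming the soft-deleted vaults share the `$KEYVAULT_PREFIX` prefix used later in this guide:

   ```bash
   # List soft-deleted keyvaults matching the prefix and recover each one.
   for kv in $(az keyvault list-deleted --query "[?starts_with(name, '$KEYVAULT_PREFIX')].name" -o tsv); do
     az keyvault recover --name "$kv"
   done
   ```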
1. Deploy a VPN Gateway. This is required in order to be able to connect to AKS from your local machine:

   ```bash
   source ./hack/devtools/deploy-shared-env.sh
   deploy_vpn_for_dedicated_rp
   ```
1. Deploy AKS by running these commands from the ARO-RP root directory:

   ```bash
   source ./hack/devtools/deploy-shared-env.sh
   deploy_aks_dev
   ```

   NOTE: If the AKS deployment fails with missing RP VNETs, delete the "gateway-production-predeploy" deployment in the gateway resource group, re-run `make deploy`, and then re-run `deploy_aks_dev`.
1. Install Hive into AKS:

   1. Download the VPN config. Please note that this action will OVERWRITE the `secrets/vpn-$LOCATION.ovpn` file on your local machine. DO NOT run `make secrets-update` after doing this (you would overwrite the existing config) until you have run `make secrets` to restore it.

      ```bash
      vpn_configuration
      ```

   1. Connect to the Dev VPN in a new terminal:

      ```bash
      sudo openvpn secrets/vpn-$LOCATION.ovpn
      ```

   1. Now that your machine is able to access the AKS cluster, you can deploy Hive:

      ```bash
      make aks.kubeconfig
      ./hack/hive-generate-config.sh
      KUBECONFIG=$(pwd)/aks.kubeconfig ./hack/hive-dev-install.sh
      ```
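      To sanity-check the install, one option is to look at the Hive pods (assuming Hive was deployed into the default `hive` namespace):

      ```bash
      KUBECONFIG=$(pwd)/aks.kubeconfig kubectl get pods -n hive
      ```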
1. Mirror the OpenShift images to your new ACR.

   NOTE: Running the mirroring from a VM in Azure rather than a local workstation is recommended for better performance.

   NOTE: The value of the `USER_PULL_SECRET` variable comes from the secrets, which are sourced via the `env-int` file.

   NOTE: The `DST_AUTH` token, and hence the registry login, expires after some time.

   1. Set up the mirroring environment variables:

      ```bash
      export DST_ACR_NAME=${USER}aro
      export SRC_AUTH_QUAY=$(echo $USER_PULL_SECRET | jq -r '.auths."quay.io".auth')
      export SRC_AUTH_REDHAT=$(echo $USER_PULL_SECRET | jq -r '.auths."registry.redhat.io".auth')
      export DST_AUTH=$(echo -n '00000000-0000-0000-0000-000000000000:'$(az acr login -n ${DST_ACR_NAME} --expose-token | jq -r .accessToken) | base64 -w0)
      ```
   1. Log in to the Azure Container Registry:

      ```bash
      docker login -u 00000000-0000-0000-0000-000000000000 -p "$(echo $DST_AUTH | base64 -d | cut -d':' -f2)" "${DST_ACR_NAME}.azurecr.io"
      ```
   1. Run the mirroring.

      The `latest` argument will take the DefaultInstallStream from `pkg/util/version/const.go` and mirror that version:

      ```bash
      go run ./cmd/aro mirror latest
      ```

      Troubleshooting: There can be issues when mirroring the images to the ACR related to missing devmapper or btrfs packages (usually a "fatal error: btrfs/ioctl.h: No such file or directory" error). If installing the device-mapper-devel or btrfs-progs-devel packages respectively doesn't help, you may exclude those drivers as follows:

      ```bash
      go run -tags=exclude_graphdriver_devicemapper,exclude_graphdriver_btrfs ./cmd/aro mirror latest
      ```

      If you are going to test or work with multi-version installs, mirror any additional versions as well; for example, for 4.11.21 it would be:

      ```bash
      go run ./cmd/aro mirror 4.11.21
      ```
1. Mirror the genevamdm and genevamdsd images from the upstream distroless Geneva MDM/MDSD repositories to your ACR.

   Run the following commands to mirror the two Microsoft Geneva images, using the tags from `pkg/util/version/const.go` (e.g., 2.2024.517.533-b73893-20240522t0954 and mariner_20240524.1):

   ```bash
   export GENEVAMDM_IMAGE_TAG=distroless/genevamdm:2.2024.517.533-b73893-20240522t0954 && az acr import --name $DST_ACR_NAME.azurecr.io/$GENEVAMDM_IMAGE_TAG --source linuxgeneva-microsoft.azurecr.io/$GENEVAMDM_IMAGE_TAG
   export GENEVAMDSD_IMAGE_TAG=distroless/genevamdsd:mariner_20240524.1 && az acr import --name $DST_ACR_NAME.azurecr.io/$GENEVAMDSD_IMAGE_TAG --source linuxgeneva-microsoft.azurecr.io/$GENEVAMDSD_IMAGE_TAG
   ```
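   To confirm the imports, one way is to list the tags now present in your ACR (a sketch):

   ```bash
   az acr repository show-tags --name $DST_ACR_NAME --repository distroless/genevamdm -o table
   az acr repository show-tags --name $DST_ACR_NAME --repository distroless/genevamdsd -o table
   ```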
1. Push the ARO and Fluentbit images to your ACR.

   If running this step from a VM separate from your workstation, ensure the commit tag used to build the image matches the commit tag where `make deploy` is run.

   For local builds and CI builds without the `RP_IMAGE_ACR` environment variable set, the `make publish-image-*` targets will pull from `registry.access.redhat.com`. If you need to use an Azure container registry instead due to security compliance requirements, modify the `RP_IMAGE_ACR` environment variable to point to `arointsvc` or `arosvc` instead. You will need to authenticate to this registry using `az acr login --name arointsvc` to pull the images.

   If the push fails with an error like `unable to retrieve auth token: invalid username/password: unauthorized: authentication required`, re-create the `DST_AUTH` variable and log in to the container registry again (as explained in the steps above); this resolves the failure when the auth token has expired. A combined retry sketch follows this step.

   ```bash
   make publish-image-aro-multistage
   make publish-image-fluentbit
   ```
1. Update the DNS child domains:

   ```bash
   export PARENT_DOMAIN_NAME=osadev.cloud
   export PARENT_DOMAIN_RESOURCEGROUP=dns
   export GLOBAL_RESOURCEGROUP=$USER-global

   for DOMAIN_NAME in $USER-clusters.$PARENT_DOMAIN_NAME $USER-rp.$PARENT_DOMAIN_NAME; do
     CHILD_DOMAIN_PREFIX="$(cut -d. -f1 <<<$DOMAIN_NAME)"
     echo "########## Creating NS record to DNS Zone $CHILD_DOMAIN_PREFIX ##########"
     az network dns record-set ns create \
       --resource-group "$PARENT_DOMAIN_RESOURCEGROUP" \
       --zone "$PARENT_DOMAIN_NAME" \
       --name "$CHILD_DOMAIN_PREFIX" >/dev/null
     for ns in $(az network dns zone show \
       --resource-group "$GLOBAL_RESOURCEGROUP" \
       --name "$DOMAIN_NAME" \
       --query nameServers -o tsv); do
       az network dns record-set ns add-record \
         --resource-group "$PARENT_DOMAIN_RESOURCEGROUP" \
         --zone "$PARENT_DOMAIN_NAME" \
         --record-set-name "$CHILD_DOMAIN_PREFIX" \
         --nsdname "$ns" >/dev/null
     done
   done
   ```
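   To spot-check the delegation for a given child zone, one option is to show the NS record set just created in the parent zone (set `CHILD_DOMAIN_PREFIX` to the prefix you want to inspect):

   ```bash
   az network dns record-set ns show \
     --resource-group "$PARENT_DOMAIN_RESOURCEGROUP" \
     --zone-name "$PARENT_DOMAIN_NAME" \
     --name "$CHILD_DOMAIN_PREFIX"
   ```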
1. Update the certificates in keyvault.

   NOTE: If you reuse an old name, you might run into soft-delete of the keyvaults. Run `az keyvault recover --name <KEYVAULT_NAME>` to fix this.

   NOTE: Check that the `$KEYVAULT_PREFIX` environment variable set on the workstation matches the prefix deployed into the resource group.

   ```bash
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name rp-mdm \
     --file secrets/rp-metrics-int.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-gwy" \
     --name gwy-mdm \
     --file secrets/rp-metrics-int.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name rp-mdsd \
     --file secrets/rp-logging-int.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-gwy" \
     --name gwy-mdsd \
     --file secrets/rp-logging-int.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name cluster-mdsd \
     --file secrets/cluster-logging-int.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name dev-arm \
     --file secrets/arm.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name rp-firstparty \
     --file secrets/firstparty.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-svc" \
     --name rp-server \
     --file secrets/localhost.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-por" \
     --name portal-server \
     --file secrets/localhost.pem >/dev/null
   az keyvault certificate import \
     --vault-name "$KEYVAULT_PREFIX-por" \
     --name portal-client \
     --file secrets/portal-client.pem >/dev/null
   ```
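   A quick way to sanity-check the imports is to list the certificates now present in each vault (a sketch):

   ```bash
   for v in svc gwy por; do
     az keyvault certificate list --vault-name "$KEYVAULT_PREFIX-$v" --query "[].name" -o tsv
   done
   ```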
1. Delete the existing VMSS.

   NOTE: This needs to be deleted because deploying won't recreate the VMSS if the commit hash is the same.

   ```bash
   az vmss delete -g ${RESOURCEGROUP} --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) \
     && az vmss delete -g $USER-gwy-$LOCATION --name gateway-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty)
   ```
1. Run `make deploy`. When the command finishes, there should be one VMSS for the RP with a single VM instance, and another VMSS with a single VM for the Gateway (see the verification sketch after this step).
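   One way to confirm both scale sets and their instances (a sketch):

   ```bash
   az vmss list -g $RESOURCEGROUP -o table
   az vmss list -g $USER-gwy-$LOCATION -o table
   ```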
1. Create the storage account and role assignment required for workload identity clusters:

   ```bash
   source ./hack/devtools/deploy-shared-env.sh
   deploy_oic_for_dedicated_rp
   ```
1. If you are going to use multiversion, you can now update the OpenShiftVersions DB as per the OpenShift Version instructions.
## SSH to RP VMSS Instance
1. Update the RP NSG to allow SSH:

   ```bash
   az network nsg rule create \
     --name ssh-to-rp \
     --resource-group $RESOURCEGROUP \
     --nsg-name rp-nsg \
     --access Allow \
     --priority 500 \
     --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
     --protocol Tcp \
     --destination-port-ranges 22
   ```
1. SSH into the VM:

   ```bash
   VMSS_PIP=$(az vmss list-instance-public-ips -g $RESOURCEGROUP --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) | jq -r '.[0].ipAddress')
   ssh cloud-user@${VMSS_PIP}
   ```
## SSH to Gateway VMSS Instance
1. Update the Gateway NSG to allow SSH:

   ```bash
   az network nsg rule create \
     --name ssh-to-gwy \
     --resource-group $USER-gwy-$LOCATION \
     --nsg-name gateway-nsg \
     --access Allow \
     --priority 500 \
     --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
     --protocol Tcp \
     --destination-port-ranges 22
   ```
1. SSH into the VM:

   ```bash
   VMSS_PIP=$(az vmss list-instance-public-ips -g $USER-gwy-$LOCATION --name gateway-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) | jq -r '.[0].ipAddress')
   ssh cloud-user@${VMSS_PIP}
   ```
## Deploy a Cluster
1. Add an NSG rule to allow tunneling to the RP instance:

   ```bash
   az network nsg rule create \
     --name tunnel-to-rp \
     --resource-group $RESOURCEGROUP \
     --nsg-name rp-nsg \
     --access Allow \
     --priority 499 \
     --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
     --protocol Tcp \
     --destination-port-ranges 443
   ```
1. Run the tunnel program to tunnel to the RP:

   ```bash
   make tunnel
   ```

   NOTE: `make tunnel` will print the public IP of your new RP VM NIC. Ensure that it's correct; you can cross-check it as sketched below.
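   One way to cross-check that IP, reusing the public-IP lookup from the SSH section above:

   ```bash
   az vmss list-instance-public-ips -g $RESOURCEGROUP \
     --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) \
     | jq -r '.[0].ipAddress'
   ```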
1. Update the versions available to install (run this as many times as needed, once per version):

   ```bash
   curl -X PUT -k "https://localhost:8443/admin/versions" --header "Content-Type: application/json" -d '{
     "properties": {
       "version": "4.x.y",
       "enabled": true,
       "openShiftPullspec": "quay.io/openshift-release-dev/ocp-release@sha256:<sha256>",
       "installerPullspec": "<name>.azurecr.io/installer:release-4.x"
     }
   }'
   ```
1. Update the environment variable to deploy in a different resource group:

   ```bash
   export RESOURCEGROUP=myResourceGroup
   ```
1. Create the resource group if it doesn't exist:

   ```bash
   az group create --resource-group $RESOURCEGROUP --location $LOCATION
   ```
1. Create VNets / Subnets:

   ```bash
   az network vnet create \
     --resource-group $RESOURCEGROUP \
     --name aro-vnet \
     --address-prefixes 10.0.0.0/22

   az network vnet subnet create \
     --resource-group $RESOURCEGROUP \
     --vnet-name aro-vnet \
     --name master-subnet \
     --address-prefixes 10.0.0.0/23 \
     --service-endpoints Microsoft.ContainerRegistry

   az network vnet subnet create \
     --resource-group $RESOURCEGROUP \
     --vnet-name aro-vnet \
     --name worker-subnet \
     --address-prefixes 10.0.2.0/23 \
     --service-endpoints Microsoft.ContainerRegistry
   ```
1. Register your subscription with the resource provider (this posts directly to the subscription CosmosDB container):

   ```bash
   curl -k -X PUT -H 'Content-Type: application/json' -d '{
     "state": "Registered",
     "properties": {
       "tenantId": "'"$AZURE_TENANT_ID"'",
       "registeredFeatures": [
         {
           "name": "Microsoft.RedHatOpenShift/RedHatEngineering",
           "state": "Registered"
         }
       ]
     }
   }' "https://localhost:8443/subscriptions/$AZURE_SUBSCRIPTION_ID?api-version=2.0"
   ```
1. Create the cluster:

   ```bash
   export CLUSTER=$USER

   az aro create \
     --resource-group $RESOURCEGROUP \
     --name $CLUSTER \
     --vnet aro-vnet \
     --master-subnet master-subnet \
     --worker-subnet worker-subnet
   ```

   NOTE: The `az aro` CLI extension must be registered in order to run `az aro` commands against a local or tunneled RP. The usual hack script used to create clusters does not work due to keyvault mirroring requirements. The name of the cluster depends on the DNS zone that was created in an earlier step.
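   Once creation finishes, a simple sanity check is to confirm the cluster resource exists and reports a Succeeded provisioning state (a sketch):

   ```bash
   az aro show --resource-group $RESOURCEGROUP --name $CLUSTER --query provisioningState -o tsv
   ```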