
Deploy an Entire RP Development Service

Prerequisites

  1. Your development environment is prepared according to the steps outlined in Prepare Your Dev Environment
  2. During the deployment, it's recommended not to edit files in your ARO-RP repository, so that git status reports a clean working tree. Otherwise the aro container image will be tagged with a -dirty suffix, which can be problematic (see the check below):
    • if the working tree becomes dirty during the process (e.g. because you create a temporary helper script to run some of the setup), the image tag pushed to the Azure container registry can differ from the tag expected by the ARO deployer
    • with a dirty tag, it's not clear what's actually in the image
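
    For reference, you can check whether a build would be tagged dirty with a quick sketch like this (it reuses the same commit/dirty expression that appears in the VMSS names later in this guide):

    # any output from --porcelain means the image will carry a -dirty suffix
    git status --porcelain
    echo "image tag suffix: $(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty)"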

Deploying an int-like Development RP

  1. Fetch the most up-to-date secrets, setting SECRET_SA_ACCOUNT_NAME to the name of the storage account containing your shared development environment secrets, e.g.:

    SECRET_SA_ACCOUNT_NAME=rharosecretsdev make secrets
    
  2. Copy and tweak your environment file:

    cp env.example env
    vi env
    

    You don't need to change anything in the env file unless you plan on using Hive to install the cluster. In that case, add the following Hive environment variables to your env file:

    export ARO_INSTALL_VIA_HIVE=true
    export ARO_ADOPT_BY_HIVE=true
    
  3. Create a full environment file, which overrides some defaults from ./env when sourced:

    cp env-int.example env-int
    vi env-int
    

    What to change in the env-int file (an example sketch follows this list):

    • if you use a public key other than ~/.ssh/id_rsa.pub (for SSH access to RP and Gateway VMSS instances), set it with export SSH_PUBLIC_KEY=~/.ssh/id_separate.pub
    • don't change the $USER prefix used there
    • set the tag of the FLUENTBIT_IMAGE value to match the default in pkg/util/version/const.go, e.g. FLUENTBIT_IMAGE=${USER}aro.azurecr.io/fluentbit:1.9.10-cm20230426
    • if you actually need a different fluentbit image version, you have to change the default both in the env-int file and for the ARO Deployer, which is out of scope for this guide
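
    For example, the relevant lines in your env-int could end up looking like the following sketch (illustrative values only; keep the $USER prefix as-is and follow env-int.example for the exact variable names):

    # only needed when your key is not ~/.ssh/id_rsa.pub
    export SSH_PUBLIC_KEY=~/.ssh/id_separate.pub
    # the tag must match the fluentbit default in pkg/util/version/const.go
    FLUENTBIT_IMAGE=${USER}aro.azurecr.io/fluentbit:1.9.10-cm20230426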
  4. Finally, source the environment file:

    . ./env-int
    
  5. Generate the development RP configuration

    make dev-config.yaml
    
  6. Run make deploy. This will fail on the first attempt because AKS is not installed yet, so after the first failure, skip to the next steps to deploy the VPN Gateway and then AKS.

    NOTE: If the deployment fails with InvalidResourceReference because the RP Network Security Groups are not found, delete the "gateway-production-predeploy" deployment in the gateway resource group and re-run make deploy.

    NOTE: If the deployment fails with A vault with the same name already exists in deleted state, recover the soft-deleted keyvaults from a previous deploy with az keyvault recover --name <KEYVAULT_NAME> for each keyvault, and re-run.
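
    For reference, recovering from these two failure modes might look like the following sketch (the gateway resource group and keyvault names follow the naming used elsewhere in this guide):

    # InvalidResourceReference: remove the stale predeploy deployment, then re-run make deploy
    az deployment group delete --resource-group "$USER-gwy-$LOCATION" --name gateway-production-predeploy

    # soft-deleted keyvaults from a previous deploy: recover each one, then re-run make deploy
    az keyvault recover --name "$KEYVAULT_PREFIX-svc"
    az keyvault recover --name "$KEYVAULT_PREFIX-gwy"
    az keyvault recover --name "$KEYVAULT_PREFIX-por"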

  7. Deploy a VPN Gateway. This is required in order to connect to AKS from your local machine:

    source ./hack/devtools/deploy-shared-env.sh
    deploy_vpn_for_dedicated_rp
    
  8. Deploy AKS by running these commands from the ARO-RP root directory:

    source ./hack/devtools/deploy-shared-env.sh
    deploy_aks_dev
    

    NOTE: If the AKS deployment fails with missing RP VNETs, delete the "gateway-production-predeploy" deployment in the gateway resource group, and re-run make deploy and then re-run deploy_aks_dev.

  9. Install Hive into AKS

    1. Download the VPN config. Please note that this action will OVERWRITE secrets/vpn-$LOCATION.ovpn on your local machine. DO NOT run make secrets-update afterwards, as you would overwrite the existing shared config; wait until you have run make secrets again to restore it.

      vpn_configuration
      
    2. Connect to the Dev VPN in a new terminal:

      sudo openvpn secrets/vpn-$LOCATION.ovpn
      
    3. Now that your machine is able to access the AKS cluster, you can deploy Hive:

      make aks.kubeconfig
      ./hack/hive-generate-config.sh
      KUBECONFIG=$(pwd)/aks.kubeconfig ./hack/hive-dev-install.sh
      
  10. Mirror the OpenShift images to your new ACR

    NOTE: Running the mirroring through a VM in Azure rather than a local workstation is recommended for better performance.

    NOTE: The value of the USER_PULL_SECRET variable comes from the secrets, which are sourced via the env-int file.

    NOTE: The DST_AUTH token (i.e. the registry login) expires after some time.

    1. Setup mirroring environment variables

      export DST_ACR_NAME=${USER}aro
      export SRC_AUTH_QUAY=$(echo $USER_PULL_SECRET | jq -r '.auths."quay.io".auth')
      export SRC_AUTH_REDHAT=$(echo $USER_PULL_SECRET | jq -r '.auths."registry.redhat.io".auth')
      export DST_AUTH=$(echo -n '00000000-0000-0000-0000-000000000000:'$(az acr login -n ${DST_ACR_NAME} --expose-token | jq -r .accessToken) | base64 -w0)
      
    2. Login to the Azure Container Registry

      docker login -u 00000000-0000-0000-0000-000000000000 -p "$(echo $DST_AUTH | base64 -d | cut -d':' -f2)" "${DST_ACR_NAME}.azurecr.io"
      
    3. Run the mirroring

      The latest argument will take the DefaultInstallStream from pkg/util/version/const.go and mirror that version

      go run ./cmd/aro mirror latest
      

      Troubleshooting: Mirroring images to the ACR can fail due to missing devmapper or btrfs packages (usually with a "fatal error: btrfs/ioctl.h: No such file or directory" error). If installing the device-mapper-devel or btrfs-progs-devel packages (respectively) doesn't help, you can exclude those drivers as follows:

      go run -tags=exclude_graphdriver_devicemapper,exclude_graphdriver_btrfs ./cmd/aro mirror latest
    

    If you are going to test or work with multi-version installs, mirror any additional versions as well; for example, for 4.11.21:

    go run ./cmd/aro mirror 4.11.21
    
    4. Mirror the genevamdm and genevamdsd images from the upstream distroless Geneva MDM/MDSD repositories to your ACR

      Run the following commands to mirror two Microsoft Geneva images based on the tags from pkg/util/version/const.go (e.g., 2.2024.517.533-b73893-20240522t0954 and mariner_20240524.1).

          export GENEVAMDM_IMAGE_TAG=distroless/genevamdm:2.2024.517.533-b73893-20240522t0954 && az acr import --name $DST_ACR_NAME --source linuxgeneva-microsoft.azurecr.io/$GENEVAMDM_IMAGE_TAG --image $GENEVAMDM_IMAGE_TAG
          export GENEVAMDSD_IMAGE_TAG=distroless/genevamdsd:mariner_20240524.1 && az acr import --name $DST_ACR_NAME --source linuxgeneva-microsoft.azurecr.io/$GENEVAMDSD_IMAGE_TAG --image $GENEVAMDSD_IMAGE_TAG
      
    5. Push the ARO and Fluentbit images to your ACR

      If running this step from a VM separate from your workstation, ensure the commit tag used to build the image matches the commit tag where make deploy is run.

      For local builds and CI builds without the RP_IMAGE_ACR environment variable set, the make publish-image-* targets will pull from registry.access.redhat.com. If you need to use an Azure container registry instead due to security compliance requirements, set the RP_IMAGE_ACR environment variable to point to arointsvc or arosvc. You will need to authenticate to this registry using az acr login --name arointsvc to pull the images.

      If the push fails with an error like unable to retrieve auth token: invalid username/password: unauthorized: authentication required, re-create the DST_AUTH variable and log in to the container registry again (as explained in the steps above and sketched after the commands below). This resolves failures caused by an expired auth token.

      make publish-image-aro-multistage
      make publish-image-fluentbit
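
      If the DST_AUTH token has expired, refreshing it is just a matter of repeating the earlier setup, for example:

      # re-generate the ACR token and log in again (same commands as in the mirroring setup above)
      export DST_AUTH=$(echo -n '00000000-0000-0000-0000-000000000000:'$(az acr login -n ${DST_ACR_NAME} --expose-token | jq -r .accessToken) | base64 -w0)
      docker login -u 00000000-0000-0000-0000-000000000000 -p "$(echo $DST_AUTH | base64 -d | cut -d':' -f2)" "${DST_ACR_NAME}.azurecr.io"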
      
  11. Update the DNS Child Domains

    export PARENT_DOMAIN_NAME=osadev.cloud
    export PARENT_DOMAIN_RESOURCEGROUP=dns
    export GLOBAL_RESOURCEGROUP=$USER-global
    
    for DOMAIN_NAME in $USER-clusters.$PARENT_DOMAIN_NAME $USER-rp.$PARENT_DOMAIN_NAME; do
        CHILD_DOMAIN_PREFIX="$(cut -d. -f1 <<<$DOMAIN_NAME)"
        echo "########## Creating NS record to DNS Zone $CHILD_DOMAIN_PREFIX ##########"
        az network dns record-set ns create \
            --resource-group "$PARENT_DOMAIN_RESOURCEGROUP" \
            --zone "$PARENT_DOMAIN_NAME" \
            --name "$CHILD_DOMAIN_PREFIX" >/dev/null
        for ns in $(az network dns zone show \
            --resource-group "$GLOBAL_RESOURCEGROUP" \
            --name "$DOMAIN_NAME" \
            --query nameServers -o tsv); do
            az network dns record-set ns add-record \
            --resource-group "$PARENT_DOMAIN_RESOURCEGROUP" \
            --zone "$PARENT_DOMAIN_NAME" \
            --record-set-name "$CHILD_DOMAIN_PREFIX" \
            --nsdname "$ns" >/dev/null
        done
    done
    
  12. Update the certificates in keyvault

    NOTE: If you reuse an old name, you might run into soft-deleted keyvaults. Run az keyvault recover --name <KEYVAULT_NAME> to fix this.

    NOTE: Check to ensure that the $KEYVAULT_PREFIX environment variable set on the workstation matches the prefix deployed into the resource group. A quick verification check follows the import commands below.

    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name rp-mdm \
        --file secrets/rp-metrics-int.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-gwy" \
        --name gwy-mdm \
        --file secrets/rp-metrics-int.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name rp-mdsd \
        --file secrets/rp-logging-int.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-gwy" \
        --name gwy-mdsd \
        --file secrets/rp-logging-int.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name cluster-mdsd \
        --file secrets/cluster-logging-int.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name dev-arm \
        --file secrets/arm.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name rp-firstparty \
        --file secrets/firstparty.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-svc" \
        --name rp-server \
        --file secrets/localhost.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-por" \
        --name portal-server \
        --file secrets/localhost.pem >/dev/null
    az keyvault certificate import \
        --vault-name "$KEYVAULT_PREFIX-por" \
        --name portal-client \
        --file secrets/portal-client.pem >/dev/null
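
    To confirm the imports succeeded, you can list the certificates in each keyvault (an optional check, not required by the deployment):

    for kv in "$KEYVAULT_PREFIX-svc" "$KEYVAULT_PREFIX-gwy" "$KEYVAULT_PREFIX-por"; do
        az keyvault certificate list --vault-name "$kv" --query "[].name" -o tsv
    done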
    
  13. Delete the existing VMSS

    NOTE: The existing VMSS needs to be deleted because re-deploying won't recreate it if the commit hash is unchanged.

    az vmss delete -g ${RESOURCEGROUP} --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) && az vmss delete -g $USER-gwy-$LOCATION --name gateway-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty)
    
  14. Run make deploy. When the command finishes, there should be one VMSS for the RP with a single VM instance, and another VMSS with a single VM for the Gateway.
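
    To verify, you can list the VMSS resources in both resource groups (an optional check; the resource group names follow those used in the delete step above):

    az vmss list -g ${RESOURCEGROUP} -o table
    az vmss list -g $USER-gwy-$LOCATION -o table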

  15. Create storage account and role assignment required for workload identity clusters

    source ./hack/devtools/deploy-shared-env.sh
    deploy_oic_for_dedicated_rp
    
  16. If you are going to use multi-version installs, you can now update the OpenShiftVersions DB as per the OpenShift Version instructions

SSH to RP VMSS Instance

  1. Update the RP NSG to allow SSH

    az network nsg rule create \
        --name ssh-to-rp \
        --resource-group $RESOURCEGROUP \
        --nsg-name rp-nsg \
        --access Allow \
        --priority 500 \
        --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
        --protocol Tcp \
        --destination-port-ranges 22
    
  2. SSH into the VM

    VMSS_PIP=$(az vmss list-instance-public-ips -g $RESOURCEGROUP --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) | jq -r '.[0].ipAddress')
    
    ssh cloud-user@${VMSS_PIP}
    

SSH to Gateway VMSS Instance

  1. Update the Gateway NSG to allow SSH

    az network nsg rule create \
        --name ssh-to-gwy \
        --resource-group $USER-gwy-$LOCATION \
        --nsg-name gateway-nsg \
        --access Allow \
        --priority 500 \
        --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
        --protocol Tcp \
        --destination-port-ranges 22
    
  2. SSH into the VM

    VMSS_PIP=$(az vmss list-instance-public-ips -g $USER-gwy-$LOCATION --name gateway-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) | jq -r '.[0].ipAddress')
    
    ssh cloud-user@${VMSS_PIP}
    

Deploy a Cluster

  1. Add an NSG rule to allow tunneling to the RP instance

    az network nsg rule create \
        --name tunnel-to-rp \
        --resource-group $RESOURCEGROUP \
        --nsg-name rp-nsg \
        --access Allow \
        --priority 499 \
        --source-address-prefixes "$(curl --silent -4 ipecho.net/plain)/32" \
        --protocol Tcp \
        --destination-port-ranges 443
    
  2. Run the tunnel program to tunnel to the RP

    make tunnel
    

    NOTE: make tunnel will print the public IP of your new RP VM NIC. Ensure that it's correct.
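
    If you want to double-check that IP, you can query the RP VMSS public IP directly (the same command used in the SSH section above):

    az vmss list-instance-public-ips -g $RESOURCEGROUP --name rp-vmss-$(git rev-parse --short=7 HEAD)$([[ $(git status --porcelain) = "" ]] || echo -dirty) | jq -r '.[0].ipAddress'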

  3. Update the versions available to install (run this once for each version you need)

    curl -X PUT -k "https://localhost:8443/admin/versions" --header "Content-Type: application/json" -d '{ "properties": { "version": "4.x.y", "enabled": true, "openShiftPullspec": "quay.io/openshift-release-dev/ocp-release@sha256:<sha256>", "installerPullspec": "<name>.azurecr.io/installer:release-4.x" }}'
    
  4. Update the environment variable to deploy into a different resource group

    export RESOURCEGROUP=myResourceGroup
    
  5. Create the resource group if it doesn't exist

    az group create --resource-group $RESOURCEGROUP --location $LOCATION
    
  6. Create VNets / Subnets

    az network vnet create \
        --resource-group $RESOURCEGROUP \
        --name aro-vnet \
        --address-prefixes 10.0.0.0/22
    
    az network vnet subnet create \
        --resource-group $RESOURCEGROUP \
        --vnet-name aro-vnet \
        --name master-subnet \
        --address-prefixes 10.0.0.0/23 \
        --service-endpoints Microsoft.ContainerRegistry
    
    az network vnet subnet create \
        --resource-group $RESOURCEGROUP \
        --vnet-name aro-vnet \
        --name worker-subnet \
        --address-prefixes 10.0.2.0/23 \
        --service-endpoints Microsoft.ContainerRegistry
    
  7. Register your subscription with the resource provider (this PUT writes directly to the subscriptions Cosmos DB container)

    curl -k -X PUT   -H 'Content-Type: application/json'   -d '{
        "state": "Registered",
        "properties": {
            "tenantId": "'"$AZURE_TENANT_ID"'",
            "registeredFeatures": [
                {
                    "name": "Microsoft.RedHatOpenShift/RedHatEngineering",
                    "state": "Registered"
                }
            ]
        }
    }' "https://localhost:8443/subscriptions/$AZURE_SUBSCRIPTION_ID?api-version=2.0"
    
  8. Create the cluster

    export CLUSTER=$USER
    
    az aro create \
        --resource-group $RESOURCEGROUP \
        --name $CLUSTER \
        --vnet aro-vnet \
        --master-subnet master-subnet \
        --worker-subnet worker-subnet
    

    NOTE: The az aro CLI extension must be registered in order to run az aro commands against a local or tunneled RP. The usual hack script used to create clusters does not work due to keyvault mirroring requirements. The name of the cluster depends on the DNS zone that was created in an earlier step.