* Generate mocks for Azure clients added in cluster MSI PR
* Add other small changes in response to previous PR feedback:
- Get subscription ID from subscription doc instead of a platform MI
- Remove an unused mock controller
Mocks for these interfaces were previously present, but if you remove them and run `make generate`, they are not regenerated. My guess is that whoever added them forgot to commit the corresponding changes to the generate.go files. I noticed this while moving us over to the Uber fork, because it caused errors while I was getting builds and unit tests working, so this commit codifies the generation properly.
* Add a parameter for enabling Entra ID RBAC on key vaults
* Add an RP-level feature flag for determining whether to use the mock MSI RP
* Tweak the mock identity URL to play nicely with the mock MSI RP
* Add Azure SDK client wrappers for new clients (federated identity credentials control plane and key vault data plane)
* Vendor in new Azure SDK clients and update msi-dataplane
* Lay groundwork for use of cluster MSI...
- Initialize the MSI dataplane client, using the mock MSI RP/stub if
appropriate
- Initialize key vault store client (for MSI certificates; functionality
is implemented in MSI dataplane module)
- Create a cluster MSI certificate and store it in the key vault during
cluster bootstrap
- Instantiate an Azure SDK FederatedIdentityCredential client using the
cluster MSI certificate
- Delete the cluster MSI certificate as needed during cluster deletion
* Don't fail during cluster deletion if the cluster MSI certificate is
already gone from the key vault (or was potentially never created)
* Establish an RP-Config variable for the MSI RP endpoint
- Update doc comment for ensureClusterMsiCertificate
- Simplify conditional logic in MSI cert deletion
* Use pointer conversion functions that aren't deprecated
* Respond to PR comments (and fix some other things along the way)
- Move `clusterMsiResourceId` function to `OpenShiftCluster` type
- When persisting the MSI cert to KV, use the `NotAfter` returned by the MSI RP (for the stub, just use an arbitrary value)
- Move `getClientOptions` functionality to `AROEnvironment` type
- Move logic for determining cluster MSI key vault name to `pkg/env`
- Pull cloud name mapping stuff out to `AROEnvironment` type
- Update msi-dataplane module to include new changes and use `UserAssignedIdentities` type to get Azure credential in `pkg/cluster/clustermsi.go`
- Fix typo in https URL in comment in `pkg/cluster/delete.go`
- Implement suggestion to use `errors.As` instead of a type assertion in `pkg/cluster/delete.go`
* Update documentation with info about new feature flag
- Move new cluster MSI steps forward in bootstrap step order
- Move MSI dataplane client options stuff to pkg/env
- Explicitly check for a single cluster MSI in `ClusterMsiResourceId`
- Other small tweaks
* Vendor in msi-dataplane update that prevents a potential nil pointer dereference
* Add missing method to internal key vault client
* Make error messages more specific in ClusterMsiResourceId
* Add missing env vars to run-rp make target and uncomment dynamic validation bootstrap step
- In newly added Azure clients, return struct types instead of interface
types
- Move cluster MSI certificate deletion to be after Azure resource
deletion, for safety in case the customer continues to use a cluster
that is in the Failed/Deleting provisioning state
* Add new env vars for MIWI to env.example for clarity/completeness
* Turn check for nonzero number of user assigned identities into a utility function
* Use existing constant for key vault dns suffix
* prevent updating existing platform identities
This adds a check to v20240812preview static validation that raises an
error if either the name or resource ID of an existing platform identity
is changed
* allow changing operator identity order
This allows changing the order of platform identities while still
preventing the resource ID and operator name from being changed
* additional platform identity update validation
This prevents removal of a platform identity or changing the identity's
OperatorName and ResourceID at the same time
* detect duplicate operator names in platform workload identity profiles
* use a map instead of a slice
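
The platform identity update rules described in the items above (reordering allowed, removal or resource ID changes rejected, duplicate operator names detected via a map) can be sketched as follows. This is an illustration under assumptions, not the v20240812preview validator: `platformIdentity`, `indexByOperatorName`, and `validateUpdate` are hypothetical names, and the sketch assumes that adding new identities is permitted.

```go
package main

import "fmt"

// platformIdentity is a hypothetical, simplified stand-in for the API type.
type platformIdentity struct {
	OperatorName string
	ResourceID   string
}

// indexByOperatorName builds an operator name -> resource ID map and
// rejects duplicate operator names along the way.
func indexByOperatorName(ids []platformIdentity) (map[string]string, error) {
	byName := make(map[string]string, len(ids))
	for _, id := range ids {
		if _, seen := byName[id.OperatorName]; seen {
			return nil, fmt.Errorf("duplicate operator name %q", id.OperatorName)
		}
		byName[id.OperatorName] = id.ResourceID
	}
	return byName, nil
}

// validateUpdate allows reordering but rejects removing an existing
// identity or changing its resource ID.
func validateUpdate(current, updated []platformIdentity) error {
	byName, err := indexByOperatorName(updated)
	if err != nil {
		return err
	}
	for _, id := range current {
		resourceID, ok := byName[id.OperatorName]
		if !ok {
			return fmt.Errorf("existing platform identity %q was removed or renamed", id.OperatorName)
		}
		if resourceID != id.ResourceID {
			return fmt.Errorf("resource ID of existing platform identity %q was changed", id.OperatorName)
		}
	}
	return nil
}

func main() {
	current := []platformIdentity{{"operator-a", "/id/a"}, {"operator-b", "/id/b"}}
	reordered := []platformIdentity{{"operator-b", "/id/b"}, {"operator-a", "/id/a"}}
	// Reordering alone is accepted, so this returns a nil error.
	fmt.Println(validateUpdate(current, reordered))
}
```

Keying the lookup by operator name is what makes the comparison independent of slice order.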
* update the operator master deployment to support workload identity
This causes the spec for the operator master deployment to mount the
service account token as a volume, and maps the path to the environment
variable expected by Azure to support workload identities
* remove unused ExpectError value from test struct
* mount the token secret as a directory, not a file
* Remove dnf update cron job
Automatic OS Updates are configured. Updating packages via a cron job is no longer required.
* Remove certs arg from verify_role, Add/Remove comments
Certificate generation has been broken up into a named function for each VMSS role. This means it's no longer necessary to provide the certs=true argument when checking VMSS roles.
Add a comment for why AZURE_CLOUD_NAME returns an error if unset.
Remove az cli login comment from pull_container_images, it is no longer relevant after the last refactor.
* Add secret location to PlatformWorkloadIdentityRoleSet
* Add generatePlatformWorkloadIdentitySecrets function
* Add mutable:true validate:required struct tags to SecretLocation fields on admin api
* Add functions for other required WI resources
* Remove redundant UsesWorkloadIdentity check from generatePlatformWorkloadIdentitySecrets
* Fix coordinates for static CCO secret; move static coordinate strings to const values
* Return resources as map (w/ filename as key) instead of list
* Explicitly set TypeMeta on workload identity resources
This is needed in order to easily serialize these resources to YAML,
e.g. when setting them as string values in a Secret map for Hive to use
as an install manifest. Not setting these values will result in them being
omitted from the resulting JSON/YAML.
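
To illustrate why an unset TypeMeta disappears from the output, here is a small sketch using only `encoding/json`. The local `TypeMeta` type is a stand-in that copies the `omitempty` JSON tags used by Kubernetes' `metav1.TypeMeta`; `Secret` and `render` are hypothetical, trimmed-down names for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// TypeMeta is a local stand-in with the same JSON tags as Kubernetes'
// metav1.TypeMeta. Both fields are omitted when empty.
type TypeMeta struct {
	Kind       string `json:"kind,omitempty"`
	APIVersion string `json:"apiVersion,omitempty"`
}

// Secret is a hypothetical, heavily trimmed resource type.
type Secret struct {
	TypeMeta
	Name string `json:"name"`
}

// render serializes a resource to JSON the way a manifest generator might.
func render(s Secret) string {
	b, err := json.Marshal(s)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Without TypeMeta, kind and apiVersion are missing from the output.
	fmt.Println(render(Secret{Name: "creds"}))
	// With TypeMeta set explicitly, both fields are emitted.
	fmt.Println(render(Secret{
		TypeMeta: TypeMeta{Kind: "Secret", APIVersion: "v1"},
		Name:     "creds",
	}))
}
```

A manifest without `kind` and `apiVersion` cannot be identified by whatever consumes it, which is why the fields are set by hand before serialization.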
* ARO-4376 Track2 authorization api addition for roledefinitions
* ARO-4376 add stringutil funcs
* ARO-4376 use dbPlatformWorkloadIdentityRoleSets to get platform identity roles for cluster version
* ARO-4376 add dynamic validation for platformworkloadidentityprofile
* ARO-4376 resolve initial comments
* ARO-4376 refactor error messages and checkaccess action crosscheck
* ARO-4376 Add unit tests and comments resolution
* ARO-4376 add validation for upgradeableTo
* ARO-4376 Comment resolution and additional unit tests
* ARO-4376 minor version comparison handling
* ARO-4376 update permission error messaging handling for MIWI
* ARO-4376 update constructors to return non-interface type
* ARO-4376 add unit tests for GroupsIntersect
* ARO-4376 update generate files to support bingo
* Clarifying etcd cert renew test
- Updated the test to make it clear it is passing because timeout is being reached
- Updated the timeout from 10s -> 0s to pass faster
* Fix slow changefeed tests
* Update RP and Gateway vmss OS image to cbl-mariner-2-gen2 with Manually Configured FIPS Mode
System Changes:
Remove lvm disk resize; Mariner does not use lvm, and the disk is automatically grown to the full size specified.
Remove semanage; Mariner Linux does not have selinux configured.
Remove gateway log rotation config
Log rotation for the podman level driver log was not the correct
approach. The podman log driver is now journald, so all logs will be
shipped to journald rather than a ctr.log file.
FIPS mode is manually configured following the example code at https://eng.ms/docs/products/azure-linux/features/security/fips
SKU cbl-mariner-2-gen2-fips does not support Automatic OS Updates, so we are switching to cbl-mariner-2-gen2 and configuring FIPS mode manually to allow for Automatic OS Updates.
Script Changes:
Restructure VMSS bootstrap bash scripts for increased reliability and easier debugging
Move all shared code into a common file to be sourced by all
bootstrapping scripts. This allows for code reuse and minimal duplication.
Fix mdm mdsd certificate download script
During mdm and mdsd setup, I've added wait steps for the download
scripts to complete getting certificates. Without this, the download
scripts run in a subshell and fixing up the certificates fails.
Add firewalld configuration, required for podman networking
Add podman aro network creation to isolate RP containers from possible
interaction on the default podman network.
Package Changes:
Install Azure Security Monitor via VMSS Extension
Remove RHUI and Microsoft repo configuration, add Mariner Extended repo config
Increase rpm retry time to 30 minutes total, retrying every 30 seconds.
* Embed scripts as strings rather than []byte
This is to reduce the amount of type conversions needed.
* Add unit tests for existing frontend version validation
* Use semver package to validate versions in frontend instead of regex
This makes it possible to provide prerelease versions or version strings
containing metadata.
* Ensure disableUpdates does not propagate metadata in version string to clusterversion resource
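
As a rough sketch of the metadata handling: in a semantic version string, build metadata is everything after the first `+`, so it can be dropped before the version is written elsewhere. The helper name `stripBuildMetadata` and the sample version strings are hypothetical; the actual change may rely on the semver package rather than string handling.

```go
package main

import (
	"fmt"
	"strings"
)

// stripBuildMetadata drops the "+metadata" suffix of a semantic version
// string and leaves any prerelease part ("-rc.1") untouched.
func stripBuildMetadata(version string) string {
	core, _, _ := strings.Cut(version, "+")
	return core
}

func main() {
	fmt.Println(stripBuildMetadata("4.15.0-rc.1+build.5"))
}
```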
* Exclude platform identities from permissions denial
Add platform workload identities to the list of service principals
excluded from the permissions denial so that those identities can manage
Azure resources in the cluster's resource group
* improve testing of deny assignment generation
this confirms that ExcludePrincipals are generated correctly for the
deny assignment based on the presence of a ServicePrincipalProfile or a
PlatformWorkloadIdentityProfile
* use UsesWorkloadIdentity() helper function instead of bespoke check
* check empty ObjectID/SPObjectID values separately
* prevent nil pointer dereference for missing ServicePrincipalProfile
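
The last few items can be sketched as one small function: collect the principals to exclude from the deny assignment, check each object ID for emptiness on its own, and guard against a missing ServicePrincipalProfile. All type, field, and function names below are simplified, hypothetical stand-ins and not the actual ARO-RP types.

```go
package main

import (
	"fmt"
	"sort"
)

// Hypothetical, simplified stand-ins for the cluster document types.
type servicePrincipalProfile struct{ SPObjectID string }

type platformWorkloadIdentity struct{ ObjectID string }

type cluster struct {
	ServicePrincipalProfile    *servicePrincipalProfile
	PlatformWorkloadIdentities map[string]platformWorkloadIdentity
}

// excludedPrincipals returns the object IDs to exclude from the deny
// assignment. Empty IDs are skipped individually, and a nil
// ServicePrincipalProfile is tolerated.
func excludedPrincipals(c cluster) []string {
	var ids []string
	if c.ServicePrincipalProfile != nil && c.ServicePrincipalProfile.SPObjectID != "" {
		ids = append(ids, c.ServicePrincipalProfile.SPObjectID)
	}

	// Sort the operator names so the result does not depend on map order.
	names := make([]string, 0, len(c.PlatformWorkloadIdentities))
	for name := range c.PlatformWorkloadIdentities {
		names = append(names, name)
	}
	sort.Strings(names)

	for _, name := range names {
		if id := c.PlatformWorkloadIdentities[name].ObjectID; id != "" {
			ids = append(ids, id)
		}
	}
	return ids
}

func main() {
	c := cluster{
		PlatformWorkloadIdentities: map[string]platformWorkloadIdentity{
			"operator-a": {ObjectID: "object-id-a"},
			"operator-b": {ObjectID: ""}, // skipped: no object ID yet
		},
	}
	fmt.Println(excludedPrincipals(c))
}
```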