# Deployment model

For better or worse, the ARO-RP codebase has four different deployment models.

## 1. Production deployment (PROD)

Running in production. PROD deployments at a given commit are intended to be
identical (bar configuration) across all regions, regardless of whether the
region is a designated canary region (westcentralus / eastus2euap) or not.

Subscription [feature flags](feature-flags.md) are used to prevent end users
from accessing the ARO service in canary regions, or in regions or for
api-versions which are in the process of being built out. The subscription used
for regular E2E service health checking has the relevant feature flags set.

The RP configures deny assignments on cluster resource groups only when running
in PROD. This is because Azure policy only permits deny assignments to be set
by first party RPs when running in PROD. The deny assignment functionality is
gated by the DisableDenyAssignments RP feature flag, which must be set in all
non-PROD deployments.

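The deny assignment gating described above can be pictured as a simple flag
check. The following is a minimal sketch, not the RP's actual code: the
`Feature` type, the `featureSet` map and `shouldConfigureDenyAssignments` are
hypothetical names introduced for illustration; only the flag name
DisableDenyAssignments comes from this document.

```go
package main

import "fmt"

// Feature is a hypothetical identifier for an RP feature flag.
type Feature string

// Only the flag name is taken from this document; the type is illustrative.
const DisableDenyAssignments Feature = "DisableDenyAssignments"

// featureSet is a hypothetical set of RP feature flags that are switched on.
type featureSet map[Feature]bool

// shouldConfigureDenyAssignments reports whether deny assignments would be
// configured, i.e. only when the disabling flag is not set.
func shouldConfigureDenyAssignments(fs featureSet) bool {
	return !fs[DisableDenyAssignments]
}

func main() {
	prod := featureSet{}                                // flag unset, as in PROD
	nonProd := featureSet{DisableDenyAssignments: true} // flag set, as in INT/development

	fmt.Println("PROD configures deny assignments:", shouldConfigureDenyAssignments(prod))
	fmt.Println("non-PROD configures deny assignments:", shouldConfigureDenyAssignments(nonProd))
}
```

Expressing the gate as a "disable" flag means that an environment with no flags
set behaves like PROD, which matches the requirement that the flag be set
explicitly in every non-PROD deployment.
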
## 2. Pre-production deployment (INT)

INT deployment is intended to be as close as possible to PROD, although
inevitably there are some differences.

A subscription [feature flag](feature-flags.md) is used to selectively redirect
requests to the INT RP.

Here is a non-exhaustive list of differences between INT and PROD:

* INT is deployed entirely separately from PROD in the MSIT tenant, which does
  not have production access overheads.

* The INT ACR is entirely separate from PROD.

* INT uses different subdomains for hosting the RP service and clusters.

* INT does not use the production first party AAD application. Instead it uses
  a multitenant AAD application which must be manually patched and granted
  permissions in any subscription where the RP will deploy clusters.

* There is standing access (i.e. no JIT) to the INT environment, INT elevated
  Geneva actions and INT SRE portal.

* INT uses the Test instances of Geneva for RP and cluster logging and
  monitoring. Geneva actions use separate credentials to authenticate to the
  INT RP.

* Monitoring of the INT environment does not match PROD monitoring.

* As previously mentioned, deny assignments are not enabled in INT.

## 3. Development deployment

A developer is able to deploy the entire ARO service stack in Azure in a way
that is intended to be as representative as possible of PROD/INT, and many ARO
service components can also be meaningfully run and debugged without being run
on Azure infrastructure at all. This latter "local development mode" is also
currently used by our pull request E2E testing.

Some magic is needed to make all of this work, and this translates into a larger
delta from PROD/INT in some cases:

* Development deployment is entirely separate from INT and PROD and may in
  principle use any AAD tenant.

* Development uses different subdomains again for hosting the RP service and
  clusters.

* No inbound ARM layer

  In PROD/INT, service REST API requests are made to PROD ARM, and this proxies
  the requests to the RP service. Thus PROD/INT RPs are configured to authorize
  only incoming service REST API requests from ARM.

  In development, ARM does not front the RP service, thus different authorizers
  are used. In development mode, the authorizer used for ARM is also used for
  Geneva actions, so a developer can test Geneva actions manually.

  The ARO Go and Python client libraries in this repo carry patches such that
  when the environment variable `RP_MODE=development` is set, they dial the RP
  on localhost with no authentication instead of dialling ARM.

  In addition, any HTTP headers injected by ARM via its proxying are unavailable
  in development mode. For instance, the RP frontend fakes up the Referer
  header in this case, in order for client polling code to work correctly in
  development mode.

* No first party application

  In PROD, ARM is configured to automagically grant the RP first party
  application Owner on any resource group it creates in a customer subscription.

  In INT, the INT multitenant application which fakes the first party
  application is granted Owner on every subscription which is INT enabled. This
  is simple but has the disadvantage that the RP has more permissions in INT
  than it does in PROD.

  In development, pkg/env/armhelper.go fakes up ARM's automagic behaviour using
  a completely separate helper AAD application. This makes setting up the
  development environment more onerous, but has the advantage that the RP's
  permissions in development match those in PROD.

* No cluster signed certificates

  Integration with Digicert is disabled in development mode. This is controlled
  by the DisableSignedCertificates RP feature flag.

* No readiness delay

  In PROD/INT, the RP waits 2 minutes before indicating health to its load
  balancer, helping us to detect if the RP crash loops. Similarly, it waits for
  frontend and backend tasks to complete before exiting. To make the feature
  development/test cycle faster, these behaviours are disabled in development
  mode via the DisableReadinessDelay feature flag.

* Standard_D2s_v3 workers required

  In development mode, use of Standard_D2s_v3 workers is required as a
  cost-saving measure. This is controlled by the RequireD2sV3Workers feature
  flag.

* There is standing access to development infrastructure using shared
  development credentials.

* Test instances of Geneva, matching INT, are used in development mode for
  cluster logging and monitoring (and RP logging and monitoring as appropriate).

* Development environments are not monitored.

* As previously mentioned, deny assignments are not enabled in development.

See [Prepare a shared RP development
environment](prepare-a-shared-rp-development-environment.md) for the process to
set up a development environment. The same development AAD applications and
credentials are used regardless of whether the RP runs on Azure or locally.

## 3a. Development on Azure

In the case that a developer deploys the entire ARO service stack in Azure, in
addition to the differences listed in section 3, note the following:

* Currently a separate ACR is created which must be populated with the latest
  OpenShift release. TODO: this is inconvenient and adds expense.

* Service VMSS capacity is set to 1 instead of 3 (i.e. not highly available) to
  save time and money.

* Because the RP is internet-facing, TLS subject name and issuer authentication
  is required for all API accesses.

* hack/tunnel is used to forward RP API requests from a listener on localhost,
  wrapping these with the aforementioned TLS client authentication.

## 3b. Local development mode / CI

Many ARO service components can be meaningfully run and debugged locally on a
developer's laptop. Notable exceptions include the deployment tooling, including
the custom script extension which is used to initialize the RP VMSS.

"Local development mode" is also currently used by our pull request E2E testing.
This has the advantage of saving the time, money and flakiness that would be
implied by setting up an entire service stack on every PR. However, it has the
disadvantage that coverage is lower and the testing is less representative.

When running in local development mode, in addition to the differences listed in
section 3, note the following:

* Local development mode is enabled, regardless of component, by setting the
  environment variable `RP_MODE=development`. This enables code guarded by
  `env.IsLocalDevelopmentMode()` and also automatically sets many of the RP
  feature flags listed in section 3.

* All services listen on localhost only and authentication is largely disabled.

  The ARO Go and Python client libraries in this repo carry patches such that
  when the environment variable `RP_MODE=development` is set, they dial the RP
  on localhost with no authentication instead of dialling ARM.

* Generation of ACR tokens per cluster is disabled; the INT ACR is used to pull
  OpenShift container images.

* Production VM instance metadata and MSI authorizers obviously don't work.
  These are fixed up using environment variables. See
  pkg/util/instancemetadata.

* The INT/PROD mechanism of dialling a cluster API server whose private endpoint
  is on the RP vnet also obviously doesn't work. Local development RPs share a
  proxy VM which is deployed on the RP vnet which can proxy these connections.
  See pkg/proxy.

* As a cost saving exercise, all local development RPs share a single Cosmos DB
  account per region (but with a unique database per developer).