Rename AlwaysOn to Azure Mission-Critical (#184)

* update links and naming

* rename

* Apply suggestions from code review

* updates

* push

* update testing

* push

* fix link

* ALZ update

* online architecture update

* push

* push

* drop foundational

* push

* push

* update icon

* update icon enable dark mode

* update link

* remove anotation

* update vsdx file

* small markdown fixes

Co-authored-by: Hansjoerg Scherer <hscherer@microsoft.com>
heoelri 2022-03-04 08:24:54 +01:00 committed by GitHub
Parent 6315010ba3
Commit 5a4ceb6449
No known key found for this signature
GPG key ID: 4AEE18F83AFDEB23
57 changed files with 234 additions and 228 deletions


@@ -1,10 +1,10 @@
# Azure DevOps Workflows
As explained in the [DevOps design decisions](/docs/reference-implementation/DeployAndTest-DevOps-Design-Decisions.md) section, the AlwaysOn reference implementation is using Azure Pipelines to implement CI/CD pipelines. Azure Pipelines is part of the Azure DevOps (ADO) service and used to automate all build and release tasks.
As explained in the [DevOps design decisions](/docs/reference-implementation/DeployAndTest-DevOps-Design-Decisions.md) section, the Azure Mission-Critical online reference implementation is using Azure Pipelines to implement CI/CD pipelines. Azure Pipelines is part of the Azure DevOps (ADO) service and used to automate all build and release tasks.
## Pipelines
The AlwaysOn project consists of multiple pipelines automating various aspects and tasks needed to deploy and operate AlwaysOn. The pipelines to release INT, PROD and E2E are basically identical (with a few different parameters). They are the implementation of the [Zero-downtime deployment strategy](/docs/reference-implementation/DeployAndTest-DevOps-Zero-Downtime-Update-Strategy.md):
The Azure Mission-Critical project consists of multiple pipelines automating various aspects and tasks needed to deploy and operate Azure Mission-Critical. The pipelines to release INT, PROD and E2E are basically identical (with a few different parameters). They are the implementation of the [Zero-downtime deployment strategy](/docs/reference-implementation/DeployAndTest-DevOps-Zero-Downtime-Update-Strategy.md):
- **Azure.AlwaysOn INT Release** (`azure-release-int.yaml`) deploys and updates the entire solution for the INT environment.
@@ -18,7 +18,7 @@ Additionally there are some auxiliary pipelines:
- **Azure.AlwaysOn Deploy Azure Load Generator.** (`azure-deploy-loadgenerator.yaml`) deploys a standalone Azure Functions-based load generator for simulating user activity. See the article on the [load generator](/src/testing/userload-generator/README.md) for more information.
All pipelines are defined in YAML and are stored in the AlwaysOn GitHub repository in the `.ado/pipelines` directory:
All pipelines are defined in YAML and are stored in the Azure Mission-Critical online reference implementation GitHub repository in the `.ado/pipelines` directory:
![img](/docs/media/devops1.png)


@@ -1,22 +1,22 @@
# How to Contribute to AlwaysOn
# How to Contribute to Azure Mission-Critical
## Content
The structure of the AlwaysOn repository is broken down into three overarching directories:
The structure of the Azure Mission-Critical repository is broken down into three overarching directories:
* `/docs/` contains the majority of AlwaysOn documentation, covering the architectural framework and design approach as well as detailed documentation to accompany the reference implementation.
* `/docs/` contains the majority of Azure Mission-Critical documentation, covering the architectural framework and design approach as well as detailed documentation to accompany the reference implementation.
* `/src/` contains all source code and technical artifacts for the reference implementation along with low level implementation documentation.
* `/.ado/pipelines` contains the Azure DevOps pipelines to build and deploy the core reference implementation.
## Content Changes and Pull Requests
To add or edit content within the AlwaysOn repository, please take a fork of the repository to iterate on changes before subsequently opening a Pull Request (PR) to get your forked branch merged into the main branch for the AlwaysOn repository. Your PR will be reviewed by the core engineering team for the AlwaysOn project, and once approved, your content accessible to everybody.
To add or edit content within the Azure Mission-Critical repository, please take a fork of the repository to iterate on changes before subsequently opening a Pull Request (PR) to get your forked branch merged into the main branch for the Mission-Critical repository. Your PR will be reviewed by the core engineering team for the Azure Mission-Critical project, and once approved, your content accessible to everybody.
> **Important!** Please make sure that your PR is focused on a specific area of AlwaysOn to facilitate a targeted review, as this will speed up the process to get your changes merged into our repository.
> **Important!** Please make sure that your PR is focused on a specific area of Mission-Critical to facilitate a targeted review, as this will speed up the process to get your changes merged into our repository.
## Documentation Conventions
* Overarching topics concerning the AlwaysOn architecture, design principles, design decisions, and cross-component integration are documented as separate markdown documents within the `/docs/` directory.
* Overarching topics concerning the Mission-Critical architecture, design principles, design decisions, and cross-component integration are documented as separate markdown documents within the `/docs/` directory.
* Each source code component within the reference implementation has it's own `README.md` file which explains how that particular component works, how it is supposed to be used, and how it may interact with other aspects of the AlwaysOn solution.
* Each source code component within the reference implementation has it's own `README.md` file which explains how that particular component works, how it is supposed to be used, and how it may interact with other aspects of the Mission-Critical solution.
* Within the `main` branch, each `README.md` file must accurately represent the state of the associated component which will serve as a core aspect of PR reviews. Any modifications to source components must therefore be reflected in the documentation as well.


@@ -1,37 +1,38 @@
[![Always On Application](./icon.png "Azure AlwaysOn Foundational Online")](./README.md)
![Azure Mission-Critical Application](./icon-light.png#gh-light-mode-only)
![Azure Mission-Critical Application](./icon-dark.png#gh-dark-mode-only)
## Welcome to Azure AlwaysOn Foundational Online
## Welcome to Azure Mission-Critical Online Reference Implementation
AlwaysOn is an open source project that provides a **prescriptive architectural approach to building highly-reliable cloud-native applications on Microsoft Azure for mission-critical workloads**. This repository contains a **Fully Functional Production-Ready AlwaysOn Reference Implementation**, intended to provide a solution oriented basis to showcase mission-critical application development on Microsoft Azure, leveraging Azure-native platform capabilities to maximize reliability and operational effectiveness. More specifically, the reference implementation consists of:
Azure Mission-Critical is an open source project that provides a **prescriptive architectural approach to building highly-reliable cloud-native applications on Microsoft Azure for mission-critical workloads**. This repository contains a **Fully Functional Production-Ready Mission-Critical Reference Implementation**, intended to provide a solution oriented basis to showcase mission-critical application development on Microsoft Azure, leveraging Azure-native platform capabilities to maximize reliability and operational effectiveness. More specifically, the reference implementation consists of:
- Design and implementation guidance to help readers understand and use the AlwaysOn design methodology in the context of a particular industry scenario.
- Production-ready technical artifacts including Infrastructure-as-Code (IaC) resources and Continuous-Integration/Continuous-Deployment (CI/CD) pipelines (GitHub and Azure DevOps) to deploy an AlwaysOn application with mature end-to-end operational wrappers.
- Design and implementation guidance to help readers understand and use the Azure Mission-Critical design methodology in the context of a particular industry scenario.
- Production-ready technical artifacts including Infrastructure-as-Code (IaC) resources and Continuous-Integration/Continuous-Deployment (CI/CD) pipelines (GitHub and Azure DevOps) to deploy an Mission-Critical application with mature end-to-end operational wrappers.
This repository contains the reference implementation for an AlwaysOn "online" scenario, i.e. a workload which does not require direct connectivity to other company resources (such as via a hub-and-spoke model). The pipeline deploys the application Azure Subscription security and compliance guardrails and has no network connectivity requirements. It will be used if the AlwaysOn application is access over a public endpoint without additional dependencies to other company resources.
This repository contains the reference implementation for an Mission-Critical "online" scenario, i.e. a workload which does not require direct connectivity to other company resources (such as via a hub-and-spoke model). The pipeline deploys the application Azure Subscription security and compliance guardrails and has no network connectivity requirements. It will be used if the Mission-Critical application is access over a public endpoint without additional dependencies to other company resources.
## Reference implementation - Table of Contents
- [Reference Implementation Solution Guide](./docs/reference-implementation/README.md) - Everything required to understand and build a copy of the reference implementation
- [Reference Implementation Build Artifacts](./src/infra/README.md) - Contains the Infrastructure-as-Code artifacts, CI/CD pipelines, and application code required to deploy the pre-configured reference solution
![Architecture overview](/docs/media/Architecture-Foundational-Online.png)
![Architecture overview](/docs/media/mission-critical-architecture-online.png)
## AlwaysOn overview and design guidelines
## Azure Mission-Critical overview and design guidelines
The following articles provides more information about AlwaysOn design guidelines and design areas located in the [AlwaysOn GitHub](https://github.com/Azure/AlwaysOn) repo:
The following articles provides more information about Azure Mission-Critical design guidelines and design areas located in the [Azure Mission-Critical GitHub](https://github.com/Azure/Mission-Critical) repo:
- [Introduction - What is AlwaysOn?](https://github.com/Azure/AlwaysOn/blob/main/docs/introduction/README.md) (➡️ `Azure/AlwaysOn`) - Detailed introduction into AlwaysOn, the problem it is intended to solve and the value it can provide.
- [Design Guidelines](https://github.com/Azure/AlwaysOn/blob/main/docs/design-methodology/README.md) (➡️ `Azure/AlwaysOn`) - Prescriptive guidance aligned to 8 critical design areas guides users to design and build an AlwaysOn application, outlining a recommended decision process.
- [Introduction - What is Azure Mission-Critical?](https://github.com/Azure/Mission-Critical/blob/main/docs/introduction/README.md) (➡️ `Azure/Mission-Critical`) - Detailed introduction into Mission-Critical, the problem it is intended to solve and the value it can provide.
- [Design Guidelines](https://github.com/Azure/Mission-Critical/blob/main/docs/design-methodology/README.md) (➡️ `Azure/Mission-Critical`) - Prescriptive guidance aligned to 8 critical design areas guides users to design and build an Mission-Critical application, outlining a recommended decision process.
## Helpful Information
- [Getting Started](./docs/reference-implementation/Getting-Started.md) outlines the process and required steps to deploy AlwaysOn in your environment, including preparing the Azure DevOps pipelines. It should be read in tandem with the [Reference Implementation Guide](./docs/reference-implementation/README.md).
- [Frequently Asked Questions](./docs/reference-implementation/FAQ.md) captures responses to common issues and challenges associated with leveraging AlwaysOn.
- [Full List of Documentation](./docs/README.md) contains a complete breakdown of the AlwaysOn repository to help navigate the contained guidance.
- [Getting Started](./docs/reference-implementation/Getting-Started.md) outlines the process and required steps to deploy Mission-Critical in your environment, including preparing the Azure DevOps pipelines. It should be read in tandem with the [Reference Implementation Guide](./docs/reference-implementation/README.md).
- [Frequently Asked Questions](./docs/reference-implementation/FAQ.md) captures responses to common issues and challenges associated with leveraging Mission-Critical.
- [Full List of Documentation](./docs/README.md) contains a complete breakdown of the Mission-Critical repository to help navigate the contained guidance.
## Contributing
AlwaysOn is a community driven open source project that welcomes contributions as well as suggestions. Most contributions require you to agree to a
Azure Mission-Critical is a community driven open source project that welcomes contributions as well as suggestions. Most contributions require you to agree to a
Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit the [CLA portal](https://cla.opensource.microsoft.com).
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g. status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
@@ -44,4 +45,4 @@ For more details, please read [how to contribute](./CONTRIBUTE.md).
## Microsoft Sponsorship
The AlwaysOn project was created by the **Microsoft Customer Architecture Team (CAT)** who continue to actively sponsor the sustained evolution of the AlwaysOn project through the creation of additional reference implementations for common industry scenarios.
The Azure Mission-Critical project was created by the **Microsoft Customer Architecture Team (CAT)** who continue to actively sponsor the sustained evolution of the Azure Mission-Critical project through the creation of additional reference implementations for common industry scenarios.


@@ -1,14 +1,14 @@
# AlwaysOn - Full List of Documentation
# Azure Mission-Critical - Full List of Documentation
## AlwaysOn Landing Page
## Azure Mission-Critical Landing Page
- [Landing Page](../README.md)
## Introduction to AlwaysOn
## Introduction to Azure Mission-Critical
- [Introduction](https://github.com/Azure/AlwaysOn/blob/main/docs/introduction/README.md) (➡️ `Azure/AlwaysOn`)
- [Introduction](https://github.com/Azure/Mission-Critical/blob/main/docs/introduction/README.md) (➡️ `Azure/Mission-Critical`)
## AlwaysOn Reference Implementation Guide
## Azure Mission-Critical Reference Implementation Guide
- [Overview](./reference-implementation/README.md)
- [Getting Started](./reference-implementation/Getting-Started.md)
@@ -56,7 +56,7 @@
## Documentation Conventions
- Overarching topics concerning the AlwaysOn architecture, design principles, design decisions, and cross-component integration are documented as separate markdown documents within the `/docs/` folder.
- Overarching topics concerning the Azure Mission-Critical architecture, design principles, design decisions, and cross-component integration are documented as separate markdown documents within the `/docs/` folder.
- Each source code component for the reference implementation has it's own `README.md` file which explains how that particular component works, how it is supposed to be used, and how it may interact with other aspects of the AlwaysOn solution.
- Each source code component for the reference implementation has it's own `README.md` file which explains how that particular component works, how it is supposed to be used, and how it may interact with other aspects of the Azure Mission-Critical solution.
- Within the `main` branch, each `README.md` file must accurately represent the state of the associated component which will serve as a core aspect of PR reviews. Any modifications to source components must therefore be reflected in the documentation as well.

Binary data
docs/media/AlwaysOn-ESLZ.gif

Binary file not shown.

Before

Size: 2.2 MiB

Binary data
docs/media/AlwaysOn-Subscription-Scale.gif

Binary file not shown.

Before

Size: 2.4 MiB

Binary data
docs/media/Architecture-Foundational-Online.png

Binary file not shown.

Before

Size: 72 KiB

Binary data
docs/media/mission-critical-architecture-online.png Normal file

Binary file not shown.

After

Size: 107 KiB

Binary data
docs/media/mission-critical-landing-zones.gif Normal file

Binary file not shown.

After

Size: 5.2 MiB

Binary file not shown.


@@ -4,7 +4,7 @@ This section explains how the application was designed and what patterns were im
## The workload
The foundational AlwaysOn reference implementation considers a simple web shop catalog workflow where end users can browse through a catalog of items, see details of an item, and post ratings and comments for items. Although fairly straight forward, this application enables the Reference Implementation to demonstrate the asynchronous processing of requests and how to achieve high throughput within a solution.
The Azure Mission-Critical online reference implementation considers a simple web shop catalog workflow where end users can browse through a catalog of items, see details of an item, and post ratings and comments for items. Although fairly straight forward, this application enables the Reference Implementation to demonstrate the asynchronous processing of requests and how to achieve high throughput within a solution.
The workload consists of three components:
@@ -14,13 +14,13 @@ The workload consists of three components:
## Queue-based asynchronous processing
In order to achieve high responsiveness for all operations, AlwaysOn implements the [Queue-Based Load leveling pattern](https://docs.microsoft.com/azure/architecture/patterns/queue-based-load-leveling) combined with [Competing Consumers pattern](https://docs.microsoft.com/azure/architecture/patterns/competing-consumers) where multiple producer instances (`CatalogService` in our case) generate messages which are then asynchronously processed by consumers (`BackgroundProcessor`). This allows the API to accept the request and return to the caller quickly whilst the more demanding database write operation is processed separately.
In order to achieve high responsiveness for all operations, Azure Mission-Critical implements the [Queue-Based Load leveling pattern](https://docs.microsoft.com/azure/architecture/patterns/queue-based-load-leveling) combined with [Competing Consumers pattern](https://docs.microsoft.com/azure/architecture/patterns/competing-consumers) where multiple producer instances (`CatalogService` in our case) generate messages which are then asynchronously processed by consumers (`BackgroundProcessor`). This allows the API to accept the request and return to the caller quickly whilst the more demanding database write operation is processed separately.
![Competing consumers diagram](/docs/media/competing-consumers-diagram.png)
*Image source: https://docs.microsoft.com/azure/architecture/patterns/competing-consumers*
- The current AlwaysOn reference implementation uses **Azure Event Hub** as the message queue but provides interfaces in code which enable the use of other messaging services if required (Azure Service Bus was successfully tested as an alternative solution).
- The current Azure Mission-Critical online reference implementation uses **Azure Event Hub** as the message queue but provides interfaces in code which enable the use of other messaging services if required (Azure Service Bus was successfully tested as an alternative solution).
- **ASP.NET Core API** is used to implement the producer REST API.
- **.NET Core Worker Service** is used to implement the consumer service.
@@ -38,7 +38,7 @@ There is no backchannel which communicates to the client if the operation comple
## Authentication
This foundational reference implementation of AlwaysOn uses a simple authentication scheme based on API keys for some restricted operations, such as creating new catalog items or deleting comments.
This online reference implementation of Azure Mission-Critical uses a simple authentication scheme based on API keys for some restricted operations, such as creating new catalog items or deleting comments.
More advanced scenarios such as user authentication and user roles are not in scope here.
## Scalability
@@ -55,22 +55,22 @@ The `BackgroundProcessor` service has very different requirements and is conside
## 12-Factor App
AlwaysOn aligns to the [12-Factor Application](https://12factor.net/) Methodology as follows.
Azure Mission-Critical aligns to the [12-Factor Application](https://12factor.net/) Methodology as follows.
| Factor | AlwaysOn Alignment |
| Factor | Azure Mission-Critical Alignment |
| --- | --- |
| [Codebase](https://12factor.net/codebase) | All AlwaysOn assets are stored and tracked under source control including CI/CD pipelines, application code, all test code and scripts, infrastructure as code, and configuration management.<br /><br />There is one AlwaysOn codebase and multiple deployments to multiple environments are supported. |
| [Dependencies](https://12factor.net/dependencies) | AlwaysOn applications have NuGet package dependencies which are restored into the build environment.<br /><br />AlwaysOn makes no assumptions about the existence of any dependencies in the build environment. |
| [Config](https://12factor.net/config) | Variable files, both general as well as per-environment, store deployment and configuration data and are stored in the source code repository. Sensitive values are stored in Azure DevOps variable groups.<br /><br />All application runtime configuration is stored in Azure Key Vault - this applies to both, secret and non-sensitive settings. The Key Vaults are only populated by the Terraform deployment. The required values are either sourced directly by Terraform (such as database connection strings) or passed through as Terraform variables from the deployment pipeline.<br /><br />The applications run in containers on Azure Kubernetes Service. Containers use Container Storage Interface bindings to enable AlwaysOn applications to access Azure Key Vault configuration values, surfaced as environment variables, at runtime.<br /><br />Configuration values and environment variables are standalone and not reproduced in different runtime "environments", but are differentiated by target environment at deployment. |
| [Backing Services](https://12factor.net/backing-services) | AlwaysOn applications treat local and third-party services as attached resources, accessed via URL or locator/credentials stored in config.<br /><br />Different resource instances can be accessed by changing the URL or locator/credentials in config. |
| [Build, release, run](https://12factor.net/build-release-run) | AlwaysOn CI/CD pipelines have separate stages. Application stages include build, test, and deploy. Infrastructure stages include global and regional stamp deploy as well as configuration. Releases and runs have distinct IDs. |
| [Processes](https://12factor.net/processes) | AlwaysOn applications are stateless in process, share nothing, and store state in a backing service, Azure Cosmos DB.<br /><br />Sticky sessions are not used.<br /><br />The loss of a stamp will not lose any committed data as it will have been persisted to a backing store. |
| [Port binding](https://12factor.net/port-binding) | AlwaysOn applications run in containers. Endpoints are exported via port binding.<br /><br />Containers are built from images which include the required HTTPS services; no serving capabilities are injected at runtime. |
| [Concurrency](https://12factor.net/concurrency) | AlwaysOn runs different workloads in distinct processes.<br /><br />The front end runs in an HTTP serving process suited for handling web requests, whereas the back end runs in a worker process suited for handling background tasks.<br /><br />The processes manage internal multiplexing/multi-threading. Horizontal scaling is enabled by the shared-nothing, stateless design. |
| [Disposability](https://12factor.net/disposability) | AlwaysOn applications are shared-nothing and stateless. They can be started or stopped with little or zero notice.<br /><br />Hosting in containers on Azure Kubernetes Service enables very fast startup and shutdown which is important for resilience in case of code or config changes. |
| [Dev/prod parity](https://12factor.net/dev-prod-parity) | AlwaysOn is designed for continuous integration and deployment to keep the gaps between development and downstream environment very small.<br /><br />As developers push code updates, testing and deployment are fully automated through CI/CD pipelines.<br /><br />The same pipelines are used to deploy and configure multiple environments as well as build and deploy the application code to the environments, minimizing drift between environments. |
| [Logs](https://12factor.net/logs) | AlwaysOn applications write logs, metrics, and telemetry to a backing log system, Azure Monitor.<br /><br />The applications do not write log files in the runtime environment, or manage log formats or the logging environment. There are no log boundaries (e.g. date rollover) defined or managed by the applications, rather logging is an ongoing event stream and the backing log system is where log analytics and querying are performed. |
| [Codebase](https://12factor.net/codebase) | All Azure Mission-Critical assets are stored and tracked under source control including CI/CD pipelines, application code, all test code and scripts, infrastructure as code, and configuration management.<br /><br />There is one Mission-Critical codebase and multiple deployments to multiple environments are supported. |
| [Dependencies](https://12factor.net/dependencies) | Mission-Critical applications have NuGet package dependencies which are restored into the build environment.<br /><br />Azure Mission-Critical makes no assumptions about the existence of any dependencies in the build environment. |
| [Config](https://12factor.net/config) | Variable files, both general as well as per-environment, store deployment and configuration data and are stored in the source code repository. Sensitive values are stored in Azure DevOps variable groups.<br /><br />All application runtime configuration is stored in Azure Key Vault - this applies to both, secret and non-sensitive settings. The Key Vaults are only populated by the Terraform deployment. The required values are either sourced directly by Terraform (such as database connection strings) or passed through as Terraform variables from the deployment pipeline.<br /><br />The applications run in containers on Azure Kubernetes Service. Containers use Container Storage Interface bindings to enable Mission-Critical applications to access Azure Key Vault configuration values, surfaced as environment variables, at runtime.<br /><br />Configuration values and environment variables are standalone and not reproduced in different runtime "environments", but are differentiated by target environment at deployment. |
| [Backing Services](https://12factor.net/backing-services) | Mission-Critical applications treat local and third-party services as attached resources, accessed via URL or locator/credentials stored in config.<br /><br />Different resource instances can be accessed by changing the URL or locator/credentials in config. |
| [Build, release, run](https://12factor.net/build-release-run) | Mission-Critical CI/CD pipelines have separate stages. Application stages include build, test, and deploy. Infrastructure stages include global and regional stamp deploy as well as configuration. Releases and runs have distinct IDs. |
| [Processes](https://12factor.net/processes) | Mission-Critical applications are stateless in process, share nothing, and store state in a backing service, Azure Cosmos DB.<br /><br />Sticky sessions are not used.<br /><br />The loss of a stamp will not lose any committed data as it will have been persisted to a backing store. |
| [Port binding](https://12factor.net/port-binding) | Azure Mission-Critical applications run in containers. Endpoints are exported via port binding.<br /><br />Containers are built from images which include the required HTTPS services; no serving capabilities are injected at runtime. |
| [Concurrency](https://12factor.net/concurrency) | Azure Mission-Critical runs different workloads in distinct processes.<br /><br />The front end runs in an HTTP serving process suited for handling web requests, whereas the back end runs in a worker process suited for handling background tasks.<br /><br />The processes manage internal multiplexing/multi-threading. Horizontal scaling is enabled by the shared-nothing, stateless design. |
| [Disposability](https://12factor.net/disposability) | Azure Mission-Critical applications are shared-nothing and stateless. They can be started or stopped with little or zero notice.<br /><br />Hosting in containers on Azure Kubernetes Service enables very fast startup and shutdown which is important for resilience in case of code or config changes. |
| [Dev/prod parity](https://12factor.net/dev-prod-parity) | Azure Mission-Critical is designed for continuous integration and deployment to keep the gaps between development and downstream environment very small.<br /><br />As developers push code updates, testing and deployment are fully automated through CI/CD pipelines.<br /><br />The same pipelines are used to deploy and configure multiple environments as well as build and deploy the application code to the environments, minimizing drift between environments. |
| [Logs](https://12factor.net/logs) | Azure Mission-Critical applications write logs, metrics, and telemetry to a backing log system, Azure Monitor.<br /><br />The applications do not write log files in the runtime environment, or manage log formats or the logging environment. There are no log boundaries (e.g. date rollover) defined or managed by the applications, rather logging is an ongoing event stream and the backing log system is where log analytics and querying are performed. |
| [Admin processes](https://12factor.net/admin-processes) | Administrative tasks such as environment (re)configuration would be performed in the same deployment pipelines used to initially configure and deploy the environments. Deployments are idempotent and incremental due to the underlying Azure Resource Manager platform. |
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@@ -2,9 +2,9 @@
## Introduction
The AlwaysOn architecture is based on the [deployment stamp pattern](https://docs.microsoft.com/azure/architecture/patterns/deployment-stamp). Each deployment stamp is stateless, independent and is considered to be one scale unit. If a stamp is considered to be unhealthy, it can be entirely replaced by a newly deployed healthy stamp.
The Azure Mission-Critical architecture is based on the [deployment stamp pattern](https://docs.microsoft.com/azure/architecture/patterns/deployment-stamp). Each deployment stamp is stateless, independent and is considered to be one scale unit. If a stamp is considered to be unhealthy, it can be entirely replaced by a newly deployed healthy stamp.
AlwaysOn stamps share several global resources which are durable through stamp deployments. This document summarizes Business Continuity capabilities and configurations as well as Disaster Recovery processes for each global resource type shared by AlwaysOn stamps.
Azure Mission-Critical stamps share several global resources which are durable through stamp deployments. This document summarizes Business Continuity capabilities and configurations as well as Disaster Recovery processes for each global resource type shared by these deployment stamps.
## Azure Container Registry (ACR)
@ -33,4 +33,4 @@ Each stamp in the reference implementation contains its own Log Analytics Worksp
For cost-saving reasons there are daily data caps configured on stamp and global resources. This can be problematic during load tests as any telemetry beyond these limits is lost.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,14 +1,14 @@
# SLO and Availability
AlwaysOn has set a targeted availability of **99.95%**. This document covers the reasoning and how this number was defined.
Azure Mission-Critical has set a targeted availability of **99.95%**. This document covers the reasoning and how this number was defined.
> While it is understood that the implementation is literally called "AlwaysOn" and therfore implies availability of 100%, in cloud reality this number is extremely difficult to achieve. Instead it is accepted that each component can/ will become unavailable at some point and have designed the architecture to be as tolerant and adaptive to this as possible.
> While it is understood that the implementation is literally called "AlwaysOn" and therefore implies availability of 100%, in cloud reality this number is extremely difficult to achieve. Instead, it is accepted that each component can and will become unavailable at some point, and the architecture has been designed to be as tolerant and adaptive to this as possible.
## Service Level Agreement (SLA) and Service Level Objective (SLO)
An **SLA** describes a contractual commitment for application availability and as the purpose of AlwaysOn is not to define contractual agreements, we prefer an availability target in the form of **SLO**. This is a percentage figure which represents the amount of time in a month when the application is *available*.
An **SLA** describes a contractual commitment for application availability and as the purpose of Azure Mission-Critical is not to define contractual agreements, we prefer an availability target in the form of **SLO**. This is a percentage figure which represents the amount of time in a month when the application is *available*.
**Availability** for AlwaysOn means that end users are able to perform game operations using the website. These operations include:
**Availability** for Azure Mission-Critical means that end users are able to perform game operations using the website. These operations include:
1. Enter the home page.
1. Sign in with provided credentials.
@ -22,7 +22,7 @@ An SLO of 99.95% equates to an accepted downtime of **5 minutes per week** or **
To define a realistic SLO it is important to understand the SLAs of the individual Azure components. Cloud services rely on each other and can potentially fail at the same time, therefore, their availability numbers need to be combined into a Composite SLA.
> While AlwaysOn does not have contract with its users (hence providing an SLO not SLA) it does have one with Azure and so we can consider the official SLAs of the platform.
> While Azure Mission-Critical does not have a contract with its users (hence providing an SLO, not an SLA), it does have one with Azure, so we can consider the official SLAs of the platform.
Composite SLA is calculated as individual SLAs multiplied with each other.
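The multiplication rule can be illustrated with a minimal sketch. The three SLA values below are hypothetical placeholders, not the figures of any specific Azure service, and the helper name `composite_sla` is introduced here only for illustration.

```python
import math


def composite_sla(slas):
    """Multiply individual availability fractions into one composite figure.

    This assumes every listed service is a hard dependency, i.e. the
    application is only available when all of them are available.
    """
    return math.prod(slas)


# Hypothetical SLAs of three services that must all be up at the same time.
example_slas = [0.9995, 0.9999, 0.999]

print(f"Composite SLA: {composite_sla(example_slas):.4%}")
```

Because every factor is below 1, the composite figure is always lower than the weakest individual SLA, which is why each additional hard dependency reduces the achievable availability.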
@ -58,7 +58,7 @@ Composite SLA of Stamp tier: **99.77%**.
## Final SLO
The fact that AlwaysOn uses multiple stamps improves the Stamp tier availability and resiliency, but at the same time the hard dependency on the Global tier limits the overall achievable availability. This also means that adding more stamps will not improve the overall infrastructure SLA, however, this can improve performance and resiliency in case a stamp fails.
The fact that Azure Mission-Critical uses multiple stamps improves the Stamp tier availability and resiliency, but at the same time the hard dependency on the Global tier limits the overall achievable availability. This also means that adding more stamps will not improve the overall infrastructure SLA; however, it can improve performance and resiliency in case a stamp fails.
The maximum availability (based on the underlying Azure infrastructure) is 99.979% when running with at least **three** stamps. To allow for deployments and application-level outages, this number was reduced slightly to **99.95%**.
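The effect of adding stamps can be sketched as follows. This is an illustration under stated assumptions: stamps are treated as failing independently, the stamp figure of 99.77% is the Stamp tier composite SLA quoted in this document, and `GLOBAL_SLA` is a placeholder value chosen for the example rather than the documented Global tier figure. All function names are hypothetical.

```python
def stamp_tier_availability(stamp_sla, stamps):
    """Availability of N redundant stamps, assuming independent failures.

    The tier is unavailable only if every stamp is down at the same time.
    """
    return 1.0 - (1.0 - stamp_sla) ** stamps


def overall_availability(global_sla, stamp_sla, stamps):
    """The Global tier is a hard dependency, so it multiplies in."""
    return global_sla * stamp_tier_availability(stamp_sla, stamps)


def downtime_minutes(slo, period_minutes):
    """Accepted downtime for a given SLO over a period of time."""
    return (1.0 - slo) * period_minutes


STAMP_SLA = 0.9977   # Stamp tier composite SLA quoted in this document
GLOBAL_SLA = 0.9998  # placeholder assumption, not the documented figure

for n in (1, 2, 3, 4):
    print(n, f"{overall_availability(GLOBAL_SLA, STAMP_SLA, n):.5%}")

minutes_per_week = 7 * 24 * 60
print(f"{downtime_minutes(0.9995, minutes_per_week):.2f} minutes per week")
```

With these inputs the overall figure approaches, but never exceeds, the Global tier value, and the gain from each additional stamp shrinks rapidly. The last line reproduces the roughly 5 minutes per week of accepted downtime that a 99.95% SLO implies.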
@ -69,7 +69,7 @@ https://docs.microsoft.com/azure/architecture/framework/resiliency/business-metr
## Observability
AlwaysOn uses Application Insights availability probes to probe health endpoints for each stamp every 5 minutes. If the probe responds with success, then the website storage account is reachable. These are the same probing calls which Azure Front Door uses to determine backend health.
Azure Mission-Critical uses Application Insights availability probes to probe health endpoints for each stamp every 5 minutes. If the probe responds with success, then the website storage account is reachable. These are the same probing calls which Azure Front Door uses to determine backend health.
![Availability in Application Insights](/docs/media/SLA-appi-availability.png)
@ -82,4 +82,4 @@ Availability can also be observed via Front Door backend monitoring which is bas
![Front Door backend health](/docs/media/SLA-backend-health-fd.png)
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,6 +1,6 @@
# Data Platform Design Decisions
The AlwaysOn application data access pattern has the following characteristics:
The Azure Mission-Critical application data access pattern has the following characteristics:
- Read pattern - Point reads e.g. queries which fetch a single record. These queries have a "WHERE" clause defined so that a single row is selected for reads.
- Write pattern - Small writes e.g. queries which usually insert a single or a very small number of records in a transaction.
@ -10,7 +10,7 @@ The AlwaysOn application data access pattern has the following characteristics:
- Low response time (on the order of milliseconds)
- Low latency (on the order of milliseconds)
The OLTP nature of the access pattern of AlwaysOn has a bearing on the choice of architectural characteristics and must be considered while choosing backend datastores. The key architectural characteristics are:
The OLTP nature of the access pattern of Azure Mission-Critical has a bearing on the choice of architectural characteristics and must be considered while choosing backend datastores. The key architectural characteristics are:
- Performance
- Latency
@ -20,24 +20,24 @@ The OLTP nature of the access pattern of AlwaysOn has a bearing on the choice of
- Resiliency
- Security
Based on these characteristics, AlwaysOn uses the following data stores:
Based on these characteristics, Azure Mission-Critical uses the following data stores:
- Cosmos DB to serve as the main backend database.
- Event Hubs for messaging capabilities.
> **Note** - From data platform capabilities perspective, the current reference implementation of AlwaysOn focuses on the operational data store. In future, we plan to update AlwaysOn guidance to include analytics capabilities. In the meantime, we encourage readers to refer to [Enterprise Scale Analytics](https://docs.microsoft.com/azure/cloud-adoption-framework/scenarios/data-management/enterprise-scale-landing-zone) guidance for enabling analytics at scale on Azure.
> **Note** - From a data platform capabilities perspective, the current reference implementation of Azure Mission-Critical focuses on the operational data store. In the future, we plan to update the Azure Mission-Critical guidance to include analytics capabilities. In the meantime, we encourage readers to refer to the [Enterprise Scale Analytics](https://docs.microsoft.com/azure/cloud-adoption-framework/scenarios/data-management/enterprise-scale-landing-zone) guidance for enabling analytics at scale on Azure.
## Database
**[Azure Cosmos DB](https://azure.microsoft.com/services/cosmos-db/)** was chosen as the main database as it provides the crucial ability of multi-region writes: each stamp can write to the Cosmos DB replica in the same region with Cosmos DB internally handling data replication and synchronization between regions.
AlwaysOn is a cloud-native application. Its data model does not require features offered by traditional relational databases (e.g. entity linking across tables with foreign keys, strict row/column schema, views etc.).
Azure Mission-Critical is a cloud-native application. Its data model does not require features offered by traditional relational databases (e.g. entity linking across tables with foreign keys, strict row/column schema, views etc.).
The SQL API of Cosmos DB is being used as it provides the most features and there is no requirement for a migration scenario (to or from some other database like MongoDB).
The reference implementation uses Cosmos DB as follows:
- **Consistency level** is set to the default "Session consistency" as the most widely used level for single region and globally distributed applications. AlwaysOn does not use weaker consistency with higher throughput because the asynchronous nature of write processing doesn't require low latency on database write.
- **Consistency level** is set to the default "Session consistency" as the most widely used level for single region and globally distributed applications. Azure Mission-Critical does not use weaker consistency with higher throughput because the asynchronous nature of write processing doesn't require low latency on database write.
- **Partition key** is set to `/id` for all collections. This decision is based on the usage pattern, which is mostly "writing new documents with a random GUID as ID" and "reading a wide range of documents by ID". Provided the application code maintains ID uniqueness, new data will be evenly distributed across partitions by Cosmos DB.
@ -76,17 +76,17 @@ indexing_policy {
- `EnableContentResponseOnWrite` is set to `false` to prevent the Cosmos DB client from returning the resource from Create, Upsert, Patch and Replace operations to reduce network traffic and because this is not needed for further processing on the client.
- Custom serialization is used to set the JSON property naming policy to `JsonNamingPolicy.CamelCase` (to translate .NET-style properties to standard JSON-style and vice-versa) and the default ignore condition to ignore properties with null values when serializing (`JsonIgnoreCondition.WhenWritingNull`).
The AlwaysOn reference implementation leverages the native backup feature of Cosmos DB for data protection. [Cosmos DB's backup feature](https://docs.microsoft.com/azure/cosmos-db/online-backup-and-restore) supports online backups and on-demand data restore.
The Azure Mission-Critical reference implementation leverages the native backup feature of Cosmos DB for data protection. [Cosmos DB's backup feature](https://docs.microsoft.com/azure/cosmos-db/online-backup-and-restore) supports online backups and on-demand data restore.
> Note - In practice, most workloads are not purely OLTP. There is an increasing demand for real-time reporting, such as running reports against the operational system. This is also referred to as HTAP (Hybrid Transactional and Analytical Processing). Cosmos DB supports this capability via [Azure Synapse Link for Cosmos DB](https://docs.microsoft.com/azure/cosmos-db/synapse-link-use-cases).
## Messaging bus
**[Azure Event Hubs](https://docs.microsoft.com/azure/event-hubs/event-hubs-about)** service is used for the asynchronous messaging between the API service (CatalogService) and the background worker (BackgroundProcessor). It was chosen over alternative services like Azure Service Bus because of its high throughput support and because AlwaysOn does not require features like Service Bus' in-order delivery.
**[Azure Event Hubs](https://docs.microsoft.com/azure/event-hubs/event-hubs-about)** service is used for the asynchronous messaging between the API service (CatalogService) and the background worker (BackgroundProcessor). It was chosen over alternative services like Azure Service Bus because of its high throughput support and because Azure Mission-Critical does not require features like Service Bus' in-order delivery.
Event Hubs offers Zone Redundancy in its Standard SKU, whereas Service Bus requires Premium tier for this reliability feature.
The only event processor in the AlwaysOn reference implementation is the **BackgroundProcessor** service which captures and processes events from all Event Hubs partitions.
The only event processor in the Azure Mission-Critical reference implementation is the **BackgroundProcessor** service which captures and processes events from all Event Hubs partitions.
Every message needs to contain the `action` metadata property which directs the route of processing:
@ -125,5 +125,4 @@ See [BackgroundProcessor](/src/app/AlwaysOn.BackgroundProcessor/README.md) for m
> **Note** - A messaging queue is not intended to be used as a persistent data store for long periods of time. Event Hubs supports the [Capture feature](https://docs.microsoft.com/azure/event-hubs/event-hubs-capture-enable-through-portal), which enables an Event Hub to automatically write a copy of messages to a linked Azure Storage account. This keeps utilization of an Event Hubs queue in check, but it also serves as a mechanism to back up messages.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,18 +1,18 @@
# DevOps Design Decisions
# Source Code Repository
## Source Code Repository
**GitHub** was the clear choice for the AlwaysOn reference implementation as it is the leading code sharing platform in terms of Git repositories.
**GitHub** was the clear choice for the Azure Mission-Critical reference implementations as it is the leading code sharing platform in terms of Git repositories.
## CI/CD pipelines
**Azure Pipelines**. This part of the Azure DevOps (ADO) service is being used by AlwaysOn for all build, test and release tasks. It is a well proven and feature rich tool set that is used in many organizations, both when targeting Azure and even when not targeting Azure as the deployment environment.
**Azure Pipelines**. This part of the Azure DevOps (ADO) service is being used by Azure Mission-Critical for all build, test and release tasks. It is a well-proven and feature-rich tool set that is used in many organizations, both when targeting Azure and even when not targeting Azure as the deployment environment.
GitHub Actions was considered instead of ADO and for build-related tasks (CI) it would have worked equally well - with the added benefit that source code and pipeline would have lived in the same place. However, Azure Pipelines were chosen because of richer Continuous Deployment (CD) capabilities. It is expected that GitHub Actions will reach parity with ADO in the future, but for now, ADO is the best choice.
**Build Agents**. The foundational reference implementation of AlwaysOn uses Microsoft Hosted build agents as this removes any management burden on the developers to maintain and update the build agent whilst also making start up times for build jobs quicker. The exception is when using [connected mode](https://github.com/Azure/AlwaysOn-Foundational-Connected) of the Reference Implementation, which does require the use of self-hosted Build Agents.
**Build Agents**. The online reference implementation of Azure Mission-Critical uses Microsoft Hosted build agents as this removes any management burden on the developers to maintain and update the build agent whilst also making start up times for build jobs quicker. The exception is when using the [connected](https://github.com/Azure/Mission-Critical-Connected) version of the Azure Mission-Critical reference implementation, which does require the use of self-hosted Build Agents.
See [DevOps Pipelines](/.ado/pipelines/README.md) for more details about the concrete pipeline implementation.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,10 +1,10 @@
# Zero-downtime Update Strategy
*"How to deploy updates to AlwaysOn without causing any downtime?"*
*"How to deploy updates to Azure Mission-Critical without causing any downtime?"*
## High-level overview
In short, the update process for AlwaysOn is that any update, no matter whether infrastructure or application-related, is deployed on fully independent stamps called **release units**. Only the globally shared infrastructure components such as Front Door, Cosmos DB and Container Registry are shared across release units.
In short, the update process for Azure Mission-Critical is that any update, no matter whether infrastructure or application-related, is deployed on fully independent stamps called **release units**. Only the globally shared infrastructure components such as Front Door, Cosmos DB and Container Registry are shared across release units.
This means that for any update, existing stamps are not touched but instead completely new stamps (as many as currently existing) are deployed and that the new application version will only be deployed to these new stamps. Then, these new stamps are added to the global load balancer (Azure Front Door) and traffic is gradually moved over to the new stamps (i.e. blue/green approach). Once all traffic is served from the new release unit with no issues, the previous release units are deleted.
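The gradual traffic move can be sketched as a sequence of weight pairs. This is only an illustration of the blue/green idea: the step percentages are hypothetical, the function name is introduced here for the example, and no Azure Front Door API is called.

```python
def traffic_shift_plan(new_share_steps):
    """Return (old_weight, new_weight) pairs in percent for each step.

    Each step states how much traffic the new release unit should receive;
    the previous release unit receives the remainder.
    """
    plan = []
    for new_share in new_share_steps:
        if not 0 <= new_share <= 100:
            raise ValueError(f"share must be between 0 and 100, got {new_share}")
        plan.append((100 - new_share, new_share))
    return plan


# Hypothetical rollout steps; the real pipeline may use different values.
HYPOTHETICAL_STEPS = [10, 50, 100]

for old, new in traffic_shift_plan(HYPOTHETICAL_STEPS):
    print(f"previous release unit: {old:3d}%   new release unit: {new:3d}%")
```

Only after the final step, when the previous release unit no longer receives traffic and no issues have been observed, would its stamps be deleted.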
@ -20,12 +20,12 @@ The following diagram is a snapshot of the Azure DevOps deployment pipeline for
## Infrastructure vs. Application-level updates
There are two main parts involved in the AlwaysOn reference implementation:
There are two main parts involved in the Azure Mission-Critical reference implementation:
1. Underlying infrastructure. This is mostly deployed using Terraform and its associated configuration.
1. Application. This sits on top of the infrastructure and is based on Docker containers and, for the UI, npm-built artifacts (HTML and JavaScript).
In many customer systems there is an assumption that application updates are more frequent than infrastructure updates, and, as such, there are different update procedures for each. Within a public cloud infrastructure, these changes can happen at a much faster pace and it is this and the rate of change on the Azure platform that led AlwaysOn to utilize only one deployment process whether application or infrastructure. This allows:
In many customer systems there is an assumption that application updates are more frequent than infrastructure updates, and, as such, there are different update procedures for each. Within a public cloud infrastructure, these changes can happen at a much faster pace and it is this and the rate of change on the Azure platform that led Azure Mission-Critical to utilize only one deployment process whether application or infrastructure. This allows:
- **One consistent process.** This means fewer chances for mistakes if changes in both infrastructure and application get mixed together within a release (whether intentional or not).
- **Enables proper blue/green deployment** for every update utilizing a gradual migration of traffic to the new release.
@ -35,7 +35,7 @@ In many customer systems there is an assumption that application updates are mor
## Branching strategy
A foundation of the AlwayOn update strategy is around how branches are used in the Git repository. AlwaysOn uses 3 types of branches:
A foundation of the Azure Mission-Critical update strategy is around how branches are used in the Git repository. Azure Mission-Critical uses 3 types of branches:
- **`feature/*` and `fix/*` branches**
- These are the entry points for any change. They are created by developers and should be named something like `feature/catalog-update` or `fix/worker-timeout-bug`. Once changes are ready to be merged, a pull request (PR) against the `main` branch needs to be created. Every PR needs to be approved by at least one reviewer. With very few exceptions, every change that is proposed in a PR must run through the E2E (end-to-end) validation pipeline. The E2E pipeline can – and should – also be used by developers to test and debug their changes on a complete environment. For this, the E2E pipeline can be executed without the destroy step at the end. The environment can then live for a longer period of time with new updates to the branch quickly getting released to it.
@ -56,7 +56,7 @@ For this to work without major issues, it is important that the hotfix consists
# Environments
As already described in the previous section, AlwaysOn uses two types of environment: short-lived and permanent.
As already described in the previous section, Azure Mission-Critical uses two types of environment: short-lived and permanent.
## Short-lived
@ -64,7 +64,7 @@ These environments are deployed using the E2E validation pipeline. They are eith
## Permanent
These are `integration` (`int`) and `production` (`prod`) environments and live continuously i.e. not destroyed. They also use fixed domain names like *int.always-on.app*. In a real-world scenario, customers would probably also add a `staging` (or "pre-prod") environment. This would be used to deploy and validate `release/*` branches with the same update process as in `prod` (i.e. blue/green deployment). AlwaysOn does not have a staging environment simply for cost reasons.
These are `integration` (`int`) and `production` (`prod`) environments and live continuously i.e. not destroyed. They also use fixed domain names like *int.always-on.app*. In a real-world scenario, customers would probably also add a `staging` (or "pre-prod") environment. This would be used to deploy and validate `release/*` branches with the same update process as in `prod` (i.e. blue/green deployment). Azure Mission-Critical does not have a staging environment simply for cost reasons.
### Integration (int)
@ -76,7 +76,7 @@ These are `integration` (`int`) and `production` (`prod`) environments and live
## Shared and dedicated resources
It is important to understand the different types of resources that exist in the AlwaysOn deployment for the permanent environments (`int` and `prod`). These are either globally shared resources or dedicated to a particular release and exist only until the next release unit has taken over.
It is important to understand the different types of resources that exist in the Azure Mission-Critical deployment for the permanent environments (`int` and `prod`). These are either globally shared resources or dedicated to a particular release and exist only until the next release unit has taken over.
### Globally shared resources
@ -160,4 +160,4 @@ This update strategy comes with a couple of inherent disadvantages and risks whi
This update strategy can support multiple versions of an API and worker components running at the same time. Since the Cosmos DB is shared between the two or more versions, there is the possibility that data elements changed by one version may not always match the version of the API or worker consuming it. To allow for this, the API layers and workers must implement forward compatibility design characteristics. To accomplish this, earlier versions of the API or worker components can process data that was inserted by later API or worker component versions, ignoring any parts it does not understand.
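The "ignore what you do not understand" behavior can be sketched as a tolerant reader. This is a minimal illustration, not the reference implementation's code: the field names, the `rating` property and the helper name are hypothetical.

```python
import json

# Fields that this (older) version of the component knows about.
# The names are hypothetical and exist only for this example.
KNOWN_FIELDS_V1 = ("id", "name")


def read_item_v1(raw_json):
    """Parse a stored document, keeping only the fields this version knows.

    Properties written by a newer version are ignored instead of causing
    an error, which is what makes the reader forward compatible.
    """
    document = json.loads(raw_json)
    return {key: document[key] for key in KNOWN_FIELDS_V1 if key in document}


# A document written by a newer version that added a "rating" property.
newer_document = '{"id": "42", "name": "Widget", "rating": 4.5}'

print(read_item_v1(newer_document))
```

A strict reader that rejected unknown properties would start failing as soon as the newer release unit wrote its first document, while both versions are still serving traffic.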
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,8 +1,8 @@
# Failure Injection Testing
Based on the [Failure Analysis](./Health-Failure-Analysis.md), the AlwaysOn team performed some manual failure injection testing (also known as "Chaos Testing" or "Chaos Monkey testing"). This article shares some learnings around what was tested and how this informed the development of the solution.
Based on the [Failure Analysis](./Health-Failure-Analysis.md), the Azure Mission-Critical team performed some manual failure injection testing (also known as "Chaos Testing" or "Chaos Monkey testing"). This article shares some learnings around what was tested and how this informed the development of the solution.
When the AlwaysOn project started, no automated Failure Injection Testing was implemented and a series of manual testing was performed which provided a lot of valuable insights.
When the Azure Mission-Critical project started, no automated Failure Injection Testing was implemented, so a series of manual tests was performed, which provided a lot of valuable insights.
All tests were performed in an E2E validation environment so that fully representative tests could be conducted without any risk of interference from other environments. Most of the failures can be observed directly in the Application Insights [Live metrics](https://docs.microsoft.com/azure/azure-monitor/app/live-stream) view - and a few minutes later in the Failures view and corresponding log tables. Other failures need deeper debugging such as the use of `kubectl` to observe the behavior inside of AKS.
@ -10,7 +10,7 @@ All tests were performed in an E2E validation environment so that fully represen
DNS failure injection is a good test case since it can simulate multiple issues. Firstly, it simulates the case when DNS resolution fails, for instance because Azure DNS experiences an issue, but it can also help to simulate general connection issues between a client and a service, for example when the BackgroundProcessor cannot connect to the Event Hub.
In single-host scenarios you can simply modify the local `hosts` file to overwrite DNS resolution. In a larger system with multiple dynamic servers like AKS, this is not feasible. However, we can use [Azure Private DNS Zones](https://docs.microsoft.com/azure/dns/private-dns-privatednszone) as an alternative (See the Event Hubs example below for a configuration walk-through).
In single-host scenarios you can simply modify the local `hosts` file to overwrite DNS resolution. In a larger system with multiple dynamic servers like AKS, this is not feasible. However, we can use [Azure Private DNS Zones](https://docs.microsoft.com/azure/dns/private-dns-privatednszone) as an alternative (See the Event Hubs example below for a configuration walk-through).
### Event Hub
@ -28,7 +28,7 @@ As this retry and failover logic in the SDK takes about 2 minutes, the Health Se
## Firewall blocking
Most Azure services support firewall access restrictions based on VNets and/or IP addresses. In AlwaysOn these are already used to restrict access, for instance, to Cosmos DB or Event Hub. Blocking access by removing existing Allow rules or adding new Block rules is a straightforward test. This can serve to simulate firewall misconfigurations but also actual service outages. Note that similar to above, existing established connections might continue to work for a period before they start to fail.
Most Azure services support firewall access restrictions based on VNets and/or IP addresses. In Azure Mission-Critical these are already used to restrict access, for instance, to Cosmos DB or Event Hub. Blocking access by removing existing Allow rules or adding new Block rules is a straightforward test. This can serve to simulate firewall misconfigurations but also actual service outages. Note that similar to above, existing established connections might continue to work for a period before they start to fail.
### Key Vault
@ -63,4 +63,5 @@ From there on there were still several components in place which could have been
Overall, this particular failure injection showed that even for a skilled operations team it can be quite challenging to detect (and then attempt to fix) the root cause of an issue in a distributed system.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -2,25 +2,25 @@
The [Enterprise-Scale architecture](https://github.com/azure/enterprise-scale) provides prescriptive guidance coupled with Azure best practices, and it follows design principles across the critical design areas for organizations to define their Azure architecture.
It is crucial to understand and identify in which connectivity scenario an AlwaysOn application will be used and deployed. Enterprise-Scale supports different landing zones separated into different Management Group scopes.
It is crucial to understand and identify in which connectivity scenario an Azure Mission-Critical application will be used and deployed. Enterprise-Scale supports different landing zones separated into different Management Group scopes.
This scope will define the guardrails with Azure Policy and RBAC plus will provide several shared services from which an AlwaysOn application will benefit. DNS, Routes (UDR), VNet and its configuration are the most common services that will be provided from a central platform team (NetOps).
Organizations require centralized platform logging and monitoring capabilities that provides a holistic view for Operation (Ops) and Security (SecOps) teams. AlwaysOn leverages the central Management subscription recommended by Enterprise-scale landing zone and sends, enforced by Azure Policy, the required logs to the Log Analytics Workspace.
This scope will define the guardrails with Azure Policy and RBAC plus will provide several shared services from which an Azure Mission-Critical application will benefit. DNS, Routes (UDR), VNet and its configuration are the most common services that will be provided from a central platform team (NetOps).
Organizations require centralized platform logging and monitoring capabilities that provide a holistic view for Operations (Ops) and Security (SecOps) teams. Azure Mission-Critical leverages the central Management subscription recommended by Enterprise-scale landing zone and sends, enforced by Azure Policy, the required logs to the Log Analytics Workspace.
The three most common deployment scenarios are:
- Public application endpoint **without** corporate network connectivity. (online)
- Public application endpoint **with** corporate network connectivity (management and backend service connectivity). (corp)
- Private application endpoint **without** public connectivity. (corp)
This diagram visualizes the relationship and dependency an AlwaysOn application can take on Enterprise-Scale landing zone.
This diagram visualizes the relationship and dependency an Azure Mission-Critical application can take on Enterprise-Scale landing zone.
![AlwaysOn - ESLZ dependency](/docs/media/AlwaysOn-ESLZ.gif "ESLZ dependency")
![Azure Mission-Critical - ESLZ dependency](/docs/media/mission-critical-landing-zones.gif "ESLZ dependency")
> Note: The AlwaysOn reference implementation is aligned with the Enterprise-Scale architecture and was successfully deployed and validated in an "online" landing zone (subscription).
> Note: The Azure Mission-Critical reference implementation is aligned with the Enterprise-Scale architecture and was successfully deployed and validated in an "online" landing zone (subscription).
See [Enterprise-Scale](https://github.com/Azure/Enterprise-Scale/) and [Enterprise-Scale design principles](https://github.com/Azure/Enterprise-Scale/wiki/How-Enterprise-Scale-Works#enterprise-scale-design-principles) for more information.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,12 +1,13 @@
# Frequently Asked Questions (FAQ)
In this section we document all the FAQ related to this reference implementation. There is another FAQ section covering general questions not related to the reference implementation in the [AlwaysOn repo](https://github.com/Azure/AlwaysOn/blob/docs/main/docs/FAQ.md) (➡️ `Azure/AlwaysOn`).
In this section we document all the FAQ related to this reference implementation. There is another FAQ section covering general questions not related to the reference implementation in the [Azure Mission-Critical repo](https://github.com/Azure/Mission-Critical/blob/docs/main/docs/FAQ.md) (➡️ `Azure/Mission-Critical`).
## General
> Why is the reference implementation called *foundational-online* and when should I use it?
> Why is the reference implementation called *online* and when should I use it?
AlwaysOn can be leveraged in different scenarios and the reference implementation represents a customer implementation pattern and scenario. You should use the *foundational-online* reference implementation for a publicly exposed application without a connectivity requirement to your on-premises data center. There are almost no pre-requisites and the pipeline deploys all the required resources into an Azure Subscription.
Azure Mission-Critical can be leveraged in different scenarios and the reference implementation represents a customer implementation pattern and scenario. You should use the *online* reference implementation for a publicly exposed application without a connectivity requirement to your on-premises data center. There are almost no pre-requisites and the pipeline deploys all the required resources into an Azure Subscription.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,20 +1,20 @@
# Getting started
This step-by-step guide describes the process to deploy AlwaysOn in your own environment from the beginning. At the end of this guide you will have an Azure DevOps organization and project set up to deploy a copy of AlwaysOn into an Azure Subscription.
This step-by-step guide describes the process to deploy Azure Mission-Critical in your own environment from the beginning. At the end of this guide you will have an Azure DevOps organization and project set up to deploy a copy of the Azure Mission-Critical reference implementation into an Azure Subscription.
## How to deploy?
The AlwaysOn project is using a GitHub repository for version control of code artifacts and manifest files. The project leverages Azure DevOps Pipelines for build and deployment (CI/CD) pipelines.
The Azure Mission-Critical project is using a GitHub repository for version control of code artifacts and manifest files. The project leverages Azure DevOps Pipelines for build and deployment (CI/CD) pipelines.
> Other Git-based repositories, such as *Azure DevOps Repos*, can be used instead of GitHub.
All relevant code artifacts and manifest files are stored in this GitHub repository and can easily be forked into your own account or organization.
This guide describes the end-to-end process for setting up all pre-requisites and dependencies before deploying AlwaysOn into an Azure subscription of your choice.
This guide describes the end-to-end process for setting up all pre-requisites and dependencies before deploying Azure Mission-Critical into an Azure subscription of your choice.
## Pre-requisites
The following must be installed on the client machine used to deploy the AlwaysOn reference implementation:
The following must be installed on the client machine used to deploy the Azure Mission-Critical reference implementation:
- [Azure CLI](https://docs.microsoft.com/cli/azure/service-page/azure%20cli?view=azure-cli-latest)
@ -25,10 +25,10 @@ This guide offers two paths: Using Azure DevOps Portal or script-based via Azure
## Overview
The process to deploy AlwaysOn consists of the following steps:
The process to deploy Azure Mission-Critical consists of the following steps:
1) Create an [Azure DevOps organization and project](#create-a-new-azure-devops-project)
1) Generate your own repository based on the [AlwaysOn GitHub template](https://github.com/Azure/AlwaysOn-Foundational-Online/generate) repository
1) Generate your own repository based on the [Azure Mission-Critical GitHub template](https://github.com/Azure/Mission-Critical-Online/generate) repository
1) Import [deployment pipelines](#3-import-deployment-pipelines)
1) Create [Service Principals](#4-create-azure-service-principal) for each individual Azure subscription
1) Create [Service Connections](#5-create-azure-service-connections) in Azure DevOps
@ -40,7 +40,7 @@ The process to deploy AlwaysOn is comprised of the following steps:
### 1) Create a new Azure DevOps organization and project
To deploy AlwaysOn, you need to create a new Azure DevOps organization, or re-use an existing one. In this organization you will then create a new project used to host all pipelines for AlwaysOn.
To deploy the Azure Mission-Critical reference implementation, you need to create a new Azure DevOps organization, or re-use an existing one. In this organization you will then create a new project used to host all pipelines for Azure Mission-Critical.
- [Create an organization or project collection](https://docs.microsoft.com/azure/devops/organizations/accounts/create-organization?view=azure-devops)
@ -72,11 +72,11 @@ For all the subsequent tasks done via `az devops` or `az pipelines` the context
az devops configure --defaults organization=https://dev.azure.com/<your-org> project=<your-project>
```
### 2) Generate your own repository based on the AlwaysOn GitHub template
### 2) Generate your own repository based on the Azure Mission-Critical GitHub template
Azure DevOps Repos would allow us to import the AlwaysOn GitHub repository into Azure DevOps as well. For this guide we have decided to generate our own repository based on the template on GitHub and use it from there.
Azure DevOps Repos would allow us to import the Azure Mission-Critical reference implementation GitHub repository into Azure DevOps as well. For this guide we have decided to generate our own repository based on the template on GitHub and use it from there.
Go to the root of the AlwaysOn repository on GitHub and click on "Use this template" in the top right corner:
Go to the root of the Azure Mission-Critical reference implementation repository on GitHub and click on "Use this template" in the top right corner:
![Use GitHub Repo template](/docs/media/AlwaysOnGettingStarted2Fork.png)
@ -194,7 +194,7 @@ More information about the required permissions needed to deploy via Terraform c
### 5) Create Azure Service Connections
Our AlwaysOn reference implementation defines three different environments: prod, int and e2e. These three environments can be selected for each individual pipeline run and can refer to the same or different (recommended) Azure subscriptions for proper separation. These environments are represented by service connections in Azure DevOps:
Our Azure Mission-Critical reference implementation defines three different environments: prod, int and e2e. These three environments can be selected for each individual pipeline run and can refer to the same or different (recommended) Azure subscriptions for proper separation. These environments are represented by service connections in Azure DevOps:
> **Important!** Since these connection names are used in pipelines, use them exactly as specified above. If you change the name of a service connection, you must also change it in the pipeline YAML.
@ -336,4 +336,5 @@ With the completion of at least one deployment pipeline it is now a good time to
- Detailed information about the infrastructure layer - [Terraform documentation](/src/infra/workload/README.md#get-started).
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -65,4 +65,4 @@ Alerts based on the data stored in a Log Analytics workspace can be created usin
To demonstrate their setup and usage, a query-based alert on Application Insights is configured as part of the infrastructure deployment within each stamp ([/src/infra/workload/releaseunit/modules/stamp/alerts.tf](/src/infra/workload/releaseunit/modules/stamp/alerts.tf)). It looks at the number of responses sent by the CatalogService with a 5xx status code. If that number exceeds the set threshold within a 5 minute window, it will fire an alert.
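The alert itself is defined as a query-based rule in Terraform, which is not shown here. The evaluation logic it describes can be sketched in Python as follows, assuming responses are available as `(timestamp, status_code)` tuples; the function name, its parameters and the data shape are illustrative assumptions, not the reference implementation's code.

```python
from datetime import datetime, timedelta


def should_fire_alert(responses, now, threshold, window=timedelta(minutes=5)):
    """Return True if the number of 5xx responses in the window exceeds the threshold.

    `responses` is an iterable of (timestamp, status_code) tuples
    (a hypothetical shape chosen for this sketch).
    """
    window_start = now - window
    failures = sum(
        1
        for timestamp, status in responses
        if window_start <= timestamp <= now and 500 <= status <= 599
    )
    return failures > threshold
```

Responses outside the 5 minute window and responses with a non-5xx status code do not count towards the threshold.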
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,10 +1,10 @@
# Failure analysis
*"What does it take for AlwaysOn to go down?"*
*"What does it take for Azure Mission-Critical to go down?"*
This article walks through a number of possible failure scenarios of the various components of the AlwaysOn reference implementation. It does not claim to be complete since there can always be failure cases which we have not thought of yet. So for any workload, this list should be a living document that gets updated over time.
This article walks through a number of possible failure scenarios of the various components of the Azure Mission-Critical reference implementation. It does not claim to be complete since there can always be failure cases which we have not thought of yet. So for any workload, this list should be a living document that gets updated over time.
Composing the failure analysis is mostly a theoretical planning exercise. It can - and should - be complemented by actual failure injection testing. Through testing, at least some of the failure cases and their impact can be simulated and thus validate the theoretical analysis. See [the related article](./DeployAndTest-Testing-FailureInjection.md) for failure injection testing that was done as part of AlwaysOn.
Composing the failure analysis is mostly a theoretical planning exercise. It can - and should - be complemented by actual failure injection testing. Through testing, at least some of the failure cases and their impact can be simulated and thus validate the theoretical analysis. See [the related article](./DeployAndTest-Testing-FailureInjection.md) for failure injection testing that was done as part of Azure Mission-Critical.
## Outage risks of individual components
@ -37,7 +37,7 @@ Global replication protects Cosmos DB instances from regional outage. The Cosmos
| **Risk** | **Impact/Mitigation/Comment** | **Outage** |
| -------------------------------------------- | ------------------------------------------------------------ | ------------------------------------------- |
| **Database/collection is renamed** | Can happen due to mismatch in configuration when deploying – Terraform would overwrite the whole database, which could result in data loss (this can be prevented by using [database/collection level locks](https://feedback.azure.com/forums/263030-azure-cosmos-db/suggestions/35535298-enable-locks-at-database-and-collection-level-as-w)). <br />**Application will not be able to access any data**. App configuration needs to be updated and pods restarted. | Yes |
| **Regional outage** | AlwaysOn has multi-region writes enabled, so in case of failure on read or write, the **client retries the current operation** and all the future operations are permanently [routed to the next region](https://docs.microsoft.com/azure/cosmos-db/troubleshoot-sdk-availability#regional-outage) in order of preference. In case the preference list only had one entry (or was empty) but the account has other regions available, it will route to the next region in the account list. | No |
| **Regional outage** | Azure Mission-Critical has multi-region writes enabled, so in case of failure on read or write, the **client retries the current operation** and all the future operations are permanently [routed to the next region](https://docs.microsoft.com/azure/cosmos-db/troubleshoot-sdk-availability#regional-outage) in order of preference. In case the preference list only had one entry (or was empty) but the account has other regions available, it will route to the next region in the account list. | No |
| **Extensive throttling due to lack of RUs** | Depending on how many RUs we decide to deploy (max setting for the autoscaler) and what load balancing we employ at the Front Door level, certain stamp(s) could run hot on Cosmos DB utilization while others could still serve more requests. <br />Could be mitigated by better load distribution to more stamps, or of course by more RUs. | No |
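The region selection described in the **Regional outage** row can be modeled with a short Python sketch. This is a simplified illustration of the documented behavior, not the Cosmos DB SDK's actual implementation; the function name and parameters are hypothetical.

```python
def next_region(preferred_regions, account_regions, failed_regions):
    """Pick the region to route to after one or more regions have failed.

    Walks the client's preference list first; if that list is exhausted
    (or empty), falls back to the regions configured on the account.
    Returns None when no healthy region is left.
    """
    for region in list(preferred_regions) + list(account_regions):
        if region not in failed_regions:
            return region
    return None
```

For example, with a preference list containing only `westeurope` and an account that also has `eastus`, a failure of `westeurope` routes subsequent operations to `eastus`.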
### Container Registry
@ -90,4 +90,4 @@ Global replication protects Cosmos DB instances from regional outage. The Cosmos
| **Expired credentials (globally shared resource)** | If, for example, Cosmos DB API key was changed without properly updating it in all stamp Key Vaults so that the pods can use them, the respective application components will start to fail. **This would likely bring all stamps down at about the same time and cause a workload-wide outage.** See the article on [Key Rotation](./OpProcedures-KeyRotation.md) for an example walkthrough of how to execute this process properly without downtime. For a possible way around the need for keys and secrets in the first place using AAD auth, see the previous item. | Full |
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -9,7 +9,7 @@
---
AlwaysOn is using [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/logs/log-analytics-overview) as a central store for logs and metrics for all application and infrastructure components and [Azure Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview) for all application monitoring data. Each stamp has its own, dedicated Log Analytics Workspace and App Insights instance. Next to those is one Log Analytics Workspace for the globally shared resources such as Front Door and Cosmos DB.
Azure Mission-Critical is using [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/logs/log-analytics-overview) as a central store for logs and metrics for all application and infrastructure components and [Azure Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview) for all application monitoring data. Each stamp has its own, dedicated Log Analytics Workspace and App Insights instance. Next to those is one Log Analytics Workspace for the globally shared resources such as Front Door and Cosmos DB.
![Monitoring overview](/docs/media/MonitoringOverview.png)
@ -19,7 +19,7 @@ As all stamps are short-lived and continuously replaced with each new release (s
### Diagnostic settings
All Azure services used for AlwaysOn are configured to send all their Diagnostic data including logs and metrics to the deployment specific (global or stamp) Log Analytics Workspace. This happens automatically as part of the [Terraform](/src/infra/README.md#infrastructure) deployment. New options will be identified automatically and added as part of `terraform apply`.
All Azure services used for Azure Mission-Critical are configured to send all their Diagnostic data including logs and metrics to the deployment specific (global or stamp) Log Analytics Workspace. This happens automatically as part of the [Terraform](/src/infra/README.md#infrastructure) deployment. New options will be identified automatically and added as part of `terraform apply`.
![Diagnostic Settings](/docs/media/Monitoring1DiagnosticSettings.png)
@ -51,7 +51,7 @@ To monitor the availability of the individual stamps and the overall solution fr
## Queries
AlwaysOn uses different Kusto Query Language (KQL) queries to implement complex, custom queries as functions to retrieve data from Log Analytics. These queries are stored as individual files in the `/src/infra/monitoring/queries` directory (separated into global and stamp) and are imported and applied automatically via Terraform as part of each infrastructure pipeline run.
Azure Mission-Critical uses different Kusto Query Language (KQL) queries to implement complex, custom queries as functions to retrieve data from Log Analytics. These queries are stored as individual files in the `/src/infra/monitoring/queries` directory (separated into global and stamp) and are imported and applied automatically via Terraform as part of each infrastructure pipeline run.
This approach separates the query logic from the visualization layer. It allows us to call these functions individually and use them either directly to retrieve data from Log Analytics or to visualize the results in Azure Dashboards, Azure Monitor Workbooks or 3rd-Party dashboarding solutions like Grafana.
@ -65,7 +65,7 @@ This result provides a granular overview about the cluster's health status based
## Visualization
The Visualization of the Kusto [Queries](#Queries) described above was implemented using Grafana. Grafana is used to show the results of Log Analytics queries and does not contain any logic itself. The Grafana stack is not part of the solution's deployment lifecycle, but released separately. For a detailed description of the Grafana deployment for AlwaysOn, please refer to the [Grafana README](/src/infra/monitoring/grafana/README.md).
The Visualization of the Kusto [Queries](#Queries) described above was implemented using Grafana. Grafana is used to show the results of Log Analytics queries and does not contain any logic itself. The Grafana stack is not part of the solution's deployment lifecycle, but released separately. For a detailed description of the Grafana deployment for Azure Mission-Critical, please refer to the [Grafana README](/src/infra/monitoring/grafana/README.md).
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,6 +1,6 @@
# Custom Domain support
AlwaysOn fully supports the use of custom domain names e.g. `contoso.com`. In the [Terraform reference implementation](/src/infra/workload/README.md), custom domains can be optionally used for both `int` and `prod` environments. For E2E environments, custom domains can also be added, however, it was decided not to use custom domain names in the reference implementation owing to the short-lived nature of E2E coupled with the increased deployment time when using custom domains with the encompassing SSL certificate in Front Door.
The Azure Mission-Critical online reference implementation fully supports the use of custom domain names e.g. `contoso.com`. In the [Terraform reference implementation](/src/infra/workload/README.md), custom domains can be optionally used for both `int` and `prod` environments. For E2E environments, custom domains can also be added, however, it was decided not to use custom domain names in the reference implementation owing to the short-lived nature of E2E coupled with the increased deployment time when using custom domains with the encompassing SSL certificate in Front Door.
To enable full automation of the deployment, the custom domain is expected to be managed through an Azure DNS Zone. The infrastructure deployment pipeline dynamically creates CNAME records in the Azure DNS zone and maps these automatically to the Azure Front Door instance. Azure DNS zone also enables the Front Door-managed SSL certificates so that there is no need for manual certificate renewals on Front Door.
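The actual record creation happens in the Terraform-based infrastructure deployment. As a minimal sketch of the mapping it produces, the Python helper below builds a CNAME record that points a custom subdomain at a Front Door instance. The function, its parameters and the `azurefd.net` default hostname suffix are assumptions made for illustration only.

```python
def build_cname_record(subdomain: str, zone: str, front_door_name: str, ttl: int = 3600) -> dict:
    """Describe a CNAME record mapping a custom domain to a Front Door endpoint.

    Hypothetical helper: assumes the Front Door instance is reachable
    under its default <name>.azurefd.net hostname.
    """
    return {
        "fqdn": f"{subdomain}.{zone}",
        "type": "CNAME",
        "ttl": ttl,
        "target": f"{front_door_name}.azurefd.net",
    }
```

With `build_cname_record("www", "contoso.com", "myfrontdoor")`, the record maps `www.contoso.com` to `myfrontdoor.azurefd.net`.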
@ -11,4 +11,4 @@ Environments which are not provisioned with custom domains can be accessed throu
> Note: On the cluster ingress controllers, custom domains are not used in either case; instead an Azure-provided DNS name such as _[prefix]-cluster.[region].cloudapp.azure.com_ is used with Let's Encrypt enabled to issue free SSL certificates for those endpoints.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -4,13 +4,13 @@
- Each stamp uses its own Virtual Network (VNet) and as there is no cross-stamp traffic, no VNet peerings or VPN connections to other stamps are required.
- The per-stamp VNet is split into two subnets for Kubernetes (containing all nodes and pods) and private endpoints.
- Private endpoints (Private Link) are only partially used as ingress traffic comes from the public internet and egress traffic control (to mitigate the risk of data exfiltration) is not within scope of AlwaysOn.
- Private endpoints (Private Link) are only partially used as ingress traffic comes from the public internet and egress traffic control (to mitigate the risk of data exfiltration) is not within scope of Azure Mission-Critical.
## Global load balancer
**Azure Front Door** (AFD) is used as the global entry point for all incoming client traffic. As AlwaysOn only uses HTTP(S) traffic and uses Web Application Firewall (WAF) capabilities, AFD is the best choice to act as global load balancer. Azure Traffic Manager could be a cost-effective alternative, but it does not have features such as WAF and because it is DNS-based, Azure Traffic Manager usually has longer failover times compared to the TCP Anycast-based Azure Front Door.
**Azure Front Door** (AFD) is used as the global entry point for all incoming client traffic. As Azure Mission-Critical only uses HTTP(S) traffic and uses Web Application Firewall (WAF) capabilities, AFD is the best choice to act as global load balancer. Azure Traffic Manager could be a cost-effective alternative, but it does not have features such as WAF and because it is DNS-based, Azure Traffic Manager usually has longer failover times compared to the TCP Anycast-based Azure Front Door.
See [Custom Domain Support](./Networking-Custom-Domains.md) for more details about the implementation and usage of custom domain names in AlwaysOn.
See [Custom Domain Support](./Networking-Custom-Domains.md) for more details about the implementation and usage of custom domain names in Azure Mission-Critical.
## Stamp ingress point
@ -20,7 +20,7 @@ See [Custom Domain Support](./Networking-Custom-Domains.md) for more details abo
- Web Application Firewall (WAF) is provided as part of Azure Front Door.
- TLS termination happens on the ingress controller and thus inside the cluster.
- Using cert-manager, the procurement and renewal of SSL certificates is free of charge (with Let's Encrypt) and does not require additional processes or components.
- AlwaysOn does not have a requirement for the AKS cluster to only run on a private VNet and therefore, having a public Load Balancer in front is acceptable.
- Azure Mission-Critical does not have a requirement for the AKS cluster to only run on a private VNet and therefore, having a public Load Balancer in front is acceptable.
- (Auto-)Scaling of the ingress controller pods inside AKS is usually faster than scaling out Application Gateway to more instances.
- Configuration settings including path-based routing and HTTP header checks could potentially be easier to set up using Application Gateway. However, Nginx provides all the required features and is configured through Helm charts.
@ -35,13 +35,13 @@ See [Custom Domain Support](./Networking-Custom-Domains.md) for more details abo
## Considerations on not using fully private clusters as the default deployment mode
The main motivation of AlwaysOn is to build a highly reliable solution on Azure.
The main motivation of Azure Mission-Critical is to build a highly reliable solution on Azure.
The default version of the Reference Implementation of AlwaysOn does not use [fully private compute clusters](https://docs.microsoft.com/azure/aks/private-clusters) and does not fully lock down traffic for all Azure PaaS services.
The default version of the Reference Implementation of Azure Mission-Critical does not use [fully private compute clusters](https://docs.microsoft.com/azure/aks/private-clusters) and does not fully lock down traffic for all Azure PaaS services.
These decisions are explained further below:
> It is acknowledged that these decisions might not suit every use case, for instance in some regulated industries. Therefore, there is an alternative version of the Reference Implementation which deploys in a [Private Mode](https://github.com/Azure/AlwaysOn-foundational-private). However, this potentially comes at the price of higher cost and increased reliability risk. Thus, the requirements and impact should be fully understood before making the switch.
> It is acknowledged that these decisions might not suit every use case, for instance in some regulated industries. Therefore, there is an alternative version of the Reference Implementation which deploys in a [Private Mode](https://github.com/Azure/Mission-Critical-Connected). However, this potentially comes at the price of higher cost and increased reliability risk. Thus, the requirements and impact should be fully understood before making the switch.
### Public compute cluster endpoint
@ -55,8 +55,8 @@ These decisions are explained further below:
- The Reference Implementation uses Private Endpoints to access all PaaS instead of relying on Service Endpoints only. This has two reasons:
- Due to some limitations in the way the infrastructure gets deployed through Terraform, Service Endpoints could not be used for all services, so Private Endpoints would have been required at least partially in any case.
- The [Private Mode of AlwaysOn](https://github.com/Azure/AlwaysOn-foundational-private) requires the use of Private Endpoints for all used services. So using them also for the default, public mode, brings consistency and simplification in the deployment.
- One of the [main benefits](https://docs.microsoft.com/azure/private-link/private-link-overview#key-benefits) of using Private Endpoints is the protection against data leakage. However, this was not determined to be a priority requirement for public internet-facing applications like AlwaysOn. Similarly, we do not foresee the requirement to connect to AlwaysOn resources from on-prem networks (or otherwise connected via VPN etc).
- The [Private Mode of Azure Mission-Critical](https://github.com/Azure/Mission-Critical-Connected) requires the use of Private Endpoints for all used services. So using them also for the default, public mode, brings consistency and simplification in the deployment.
- One of the [main benefits](https://docs.microsoft.com/azure/private-link/private-link-overview#key-benefits) of using Private Endpoints is the protection against data leakage. However, this was not determined to be a priority requirement for public internet-facing applications like Azure Mission-Critical. Similarly, we do not foresee the requirement to connect to Azure Mission-Critical resources from on-prem networks (or otherwise connected via VPN etc).
### Requirements to utilize a fully private cluster
@ -64,7 +64,8 @@ As described above, to remove the public endpoint on the compute clusters, anoth
A more significant change when using private endpoints is the switch from hosted Build Agents (managed by Microsoft) to [self-hosted agents](https://docs.microsoft.com/azure/devops/pipelines/agents/agents?view=azure-devops&tabs=browser#install) which will need to be VNet-integrated in order to reach private services like Key Vault or AKS. Managing these agents and keeping them updated adds overhead and is not recommended as long as there is no actual requirement to switch to a fully private deployment.
To deploy the Reference Implementation in a private configuration, follow the guides in [this GitHub repository](https://github.com/Azure/AlwaysOn-foundational-private).
To deploy the Reference Implementation in a private configuration, follow the guides in [this GitHub repository](https://github.com/Azure/Mission-Critical-Connected).
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -2,11 +2,11 @@
Rotating (renewing) keys/secrets should be a standard procedure in any workload. Secrets might need to be changed on short notice after being exposed or regularly as a good security practice.
As expired or invalid secrets can cause outages to the application (see [Failure Analysis](./Health-Failure-Analysis.md#stamp-application)), it is important to have a clearly defined and proven process in place. For AlwaysOn, rotating secrets of stamp resources, such as Event Hub access keys, is not a significant concern as the stamps are expected to live for a few weeks at most. Also, even if secrets in one stamp expire, this would not bring down the whole application.
As expired or invalid secrets can cause outages to the application (see [Failure Analysis](./Health-Failure-Analysis.md#stamp-application)), it is important to have a clearly defined and proven process in place. For Azure Mission-Critical, rotating secrets of stamp resources, such as Event Hub access keys, is not a significant concern as the stamps are expected to live for a few weeks at most. Also, even if secrets in one stamp expire, this would not bring down the whole application.
Management of secrets to access long-living global resources, however, is critical, notably the Cosmos DB API keys. If these expire, it is likely that all stamps will be affected simultaneously, causing a complete outage of the application.
AlwaysOn tested and documented the approach for how to rotate the keys for Cosmos DB without causing downtime and this is detailed below:
Azure Mission-Critical tested and documented the approach for how to rotate the keys for Cosmos DB without causing downtime and this is detailed below:
## Cosmos DB Key Rotation Walkthrough
@ -29,4 +29,4 @@ AlwaysOn tested and documented the approach for how to rotate the keys for Cosmo
1) Finally, the Terraform template should be changed back to use the primary key again for future deployments; if not, we can continue to use the secondary key and switch back to the primary key when we need to renew the secondary key in the future.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,6 +1,6 @@
# Operational Procedures
While the reference implementation of AlwaysOn only serves as a demonstration and thus is not really run in production, there are a couple of operational procedures outlined in this article that are still relevant.
While the reference implementation of Azure Mission-Critical only serves as a demonstration and thus is not really run in production, there are a couple of operational procedures outlined in this article that are still relevant.
- [General debugging / issue investigation](#general-debugging--issue-investigation)
- [Transient Pipeline Failures](#transient-pipeline-failures)
@ -72,4 +72,4 @@ In Azure DevOps a notification subscription can, for example, look like this (Pr
The topic of Key / Secret rotation is covered in a [separate article](./OpProcedures-KeyRotation.md).
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)


@ -1,10 +1,10 @@
# AlwaysOn - Reference Implementation - Solution Guide
# Azure Mission-Critical - Reference Implementation - Solution Guide
As outlined in the [AlwaysOn introduction](https://github.com/Azure/AlwaysOn/blob/main/docs/introduction/README.md) (➡️ `Azure/AlwaysOn`), AlwaysOn has been developed to help customers with business critical systems to design and build a best practice Azure based solution that maximizes reliability. AlwaysOn does this by giving customers prescriptive and opinionated guidance on how to build this best practice system as well as providing production ready technical artifacts for customers to quickly build that best practice system in their own environment.
As outlined in the [Azure Mission-Critical introduction](https://github.com/Azure/Mission-Critical/blob/main/docs/introduction/README.md) (➡️ `Azure/Mission-Critical`), Azure Mission-Critical has been developed to help customers with business critical systems to design and build a best practice Azure based solution that maximizes reliability. Azure Mission-Critical does this by giving customers prescriptive and opinionated guidance on how to build this best practice system as well as providing production ready technical artifacts for customers to quickly build that best practice system in their own environment.
Where the AlwaysOn [Design Principles](https://github.com/Azure/AlwaysOn/blob/main/docs/design-methodology/Principles.md) (➡️ `Azure/AlwaysOn`) provide the thought and justification behind the AlwaysOn architecture and product choices, this part of the repository tells you how to build your own Production-ready AlwaysOn solution using the technical artifacts provided within this repository i.e. Infrastructure-As-Code templates and CI/CD pipelines (via GitHub and Azure DevOps).
Where the Azure Mission-Critical [Design Principles](https://github.com/Azure/Mission-Critical/blob/main/docs/design-methodology/Principles.md) (➡️ `Azure/Mission-Critical`) provide the thought and justification behind the Azure Mission-Critical architecture and product choices, this part of the repository tells you how to build your own production-ready Azure Mission-Critical solution using the technical artifacts provided within this repository i.e. Infrastructure-As-Code templates and CI/CD pipelines (via GitHub and Azure DevOps).
As with the AlwaysOn Design Guidelines, the Reference Implementation section is divided into 8 Critical Design Areas, each giving clear instructions on how the solution is configured. When you are ready to start, the [Getting Started](./Getting-Started.md) guide outlines the process and required steps to deploy AlwaysOn in your environment, including preparing Azure DevOps pipelines.
As with the Azure Mission-Critical Design Guidelines, the Reference Implementation section is divided into 8 Critical Design Areas, each giving clear instructions on how the solution is configured. When you are ready to start, the [Getting Started](./Getting-Started.md) guide outlines the process and required steps to deploy Azure Mission-Critical in your environment, including preparing Azure DevOps pipelines.
## Critical Design Areas
@ -61,11 +61,11 @@ As with the AlwaysOn Design Guidelines, the Reference Implementation section is
## Helpful Information
- [Getting started](Getting-Started.md) outlines the process and required steps to deploy AlwaysOn in your environment, including preparing Azure DevOps pipelines. It should be read in tandem with the Reference Implementation guidance.
- [SLO and Availability](AppDesign-SLO-Availability.md) outlines the SLO for AlwaysOn (99.95%) and how this figure was calculated.
- [ESLZ Alignment](ESLZ-Alignment.md) outlines how AlwaysOn aligns with and complements the Enterprise Scale Landing Zones.
- [Getting started](Getting-Started.md) outlines the process and required steps to deploy Azure Mission-Critical in your environment, including preparing Azure DevOps pipelines. It should be read in tandem with the Reference Implementation guidance.
- [SLO and Availability](AppDesign-SLO-Availability.md) outlines the SLO for Azure Mission-Critical (99.95%) and how this figure was calculated.
- [ESLZ Alignment](ESLZ-Alignment.md) outlines how Azure Mission-Critical aligns with and complements the Enterprise Scale Landing Zones.
- [Troubleshooting](Troubleshooting.md) collects solutions to known issues during development and deployment.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

View file

@ -1,6 +1,6 @@
# Troubleshooting guide
It's inevitable in a system with the complexity of AlwaysOn that issues and errors occur. This living document maintains a list of solutions for common errors, which are not directly caused by the AlwaysOn code (i.e. are outside of control of the development team and cannot be fixed as bugs in the codebase).
It is inevitable in a system with the complexity of Azure Mission-Critical that issues and errors occur. This living document maintains a list of solutions for common errors, which are not directly caused by the Azure Mission-Critical code (i.e. are outside of control of the development team and cannot be fixed as bugs in the codebase).
- [Deployment issues](#deployment-issues)
- [Infrastructure Deployment stages](#infrastructure-deployment-stages)
@ -105,4 +105,4 @@ To prevent the error from happening to begin with, you can manually de-select th
**Solution:** Re-run the failing step.
---
[AlwaysOn - Full List of Documentation](/docs/README.md)
[Azure Mission-Critical - Full List of Documentation](/docs/README.md)

Binary data: icon-dark.png (Normal file). Binary file not shown. Size after: 30 KiB

Binary data: icon-light.png (Normal file). Binary file not shown. Size after: 61 KiB

Binary data: icon.png. Binary file not shown. Size before: 87 KiB

View file

@ -1,6 +1,6 @@
# Unit tests
AlwaysOn uses NUnit for unit testing the .NET Core part, but any other framework could be chosen as well (MSTest, xUnit). We developed only a handful of sample unit tests to demonstrate how they would be plugged into the whole development & deployment process.
The Azure Mission-Critical online reference implementation uses NUnit for unit testing the .NET Core part, but any other framework could be chosen as well (MSTest, xUnit). We developed only a handful of sample unit tests to demonstrate how they would be plugged into the whole development & deployment process.
Unit tests are executed automatically by Azure DevOps before container builds. If any test fails, the pipeline will stop and build & deployment will not proceed.

View file

@ -1,8 +1,8 @@
# UI Application
We decided to build a simple user interface application for AlwaysOn, which surfaces the API functionality to end users and also demonstrates how a different type of workload can be deployed to the cluster.
We decided to build a simple user interface application for Azure Mission-Critical, which surfaces the API functionality to end users and also demonstrates how a different type of workload can be deployed to the cluster.
It's a single-page application (SPA), built with the Vue.js framework, which runs entirely in the web browser and calls the AlwaysOn APIs directly.
It's a single-page application (SPA), built with the Vue.js framework, which runs entirely in the web browser and calls the Azure Mission-Critical APIs directly.
## How to run
@ -17,7 +17,7 @@ npm install
npm run serve
```
This will install all dependencies and start a development HTTP server. You can then go to http://localhost:8080/ to see the app in action.
This will install all dependencies and start a development HTTP server. You can then go to `http://localhost:8080/` to see the app in action.
### Production build
@ -60,14 +60,14 @@ Alternatively, you could make the config object part of the compiled app code an
Settings to configure:
* `window.API_URL` = URL of the API root which will be used, **without the trailing "/"**. For localhost this will be something like: *http://localhost:5000/api*, for a cloud environment it will be: */api* (because the UI runs on the same domain as the API). This can also be the absolute URL of a published API, just make sure that no firewall or CORS restrictions are in place.
* `window.API_URL` = URL of the API root which will be used, **without the trailing "/"**. For localhost this will be something like: `http://localhost:5000/api`, for a cloud environment it will be: */api* (because the UI runs on the same domain as the API). This can also be the absolute URL of a published API, just make sure that no firewall or CORS restrictions are in place.
* `window.APPINSIGHTS_INSTRUMENTATIONKEY` = Instrumentation key for the Application Insights instance to be used.
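
The "without the trailing slash" rule for `window.API_URL` can be illustrated with a minimal sketch. The helper names `normalizeApiUrl` and `buildApiUrl` and the `catalogitem` path are assumptions made for illustration; they are not taken from the UI application's code.

```javascript
// Hypothetical helpers, not part of the repository.

// Strips any trailing slashes from the configured API root.
function normalizeApiUrl(apiUrl) {
  return apiUrl.replace(/\/+$/, "");
}

// Joins the API root and a relative path with exactly one "/" between them.
function buildApiUrl(apiUrl, path) {
  return `${normalizeApiUrl(apiUrl)}/${path.replace(/^\/+/, "")}`;
}

// "catalogitem" is an illustrative path only.
console.log(buildApiUrl("http://localhost:5000/api/", "/catalogitem")); // http://localhost:5000/api/catalogitem
console.log(buildApiUrl("/api", "catalogitem"));                        // /api/catalogitem
```

Normalizing the value once at startup would let the rest of the app append paths without checking for a double slash.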
## Implementation notes
### CORS
Since this is a single-page application, running in the browser, it requires CORS (Cross-Origin Resource Sharing) to be enabled on the API, for cases when it's not running on the same root URL. The AlwaysOn application is set up in a way that it doesn't need CORS (UI running on `/` with API running on `/api`), but on localhost, this might not be the case (UI running on `localhost:8080` and API on `localhost:5000` which are considered different origins).
Since this is a single-page application, running in the browser, it requires CORS (Cross-Origin Resource Sharing) to be enabled on the API, for cases when it's not running on the same root URL. The Azure Mission-Critical application is set up in a way that it doesn't need CORS (UI running on `/` with API running on `/api`), but on localhost, this might not be the case (UI running on `localhost:8080` and API on `localhost:5000` which are considered different origins).
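
Why the localhost setup counts as cross-origin can be checked with the standard `URL` class: two URLs share an origin only when scheme, host and port all match. This is a small sketch; `isSameOrigin` is a hypothetical helper and `example.com` is a placeholder host.

```javascript
// Hypothetical helper, not part of the repository.
// Two URLs are same-origin only if scheme, host and port are all identical.
function isSameOrigin(a, b) {
  return new URL(a).origin === new URL(b).origin;
}

// Local development: UI and API listen on different ports, so CORS applies.
console.log(isSameOrigin("http://localhost:8080/", "http://localhost:5000/api")); // false

// Cloud-style layout: UI on "/" and API on "/api" of the same host, so no CORS is needed.
console.log(isSameOrigin("https://example.com/", "https://example.com/api"));     // true
```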
*Startup.cs*

View file

@ -11,7 +11,7 @@
<hr/>
<footer>This is a reference implementation of <a href="https://github.com/Azure/AlwaysOn">Azure AlwaysOn</a>. Built in 2021-2022. Version: {{ versionLabel }}</footer>
<footer>This is a reference implementation of <a href="https://github.com/Azure/Mission-Critical">Azure AlwaysOn</a>. Built in 2021-2022. Version: {{ versionLabel }}</footer>
</div>
</template>

View file

@ -3,9 +3,9 @@
<h1>😎 Welcome,</h1>
<p>... to Azure AlwaysOn! This is a reference implementation of a community driven project designed to demo and document the process, requirements and design decisions to setup a highly scalable and available application in Microsoft Azure.</p>
<a href="https://github.com/Azure/AlwaysOn"><img src="/img/logo.png" title="AlwaysOn logo" style="width: 50%" /></a>
<a href="https://github.com/Azure/Mission-Critical"><img src="/img/logo.png" title="AlwaysOn logo" style="width: 50%" /></a>
<p>You can find more details <a href="https://github.com/Azure/AlwaysOn-Foundational-Online">on GitHub</a>.</p>
<p>You can find more details <a href="https://github.com/Azure/Mission-Critical-Foundational-Online">on GitHub</a>.</p>
</div>
</template>

View file

@ -1,6 +1,6 @@
# Sample Application
The foundational AlwaysOn reference implementation uses a simple web shop catalog application where end users can browse through a catalog of items, see details of an item, and post ratings and comments for items. Although fairly straightforward, this application enables the [Reference Implementation](/docs/reference-implementation/README.md) to demonstrate the asynchronous processing of requests and how to achieve high throughput within a solution. The application consists of three components and is implemented in .NET Core and hosted on Azure Kubernetes Service.
The Azure Mission-Critical online reference implementation uses a simple web shop catalog application where end users can browse through a catalog of items, see details of an item, and post ratings and comments for items. Although fairly straightforward, this application enables the [Reference Implementation](/docs/reference-implementation/README.md) to demonstrate the asynchronous processing of requests and how to achieve high throughput within a solution. The application consists of three components and is implemented in .NET Core and hosted on Azure Kubernetes Service.
See [Application Design](/docs/reference-implementation/AppDesign-Application-Design.md) for more details about the application.
@ -27,7 +27,7 @@ The UI is compiled in the CI pipeline and uploaded to Azure Storage accounts in
The `/src/app/charts` directory contains individual Helm charts for each of the application components like CatalogService, BackgroundProcessor and HealthService. Helm is used to package the YAML manifests needed to deploy the individual components together including their deployment, services as well as the auto-scaling (HPA) configuration. Each Helm chart contains a `values.yaml` file that contains default values and is used as an argument reference.
These workload Helm charts used in AlwaysOn are currently not uploaded into a Helm registry, they're applied directly via Helm via an [Azure DevOps pipeline](/docs/reference-implementation/DeployAndTest-DevOps-Design-Decisions.md) from within the repository.
These workload Helm charts used in Azure Mission-Critical are currently not uploaded into a Helm registry, they're applied directly via Helm via an [Azure DevOps pipeline](/docs/reference-implementation/DeployAndTest-DevOps-Design-Decisions.md) from within the repository.
### Security Context

View file

@ -4,7 +4,7 @@ name: backgroundprocessor
description: "background processor worker"
version: 0.0.5
sources:
- https://github.com/azure/alwayson
- https://github.com/Azure/Mission-Critical
kubeVersion: ">=1.20.0"
maintainers:
- name: alwayson

View file

@ -4,7 +4,7 @@ name: catalogservice
description: "catalog application api service"
version: 0.0.6
sources:
- https://github.com/azure/alwayson
- https://github.com/Azure/Mission-Critical
kubeVersion: ">=1.20.0"
maintainers:
- name: alwayson

View file

@ -4,7 +4,7 @@ name: healthservice
description: "healthservice api"
version: 0.0.5
sources:
- https://github.com/azure/alwayson
- https://github.com/Azure/Mission-Critical
kubeVersion: ">=1.20.0"
maintainers:
- name: alwayson

View file

@ -4,7 +4,7 @@ The "configuration layer" builds the bridge between the infrastructure deployed
## Versioning
All dependencies and components used for AlwaysOn are defined using a specific, static version to avoid issues due to changes with untested, newer versions of certain components.
All dependencies and components used for the Azure Mission-Critical reference implementation are defined using a specific, static version to avoid issues due to changes with untested, newer versions of certain components.
These versions are specified in `.ado/pipelines/config/configuration.yaml` and loaded into all Azure DevOps pipelines. Here's an example of what this looks like:

View file

@ -4,7 +4,7 @@ name: csi-secrets-config-keyvault
description: "Configuration for CSI secret driver backed by Azure Key Vault"
version: 0.0.1
sources:
- https://github.com/azure/alwayson
- https://github.com/Azure/Mission-Critical
kubeVersion: ">=1.18.0"
maintainers:
- name: alwayson

View file

@ -14,19 +14,19 @@
---
The AlwaysOn reference implementation follows a layered and modular approach. This approach achieves the following goals:
The Azure Mission-Critical reference implementation follows a layered and modular approach. This approach achieves the following goals:
- Cleaner and manageable deployment design
- Ability to switch service(s) with other services providing similar capabilities depending on requirements
- Separation between layers, which makes it easier to implement RBAC in case multiple teams are responsible for different aspects of AlwaysOn application deployment and operations
- Separation between layers, which makes it easier to implement RBAC in case multiple teams are responsible for different aspects of Azure Mission-Critical application deployment and operations
The AlwaysOn reference implementation is composed of three distinct layers:
The Azure Mission-Critical reference implementations are composed of three distinct layers:
- Infrastructure
- Configuration
- Application
The infrastructure layer contains all infrastructure components and underlying foundational services required for the AlwaysOn reference implementation. It is deployed using [Terraform](./workload/README.md).
The infrastructure layer contains all infrastructure components and underlying foundational services required for the Azure Mission-Critical reference implementation. It is deployed using [Terraform](./workload/README.md).
> Note: Bicep (ARM DSL) was considered during the early stages as part of a proof-of-concept. Please refer to the following [(archived stub)](/docs/reference-implementation/ZZZ-Archived-Bicep.md) for more details.
@ -36,7 +36,7 @@ Infrastructure layer contains all infrastructure components and underlying found
## Architecture
![Architecture overview](/docs/media/Architecture-Foundational-Online.png)
![Architecture overview](/docs/media/mission-critical-architecture-online.png)
### Stamp independence
@ -58,7 +58,7 @@ As much as possible, no state should be stored on the compute clusters with all
In addition to [stamp independence](#stamp-independence) and [stateless compute clusters](#stateless-compute-clusters), each "stamp" is considered to be a Scale Unit (SU) following the [Deployment stamps pattern](https://docs.microsoft.com/azure/architecture/patterns/deployment-stamp). All components and services within a given stamp are configured and tested to serve requests in a given range. This includes auto-scaling capabilities for each service as well as proper minimum and maximum values and regular evaluation.
An example SU design in AlwaysOn consists of scalability requirements i.e. minimum values / the expected capacity:
An example Scale Unit design in Azure Mission-Critical consists of scalability requirements i.e. minimum values / the expected capacity:
**Scalability requirements**
| Metric | max |
@ -87,9 +87,9 @@ Each SU is deployed into an Azure region and is therefore primarily handling tra
### Available Azure Regions
The reference implementation of AlwaysOn deploys a set of Azure services. These services are not available across all Azure regions. In addition, only regions which offer **[Availability Zones](https://docs.microsoft.com/azure/availability-zones/az-region)** (AZs) are considered for a stamp. AZs are gradually being rolled-out and are not yet available across all regions. Due to these constraints, the reference implementation cannot be deployed to all Azure regions.
The reference implementation of Azure Mission-Critical deploys a set of Azure services. These services are not available across all Azure regions. In addition, only regions which offer **[Availability Zones](https://docs.microsoft.com/azure/availability-zones/az-region)** (AZs) are considered for a stamp. AZs are gradually being rolled-out and are not yet available across all regions. Due to these constraints, the reference implementation cannot be deployed to all Azure regions.
As of February 2022, the following regions have been successfully tested with the reference implementation of AlwaysOn:
As of February 2022, the following regions have been successfully tested with the reference implementation of Azure Mission-Critical:
**Europe/Africa**
@ -180,7 +180,7 @@ The current networking setup consists of a single Azure Virtual Network per _sta
Azure Kubernetes Service (AKS) is used as the compute platform as it is most versatile and as Kubernetes is the de-facto compute platform standard for modern applications, both inside and outside of Azure.
AlwaysOn uses Linux-only clusters as there is no requirement for any Windows-based containers and Linux is the more mature platform in terms of Kubernetes.
Azure Mission-Critical uses Linux-only clusters as there is no requirement for any Windows-based containers and Linux is the more mature platform in terms of Kubernetes.
- `role_based_access_control` (RBAC) is **enabled**.
- `sku_tier` set to **Paid** (Uptime SLA) to achieve the 99.95% SLA within a single region (with `availability_zones` enabled).
@ -239,16 +239,16 @@ Azure Policy is used to monitor and enforce certain baselines. All policies are
#### Supporting services
This repository also contains a couple of supporting services for the AlwaysOn project:
This repository also contains a couple of supporting services for the Azure Mission-Critical project:
- [Self-hosted Agents](./build-agents/README.md)
- [Locust Load Testing](../testing/loadtest-locust/README.md)
These supporting services are required or optional depending on how you choose to use AlwaysOn.
These supporting services are required or optional depending on how you choose to use Azure Mission-Critical.
## Naming conventions
All resources used for AlwaysOn follow a pre-defined and consistent naming structure to make it easier to identify them and to avoid confusion. Resource abbreviations are based on the [Cloud Adoption Framework](https://docs.microsoft.com/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#general). These abbreviations are typically attached as a suffix to each resource in Azure.
All resources used for Azure Mission-Critical follow a pre-defined and consistent naming structure to make it easier to identify them and to avoid confusion. Resource abbreviations are based on the [Cloud Adoption Framework](https://docs.microsoft.com/azure/cloud-adoption-framework/ready/azure-best-practices/resource-abbreviations#general). These abbreviations are typically attached as a suffix to each resource in Azure.
A **prefix** is used to uniquely identify "deployments" as some names in Azure must be worldwide unique. Examples of these include Storage Accounts, Container Registries and CosmosDB accounts.

View file

@ -10,7 +10,7 @@
---
AlwaysOn is using [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/logs/log-analytics-overview) as a central store for logs and metrics for all application and infrastructure components and [Azure Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview) for all application monitoring data. Each stamp has its own, dedicated Log Analytics Workspace and App Insights instance. Next to those is one Log Analytics Workspace for the globally shared resources such as Front Door and Cosmos DB.
Azure Mission-Critical is using [Azure Log Analytics](https://docs.microsoft.com/azure/azure-monitor/logs/log-analytics-overview) as a central store for logs and metrics for all application and infrastructure components and [Azure Application Insights](https://docs.microsoft.com/azure/azure-monitor/app/app-insights-overview) for all application monitoring data. Each stamp has its own, dedicated Log Analytics Workspace and App Insights instance. Next to those is one Log Analytics Workspace for the globally shared resources such as Front Door and Cosmos DB.
![Monitoring overview](/docs/media/MonitoringOverview.png)
@ -20,7 +20,7 @@ As all stamps are short-lived and continuously replaced with each new release (s
### Diagnostic settings
All Azure services used for AlwaysOn are configured to send all their Diagnostic data including logs and metrics to the deployment specific (global or stamp) Log Analytics Workspace. This happens automatically as part of the [Terraform](/src/infra/README.md#infrastructure) deployment. New options will be identified automatically and added as part of `terraform apply`.
All Azure services used for Azure Mission-Critical are configured to send all their Diagnostic data including logs and metrics to the deployment specific (global or stamp) Log Analytics Workspace. This happens automatically as part of the [Terraform](/src/infra/README.md#infrastructure) deployment. New options will be identified automatically and added as part of `terraform apply`.
![Diagnostic Settings](/docs/media/Monitoring1DiagnosticSettings.png)
@ -52,7 +52,7 @@ To monitor the availability of the individual stamps and the overall solution fr
## Queries
AlwaysOn uses different Kusto Query Language (KQL) queries to implement complex, custom queries as functions to retrieve data from Log Analytics. These queries are stored as individual files in the `/src/infra/monitoring/queries` directory (separated into global and stamp) and are imported and applied automatically via Terraform as part of each infrastructure pipeline run.
Azure Mission-Critical uses different Kusto Query Language (KQL) queries to implement complex, custom queries as functions to retrieve data from Log Analytics. These queries are stored as individual files in the `/src/infra/monitoring/queries` directory (separated into global and stamp) and are imported and applied automatically via Terraform as part of each infrastructure pipeline run.
This approach separates the query logic from the visualization layer. It allows us to call these functions individually and use them either directly to retrieve data from Log Analytics or to visualize the results in Azure Dashboards, Azure Monitor Workbooks or 3rd-Party dashboarding solutions like Grafana.
@ -66,7 +66,7 @@ This result provides a granular overview about the cluster's health status based
## Visualization
The Visualization of the Kusto [Queries](#Queries) described above was implemented using Grafana. Grafana is used to show the results of Log Analytics queries and does not contain any logic itself. The Grafana stack is not part of the solution's deployment lifecycle, but released separately. For a detailed description of the Grafana deployment for AlwaysOn, please refer to the [Grafana README](/src/infra/monitoring/grafana/README.md).
The Visualization of the Kusto [Queries](#Queries) described above was implemented using Grafana. Grafana is used to show the results of Log Analytics queries and does not contain any logic itself. The Grafana stack is not part of the solution's deployment lifecycle, but released separately. For a detailed description of the Grafana deployment for Azure Mission-Critical, please refer to the [Grafana README](/src/infra/monitoring/grafana/README.md).
## Alerting

View file

@ -10,6 +10,7 @@ When the Dockerfile is built, a container is created with the following:
- AlwaysOn-Healthmodelpanel custom visualization
## Environment Variables
The container expects the following environment variables to be set:
| Name | Value |
@ -19,30 +20,32 @@ The container expects the following environment variables to be set:
| AZURE_DEFAULT_SUBSCRIPTION | Id of the Azure subscription that holds the Log Analytics instances |
## Managed Identity
The data source has been set for Managed Identity authentication to Azure.
This means that the infrastructure running the container, e.g. Azure App Service, should have its system-managed identity enabled and that identity should be assigned, at minimum, the 'Log Analytics Reader' permission on a scope that includes all required Log Analytics instances.
## Grafana Authentication
Currently, authentication has been set to a username/password. Obviously this is not the best way in production scenarios, but OAuth authentication requires external dependencies that make this reference implementation harder to deploy and may be subject to security constraints in your local environment.
Before deploying this to your production environment, it is *highly recommended* to enable OAuth. This is done by editing the grafana.ini file and uncommenting/filling the values under the authentication section. Naturally, don't add secrets there. You can add ${MY_SECRET_VALUE} as a value and include that at runtime through environment variables.
Before deploying this to your production environment, it is *highly recommended* to enable OAuth. This is done by editing the `grafana.ini` file and uncommenting/filling the values under the authentication section. Naturally, don't add secrets there. You can add ${MY_SECRET_VALUE} as a value and include that at runtime through environment variables.
## Note about line endings
When editing on Windows, ensure that for the dashboard queries as well as the .ts and .tsx files, line endings are set to **LF** to ensure a smooth docker build process.
When editing on Windows, ensure that for the dashboard queries as well as the `.ts` and `.tsx` files, line endings are set to **LF** to ensure a smooth docker build process.
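
One way to catch stray Windows line endings before a Docker build is to scan file contents for CRLF. The sketch below uses hypothetical helper names (`hasCrlf`, `toLf`) and works on strings only; reading the actual dashboard, `.ts` and `.tsx` files is left out.

```javascript
// Hypothetical helpers, not part of the repository.

// Returns true if the text contains at least one Windows-style CRLF line ending.
function hasCrlf(text) {
  return text.includes("\r\n");
}

// Converts every CRLF line ending to LF and leaves existing LF endings untouched.
function toLf(text) {
  return text.replace(/\r\n/g, "\n");
}

const sample = "line one\r\nline two\nline three\r\n";
console.log(hasCrlf(sample));       // true
console.log(hasCrlf(toLf(sample))); // false
```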
# Grafana Health Model Panel
## Grafana Health Model Panel
The AlwaysOn health model has been implemented in Azure Log Analytics using KQL queries. This is a custom Grafana visualization panel, which can be used to visualize that health model. Its main purpose is to visualize, in an intuitive way:
The Azure Mission-Critical health model has been implemented in Azure Log Analytics using KQL queries. This is a custom Grafana visualization panel, which can be used to visualize that health model. Its main purpose is to visualize, in an intuitive way:
- The health state of each component
- The hierarchical dependencies between components.
This document describes the specifics of the custom Grafana visualization and the dependencies it has on the underlying solution. For a broader context, view the AlwaysOn guidance on Azure Architecture Center.
This document describes the specifics of the custom Grafana visualization and the dependencies it has on the underlying solution. For a broader context, view the Azure Mission-Critical guidance on Azure Architecture Center.
## Usage
### Usage
### Input Data
#### Input Data
The panel depends on a Log Analytics query result that contains the relevant information. The following columns are required in the query result:
@ -81,16 +84,16 @@ This gives the following result, which is the input for the health model panel:
This query is subsequently visualized in the following way:
![Example healthmodelpanel](/docs/media/healthmodel-example.png)
# Build & Deploy
## Build & Deploy
## Option 1: Docker Build for the entire Grafana container
### Option 1: Docker Build for the entire Grafana container
1. Docker build:
`docker build -t alwayson-grafana .`
This docker container contains a full Grafana install as well as the health model panel and can be run directly on any container hosting environment. The required environment variable for running unsigned panels has already been set.
## Option 2: Manually Build the health model panel
### Option 2: Manually Build the health model panel
1. Go to the _healthmodelpanel_ directory
@ -105,5 +108,4 @@ This docker container contains a full Grafana install as well as the health mode
5. In order to run an unsigned Grafana panel, ensure that the following environment variable has been set:
`GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS="alwayson-healthmodelpanel"`
![Solution Health Monitoring Screenshot](/docs/media/healthmodel-example-fullpage.png)

View file

@ -7,7 +7,7 @@
"description": "",
"author": {
"name": "nielsb",
"url": "https://github.com/Azure/AlwaysOn"
"url": "https://github.com/Azure/Mission-Critical"
},
"keywords": [],
"logos": {

View file

@ -1,8 +1,8 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "AlwaysOn Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Contact = var.contact_email
Environment = var.environment

View file

@ -1,8 +1,8 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "AlwaysOn Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Contact = var.contact_email
Environment = var.environment

View file

@ -1,3 +1,3 @@
# Query Archive
These queries are not currently used in the solution, but may be useful as artifacts for future AlwaysOn solutions.
These queries are not currently used in the solution, but may be useful as artifacts for future Azure Mission-Critical solutions.

View file

@ -9,12 +9,12 @@ The number and the selected regions for these "stamp" deployments can easily be
## Public and Private versions
The reference implementation can be used to deploy to different flavors of the AlwaysOn infrastructure:
The reference implementation can be used to deploy to different flavors of the Azure Mission-Critical infrastructure:
- A "public" version which does not fully lock down all services, but in turn it can be deployed using Azure DevOps-hosted Build Agents. Plus, developers and administrators can more easily connect to the resources and debug them.
- A fully "private" version which locks all traffic to the services down to Private Endpoints. This provides even tighter security but requires the use of self-hosted, VNet-integrated Build Agents. Also, for any debugging etc. users must connect through Azure Bastion and Jump Servers.
Head over to [this GitHub repository](https://github.com/Azure/AlwaysOn-foundational-private) for detailed instructions how to set up the private version.
Head over to [this GitHub repository](https://github.com/Azure/Mission-Critical-Connected) for detailed instructions how to set up the private version.
## Get started

View file

@ -1,8 +1,8 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "AlwaysOn Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Contact = var.contact_email
Environment = var.environment

View file

@ -1,8 +1,8 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "AlwaysOn Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Contact = var.contact_email
Environment = var.environment

View file

@ -1,14 +1,14 @@
# Testing Implementation
The AlwaysOn reference implementation contains various kinds of tests used at different stages. These include:
The Azure Mission-Critical reference implementation contains various kinds of tests used at different stages. These include:
- **Unit tests**. These validate that the business logic of the application works as expected. AlwaysOn contains a [sample suite of C# unit tests](/src/app/AlwaysOn.Tests/README.md) that are automatically executed before every container build.
- **Unit tests**. These validate that the business logic of the application works as expected. Azure Mission-Critical contains a [sample suite of C# unit tests](/src/app/AlwaysOn.Tests/README.md) that are automatically executed before every container build.
- **Load tests**. These can help to evaluate the capacity, scalability and potential bottlenecks of a given workload and stack.
- **Smoke tests**. These identify if the infrastructure and workload are available and act as expected. Smoke tests are executed as part of every deployment.
- **UI tests**. These validate that the user interface was deployed and works as expected. Currently AlwaysOn only [captures screenshots](/src/testing/ui-test-playwright/README.md) of several pages after deployment without any actual testing.
- **Failure Injection tests**. These are done in two ways: First, AlwaysOn integrates Azure Chaos Studio for automated testing as part of the deployment pipelines. Secondly, manual failure injection test can be conducted. See below for details.
- **UI tests**. These validate that the user interface was deployed and works as expected. Currently Azure Mission-Critical only [captures screenshots](/src/testing/ui-test-playwright/README.md) of several pages after deployment without any actual testing.
- **Failure Injection tests**. These are done in two ways: First, Azure Mission-Critical integrates Azure Chaos Studio for automated testing as part of the deployment pipelines. Secondly, manual failure injection tests can be conducted. See below for details.
Additionally, AlwaysOn contains a [user load generator](/src/testing/userload-generator/README.md) to create synthetic load patterns which can be used to simulate real life traffic. This can also be used completely independently of the reference implementation.
Additionally, Azure Mission-Critical contains a [user load generator](/src/testing/userload-generator/README.md) to create synthetic load patterns which can be used to simulate real life traffic. This can also be used completely independently of the reference implementation.
## Failure Injection testing and Chaos Engineering
@ -18,11 +18,11 @@ Resilience is a property of an entire system and injecting faults helps to find
Manual failure injection testing was initially performed across both global and deployment stamp resources. Please consult the [Failure Injection article](/docs/reference-implementation/DeployAndTest-Testing-FailureInjection.md) for details.
AlwaysOn integrates [Azure Chaos Studio](https://aka.ms/chaosstudio) to deploy and run a set of Azure Chaos Studio Experiments to inject various faults at the global and stamp levels.
Azure Mission-Critical integrates [Azure Chaos Studio](https://aka.ms/chaosstudio) to deploy and run a set of Azure Chaos Studio Experiments to inject various faults at the global and stamp levels.
## Frameworks
The AlwaysOn reference implementation uses existing testing capabilities and frameworks whenever possible. The subsequent sections contain an overview of the used tools and frameworks.
The Azure Mission-Critical online reference implementation uses existing testing capabilities and frameworks whenever possible. The subsequent sections contain an overview of the used tools and frameworks.
- [Locust](#locust) for load testing
- [Playwright](#playwright) for UI testing
@ -38,11 +38,11 @@ Playwright is an open source Node.js library to automate Chromium, Firefox and W
### Azure Chaos Studio
To inject failures for resiliency validation, AlwaysOn uses Azure Chaos Studio as an optional step in the E2E validation pipeline. See [Chaos Testing](./chaos-testing/README.md) for more details about the implementation and configuration.
To inject failures for resiliency validation, Azure Mission-Critical uses Azure Chaos Studio as an optional step in the E2E validation pipeline. See [Chaos Testing](./chaos-testing/README.md) for more details about the implementation and configuration.
## User Load Generator
To simulate real user traffic patterns, AlwaysOn implements a [user load generator](./userload-generator/README.md) to generate synthetic traffic. It uses a Playwright test definition and can be also used completely independently of AlwaysOn.
To simulate real user traffic patterns, Azure Mission-Critical implements a [user load generator](./userload-generator/README.md) to generate synthetic traffic. It uses a Playwright test definition and can also be used completely independently of Azure Mission-Critical reference implementations.
---

View file

@ -1,6 +1,6 @@
# Chaos Experiments
The reference implementation of AlwaysOn integrates Azure Chaos Studio (currently in preview) to inject faults by creating and executing Chaos experiments.
The reference implementation for the Mission-Critical project integrates Azure Chaos Studio (currently in preview) to inject faults by creating and executing Chaos experiments.
Chaos experiments can be executed as an optional part of the E2E deployment pipeline. In case they are executed, the optional load test is always executed in parallel as well. This is to create some load on the cluster to actually validate the impact of the injected faults.

View file

@ -2,7 +2,7 @@
[locust.io](https://locust.io) is an easy to use, scriptable and scalable open source load and performance testing tool.
The AlwaysOn reference implementation leverages Locust in two different ways:
The Azure Mission-Critical reference implementation leverages Locust in two different ways:
* **Embedded** is used to automatically run load tests as part of the end-to-end (e2e) validation pipeline using a fixed set of parameters. This is intended to compare each e2e run (currently manually) and the changes that were made against a given performance baseline.
@ -10,7 +10,7 @@ The AlwaysOn reference implementation leverages Locust in two different ways:
## Infrastructure
The standalone as well as the embedded Locust implementation used for AlwaysOn consists of one master node and one or more worker nodes distributed across multiple Azure regions. The worker nodes execute the load testing tasks and communicate with the master node on port `5557/TCP`. The master node is orchestrating the worker nodes, gathering the load test data and (in standalone-mode only) hosting a web interface on port `8089/TCP` to conduct and monitor load tests.
The standalone as well as the embedded Locust implementation used for the Azure Mission-Critical reference implementation consists of one master node and one or more worker nodes distributed across multiple Azure regions. The worker nodes execute the load testing tasks and communicate with the master node on port `5557/TCP`. The master node is orchestrating the worker nodes, gathering the load test data and (in standalone-mode only) hosting a web interface on port `8089/TCP` to conduct and monitor load tests.
All nodes are represented as individual container instances, hosted on Azure Container Instances (ACI) and are deployed via Terraform. The Terraform definition is stored in the `src/infra/loadtest-locust` directory.
@ -49,7 +49,7 @@ And uploads the load test results at the end at the end of each successful run a
## Authentication
Some of the REST methods on the AlwaysOn API are protected with API key-based authentication. In order to call the API and run tests, Locust needs to present the `X-API-Key: XXX` HTTP header. The corresponding value can be fetched from of of the Azure Key Vault of the deployment (it is the same key between all the stamps).
Some of the REST methods on the Azure Mission-Critical sample workload API are protected with API key-based authentication. In order to call the API and run tests, Locust needs to present the `X-API-Key: XXX` HTTP header. The corresponding value can be fetched from the Azure Key Vault of the deployment (it is the same key between all the stamps).
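
As a rough illustration of the header described above, the following Python sketch builds the `X-API-Key` header from a key supplied at runtime. It is not taken from the reference implementation: the function names and the `LOCUST_API_KEY` environment variable are assumptions made for this example, and how the key is retrieved from Key Vault is left out.

```python
import os

# Header name as given in the text above.
API_KEY_HEADER = "X-API-Key"


def build_auth_headers(api_key: str) -> dict:
    """Return the HTTP headers needed to call the API key-protected methods."""
    if not api_key:
        raise ValueError("API key must not be empty")
    return {API_KEY_HEADER: api_key}


def api_key_from_env(var_name: str = "LOCUST_API_KEY") -> str:
    """Read the API key from an environment variable.

    LOCUST_API_KEY is a hypothetical name chosen for this sketch; the key
    itself would first have to be fetched from the deployment's Key Vault.
    """
    return os.environ.get(var_name, "")
```

In a Locust test, a dictionary like this could be passed through the `headers=` argument of the HTTP client calls, since Locust's client accepts the same keyword arguments as a `requests` session.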
## Load Testing

View file

@ -1,7 +1,7 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "Always-on Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Environment = var.environment
Prefix = var.prefix

View file

@ -1,7 +1,7 @@
locals {
default_tags = {
Owner = "AlwaysOn V-Team"
Project = "Always-on Solution Engineering"
Owner = "Azure Mission-Critical V-Team"
Project = "Azure Mission-Critical Solution Engineering"
Toolkit = "Terraform"
Environment = var.environment
Prefix = var.prefix