Updated setting up environment and securing the cluster.

This commit is contained in:
Kevin Harris 2018-09-04 22:20:00 -04:00
Parent 7b6a29e98a
Commit 13592ec12e
6 changed files with 143 additions and 2 deletions

View file

@ -30,4 +30,68 @@ Table of Contents
- [ ] :cloud: Maintenance of certificate and key rotation, cleanup of docker registry
- [ ] :cloud: Security Impact of activating addons and dashboard
- [ ] :cloud: Encrypted service to service communication across nodes
- [ ] :cloud: Service Endpoints for PaaS Service lockdown
## Master Endpoint Security
The best thing to do is to ensure that RBAC with Azure AD Integration is configured. The API Server has three security levels: the first level is Authentication, the second level is Authorization, and the third level is Admission Controllers. Integrating Authentication and Authorization with Azure AD means that only users from an Organization's Azure AD who have been granted the appropriate permissions can access the API Server; everyone else is denied. The last layer, Admission Controllers, is about applying policy to ensure that the permitted requests that do come through adhere to the policy of the Organization. Admission Controllers are discussed more in the securing workloads section.
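As a sketch of what enabling the integration at cluster creation can look like (all names and IDs below are placeholders, and the exact flags may vary by Azure CLI version):
```bash
# Create an AKS cluster with Azure AD integration enabled. The server and
# client app registrations must be created in Azure AD ahead of time.
az aks create \
  --resource-group myResourceGroup \
  --name myAKSCluster \
  --aad-server-app-id <server-app-id> \
  --aad-server-app-secret <server-app-secret> \
  --aad-client-app-id <client-app-id> \
  --aad-tenant-id <tenant-id>
```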
## Securing Host Access
By default, SSH access to the Control Plane is not permitted as AKS is a managed service. In terms of the workers, SSH access from the Public Internet is blocked by default via Network Security Groups (NSGs). Even if someone were able to get inside of the VNET, they would still need the SSH Keys to be able to access the servers.
The recommended practice is to generate your own SSH Keys ahead of Cluster creation and secure those keys in something like Azure Key Vault (AKV). This way the keys are not floating around or stored on a server for someone to use.
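A minimal sketch of that flow, assuming a Key Vault named `myKeyVault` already exists (all names are placeholders):
```bash
# Generate an SSH key pair ahead of cluster creation.
ssh-keygen -t rsa -b 4096 -f ~/.ssh/aks_id_rsa -N ""

# Keep the private key in Azure Key Vault rather than on a workstation or server.
az keyvault secret set --vault-name myKeyVault --name aks-ssh-private-key --file ~/.ssh/aks_id_rsa

# Reference the public key when creating the cluster.
az aks create --resource-group myResourceGroup --name myAKSCluster \
  --ssh-key-value ~/.ssh/aks_id_rsa.pub
```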
## Default RBAC Setup
??? - Dennis has some stuff that he has done that could go here.
## Continuous Security (Should this be called Image Management)
One could argue that Container Image Management is one of the most critical parts of using Containers. It all starts with DevOps and the ability to automate the deployment of Applications/Services into AKS. If you trust where the images are coming from and put policy restrictions in place so that only trusted images are deployed into the AKS Cluster, that is half the battle.
If we can only deploy from a Trusted Registry then that eliminates the threat of deploying a malicious container from an untrusted registry. So how do we trust the registry? It all starts with a solid DevOps pipeline and continuous security in the form of Container Image scanning at Build time as well as Runtime. If we scan Container Images at Build time and only upload trusted images to the Registry, that means we are only pulling/deploying trusted images to our AKS Cluster. But what if a Container Image is compromised at runtime, meaning someone has gained access to a Container via some exploit? This is where Runtime Image scanning comes into play and lets you know that a Container might have a vulnerability. The policy could be simply to notify someone, or it can be automated to recycle the Container, which would immediately terminate the compromised Container as the process would be killed and restarted.
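An illustrative sketch of the build-time half of that pipeline (the registry name is a placeholder, and trivy is shown only as one example of a scanning tool):
```bash
# Build the image, scan it, and only push to the trusted registry if the scan passes.
az acr login --name myregistry
docker build -t myregistry.azurecr.io/myapp:1.0 .
trivy image --exit-code 1 myregistry.azurecr.io/myapp:1.0 \
  && docker push myregistry.azurecr.io/myapp:1.0
```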
## Dev Namespace Example
The following is an example of how to configure a Namespace, in this case called **dev**. The goal is to apply LimitRanges, Quotas and RBAC to the Namespace to ensure it does not become a noisy neighbour and only consumes the resources intended for it, while also providing security isolation so that it can only interact with the resources it is intended to.
```bash
kubectl apply -f securing_a_cluster/create-namespaces.yaml
kubectl apply -f securing_a_cluster/namespace-limitranges.yaml
kubectl apply -f securing_a_cluster/namespace-quotas.yaml
```
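The RBAC piece could then be wired up along these lines, assuming Azure AD integration is in place (the group object ID is a placeholder):
```bash
# Grant a dev team's Azure AD group edit rights scoped to the dev namespace only.
kubectl create rolebinding dev-team-edit \
  --clusterrole=edit \
  --group=<dev-team-aad-group-object-id> \
  --namespace=dev
```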
## Upgrading and Maintaining Hosts
??? - Not sure we can do anything in this area today.
## Security Benchmarks
The purpose of running security benchmarks is to ensure that the proper policy, procedures and controls are in place when an AKS Cluster is created. By running a simple check we can ensure things are ready to go and in place from a security perspective.
kube-bench is a great example of one of these tools, click [here](https://github.com/aquasecurity/kube-bench) for more details.
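One way to run kube-bench against the nodes is as a Kubernetes Job, roughly as follows (the manifest path comes from the kube-bench repository and may change over time):
```bash
# Run kube-bench as a Job and read the benchmark results from its logs.
kubectl apply -f https://raw.githubusercontent.com/aquasecurity/kube-bench/master/job.yaml
kubectl logs job/kube-bench
```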
## Certificate Maintenance and Rotation
??? - Rolling certificates in AKS today is not possible as access to the Control Plane is not allowed. It is on the roadmap, but not sure what, if anything, we do in the interim.
## Activating Add-ons
The recommended practice in this area is to enable monitoring so that logs are captured and can be used for troubleshooting down the road. Given there is no SSH access to the Control Plane and SSH access to the worker nodes is limited, having a solid Monitoring and Logging strategy in place is key to troubleshooting. A centralized logging store is a good place to start.
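Enabling monitoring on an existing cluster is a one-liner with the Azure CLI (resource names are placeholders):
```bash
# Enable the Azure Monitor for containers add-on on an existing AKS cluster.
az aks enable-addons --resource-group myResourceGroup --name myAKSCluster --addons monitoring
```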
As for the Kubernetes Dashboard, the recommendation is to not install it. If a solid DevOps pipeline is in place along with a Logging strategy, there is no need to provide access to the Kubernetes Dashboard. Misconfiguration of the Dashboard can be a big security risk, allowing users to have admin-level access to do and see everything in the AKS Cluster.
## Encrypted Service to Service Communication
The question that usually gets raised is: what about traffic inside of the cluster? The question back is: what vulnerabilities/risks are we trying to mitigate? If we trust services that have made it through the front door, do we need to do anything more? This is uncomfortable for a lot of Organizations as they are used to all service-to-service communication going through a WAF that monitors traffic. With Distributed Systems it does not make sense to have each request go back out to the WAF and come back in, as the WAF is not aware of where Services are located. This is also one of the main reasons we use Ingress Controllers, as they are Service Discovery aware.
So how do we deal with this trust problem? Typically the risk can be mitigated with the use of mutual TLS (mTLS) between services so that services can only interact with other trusted services. This is where the concept of Service Meshes comes in. The details of Service Meshes are a topic for another time.
## Private API Server and Service Endpoints
Many Organizations are concerned that the Kubernetes API Server is publicly exposed and are asking for a Private Endpoint. Private Endpoints are not an option today, but they are on the roadmap. The ability to lock down access to the Public Endpoint to only certain VNETs (Service Endpoints) is also not available today, but it is on the roadmap.
So what can I do in the interim? The best thing to do is to ensure that RBAC with Azure AD Integration is configured, as described in the Master Endpoint Security section above.

View file

@ -24,7 +24,6 @@ Table of Contents
## Image protection
For multi-tenant clusters it is useful to enforce the image pull policy to Always - see the AlwaysPullImages admission controller.
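A quick, read-only way to spot-check what pull policy running containers actually have (a jsonpath sketch):
```bash
# List each pod with the imagePullPolicy of its containers across all namespaces.
kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{"\t"}{.spec.containers[*].imagePullPolicy}{"\n"}{end}'
```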
## Admission controllers

View file

@ -16,3 +16,39 @@ Table of Contents
- [ ] :fire: Inbound/ Outbound traffic control
- [ ] :fire: Setting up RBAC roles and default bindings
- [ ] :cloud: Dedicated nodes / hyper-v isolation on Nodes
## Cluster vs Nodes vs Namespace Isolation
Some thought needs to be put into how you are going to isolate environments and workloads. For example, an organization can use a single cluster and isolate different environments and workloads using Namespaces. Another way to separate environments is to have a cluster per environment, or one cluster for Production and one for non-Production. If Network isolation is of the utmost importance then having a cluster per network zone is another option.
There is no right or wrong answer as it depends on the needs of the organization. The biggest thing is to balance operations and maintenance with practicality. Yes, having a separate cluster for each network zone satisfies a requirement, but on the flip side there are more clusters to manage and maintain, which means incurring more technical debt.
A general rule of thumb is to have a Production Cluster and a non-Production Cluster at a minimum. The workloads (microservices) within each cluster are separated via K8s Namespaces in order to apply LimitRanges, Quotas and RBAC so that one service does not become a noisy neighbour to another service.
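For example, carving a single non-Production cluster into environments with Namespaces is as simple as (the environment names are illustrative):
```bash
# One Namespace per non-Production environment within the same cluster.
kubectl create namespace dev
kubectl create namespace test
kubectl create namespace staging
```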
## Azure Service Principals and MSI
Today, controlling what an AKS Cluster can do within an Azure Subscription is done via Service Principals. Meaning, the cluster can only provision or interact with resources it is allowed to access. The recommended practice is to create the Service Principal ahead of time and only grant it the minimal permissions it needs for creation of the AKS Cluster.
Let's use an example of how this works today. As we know, one of the recommended practices with Containers is to not store sensitive information within a Container Image, but leverage a store like Azure Key Vault (AKV) to store that sensitive information. But how does the service get access to AKV in order to retrieve said information? Today, with Service Principals that means storing the credentials to access AKV somewhere, which means they need to be managed. What if there was another way?
This is where Managed Service Identities (MSIs) come in. MSIs are a managed identity in Azure AD that allows an application/service to get access to other resources in Azure, AKV for example, without having to provide credentials. The credentials are managed on the service's behalf and allow the service identity to authenticate to any service that supports Azure AD authentication. It just so happens that Azure Key Vault is one of those services, meaning you don't have to store credentials in your code.
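A sketch of the Service Principal half of this, creating the principal up front with no default role assignment so permissions can be granted explicitly (names and values are placeholders):
```bash
# Create the Service Principal ahead of time with no default role assignment...
az ad sp create-for-rbac --name myAKSClusterSP --skip-assignment

# ...then pass it in at cluster creation time with only the rights it was granted.
az aks create --resource-group myResourceGroup --name myAKSCluster \
  --service-principal <appId> --client-secret <password>
```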
## Inbound/Outbound Traffic Control
Let's put the Kubernetes API Server aside for a second and focus on the Application Workloads. Getting access to Application Workloads from the Internet or Intranet is done via Ingress Controllers. The best way to think of an Ingress Controller is as a Reverse Proxy that can help control traffic coming into the AKS Cluster. We could expose each service using an Internal Load Balancer or via a Public IP Address, but that would become hard to manage, especially if we want to have a single DNS Endpoint. Ingress Controllers help us to funnel all internal or external traffic through a few key exposed endpoints.
Keeping the above in mind, those Public or Internal endpoints can be protected in the same way that endpoints are protected today: by putting a Web Application Firewall (WAF) in front of the endpoint and only allowing traffic to the Ingress Controller from the WAF. The WAF can then take care of the OWASP vulnerabilities we have become accustomed to it handling for us.
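Locking the Ingress Controller's endpoint down to the WAF can be done with an NSG rule along these lines (the CIDR, names and priority are placeholders, and flag spellings may vary by Azure CLI version):
```bash
# Only allow HTTP/HTTPS traffic that originates from the WAF's subnet.
az network nsg rule create --resource-group myResourceGroup --nsg-name myAKSNodeNSG \
  --name AllowWafToIngress --priority 100 --access Allow --protocol Tcp \
  --source-address-prefixes <waf-subnet-cidr> --destination-port-ranges 80 443
```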
The question that usually gets raised is: what about traffic inside of the cluster? The question back is: what vulnerabilities/risks are we trying to mitigate? If we trust services that have made it through the front door, do we need to do anything more? This is uncomfortable for a lot of Organizations as they are used to all service-to-service communication going through a WAF that monitors traffic. With Distributed Systems it does not make sense to have each request go back out to the WAF and come back in, as the WAF is not aware of where Services are located. This is also one of the main reasons we use Ingress Controllers, as they are Service Discovery aware.
So how do we deal with this trust problem? Typically the risk can be mitigated with the use of mutual TLS (mTLS) between services so that services can only interact with other trusted services. This is where the concept of Service Meshes comes in. The details of Service Meshes are a topic for another time.
## RBAC Setup and Default Bindings
One could argue that how RBAC is set up in an AKS Cluster is the single most important security control you can apply. RBAC controls what can be done via the API Server as well as inside of the AKS Cluster. The recommended way to implement RBAC within an AKS Cluster is by connecting it to Azure Active Directory (Azure AD). This allows an Organization to secure an AKS Cluster in the same manner as it secures access to an Azure Subscription.
At a minimum, Organizations should grant one Group in Azure AD Cluster Admin privileges and another Group in Azure AD read-only privileges. This ensures that the right folks in the Organization have admin access to the cluster while another group has read privileges to see what is going on. Outside of that, Organizations should look to set up specific RBAC per Namespace in Kubernetes (K8s). Namespaces provide a great isolation boundary, ensuring that one service does not impact another service or become a noisy neighbour.
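That minimum setup maps to two ClusterRoleBindings, roughly as follows (the group object IDs are placeholders for Azure AD groups):
```bash
# Admins get full control of the cluster...
kubectl create clusterrolebinding aad-cluster-admins \
  --clusterrole=cluster-admin --group=<aad-admin-group-object-id>

# ...while a second group gets read-only visibility.
kubectl create clusterrolebinding aad-cluster-readers \
  --clusterrole=view --group=<aad-reader-group-object-id>
```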
## Dedicated Nodes and Hyper-V Isolation Nodes
???

View file

@ -0,0 +1,4 @@
apiVersion: v1
kind: Namespace
metadata:
  name: dev

View file

@ -0,0 +1,25 @@
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
  namespace: dev
spec:
  limits:
  - default:
      cpu: 0.5
      memory: 512Mi
    defaultRequest:
      cpu: 0.25
      memory: 256Mi
    max:
      cpu: 1
      memory: 1Gi
    min:
      cpu: 200m
      memory: 256Mi
    type: Container
  - max:
      storage: 2Gi
    min:
      storage: 1Gi
    type: PersistentVolumeClaim

View file

@ -0,0 +1,13 @@
apiVersion: v1
kind: ResourceQuota
metadata:
  name: mem-cpu-demo
  namespace: dev
spec:
  hard:
    requests.cpu: "1"
    requests.memory: 1Gi
    limits.cpu: "2"
    limits.memory: 2Gi
    persistentvolumeclaims: "5"
    requests.storage: "10Gi"