> Most examples use the `sumo-prod` namespace. SUMO dev/stage/prod run in the `sumo-dev`/`sumo-stage`/`sumo-prod` namespaces respectively.
### General
Most examples use the `kubectl get ...` subcommand. If you'd prefer more readable output, you can substitute `describe` for `get`:
```
kubectl -n sumo-prod describe pod sumo-prod-web-76b74db69-dvxbh
```
> Listing resources is easier with the `get` subcommand.
To see all SUMO pods currently running:
```
kubectl -n sumo-prod get pods
```
To see all pods running and the K8s nodes they are assigned to:
```
kubectl -n sumo-prod get pods -o wide
```
To show yaml for a single pod:
```
kubectl -n sumo-prod get pod sumo-prod-web-76b74db69-dvxbh -o yaml
```
To show all deployments:
```
kubectl -n sumo-prod get deployments
NAME               DESIRED   CURRENT   UP-TO-DATE   AVAILABLE   AGE
sumo-prod-celery   3         3         3            3           330d
sumo-prod-cron     0         0         0            0           330d
sumo-prod-web      50        50        50           50          331d
```
To show yaml for a single deployment:
```
kubectl -n sumo-prod get deployment sumo-prod-web -o yaml
```
Secret values are base64 encoded when viewed in K8s output. Once a secret is set up as an environment variable or mounted file in a pod, its values are base64 decoded automatically.
Kitsune uses secrets specified as environment variables in a deployment spec. To show the yaml for the SUMO prod secret (values are base64 encoded):
```
kubectl -n sumo-prod get secret sumo-secrets-prod -o yaml
```
To view a secret with decoded values (aka "human readable"):
> This example uses the [ksv](https://github.com/metadave/ksv) utility
```
kubectl -n sumo-prod get secret sumo-secrets-prod -o yaml | ksv
```
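If `ksv` isn't installed, `kubectl`'s built-in JSONPath output can extract a single key from the secret, which can then be piped through `base64 --decode`. This is a minimal sketch: the helper name `show_secret_key` and the key `SOME_KEY` are hypothetical placeholders, so substitute a key that actually exists in the secret.

```shell
# Hypothetical helper: print one decoded value from a K8s secret.
# Usage: show_secret_key <namespace> <secret-name> <key>
show_secret_key() {
  local namespace="$1" secret="$2" key="$3"
  # Secret data is stored base64 encoded under .data.<key>.
  kubectl -n "$namespace" get secret "$secret" -o jsonpath="{.data.${key}}" \
    | base64 --decode
  # Values usually lack a trailing newline; print one so the prompt
  # starts on a fresh line.
  echo
}

# Example (SOME_KEY is a placeholder key name):
# show_secret_key sumo-prod sumo-secrets-prod SOME_KEY
```

Unlike piping the whole secret through `ksv`, this prints only the one value you ask for, which keeps other secrets out of your terminal scrollback.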
To encode a secret value:
```
echo -n "somevalue" | base64
```
> The `-n` flag stops `echo` from appending a trailing newline, which would otherwise be encoded as part of the value.
> Encoded values must not contain newlines. On Linux, `base64` wraps its output every 76 characters by default; pass `-w 0` to disable wrapping. The `base64` command on macOS Sierra appears to output encoded values without line breaks.
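The encoding notes above can be checked with a quick round trip. This sketch uses `printf '%s'` rather than `echo -n` (POSIX leaves `echo -n` implementation-defined, so it isn't portable across every shell) and strips any line wrapping with `tr`, so it should behave the same with the Linux and macOS `base64` commands. `somevalue` is a placeholder.

```shell
# Encode a value for a K8s secret manifest.
# printf '%s' never appends a newline, and `tr -d '\n'` removes any line
# wrapping that base64 adds to long values.
value="somevalue"
encoded=$(printf '%s' "$value" | base64 | tr -d '\n')
echo "$encoded"    # c29tZXZhbHVl

# Decode it again to confirm the round trip before pasting it into yaml.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"    # somevalue

# For comparison, a stray trailing newline changes the encoded value:
printf '%s\n' "$value" | base64    # c29tZXZhbHVlCg==
```

The trailing `Cg==` in the last line is the encoded newline; seeing it in a secret is a sign the value was encoded with a plain `echo`.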
Our hosted Elasticsearch cluster is in the `us-west-2` region of AWS. Elastic.co hosting status is available on the [Elastic Cloud status page](https://cloud-status.elastic.co/).
6. from the `Actions` menu (close to the top of the page), click `Edit`
7. the `Details` tab for the ASG should appear; set the appropriate `Min`, `Desired`, and `Max` values.
   1. it's probably good to set `Min` and `Desired` to the same value so the cluster autoscaler doesn't scale the cluster back down below the size you just set.
8. click `Save`
9. if you click on `Instances` from the navigation on the left side of the page, you can see the new instances that are starting/stopping.
10. you can see when the nodes join the K8s cluster with the following command:
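`kubectl get nodes --watch` streams node status changes as new instances register with the cluster. For a scale-up, it can also help to poll until a target number of nodes reports `Ready`. This is a minimal sketch assuming `kubectl` is already pointed at the cluster being scaled; the helper names `count_ready_nodes` and `wait_for_ready_nodes` are hypothetical.

```shell
# Hypothetical helpers for watching an ASG scale-up from the K8s side.
# Assumes kubectl is already configured for the cluster being scaled.

# Print how many nodes currently report a STATUS of exactly "Ready".
# Cordoned nodes show "Ready,SchedulingDisabled" and are not counted.
count_ready_nodes() {
  kubectl get nodes --no-headers \
    | awk '$2 == "Ready" { n++ } END { print n + 0 }'
}

# Poll every 15 seconds until at least $1 nodes are Ready.
wait_for_ready_nodes() {
  local want="$1" ready
  while true; do
    ready=$(count_ready_nodes)
    echo "$(date +%H:%M:%S) ready nodes: ${ready}/${want}"
    [ "$ready" -ge "$want" ] && return 0
    sleep 15
  done
}

# Example (the target count is illustrative):
# wait_for_ready_nodes 10
```

Counting with `awk` (rather than `wc -l`) avoids the leading whitespace macOS `wc` adds, which some shells reject in numeric comparisons.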
There are limits that apply to using VPC ACLs documented [here](http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/VPC_Appendix_Limits.html#vpc-limits-nacls).
> Note: Route 53 provides automated cluster failover. These docs cover what to consider if there is a catastrophic failure in Oregon and Frankfurt must be promoted to primary, rather than serving as a read-only failover.
- `eu-central-1` (Frankfurt) has a read-replica of the SUMO production database
  - the replica is currently a `db.m4.xlarge`, while the prod DB is a `db.m4.4xlarge`
  - this may be OK in maintenance mode, but if you are going to enable write traffic, the instance type must be scaled up.
  - SREs performed a manual instance type change on the Frankfurt read-replica, and it took ~10 minutes to change from a `db.t2.medium` to a `db.m4.xlarge`.
- although we have alerting in place to notify the SRE team in the event of a replication error, it's a good idea to check the replication status on the RDS details page for the `sumo` MySQL instance.
  - specifically, check the `DB Instance Status`, `Read Replica Source`, `Replication State`, and `Replication Error` values.
- decide if [promoting the read-replica](http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/USER_ReadRepl.html#USER_ReadRepl.Promote) to a standalone primary is appropriate.
  - it's preferable to have a multi-AZ RDS instance, as snapshots can then be taken against the failover instance (RDS does this by default in a multi-AZ setup).
  - if data is written to a promoted instance and failing back to the us-west-2 clusters is desirable, a full DB backup and restore in us-west-2 is required.
  - the replica is automatically rebooted before being promoted to a full instance.
- **ensure image versions are up to date**
  - Most MySQL changes should already be replicated to the read-replica; however, if you're reading this, chances are things are broken. Ensure that the DB schema is correct for the images you're deploying.
  - the [prod deployments yaml](https://github.com/mozilla/kitsune/blob/99c4c2bf5c102f38910485b29fc87c2299daa18b/k8s/regions/oregon/prod.yaml#L24-L48) contains the correct number of replicas, but here are some safe values to use in an emergency: