# kube-etcd-controller

Project status: pre-alpha

Managed etcd clusters on Kubernetes:

- creation
- destroy
- resize
- recovery
- backup
- cluster migration
  - migrate an existing, non-managed etcd cluster so that the controller manages it
- rolling upgrade

## Requirements

- Kubernetes 1.4+
- etcd 3.0+
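
To check these minimums from a script, you can compare version strings. The snippet below is a minimal sketch that assumes GNU `sort -V` is available. `version_ge` and `check_requirements` are hypothetical helpers and are not part of kube-etcd-controller. Getting the actual version strings (for example from `kubectl version` or `etcd --version`) is left to you, because their output format varies between releases.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of kube-etcd-controller):
# version_ge HAVE NEED -> exit status 0 if version HAVE >= version NEED.
# Relies on GNU `sort -V` for dotted-version ordering.
version_ge() {
  local have="${1#v}" need="${2#v}"   # tolerate a leading "v", e.g. v1.4.6
  # If NEED sorts first (or the two are equal), HAVE is at least NEED.
  [ "$(printf '%s\n%s\n' "$need" "$have" | sort -V | head -n 1)" = "$need" ]
}

# Check both documented minimums at once.
# Usage: check_requirements K8S_VERSION ETCD_VERSION
check_requirements() {
  local ok=0
  version_ge "$1" 1.4 || { echo "Kubernetes $1 is older than 1.4" >&2; ok=1; }
  version_ge "$2" 3.0 || { echo "etcd $2 is older than 3.0" >&2; ok=1; }
  return "$ok"
}
```

For example, `check_requirements 1.4.0 3.0.4` succeeds. `check_requirements 1.3.7 3.0.4` prints a warning and returns non-zero.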

## Limitations

- Backup only works for data in etcd3 storage, not etcd2 storage.
- Migration only supports single-member clusters with all nodes running in the same Kubernetes cluster.

## Deploy kube-etcd-controller

```bash
$ kubectl create -f example/etcd-controller.yaml
pod "kube-etcd-controller" created
```

kube-etcd-controller will automatically create an "EtcdCluster" TPR (ThirdPartyResource) and an "etcd-controller-backup" storage class.

```bash
$ kubectl get thirdpartyresources
NAME                      DESCRIPTION             VERSION(S)
etcd-cluster.coreos.com   Managed etcd clusters   v1
```
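
The TPR is registered asynchronously after the controller starts. Running `kubectl create -f example/example-etcd-cluster.yaml` immediately afterwards may therefore fail. Below is a minimal polling sketch that waits until the resource shown above is listed. `wait_for_tpr` and its retry parameters are hypothetical. It only checks that the TPR object exists, and the API endpoint behind it may take a few more seconds to appear.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll until the EtcdCluster TPR is registered.
# Usage: wait_for_tpr [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_tpr() {
  local attempts="${1:-30}" interval="${2:-2}" i
  for ((i = 1; i <= attempts; i++)); do
    # `kubectl get` exits non-zero while the named resource does not exist yet.
    if kubectl get thirdpartyresources etcd-cluster.coreos.com >/dev/null 2>&1; then
      return 0
    fi
    sleep "$interval"
  done
  echo "TPR etcd-cluster.coreos.com not registered after $attempts attempts" >&2
  return 1
}
```

For example: `wait_for_tpr && kubectl create -f example/example-etcd-cluster.yaml`.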

## Create an etcd cluster

```bash
$ kubectl create -f example/example-etcd-cluster.yaml
```

```bash
$ kubectl get pods
NAME                            READY     STATUS    RESTARTS   AGE
etcd-cluster-0000               1/1       Running   0          23s
etcd-cluster-0001               1/1       Running   0          16s
etcd-cluster-0002               1/1       Running   0          8s
etcd-cluster-backup-tool-rhygq  1/1       Running   0          18s
```

```bash
$ kubectl get services
NAME                       CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
etcd-cluster-0000          10.0.84.34     <none>        2380/TCP,2379/TCP   37s
etcd-cluster-0001          10.0.51.78     <none>        2380/TCP,2379/TCP   30s
etcd-cluster-0002          10.0.140.141   <none>        2380/TCP,2379/TCP   22s
etcd-cluster-backup-tool   10.0.59.243    <none>        19999/TCP           32s
```

```bash
$ kubectl logs etcd-cluster-0000
...
2016-08-05 00:33:32.453768 I | api: enabled capabilities for version 3.0
2016-08-05 00:33:32.454178 N | etcdmain: serving insecure client requests on 0.0.0.0:2379, this is strongly discouraged!
```
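
To script the "wait until the cluster is up" step, one option is to count the member pods that report `Running`. Below is a minimal sketch that parses the `kubectl get pods` table shown above. It assumes members are named `<cluster>-<digits>`, as in this example. The backup tool pod (`etcd-cluster-backup-tool-...`) does not match that pattern and is excluded. `count_running_members` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# Hypothetical helper: count Running member pods of an etcd cluster.
# Members are assumed to be named <cluster>-<digits>, e.g. etcd-cluster-0001;
# pods such as etcd-cluster-backup-tool-rhygq do not match and are skipped,
# and so does the NAME header line.
# Usage: count_running_members [CLUSTER_NAME]
count_running_members() {
  local cluster="${1:-etcd-cluster}"
  kubectl get pods 2>/dev/null |
    awk -v prefix="$cluster" '
      # Column 1 is NAME, column 3 is STATUS in the default table output.
      $1 ~ ("^" prefix "-[0-9]+$") && $3 == "Running" { n++ }
      END { print n + 0 }'
}
```

With the output above, `count_running_members` would print `3`.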

## Resize an etcd cluster

`kubectl apply` doesn't work for TPRs at the moment. See [kubernetes/#29542](https://github.com/kubernetes/kubernetes/issues/29542).

In this example, we use cURL to update the cluster as a workaround.

Use kubectl to create a reverse proxy:

```
$ kubectl proxy --port=8080
Starting to serve on 127.0.0.1:8080
```

Now we can talk to the API server via "http://127.0.0.1:8080".
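
Through the proxy, an `EtcdCluster` object is addressed with the URL pattern that the `curl` commands in this section use: `/apis/coreos.com/v1/namespaces/<namespace>/etcdclusters/<name>`. Below is a small sketch that builds that URL and fetches the current object, which is handy for inspecting the spec before you change it. `etcdcluster_url` and `get_etcdcluster` are hypothetical helpers. They assume the proxy from the previous step is listening on port 8080.

```bash
#!/usr/bin/env bash
# Hypothetical helpers; assume `kubectl proxy --port=8080` is running.
# Build the REST URL of an EtcdCluster object served through the TPR.
# Usage: etcdcluster_url [NAMESPACE] [NAME] [API_BASE]
etcdcluster_url() {
  local ns="${1:-default}" name="${2:-etcd-cluster}" base="${3:-http://127.0.0.1:8080}"
  printf '%s/apis/coreos.com/v1/namespaces/%s/etcdclusters/%s\n' "$base" "$ns" "$name"
}

# Print the current object as JSON; --fail makes curl exit non-zero on HTTP errors.
# Usage: get_etcdcluster [NAMESPACE] [NAME] [API_BASE]
get_etcdcluster() {
  curl --silent --show-error --fail "$(etcdcluster_url "$@")"
}
```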

Create a JSON file that describes the desired cluster, with `spec.size` set to 5:

```
$ cat body.json
{
  "apiVersion": "coreos.com/v1",
  "kind": "EtcdCluster",
  "metadata": {
    "name": "etcd-cluster",
    "namespace": "default"
  },
  "spec": {
    "backup": {
      "maxSnapshot": 5,
      "snapshotIntervalInSecond": 30,
      "volumeSizeInMB": 512
    },
    "size": 5
  }
}
```
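
Because this workaround uses PUT, the request body replaces the stored object. It should therefore carry every field you want to keep (here, the `backup` section) alongside the new `size`. Instead of editing `body.json` by hand, you could generate it. Below is a sketch: `make_body` is a hypothetical helper, and its backup values are simply copied from the file above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print an EtcdCluster manifest with the requested size.
# The backup settings mirror body.json above; adjust them to your own cluster,
# since a PUT replaces the stored object with exactly this body.
# Usage: make_body SIZE [NAME] [NAMESPACE] > body.json
make_body() {
  local size="$1" name="${2:-etcd-cluster}" ns="${3:-default}"
  # Reject anything that is not a positive integer before building JSON.
  if ! [[ "$size" =~ ^[1-9][0-9]*$ ]]; then
    echo "make_body: SIZE must be a positive integer" >&2
    return 1
  fi
  cat <<EOF
{
  "apiVersion": "coreos.com/v1",
  "kind": "EtcdCluster",
  "metadata": {
    "name": "$name",
    "namespace": "$ns"
  },
  "spec": {
    "backup": {
      "maxSnapshot": 5,
      "snapshotIntervalInSecond": 30,
      "volumeSizeInMB": 512
    },
    "size": $size
  }
}
EOF
}
```

`make_body 5 > body.json` produces the same JSON as the file above.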

In another terminal, use the following command to change the cluster size from 3 to 5:

```
$ curl -H 'Content-Type: application/json' -X PUT --data @body.json http://127.0.0.1:8080/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster
{"apiVersion":"coreos.com/v1","kind":"EtcdCluster","metadata":{"name":"etcd-cluster","namespace":"default","selfLink":"/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster","uid":"4773679d-86cf-11e6-9086-42010af00002","resourceVersion":"438492","creationTimestamp":"2016-09-30T05:32:29Z"},"spec":{"backup":{"maxSnapshot":5,"snapshotIntervalInSecond":30,"volumeSizeInMB":512},"size":5}}
```

We should see:

```
$ kubectl get pods
NAME                             READY     STATUS              RESTARTS   AGE
etcd-cluster-0000                1/1       Running             0          3m
etcd-cluster-0001                1/1       Running             0          2m
etcd-cluster-0002                1/1       Running             0          2m
etcd-cluster-0003                1/1       Running             0          9s
etcd-cluster-0004                0/1       ContainerCreating   0          1s
etcd-cluster-backup-tool-e9gkv   1/1       Running             0          2m
```
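
The cluster grows one member at a time (note that `etcd-cluster-0004` is still in `ContainerCreating`), so a script should wait instead of assuming the new size is reached immediately. Below is a polling sketch built on the same pod-naming assumption as before (`<cluster>-<digits>`). `wait_for_members` is a hypothetical helper. It only checks pod status, not whether each member has actually joined the etcd cluster.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until exactly N member pods are Running.
# Assumes members are named <cluster>-<digits>; this checks pod status only,
# not whether each member has joined the etcd cluster.
# Usage: wait_for_members N [CLUSTER_NAME] [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_members() {
  local want="$1" cluster="${2:-etcd-cluster}"
  local attempts="${3:-60}" interval="${4:-5}" i have
  for ((i = 1; i <= attempts; i++)); do
    have="$(kubectl get pods 2>/dev/null |
      awk -v prefix="$cluster" '$1 ~ ("^" prefix "-[0-9]+$") && $3 == "Running" { n++ }
                                END { print n + 0 }')"
    echo "attempt $i: $have/$want members running" >&2
    if [ "$have" -eq "$want" ]; then
      return 0
    fi
    sleep "$interval"
  done
  return 1
}
```

For example, `wait_for_members 5` after the PUT above, or `wait_for_members 3` after scaling back down.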

Now we can decrease the size of the cluster from 5 back to 3.

```
$ curl -H 'Content-Type: application/json' -X PUT http://127.0.0.1:8080/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster -d '{"apiVersion":"coreos.com/v1", "kind": "EtcdCluster", "metadata": {"name": "etcd-cluster", "namespace": "default"}, "spec": {"size": 3}}'
{"apiVersion":"coreos.com/v1","kind":"EtcdCluster","metadata":{"name":"etcd-cluster","namespace":"default","selfLink":"/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster","uid":"e5828789-6b01-11e6-a730-42010af00002","resourceVersion":"32179","creationTimestamp":"2016-08-25T20:24:17Z"},"spec":{"size":3}}
```

We should see:

```
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0002                1/1       Running   0          3m
etcd-cluster-0003                1/1       Running   0          1m
etcd-cluster-0004                1/1       Running   0          1m
etcd-cluster-backup-tool-e9gkv   1/1       Running   0          3m
```

## Destroy an existing etcd cluster

```bash
$ kubectl delete -f example/example-etcd-cluster.yaml
```

```bash
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
```

## Try member recovery

If a minority of etcd members crash, the etcd controller will automatically recover from the failure.
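
Whether a failure counts as a "minority" follows from etcd's Raft quorum. A cluster of N members needs floor(N/2)+1 of them to make progress, so it tolerates floor((N-1)/2) failures. For the three-member cluster used here, one member can be lost and replaced this way. Losing two of them, as in the disaster recovery walkthrough below, breaks quorum. Below is a small arithmetic sketch; the function names are illustrative only.

```bash
#!/usr/bin/env bash
# Illustrative helpers for etcd (Raft) quorum arithmetic.
# quorum N: members that must be up for an N-member cluster to make progress.
quorum() { echo $(( $1 / 2 + 1 )); }

# fault_tolerance N: members that can fail while quorum is still kept.
fault_tolerance() { echo $(( ($1 - 1) / 2 )); }

# is_minority_failure N FAILED: exit 0 if FAILED members can be lost without losing quorum.
is_minority_failure() { [ $(( $1 - $2 )) -ge "$(quorum "$1")" ]; }

for n in 1 3 5; do
  printf 'size=%d quorum=%d tolerates=%d\n' "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
# size=1 quorum=1 tolerates=0
# size=3 quorum=2 tolerates=1
# size=5 quorum=3 tolerates=2
```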

Let's walk through the following steps.

Repeat the "create" steps above to get an initial three-member cluster.

Simulate a member failure by deleting a pod:

```bash
$ kubectl delete pod etcd-cluster-0000
```

The etcd controller will recover from the failure by creating a new pod, `etcd-cluster-0003`:

```bash
$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
etcd-cluster-0001   1/1       Running   0          5s
etcd-cluster-0002   1/1       Running   0          5s
etcd-cluster-0003   1/1       Running   0          5s
```
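
As the output shows, the replacement member gets a new name (`etcd-cluster-0003`) instead of reusing `etcd-cluster-0000`, so scripts should not hard-code member pod names. Below is a sketch for spotting which members disappeared or appeared between two snapshots. It relies on the same `<cluster>-<digits>` naming assumption as above. `list_members` and `diff_members` are hypothetical helpers.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: spot which member pods appeared or disappeared.
# list_members [CLUSTER_NAME]: sorted member pod names (<cluster>-<digits>).
list_members() {
  local cluster="${1:-etcd-cluster}"
  kubectl get pods 2>/dev/null |
    awk -v prefix="$cluster" '$1 ~ ("^" prefix "-[0-9]+$") { print $1 }' | sort
}

# diff_members BEFORE AFTER: compare two newline-separated, sorted name lists.
diff_members() {
  # comm -3 drops common lines; names only in AFTER are prefixed by a tab.
  comm -3 <(printf '%s\n' "$1") <(printf '%s\n' "$2") |
    awk -F '\t' '$1 != "" { print "removed: " $1 } $2 != "" { print "added:   " $2 }'
}
```

For example: `before="$(list_members)"`, then delete a pod, wait for recovery, and run `diff_members "$before" "$(list_members)"`.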

## Try controller recovery

If the etcd controller restarts, it can recover its previous state.

Continuing from above, you can simulate a controller crash and a member crash:

```bash
$ kubectl delete -f example/etcd-controller.yaml
pod "kube-etcd-controller" deleted
$ kubectl delete pod etcd-cluster-0001
pod "etcd-cluster-0001" deleted
```

Then restart the etcd controller. It should automatically recover itself. It also recovers the etcd cluster!

```bash
$ kubectl create -f example/etcd-controller.yaml
pod "kube-etcd-controller" created
$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
etcd-cluster-0002   1/1       Running   0          4m
etcd-cluster-0003   1/1       Running   0          4m
etcd-cluster-0004   1/1       Running   0          6s
```
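
When scripting this, it helps to wait for the controller pod itself to reach `Running` before you inspect the members. Below is a sketch that reads the pod phase with `kubectl get pod -o jsonpath`. `wait_for_pod_phase` is a hypothetical helper. A `Running` phase only means the container has started, not that recovery has finished.

```bash
#!/usr/bin/env bash
# Hypothetical helper: wait until a pod reports the given phase.
# Reads .status.phase through kubectl's jsonpath output.
# Usage: wait_for_pod_phase POD [PHASE] [ATTEMPTS] [INTERVAL_SECONDS]
wait_for_pod_phase() {
  local pod="$1" want="${2:-Running}" attempts="${3:-30}" interval="${4:-2}" i phase
  for ((i = 1; i <= attempts; i++)); do
    phase="$(kubectl get pod "$pod" -o jsonpath='{.status.phase}' 2>/dev/null)"
    if [ "$phase" = "$want" ]; then
      return 0
    fi
    sleep "$interval"
  done
  echo "pod $pod not $want after $attempts attempts (last phase: ${phase:-unknown})" >&2
  return 1
}
```

For example: `kubectl create -f example/etcd-controller.yaml && wait_for_pod_phase kube-etcd-controller && kubectl get pods`.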

## Try disaster recovery

If a majority of etcd members crash and at least one backup exists for the cluster, the etcd controller can restore the entire cluster from the backup.

By default, the etcd controller creates a storage class on initialization:

```
$ kubectl get storageclass
NAME                     TYPE
etcd-controller-backup   kubernetes.io/gce-pd
```

This storage class is used to request the persistent volume that stores the backup data. (We are planning to support AWS EBS soon.)

Continuing from the last example, a persistent volume is claimed for the backup pod:

```
$ kubectl get pvc
NAME               STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc-etcd-cluster   Bound     pvc-164d18fe-8797-11e6-a8b4-42010af00002   1Gi        RWO           14m
```
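
Disaster recovery depends on this claim, so it is worth checking before you rely on it. Below is a sketch that reads the claim's phase. It assumes the `pvc-<cluster name>` naming seen above. `backup_pvc_bound` is a hypothetical helper. A bound claim only shows that storage exists and does not by itself prove that a snapshot has been written.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the backup PVC of a cluster is Bound.
# Assumes the claim is named pvc-<cluster>, as in `kubectl get pvc` above.
# Usage: backup_pvc_bound [CLUSTER_NAME]
backup_pvc_bound() {
  local claim="pvc-${1:-etcd-cluster}" phase
  phase="$(kubectl get pvc "$claim" -o jsonpath='{.status.phase}' 2>/dev/null)"
  if [ "$phase" != "Bound" ]; then
    echo "backup claim $claim is not Bound (phase: ${phase:-not found})" >&2
    return 1
  fi
}
```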

Let's try to write some data into etcd:

```
$ kubectl run --rm -i --tty fun --image quay.io/coreos/etcd --restart=Never -- /bin/sh
/ # ETCDCTL_API=3 etcdctl --endpoints http://etcd-cluster-0002:2379 put foo bar
OK
(ctrl-D to exit)
```

Now let's kill two pods to simulate a disaster failure:

```
$ kubectl delete pod etcd-cluster-0002 etcd-cluster-0003
pod "etcd-cluster-0002" deleted
pod "etcd-cluster-0003" deleted
```

Now quorum is lost. The etcd controller will start to recover the cluster by:

- creating a new seed member that recovers from the backup
- adding enough members to the seed cluster

```
$ kubectl get pods
NAME                             READY     STATUS     RESTARTS   AGE
etcd-cluster-0005                0/1       Init:0/2   0          11s
etcd-cluster-backup-tool-e9gkv   1/1       Running    0          18m
...
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0005                1/1       Running   0          3m
etcd-cluster-0006                1/1       Running   0          3m
etcd-cluster-0007                1/1       Running   0          3m
etcd-cluster-backup-tool-e9gkv   1/1       Running   0          22m
```

Note that there might be a race in which the controller falls back to member recovery because the second pod hasn't been deleted yet.
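
Once the new members are `Running`, you can confirm that the data written earlier survived the restore. Below is a non-interactive variant of the earlier `kubectl run` session. `verify_key` is a hypothetical helper, and the pod name `fun-check` is arbitrary. The endpoint must name a member that currently exists (for example `etcd-cluster-0005` in the output above). The parsing assumes `etcdctl get` prints the key on one line and the value on the next.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read KEY through a throwaway etcd pod and compare it.
# Usage: verify_key ENDPOINT KEY EXPECTED
verify_key() {
  local endpoint="$1" key="$2" want="$3" out got
  out="$(kubectl run --rm -i fun-check --image quay.io/coreos/etcd --restart=Never -- \
    /bin/sh -c "ETCDCTL_API=3 etcdctl --endpoints $endpoint get $key")"
  # `etcdctl get` prints the key on one line and its value on the next;
  # take the line following the key and ignore any extra kubectl messages.
  got="$(printf '%s\n' "$out" | awk -v k="$key" 'prev == k { print; exit } { prev = $0 }')"
  if [ "$got" != "$want" ]; then
    echo "expected $key=$want, got '${got}'" >&2
    return 1
  fi
}
```

For example: `verify_key http://etcd-cluster-0005:2379 foo bar`.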