# kube-etcd-controller

Project status: pre-alpha

Managed etcd clusters on Kubernetes:

- creation
- destruction
- resize
- recovery
- backup
- cluster migration
  - migrate an existing, non-managed etcd cluster so that the controller manages it
- rolling upgrade

## Requirements

- Kubernetes 1.4+
- etcd 3.0+

## Limitations

- Backup only works for data in etcd3 storage, not etcd2 storage.
- Migration only supports a single-member cluster with all nodes running in the same Kubernetes cluster.

## Deploy kube-etcd-controller

```bash
$ kubectl create -f example/etcd-controller.yaml
pod "kube-etcd-controller" created
```

kube-etcd-controller automatically creates an "EtcdCluster" third-party resource (TPR) and an "etcd-controller-backup" storage class:

```bash
$ kubectl get thirdpartyresources
NAME                      DESCRIPTION             VERSION(S)
etcd-cluster.coreos.com   Managed etcd clusters   v1
```
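
Because the controller registers this TPR itself, a script that creates a cluster immediately after deploying the controller may run before the `EtcdCluster` kind is available. Below is a minimal sketch of a hypothetical `wait_for_tpr` helper (not part of kube-etcd-controller). It polls `kubectl get thirdpartyresources` until the `etcd-cluster.coreos.com` entry shown above appears. The default attempt count and delay are arbitrary assumptions.

```bash
# Hypothetical helper (not shipped with kube-etcd-controller): poll until the
# EtcdCluster TPR created by the controller is listed.
# Usage: wait_for_tpr [attempts] [delay_seconds]
wait_for_tpr() {
  local attempts="${1:-30}" delay="${2:-2}" i=1
  while [ "$i" -le "$attempts" ]; do
    # Match the TPR name printed in the NAME column of the listing above.
    if kubectl get thirdpartyresources 2>/dev/null | grep '^etcd-cluster\.coreos\.com' >/dev/null; then
      return 0
    fi
    # Do not sleep after the final attempt.
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "EtcdCluster TPR not registered after $attempts attempts" >&2
  return 1
}
```

For example: `kubectl create -f example/etcd-controller.yaml && wait_for_tpr && kubectl create -f example/example-etcd-cluster.yaml`.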

## Create an etcd cluster

```bash
$ kubectl create -f example/example-etcd-cluster.yaml
```

```bash
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0000                1/1       Running   0          23s
etcd-cluster-0001                1/1       Running   0          16s
etcd-cluster-0002                1/1       Running   0          8s
etcd-cluster-backup-tool-rhygq   1/1       Running   0          18s
```

```bash
$ kubectl get services
NAME                       CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
etcd-cluster-0000          10.0.84.34     <none>        2380/TCP,2379/TCP   37s
etcd-cluster-0001          10.0.51.78     <none>        2380/TCP,2379/TCP   30s
etcd-cluster-0002          10.0.140.141   <none>        2380/TCP,2379/TCP   22s
etcd-cluster-backup-tool   10.0.59.243    <none>        19999/TCP           32s
```

```bash
$ kubectl logs etcd-cluster-0000
...
2016-08-05 00:33:32.453768 I | api: enabled capabilities for version 3.0
2016-08-05 00:33:32.454178 N | etcdmain: serving insecure client requests on 0.0.0.0:2379, this is strongly discouraged!
```
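
Only the `etcd-cluster-NNNN` pods are etcd members. `etcd-cluster-backup-tool-*` runs the backup tool. The hypothetical helper pair below counts Running members in `kubectl get pods` output and waits until an expected size is reached. It is a sketch that assumes member pods are always named `<cluster-name>-<digits>`, as above, and that the cluster is called `etcd-cluster`.

```bash
# Hypothetical helper: count member pods of an etcd cluster whose STATUS is
# Running. Reads `kubectl get pods` output on stdin. Member pods are assumed to
# be named <cluster>-<digits>, so the backup-tool pod is not counted.
count_running_members() {
  awk -v name="${1:-etcd-cluster}" '
    $1 ~ ("^" name "-[0-9]+$") && $3 == "Running" { n++ }
    END { print n + 0 }'
}

# Hypothetical helper: poll until exactly <size> members are Running.
# Usage: wait_for_members <size> [cluster] [attempts] [delay_seconds]
wait_for_members() {
  local size="$1" cluster="${2:-etcd-cluster}" attempts="${3:-60}" delay="${4:-5}" i=1 n=0
  while [ "$i" -le "$attempts" ]; do
    n=$(kubectl get pods 2>/dev/null | count_running_members "$cluster")
    if [ "$n" -eq "$size" ]; then
      echo "$n/$size members running"
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  echo "only $n/$size members running after $attempts attempts" >&2
  return 1
}
```

`wait_for_members 3` waits for the example cluster. Because it waits for an exact count, it can also follow a resize in either direction. Pods that are still `Terminating` after a scale-down are not counted as Running.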

## Resize an etcd cluster

`kubectl apply` doesn't work for TPRs at the moment; see [kubernetes/#29542](https://github.com/kubernetes/kubernetes/issues/29542).
As a workaround, this example uses cURL to update the cluster through the API server.

Use kubectl to start a proxy to the API server:

```
$ kubectl proxy --port=8080
Starting to serve on 127.0.0.1:8080
```

The API server is now reachable at "http://127.0.0.1:8080".
Create a JSON file describing the desired cluster:

```
$ cat body.json
{
  "apiVersion": "coreos.com/v1",
  "kind": "EtcdCluster",
  "metadata": {
    "name": "etcd-cluster",
    "namespace": "default"
  },
  "spec": {
    "backup": {
      "maxSnapshot": 5,
      "snapshotIntervalInSecond": 30,
      "volumeSizeInMB": 512
    },
    "size": 5
  }
}
```

In another terminal, run the following command to change the cluster size from 3 to 5:

```
$ curl -H 'Content-Type: application/json' -X PUT --data @body.json http://127.0.0.1:8080/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster
{"apiVersion":"coreos.com/v1","kind":"EtcdCluster","metadata":{"name":"etcd-cluster","namespace":"default","selfLink":"/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster","uid":"4773679d-86cf-11e6-9086-42010af00002","resourceVersion":"438492","creationTimestamp":"2016-09-30T05:32:29Z"},"spec":{"backup":{"maxSnapshot":5,"snapshotIntervalInSecond":30,"volumeSizeInMB":512},"size":5}}
```

You should see the two new members being created:

```
$ kubectl get pods
NAME                             READY     STATUS              RESTARTS   AGE
etcd-cluster-0000                1/1       Running             0          3m
etcd-cluster-0001                1/1       Running             0          2m
etcd-cluster-0002                1/1       Running             0          2m
etcd-cluster-0003                1/1       Running             0          9s
etcd-cluster-0004                0/1       ContainerCreating   0          1s
etcd-cluster-backup-tool-e9gkv   1/1       Running             0          2m
```

Now decrease the cluster size from 5 back to 3:

```
$ curl -H 'Content-Type: application/json' -X PUT http://127.0.0.1:8080/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster -d '{"apiVersion":"coreos.com/v1", "kind": "EtcdCluster", "metadata": {"name": "etcd-cluster", "namespace": "default"}, "spec": {"size": 3}}'
{"apiVersion":"coreos.com/v1","kind":"EtcdCluster","metadata":{"name":"etcd-cluster","namespace":"default","selfLink":"/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster","uid":"e5828789-6b01-11e6-a730-42010af00002","resourceVersion":"32179","creationTimestamp":"2016-08-25T20:24:17Z"},"spec":{"size":3}}
```

Two members are removed, leaving three:

```
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0002                1/1       Running   0          3m
etcd-cluster-0003                1/1       Running   0          1m
etcd-cluster-0004                1/1       Running   0          1m
etcd-cluster-backup-tool-e9gkv   1/1       Running   0          3m
```
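
The two cURL calls above can be wrapped in one small function. This is a hypothetical helper, sketched under the assumptions used in this section: `kubectl proxy` listens on 127.0.0.1:8080, which a made-up `API_URL` variable can override, and the cluster is `etcd-cluster` in the `default` namespace. A PUT replaces the stored object. For that reason the sketch resends the backup settings from `body.json` together with the new size, rather than sending `size` alone.

```bash
# Hypothetical helper: PUT a complete EtcdCluster object with a new size
# through the local `kubectl proxy`. API_URL is an assumed override.
# Usage: resize_etcd_cluster <size>
resize_etcd_cluster() {
  local size="$1" api="${API_URL:-http://127.0.0.1:8080}"
  # Reject empty, non-numeric, zero, or zero-padded sizes (invalid JSON numbers).
  case "$size" in
    ''|*[!0-9]*|0*) echo "size must be a positive integer" >&2; return 2 ;;
  esac
  # The body mirrors body.json; only "size" changes. --data @- reads it from stdin.
  curl -sS -H 'Content-Type: application/json' -X PUT --data @- \
    "$api/apis/coreos.com/v1/namespaces/default/etcdclusters/etcd-cluster" <<EOF
{
  "apiVersion": "coreos.com/v1",
  "kind": "EtcdCluster",
  "metadata": {"name": "etcd-cluster", "namespace": "default"},
  "spec": {
    "backup": {"maxSnapshot": 5, "snapshotIntervalInSecond": 30, "volumeSizeInMB": 512},
    "size": $size
  }
}
EOF
}
```

With the proxy running, `resize_etcd_cluster 5` corresponds to the scale-up above, and `resize_etcd_cluster 3` corresponds to the scale-down.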

## Destroy an existing etcd cluster

```bash
$ kubectl delete -f example/example-etcd-cluster.yaml
```
```bash
$ kubectl get pods
NAME READY STATUS RESTARTS AGE
```

## Try member recovery

If a minority of etcd members crash, the etcd controller automatically recovers from the failure.
Let's walk through it step by step.

Repeat the "create" steps to get an initial 3-member cluster.

Simulate a member failure by deleting a pod:

```bash
$ kubectl delete pod etcd-cluster-0000
```

The etcd controller recovers from the failure by creating a new pod, `etcd-cluster-0003`:

```bash
$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
etcd-cluster-0001   1/1       Running   0          5s
etcd-cluster-0002   1/1       Running   0          5s
etcd-cluster-0003   1/1       Running   0          5s
```

## Try controller recovery

If the etcd controller restarts, it can recover its previous state.
Continuing from above, you can simulate a controller crash and a member crash:

```bash
$ kubectl delete -f example/etcd-controller.yaml
pod "kube-etcd-controller" deleted
$ kubectl delete pod etcd-cluster-0001
pod "etcd-cluster-0001" deleted
```

Then restart the etcd controller. It should recover its own state automatically, and it also recovers the etcd cluster:

```bash
$ kubectl create -f example/etcd-controller.yaml
pod "kube-etcd-controller" created
$ kubectl get pods
NAME                READY     STATUS    RESTARTS   AGE
etcd-cluster-0002   1/1       Running   0          4m
etcd-cluster-0003   1/1       Running   0          4m
etcd-cluster-0004   1/1       Running   0          6s
```

## Try disaster recovery

If a majority of etcd members crash and at least one backup exists for the cluster, the etcd controller can restore the entire cluster from the backup.

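The line between member recovery and disaster recovery comes from etcd's quorum rule. etcd uses Raft, which needs a strict majority of members, floor(N/2) + 1, to make progress. An N-member cluster therefore tolerates floor((N - 1)/2) failed members. The illustrative function below is a hypothetical helper, not part of the controller, and does only that arithmetic:

```bash
# Illustrative only: quorum size and tolerated failures for an N-member
# etcd cluster (Raft needs a strict majority of members to agree).
etcd_quorum() {
  local n="$1"
  echo "members=$n quorum=$((n / 2 + 1)) tolerated_failures=$(( (n - 1) / 2 ))"
}
```

`etcd_quorum 3` prints `members=3 quorum=2 tolerated_failures=1`. The single-pod failure in "Try member recovery" leaves a working quorum. Deleting two of the three pods, as this section does, does not.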
By default, the etcd controller creates a storage class on initialization:
```
$ kubectl get storageclass
NAME                     TYPE
etcd-controller-backup   kubernetes.io/gce-pd
```

The storage class is used to request the persistent volume that stores the backup data. (We are planning to support AWS EBS soon.)
Continuing from the last example, a persistent volume has been claimed for the backup pod:
```
$ kubectl get pvc
NAME               STATUS    VOLUME                                     CAPACITY   ACCESSMODES   AGE
pvc-etcd-cluster   Bound     pvc-164d18fe-8797-11e6-a8b4-42010af00002   1Gi        RWO           14m
```
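
Disaster recovery needs a backup on this volume, so before simulating the disaster it may be worth confirming that the claim is `Bound`. This is a hypothetical check that assumes the claim is named `pvc-etcd-cluster`, as listed above. It only reads the claim's phase via `kubectl get pvc ... -o jsonpath`. A bound claim does not by itself prove that a snapshot has been written.

```bash
# Hypothetical check: succeed only if the backup claim's phase is "Bound".
# Usage: backup_pvc_bound [claim-name]
backup_pvc_bound() {
  local claim="${1:-pvc-etcd-cluster}" phase=""
  # An error from kubectl (e.g. claim not found) leaves phase empty.
  phase=$(kubectl get pvc "$claim" -o jsonpath='{.status.phase}' 2>/dev/null) || phase=""
  if [ "$phase" = "Bound" ]; then
    return 0
  fi
  echo "claim $claim is not Bound (phase: ${phase:-unknown})" >&2
  return 1
}
```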
Let's try to write some data into etcd:
```
$ kubectl run --rm -i --tty fun --image quay.io/coreos/etcd --restart=Never -- /bin/sh
/ # ETCDCTL_API=3 etcdctl --endpoints http://etcd-cluster-0002:2379 put foo bar
OK
(ctrl-D to exit)
```
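
The interactive session above can also run as a one-off command, which is easier to script. Below is a minimal sketch of a hypothetical `etcdctl_run` wrapper. It assumes the `quay.io/coreos/etcd` image has `etcdctl` on its `PATH`, as the session above suggests. It sets `ETCDCTL_API=3` through `kubectl run --env` instead of inside the shell.

```bash
# Hypothetical wrapper: run a single etcdctl (v3 API) command in a throwaway pod.
# Usage: etcdctl_run <etcdctl args...>
etcdctl_run() {
  # $$ keeps the pod name unique per calling shell; --rm deletes the pod afterwards.
  kubectl run "etcdctl-$$" --rm -i --restart=Never \
    --image quay.io/coreos/etcd --env=ETCDCTL_API=3 \
    -- etcdctl "$@"
}
```

`etcdctl_run --endpoints http://etcd-cluster-0002:2379 put foo bar` mirrors the session above. After the recovery that follows, a call such as `etcdctl_run --endpoints http://etcd-cluster-0005:2379 get foo` could show whether `foo` was restored. That call assumes the new member gets a service named like the others. The key will only be present if a backup was taken after it was written.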

Now let's kill two pods to simulate a disaster:

```
$ kubectl delete pod etcd-cluster-0002 etcd-cluster-0003
pod "etcd-cluster-0002" deleted
pod "etcd-cluster-0003" deleted
```

Now quorum is lost. The etcd controller will start to recover the cluster by:

- creating a new seed member that recovers from the backup
- adding enough members to the seed cluster

```
$ kubectl get pods
NAME                             READY     STATUS     RESTARTS   AGE
etcd-cluster-0005                0/1       Init:0/2   0          11s
etcd-cluster-backup-tool-e9gkv   1/1       Running    0          18m
...
$ kubectl get pods
NAME                             READY     STATUS    RESTARTS   AGE
etcd-cluster-0005                1/1       Running   0          3m
etcd-cluster-0006                1/1       Running   0          3m
etcd-cluster-0007                1/1       Running   0          3m
etcd-cluster-backup-tool-e9gkv   1/1       Running   0          22m
```

Note that there might be a race: if the controller checks the cluster before the second pod has been deleted, it sees only a minority failure and falls back to member recovery.