docs: upgrade + cluster-autoscaler notes (#381)

2019-01-31 10:54:29 -08:00 · 2019-01-31 10:54:29 -08:00 · 1a58ba3202
--- a/docs/topics/upgrade.md
+++ b/docs/topics/upgrade.md
@ -76,6 +76,16 @@ For example,
  --client-secret xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
 ```

-### What can go wrong
+## Known Limitations

-By its nature, the upgrade operation is long running and potentially could fail for various reasons, such as temporary lack of resources, etc. In this case, rerun the command. The `upgrade` command is idempotent, and will pick up execution from the point it failed on.
+### Manual reconciliation
+
+The upgrade operation is long running, and for large clusters, more susceptible to single operational failures. This is based on the design principle of upgrade enumerating, one-at-a-time, through each node in the cluster. A transient Azure resource allocation error could thus interrupt the successful progression of the overall transaction. At present, the upgrade operation is implemented to "fail fast"; and so, if a well formed upgrade operation fails before completing, it can be manually retried by invoking the exact same command line arguments as were sent originally. The upgrade operation will enumerate through the cluster nodes, skipping any nodes that have already been upgraded to the desired Kubernetes version. Those nodes that match the *original* Kubernetes version will then, one-at-a-time, be cordon and drained, and upgraded to the desired version. Put another way, an upgrade command is designed to be idempotent across retry scenarios.
+
+### Cluster-autoscaler + VMSS
+
+There are known limitations with VMSS cluster-autoscaler scenarios and upgrade. Our current guidance is not to use `aks-engine upgrade` on clusters with `cluster-autoscaler` functionality. See [here](https://github.com/Azure/aks-engine/issues/400) to get more information and to track progress of the issues related to these limitations.
+
+### Cluster-autoscaler + Availability Set
+
+We don't recommend using `aks-engine upgrade` on clusters that have Availability Set (non-VMSS) agent pools `cluster-autoscaler` at this time.
--- a/examples/addons/cluster-autoscaler/README.md
+++ b/examples/addons/cluster-autoscaler/README.md
@ -7,9 +7,13 @@

 This is the Kubernetes Cluster Autoscaler add-on for Virtual Machine Scale Sets. Add this add-on to your json file as shown below to automatically enable cluster autoscaler in your new Kubernetes cluster.

+See [this document](../../../docs/topic/upgrade.md) for details on known limitiations with `aks-engine upgrade` and VMSS.
+
 To use this add-on, make sure your cluster's Kubernetes version is 1.10 or above and your agent pool `availabilityProfile` is set to `VirtualMachineScaleSets`. By default, the first agent pool will autoscale the node count between 1 and 5. You can override these settings in `config` section of the `cluster-autoscaler` add-on.

-> At this time, only the primaryScaleSet (the first agent pool) is monitored by the autoscaler. To configure autoscale to monitor (and scale) other node pools, you must manually edit the autoscaler YAML at `/etc/kubernetes/addons` on each master node. See the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md) docs for guidance.
+> At this time, only the primaryScaleSet (the first agent pool) is monitored by the autoscaler. If you use this addon on your single agent pool cluster and later manually add more VMSS agent pools and want to use this cluster-autoscaler addon to monitor (and scale) other node pools, you must manually edit the autoscaler YAML at `/etc/kubernetes/addons` on each master node.
+
+> If you are creating a cluster with multiple VMSS agent pools, or you plan to manually evolve your agent pool count over time, we recommend you don't use this addon. See the [cluster-autoscaler](https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/cloudprovider/azure/README.md) docs for guidance on how to craft a cluster-autoscaler specification that can accommodate a multiple VMSS agent pool cluster.

 The following is an example: