It would be nice to run these alongside the tests. There is only one
vet error left in the x/build tree, an unreachable return next to a
TODO.
Change-Id: Ic31d96cdee5bb0f6b9795237ff3b743a4b76e3cf
Reviewed-on: https://go-review.googlesource.com/36898
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
From golang.org/cl/36692
Change-Id: Icc63edd28b687cf30c222a3125371ad912bc6da7
Reviewed-on: https://go-review.googlesource.com/36802
Reviewed-by: Kevin Burke <kev@inburke.com>
Reviewed-by: Matt Layher <mdlayher@gmail.com>
And moves Dial methods to *kubernetes.Client, instead of a separate
Dialer type.
Updates golang/go#18817
Change-Id: I045ac48441b9139cb0e01ceb4969b29cf5e72507
Reviewed-on: https://go-review.googlesource.com/36692
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.
The main issue is that Kubernetes does't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.
So we're going to run a daemonset (POD on all nodes) to reserve space
for Kubernetes for it. That's not in this CL.
But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.
Also, disable auto-scaling. It was off before by hand. Force it off
programatically too. And make the node count 5, like it was by hand.
Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously and therefore pods never were being deleted.
Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
https://golang.org/cl/21765 was not tested and not adequately reviewed
and doesn't work at all. Roll it back for now, since the blind fix in
https://golang.org/cl/21765 helped a bit but wasn't enough.
Change-Id: I781fac04d019fa5e6120ec398e13671b89d802c5
Reviewed-on: https://go-review.googlesource.com/25043
Reviewed-by: Johan Euphrosine <proppy@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Improvements to support rapid scheduling of many build jobs:
- Retry logic in Kubernetes client to handle sporadic connection
closes from their API server under heavy load
- Cluster autoscaler scales on default CPU utilization metric
- Debug mode allows scheduling multiple builds to test scaling
- Account for scheduled vs. provisioned resources in a cluster
and use that information to estimate when a build's pod
will be scheduled and in running state
- Use estimated scheduled time to set context timeout
- Track pod lifecycle (requested time, estimated available time,
actual available time, terminate time, etc)
Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* status page shows kube pool details
* pods created by the coorindator are tracked
* pods that fail to create are deleted
* pods older than delete-at are deleted
* pods created by a different coordinator are deleted
Updates golang/go#12546
Change-Id: I4c4f8ff906962b4a014a66d0a9d490ff17710d62
Reviewed-on: https://go-review.googlesource.com/16101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* Replaced cancel with context.Context
* StartPod can be canceled
* Wait for buildlet to come online, but fail fast if pod fails first
* Support timeout waiting for pod to leave pending phase
* Use Kubernetes watch API (long poll)
Updates golang/go#12546
Change-Id: I792a3b8fed615362a0290feee7de0c2cefe43c0e
Reviewed-on: https://go-review.googlesource.com/15285
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>