Commit graph

12 commits

Author SHA1 Message Date
shawnps 72ebb1e3e7 all: fix typos
Change-Id: I16c20d6eb746a3ec81021c2a367d74c258437019
Reviewed-on: https://go-review.googlesource.com/34921
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-01-07 18:01:14 +00:00
Brad Fitzpatrick 71265acedb all: adjust things for upgrade from GKE 1.2 to GKE 1.4
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.

The main issue is that Kubernetes doesn't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.

So we're going to run a daemonset (a pod on all nodes) to reserve space
for Kubernetes. That's not in this CL.

But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.

Also, disable auto-scaling. It was off before by hand. Force it off
programmatically too. And make the node count 5, like it was by hand.

Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously, and therefore pods were never being deleted.

Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-01 23:48:21 +00:00
Brad Fitzpatrick 328c9b8918 all: split builder config into builder & host configs
Our builders have names of the form "GOOS-GOARCH" or
"GOOS-GOARCH-suffix".

Over time we've grown many builders. This CL doesn't change
that. Builders continue to be named and operate as before.

Previously the build configuration file (dashboard/builders.go) made
each builder type ("linux-amd64-race", etc) define how to create a
host running a buildlet of that type, even though many builders had
identical host configs. For example, these builders all share the same
host type (a Kubernetes container):

   linux-amd64
   linux-amd64-race
   linux-386
   linux-386-387

And these are the same host type (a GCE VM):

   windows-amd64-gce
   windows-amd64-race
   windows-386-gce

This CL creates a new concept of a "hostType" which defines how
the buildlet is created (Kube, GCE, Reverse, and how), and then each
builder itself references a host type.
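
The split can be pictured with a minimal sketch. It assumes two hypothetical types, HostConfig and BuildConfig, and made-up host type names and images; none of these are taken from dashboard/builders.go. Only the builder names come from the lists above.

```go
package main

import "fmt"

// HostConfig says how a buildlet host is created. The field names are
// illustrative assumptions, not the real dashboard package.
type HostConfig struct {
	Kind  string // "kube", "gce", or "reverse"
	Image string // hypothetical container or VM image name
}

// BuildConfig is one named builder; it points at a host type by name.
type BuildConfig struct {
	Name     string
	HostType string
}

// hosts holds each host definition exactly once (hypothetical names).
var hosts = map[string]HostConfig{
	"host-linux-kube":  {Kind: "kube", Image: "example-linux-image"},
	"host-windows-gce": {Kind: "gce", Image: "example-windows-image"},
}

// builders lists builders named in the commit message, each referencing
// a shared host type instead of repeating the host definition.
var builders = map[string]BuildConfig{
	"linux-amd64":        {Name: "linux-amd64", HostType: "host-linux-kube"},
	"linux-amd64-race":   {Name: "linux-amd64-race", HostType: "host-linux-kube"},
	"linux-386":          {Name: "linux-386", HostType: "host-linux-kube"},
	"linux-386-387":      {Name: "linux-386-387", HostType: "host-linux-kube"},
	"windows-amd64-gce":  {Name: "windows-amd64-gce", HostType: "host-windows-gce"},
	"windows-amd64-race": {Name: "windows-amd64-race", HostType: "host-windows-gce"},
	"windows-386-gce":    {Name: "windows-386-gce", HostType: "host-windows-gce"},
}

// hostFor resolves a builder name to its shared host config.
func hostFor(builder string) (HostConfig, error) {
	b, ok := builders[builder]
	if !ok {
		return HostConfig{}, fmt.Errorf("unknown builder %q", builder)
	}
	h, ok := hosts[b.HostType]
	if !ok {
		return HostConfig{}, fmt.Errorf("builder %q references unknown host type %q", builder, b.HostType)
	}
	return h, nil
}

func main() {
	for _, name := range []string{"linux-amd64", "linux-386-387", "windows-amd64-race"} {
		h, err := hostFor(name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> %s host\n", name, h.Kind)
	}
}
```

With this layout, changing how Linux containers are created would touch one hosts entry rather than four builder entries.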

Users never see the hostType (except perhaps in gomote list output),
and at least never need to care about it.

Reverse buildlets now can only be one hostType at a time, which
simplifies things. We were no longer using multiple roles per machine
once moving to VMs for OS X.

gomote continues to operate as it did previously but its underlying
protocol changed and clients will need to be updated. As a new
feature, gomote now has a new flag to let you reuse a buildlet host
connection for different builder rules if they share the same
underlying host type. But users can ignore that.

This CL is a long-standing TODO (previously attempted and aborted) and
will make many things easier and faster, including the linux-arm
cross-compilation effort, and keeping pre-warmed buildlets of VM types
ready to go.

Updates golang/go#17104

Change-Id: Iad8387f48680424a8441e878a2f4762bf79ea4d2
Reviewed-on: https://go-review.googlesource.com/29551
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-22 23:13:34 +00:00
Brad Fitzpatrick 4a3f4a8c16 kubernetes: avoid the WatchPod streaming API
Poll every 5 seconds instead.

Change-Id: I0a8a30fee1118b2aecf4b3cbbf342e07afc27602
Reviewed-on: https://go-review.googlesource.com/22820
Reviewed-by: Evan Brown <evanbrown@google.com>
2016-05-05 20:21:37 +00:00
Evan Brown 34ff1d9bc8 all: kubernetes builder autoscaling
Improvements to support rapid scheduling of many build jobs:

- Retry logic in Kubernetes client to handle sporadic connection
  closes from their API server under heavy load

- Cluster autoscaler scales on default CPU utilization metric

- Debug mode allows scheduling multiple builds to test scaling

- Account for scheduled vs. provisioned resources in a cluster
  and use that information to estimate when a build's pod
  will be scheduled and in running state

- Use estimated scheduled time to set context timeout

- Track pod lifecycle (requested time, estimated available time,
  actual available time, terminate time, etc)

Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 17:22:57 +00:00
Evan Brown a731151878 all: consolidate configuration for coordinator and gomote
buildenv.Environment type defines configuration options:

- Coordinator uses the GCE project name to lookup config. A custom
  config name can be provided at runtime to override.

- The conventional prod and stage project names ('symbolic-datum-552'
  and 'go-dashboard-dev') map to prod and staging configuration structs.

- Production and staging status is explicitly defined in configuration.

- GCS bucket names for buildlet, logs, and snapshots are
  configurable.
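
As an illustration of the lookup described above, here is a minimal sketch. Only the two project names come from the commit message; the Environment fields, the bucket names, and the lookup function are hypothetical and do not reflect the real buildenv package.

```go
package main

import "fmt"

// Environment is a hypothetical stand-in for buildenv.Environment.
type Environment struct {
	ProjectName    string
	IsProd         bool   // production vs. staging is stated explicitly
	BuildletBucket string // placeholder bucket names, not the real ones
	LogBucket      string
	SnapBucket     string
}

// configs maps the conventional project names to their configuration.
var configs = map[string]*Environment{
	"symbolic-datum-552": {
		ProjectName:    "symbolic-datum-552",
		IsProd:         true,
		BuildletBucket: "example-prod-buildlet",
		LogBucket:      "example-prod-logs",
		SnapBucket:     "example-prod-snapshots",
	},
	"go-dashboard-dev": {
		ProjectName:    "go-dashboard-dev",
		IsProd:         false,
		BuildletBucket: "example-dev-buildlet",
		LogBucket:      "example-dev-logs",
		SnapBucket:     "example-dev-snapshots",
	},
}

// lookup picks a config by GCE project name unless override names one.
func lookup(projectName, override string) (*Environment, error) {
	name := projectName
	if override != "" {
		name = override
	}
	env, ok := configs[name]
	if !ok {
		return nil, fmt.Errorf("no configuration named %q", name)
	}
	return env, nil
}

func main() {
	env, err := lookup("symbolic-datum-552", "")
	fmt.Println(env.IsProd, err)

	// A runtime override wins over the detected project name.
	env, err = lookup("symbolic-datum-552", "go-dashboard-dev")
	fmt.Println(env.IsProd, err)
}
```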

Change-Id: I7e6d7874eb0bdfe35dbdd5fcf6212ab50d576b88
Reviewed-on: https://go-review.googlesource.com/19502
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-04 22:24:11 +00:00
Evan Brown c8b96fa9fa all: report cluster utilization to Cloud Monitoring
Change-Id: Iec8e03fdef27582756d96b731c2fb12b56175ca9
Reviewed-on: https://go-review.googlesource.com/18396
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-26 17:10:41 +00:00
Evan Brown 2e452e1be8 all: improve Kubernetes API client
* pod creation returns an *api.Pod
* WatchPod replaces WatchPodStatus

Updates golang/go#12546

Change-Id: I34bb6e0d994e552b41a8082cc4672a663ce961a3
Reviewed-on: https://go-review.googlesource.com/17100
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-20 17:56:59 +00:00
Evan Brown 83f9748046 all: display pool status and delete failed/old pods
* status page shows kube pool details
* pods created by the coordinator are tracked
* pods that fail to create are deleted
* pods older than delete-at are deleted
* pods created by a different coordinator are deleted

Updates golang/go#12546

Change-Id: I4c4f8ff906962b4a014a66d0a9d490ff17710d62
Reviewed-on: https://go-review.googlesource.com/16101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-05 11:33:24 +00:00
Evan Brown 956434c23a all: improved monitoring of buildlet pods after creation
* Replaced cancel with context.Context
* StartPod can be canceled
* Wait for buildlet to come online, but fail fast if pod fails first
* Support timeout waiting for pod to leave pending phase
* Use Kubernetes watch API (long poll)

Updates golang/go#12546

Change-Id: I792a3b8fed615362a0290feee7de0c2cefe43c0e
Reviewed-on: https://go-review.googlesource.com/15285
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-15 23:45:50 +00:00
Evan Brown 4dfb85f32a all: basic support for linux-amd64 buildlets on Kubernetes
* set correct metadata as env vars on each container in pod

* stage0 detects when running in a pod

* add scope to support farmer on GCE communicating with Google
  Container Engine API

* dev mode supports GCE and Kubernetes even when not running
  on GCE

* pod buildlet returns a configured http.Client

* pod buildlets work, but pods are not removed after completion
  (or on creation failure)

Updates golang/go#12546

Change-Id: If91673b49223130c1e7077c130f1abe1e7966d02
Reviewed-on: https://go-review.googlesource.com/15041
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-01 17:25:38 +00:00
Evan Brown 882e20c4c9 cmd/coordinator: discover kube cluster
Find the default Kubernetes cluster and configure a client to talk to it.
Use application default credentials.

Updates golang/go#12546

Change-Id: Ifb1ce57f52f4fbbee3267f8cc3cf02a78146bd5b
Reviewed-on: https://go-review.googlesource.com/14532
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-21 18:28:38 +00:00