Commit Graph

79 Commits

Author SHA1 Message Date
Brad Fitzpatrick 6ae8750e84 all: use Container-Optimized VMs instead of Kubernetes for buildlet containers
Fixes golang/go#25108

Change-Id: I084669b52b699700ed26a7fdd890d9205a8b9dc9
Reviewed-on: https://go-review.googlesource.com/111267
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-05-11 03:29:07 +00:00
Brad Fitzpatrick 23fc7515b0 buildlet: remove support for preemptible VMs we haven't used in a long time
Change-Id: I867c4b491327962410768291f8656665c2399e7b
Reviewed-on: https://go-review.googlesource.com/111641
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-05 17:19:19 +00:00
Brad Fitzpatrick 14d99737ae all: rename Kube to Container and IsGCE to IsVM
Once containers run on COS instead of Kubernetes, one name (Kube*) is
wrong and the other (GCE) is ambiguous. So rename them now to be more
specific.

No behavior changes. Just renaming in this step, to reduce size of
next CL.

Updates golang/go#25108

Change-Id: Ib09eb682ef74acbbf6ed50b46074f834ef5e0c0b
Reviewed-on: https://go-review.googlesource.com/111639
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-05 16:50:07 +00:00
Brad Fitzpatrick 9929a3fe2b all: move to using the oauth2/google.Credentials type, add buildenv accessor
This removes some duplication of scopes and how to get the
TokenSource and which credentials to use.

And update the coordinator deps, since its rev of
golang.org/x/oauth2/google was too old to have the new type.

I want to clean this up more, but I need to make some changes to
the oauth2/google package first. More later.

Change-Id: Ic2799ec2ec62f67c65de6380b373fe915a43003e
Reviewed-on: https://go-review.googlesource.com/111266
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-05-04 17:04:37 +00:00
Brad Fitzpatrick a6dd626c4c all: use "context" instead of golang.org/x/net/context
Change-Id: I106c81a16eb5b39ff38c6f896095a27f597b3f8d
Reviewed-on: https://go-review.googlesource.com/99016
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-03-06 23:26:52 +00:00
Brad Fitzpatrick 73f88a6d4c all: add README.md files where missing, and tool to keep them updated
Change-Id: I385171c415bf168c04c6c3a7a996bff88964af84
Reviewed-on: https://go-review.googlesource.com/52856
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2017-08-02 22:17:52 +00:00
Brad Fitzpatrick 4eceee2d0f cmd/coordinator, cmd/buildlet, cmd/gomote: add SSH support
This adds an SSH server to farmer.golang.org on port 2222 that proxies
SSH connections to users' gomote-created buildlet instances.

For example:

    $ gomote create openbsd-amd64-60
    user-bradfitz-openbsd-amd64-60-1

    $ gomote ssh user-bradfitz-openbsd-amd64-60-1
    Warning: Permanently added '[localhost]:33351' (ECDSA) to the list of known hosts.
    OpenBSD 6.0 (GENERIC.MP) #2319: Tue Jul 26 13:00:43 MDT 2016

    Welcome to OpenBSD: The proactively secure Unix-like operating system.

    Please use the sendbug(1) utility to report bugs in the system.
    Before reporting a bug, please try to reproduce it with the latest
    version of the code.  With bug reports, please try to ensure that
    enough information to reproduce the problem is enclosed, and if a
    known fix for it exists, include that as well.

    $

As before, if the coordinator process is restarted (or crashes, is
evicted, etc), all gomote instances die.

Not yet supported:

* scp (help wanted)
* not all host types are configured. most are. some will need slight
  config tweaks to the Docker image (e.g. adding openssh-server)

Supports currently:

* linux-amd64 (host type shared by 386, nacl)
* linux-arm
* linux-arm64
* darwin
* freebsd
* openbsd
* plan9-386
* windows

Implementation details:

* the ssh server process listens on port 2222 in the coordinator
  (farmer.golang.org), which is behind a GKE TCP load balancer.

* the ssh server library is github.com/gliderlabs/ssh

* authentication is done via Github users' public keys. It's assumed
  that gomote user == github user. But there's a mapping in the code
  for known exceptions.

* we can't give out access to this too widely. too many things are
  accessible from within the host environment if you look in the right
  places. Details omitted. But the Go team and other trusted gomote
  users can use this.

* the buildlet binary has a new /connect-ssh handler that acts like a
  CONNECT request but instead of taking an explicit host:port, just
  says "give me your machine's SSH connection". The buildlet can also
  start sshd if needed for the environment. The /connect-ssh handler
  also installs the coordinator's public key.

* a new buildlet client library method "ConnectSSH" hits the /connect-ssh
  handler and returns a net.Conn.

* the coordinator's ssh.Handler is just running the OpenSSH ssh client.

* because the OpenSSH ssh child process can't connect to a net.Conn,
  an ephemeral localhost port is created on the coordinator to proxy
  between the ssh client and the net.Conn returned by ConnectSSH.

* The /connect-ssh handler requires http.Hijacker, which requires
  fully compliant net.Conn implementations as of Go 1.8. So I needed
  to flesh out revdial too, testing it with the
  golang.org/x/net/nettest package.

* plan9 doesn't have an ssh server, so we use 0intro's new conterm
  program (drawterm without GUI support) to connect to plan9 from the
  coordinator ssh proxy instead of using the OpenSSH ssh client
  binary.

* windows doesn't have an ssh server, so we enable the telnet service
  and the coordinator ssh proxy uses telnet instead on the backend
  on the private network. (There is a Windows ssh server but only in
  new versions.)

Happy debugging over ssh!

Fixes golang/go#19956

Change-Id: I80a62064c5f85af1f195f980c862ba29af4015f0
Reviewed-on: https://go-review.googlesource.com/50750
Reviewed-by: Herbie Ong <herbie@google.com>
Reviewed-by: Jessie Frazelle <me@jessfraz.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-07-28 18:21:11 +00:00
Brad Fitzpatrick 81725e0e0e cmd/debugnewvm: new debug tool to create buildlets on GCE for testing
Updates golang/go#21153

Change-Id: I1d80424a3272e7ee21eb176b020273d5903e444b
Reviewed-on: https://go-review.googlesource.com/51030
Reviewed-by: Sarah Adams <shadams@google.com>
2017-07-25 22:16:56 +00:00
Josh Bleecher Snyder d5f7ed3f5b buildlet: add config file to alter username
My gomote username is josharian.
My whoami is josh.
As a result, I need to run 'gomote -user=josharian'
every time, which is an annoyance.

Add a simple config file to allow a user to change their user name.
On startup, gomote checks whether user-$(whoami).user exists.
If so, it reads it to get the real username.
So I have a config file ~/.config/gomote/user-josh.user
with the contents 'josharian'.

It's a bit of an ugly hack, but it works and spares me some pain.

Change-Id: I372d5a786b99c9e3c6a57f25b0a38b9146f23598
Reviewed-on: https://go-review.googlesource.com/42230
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-05-01 18:04:27 +00:00
Brad Fitzpatrick acd9ecfc7a dashboard: add configs for new windows 2008, 2012, 2016 builders
See also: https://golang.org/cl/41142

Updates golang/go#17513

Change-Id: Ie743ae4604e65892e28423ed0af008450b647197
Reviewed-on: https://go-review.googlesource.com/41393
Reviewed-by: Jeff Johnson <jrjohnson@google.com>
Reviewed-by: Alex Brainman <alex.brainman@gmail.com>
2017-04-21 23:29:29 +00:00
Kevin Burke f2cd214fae all: fix vet errors
Replace oauth2.NoContext (deprecated) with context.Background(),
which has been available for two consecutive releases.

Add more cloud.google.com/go packages to the cmd/coordinator
Dockerfile to fix an error building cmd/coordinator. A dependency is
not present in master of cloud.google.com/go, but was present in the
older revision, and was not getting checked out correctly during the
"go get" step. In addition, we were failing to fetch dependencies for
some packages that coordinator depends on. I added instructions for
hopefully doing this more systematically in the future.

Fix the gitmirror Dockerfile which has the same problem.

Change-Id: Id6c2220482350a686b87742ec7915c457a689e52
Reviewed-on: https://go-review.googlesource.com/40852
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-15 23:42:12 +00:00
Brad Fitzpatrick 9a094f9b85 buildlet: fix doc typo
Change-Id: I2e43aa33634eb63b7cbdcc6ba6296fe8e857ad81
Reviewed-on: https://go-review.googlesource.com/36912
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-13 18:03:35 +00:00
Brad Fitzpatrick 1e7c6919f1 cmd/buildlet, cmd/release: disable pargzip for release builds
Fixes golang/go#19052

Change-Id: Icf2923ea72290bd99b78e1c2fd9675668157b3ac
Reviewed-on: https://go-review.googlesource.com/36910
Reviewed-by: Quentin Smith <quentin@golang.org>
2017-02-13 17:51:36 +00:00
Kevin Burke 392d3a9c58 buildlet: add context to GetTar
This lets us set a timeout on the HTTP request, which lets us time out the
writeSnapshot call in cmd/coordinator.

Fixes golang/go#18812.

Change-Id: I370448df4d95130c9c5b30ba32459ce844a6c967
Reviewed-on: https://go-review.googlesource.com/36897
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-02-12 17:37:23 +00:00
shawnps 72ebb1e3e7 all: fix typos
Change-Id: I16c20d6eb746a3ec81021c2a367d74c258437019
Reviewed-on: https://go-review.googlesource.com/34921
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-01-07 18:01:14 +00:00
Brad Fitzpatrick 71265acedb all: adjust things for upgrade from GKE 1.2 to GKE 1.4
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.

The main issue is that Kubernetes doesn't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.

So we're going to run a daemonset (a pod on every node) to reserve
space for Kubernetes. That's not in this CL.

But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.

Also, disable auto-scaling. It was off before by hand. Force it off
programmatically too. And make the node count 5, like it was by hand.

Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously and therefore pods never were being deleted.

Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-01 23:48:21 +00:00
Brad Fitzpatrick e4d08a5aad cmd/gomote, buildenv, buildlet: move config code to common places
Split off from Quentin's https://golang.org/cl/29399

Change-Id: I4578f8f485e97d6b9844fb12e84779167755752e
Reviewed-on: https://go-review.googlesource.com/29858
Reviewed-by: Quentin Smith <quentin@golang.org>
2016-09-27 18:47:08 +00:00
Brad Fitzpatrick 328c9b8918 all: split builder config into builder & host configs
Our builders are named of the form "GOOS-GOARCH" or
"GOOS-GOARCH-suffix".

Over time we've grown many builders. This CL doesn't change
that. Builders continue to be named and operate as before.

Previously the build configuration file (dashboard/builders.go) made
each builder type ("linux-amd64-race", etc) define how to create a
host running a buildlet of that type, even though many builders had
identical host configs. For example, these builders all share the same
host type (a Kubernetes container):

   linux-amd64
   linux-amd64-race
   linux-386
   linux-386-387

And these are the same host type (a GCE VM):

   windows-amd64-gce
   windows-amd64-race
   windows-386-gce

This CL creates a new concept of a "hostType" which defines how
the buildlet is created (Kube, GCE, Reverse, and how), and then each
builder itself references a host type.

Users never see the hostType (except perhaps in gomote list output),
and they never need to care about it.

Reverse buildlets now can only be one hostType at a time, which
simplifies things. We were no longer using multiple roles per machine
once moving to VMs for OS X.

gomote continues to operate as it did previously but its underlying
protocol changed and clients will need to be updated. As a new
feature, gomote now has a new flag to let you reuse a buildlet host
connection for different builder rules if they share the same
underlying host type. But users can ignore that.

This CL is a long-standing TODO (previously attempted and aborted) and
will make many things easier and faster, including the linux-arm
cross-compilation effort, and keeping pre-warmed buildlets of VM types
ready to go.
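
As a rough illustration of the split, builders can reference a shared host
type instead of each describing its own host. The type, field, and host
type names below are assumptions made for this sketch, not the real
dashboard package; only the builder names come from the lists above.

```go
package main

import "fmt"

// HostConfig says how a buildlet host is created. Field and host type
// names here are illustrative assumptions.
type HostConfig struct {
	Kind string // "container", "vm", or "reverse"
}

// BuildConfig is one builder; it points at a host type instead of
// repeating the host-creation details.
type BuildConfig struct {
	HostType string
}

var hosts = map[string]HostConfig{
	"host-linux-container": {Kind: "container"},
	"host-windows-vm":      {Kind: "vm"},
}

var builders = map[string]BuildConfig{
	"linux-amd64":        {HostType: "host-linux-container"},
	"linux-amd64-race":   {HostType: "host-linux-container"},
	"linux-386":          {HostType: "host-linux-container"},
	"linux-386-387":      {HostType: "host-linux-container"},
	"windows-amd64-gce":  {HostType: "host-windows-vm"},
	"windows-amd64-race": {HostType: "host-windows-vm"},
	"windows-386-gce":    {HostType: "host-windows-vm"},
}

// canShareHost reports whether two known builders use the same host type,
// which is the condition under which a host connection could be reused.
func canShareHost(a, b string) bool {
	ba, okA := builders[a]
	bb, okB := builders[b]
	return okA && okB && ba.HostType == bb.HostType
}

func main() {
	fmt.Println(hosts[builders["linux-amd64"].HostType].Kind)
	fmt.Println(canShareHost("linux-amd64", "linux-386"))
	fmt.Println(canShareHost("linux-amd64", "windows-386-gce"))
}
```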

Updates golang/go#17104

Change-Id: Iad8387f48680424a8441e878a2f4762bf79ea4d2
Reviewed-on: https://go-review.googlesource.com/29551
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-09-22 23:13:34 +00:00
Brad Fitzpatrick 44e74d53da buildlet: don't leak file descriptors to buildlets
Also, add count of fds and goroutines to the coordinator's status
page.

Change-Id: I857e609623cfa280716d5d079180d0e4021d0bac
Reviewed-on: https://go-review.googlesource.com/27550
Reviewed-by: Quentin Smith <quentin@golang.org>
2016-08-23 03:38:21 +00:00
Brad Fitzpatrick 940da928a2 buildlet: fix gomote crash
ctx was initialized in the wrong place.

Change-Id: I7fb3c56071a3e4d5cc2199e07fcc4b8ecbc7a674
Reviewed-on: https://go-review.googlesource.com/22967
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-05-10 03:51:36 +00:00
Brad Fitzpatrick 79f01e0ca7 buildlet: don't hang in HTTP requests if heartbeats fail
Fixes golang/go#15582

Change-Id: Iab31cb4770728d2cc405f930c973d87bae7fbf10
Reviewed-on: https://go-review.googlesource.com/22890
Reviewed-by: Andrew Gerrand <adg@golang.org>
2016-05-09 21:13:28 +00:00
Brad Fitzpatrick 4a3f4a8c16 kubernetes: avoid the WatchPod streaming API
Poll every 5 seconds instead.

Change-Id: I0a8a30fee1118b2aecf4b3cbbf342e07afc27602
Reviewed-on: https://go-review.googlesource.com/22820
Reviewed-by: Evan Brown <evanbrown@google.com>
2016-05-05 20:21:37 +00:00
Evan Brown 34ff1d9bc8 all: kubernetes builder autoscaling
Improvements to support rapid scheduling of many build jobs:

- Retry logic in Kubernetes client to handle sporadic connection
  closes from their API server under heavy load

- Cluster autoscaler scales on default CPU utilization metric

- Debug mode allows scheduling multiple builds to test scaling

- Account for scheduled vs. provisioned resources in a cluster
  and use that information to estimate when a build's pod
  will be scheduled and in running state

- Use estimated scheduled time to set context timeout

- Track pod lifecycle (requested time, estimated available time,
  actual available time, terminate time, etc)

Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 17:22:57 +00:00
Evan Brown a731151878 all: consolidate configuration for coordinator and gomote
buildenv.Environment type defines configuration options:

- Coordinator uses the GCE project name to lookup config. A custom
  config name can be provided at runtime to override.

- The conventional prod and stage project names ('symbolic-datum-552'
  and 'go-dashboard-dev') map to prod and staging configuration structs.

- Production and staging status is explicitly defined in configuration.

- GCS bucket names for buildlet, logs, and snapshots are
  configurable.

Change-Id: I7e6d7874eb0bdfe35dbdd5fcf6212ab50d576b88
Reviewed-on: https://go-review.googlesource.com/19502
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-04 22:24:11 +00:00
Evan Brown c8b96fa9fa all: report cluster utilization to Cloud Monitoring
Change-Id: Iec8e03fdef27582756d96b731c2fb12b56175ca9
Reviewed-on: https://go-review.googlesource.com/18396
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-26 17:10:41 +00:00
Evan Brown 2e452e1be8 all: improve Kubernetes API client
* pod creation returns an *api.Pod
* WatchPod replaces WatchPodStatus

Updates golang/go#12546

Change-Id: I34bb6e0d994e552b41a8082cc4672a663ce961a3
Reviewed-on: https://go-review.googlesource.com/17100
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-20 17:56:59 +00:00
Evan Brown 83f9748046 all: display pool status and delete failed/old pods
* status page shows kube pool details
* pods created by the coordinator are tracked
* pods that fail to create are deleted
* pods older than delete-at are deleted
* pods created by a different coordinator are deleted

Updates golang/go#12546

Change-Id: I4c4f8ff906962b4a014a66d0a9d490ff17710d62
Reviewed-on: https://go-review.googlesource.com/16101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-11-05 11:33:24 +00:00
Matthew Dempsky d5f422f474 all: fix vet warnings
Change-Id: Ia4519d9b050136b4d46dc8361f546e3c4205a9cd
Reviewed-on: https://go-review.googlesource.com/16288
Reviewed-by: Dave Cheney <dave@cheney.net>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-25 04:14:39 +00:00
Evan Brown 956434c23a all: improved monitoring of buildlet pods after creation
* Replaced cancel with context.Context
* StartPod can be canceled
* Wait for buildlet to come online, but fail fast if pod fails first
* Support timeout waiting for pod to leave pending phase
* Use Kubernetes watch API (long poll)

Updates golang/go#12546

Change-Id: I792a3b8fed615362a0290feee7de0c2cefe43c0e
Reviewed-on: https://go-review.googlesource.com/15285
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-15 23:45:50 +00:00
Evan Brown 4dfb85f32a all: basic support for linux-amd64 buildlets on Kubernetes
* set correct metadata as env vars on each container in pod

* stage0 detects when running in a pod

* add scope to support farmer on GCE communicating with Google
  Container Engine API

* dev mode supports GCE and Kubernetes even when not running
  on GCE

* pod buildlet returns a configured http.Client

* pod buildlets work, but pods are not removed after completion
  (or on creation failure)

Updates golang/go#12546

Change-Id: If91673b49223130c1e7077c130f1abe1e7966d02
Reviewed-on: https://go-review.googlesource.com/15041
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-10-01 17:25:38 +00:00
Evan Brown 882e20c4c9 cmd/coordinator: discover kube cluster
Find the default Kubernetes cluster and configure a client to talk to it.
Use application default credentials.

Updates golang/go#12546

Change-Id: Ifb1ce57f52f4fbbee3267f8cc3cf02a78146bd5b
Reviewed-on: https://go-review.googlesource.com/14532
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-21 18:28:38 +00:00
Brad Fitzpatrick 72f3eae620 revdial, cmd/coordinator: notice when buildlet TCP conns go away immediately
Previously it wasn't noticing their death until the next health check.

Take advantage of the fact that revdial is always blocked in a Read, so
it will see a TCP shutdown in the case of normal shutdowns. (Health checks
will still catch disappearing machines.)

Change-Id: I9a7f60a38b3acaf02057b2da9e0cbc91d328f651
Reviewed-on: https://go-review.googlesource.com/14736
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-18 02:03:18 +00:00
Brad Fitzpatrick 9620f5578a buildlet: log on heartbeat failures
should be rare now.

Change-Id: Icc4bfd13c8dfe8f2e189db819bc0d552f35fb3c9
Reviewed-on: https://go-review.googlesource.com/14731
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-18 01:35:04 +00:00
Brad Fitzpatrick d011dfc978 buildlet: try a bit harder to cancel slow HTTP requests
Use the new Go 1.5 mechanism to abort HTTP requests with a channel.
This is respected by the Go 1.5 http.Transport, which we always use.

This CL shouldn't be necessary (CL 14700 was the real bug), but
doesn't hurt. This still would've probably prevented most of
golang/go#12666

Change-Id: I6890e016ee04183fc0d600baed8046c2f79113d8
Reviewed-on: https://go-review.googlesource.com/14701
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-17 20:06:39 +00:00
Brad Fitzpatrick 8bad2a8c99 buildletclient: simplify Close
Now Close always means destroy.

The old code & API was a half-baked implementation of a (lack of)
design where there was a difference between being done with a buildlet
(with it possibly being reused by somebody else?) and you wanting to
nuke it completely. Unfortunately this just grew messier and more
broken over time.

This attempts to clean it all up.

We can add the sharing ideas back later when there's actually a design
and implementation. (We're not losing anything with this CL, because
nothing ever shared buildlets)

In fact, this CL fixes a problem where reverse buildlets weren't
getting their underlying net.Conns closed when their healthchecks
failed, leading to 4+ hour (and counting) build hangs due to buildlets
getting killed at inopportune moments (e.g. me testing running a
reverse buildlet on my home mac to help out with the dashboard
backlog)

Change-Id: I07be09f4d5f0f09d35e51e41c48b1296b71bb9b5
Reviewed-on: https://go-review.googlesource.com/14585
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-09-15 22:42:57 +00:00
Brad Fitzpatrick 61ebbcda68 all: heartbeat buildlets always, now that reverse buildlet protocol allows it
Also, don't run shootout on linux-arm (golang/go#12623)

Change-Id: I2c85b8a04917cb0992086012eaa59b00a6cb31a4
Reviewed-on: https://go-review.googlesource.com/14582
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-15 18:38:18 +00:00
Brad Fitzpatrick 1f0d8f287c all: tons of builder work
* reverse buildlet rework (multiplexed TCP connections, instead
  of a hacky reverse roundtripper)

* scaleway ARM image improvements

* parallel gzip implementation, which makes things ~8x faster on
  Scaleway.

* merge watcher into the coordinator, for easier deployments

Change-Id: I55d769f982e6583b261435309faa1f718a15fde1
Reviewed-on: https://go-review.googlesource.com/12665
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-15 08:28:33 +00:00
David Crawshaw c4787c2583 buildlet: fix build
Change-Id: Idbe55892621e673905068849ab3d868f9b7c1ff2
Reviewed-on: https://go-review.googlesource.com/14209
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-09-03 17:47:45 +00:00
Brad Fitzpatrick 7b6d1b1b28 all: remote buildlets
This creates a mechanism for clients (such as cmd/release and
cmd/gomote) to obtain buildlets via the coordinator. Previously
cmd/release and cmd/gomote could only create GCE VMs themselves, and
required the GCE project's credentials. In addition to the awkwardness
of needing to hand out the GCE credentials, it also meant ARM and
Darwin buildlets (which use the reverse buildlet pool) weren't usable.

Instead, this creates a new auth mechanism where the coordinator is
contacted over TLS with key pinning (the CA system isn't used) in the
same way that the reverse builders already dialed into the
coordinator, and then a "user build type" and hash are sent as the
username and password. The same master key is used to sign user
builder keys, and they always start with "user-". (which isn't a GOOS).
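
One way to picture the username/password check is sketched below. The
commit message does not say which hash construction is used, so
HMAC-SHA256 here is purely an assumption, and userKey and validUser are
hypothetical names.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// userKey derives the password for a "user-..." builder name from the
// master key. HMAC-SHA256 is an assumption made for this sketch.
func userKey(masterKey []byte, user string) string {
	h := hmac.New(sha256.New, masterKey)
	h.Write([]byte(user))
	return hex.EncodeToString(h.Sum(nil))
}

// validUser checks a username/password pair the way a coordinator might.
func validUser(masterKey []byte, user, password string) bool {
	if !strings.HasPrefix(user, "user-") {
		return false // only "user-" names are accepted on this path
	}
	want := userKey(masterKey, user)
	return hmac.Equal([]byte(want), []byte(password)) // constant-time compare
}

func main() {
	master := []byte("example-master-key") // placeholder, not a real key
	pw := userKey(master, "user-bradfitz")
	fmt.Println(validUser(master, "user-bradfitz", pw))
	fmt.Println(validUser(master, "user-bradfitz", "wrong"))
	fmt.Println(validUser(master, "linux-amd64", userKey(master, "linux-amd64")))
}
```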

Then the coordinator provides an API to create and list buildlets.
They auto-expire after a duration and are auto-renewed upon use.

The buildlet library (as used by cmd/release etc) then proxies HTTP
requests via the coordinator to the backend buildlet.

See doc/remote-buildlet.txt for protocol details.

Change-Id: I12e27eae788fdd91927cb182b950893dc759f8e9
Reviewed-on: https://go-review.googlesource.com/11901
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-07-07 16:45:21 +00:00
Brad Fitzpatrick c43006e3bc cmd/buildlet: make heartbeat less strict
Now you have to fail 3 in a row before you're killed.

This helped Plan 9 a bunch for a few days, before the change was
reverted (and then it started failing consistently again).

We still don't know why Plan 9 sucks at replying to heartbeats.
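
The "three in a row" rule amounts to a consecutive-failure counter that
resets on any success. A minimal sketch, with hypothetical names:

```go
package main

import "fmt"

// heartbeat tracks consecutive heartbeat failures. (Hypothetical type;
// it only illustrates the counting rule.)
type heartbeat struct {
	maxFails int // consecutive failures tolerated before giving up
	fails    int
}

// record notes the outcome of one heartbeat and reports whether the
// buildlet should now be considered dead.
func (h *heartbeat) record(ok bool) (dead bool) {
	if ok {
		h.fails = 0 // any success resets the streak
		return false
	}
	h.fails++
	return h.fails >= h.maxFails
}

func main() {
	hb := &heartbeat{maxFails: 3}
	for _, ok := range []bool{false, false, true, false, false, false} {
		fmt.Println(ok, hb.record(ok))
	}
}
```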

Change-Id: Ic64d2e8fb75f544c7c3e9a62c28ab9e20ff8392d
Reviewed-on: https://go-review.googlesource.com/11132
Reviewed-by: Andrew Gerrand <adg@golang.org>
Reviewed-by: David du Colombier <0intro@gmail.com>
2015-06-16 20:31:04 +00:00
Brad Fitzpatrick d4ea014ceb cmd/coordinator, buildlet: reliability fixes
Add background heartbeats to detect dead GCE VMs (the OpenBSD
buildlets seem to hang a lot).

Add timeouts to test executions.

Take helper buildlets out of service once they're marked bad.

Keep the in-order buildlet running forever when sharding tests, in
case all the helpers die. (observed once)

Keep a cache of recently deleted VMs and don't try to delete VMs again
if we've recently deleted them. (they're slow to delete)

Make reverse buildlets more paranoid in their health checking and closing
of the connection.

Make status page link to /try set URLs.

Also, better logging (more sometimes, less others), and misc bug fixes.

Change-Id: I57a5e8e39381234006cac4dd799b655d64be71bb
Reviewed-on: https://go-review.googlesource.com/10981
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-13 01:50:33 +00:00
Brad Fitzpatrick 378fb2949f cmd/coordinator, buildlet: kill TCP conn of misbehaving reversePool buildlets
Also: heartbeat reverse buildlets more often
Also: adjust cmd/go test expected duration.

Change-Id: I75fbb66a0b20ac7357ad7cf78fc545101ac9aa33
Reviewed-on: https://go-review.googlesource.com/10884
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-11 00:06:14 +00:00
Brad Fitzpatrick 080252283b buildlet: make exec fail with a remote error if no headers after 5 seconds
The reverse buildlets' RoundTrip are hanging, which is its own problem,
but this calling code should be robust and time out anyway.

Change-Id: Id9e3e1d9feb6ffa58cc0995d0623bd90845bb9d6
Reviewed-on: https://go-review.googlesource.com/10847
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-09 16:29:23 +00:00
Brad Fitzpatrick 1b1e086fd1 buildlet, cmd/coordinator: GCE quota accounting, fixes
And deal with Preemptible resource exhaustion errors.

And change all-compile to misc-compile and only do the builders
not covered otherwise (Fixes #11073)

And make the watcher serve git source.

And cache and singleflight fetching of git source.

And a million other things.

Fixes golang/go#11073

Change-Id: I0f45610f0c6a06bd0c8ba9632b8624e00aeb52fc
Reviewed-on: https://go-review.googlesource.com/10750
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-08 22:21:00 +00:00
Brad Fitzpatrick 79f3fc0823 cmd/coordinator: test sharding, better status, logs
Also gomote updates which came about during the process of developing
and debugging this.

Change-Id: Ia53d674118a6b99bcdda7062d3b7161279b6ad52
Reviewed-on: https://go-review.googlesource.com/10463
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-06-04 20:33:49 +00:00
Andrew Gerrand 1fc56ca750 buildlet: add Path option to ExecOpts
Change-Id: I1353ecc14dfe232886dcfadc47547bbcf91f0b6d
Reviewed-on: https://go-review.googlesource.com/10303
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-05-21 03:38:07 +00:00
Brad Fitzpatrick 1e25be43f2 buildlet: tweak GCE scheduling policy
Change-Id: Ia23d0f3ff409ec20e71ed3c513f7b3275fc7a9d9
Reviewed-on: https://go-review.googlesource.com/10054
Reviewed-by: Andrew Gerrand <adg@golang.org>
2015-05-14 19:46:08 +00:00
David Crawshaw a3dce2ca0e cmd/buildlet, cmd/coordinator: check build keys
Have the buildlet send build keys to the coordinator in reverse mode.
Rename Info to Status, as it will be used periodically by the
coordinator to check that the buildlet is OK.

And add a simple connection test.

Change-Id: Ic7285636c7818e2584e11555a8f5b5b66be30638
Reviewed-on: https://go-review.googlesource.com/8593
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-12 12:41:15 +00:00
David Crawshaw 581ddd17e2 all: add buildlet reverse mode
In -reverse mode, the buildlet dials the coordinator. The connection
is then turned around so the coordinator can control the buildlet.
This lets us start buildlets on machines without APIs behind
firewalls.

Also add the -mode flag to the coordinator, which defaults to dev
mode when running off GCE. In prod mode the coordinator attempts to
become farmer.golang.org and pick up its real certificates. In the
new mode, builtin certificates are used allowing you to start a
local coordinator and buildlet for testing.

A simple connection test will be in a followup CL, as soon as key
checking is implemented.

Change-Id: I2a7dcdfbb4efda71df31b571788945e9ce1f3365
Reviewed-on: https://go-review.googlesource.com/8490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2015-04-07 12:59:17 +00:00
Brad Fitzpatrick 7b2f9d74c5 cmd/coordinator: add BuildletPool abstraction; don't assume GCE
This starts the process of making the coordinator able to use the
buildlet on things other than GCE.

Update golang/go#8647 (ARM, reverse proxy or online.net)
Update golang/go#10267 (windows 2003 on AWS)
Update golang/go#9495 (OS X VMWare VMs on racked Mac Minis)

Change-Id: I5df79ea67e0ececba8b880e81bd93d4c80897455
Reviewed-on: https://go-review.googlesource.com/8198
Reviewed-by: David Crawshaw <crawshaw@golang.org>
2015-03-31 14:33:42 +00:00