Commit graph

25 commits

Author SHA1 Message Date
Brad Fitzpatrick 61a567e73e cmd/coordinator: only bound old revdial builds
Revert CL 173517 and replace with a similar but different mechanism.

Now that the new revdial is out, only penalize the old revdial users
(a few stragglers who haven't updated). In practice the limit of 10 at
once won't be a problem but will protect the coordinator during submit
floods.
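
As an illustration of the idea above, here is a minimal sketch of a limit that applies only to old-protocol connections. It is not the coordinator's code: the `limiter` type, the `admit` function, and the choice to reject rather than queue excess connections are all assumptions made for the example. Only the limit of 10 comes from the message.

```go
package main

import "fmt"

// limiter bounds how many connections may hold a slot at once.
// Hypothetical type; not taken from the coordinator source.
type limiter struct {
	slots chan struct{}
}

func newLimiter(n int) *limiter {
	return &limiter{slots: make(chan struct{}, n)}
}

// tryAcquire reserves a slot without blocking and reports whether
// one was available.
func (l *limiter) tryAcquire() bool {
	select {
	case l.slots <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously reserved by tryAcquire.
func (l *limiter) release() { <-l.slots }

// admit reports whether a connection may proceed. Only connections
// that still speak the old protocol count against the limit
// (assumption: excess connections are rejected, not queued).
func admit(l *limiter, oldProtocol bool) bool {
	if !oldProtocol {
		return true
	}
	return l.tryAcquire()
}

func main() {
	l := newLimiter(10) // the "limit of 10 at once" from the message
	accepted := 0
	for i := 0; i < 12; i++ {
		if admit(l, true) {
			accepted++
		}
	}
	fmt.Println("old-protocol connections admitted:", accepted)
	fmt.Println("new-protocol connection admitted:", admit(l, false))
}
```

A buffered channel used as a counting semaphore keeps the check non-blocking, so a flood of stragglers cannot stall callers that are not subject to the limit.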

Fixes golang/go#31639

Change-Id: I6b6c3567205fdd98e0b80def96d75827e986fe4f
Reviewed-on: https://go-review.googlesource.com/c/build/+/174325
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2019-04-29 18:36:39 +00:00
Brad Fitzpatrick 725da52178 cmd/coordinator, buildenv: bound the number of reverse buildlets for now
To mitigate a bug in revdial that's retaining way too much memory and
causing coordinator evictions.

Also double memory limit for now.

Updates golang/go#31639

Change-Id: I19b6df92df9905b087c29884140096946037fa38
Reviewed-on: https://go-review.googlesource.com/c/build/+/173517
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
2019-04-26 14:00:31 +00:00
Brad Fitzpatrick 403c55142e buildenv: diagnose authentication problems in more places
Check when getting an oauth2.TokenSource, and when using it.

Change-Id: I1b240e5eafc23202d7d22d74d3f5cf9a59ca292e
Reviewed-on: https://go-review.googlesource.com/130315
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2018-08-21 03:24:00 +00:00
Brad Fitzpatrick f7da0f8f06 all: warn if user hasn't run gcloud auth application-default login
"gcloud auth application-default login" is NOT the same as
"gcloud auth login"

... which is super confusing.

Add a check for users.

We should probably add this in more places over time. Baby steps.

Change-Id: I4584f5b84b941fa6893eccd09fd61e8850538607
Reviewed-on: https://go-review.googlesource.com/130195
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2018-08-21 02:14:17 +00:00
Brad Fitzpatrick 5524b9ef25 all: stop using service accounts, prefer user creds
It's better to authenticate with an identity (e.g. foo@google.com),
rather than all of us copy/pasting a service account JSON (a glorified
access token) around and the server not knowing who's who.

I was previously misled into using service accounts because the gcloud
tool (when run on a GCE VM) strongly urges you not to use user
accounts and says you should be using service accounts instead. But
turns out that's because it assumes you'd never use GCE for
development and only for deployment. That is, gcloud assumes you'd
only use gcloud on desktop/laptop, and then deploy your binary to GCE
where the application itself would use service accounts. (We do use
service accounts for the application, but I also develop on a GCE VM.)

Also, the precursor helper function to FindDefaultCredentials (for
"Application Default Credentials") had a different search order for
credentials and preferred the GCE VM's service account instead of the
user-specific credentials. Now that FindDefaultCredentials uses a good
order we can remove some of our old complexity.

Change-Id: Ia888e264cfb88e977f3ff1a3a4bb583db70466ab
Reviewed-on: https://go-review.googlesource.com/129416
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
2018-08-15 17:50:03 +00:00
Brad Fitzpatrick 6ae8750e84 all: use Container-Optimized VMs instead of Kubernetes for buildlet containers
Fixes golang/go#25108

Change-Id: I084669b52b699700ed26a7fdd890d9205a8b9dc9
Reviewed-on: https://go-review.googlesource.com/111267
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-05-11 03:29:07 +00:00
Brad Fitzpatrick e6f02a5d40 internal/buildgo: add new Client type, move MakeBasepin code from buildongce
* adds new buildgo.Client type to hold primarily a
  buildenv.Environment and a google.Credentials

* starts to unify, document, and otherwise clean up
  how we look up authentication info.

* use the new buildgo.Client type in the "buildongce"
  (poorly named) cluster bootstrap tool.

* moves the "basepin" creation code from buildongce into
  a new method on buildgo.Client, for later use in the
  coordinator for golang/go#21305

Change-Id: I976b60c992bf67589e7f85fc5a9133f8cbb32e70
Reviewed-on: https://go-review.googlesource.com/112162
Reviewed-by: Bryan C. Mills <bcmills@google.com>
2018-05-08 21:12:55 +00:00
Brad Fitzpatrick a23395d5bb buildenv: add our GCP project numbers in addition to their named IDs
Updates golang/go#25108

Change-Id: I5a82a4b26407158cf24d770a887759f8335d6441
Reviewed-on: https://go-review.googlesource.com/111640
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-05-05 16:55:26 +00:00
Brad Fitzpatrick 9929a3fe2b all: move to using the oauth2/google.Credentials type, add buildenv accessor
This removes some duplication of scopes and how to get the
TokenSource and which credentials to use.

And update the coordinator deps, since its rev of
golang.org/x/oauth2/google was too old to have the new type.

I want to clean this up more, but I need to make some changes to
the oauth2/google package first. More later.

Change-Id: Ic2799ec2ec62f67c65de6380b373fe915a43003e
Reviewed-on: https://go-review.googlesource.com/111266
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2018-05-04 17:04:37 +00:00
Brad Fitzpatrick 73f88a6d4c all: add README.md files where missing, and tool to keep them updated
Change-Id: I385171c415bf168c04c6c3a7a996bff88964af84
Reviewed-on: https://go-review.googlesource.com/52856
Reviewed-by: Andrew Bonventre <andybons@golang.org>
2017-08-02 22:17:52 +00:00
Brad Fitzpatrick 4eceee2d0f cmd/coordinator, cmd/buildlet, cmd/gomote: add SSH support
This adds an SSH server to farmer.golang.org on port 2222 that proxies
SSH connections to users' gomote-created buildlet instances.

For example:

    $ gomote create openbsd-amd64-60
    user-bradfitz-openbsd-amd64-60-1

    $ gomote ssh user-bradfitz-openbsd-amd64-60-1
    Warning: Permanently added '[localhost]:33351' (ECDSA) to the list of known hosts.
    OpenBSD 6.0 (GENERIC.MP) golang/go#2319: Tue Jul 26 13:00:43 MDT 2016

    Welcome to OpenBSD: The proactively secure Unix-like operating system.

    Please use the sendbug(1) utility to report bugs in the system.
    Before reporting a bug, please try to reproduce it with the latest
    version of the code.  With bug reports, please try to ensure that
    enough information to reproduce the problem is enclosed, and if a
    known fix for it exists, include that as well.

    $

As before, if the coordinator process is restarted (or crashes, is
evicted, etc), all gomote instances die.

Not yet supported:

* scp (help wanted)
* not all host types are configured. most are. some will need slight
  config tweaks to the Docker image (e.g. adding openssh-server)

Currently supported:

* linux-amd64 (host type shared by 386, nacl)
* linux-arm
* linux-arm64
* darwin
* freebsd
* openbsd
* plan9-386
* windows

Implementation details:

* the ssh server process listens on port 2222 in the coordinator
  (farmer.golang.org), which is behind a GKE TCP load balancer.

* the ssh server library is github.com/gliderlabs/ssh

* authentication is done via GitHub users' public keys. It's assumed
  that gomote user == GitHub user. But there's a mapping in the code
  for known exceptions.

* we can't give out access to this too widely. too many things are
  accessible from within the host environment if you look in the right
  places. Details omitted. But the Go team and other trusted gomote
  users can use this.

* the buildlet binary has a new /connect-ssh handler that acts like a
  CONNECT request but instead of taking an explicit host:port, just
  says "give me your machine's SSH connection". The buildlet can also
  start sshd if needed for the environment. The /connect-ssh handler
  also installs the coordinator's public key.

* a new buildlet client library method "ConnectSSH" hits the /connect-ssh
  handler and returns a net.Conn.

* the coordinator's ssh.Handler is just running the OpenSSH ssh client.

* because the OpenSSH ssh child process can't connect to a net.Conn,
  an ephemeral localhost port is created on the coordinator to proxy
  between the ssh client and the net.Conn returned by ConnectSSH.

* The /connect-ssh handler requires http.Hijacker, which requires
  fully compliant net.Conn implementations as of Go 1.8. So I needed
  to flesh out revdial too, testing it with the
  golang.org/x/net/nettest package.

* plan9 doesn't have an ssh server, so we use 0intro's new conterm
  program (drawterm without GUI support) to connect to plan9 from the
  coordinator ssh proxy instead of using the OpenSSH ssh client
  binary.

* windows doesn't have an ssh server, so we enable the telnet service
  and the coordinator ssh proxy uses telnet instead on the backend
  on the private network. (There is a Windows ssh server but only in
  new versions.)

Happy debugging over ssh!

Fixes golang/go#19956

Change-Id: I80a62064c5f85af1f195f980c862ba29af4015f0
Reviewed-on: https://go-review.googlesource.com/50750
Reviewed-by: Herbie Ong <herbie@google.com>
Reviewed-by: Jessie Frazelle <me@jessfraz.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-07-28 18:21:11 +00:00
Quentin Smith 097978951d cmd/coordinator: run coordinator on Docker/GKE
- buildongce is taught how to create the go cluster
- config files for Docker services and pods
- Dockerfile updated to fix caching

Updates golang/go#18817

Change-Id: If25d41723996b92069117f4d75488202b63310ef
Reviewed-on: https://go-review.googlesource.com/42231
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-05-02 16:09:22 +00:00
Quentin Smith 0d86378958 cmd/coordinator: run benchmarks on try work
Benchmarks are treated as unit tests and distributed to the test
helpers, which allows them to fit in our 5m trybot budget.

Currently we only run the go1 and x/benchmarks. Running package
benchmarks is a TODO.

This feature is disabled by default, and is enabled by the
"farmer-run-bench" project attribute.

Updates golang/go#19178
Updates golang/go#19871

Change-Id: I9c3a14da60c3662e7e2cb4e71953060915cc4364
Reviewed-on: https://go-review.googlesource.com/38306
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-21 21:46:22 +00:00
Brad Fitzpatrick 8a696e555c cmd/coordinator: add LetsEncrypt support
The buildlets were partially updated to support this in
https://golang.org/cl/38792 but overlooked that ServerName == "go" was
still hard-coded. This CL also fixes that in the buildlet.

Fixes golang/go#16442

Change-Id: Ia2b794bdf9df8ab75875b9951b53a7bb5f5f6afe
Reviewed-on: https://go-review.googlesource.com/38798
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-02 17:11:27 +00:00
Quentin Smith 95595f7a2d buildenv: staging GKE needs smaller nodes
Change-Id: If5b496b55cfff83df157c611ec6f289fe745a8ac
Reviewed-on: https://go-review.googlesource.com/38295
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-17 20:10:04 +00:00
Brad Fitzpatrick 71265acedb all: adjust things for upgrade from GKE 1.2 to GKE 1.4
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.

The main issue is that Kubernetes doesn't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.

So we're going to run a daemonset (POD on all nodes) to reserve space
for Kubernetes. That's not in this CL.

But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.

Also, disable auto-scaling. It was off before by hand. Force it off
programmatically too. And make the node count 5, like it was by hand.

Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously and therefore pods never were being deleted.

Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-12-01 23:48:21 +00:00
Brad Fitzpatrick e4d08a5aad cmd/gomote, buildenv, buildlet: move config code to common places
Split off from Quentin's https://golang.org/cl/29399

Change-Id: I4578f8f485e97d6b9844fb12e84779167755752e
Reviewed-on: https://go-review.googlesource.com/29858
Reviewed-by: Quentin Smith <quentin@golang.org>
2016-09-27 18:47:08 +00:00
Quentin Smith d037c181fe buildenv, cmd/coordinator: make DashBase a property of the Environment
Change-Id: I6621cfaf12541f9910255048edd49301a1b2deba
Reviewed-on: https://go-review.googlesource.com/29398
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-09-26 16:47:25 +00:00
Brad Fitzpatrick 874eda751f cmd/coordinator: re-enable Kubernetes in staging
Change-Id: I06ac00b8be9717adf0add3f82e3663883825283a
Reviewed-on: https://go-review.googlesource.com/21761
Reviewed-by: Evan Brown <evanbrown@google.com>
2016-04-08 18:55:09 +00:00
Evan Brown 34ff1d9bc8 all: kubernetes builder autoscaling
Improvements to support rapid scheduling of many build jobs:

- Retry logic in Kubernetes client to handle sporadic connection
  closes from their API server under heavy load

- Cluster autoscaler scales on default CPU utilization metric

- Debug mode allows scheduling multiple builds to test scaling

- Account for scheduled vs. provisioned resources in a cluster
  and use that information to estimate when a build's pod
  will be scheduled and in running state

- Use estimated scheduled time to set context timeout

- Track pod lifecycle (requested time, estimated available time,
  actual available time, terminate time, etc)

Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-04-07 17:22:57 +00:00
Brad Fitzpatrick 08bad8ca42 buildenv: add a Development buildenv
Change-Id: I9a78632ad63a526397702b4ee1e3ec253a0e81e5
Reviewed-on: https://go-review.googlesource.com/21010
Reviewed-by: Andrew Gerrand <adg@golang.org>
2016-03-22 22:20:58 +00:00
Brad Fitzpatrick 44eb74d0cb buildenv: fix typos, clarify CoordinatorURL
Change-Id: I86b1545018abb208452c46797013b3a4bfd6b90d
Reviewed-on: https://go-review.googlesource.com/20979
Reviewed-by: Andrew Gerrand <adg@golang.org>
2016-03-22 09:44:49 +00:00
Andrew Gerrand 8e28dc9fa6 dashboard: rename Go14URL to goBootstrapURLTmpl and provide accessor
Also update all users, and rename another template field.
Includes changes to coordinator, gomote, and release.

Change-Id: I1c4408eadbcb83d61063a910dfa18cc395952bc2
Reviewed-on: https://go-review.googlesource.com/20976
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-21 22:19:10 +00:00
Evan Brown a731151878 all: consolidate configuration for coordinator and gomote
buildenv.Environment type defines configuration options:

- Coordinator uses the GCE project name to look up config. A custom
  config name can be provided at runtime to override.

- The conventional prod and stage project names ('symbolic-datum-552'
  and 'go-dashboard-dev') map to prod and staging configuration structs.

- Production and staging status is explicitly defined in configuration.

- GCS bucket names for buildlet, logs, and snapshots are
  configurable.
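
A rough sketch of the lookup described in the list above might look like this. The struct fields, the `lookup` function, and the bucket names are hypothetical placeholders chosen for illustration; only the two project names come from the message.

```go
package main

import "fmt"

// Environment is a pared-down, hypothetical stand-in for the
// configuration type described above.
type Environment struct {
	ProjectName string
	IsProd      bool   // production vs. staging is stated explicitly
	LogBucket   string // placeholder value, not a real bucket name
}

// byProject maps the conventional project names to their configs.
var byProject = map[string]*Environment{
	"symbolic-datum-552": {
		ProjectName: "symbolic-datum-552",
		IsProd:      true,
		LogBucket:   "example-prod-logs",
	},
	"go-dashboard-dev": {
		ProjectName: "go-dashboard-dev",
		IsProd:      false,
		LogBucket:   "example-staging-logs",
	},
}

// lookup returns the config named by override when it is non-empty,
// and otherwise the config for the detected project.
func lookup(project, override string) (*Environment, error) {
	name := project
	if override != "" {
		name = override
	}
	env, ok := byProject[name]
	if !ok {
		return nil, fmt.Errorf("no environment configured for %q", name)
	}
	return env, nil
}

func main() {
	env, err := lookup("symbolic-datum-552", "")
	if err != nil {
		panic(err)
	}
	fmt.Println(env.ProjectName, "prod:", env.IsProd)

	// A runtime override selects the staging config instead.
	env, err = lookup("symbolic-datum-552", "go-dashboard-dev")
	if err != nil {
		panic(err)
	}
	fmt.Println(env.ProjectName, "prod:", env.IsProd)
}
```

Keeping the production flag as an explicit field, rather than inferring it from the project name, means an override can never silently change which mode the code believes it is in.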

Change-Id: I7e6d7874eb0bdfe35dbdd5fcf6212ab50d576b88
Reviewed-on: https://go-review.googlesource.com/19502
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-04 22:24:11 +00:00
Evan Brown d79fb69108 all: create buildlets cluster
* Buildlets cluster can be created using buildongce program
* Deployment Manager builds cluster and autoscaler
* Describe the environments a build system can be deployed to
* Merges https://golang.org/cl/17827

Updates golang/go#12546

Change-Id: Icac4efab2e9e72932c66f56da319e4d51326455a
Reviewed-on: https://go-review.googlesource.com/17334
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-22 00:15:41 +00:00