Revert CL 173517 and replace with a similar but different mechanism.
Now that the new revdial is out, only penalize the old revdial users
(a few stragglers who haven't updated). In practice the limit of 10 at
once won't be a problem but will protect the coordinator during submit
floods.
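
A minimal sketch of the "limit of 10 at once" idea, using a buffered channel as a counting semaphore. The limiter type and its method names are hypothetical, and whether the coordinator blocks or rejects callers over the limit is not stated here; this version rejects them.

```go
package main

import "fmt"

// limiter caps how many holders may be active at once.
// The type and its use for old-revdial connections are illustrative assumptions.
type limiter struct {
	sem chan struct{}
}

// newLimiter returns a limiter that admits at most n concurrent holders.
func newLimiter(n int) *limiter {
	return &limiter{sem: make(chan struct{}, n)}
}

// tryAcquire takes a slot if one is free and reports whether it succeeded.
// It never blocks.
func (l *limiter) tryAcquire() bool {
	select {
	case l.sem <- struct{}{}:
		return true
	default:
		return false
	}
}

// release frees a slot previously obtained with tryAcquire.
func (l *limiter) release() { <-l.sem }

func main() {
	lim := newLimiter(10)
	admitted := 0
	for i := 0; i < 12; i++ {
		if lim.tryAcquire() {
			admitted++
		}
	}
	fmt.Println("admitted:", admitted) // admitted: 10
}
```

Only connections identified as old revdial would go through such a limiter; new revdial connections would bypass it entirely.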
Fixes golang/go#31639
Change-Id: I6b6c3567205fdd98e0b80def96d75827e986fe4f
Reviewed-on: https://go-review.googlesource.com/c/build/+/174325
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
To mitigate a bug in revdial that's retaining way too much memory and
causing coordinator evictions.
Also double memory limit for now.
Updates golang/go#31639
Change-Id: I19b6df92df9905b087c29884140096946037fa38
Reviewed-on: https://go-review.googlesource.com/c/build/+/173517
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Check when getting an oauth2.TokenSource, and when using it.
Change-Id: I1b240e5eafc23202d7d22d74d3f5cf9a59ca292e
Reviewed-on: https://go-review.googlesource.com/130315
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
"gcloud auth application-default login" is NOT the same as
"gcloud auth login"
... which is super confusing.
Add a check for users.
We should probably add this in more places over time. Baby steps.
Change-Id: I4584f5b84b941fa6893eccd09fd61e8850538607
Reviewed-on: https://go-review.googlesource.com/130195
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
It's better to authenticate with an identity (e.g. foo@google.com),
rather than all of us copy/pasting a service account JSON (a glorified
access token) around and the server not knowing who's who.
I was previously misled into using service accounts because the gcloud
tool (when run on a GCE VM) strongly urges you not to use user
accounts and says you should be using service accounts instead. But
turns out that's because it assumes you'd never use GCE for
development and only for deployment. That is, gcloud assumes you'd
only use gcloud on desktop/laptop, and then deploy your binary to GCE
where the application itself would use service accounts. (We do use
service accounts for the application, but I also develop on a GCE VM.)
Also, the precursor helper function to FindDefaultCredentials (for
"Application Default Credentials") had a different search order for
credentials and preferred the GCE VM's service account instead of the
user-specific credentials. Now that FindDefaultCredentials uses a good
order we can remove some of our old complexity.
Change-Id: Ia888e264cfb88e977f3ff1a3a4bb583db70466ab
Reviewed-on: https://go-review.googlesource.com/129416
Reviewed-by: Andrew Bonventre <andybons@golang.org>
Reviewed-by: Filippo Valsorda <filippo@golang.org>
* adds new buildgo.Client type to hold primarily a
buildenv.Environment and a google.Credentials
* starts to unify, document, and otherwise clean up
how we look up authentication info.
* use the new buildgo.Client type in the "buildongce"
(poorly named) cluster bootstrap tool.
* moves the "basepin" creation code from buildongce into
a new method on buildgo.Client, for later use in the
coordinator for golang/go#21305
Change-Id: I976b60c992bf67589e7f85fc5a9133f8cbb32e70
Reviewed-on: https://go-review.googlesource.com/112162
Reviewed-by: Bryan C. Mills <bcmills@google.com>
This removes some duplication of scopes and how to get the
TokenSource and which credentials to use.
And update the coordinator deps, since its rev of
golang.org/x/oauth2/google was too old to have the new type.
I want to clean this up more, but I need to make some changes to
the oauth2/google package first. More later.
Change-Id: Ic2799ec2ec62f67c65de6380b373fe915a43003e
Reviewed-on: https://go-review.googlesource.com/111266
Reviewed-by: Andrew Bonventre <andybons@golang.org>
This adds an SSH server to farmer.golang.org on port 2222 that proxies
SSH connections to users' gomote-created buildlet instances.
For example:
$ gomote create openbsd-amd64-60
user-bradfitz-openbsd-amd64-60-1
$ gomote ssh user-bradfitz-openbsd-amd64-60-1
Warning: Permanently added '[localhost]:33351' (ECDSA) to the list of known hosts.
OpenBSD 6.0 (GENERIC.MP) #2319: Tue Jul 26 13:00:43 MDT 2016
Welcome to OpenBSD: The proactively secure Unix-like operating system.
Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code. With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.
$
As before, if the coordinator process is restarted (or crashes, is
evicted, etc), all gomote instances die.
Not yet supported:
* scp (help wanted)
* not all host types are configured. most are. some will need slight
config tweaks to the Docker image (e.g. adding openssh-server)
Currently supported:
* linux-amd64 (host type shared by 386, nacl)
* linux-arm
* linux-arm64
* darwin
* freebsd
* openbsd
* plan9-386
* windows
Implementation details:
* the ssh server process listens on port 2222 in the coordinator
(farmer.golang.org), which is behind a GKE TCP load balancer.
* the ssh server library is github.com/gliderlabs/ssh
* authentication is done via GitHub users' public keys. It's assumed
that gomote user == GitHub user. But there's a mapping in the code
for known exceptions.
* we can't give out access to this too widely. too many things are
accessible from within the host environment if you look in the right
places. Details omitted. But the Go team and other trusted gomote
users can use this.
* the buildlet binary has a new /connect-ssh handler that acts like a
CONNECT request but instead of taking an explicit host:port, just
says "give me your machine's SSH connection". The buildlet can also
start sshd if needed for the environment. The /connect-ssh handler
also installs the coordinator's public key.
* a new buildlet client library method "ConnectSSH" hits the /connect-ssh
handler and returns a net.Conn.
* the coordinator's ssh.Handler is just running the OpenSSH ssh client.
* because the OpenSSH ssh child process can't connect to a net.Conn,
an ephemeral localhost port is created on the coordinator to proxy
between the ssh client and the net.Conn returned by ConnectSSH.
* The /connect-ssh handler requires http.Hijacker, which requires
fully compliant net.Conn implementations as of Go 1.8. So I needed
to flesh out revdial too, testing it with the
golang.org/x/net/nettest package.
* plan9 doesn't have an ssh server, so we use 0intro's new conterm
program (drawterm without GUI support) to connect to plan9 from the
coordinator ssh proxy instead of using the OpenSSH ssh client
binary.
* windows doesn't have an ssh server, so we enable the telnet service
and the coordinator ssh proxy uses telnet instead on the backend
on the private network. (There is a Windows ssh server but only in
new versions.)
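
The ephemeral localhost port described above can be sketched roughly as follows. This is an illustration under assumptions, not the coordinator's code: proxyViaLocalPort and roundTrip are hypothetical names, and net.Pipe with an echo loop stands in for the net.Conn that ConnectSSH returns.

```go
package main

import (
	"fmt"
	"io"
	"net"
)

// proxyViaLocalPort listens on an ephemeral localhost port and, for the first
// connection it accepts, copies bytes in both directions between that
// connection and backend. It returns the address a child process could dial.
func proxyViaLocalPort(backend net.Conn) (string, error) {
	ln, err := net.Listen("tcp", "localhost:0") // port 0: the kernel picks a free port
	if err != nil {
		return "", err
	}
	go func() {
		defer ln.Close()
		c, err := ln.Accept()
		if err != nil {
			return
		}
		defer c.Close()
		defer backend.Close()
		done := make(chan struct{}, 2)
		go func() { io.Copy(backend, c); done <- struct{}{} }()
		go func() { io.Copy(c, backend); done <- struct{}{} }()
		<-done // when either direction ends, tear down both sides
	}()
	return ln.Addr().String(), nil
}

// roundTrip sends msg through the proxy to an echoing backend and returns
// what comes back.
func roundTrip(msg string) (string, error) {
	backend, remote := net.Pipe()
	go io.Copy(remote, remote) // echo loop playing the role of the far end
	addr, err := proxyViaLocalPort(backend)
	if err != nil {
		return "", err
	}
	c, err := net.Dial("tcp", addr)
	if err != nil {
		return "", err
	}
	defer c.Close()
	if _, err := c.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello")
	fmt.Println(got, err)
}
```

In the real flow the child ssh process, rather than net.Dial, would connect to the returned address.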
Happy debugging over ssh!
Fixes golang/go#19956
Change-Id: I80a62064c5f85af1f195f980c862ba29af4015f0
Reviewed-on: https://go-review.googlesource.com/50750
Reviewed-by: Herbie Ong <herbie@google.com>
Reviewed-by: Jessie Frazelle <me@jessfraz.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
- buildongce is taught how to create the go cluster
- config files for Docker services and pods
- Dockerfile updated to fix caching
Updates golang/go#18817
Change-Id: If25d41723996b92069117f4d75488202b63310ef
Reviewed-on: https://go-review.googlesource.com/42231
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Benchmarks are treated as unit tests and distributed to the test
helpers, which allows them to fit in our 5m trybot budget.
Currently we only run the go1 and x/benchmarks. Running package
benchmarks is a TODO.
This feature is disabled by default, and is enabled by the
"farmer-run-bench" project attribute.
Updates golang/go#19178
Updates golang/go#19871
Change-Id: I9c3a14da60c3662e7e2cb4e71953060915cc4364
Reviewed-on: https://go-review.googlesource.com/38306
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The buildlets were partially updated to support this in
https://golang.org/cl/38792 but overlooked that ServerName == "go" was
still hard-coded. This CL also fixes that in the buildlet.
Fixes golang/go#16442
Change-Id: Ia2b794bdf9df8ab75875b9951b53a7bb5f5f6afe
Reviewed-on: https://go-review.googlesource.com/38798
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.
The main issue is that Kubernetes doesn't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.
So we're going to run a daemonset (a pod on all nodes) to reserve space
for Kubernetes. That's not in this CL.
But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.
Also, disable auto-scaling. It was off before by hand. Force it off
programmatically too. And make the node count 5, like it was by hand.
Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously and therefore pods never were being deleted.
Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Improvements to support rapid scheduling of many build jobs:
- Retry logic in Kubernetes client to handle sporadic connection
closes from their API server under heavy load
- Cluster autoscaler scales on default CPU utilization metric
- Debug mode allows scheduling multiple builds to test scaling
- Account for scheduled vs. provisioned resources in a cluster
and use that information to estimate when a build's pod
will be scheduled and in running state
- Use estimated scheduled time to set context timeout
- Track pod lifecycle (requested time, estimated available time,
actual available time, terminate time, etc)
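
As an illustration of the retry item above, here is a minimal retry-with-backoff sketch. The retry helper, the errConnClosed sentinel, and the doubling delay are assumptions made for the example; the actual Kubernetes client logic may differ.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errConnClosed is a hypothetical stand-in for a sporadic connection close.
var errConnClosed = errors.New("connection closed")

// retry calls fn up to attempts times, sleeping between failures and
// doubling the delay each time. It returns nil on the first success.
func retry(attempts int, delay time.Duration, fn func() error) error {
	if attempts < 1 {
		attempts = 1
	}
	var err error
	for i := 0; i < attempts; i++ {
		if err = fn(); err == nil {
			return nil
		}
		if i < attempts-1 {
			time.Sleep(delay)
			delay *= 2
		}
	}
	return fmt.Errorf("after %d attempts: %w", attempts, err)
}

func main() {
	calls := 0
	err := retry(5, time.Millisecond, func() error {
		calls++
		if calls < 3 {
			return errConnClosed // fail the first two calls
		}
		return nil
	})
	fmt.Println("calls:", calls, "err:", err) // calls: 3 err: <nil>
}
```

A production version would likely retry only on errors known to be transient and respect the request's context deadline.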
Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Also update all users, and rename another template field.
Includes changes to coordinator, gomote, and release.
Change-Id: I1c4408eadbcb83d61063a910dfa18cc395952bc2
Reviewed-on: https://go-review.googlesource.com/20976
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
buildenv.Environment type defines configuration options:
- Coordinator uses the GCE project name to look up config. A custom
config name can be provided at runtime to override.
- The conventional prod and stage project names ('symbolic-datum-552'
and 'go-dashboard-dev') map to prod and staging configuration structs.
- Production and staging status is explicitly defined in configuration.
- GCS bucket names for buildlet, logs, and snapshots are
configurable.
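
A rough sketch of how such a lookup might work. The Environment struct here is a pared-down hypothetical, not the real buildenv.Environment; the field names, the lookup function, and the bucket names are placeholders. Only the two project names come from the description above.

```go
package main

import "fmt"

// Environment is a hypothetical, simplified stand-in for buildenv.Environment.
type Environment struct {
	ProjectName    string
	IsProd         bool   // production vs. staging is stated explicitly
	BuildletBucket string // GCS bucket names below are placeholders
	LogBucket      string
	SnapBucket     string
}

var (
	prod = &Environment{
		ProjectName:    "symbolic-datum-552",
		IsProd:         true,
		BuildletBucket: "example-prod-buildlet",
		LogBucket:      "example-prod-logs",
		SnapBucket:     "example-prod-snapshots",
	}
	staging = &Environment{
		ProjectName:    "go-dashboard-dev",
		IsProd:         false,
		BuildletBucket: "example-dev-buildlet",
		LogBucket:      "example-dev-logs",
		SnapBucket:     "example-dev-snapshots",
	}
	byProject = map[string]*Environment{
		prod.ProjectName:    prod,
		staging.ProjectName: staging,
	}
)

// lookup returns the environment named by override if it is non-empty,
// otherwise the one matching the GCE project name. It returns nil if the
// chosen name is unknown.
func lookup(override, gceProject string) *Environment {
	name := gceProject
	if override != "" {
		name = override
	}
	return byProject[name]
}

func main() {
	env := lookup("", "symbolic-datum-552")
	fmt.Println(env.ProjectName, env.IsProd) // symbolic-datum-552 true
}
```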
Change-Id: I7e6d7874eb0bdfe35dbdd5fcf6212ab50d576b88
Reviewed-on: https://go-review.googlesource.com/19502
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* Buildlets cluster can be created using buildongce program
* Deployment Manager builds cluster and autoscaler
* Describe the environments a build system can be deployed to
* Merges https://golang.org/cl/17827
Updates golang/go#12546
Change-Id: Icac4efab2e9e72932c66f56da319e4d51326455a
Reviewed-on: https://go-review.googlesource.com/17334
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>