This CL skips deleting active remote buildlets.
The coordinator has multiple ways of tracking stale buildlets. For our
GCE buildlets, we periodically delete old VMs after their expiration
time, typically 45 minutes after their creation. The expiration tracking
in coordinator/gce.go does not account for remote buildlets, which are
buildlets created by users or cmd/release. Remote buildlets have their
own staleness checks and cleanup process, so we should skip the GCE
specific cleanup logic for them.
This adds an additional field to the buildlet Client in order to
correlate a GCE VM with a buildlet.
Updates golang/go#37001
Change-Id: Ib0acdf79c4dfbee6e0061c513f98b749d4b9cc64
Reviewed-on: https://go-review.googlesource.com/c/build/+/217722
Run-TryBot: Alexander Rakoczy <alex@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
buildlet should use a specific service account, allowing fine-grained
permission control that is different from the default service account
for whichever project they are run in.
Change-Id: I7a86308d6b65f370dfc49649ef10686d1d8b2974
Reviewed-on: https://go-review.googlesource.com/c/build/+/210958
Run-TryBot: Alexander Rakoczy <alex@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
This fixes stuff in CL 210498 and CL 210237.
I renamed the Zone field to ControlZone both to make it more clear and
to force compilation errors wherever Zone was used previously, which
revealed some things that were missed.
Updates golang/go#35987
Change-Id: I2f890727ece86d093a90a3b47701caa58de6ccbc
Reviewed-on: https://go-review.googlesource.com/c/build/+/210541
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
GCE zones can and will run out of resources. This will randomly
select a zone from a list of zones to deploy each new VM to.
Fixesgolang/go#35987
Change-Id: I57acad5c4e81d108f7db8f5bb1ff221a1845a422
Reviewed-on: https://go-review.googlesource.com/c/build/+/210237
Run-TryBot: Carlos Amedee <carlos@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Alexander Rakoczy <alex@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Also modernizes some code in the coordinator.
Updates golang/go#35354 (or fixes. But we could return more info.)
Change-Id: Ifc1aa85ca217a0932e388ec5d36ef0737b90c63d
Reviewed-on: https://go-review.googlesource.com/c/build/+/207841
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Maybe this will solve the golang/go#28365 problems. But at least it
gets us into codepaths that are known & trusted, and removes use of
deprecated API.
Also, add more logging to help debug golang/go#28365.
Updates golang/go#28365
Change-Id: Ibff2b03fd82573cbeedbbc22d12c30ae1a3c3aa0
Reviewed-on: https://go-review.googlesource.com/c/build/+/203217
Reviewed-by: Bryan C. Mills <bcmills@google.com>
Previously, we were invoking a single 'go test' run at the repository
root with the import path pattern of 'golang.org/x/{repo}/...'. This
pattern does not match packages that are located in nested modules
in the repository.
Look for go.mod files in all subdirectories of the repository to find
all inner modules. Then, run 'go test' inside each module root, thus
testing all packages in all modules of the repository. If one of the
test invocations fails, keep testing others, and report all failures.
When looking for inner modules, consider only those that have module
path that would not be ignored by the go tool and aren't vendored.
This way, go.mod files inside testdata directories aren't treated as
if they're proper modules.
This is being done only when the tests are running in module mode,
since module boundaries don't exist in GOPATH mode.
Fixesgolang/go#32528
Change-Id: I9f8558982885c9955d3b34127c80c485d713b380
Reviewed-on: https://go-review.googlesource.com/c/build/+/194559
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Binary releases need to build Go and include binaries such as bin/go,
bin/gofmt, and others. Previously, this was accomplished by running
all.bash script for some GOOS/GOARCH pairs, and make.bash for others
where it wasn't viable to run tests as part of the release process.
This change makes the release process more consistent by always
packaging the release archive file after running make.bash. We still
run all.bash in situations where it was previously run, but we do so
after the release file has already been created. This avoids the
risk of any changes to GOROOT that may occur as part of all.bash
(including changing file permissions to be read-only) being included
in the final release file.
Add a step to check that files in the buildlet's $WORKDIR/go and
$WORKDIR/go/bin directories have expected permissions before
creating the release file.
Fixesgolang/go#33537
Updates golang/go#30316
Change-Id: I7d40716dba656a8aca711377f2995df4880166c5
Reviewed-on: https://go-review.googlesource.com/c/build/+/189537
Reviewed-by: Andrew Bonventre <andybons@golang.org>
The debugnewvm command was printing out the wrong VM name, since the
old method had also assumed (but not documented) that it only applied
to nested virtualization containers. But remove that requirement and
make the empty string mean unspecified instead (which we currently
mean to use Container-Optimized OS) role and then use its new
definition.
Change-Id: Ieca138285aa567b1c24d585c5aa180f8a1534154
Reviewed-on: https://go-review.googlesource.com/c/build/+/177919
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
5 seconds is too slow on linux-arm it seems (starting sshd,
conditionally creating a host key, etc)
Change-Id: I2015908eb892d0b2dd580db8ba96c10726502cbb
Reviewed-on: https://go-review.googlesource.com/c/build/+/175998
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The old revdial has a simple multiplexing protocol that was like
HTTP/2 but without flow control, etc. But it was too simple (no flow
control) and too complex. Instead, just use one TCP connection per
reverse dialed connection. For now, the NAT'ed machine needs to go
re-connect for each incoming connection, but in practice that's just
once.
The old implementation is retained for now until all the buildlets are
updated.
Updates golang/go#31639
Change-Id: Id94c98d2949e695b677531b1221a827573543085
Reviewed-on: https://go-review.googlesource.com/c/build/+/174082
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
This adds scripts to create a new builder host image that acts like
Container-Optimized OS (has docker, runs konlet on startup) but with a
Debian 9 kernel + userspace that permits KVM for nested
virtualization.
Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)
Change-Id: Ib1d3a250556703856083c222be2a70c4e8d91884
Reviewed-on: https://go-review.googlesource.com/c/163301
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
This adds a linux-amd64 COS builder that should be just like our
existing linux-amd64 COS builder except that it's using a forked image
that has the VMX license bit enabled for nested virtualization. (GCE
appears to be using the license mechanism as some sort of opt-in
mechanism for features that aren't yet GA; might go away?)
Once this is in, it won't do any new builds as regular+trybot builders
are disabled. But it means I can then use gomote + debugnewvm to work
on preparing the other four image types.
Updates golang/go#15581 (solaris)
Updates golang/go#23060 (dragonfly)
Updates golang/go#30262 (riscv)
Updates golang/go#30267 (fuchsia)
Updates golang/go#23824 (android)
Change-Id: Ic55f17eea17908dba7f58618d8cd162a2ed9b015
Reviewed-on: https://go-review.googlesource.com/c/162959
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Once containers run on COS instead of Kubernetes, one name (Kube*) is
wrong and the other (GCE) is ambiguous. So rename them now to be more
specific.
No behavior changes. Just renaming in this step, to reduce size of
next CL.
Updates golang/go#25108
Change-Id: Ib09eb682ef74acbbf6ed50b46074f834ef5e0c0b
Reviewed-on: https://go-review.googlesource.com/111639
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This removes some duplication of scopes and how to get the
TokenSource and which credentials to use.
And update the coordinator deps, since its rev of
golang.org/x/oauth2/google was too old to have the new type.
I want to clean this up more, but I need to make some changes to to
the oauth2/google package first. More later.
Change-Id: Ic2799ec2ec62f67c65de6380b373fe915a43003e
Reviewed-on: https://go-review.googlesource.com/111266
Reviewed-by: Andrew Bonventre <andybons@golang.org>
This adds an SSH server to farmer.golang.org on port 2222 that proxies
SSH connections to users' gomote-created buildlet instances.
For example:
$ gomote create openbsd-amd64-60
user-bradfitz-openbsd-amd64-60-1
$ gomote ssh user-bradfitz-openbsd-amd64-60-1
Warning: Permanently added '[localhost]:33351' (ECDSA) to the list of known hosts.
OpenBSD 6.0 (GENERIC.MP) golang/go#2319: Tue Jul 26 13:00:43 MDT 2016
Welcome to OpenBSD: The proactively secure Unix-like operating system.
Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code. With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.
$
As before, if the coordinator process is restarted (or crashes, is
evicted, etc), all gomote instances die.
Not yet supported:
* scp (help wanted)
* not all host types are configured. most are. some will need slight
config tweaks to the Docker image (e.g. adding openssh-server)
Supports currently:
* linux-amd64 (host type shared by 386, nacl)
* linux-arm
* linux-arm64
* darwin
* freebsd
* openbsd
* plan9-386
* windows
Implementation details:
* the ssh server process listens on port 2222 in the coordinator
(farmer.golang.org), which is behind a GKE TCP load balancer.
* the ssh server library is github.com/gliderlabs/ssh
* authentication is done via Github users' public keys. It's assumed
that gomote user == github user. But there's a mapping in the code
for known exceptions.
* we can't give out access to this too widely. too many things are
accessible from within the host environment if you look in the right
places. Details omitted. But the Go team and other trusted gomote
users can use this.
* the buildlet binary has a new /connect-ssh handler that acts like a
CONNECT request but instead of taking an explicit host:port, just
says "give me your machine's SSH connection". The buildlet can also
start sshd if needed for the environment. The /connect-ssh handler
also installs the coordinator's public key.
* a new buildlet client library method "ConnectSSH" hits the /connect-ssh
handler and returns a net.Conn.
* the coordinator's ssh.Handler is just running the OpenSSH ssh client.
* because the OpenSSH ssh child process can't connect to a net.Conn,
an emphemeral localhost port is created on the coordinator to proxy
between the ssh client and the net.Conn returned by ConnectSSH.
* The /connect-ssh handler requires http.Hijacker, which requires
fully compliant net.Conn implementations as of Go 1.8. So I needed
to flesh out revdial too, testing it with the
golang.org/x/net/nettest package.
* plan9 doesn't have an ssh server, so we use 0intro's new conterm
program (drawterm without GUI support) to connect to plan9 from the
coordinator ssh proxy instead of using the OpenSSH ssh client
binary.
* windows doesn't have an ssh server, so we enable the telnet service
and the coordinator ssh proxy uses telnet instead on the backend
on the private network. (There is a Windows ssh server but only in
new versions.)
Happy debugging over ssh!
Fixesgolang/go#19956
Change-Id: I80a62064c5f85af1f195f980c862ba29af4015f0
Reviewed-on: https://go-review.googlesource.com/50750
Reviewed-by: Herbie Ong <herbie@google.com>
Reviewed-by: Jessie Frazelle <me@jessfraz.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
My gomote username is josharian.
My whoami is josh.
As a result, I need to run 'gomote -user=josharian'
every time, which is an annoyance.
Add a simple config file to allow a user to change their user name.
On startup, gomote checks whether user-$(whoami).user exists.
If so, it reads it to get the real username.
So I have a config file ~/.config/gomote/user-josh.user
with the contents 'josharian'.
It's a bit of an ugly hack, but it works and spares me some pain.
Change-Id: I372d5a786b99c9e3c6a57f25b0a38b9146f23598
Reviewed-on: https://go-review.googlesource.com/42230
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Replace oauth2.NoContext (deprecated) with context.Background(),
which has been available for two consecutive releases.
Add more cloud.google.com/go packages to the cmd/coordinator
Dockerfile to fix an error building cmd/coordinator. A dependency is
not present in master of cloud.google.com/go, but was present in the
older revision, and was not getting checked out correctly during the
"go get" step. In addition, we were failing to fetch dependencies for
some packages that coordinator depends on. I added instructions for
hopefully doing this more systematically in the future.
Fix the gitmirror Dockerfile which has the same problem.
Change-Id: Id6c2220482350a686b87742ec7915c457a689e52
Reviewed-on: https://go-review.googlesource.com/40852
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This lets us set a timeout on the HTTP request, which lets us time out the
writeSnapshot call in cmd/coordinator.
Fixesgolang/go#18812.
Change-Id: I370448df4d95130c9c5b30ba32459ce844a6c967
Reviewed-on: https://go-review.googlesource.com/36897
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
We hit GKE bugs and changes when upgrading from GKE 1.2 to 1.4.
The main issue is that Kubernetes does't reserve CPU or memory for
itself on nodes, so things were OOMing and getting killed. And when
Docker or Kubernetes got killed themselves, they were wedging and not
recovering.
So we're going to run a daemonset (POD on all nodes) to reserve space
for Kubernetes for it. That's not in this CL.
But this CL got us limping along and was already in production. It
doubles resource RAM usage for jobs, so fewer things schedule per node.
While we're at it, let jobs use more CPU if it's available.
Also, disable auto-scaling. It was off before by hand. Force it off
programatically too. And make the node count 5, like it was by hand.
Also, force un-graceful pod deletes, since GKE 1.3 or something
introduced a graceful-vs-ungraceful distinction, which we weren't
handling previously and therefore pods never were being deleted.
Change-Id: I3606e4e2e92c496d8194503d510921bd1614d34e
Reviewed-on: https://go-review.googlesource.com/33490
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Our builders are named of the form "GOOS-GOARCH" or
"GOOS-GOARCH-suffix".
Over time we've grown many builders. This CL doesn't change
that. Builders continue to be named and operate as before.
Previously the build configuration file (dashboard/builders.go) made
each builder type ("linux-amd64-race", etc) define how to create a
host running a buildlet of that type, even though many builders had
identical host configs. For example, these builders all share the same
host type (a Kubernetes container):
linux-amd64
linux-amd64-race
linux-386
linux-386-387
And these are the same host type (a GCE VM):
windows-amd64-gce
windows-amd64-race
windows-386-gce
This CL creates a new concept of a "hostType" which defines how
the buildlet is created (Kube, GCE, Reverse, and how), and then each
builder itself references a host type.
Users never see the hostType. (except perhaps in gomote list output)
But they at least never need to care about them.
Reverse buildlets now can only be one hostType at a time, which
simplifies things. We were no longer using multiple roles per machine
once moving to VMs for OS X.
gomote continues to operate as it did previously but its underlying
protocol changed and clients will need to be updated. As a new
feature, gomote now has a new flag to let you reuse a buildlet host
connection for different builder rules if they share the same
underlying host type. But users can ignore that.
This CL is a long-standing TODO (previously attempted and aborted) and
will make many things easier and faster, including the linux-arm
cross-compilation effort, and keeping pre-warmed buildlets of VM types
ready to go.
Updates golang/go#17104
Change-Id: Iad8387f48680424a8441e878a2f4762bf79ea4d2
Reviewed-on: https://go-review.googlesource.com/29551
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Also, add count of fds and goroutines to the coordinator's status
page.
Change-Id: I857e609623cfa280716d5d079180d0e4021d0bac
Reviewed-on: https://go-review.googlesource.com/27550
Reviewed-by: Quentin Smith <quentin@golang.org>
ctx was initialized in the wrong place.
Change-Id: I7fb3c56071a3e4d5cc2199e07fcc4b8ecbc7a674
Reviewed-on: https://go-review.googlesource.com/22967
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Improvements to support rapid scheduling of many build jobs:
- Retry logic in Kubernetes client to handle sporadic connection
closes from their API server under heavy load
- Cluster autoscaler scales on default CPU utilization metric
- Debug mode allows scheduling multiple builds to test scaling
- Account for scheduled vs. provisioned resources in a cluster
and use that information to estimate when a build's pod
will be scheduled and in running state
- Use estimated scheduled time to set context timeout
- Track pod lifecycle (requested time, estimated available time,
actual available time, terminate time, etc)
Change-Id: I14d6c5e01af0970dbb3390a29d1ee5c43049fff8
Reviewed-on: https://go-review.googlesource.com/19524
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
buildenv.Environment type defines configuration options:
- Coordinator uses the GCE project name to lookup config. A custom
config name can be provided at runtime to override.
- The conventional prod and stage project names ('symbolic-datum-552'
and 'go-dashboard-dev') map to prod and staging configuration structs.
- Production and staging status is explicitly defined in configuration.
- GCS bucket names for buildlet, logs, and snapshots are
configurable.
Change-Id: I7e6d7874eb0bdfe35dbdd5fcf6212ab50d576b88
Reviewed-on: https://go-review.googlesource.com/19502
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* status page shows kube pool details
* pods created by the coorindator are tracked
* pods that fail to create are deleted
* pods older than delete-at are deleted
* pods created by a different coordinator are deleted
Updates golang/go#12546
Change-Id: I4c4f8ff906962b4a014a66d0a9d490ff17710d62
Reviewed-on: https://go-review.googlesource.com/16101
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* Replaced cancel with context.Context
* StartPod can be canceled
* Wait for buildlet to come online, but fail fast if pod fails first
* Support timeout waiting for pod to leave pending phase
* Use Kubernetes watch API (long poll)
Updates golang/go#12546
Change-Id: I792a3b8fed615362a0290feee7de0c2cefe43c0e
Reviewed-on: https://go-review.googlesource.com/15285
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
* set correct metadata as env vars on each container in pod
* stage0 detects when running in a pod
* add scope to support farmer on GCE communicating with Google
Container Engine API
* dev mode supports GCE and Kubernetes even when not running
on GCE
* pod buildlet returns a configured http.Client
* pod buildlets work, but pods are not removed after completion
(or on creation failure)
Updates golang/go#12546
Change-Id: If91673b49223130c1e7077c130f1abe1e7966d02
Reviewed-on: https://go-review.googlesource.com/15041
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Find the default Kubernetes cluster and configure a client to talk to it.
Use application default credentials.
Updates golang/go#12546
Change-Id: Ifb1ce57f52f4fbbee3267f8cc3cf02a78146bd5b
Reviewed-on: https://go-review.googlesource.com/14532
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>