Bug 1523321 - Add the wrupdater code to sync m-c's copy of WR to github. r=jrmuizel

Differential Revision: https://phabricator.services.mozilla.com/D33547

--HG--
extra : moz-landing-system : lando
This commit is contained in:
Kartikaya Gupta 2019-07-15 21:31:46 +00:00
Parent 0204171ff0
Commit 4be1661f49
5 changed files: 591 additions and 0 deletions


@@ -0,0 +1,46 @@
This folder contains scripts to sync the WebRender code from
mozilla-central to Github. The scripts in this folder were derived
from the code at `https://github.com/staktrace/wrupdater`; the main
difference is that the versions in this folder are designed to run
as a taskcluster task. The versions in this folder are canonical
going forward; the `staktrace/wrupdater` Github repo will continue
to exist only as a historical archive.

The main entry point is the `sync-to-github.sh` script. It creates a
staging directory at `~/.wrupdater` if one doesn't already exist,
and clones the `webrender` repo into it. The script also requires the
`GECKO_PATH` environment variable to point to a mercurial clone of
`mozilla-central`, and access to the taskcluster secrets service to
get a Github API token.

The script does some setup steps, but the bulk of the actual work
is done by the `converter.py` script. That script scans the mercurial
repository for new changes to the `gfx/wr` folder, and adds commits to
the git repository corresponding to those changes. Some implementation
details make it more robust than simply exporting patches and
attempting to reapply them; in particular, it builds a commit tree
structure that mirrors what is found in the `mozilla-central`
repository with respect to branches and merges. So if conflicting
changes land on autoland and inbound, and then get merged, the git
repository commits will have the same structure, with a fork/merge in
the commit history. This was found to be necessary after a previous
version ran into multiple cases where the simple patch approach
didn't really work.

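The branch-mirroring idea can be illustrated with a toy model (all names
here are illustrative; the real logic lives in `converter.py`'s
`prune_boring`). Boring ancestors are bypassed, but the fork/merge shape
of the DAG is preserved:

```python
# rev -> (parents, touches_wr): a merge "m" of two branches, where "a1"
# is a boring (non-WR, non-merge) changeset that should be bypassed.
dag = {
    "m": (["a2", "b1"], False),   # merge of the two codelines
    "a2": (["a1"], True),
    "a1": (["base"], False),      # boring: pruned out of the git history
    "b1": (["base"], True),
    "base": ([], True),
}

def interesting_parents(rev):
    """Rewrite each parent edge to the nearest interesting ancestor."""
    result = []
    for parent in dag[rev][0]:
        parents, touches = dag[parent]
        while not touches and len(parents) == 1:
            parent = parents[0]
            parents, touches = dag[parent]
        result.append(parent)
    return result

print(interesting_parents("m"))   # ['a2', 'b1'] -> fork/merge kept
print(interesting_parents("a2"))  # ['base']     -> boring 'a1' bypassed
```
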
Once the converter is done converting, the `sync-to-github.sh` script
finishes the process by pushing the new commits to `moz-gfx/webrender`
and generating a pull request against `servo/webrender`. It also
leaves a comment on the PR that triggers testing and merge of the PR.
If there is already a pull request (perhaps from a previous run), the
pre-existing PR is force-updated instead. This allows for graceful
handling of scenarios where the PR failed to get merged (e.g. due to
CI failures on the Github side).

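The create-or-update decision reduces to checking whether Github already
lists an open PR for the `moz-gfx:wrupdater` head. A sketch of that logic
(hypothetical `pr_comments_url` helper; the real script shells out to
`read-json.py` to extract `0/comments_url` from the API response):

```python
import json

def pr_comments_url(pr_list_json):
    """Return the comments_url of the first open PR, or None if a new
    PR needs to be created."""
    prs = json.loads(pr_list_json)
    return prs[0]["comments_url"] if prs else None

print(pr_comments_url('[]'))  # None -> create a fresh PR
print(pr_comments_url('[{"comments_url": "https://example.com/c"}]'))
```
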
The script is intended to be run by taskcluster for any changes
landing on `mozilla-central` that touch the `gfx/wr` folder. This may
mean that multiple instances of this script run concurrently, or even
out of order (i.e. the task for an older m-c push runs after the task
for a newer m-c push). The script was written with these possibilities
in mind and should be able to eventually recover from any such
scenario automatically (although it may take additional changes to
mozilla-central for such recovery to occur).
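This recovery works because the converter is stateless: on each run it
rebuilds its hg-to-git mapping by scanning commit messages for the marker
it appends. A sketch of that marker parsing (the regex mirrors the one in
`converter.py`; the hex revision below is made up):

```python
import re

# Marker appended to every synced git commit; MULTILINE lets ^/$ match
# the marker line inside a larger commit message.
MARKER = re.compile(
    r"^\[wrupdater\] From https://hg\.mozilla\.org/mozilla-central/rev/([0-9a-fA-F]+)$",
    re.MULTILINE)

msg = ("Bug 1234 - Example change. r=reviewer\n\n"
       "[wrupdater] From https://hg.mozilla.org/mozilla-central/rev/abc123def456\n")
m = MARKER.search(msg)
print(m.group(1))  # abc123def456
```
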


@@ -0,0 +1,357 @@
#!/usr/bin/env python3

import os
import re
import subprocess
import sys

import pygit2
import requests
import hglib

DEBUG = False
def eprint(*args, **kwargs):
    print(*args, file=sys.stderr, **kwargs)


def debugprint(*args, **kwargs):
    if DEBUG:
        eprint(*args, **kwargs)


class HgCommit:
    def __init__(self, parent1, parent2):
        self.parents = []
        if parent1 == NULL_PARENT_REV:
            raise Exception("Encountered a hg changeset with no parents! We don't handle this....")
        self.parents.append(parent1)
        if parent2 != NULL_PARENT_REV:
            self.parents.append(parent2)
        self.touches_wr_code = False
        self.children = []

    def add_child(self, rev):
        self.children.append(rev)


class GitCommit:
    def __init__(self, hg_rev, commit_obj):
        self.hg_rev = hg_rev
        self.commit_obj = commit_obj
def load_git_repository():
    commit_map = dict()
    re_commitmsg = re.compile(
        r"^\[wrupdater\] From https://hg.mozilla.org/mozilla-central/rev/([0-9a-fA-F]+)$",
        re.MULTILINE)
    for commit in webrender_git_repo.walk(webrender_git_repo.head.target):
        m = re_commitmsg.search(commit.message)
        if not m:
            continue
        hg_rev = m.group(1)
        commit_map[hg_rev] = GitCommit(hg_rev, commit)
        debugprint("Loaded pre-existing commit hg %s -> git %s" % (hg_rev, commit.oid))
    return commit_map
def timeof(git_commit):
    return git_commit.commit_obj.commit_time + git_commit.commit_obj.commit_time_offset


def find_newest_commit(commit_map):
    newest_hg_rev = None
    newest_commit_time = None
    for hg_rev, git_commit in commit_map.items():
        if newest_hg_rev is None:
            newest_hg_rev = hg_rev
            newest_commit_time = timeof(git_commit)
        elif timeof(git_commit) > newest_commit_time:
            newest_hg_rev = hg_rev
            newest_commit_time = timeof(git_commit)
    return newest_hg_rev
def get_single_rev(revset):
    output = subprocess.check_output(['hg', 'log', '-r', revset, '--template', '{node}'])
    output = str(output, "ascii")
    return output


def get_multiple_revs(revset, template):
    output = subprocess.check_output(['hg', 'log', '-r', revset, '--template', template + '\\n'])
    for line in output.splitlines():
        yield str(line, "ascii")


def get_base_hg_rev(commit_map):
    base_hg_rev = find_newest_commit(commit_map)
    eprint("Using %s as base hg revision" % base_hg_rev)
    return base_hg_rev
def load_hg_commits(commits, query):
    for cset in get_multiple_revs(query, '{node} {p1node} {p2node}'):
        tokens = cset.split()
        commits[tokens[0]] = HgCommit(tokens[1], tokens[2])
    return commits
def get_real_base_hg_rev(hg_data, commit_map):
    # Some of the WR commits we want to port to github may have landed on codelines
    # that branched off central prior to base_hg_rev. So when we create the git
    # equivalents, they will have parents that are not the HEAD of the git repo,
    # but instead will be descendants of older commits in the git repo. In order
    # to do this correctly, we need to find the hg-equivalents of all of those
    # possible git parents. So first we identify all the "tail" hg revisions in
    # our hg_data set (think "tail" as in opposite of "head", which is the tipmost
    # commit). The "tail" hg revisions are the ones for which we don't have their
    # ancestors in hg_data.
    tails = []
    for (rev, cset) in hg_data.items():
        for parent in cset.parents:
            if parent not in hg_data:
                tails.append(rev)
    eprint("Found hg tail revisions %s" % tails)
    # Then we find their common ancestor, which will be some ancestor of base_hg_rev
    # from which those codelines branched off.
    if len(tails) == 0:
        common_ancestor = get_single_rev('.')
    else:
        common_ancestor = get_single_rev('ancestor(' + ','.join(tails) + ')')
    eprint("Found common ancestor of tail revisions: %s" % common_ancestor)
    # And then we find the newest git commit whose hg-equivalent is an ancestor of
    # that common ancestor, to make sure we are starting from a known hg/git
    # commit pair.
    for git_commit in sorted(commit_map.values(), key=timeof, reverse=True):
        new_base = get_single_rev('ancestor(' + common_ancestor + ',' + git_commit.hg_rev + ')')
        if new_base == common_ancestor:
            eprint("Pre-existing WR commit %s from hg rev %s is descendant of common ancestor; walking back further..." % (git_commit.commit_obj.id, git_commit.hg_rev))
            continue
        if new_base != git_commit.hg_rev:
            eprint("Pre-existing WR commit %s from hg rev %s is on sibling branch of common ancestor; walking back further..." % (git_commit.commit_obj.id, git_commit.hg_rev))
            continue
        eprint("Pre-existing WR commit %s from hg rev %s is sufficiently old; stopping walk" % (git_commit.commit_obj.id, git_commit.hg_rev))
        common_ancestor = new_base
        break
    return common_ancestor
# Now we prune out all the uninteresting changesets from hg_commits. The
# uninteresting ones are ones that don't touch WR code and are not merges. We
# do this by rewriting the parents to the "interesting" ancestor.
def prune_boring(rev):
    while rev in hg_commits:
        parent_pruned = False
        for i in range(len(hg_commits[rev].parents)):
            parent_rev = hg_commits[rev].parents[i]
            if parent_rev not in hg_commits:
                continue
            if hg_commits[parent_rev].touches_wr_code:
                continue
            if len(hg_commits[parent_rev].parents) > 1:
                continue
            # If we get here, then `parent_rev` is a boring revision and we can
            # prune it. Connect `rev` to its grandparent, and prune the parent
            grandparent_rev = hg_commits[parent_rev].parents[0]
            hg_commits[rev].parents[i] = grandparent_rev
            # eprint("Pruned %s as boring parent of %s, using %s now" % (parent_rev, rev, grandparent_rev))
            parent_pruned = True
        if parent_pruned:
            # If we pruned a parent, process `rev` again as we might want to
            # prune more parents
            continue
        # If we get here, all of `rev`'s parents are interesting, so we can't
        # prune them. Move up to the parent rev and start processing that, or
        # if we have multiple parents then recurse on those nodes.
        if len(hg_commits[rev].parents) == 1:
            rev = hg_commits[rev].parents[0]
            continue
        for parent_rev in hg_commits[rev].parents:
            prune_boring(parent_rev)
        return
class FakeCommit:
    def __init__(self, oid):
        self.oid = oid


def fake_commit(hg_rev, parent1, parent2):
    if parent1 is None:
        eprint("ERROR: Trying to build on None")
        exit(1)
    oid = "githash_%s" % hash(parent1)
    eprint("Fake-built %s" % oid)
    return FakeCommit(oid)
def build_tree(builder, treedata):
    for (name, value) in treedata.items():
        if isinstance(value, dict):
            subbuilder = webrender_git_repo.TreeBuilder()
            build_tree(subbuilder, value)
            builder.insert(name, subbuilder.write(), pygit2.GIT_FILEMODE_TREE)
        else:
            (filemode, contents) = value
            blob_oid = webrender_git_repo.create_blob(contents)
            builder.insert(name, blob_oid, filemode)


def author_to_signature(author):
    pieces = author.strip().split('<')
    if len(pieces) != 2 or pieces[1][-1] != '>':
        # We could probably handle this better
        return pygit2.Signature(author, '')
    name = pieces[0].strip()
    email = pieces[1][:-1].strip()
    return pygit2.Signature(name, email)
def real_commit(hg_rev, parent1, parent2):
    filetree = dict()
    manifest = mozilla_hg_repo.manifest(rev=hg_rev)
    for (nodeid, permission, executable, symlink, filename) in manifest:
        if not filename.startswith(b'gfx/wr/'):
            continue
        if symlink:
            filemode = pygit2.GIT_FILEMODE_LINK
        elif executable:
            filemode = pygit2.GIT_FILEMODE_BLOB_EXECUTABLE
        else:
            filemode = pygit2.GIT_FILEMODE_BLOB
        filecontent = mozilla_hg_repo.cat([filename], rev=hg_rev)
        subtree = filetree
        for component in filename.split(b'/')[2:-1]:
            subtree = subtree.setdefault(component.decode("latin-1"), dict())
        filename = filename.split(b'/')[-1]
        subtree[filename.decode("latin-1")] = (filemode, filecontent)
    builder = webrender_git_repo.TreeBuilder()
    build_tree(builder, filetree)
    tree_oid = builder.write()
    parent1_obj = webrender_git_repo.get(parent1)
    if parent1_obj.tree_id == tree_oid:
        eprint("Early-exit; tree matched that of parent git commit %s" % parent1)
        return parent1_obj
    if parent2 is not None:
        parent2_obj = webrender_git_repo.get(parent2)
        if parent2_obj.tree_id == tree_oid:
            eprint("Early-exit; tree matched that of parent git commit %s" % parent2)
            return parent2_obj
    hg_rev_obj = mozilla_hg_repo.log(revrange=hg_rev, limit=1)[0]
    commit_author = hg_rev_obj[4].decode("latin-1")
    commit_message = hg_rev_obj[5].decode("latin-1")
    commit_message += '\n\n[wrupdater] From https://hg.mozilla.org/mozilla-central/rev/%s' % hg_rev + '\n'
    parents = [parent1]
    if parent2 is not None:
        parents.append(parent2)
    commit_oid = webrender_git_repo.create_commit(
        None,
        author_to_signature(commit_author),
        pygit2.Signature('wrupdater', 'graphics-team@mozilla.staktrace.com'),
        commit_message,
        tree_oid,
        parents,
    )
    eprint("Built git commit %s" % commit_oid)
    return webrender_git_repo.get(commit_oid)
def try_commit(hg_rev, parent1, parent2=None):
    if False:  # flip to True to dry-run with fake commits
        return fake_commit(hg_rev, parent1, parent2)
    else:
        return real_commit(hg_rev, parent1, parent2)


def build_git_commits(rev):
    debugprint("build_git_commit(%s)..." % rev)
    if rev in hg_to_git_commit_map:
        debugprint(" maps to %s" % hg_to_git_commit_map[rev].commit_obj.oid)
        return hg_to_git_commit_map[rev].commit_obj.oid
    if rev not in hg_commits:
        debugprint(" not in hg_commits")
        return None
    if len(hg_commits[rev].parents) == 1:
        git_parent = build_git_commits(hg_commits[rev].parents[0])
        if not hg_commits[rev].touches_wr_code:
            eprint("WARNING: Found rev %s that is non-merge and non-WR" % rev)
            return git_parent
        eprint("Building git equivalent for %s on top of %s" % (rev, git_parent))
        commit_obj = try_commit(rev, git_parent)
        hg_to_git_commit_map[rev] = GitCommit(rev, commit_obj)
        debugprint(" built %s as %s" % (rev, commit_obj.oid))
        return commit_obj.oid
    git_parent_1 = build_git_commits(hg_commits[rev].parents[0])
    git_parent_2 = build_git_commits(hg_commits[rev].parents[1])
    if git_parent_1 is None or git_parent_2 is None or git_parent_1 == git_parent_2:
        git_parent = git_parent_1 if git_parent_2 is None else git_parent_2
        if not hg_commits[rev].touches_wr_code or git_parent is None:
            debugprint(" %s is merge with no parents or doesn't touch WR, returning %s" % (rev, git_parent))
            return git_parent
        eprint("WARNING: Found merge rev %s whose parents have identical WR code, but modifies WR" % rev)
        eprint("Building git equivalent for %s on top of %s" % (rev, git_parent))
        commit_obj = try_commit(rev, git_parent)
        hg_to_git_commit_map[rev] = GitCommit(rev, commit_obj)
        debugprint(" built %s as %s" % (rev, commit_obj.oid))
        return commit_obj.oid
    # An actual merge
    eprint("Building git equivalent for %s on top of %s, %s" % (rev, git_parent_1, git_parent_2))
    commit_obj = try_commit(rev, git_parent_1, git_parent_2)
    hg_to_git_commit_map[rev] = GitCommit(rev, commit_obj)
    debugprint(" built %s as %s" % (rev, commit_obj.oid))
    return commit_obj.oid
if len(sys.argv) < 2:
    eprint("Usage: %s <path-to-webrender-git-repo>" % sys.argv[0])
    eprint("Current dir must be the mozilla hg repo")
    exit(1)

webrender_git_path = sys.argv[1]
mozilla_hg_path = os.getcwd()
NULL_PARENT_REV = '0000000000000000000000000000000000000000'

webrender_git_repo = pygit2.Repository(pygit2.discover_repository(webrender_git_path))
mozilla_hg_repo = hglib.open(mozilla_hg_path)

hg_to_git_commit_map = load_git_repository()
base_hg_rev = get_base_hg_rev(hg_to_git_commit_map)

hg_commits = load_hg_commits(dict(), 'only(.,' + base_hg_rev + ')')
eprint("Initial set has %s changesets" % len(hg_commits))
base_hg_rev = get_real_base_hg_rev(hg_commits, hg_to_git_commit_map)
eprint("Using hg rev %s as common ancestor of all interesting changesets" % base_hg_rev)

# Refresh hg_commits with our wider dataset
hg_tip = get_single_rev('.')
wider_range = "%s::%s" % (base_hg_rev, hg_tip)
hg_commits = load_hg_commits(hg_commits, wider_range)
eprint("Updated set has %s changesets" % len(hg_commits))

# Also flag any changes that touch WR
for cset in get_multiple_revs('(' + wider_range + ') & modifies("glob:gfx/wr/**")', '{node}'):
    hg_commits[cset].touches_wr_code = True
eprint("Identified %s changesets that touch WR code" % sum([1 if v.touches_wr_code else 0 for (k, v) in hg_commits.items()]))

prune_boring(hg_tip)

# hg_tip itself might be boring
if not hg_commits[hg_tip].touches_wr_code and len(hg_commits[hg_tip].parents) == 1:
    new_tip = hg_commits[hg_tip].parents[0]
    eprint("Pruned tip %s as boring, using %s now" % (hg_tip, new_tip))
    hg_tip = new_tip

# Extra logging, disabled by default
if DEBUG:
    for (rev, cset) in hg_commits.items():
        desc = " %s" % rev
        desc += " touches WR: %s" % cset.touches_wr_code
        desc += " parents: %s" % cset.parents
        if rev in hg_to_git_commit_map:
            desc += " git: %s" % hg_to_git_commit_map[rev].commit_obj.oid
        if rev == hg_tip:
            desc += " (tip)"
        eprint(desc)

git_tip = build_git_commits(hg_tip)
if git_tip is None:
    eprint("No new changesets generated, exiting.")
else:
    webrender_git_repo.create_reference('refs/heads/wrupdater', git_tip, force=True)
    eprint("Updated wrupdater branch to %s, done!" % git_tip)


@@ -0,0 +1,36 @@
#!/usr/bin/env python3
import json
import sys
j = json.load(sys.stdin)
components = sys.argv[1].split('/')
def next_match(json_fragment, components):
    if len(components) == 0:
        yield json_fragment
    else:
        component = components[0]
        if type(json_fragment) == list:
            if component == '*':
                for item in json_fragment:
                    yield from next_match(item, components[1:])
            else:
                component = int(component)
                if component >= len(json_fragment):
                    sys.exit(1)
                yield from next_match(json_fragment[component], components[1:])
        elif type(json_fragment) == dict:
            if component == '*':
                for key in sorted(json_fragment.keys()):
                    yield from next_match(json_fragment[key], components[1:])
            elif component not in json_fragment:
                sys.exit(1)
            else:
                yield from next_match(json_fragment[component], components[1:])


for match in list(next_match(j, components)):
    if type(match) == dict:
        print(' '.join(match.keys()))
    else:
        print(match)
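A minimal pure-Python sketch of the same path-query idea (hypothetical
`json_path` helper; unlike the script above, it omits the `*` wildcard
and exit-on-miss behavior):

```python
import json

def json_path(doc, path):
    """Walk a '/'-separated path through parsed JSON, treating numeric
    components as list indices and everything else as dict keys."""
    cur = doc
    for component in path.split('/'):
        if isinstance(cur, list):
            cur = cur[int(component)]
        else:
            cur = cur[component]
    return cur

doc = json.loads('{"secret": {"token": "abc"}, "items": [10, 20]}')
print(json_path(doc, "secret/token"))  # abc
print(json_path(doc, "items/1"))       # 20
```
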


@@ -0,0 +1,150 @@
#!/usr/bin/env bash
# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.
# Do NOT set -x here, since that will expose a secret API token!
set -o errexit
set -o nounset
set -o pipefail
if [[ "$(uname)" != "Linux" ]]; then
    echo "Error: this script must be run on Linux due to readlink semantics"
    exit 1
fi

# GECKO_PATH should definitely be set
if [[ -z "${GECKO_PATH}" ]]; then
    echo "Error: GECKO_PATH must point to a hg clone of mozilla-central"
    exit 1
fi
# Internal variables, don't fiddle with these
MYSELF=$(readlink -f ${0})
MYDIR=$(dirname "${MYSELF}")
WORKDIR="${HOME}/.wrupdater"
TMPDIR="${WORKDIR}/tmp"
SECRET="project/webrender-ci/wrupdater-github-token"
MOZ_SCM_LEVEL=${MOZ_SCM_LEVEL:-1} # 1 means try push, so no access to secret
mkdir -p "${TMPDIR}"
# Bring the webrender clone to a known good up-to-date state
if [[ ! -d "${WORKDIR}/webrender" ]]; then
    echo "Setting up webrender repo..."
    git clone https://github.com/servo/webrender "${WORKDIR}/webrender"
    pushd "${WORKDIR}/webrender"
    git remote add moz-gfx https://github.com/moz-gfx/webrender
    popd
else
    echo "Updating webrender repo..."
    pushd "${WORKDIR}/webrender"
    git checkout master
    git pull
    popd
fi
if [[ "${MOZ_SCM_LEVEL}" != "1" ]]; then
    echo "Obtaining github API token..."
    # Be careful, GITHUB_TOKEN is secret, so don't log it (or any variables
    # built using it).
    GITHUB_TOKEN=$(
        curl -sSfL "http://taskcluster/secrets/v1/secret/${SECRET}" |
        ${MYDIR}/read-json.py "secret/token"
    )
    AUTH="moz-gfx:${GITHUB_TOKEN}"
fi
echo "Pushing base wrupdater branch..."
pushd "${WORKDIR}/webrender"
git fetch moz-gfx
git checkout -B wrupdater moz-gfx/wrupdater || git checkout -B wrupdater master
if [[ "${MOZ_SCM_LEVEL}" != "1" ]]; then
    # git may emit error messages that contain the URL, so let's sanitize them
    # or we might leak the auth token to the task log.
    git push "https://${AUTH}@github.com/moz-gfx/webrender" \
        wrupdater:wrupdater 2>&1 | sed -e "s/${AUTH}/_SANITIZED_/g"
fi
popd
# Run the converter
echo "Running converter..."
pushd "${GECKO_PATH}"
"${MYDIR}/converter.py" "${WORKDIR}/webrender"
popd
# Check to see if we have changes that need pushing
echo "Checking for new changes..."
pushd "${WORKDIR}/webrender"
PATCHCOUNT=$(git log --oneline moz-gfx/wrupdater..wrupdater | wc -l)
if [[ ${PATCHCOUNT} -eq 0 ]]; then
    echo "No new patches found, aborting..."
    exit 0
fi
# Log the new changes, just for logging purposes
echo "Here are the new changes:"
git log --graph --stat moz-gfx/wrupdater..wrupdater
# Collect PR numbers of PRs opened on Github and merged to m-c
set +e
FIXES=$(
    git log master..wrupdater |
    grep "\[import-pr\] From https://github.com/servo/webrender/pull" |
    sed -e "s%.*pull/% Fixes #%" |
    uniq |
    tr '\n' ','
)
echo "${FIXES}"
set -e
if [[ "${MOZ_SCM_LEVEL}" == "1" ]]; then
    echo "Running in try push, exiting now"
    exit 0
fi
echo "Pushing new changes to moz-gfx..."
# git may emit error messages that contain the URL, so let's sanitize them
# or we might leak the auth token to the task log.
git push "https://${AUTH}@github.com/moz-gfx/webrender" +wrupdater:wrupdater \
    2>&1 | sed -e "s/${AUTH}/_SANITIZED_/g"
CURL_HEADER="Accept: application/vnd.github.v3+json"
CURL=(curl -sSfL -H "${CURL_HEADER}" -u "${AUTH}")
# URL extracted here mostly to make servo-tidy happy with line lengths
API_URL="https://api.github.com/repos/servo/webrender"
# Check if there's an existing PR open
echo "Listing pre-existing pull requests..."
"${CURL[@]}" "${API_URL}/pulls?head=moz-gfx:wrupdater" |
tee "${TMPDIR}/pr.get"
set +e
COMMENT_URL=$(cat "${TMPDIR}/pr.get" | ${MYDIR}/read-json.py "0/comments_url")
HAS_COMMENT_URL="${?}"
set -e
if [[ ${HAS_COMMENT_URL} -ne 0 ]]; then
    echo "Pull request not found, creating..."
    # The PR doesn't exist yet, so let's create it
    ( echo -n '{ "title": "Sync changes from mozilla-central"'
      echo -n ', "body": "'"${FIXES}"'"'
      echo -n ', "head": "moz-gfx:wrupdater"'
      echo -n ', "base": "master" }'
    ) > "${TMPDIR}/pr.create"
    "${CURL[@]}" -d "@${TMPDIR}/pr.create" "${API_URL}/pulls" |
        tee "${TMPDIR}/pr.response"
    COMMENT_URL=$(
        cat "${TMPDIR}/pr.response" |
        ${MYDIR}/read-json.py "comments_url"
    )
fi
# At this point COMMENT_URL should be set, so leave a comment to tell bors
# to merge the PR.
echo "Posting r+ comment to ${COMMENT_URL}..."
echo '{ "body": "@bors-servo r+" }' > "${TMPDIR}/bors_rplus"
"${CURL[@]}" -d "@${TMPDIR}/bors_rplus" "${COMMENT_URL}"
echo "All done!"
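The token-sanitizing pattern used for the push commands above can be
checked in isolation (the token value here is a placeholder, not a real
credential):

```shell
AUTH="moz-gfx:FAKE_TOKEN_VALUE"
# An error message that would embed the push URL gets the credential
# scrubbed before it can reach the task log.
echo "fatal: unable to access 'https://${AUTH}@github.com/moz-gfx/webrender'" |
    sed -e "s/${AUTH}/_SANITIZED_/g"
```

Note that this only works because the whole `git push` pipeline redirects
stderr into the `sed` filter with `2>&1`.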


@@ -27,6 +27,8 @@ packages = [
# Files that are ignored for all tidy and lint checks.
files = [
"./wrench/src/egl.rs", # Copied from glutin
"./ci-scripts/wrupdater/converter.py", # servo-tidy doesn't like python3
"./ci-scripts/wrupdater/read-json.py", # servo-tidy doesn't like python3
]
# Many directories are currently ignored while we tidy things up