In bug 1271035 gps identified disk I/O as a major cause for reftest slowness.
He was able to fix most of it, but the new highest source of I/O in reftest is
reftest.log.
To note, we were only saving reftest.log when running via mach, so this won't
impact automation in any way:
https://hg.mozilla.org/mozilla-central/file/043082cb7bd8/layout/tools/reftest/mach_commands.py#l204
I don't know why we are doing this given that the same output goes to stdout.
And as dholbert pointed out, since bug 1034290 landed, that log contains raw
structured logs, which are not at all useful for debugging. Given that it is
no longer useful and causes slowness, we should stop saving it.
If anyone wishes to keep saving to a log, they can use:
./mach reftest --log-tbpl reftest.log
If they wish this to be the default behaviour they can make a machrc (or
.machrc) in either topsrcdir, ~/.mozbuild or $MACHRC. Then add:
[alias]
reftest = reftest --log-tbpl reftest.log
MozReview-Commit-ID: A3e2X7qF90H
--HG--
extra : rebase_source : 9286e3061c068c6fef55b503017b3f4a9588be46
It was a cold Friday night in San Francisco. Earlier in the day, I
informed Chris AtLee that I was going to start focusing on improving
the efficiency of Firefox automation and asked him where the biggest
capacity issues were. He said "we're hurting most on Windows tests."
As I was casually drinking a barleywine (note to reader: barleywines
are serious beers - there's nothing casual about them), I found myself
tediously clicking through Treeherder looking at logs for Windows jobs,
looking for patterns and other oddities. As I was clicking through,
something stood out to me: the sheer number of reftest jobs. I
recalled a random project I started a few years ago. Its aim was to
analyze buildbot job metadata so we could better understand where time
was spent in automation. I had mostly written off the side project as
a failure and a near complete waste of my time. Not even a stray
random thought of this project had entered my mind in the past year.
But clicking through Treeherder after a few glasses of barleywine
somehow reminded me of one of the few useful findings of that project:
reftest jobs consumed a seemingly disproportiate amount of machine time,
something like 35 or 40% IIRC of the time spent on all jobs.
Now, this finding was several years ago and almost certainly no longer
relevant. But, again, I had a few glasses of barleywine in me and was
bothered by the amount of reftest jobs and their duration, so I thought
"hey, why don't I run reftests and see why they take so long." So I
built Firefox on Windows - the platform Chris AtLee said we're "hurting
most on."
I decided to start my very casual profiling session by recording a
`mach reftest` run using Sysinternals Process Monitor. To my surprise,
it yielded a very obvious and unexpected result: the Places SQLite
database was incurring a lot of I/O. On my i7-6700K Windows 10 desktop
with a high performance SSD, `mach reftest` yielded the following:
File Time #Events #Reads #Writes RBytes WBytes Path
198s 980,872 243,270 669,231 7,971,471,360 20,667,084,080 places.sqlite-wal
165s 645,853 222,407 367,775 7,287,701,820 14.071,529,472 places.sqlite
2s 377,121 1 0 32,768 0 places.sqlite-shm
The Places SQLite database accounts for 2,003,846 of the total of
3,547,527 system calls (56.49%) recorded by procmon during `mach
reftest` execution. This represents a staggering 49,997,786,732 of the
50,307,660,589 (99.38%) bytes of I/O recorded! Yes, that's 50 GB.
I reckon the reason the Places database accumulates so much I/O load
during reftests is because the reftest suite essentially loads thousands
of pages as quickly as possible. This effectively performs a stress
test against the Places history recording service.
As effective as reftests are at stress-testing Places, it adds no value
to reftests because reftests are testing the layout features, not the
performance of history recording. So these 2M system calls and 50 GB
of I/O are overhead.
This commit disables Places when executing reftests and prevents
the overhead.
After this commit, `mach reftest` has significantly reduced interaction
with the Places SQLite database:
File Time #Events #Reads #Writes RBytes WBytes Path
0.09s 502 138 302 4,521,984 8,961,528 places.sqlite-wal
0.07s 254 20 140 524,604 8,126,464 places.sqlite
0.01s 3,289 1 0 32,768 0 places.sqlite-shm
Of the 948,033 system calls recorded with this change (26.7% of
original), 691,322 were related to I/O. The Places SQLite database
only consumed ~22MB of I/O, <0.01% of original. It's worth noting that
~half of the remaining I/O system calls are related to reftest.log,
which now accounts for the largest percentage of write I/O (only
~53 MB, however). It's worth noting that reftest.log appears to be
using unbuffered writes and is requiring an excessive amount of
system calls for writing. But that's for another bug and commit.
In terms of wall time, the drastic I/O reduction during `mach reftest`
appears to have minimal impact on my machine: maybe 30s shaved from a
~900s execution, or ~3%. But my machine with its modern SSD doesn't
struggle with I/O.
In automation, it is a different story.
I pushed this change to Try along with the base revision and triggered
4 runs of most reftest jobs. The runtime improvements in automation
are impressive. Here are the fastest reported times for various jobs:
Job Before After Delta
Linux Opt R1 31m 34m +3m
Linux Opt R2 43m 35m -8m
Linux Opt Ru1 40m 34m -6m
Linux Opt Ru2 43m 37m -6m
Linux Opt R E10s 89m 72m -17m
Linux Debug R1 52m 40m -12m
Linux Debug R2 49m 42m -7m
Linux Debug R3 60m 51m -9m
Linux Debug R4 42m 37m -5m
Linux Debug R1 E10s 84m 72m -12m
Linux Debug R2 E10s 97m 85m -12m
Linux64 Opt R1 35m 24m -11m
Linux64 Opt R2 37m 26m -11m
Linux64 Opt Ru1 32m 29m -3m
Linux64 Opt Ru2 37m 26m -12m
Linux64 Opt TC R1 12m 10m -2m
Linux64 Opt TC R2 10m 7m -3m
Linux64 Opt TC R3 11m 9m -2m
Linux64 Opt TC R4 11m 9m -2m
Linux64 Opt TC R5 13m 11m -2m
Linux64 Opt TC R6 11m 9m -2m
Linux64 Opt TC R7 9m 8m -1m
Linux64 Opt TC R8 11m 10m -1m
Linux64 Opt TC Ru1 30m 25m -5m
Linux64 Opt TC Ru2 36m 27m -11m
OS X 10.10 Opt 31m 27m -4m
OS X 10.10 Opt E10s 26m 25m -1m
OS X 10.10 Debug 68m 55m -13m
Win7 Opt R 30m 28m -2m
Win7 Opt Ru 28m 26m -2m
Win7 Opt R E10S 29m 27m -2m
Win7 Debug R 85m 76m -9m
Win7 Debug R E10S 75m 65m -10m
Win8 x64 Opt R 29m 26m -3m
Win8 x64 Opt Ru 27m 25m -2m
Win8 x64 Debug R 90m 71m -19m
Android 4.3 API15 Opt R1 89m 71m -18m
Android 4.3 API15 Opt R2 78m 64m -14m
Android 4.3 API15 Opt R3 75m 64m -11m
Android 4.3 API15 Opt R4 74m 68m -6m
Android 4.3 API15 Opt R5 75m 69m -6m
Android 4.3 API15 Opt R6 91m 86m -5m
Android 4.3 API15 Opt R7 87m 66m -21m
Android 4.3 API15 Opt R8 87m 82m -5m
Android 4.3 API15 Opt R9 80m 66m -14m
Android 4.3 API15 Opt R10 80m 67m -13m
Android 4.3 API15 Opt R11 73m 66m -7m
Android 4.3 API15 Opt R12 105m 91m -14m
Android 4.3 API15 Opt R13 72m 59m -13m
Android 4.3 API15 Opt R14 82m 61m -21m
Android 4.3 API15 Opt R15 73m 62m -11m
Android 4.3 API15 Opt R16 79m 78m -1m
The savings total 6+ *hours* or ~15% when running all reftests. I'd
say this isn't bad for a one-line code change!
MozReview-Commit-ID: H1LkACgSpVn
--HG--
extra : rebase_source : 891a5ce8e1f6c3d70fc646f116c2f49f897ad735
As of bug 1245092, reftest depends on marionette. Normally marionette client has an internal timeout of around
60 seconds where it waits for the server to become available. But when using debuggers (and especially valgrind),
it can take much much longer than a minute for Firefox to start. This patch adds a couple hidden command line
args to reftest to tweak the marionette timeouts.
MozReview-Commit-ID: 3xF0InBJNEf
--HG--
extra : rebase_source : f18092bc90a7d8aab34b527a15bd3b1dc6afc997
This extra window was initially left open because closing it was causing memory leaks
on debug e10s crashtests. There was pressure to get the regressing patch landed due
to addon signing, so it got landed with this extra window hanging around (as it didn't
impact test results).
But it is a UX wart for several reasons. Upon testing it again recently, the leaks all
seem to have vanished. I'm not sure why, possibly it was a bug fixed in e10s.
MozReview-Commit-ID: CEI2enKAOyv
--HG--
extra : rebase_source : da0e8889f67d57aa61670105c14c2235607d79a7
This is a internal-only syntax for guarding rules from a boolean
preference. Nothing causes @supports rules to be re-evaluated except
html.css registered in Part 2.
This is needed for rendering the disclosure triangle of the
summary element by using "display: list-item".
Usage example:
@supports -moz-bool-pref("dom.details_element.enabled") {
/* css rules */
}
MozReview-Commit-ID: HDCa8zHxYTA
--HG--
extra : rebase_source : b7a72a48166edf1d486014ff37363ed8be9127d9
Be warned. Do not attemp to change the .js "test" source code in ./js
They are meant to check
- the outdated 0666 octal constant is still parsed correctly,
- the outdated 0666 octal constant raises syntax error flag
in strict mode, etc.
So leave them alone.
There is an ImportError on Android, as well as a log related
regression from the structured log patch once that is fixed.
MozReview-Commit-ID: KxSEotr38qO
--HG--
extra : rebase_source : 15d8421aab813d9e0dbf6d00611f921aaa779a49
Structured logs bring many benefits. We can stop parsing the logs for magic strings, we
can modify the format without breaking things, and we can stream results into systems like
ActiveData. The structured logs originate primarily in reftest.js. StructuredLog.jsm is
used to generate the JSON-based log stream. Finally OutputHandler in the python harness
reads structured output from stdout, and formats it into human readable form.
MozReview-Commit-ID: G3ZLkMRl6p7
--HG--
extra : commitid : J3ui9XYWR3Q
extra : rebase_source : 77ed0ba842cc8e557141fb3494212b06868c728a
extra : amend_source : 735d48225a2e627e0fe45fc11b50b6c49a885a4b
extra : source : d1779fe421c3c7cd8e3d191816776390dc104f37
Structured logs bring many benefits. We can stop parsing the logs for magic strings, we
can modify the format without breaking things, and we can stream results into systems like
ActiveData. The structured logs originate primarily in reftest.js. StructuredLog.jsm is
used to generate the JSON-based log stream. Finally OutputHandler in the python harness
reads structured output from stdout, and formats it into human readable form.
--HG--
extra : commitid : J3ui9XYWR3Q
extra : rebase_source : 6bae978126dbd5beddc39332c7cbce0c1354cd87
extra : amend_source : 735d48225a2e627e0fe45fc11b50b6c49a885a4b
Also, add an opt-out for crashtest/reftest for the view-source thing so they don't all break, r=bz
--HG--
extra : commitid : 8NqvmbphSgh
extra : rebase_source : bbe0b6f11a77d7e6241a5733931d9baa95bb3fed
All the files modified are straightforward deletion except TouchManager
and ZoomConstraintsClient. I add some includes and wrap TouchManager by
mozilla namespace to fix build errors due to the removal of TouchCaret.
All the files modified are straightforward deletion except TouchManager
and ZoomConstraintsClient. I add some includes and wrap TouchManager by
mozilla namespace to fix build errors due to the removal of TouchCaret.
--HG--
extra : rebase_source : b31db322130f665e7dda53d1061cfc40f81ce411
This makes it clearer that really it's the same thing as FINAL_TARGET,
with preprocessing.
We still keep DIST_FILES in backend.mk because it's shorter and doesn't
really matter.
The bulk of this commit was generated with a script, executed at the top
level of a typical source code checkout. The only non-machine-generated
part was modifying MFBT's moz.build to reflect the new naming.
CLOSED TREE makes big refactorings like this a piece of cake.
# The main substitution.
find . -name '*.cpp' -o -name '*.cc' -o -name '*.h' -o -name '*.mm' -o -name '*.idl'| \
xargs perl -p -i -e '
s/nsRefPtr\.h/RefPtr\.h/g; # handle includes
s/nsRefPtr ?</RefPtr</g; # handle declarations and variables
'
# Handle a special friend declaration in gfx/layers/AtomicRefCountedWithFinalize.h.
perl -p -i -e 's/::nsRefPtr;/::RefPtr;/' gfx/layers/AtomicRefCountedWithFinalize.h
# Handle nsRefPtr.h itself, a couple places that define constructors
# from nsRefPtr, and code generators specially. We do this here, rather
# than indiscriminantly s/nsRefPtr/RefPtr/, because that would rename
# things like nsRefPtrHashtable.
perl -p -i -e 's/nsRefPtr/RefPtr/g' \
mfbt/nsRefPtr.h \
xpcom/glue/nsCOMPtr.h \
xpcom/base/OwningNonNull.h \
ipc/ipdl/ipdl/lower.py \
ipc/ipdl/ipdl/builtin.py \
dom/bindings/Codegen.py \
python/lldbutils/lldbutils/utils.py
# In our indiscriminate substitution above, we renamed
# nsRefPtrGetterAddRefs, the class behind getter_AddRefs. Fix that up.
find . -name '*.cpp' -o -name '*.h' -o -name '*.idl' | \
xargs perl -p -i -e 's/nsRefPtrGetterAddRefs/RefPtrGetterAddRefs/g'
if [ -d .git ]; then
git mv mfbt/nsRefPtr.h mfbt/RefPtr.h
else
hg mv mfbt/nsRefPtr.h mfbt/RefPtr.h
fi
--HG--
rename : mfbt/nsRefPtr.h => mfbt/RefPtr.h
This is slightly more involved than earlier changes because reftests
have a one-off mechanism for finding files. Essentially, the master
reftest manifest is loaded, directories are discovered, and every file
in those directories is packaged.
We add support to our test archive generation tool to read sources from
reftest manifests and tell it where the reftest manifests are.
print-manifest-dirs.py was only being used for staging reftest files.
Since we don't do that any more, the functionality doesn't need to exist
in a standalone file, so it has been moved inline into test_archive.py.
This change avoids copying ~26,000 tests consuming 131 MB during test
packaging. This is a majority of the file count that was remaining in
the stage directory at this point. On my machine (which hasn't typically
seen major wall time wins from not staging files due to its fast SSD),
this change made test packaging ~20% faster, reducing wall time from
~50s to ~40s!
A Try push seemed to indicate drastic results with the series up to this
point. Including the already landed changes to generate test archives
concurrently, test packaging times on OS X builders dropped from ~18:40
to 6:29! Times on Linux x64 remained about the same (~2:46). This is
possibly due to these machines already having SSDs and due to normal
variance in performance of builders and EC2 instances.
--HG--
extra : commitid : 34E8V8lSGg7
extra : rebase_source : 720afcd35f6a2b6cb1217df23ae981408a88cb94
After this, only reftest files themselves are staged. Those will be
addressed in a subsequent commit.
--HG--
extra : commitid : 9jWl9Twcizr
extra : rebase_source : 3e4a319d60b7ee7eddecc597eb250184140b1e71
This makes reftest command line arguments behave more like other test suites,
so we can use a simple unified syntax for e.g. |mach try|. The patch also
reworks the command line argument parsing to use argparse rather than optparse,
and causes mach to reuse the same parser as the suite.
This unifies how reftests are invoked across desktop and
mobile, and paves the way for introducing more complex
datatypes that are unreasonable to express on the
command line.
This makes reftest command line arguments behave more like other test suites,
so we can use a simple unified syntax for e.g. |mach try|. The patch also
reworks the command line argument parsing to use argparse rather than optparse,
and causes mach to reuse the same parser as the suite.
This unifies how reftests are invoked across desktop and
mobile, and paves the way for introducing more complex
datatypes that are unreasonable to express on the
command line.
The patch removes 455 occurrences of FAIL_ON_WARNINGS from moz.build files, and
adds 78 instances of ALLOW_COMPILER_WARNINGS. About half of those 78 are in
code we control and which should be removable with a little effort.
--HG--
extra : rebase_source : 82e3387abfbd5f1471e953961d301d3d97ed2973
This also completely remove build/automationutils.py.
--HG--
extra : commitid : 50v6EAQNEHV
extra : rebase_source : 4ac1347d73498f068979514c6afb16ac50ab4033
This removes ambiguity as to which modules are being imported, making
import slightly faster as Python doesn't need to test so many
directories for file presence.
All files should already be using absolute imports because mach command
modules aren't imported to the package they belong to: they instead
belong to the "mach" package. So relative imports shouldn't have been
used.
--HG--
extra : commitid : 6tFME1KKfTD
extra : rebase_source : 78728f82f5487281620e00c2a8004cd5e1968087
This is meant to avoid random issues with Gecko suddenly kicking off the
periodic updates in the middle of our test runs. This uses a hidden pref
intentionally since nobody should be able to discover this pref by going
to about:config.
Backed out changeset bbb1b526cb56 (bug 1137252)
Backed out changeset 0c3d670f0c14 (bug 1137252)
Backed out changeset 1e0fa4ab7c6f (bug 1137252)
CLOSED TREE
The test in question sets up an inconsistent state for desktop platforms because
it intends to simulate what the APZ does, but does it only partially. The APZ
code would set a CSS viewport (which the test does) but it doesn't set the
scroll-position-clamping-scroll-port-size which the APZ would always do.
Back when mozpack.path was added, it was used as:
import mozpack.path
mozpack.path.func()
Nowadays, the common idiom is:
import mozpack.path as mozpath
mozpath.func()
because it's shorter.
$ git grep mozpath\\. | wc -l
423
$ git grep mozpack.path\\. | wc -l
123
This change was done with:
$ git grep -l mozpack.path\\. | xargs sed -i 's/mozpack\.path\./mozpath./g'
$ git grep -l 'import mozpack.path$' | xargs sed -i 's/import mozpack.path$/\0 as mozpath/'
$ (pat='import mozpack.path as mozpath'; git grep -l "$pat" | xargs sed -i "1,/$pat/b;/$pat/d")
This adds support for class="reftest-opaque-layer" and for
reftest-assigned-layer="some-layer-name" to the reftest harness.
From reftest/README.txt:
Opaque Layer Tests: class="reftest-opaque-layer"
================================================
If an element should be assigned to a PaintedLayer that's opaque, set the class
"reftest-opaque-layer" on it. This checks whether the layer is opaque during
the last paint of the test, and it works whether your test is an invalidation
test or not. In order to pass the test, the element has to have a primary
frame, and that frame's display items must all be assigned to a single painted
layer and no other layers, so it can't be used on elements that create stacking
contexts (active or inactive).
Layerization Tests: reftest-assigned-layer="layer-name"
=======================================================
If two elements should be assigned to the same PaintedLayer, choose any string
value as the layer name and set the attribute reftest-assigned-layer="yourname"
on both elements. Reftest will check whether all elements with the same
reftest-assigned-layer value share the same layer. It will also test whether
elements with different reftest-assigned-layer values are assigned to different
layers.
The same restrictions as with class="reftest-opaque-layer" apply: All elements
must have a primary frame, and that frame's display items must all be assigned
to the same PaintedLayer and no other layers. If these requirements are not
met, the test will fail.
This attribute used to force APZ to be used on content processes even if the
overall APZ pref was false. However, this has a couple of problems, which cancel
each other out:
- If the pref is false, the APZ machinery is never created, and so it's
impossible to have content processes "using" APZ.
- The reftest harness checks for the pref and ignores mozasyncpanzoom when
evaluating the "asyncPanZoom" condition in reftest manifests.
Therefore, any reftests which were skip-if(!asyncPanZoom) would never run unless
the pref was set, and in those cases the mozasyncpanzoom attribute would not be
needed at all, as APZ is already enabled with the pref.
However, the mozasyncpanzoom attribute would cause some parts of the child
process code to behave as though APZ was enabled, which is incorrect. Removing
this attribute and relying solely on the pref corrects that.
Using undefined has the advantage that we can use < and > tests with the
OSX variable. (We currently have no such tests in the tree, perhaps
partly because they didn't work with non-OSX being 0.)
In some cases the area reported by MozAfterPaint can be extremely large. Since
that area determines what we pass to drawWindow, we could end up trying to
drawWindow an area that is too large for drawWindow to handle. Instead, this
patch clamps that area to the canvas size so that we don't unnecessarily try to
paint (and fail at painting) the whole invalid area.
When running reftests on Mulet, we do not want that forced update
checking gets triggered. We can block this by setting the following
prefs:
- app.update.enabled to false
- app.update.url.override and app.update.url to empty string
We also disable tiles-related pinging and downloading by setting empy
values for:
- browser.newtabpage.directory.source
- browser.newtabpage.directory.ping
Gaia also does force check app updates at some point, so we introduce a
new pref, webapps.update.enabled, that we can use to make sure webapps
update won't be triggered.
The logcat format used by tbpl jobs in some (maybe all) cases now has a
timestamp and other decorations at the beginning of the line. The regex that
was previously added to filter out reftest failure lines duplicated in logcat
no longer matches the lines correctly; this makes the regex more generic so that
the filtering works again.
Fixes:
./mach reftest error: runreftest.py: error: could not find the application path, --appname must be specified
--HG--
extra : rebase_source : 75c3c9ef00bceb0da418cdb592e736478dd442cb
This issue manifests itself as the reftest extension not being installed
for local testers. This is because the extension's location is
determined relative to SCRIPT_DIRECTORY, and for local testers, __file__
ignores the symlink from $OBJDIR/_tests/reftest/runreftest.py ->
$SRCDIR/layout/tools/reftest/runreftest.py.
This was changed in Bug 915865.
Instead of grabbing attributes off options at every call site, pass
in the options object to processLeakLog, and attempt to get the attributes
there. If not present, use a restrictive default value.
This will prevent silent harness failures if one of the many ways to invoke
processLeakLog fails to set up these options, and makes it so they
don't have to set it up if they don't care.