The prefs parser has two significant problems.
- It doesn't separate tokenizing from parsing.
- It is implemented as a loop around a big switch on a "current state"
variable.
As a result, it is hard to understand and modify, slower than it could be, and
in obscure cases (involving comments and whitespace) it fails to parse what
should be valid input.
This patch replaces it with a recursive descent parser (albeit one without any
recursion!) that has separate tokenization. The new parser is easier to
understand and modify, more correct, and has better error messages. It doesn't
do error recovery, but that would be much easier to add than in the old parser.
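To make the structure concrete, here is a rough sketch of the new parser's
shape; all of the names below (TokenKind, Token, Parser::GetToken(),
Parser::ParsePref()) are illustrative stand-ins rather than the actual
identifiers in the patch:

    // Illustrative sketch only -- names and details are invented here, not
    // taken from the patch. A prefs file is a flat sequence of pref(...)
    // statements, so the "recursive descent" parser is just a loop: the
    // tokenizer produces one token at a time and the parser consumes a whole
    // statement per iteration, with no recursion required.
    enum class TokenKind {
      Pref, UserPref, StickyPref,        // pref(), user_pref(), sticky_pref()
      LParen, RParen, Comma, Semicolon,
      String, Int, Bool,
      EndOfInput, Error
    };

    struct Token {
      TokenKind mKind;
      // ...value fields for String/Int/Bool tokens...
    };

    class Parser {
     public:
      bool Parse() {
        while (true) {
          Token tok = GetToken();          // tokenization is a separate step
          if (tok.mKind == TokenKind::EndOfInput) {
            return true;
          }
          if (!ParsePref(tok)) {
            return false;                  // no error recovery: stop here
          }
        }
      }

     private:
      Token GetToken();                    // the tokenizer
      bool ParsePref(const Token& aFirst); // name, ',', value, ')', ';'
    };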
The new parser also runs about 1.9x faster than the existing parser. (As
measured by parsing greprefs.js's contents from memory 1000 times in
succession, omitting the prefs hash table construction. If the table
construction is included, it's about 1.6x faster.)
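For reference, the measurement loop was of roughly this shape (a sketch only;
ParsePrefsBuffer() is a hypothetical stand-in for the parser entry point being
timed):

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <string>

    // Hypothetical parser entry point -- stands in for whatever is timed.
    bool ParsePrefsBuffer(const char* aBuf, size_t aLen);

    void BenchmarkParser(const std::string& aGreprefsContents) {
      auto start = std::chrono::steady_clock::now();
      for (int i = 0; i < 1000; i++) {
        // Parse the in-memory copy; the hash table construction step is
        // omitted for the 1.9x figure and included for the 1.6x figure.
        ParsePrefsBuffer(aGreprefsContents.data(), aGreprefsContents.size());
      }
      auto end = std::chrono::steady_clock::now();
      auto us =
          std::chrono::duration_cast<std::chrono::microseconds>(end - start);
      printf("1000 parses took %lld us\n", (long long)us.count());
    }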
The new parser is slightly stricter than the old parser in a few ways.
- Disconcertingly, the old parser allowed arbitrary junk between prefs
(including at the start and end of the prefs file) so long as that junk
didn't include any of the following chars: '/', '#', 'u', 's', 'p'. I.e.
lines like these:

    !foo@bar&pref("prefname", true);
    ticky_pref("prefname", true);   // missing 's' at start
    User_pref("prefname", true);    // should be 'u' at start

would all be treated the same as this:

    pref("prefname", true);

The new parser disallows such junk because it isn't necessary and seems like
an unintentional botch by the old parser.
- The old parser allowed character 0x1a (SUB) between tokens and treated it
like '\n'.
The new parser does not allow this character. SUB was used to indicate
end-of-file (*not* end-of-line) in some old operating systems such as MS-DOS,
but this doesn't seem necessary today.
- The old parser tolerated (with a warning) invalid escape sequences within
string literals -- such as "\q" (not a valid escape) and "\x1" and "\u12"
(both of which have insufficient hex digits) -- accepting them literally.
The new parser does not tolerate invalid escape sequences because it doesn't
seem necessary and would complicate things. (A sketch of the stricter escape
scanning appears after this list.)
- The old parser tolerated character 0x00 (NUL) within string literals; this is
dangerous because C++ code that manipulates string values with embedded NULs
will almost certainly consider those chars as end-of-string markers.
The new parser treats NUL chars as end-of-file, to avoid this danger and
because it facilitates a significant optimization (described within the
code, and sketched after this list).
- The old parser allowed integer literals to overflow, silently wrapping them.
The new parser treats integer overflow as a parse error (also sketched after
this list). This seems better, and it caught existing overflows of
places.database.lastMaintenance in testing/profiles/prefs_general.js
(bug 1424030) and testing/talos/talos/config.py (bug 1434813).
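The following sketch illustrates the stricter escape handling described above.
It is illustrative only -- the helper names are invented, not taken from the
patch. "\xNN" needs exactly two hex digits and "\uNNNN" exactly four; anything
else after a backslash is a tokenizer error rather than being accepted
literally with a warning:

    #include <cstdint>

    static int HexDigitValue(char c) {
      if (c >= '0' && c <= '9') return c - '0';
      if (c >= 'a' && c <= 'f') return c - 'a' + 10;
      if (c >= 'A' && c <= 'F') return c - 'A' + 10;
      return -1;  // not a hex digit
    }

    // Scans the characters after a '\'. |p| points just past the '\'.
    // Returns false on an invalid escape; on success advances |p| past the
    // escape and stores the decoded value in |aOut|.
    static bool ScanEscape(const char*& p, uint32_t& aOut) {
      switch (*p) {
        case 'n':  aOut = '\n'; p++; return true;
        case 'r':  aOut = '\r'; p++; return true;
        case 't':  aOut = '\t'; p++; return true;
        case '"': case '\'': case '\\': aOut = uint8_t(*p); p++; return true;
        case 'x': case 'u': {
          int nDigits = (*p == 'x') ? 2 : 4;
          p++;
          uint32_t value = 0;
          for (int i = 0; i < nDigits; i++) {
            int d = HexDigitValue(p[i]);
            if (d < 0) return false;   // too few hex digits, e.g. "\x1", "\u12"
            value = value * 16 + d;
          }
          p += nDigits;
          aOut = value;
          return true;
        }
        default:
          return false;                // e.g. "\q" -- not a valid escape
      }
    }

And a sketch of the last two points: treating NUL as end-of-input lets the
scanning loops rely on the terminating '\0' as a sentinel instead of carrying
a separate end-of-buffer check, and integer literals are accumulated with an
explicit overflow check (again, invented names, unsigned-only for brevity):

    #include <cstdint>

    // Scans an unsigned decimal integer literal starting at |p|. The buffer
    // is guaranteed to end with '\0', and '\0' fails the digit test below,
    // so the loop needs no separate bounds check. Returns false on overflow,
    // which the parser reports as a parse error instead of wrapping.
    static bool ScanInt(const char*& p, uint32_t& aOut) {
      uint64_t value = 0;
      while (*p >= '0' && *p <= '9') {
        value = value * 10 + (*p - '0');
        if (value > UINT32_MAX) {
          return false;                // e.g. an overly large timestamp
        }
        p++;
      }
      aOut = uint32_t(value);
      return true;
    }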
The first of these changes (disallowing junk between prefs) meant that a
couple of existing prefs with ";;" at the end had to be changed (done in the
preceding patch).
The minor increase in strictness shouldn't be a problem for default pref files
such as greprefs.js within the application (which we can modify), nor for
app-written prefs files such as prefs.js. It could affect user-written prefs
files such as user.js; the experience above suggests that integer overflow and
";;" are the most likely problems in practice. In my opinion, the risk here is
acceptable.
The new parser also does a better job of tracking line numbers because it (a)
treats "\r\n" sequences as a single end-of-line marker, and (b) pays attention
to end-of-line sequences within string literals.
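Roughly (again with invented names rather than the actual code), the
end-of-line handling looks like this:

    // Consumes one end-of-line sequence and bumps the line number. "\r\n"
    // counts as a single end-of-line marker; the same helper is used when an
    // EOL appears inside a string literal, so those lines are counted too.
    static void SkipEOL(const char*& p, int& aLineNum) {
      if (*p == '\r') {
        p++;
        if (*p == '\n') {
          p++;
        }
      } else if (*p == '\n') {
        p++;
      }
      aLineNum++;
    }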
Finally, the patch adds thorough tests of both valid and invalid syntax.
MozReview-Commit-ID: JD3beOQl4AJ
We don't add a build-time dependency on Wayland libraries; that allows Wayland-enabled
Firefox builds to run on systems without Wayland support.
We also can't dlsym() Wayland symbols directly from the libwayland-client.so library,
because we would then miss the global data entries referenced by code in wayland-client.h.
As a partial solution (for glibc systems only) we create a dummy libwayland-client.so
library implementation and place it *after* the Gtk+ 3.0 libraries at link time.
This satisfies the build-time dependency (we link against our own libwayland-client.so
library).
The run-time dependency is resolved by ld.so: when the Gtk+ 3.0 libraries are linked
against the system libwayland-client.so library, Wayland symbols are pulled from there
instead of from our dummy libwayland-client.so library.
When Gtk+ 3.0 is not linked against the system libwayland-client.so, we're running on
a system without Wayland support. In that case our dummy libwayland-client.so
implementation is used (symbols are resolved from it at run time) and Firefox's Wayland
support is disabled.
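As an illustration of the idea (this is not the actual stub file in the tree,
and the real one must cover many more symbols), the dummy library only needs
to provide linkable definitions of the libwayland-client entry points and
interface data objects:

    // Illustrative dummy-library sketch only; the real stub must export every
    // libwayland-client symbol that Gtk+/Firefox reference, including the
    // wl_*_interface data objects declared in the Wayland headers.
    struct wl_display;

    extern "C" {

    struct wl_display* wl_display_connect(const char* aName) {
      // Behaves as "no Wayland display available", so when these stubs are
      // the ones resolved at run time, Wayland support stays disabled.
      return nullptr;
    }

    void wl_display_disconnect(struct wl_display* aDisplay) {}

    // ...stubs for the remaining functions and global interface objects...

    }  // extern "C"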
MozReview-Commit-ID: IyaePwp4MxV
--HG--
extra : rebase_source : 852955d001657176e0bf69c099580be862d0b448
The branch is based on serde 1.0.23 and includes additional changes for faster
enum deserialization, which we want for WebRender. The changes are in the
process of being upstreamed, but that will take a while.
MozReview-Commit-ID: 4xaPQpffzYL
This contains the generated changes from running `mach vendor rust` on the
previous commit, and eliminates the redundant copy of libudev-sys we have
sitting in third_party/rust/.
MozReview-Commit-ID: IXTI14beFMi
--HG--
extra : rebase_source : 0b4d3d991738e3a371bb467219b8a91a792158d3