The default handler and character-data handler callbacks are identical
and some Windows compilers just reconciled them into a single function.
This, unfortunately, resulted in a RLBox runtime error: the same
callback was registered twice. This patch removes the duplicate handler
implementation and just sets the character-data handler callback as the
default handler.
Depends on D104658
Differential Revision: https://phabricator.services.mozilla.com/D126369
parser/html/nsHtml5StreamParser.cpp:1046:10: error: variable 'totalRead' set but not used [-Werror,-Wunused-but-set-variable]
size_t totalRead = 0;
^
Differential Revision: https://phabricator.services.mozilla.com/D126458
NOTE! In cases where there is no HTTP-layer encoding declaration, and CSS
parsing inherits the encoding from the HTML document, for preloads, this
changes the inherited encoding from windows-1252 to UTF-8 in order to
make the speculative encoding correct in the common `<meta charset=utf-8>`
case.
Differential Revision: https://phabricator.services.mozilla.com/D123593
Automatically generated path that adds flag `REQUIRES_UNIFIED_BUILD = True` to `moz.build`
when the module governed by the build config file is not buildable outside on the unified environment.
This needs to be done in order to have a hybrid build system that adds the possibility of combing
unified build components with ones that are built outside of the unified eco system.
Differential Revision: https://phabricator.services.mozilla.com/D122345
This change ensures that the tokenizer sets the doctype name to null
when the doctype name is missing in the input source.
Otherwise, without this change, the doctype name is set to the empty
string — which doesn’t conform to the requirements in the HTML spec, and
which causes us to fail 9 tests in the html5lib-tests suite.
Relates to https://github.com/validator/htmlparser/issues/35
Differential Revision: https://phabricator.services.mozilla.com/D122936
This change ensures that for all cases with spec requirements in the
form “clear the stack back to a foo context” — which involves checking
for elements with particular names — we only look for elements in the
HTML namespace, rather than additionally looking for elements which
aren’t in the HTML namespace but that also have those particular names.
Otherwise, without this change, we aren’t in conformance with the spec
requirements, and we fail several cases in the html5lib-tests suite.
Fixes https://github.com/validator/htmlparser/issues/33
Differential Revision: https://phabricator.services.mozilla.com/D122722
This change brings the tokenizer’s handling of U+0000 NUL characters in
the DATA state and the CDATA section state into conformance with the
requirements in the HTML spec — for the case where only tokenization is
being performed, without tree construction; that is, the case where the
tokenizer() method is called, rather than parse() or parseFragment().
Specifically, the tokenization steps defined in the spec require that
when a U+0000 NUL is consumed in the DATA state or in the CDATA section
state, the parser must then emit a U+0000 NUL. But when performing tree
construction, the spec requires that when a U+0000 NUL is consumed, the
parser must instead emit a U+FFFD REPLACEMENT CHARACTER.
Without this change, the parser always emits a U+FFFD REPLACEMENT
CHARACTER — even when only tokenization is being performed. That causes
us to fail a number of tests in html5lib-tests suite.
For more background on the relevant behavior, see the following:
* https://www.w3.org/Bugs/Public/show_bug.cgi?id=9659
* https://github.com/whatwg/html/commit/d98f83e
* https://github.com/validator/htmlparser/commit/9b9c263
Relates to https://github.com/validator/htmlparser/issues/35
Differential Revision: https://phabricator.services.mozilla.com/D122721
When the parser encounters a `</template>` end tag and there are other
open elements, the HTML spec requires the parser to “generate all
implied end tags thoroughly”, which unlike “generate implied end tags”
also includes generating implied end tags for table-parts elements
(caption, colgroup, tbody, thead, tfoot, td, th, and tr).
Differential Revision: https://phabricator.services.mozilla.com/D82020
Doing `errUnclosedElements(eltPos, "template")` for EOF in the “in
template” state results in the error message “End tag `template` seen, but
there were open elements”, which is all wrong because the actual problem is
that though a `template` end tag was expected, EOF was reached without a
`template` end tag being seen.
So let’s instead when we reach this just report the list of open elements.
Differential Revision: https://phabricator.services.mozilla.com/D122598