Accept XHTML-style self-closing void tags (#305)
Allow the self-closing `/>` ending for void tags. For non-void tags this syntax was already "allowed" because of how the HTML parser works. For the elements where it actually occurs, such as `<br/>`, it caused a parse error. Support for it had not been implemented because we only expect valid HTML5, e.g. the output of Firefox's `Element.innerHTML`.

Use case: TranslateLocally uses Qt's HTML representation of rich text. That HTML uses self-closing tags such as `<meta .../>` and `<br/>`. Implementing a string replacement that matches only these elements without parsing the HTML is tricky. Fixing it in bergamot-translator is not.

Implementation: Currently `<img>` is marked as a void tag, i.e. an element which cannot have children or text and is therefore treated differently. Since void tags normally have no closing tag, they are treated as immediately closed. The HTML parser we use reads `<img/>` as `<img></img>`. This causes a problem because we then close an element that was never opened in the first place. This fix ignores the `TT_TAG_END` token from the parser when the tag name is that of a void tag.
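The control flow described in the implementation note can be sketched as a small tag stack. This is a simplified, hypothetical model, not bergamot-translator's code. `TokenType`, `Token`, `kVoidTags`, `isVoid()` and `closedElements()` are illustrative names. The real scanner is replaced by a pre-tokenized list in which `<img/>` appears as a start token immediately followed by an end token, mirroring the behaviour described above.

```cpp
#include <algorithm>
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical token kinds, loosely modelled on the scanner's TT_TAG_START / TT_TAG_END.
enum class TokenType { TagStart, TagEnd, Text };

struct Token {
  TokenType type;
  std::string value;  // tag name for TagStart/TagEnd, text content for Text
};

// Assumed void-tag list for illustration only; the real list comes from Options::voidTags.
const std::vector<std::string> kVoidTags = {"br", "hr", "img", "input", "meta"};

bool isVoid(const std::string &name) {
  return std::find(kVoidTags.begin(), kVoidTags.end(), name) != kVoidTags.end();
}

// Returns element names in the order they were closed; throws on unbalanced input.
std::vector<std::string> closedElements(const std::vector<Token> &tokens) {
  std::vector<std::string> stack;   // currently open, non-void elements
  std::vector<std::string> closed;  // elements in closing order
  for (const Token &tok : tokens) {
    switch (tok.type) {
      case TokenType::TagStart:
        if (isVoid(tok.value)) {
          // A void element cannot have children, so treat it as closed right away.
          closed.push_back(tok.value);
        } else {
          stack.push_back(tok.value);
        }
        break;
      case TokenType::TagEnd:
        // The "/>" of e.g. <img/> yields an end token for a void tag. That element
        // was already closed at TagStart, so ignore the token entirely.
        if (isVoid(tok.value)) break;
        if (stack.empty())
          throw std::runtime_error("more closing tags than opening tags: " + tok.value);
        if (stack.back() != tok.value)
          throw std::runtime_error("mismatched closing tag: " + tok.value);
        closed.push_back(stack.back());
        stack.pop_back();
        break;
      case TokenType::Text:
        break;  // text does not affect nesting in this sketch
    }
  }
  return closed;
}
```

With the void-tag guard in place, `<p>hello<img/>world</p>` closes `img` and then `p` without error, and a leading `<br/>` no longer reaches the empty-stack check. If the guard is removed, both inputs throw in this sketch, which is analogous to the parse error described above.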
Parent: 6a4f409cda
Commit: acbc46d816
@ -482,6 +482,12 @@ TEST_CASE("Test self-closing tag (HTML5)") {
  CHECK(input == "hello  world and other creatures");  // Note double space between "hello" and "world"
}

TEST_CASE("Test self-closing tag (XHTML)") {
  std::string input("<p>hello<img/>world</p>");
  HTML html(std::move(input), true);
  CHECK(input == "hello world");  // <img/> introduced space
}

TEST_CASE("Test empty void tag at end of input") {
  std::string input("hello <br>");
  HTML html(std::move(input), true);
@ -350,8 +350,10 @@ HTML::HTML(std::string &&source, bool process_markup, Options &&options) : optio
      } break;

      case markup::Scanner::TT_TAG_END:
        // Note: self-closing tags emit TT_TAG_END immediately after TT_TAG_START
        // but since we're parsing HTML5, a sole <img> will never emit a TT_TAG_END
        // If this is the closing bit of a void tag, i.e. triggered by the "/>"
        // bit of "<img/>", then completely ignore it.
        if (contains(options_.voidTags, std::string(scanner.tag()))) break;

        if (stack.empty()) throw BadHTML(format("Encountered more closing tags ({}) than opening tags", scanner.tag()));

        if (stack.back()->name != scanner.tag())