Bug 1769534 - preserve NBSP when copying HTML content. r=hsivonen

To work around the historical lack of 'white-space: pre', when a user
wants to compose HTML text with several consecutive spaces, WYSIWYG
HTML editors insert an alternating pattern of SPACE and NBSP to avoid
spaces being collapsed.
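
As an illustration of the pattern described above, here is a minimal sketch of how an editor might pad a run of spaces so that HTML rendering does not collapse it. The helper name `padSpacesForHtml` and the choice to start each run with an NBSP are assumptions made for this example; real editors differ in the details.

```javascript
const NBSP = "\u00A0";

// Hypothetical helper (not taken from any real editor): rewrite each run of
// two or more regular spaces as an alternating NBSP/SPACE pattern of the
// same length, so that no two regular spaces are adjacent.
function padSpacesForHtml(text) {
  return text.replace(/ {2,}/g, (run) =>
    Array.from(run, (_, i) => (i % 2 === 0 ? NBSP : " ")).join("")
  );
}

// Five spaces become NBSP, SPACE, NBSP, SPACE, NBSP.
const padded = padSpacesForHtml("a     b");
console.log(padded === `a${NBSP} ${NBSP} ${NBSP}b`); // true
```

A single space is left untouched, since it is not subject to collapsing between two words.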

This is why browsers, when copying HTML content, usually replace all NBSPs
in the copied text with regular spaces.

This commit changes the copying behavior to replace only NBSPs that were
potentially generated by an editor (and to preserve the others).

The heuristic used is "An NBSP adjacent to a regular space doesn't make
sense, and can be replaced by a regular space". This detects the
alternating pattern of SPACE/NBSP, but also a space followed by a long
sequence of NBSP (because a line break would occur anyway in that case).
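
One way to read this heuristic: in any unbroken run of SPACE and NBSP characters that contains at least one regular space, every NBSP becomes a regular space, while runs made only of NBSPs are left alone. The sketch below expresses that reading with a regular expression. It is an illustration only, not the serializer code, and the name `replaceEditorNbsps` is hypothetical.

```javascript
const NBSP = "\u00A0";

// Hypothetical helper, not Gecko code: for every maximal run of SPACE/NBSP
// characters, replace the run with regular spaces of the same length if it
// contains at least one regular space; otherwise keep it unchanged.
function replaceEditorNbsps(text) {
  return text.replace(/[ \u00A0]+/g, (run) =>
    run.includes(" ") ? " ".repeat(run.length) : run
  );
}

// NBSP next to a space: replaced, length preserved.
console.log(replaceEditorNbsps(`foo${NBSP} bar`) === "foo" + " ".repeat(2) + "bar"); // true
// NBSP between two words: preserved.
console.log(replaceEditorNbsps(`100${NBSP}km`) === `100${NBSP}km`); // true
// Space followed by a long NBSP sequence: all replaced.
console.log(replaceEditorNbsps(`a ${NBSP}${NBSP}${NBSP}b`) === "a" + " ".repeat(4) + "b"); // true
```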

NB: included is a change that makes devtools use regular spaces
(rather than NBSPs) to indent stacktrace frames. This prevents NBSPs from
appearing in the clipboard when copying a stacktrace.

Attribution: the actual nsPlainTextSerializer changes were written by
Rachel Martin <rachel@betterbird.eu>, as a part of Betterbird.

Differential Revision: https://phabricator.services.mozilla.com/D149644
Pierre de La Morinerie 2022-09-14 13:13:51 +00:00
Parent 02b993c483
Commit 5211456a86
8 changed files: 104 additions and 40 deletions

View file

@@ -6,8 +6,6 @@ import React from "react";
 export default function FrameIndent() {
   return (
-    <span className="frame-indent clipboard-only">
-      &nbsp;&nbsp;&nbsp;&nbsp;
-    </span>
+    <span className="frame-indent clipboard-only">&#32;&#32;&#32;&#32;</span>
   );
 }

View file

@@ -150,6 +150,10 @@
   left: -9999px;
 }
+.frames .frame-indent {
+  white-space: pre;
+}
 .call-stack-pane [role="listitem"] .location-async-cause {
   height: 20px;
   line-height: 20px;

View file

@@ -20,11 +20,11 @@ add_task(async function() {
   expectedPattern =
     "text-align: left;[\\r\\n]+" +
     "element[\\r\\n]+" +
-    "Best Match this.style[\\r\\n]+" +
+    "Best Match this.style[\\r\\n]+" +
     "left[\\r\\n]+" +
     "width: 25px;[\\r\\n]+" +
     "element[\\r\\n]+" +
-    "Best Match this.style[\\r\\n]+" +
+    "Best Match this.style[\\r\\n]+" +
     "25px[\\r\\n]*";
   info("Expanding computed view properties");

View file

@@ -30,6 +30,10 @@
   white-space: normal;
 }
+.frames .frame-indent {
+  white-space: pre;
+}
 .frames .title {
   text-overflow: ellipsis;
   white-space: nowrap;

View file

@@ -23,8 +23,8 @@ is(s.convertToPlainText("foo", c.OutputLFLineBreak, 0), "foo", "Wrong conversion
 is(s.convertToPlainText("foo foo foo", c.OutputWrap | c.OutputLFLineBreak, 7), "foo foo\nfoo", "Wrong conversion result 2");
 is(s.convertToPlainText("<body><noscript>b<span>a</span>r</noscript>foo", c.OutputLFLineBreak, 0), "foo", "Wrong conversion result 3");
 is(s.convertToPlainText("<body><noscript>b<span>a</span>r</noscript>foo", c.OutputNoScriptContent, 0), "barfoo", "Wrong conversion result 4");
-is(s.convertToPlainText("foo\u00A0bar", c.OutputPersistNBSP | c.OutputLFLineBreak, 0), "foo\u00A0bar", "Wrong conversion result 5");
-is(s.convertToPlainText("foo\u00A0bar", c.OutputLFLineBreak, 0), "foo bar", "Wrong conversion result 6");
+is(s.convertToPlainText("foo\u00A0 bar", c.OutputPersistNBSP | c.OutputLFLineBreak, 0), "foo\u00A0 bar", "Wrong conversion result 5");
+is(s.convertToPlainText("foo\u00A0 bar", c.OutputLFLineBreak, 0), "foo bar", "Wrong conversion result 6");
 is(s.convertToPlainText("<body><noframes>bar</noframes>foo", c.OutputLFLineBreak, 0), "foo", "Wrong conversion result 7");
 // OutputNoFramesContent doesn't actually work, because the flag gets overridden in all cases.
 is(s.convertToPlainText("<body><noframes>bar</noframes>foo", c.OutputNoFramesContent | c.OutputLFLineBreak, 0), "foo", "Wrong conversion result 8");

View file

@@ -2,6 +2,7 @@
 <html>
 <!--
 https://bugzilla.mozilla.org/show_bug.cgi?id=359303
+https://bugzilla.mozilla.org/show_bug.cgi?id=1769534
 -->
 <head>
 <meta charset="utf-8" />
@@ -12,43 +13,51 @@ https://bugzilla.mozilla.org/show_bug.cgi?id=359303
 </head>
 <body>
 <a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=359303">Mozilla Bug 359303</a>
+<a target="_blank" href="https://bugzilla.mozilla.org/show_bug.cgi?id=1769534">Mozilla Bug 1769534</a>
 <p id="display"></p>
 <div id="content">
 <!-- In a plain-text editable control (such as a textarea or textinput), copying to clipboard should
-preserve non-breaking spaces. -->
+preserve all non-breaking spaces. -->
 <input
 id="input-with-non-breaking-spaces"
-value="Input content: This town is 100&nbsp;km away / «&nbsp;Est-ce Paris&nbsp;?&nbsp;» / Consecutive non-breaking spaces: '&nbsp;&nbsp;'">
+value="Input content: This town is 100&nbsp;km away / «&nbsp;Est-ce Paris&nbsp;?&nbsp;» / Consecutive non-breaking spaces: '&nbsp;&nbsp;' / &nbsp; &nbsp; &nbsp;Text padded using editor-generated pattern.
+">
 <textarea id="textarea-with-non-breaking-spaces">
 Textarea content:
 - This town is 100&nbsp;km away.
 - «&nbsp;Est-ce Paris&nbsp;?&nbsp;»
-- Consecutive non-breaking spaces: "&nbsp;&nbsp;"
+- Consecutive non-breaking spaces: '&nbsp;&nbsp;'
+&nbsp; &nbsp; &nbsp;Text padded using editor-generated pattern.
 </textarea>
 <!-- In a content-editable div, copying to clipboard should preserve non-breaking spaces.
-However, for compatibility with what other browsers currently do, the behavior of replacing non-breaking spaces by spaces is preserved for now.
-See https://bugzilla.mozilla.org/show_bug.cgi?id=359303#c145
+However, HTML editors programmatically insert an alternating pattern of space / NBSP when padding text with spaces
+(otherwise spaces would be collapsed).
+These programmatically-generated NBSPs should be replaced by plain spaces when copying to plain-text.
+See https://bugzilla.mozilla.org/show_bug.cgi?id=1769534
 -->
 <div contenteditable="true" id="content-editable-with-non-breaking-spaces">
 Content-editable content:
 - This town is 100&nbsp;km away.
 - «&nbsp;Est-ce Paris&nbsp;?&nbsp;»
-- Consecutive non-breaking spaces: "&nbsp;&nbsp;"
+- Consecutive non-breaking spaces: '&nbsp;&nbsp;'
+&nbsp; &nbsp; &nbsp;Text padded using editor-generated pattern.
 </div>
-<!-- In non-editable HTML nodes, like this paragraph, copying to clipboard should preserve non-breaking
-spaces.
-However, for compatibility with what other browsers currently do, the behavior of replacing non-breaking spaces by spaces is preserved for now.
-See https://bugzilla.mozilla.org/show_bug.cgi?id=359303#c145
+<!-- In non-editable HTML nodes, like this paragraph, copying to clipboard should preserve non-breaking spaces.
+However, HTML editors programmatically insert an alternating pattern of space / NBSP when padding text with spaces
+(otherwise spaces would be collapsed).
+These programmatically-generated NBSPs should be replaced by plain spaces when copying to plain-text.
+See https://bugzilla.mozilla.org/show_bug.cgi?id=1769534
 -->
 <p id="paragraph-with-non-breaking-spaces">
 Paragraph content:
 - This town is 100&nbsp;km away.
 - «&nbsp;Est-ce Paris&nbsp;?&nbsp;»
-- Consecutive non-breaking spaces: "&nbsp;&nbsp;"
+- Consecutive non-breaking spaces: '&nbsp;&nbsp;'
+&nbsp; &nbsp; &nbsp;Text padded using editor-generated pattern.
 </p>
 </div>
@@ -84,28 +93,40 @@ async function clipboardTextForElementId(aDomId, aExpectedString) {
   return copiedText;
 }
+function assertUserNbspPreserved(aCopiedText) {
+  ok(aCopiedText.includes("100 km"), "NBSP between two characters should be preserved");
+  ok(aCopiedText.includes("« "), "A single NBSP near a punctuation mark should be preserved");
+  ok(aCopiedText.includes(" »"), "A single NBSP near a punctuation mark should be preserved");
+  ok(aCopiedText.includes(" ? "), "NBSPs before *and* after a character should be preserved");
+  ok(aCopiedText.includes("'  '"), "Consecutive NBSPs should be preserved");
+}
+function assertEditorGeneratedNbspPreserved(aCopiedText) {
+  ok(aCopiedText.includes("      Text padded"), "When copying from a plain-text control, editor-generated NBSP (if any) should be preserved");
+}
+function assertEditorGeneratedNbspReplaced(aCopiedText) {
+  ok(aCopiedText.includes(" Text padded"), "When copying plain-text from a non plain-text control, editor-generated NBSP should be replaced by standard spaces");
+}
 /** Test for Bug 359303 **/
 SimpleTest.waitForExplicitFinish();
 SimpleTest.waitForFocus(async function() {
-  let iValue = await clipboardTextForElementId("input-with-non-breaking-spaces", "Input");
-  ok(iValue.includes("100 km"), "NBSP between two characters should be preserved");
-  ok(iValue.includes("« "), "A single NBSP near a punctuation mark should be preserved");
-  ok(iValue.includes(" »"), "A single NBSP near a punctuation mark should be preserved");
-  ok(iValue.includes(" ? "), "NBSPs before *and* after a character should be preserved");
-  ok(iValue.includes("  "), "Consecutive NBSPs should be preserved");
+  let textCopiedFromInput = await clipboardTextForElementId("input-with-non-breaking-spaces", "Input");
+  assertUserNbspPreserved(textCopiedFromInput);
+  assertEditorGeneratedNbspPreserved(textCopiedFromInput);
-  let tValue = await clipboardTextForElementId("textarea-with-non-breaking-spaces", "Textarea");
-  ok(tValue.includes("100 km"), "NBSP between two characters should be preserved");
-  ok(tValue.includes("« "), "A single NBSP near a punctuation mark should be preserved");
-  ok(tValue.includes(" »"), "A single NBSP near a punctuation mark should be preserved");
-  ok(tValue.includes(" ? "), "NBSPs before *and* after a character should be preserved");
-  ok(tValue.includes("  "), "Consecutive NBSPs should be preserved");
+  let textCopiedFromTextarea = await clipboardTextForElementId("textarea-with-non-breaking-spaces", "Textarea");
+  assertUserNbspPreserved(textCopiedFromTextarea);
+  assertEditorGeneratedNbspPreserved(textCopiedFromTextarea);
-  let cValue = await clipboardTextForElementId("content-editable-with-non-breaking-spaces", "Content-editable");
-  ok(cValue.includes("100 km"), "NBSP should be replaced by spaces, until browser compatibility issues are sorted out");
+  let textCopiedFromContentEditable = await clipboardTextForElementId("content-editable-with-non-breaking-spaces", "Content-editable");
+  assertUserNbspPreserved(textCopiedFromContentEditable);
+  assertEditorGeneratedNbspReplaced(textCopiedFromContentEditable);
-  let pValue = await clipboardTextForElementId("paragraph-with-non-breaking-spaces", "Paragraph");
-  ok(pValue.includes("100 km"), "NBSP should be replaced by spaces, until browser compatibility issues are sorted out");
+  let textCopiedFromNonEditableHtmlContent = await clipboardTextForElementId("paragraph-with-non-breaking-spaces", "Paragraph");
+  assertUserNbspPreserved(textCopiedFromNonEditableHtmlContent);
+  assertEditorGeneratedNbspReplaced(textCopiedFromNonEditableHtmlContent);
   SimpleTest.finish();
 });

View file

@@ -152,9 +152,12 @@ interface nsIDocumentEncoder : nsISupports
   const unsigned long OutputEncodeBasicEntities = (1 << 14);
   /**
-   * Normally &nbsp; is replaced with a space character when
-   * encoding data as plain text, set this flag if that's
-   * not desired.
+   * Normally, for non-text-control elements (that is, neither <textarea>
+   * nor <input>), &nbsp; characters adjacent to regular spaces are replaced
+   * by regular spaces.
+   * This flag suppresses that behavior, and causes all &nbsp; characters
+   * to be preserved.
+   *
    * Plaintext output only.
    */
   const unsigned long OutputPersistNBSP = (1 << 17);

View file

@@ -107,10 +107,44 @@ static void DetermineLineBreak(const int32_t aFlags, nsAString& aLineBreak) {
 void nsPlainTextSerializer::CurrentLine::MaybeReplaceNbspsInContent(
     const int32_t aFlags) {
+  // HTML editors may enforce consecutive spaces in HTML output by replacing
+  // them with non-breaking spaces.
+  // Here we revert this hack when converting HTML text to plain text.
   if (!(aFlags & nsIDocumentEncoder::OutputPersistNBSP)) {
-    // First, replace all nbsp characters with spaces,
-    // which the unicode encoder won't do for us.
-    mContent.ReplaceChar(kNBSP, kSPACE);
+    //
+    // Replace NBSP characters with spaces if they are adjacent to a space.
+    //
+    const uint32_t length = mContent.Length();
+    bool containsSpace = false;
+    bool containsNBSP = false;
+    // 1. Inspect the string forwards, and replace NBSPs that are **after**
+    // regular spaces.
+    //
+    // After that loop, all sequences of "spaceNBSP*" have been replaced by
+    // equally long "space*" sequences.
+    for (uint32_t i = 0; i < length; i++) {
+      if (mContent[i] == kSPACE) {
+        containsSpace = true;
+      } else if (mContent[i] == kNBSP) {
+        if (i > 0 && mContent[i - 1] == kSPACE) {
+          mContent.SetCharAt(kSPACE, i);
+        } else {
+          containsNBSP = true;
+        }
+      }
+    }
+    // 2. If we found spaces and didn't replace all relevant NBSPs, inspect the
+    // string backwards, and replace NBSPs that are **before** regular spaces.
+    //
+    // After that loop, all sequences of "NBSP*space" have been replaced by
+    // equally long "space*" sequences.
+    if (containsSpace && containsNBSP && length >= 1) {
+      for (uint32_t i = length - 1; i > 0; i--) {
+        if (mContent[i - 1] == kNBSP && mContent[i] == kSPACE) {
+          mContent.SetCharAt(kSPACE, i - 1);
+        }
+      }
+    }
   }
 }
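
To experiment with the two passes of `MaybeReplaceNbspsInContent` outside Gecko, they can be sketched as a plain JavaScript function. This is an approximation for illustration: the name `replaceNbspsTwoPass` is hypothetical, the string is handled as an array of UTF-16 code units rather than an `nsString`, and the `OutputPersistNBSP` check is left out.

```javascript
const SPACE = " ";
const NBSP = "\u00A0";

// Sketch only: mirrors the forward and backward loops on an array of
// UTF-16 code units, then joins the result back into a string.
function replaceNbspsTwoPass(text) {
  const chars = text.split("");
  let containsSpace = false;
  let containsNbsp = false;

  // Pass 1 (forwards): an NBSP right after a space becomes a space, so a
  // space followed by any number of NBSPs turns into spaces step by step.
  for (let i = 0; i < chars.length; i++) {
    if (chars[i] === SPACE) {
      containsSpace = true;
    } else if (chars[i] === NBSP) {
      if (i > 0 && chars[i - 1] === SPACE) {
        chars[i] = SPACE;
      } else {
        containsNbsp = true; // this NBSP survived pass 1
      }
    }
  }

  // Pass 2 (backwards): only needed if an NBSP survived pass 1 and the
  // string contains at least one space it could be adjacent to.
  if (containsSpace && containsNbsp) {
    for (let i = chars.length - 1; i > 0; i--) {
      if (chars[i - 1] === NBSP && chars[i] === SPACE) {
        chars[i - 1] = SPACE;
      }
    }
  }
  return chars.join("");
}

// Editor-style padding is flattened to regular spaces of the same length.
console.log(replaceNbspsTwoPass(`${NBSP} ${NBSP} ${NBSP}Text`) === " ".repeat(5) + "Text"); // true
// An NBSP that touches no regular space is kept.
console.log(replaceNbspsTwoPass(`100${NBSP}km`) === `100${NBSP}km`); // true
```

The backward pass is what handles NBSPs that precede a space, such as the leading NBSP in the first example, which the forward pass alone would leave in place.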