---
title: "Basic Concepts"
isMarkdown: false
---
<h1 id="basic-concepts">Basic Concepts</h1>
<p>This page describes some basic concepts that are important to understand
when working with RichTextKit.</p>
<h2 id="text-blocks">Text Blocks</h2>
<p>RichTextKit primarily works with individual blocks of text, each of which is
essentially a paragraph.</p>
<p>RichTextKit doesn't provide document-level constructs such as list items,
block quotes, tables, table cells or divs. This project is strictly
about laying out a single block of rich (aka attributed) text
and then providing various functions for working with that text block.</p>
<p>RichTextKit has no concept of a DOM. That's a higher-level concern
that RichTextKit doesn't attempt to address.</p>
<p>Text blocks can contain forced line breaks with a newline character (<code>\n</code>).
These should be treated as in-paragraph "soft returns", as opposed to
hard paragraph terminators.</p>
<p>Currently RichTextKit doesn't support non-text inline elements, although this
may be added in the future.</p>
<p>Text blocks are represented by the <a href="./ref/Topten.RichTextKit.TextBlock">TextBlock</a> class.</p>
<h2 id="styles">Styles</h2>
<p>Styles are used to describe the display attributes of text in a text block. A
text block consists of one or more runs of text, each tagged with a
single style. These text runs are referred to as "style runs".</p>
<p>Style runs are represented by the <a href="./ref/Topten.RichTextKit.StyleRun">StyleRun</a> class.</p>
<h2 id="font-fallback">Font Fallback</h2>
<p>Font fallback is the process of switching to a different font when the specified
font doesn't contain the required glyphs.</p>
<p>Font fallback is often required to display emoji characters (since most fonts don't
include emoji glyphs), most complex scripts (e.g. Arabic) and Asian scripts.</p>
<p>RichTextKit uses SkiaSharp's <code>MatchCharacter</code> function to resolve the typeface to
use for font fallback.</p>
<h2 id="text-shaping">Text Shaping</h2>
<p>For most Latin-based languages, rendering text is simply a matter of placing one glyph
after another in a left-to-right manner. Other languages, however, require a more
complicated process that often involves drawing multiple glyphs for a single character.
This process is called "text shaping".</p>
<p>By default Skia (and therefore SkiaSharp) doesn't do text shaping and often displays
text incorrectly. RichTextKit uses <a href="https://www.freedesktop.org/wiki/Software/HarfBuzz/">HarfBuzz</a>
for text shaping.</p>
<h2 id="font-runs">Font Runs</h2>
<p>Font runs are derived by splitting style runs into smaller segments when text
is wrapped onto a new line, or when a font fallback is required.</p>
<p>Each font run represents a single run of text that uses the same font and other
style attributes for every character in the run.</p>
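<p>As a rough illustration of how font fallback can split text into runs, the
sketch below groups consecutive code points by the font chosen for them. It is
written in Python for brevity and is not RichTextKit's implementation: the font
names, the <code>COVERAGE</code> table and <code>split_into_font_runs</code> are
all hypothetical, and real font runs are also split where text wraps onto a new line.</p>

```python
from itertools import groupby

# Hypothetical glyph coverage, keyed by made-up font names.
# A real implementation would ask the font itself which glyphs it contains.
COVERAGE = {
    "Body": lambda cp: cp < 0x1F000,
    "Emoji": lambda cp: cp >= 0x1F000,
}


def split_into_font_runs(text, preferred, fallback):
    """Return (start, length, font) tuples, one per run of same-font code points."""
    # Choose a font per code point: the preferred font if it covers the
    # code point, otherwise the fallback font.
    fonts = [preferred if COVERAGE[preferred](ord(ch)) else fallback for ch in text]
    runs, start = [], 0
    # Merge neighbouring code points that ended up with the same font.
    for font, group in groupby(fonts):
        length = sum(1 for _ in group)
        runs.append((start, length, font))
        start += length
    return runs


# "Hi " and "!" stay in the body font; the emoji falls back.
runs = split_into_font_runs("Hi \U0001F600!", "Body", "Emoji")
```

<p>Each tuple plays the role of one font run: every code point it covers uses the same font.</p>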
<p>Font runs are represented by the <a href="./ref/Topten.RichTextKit.FontRun">FontRun</a> class.</p>
<h2 id="style-runs-vs-font-runs">Style Runs vs Font Runs</h2>
<p>Style runs and font runs are similar concepts but serve two different purposes:</p>
<ul>
<li>Style runs describe the logical view of a text block (before layout)</li>
<li>Font runs describe the physical view of a text block (after layout)</li>
</ul>
<h2 id="lines">Lines</h2>
<p>Laying out a text block produces a set of lines, each
consisting of one or more font runs.</p>
<p>Lines are represented by the <a href="./ref/Topten.RichTextKit.TextLine">TextLine</a> class.</p>
<h2 id="bi-directional-ltr-and-rtl-text">Bi-Directional, LTR and RTL Text</h2>
<p>Bi-directional text support is the process of correctly displaying text that mixes
left-to-right and right-to-left languages.</p>
<p>RichTextKit includes an implementation of the Unicode Bi-directional Text
Algorithm (<a href="http://www.unicode.org/reports/tr9/">UAX #9</a>), and each text block
has a "base direction" that controls the default layout order of text in that
paragraph.</p>
<p>The text direction of spans within the text block can be controlled using
either:</p>
<ul>
<li>Embedded control characters (as specified by UAX #9)</li>
<li>The text direction of style runs (see <a href="./ref/Topten.RichTextKit.IStyle">IStyle</a>)</li>
</ul>
<p>When the text direction property is set on a style run, the text is processed
in the same manner as an "isolating sequence" as described in UAX #9.</p>
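<p>For comparison, the embedded control character approach expresses an isolating
sequence by wrapping the text between an isolate initiator and U+2069 (PDI). The
Python sketch below shows only that wrapping. The <code>isolate</code> helper is
hypothetical and is not part of RichTextKit; the control characters themselves
are the ones defined by UAX #9.</p>

```python
# Isolate controls defined by UAX #9.
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE (direction taken from the content)
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE


def isolate(text: str, direction: str) -> str:
    """Wrap text in isolate controls; direction is 'ltr', 'rtl' or 'auto'."""
    opener = {"ltr": LRI, "rtl": RLI, "auto": FSI}[direction]
    return opener + text + PDI


# A right-to-left span (Hebrew letters alef, bet) embedded in left-to-right text.
mixed = "abc " + isolate("\u05D0\u05D1", "rtl") + " def"
```

<p>Setting the direction on a style run avoids having to insert these characters into the text itself.</p>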
<h2 id="utf-16-utf-32-code-points-and-clusters">UTF-16, UTF-32, Code Points and Clusters</h2>
<p>Internally, RichTextKit works with UTF-32 encoded text. Specifically it uses the
<code>int</code> type to represent a character rather than the <code>char</code> type.</p>
<p>To avoid confusion, the API and code base use the term "code point" to
refer to a single UTF-32 character.</p>
<p>Since C# strings are encoded as UTF-16, when a string is added to a text
block it will be automatically converted to UTF-32. It's important to note that
indices into the converted string may no longer match indices into the original
C# string (although RichTextKit does provide functions to map between them).</p>
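<p>The index shift comes from surrogate pairs: a code point above U+FFFF takes two
UTF-16 code units but only one UTF-32 slot. The Python sketch below illustrates
the mapping. Python strings are already indexed by code point, so the UTF-16
length is computed by encoding. <code>utf16_to_code_point_index</code> is a
hypothetical helper for illustration, not one of RichTextKit's mapping functions.</p>

```python
def utf16_to_code_point_index(text: str, utf16_index: int) -> int:
    """Map an index in UTF-16 code units to an index in code points."""
    units = 0
    for cp_index, ch in enumerate(text):
        if units >= utf16_index:
            return cp_index
        # Code points above U+FFFF need a surrogate pair (two UTF-16 units).
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


text = "a\U0001F600b"  # 'a', an emoji outside the BMP, 'b'

code_point_length = len(text)                      # length in code points
utf16_length = len(text.encode("utf-16-le")) // 2  # length in UTF-16 code units

# 'b' sits at UTF-16 index 3 (as in a C# string) but at code point index 2.
b_index = utf16_to_code_point_index(text, 3)
```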
<p>RichTextKit uses the term <code>CodePointIndex</code> to refer to the index of a code point
in a UTF-32 buffer.</p>
<p>Also note that line endings <code>\r</code> and <code>\r\n</code> are automatically normalized to <code>\n</code>.
This conversion can also affect the mapping of code point indices to character
indices in the original string.</p>
<p><em>(Note: <code>\n\r</code> style line endings aren't supported)</em></p>
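<p>A minimal Python sketch of this kind of normalization, showing how it shifts
indices. The <code>normalize_line_endings</code> function is hypothetical and
only mirrors the behaviour described above. It is not RichTextKit's code.</p>

```python
import re


def normalize_line_endings(text: str) -> str:
    # Try "\r\n" first so the pair collapses to one "\n", then any lone "\r".
    return re.sub(r"\r\n|\r", "\n", text)


original = "one\r\ntwo\rthree"
normalized = normalize_line_endings(original)

# Each "\r\n" that collapses to "\n" moves the following text one position earlier.
index_before = original.index("two")
index_after = normalized.index("two")
```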
<p>Some complex scripts require more than one code point to describe a single
user-perceived character. These groupings of code points are called "clusters". This is
the same concept as used by HarfBuzz.</p>
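<p>Two small examples of multi-code-point characters, sketched in Python using
only the standard library. They illustrate why a code point count can differ
from the number of characters a user sees. They don't show how RichTextKit or
HarfBuzz compute cluster boundaries.</p>

```python
import unicodedata

# "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points, displayed as one character.
decomposed = "e\u0301"
# The same character as a single precomposed code point (U+00E9).
composed = unicodedata.normalize("NFC", decomposed)

# Two regional indicator symbols (N + Z) that are displayed as a single flag.
flag = "\U0001F1F3\U0001F1FF"

decomposed_length = len(decomposed)  # counted in code points
composed_length = len(composed)
flag_length = len(flag)
```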
<h2 id="measurement-and-size-units">Measurement and Size Units</h2>
<p>All measurements and sizes used by RichTextKit are logical.</p>
<p>The final size of rendered text depends on the scaling configured on the target
canvas, and it's up to you to determine the appropriate system of measurement
to use.</p>