<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<head>
<title>the complete guide to mozilla/string</title>
<link rel="stylesheet" href="http://www.mozilla.org/projects/string/string-guide.css" title="remote stylesheet" type="text/css">
<link rel="alternate stylesheet" href="string-guide.css" title="local stylesheet" type="text/css">
</head>
<body>
<!-- ----|---------|---------|---------|---------|---------|---------|---------| -->
<!-- ...............................................................Front Matter -->
<h1>the complete guide to <a class="exact-uri" href="http://lxr.mozilla.org/seamonkey/source/string/">mozilla/string</a></h1>
<div class="author-note">
<p>by <a href="http://ScottCollins.net/">Scott Collins</a><!-- /p -->
<p>last modified 8 April 2001<!-- /p -->
</div>
<div class="abstract">
<h1>Abstract</h1>
<p>
This document <span class="LXRSHORTDESC">provides
an <a href="#users_guide">introduction</a> to the design and use of the string classes in mozilla,
<a href="#implementors_guide">detailed information</a> on their implementation and how one may extend them,
and <a href="#faq">answers</a> to frequently asked questions about strings</span>.
</p>
</div>
<h2><a name="contents">contents</a></h2>
<div class="contents">
<ul>
<li><a href="#users_guide" >user's guide</a></li>
<li><a href="#implementors_guide">implementor's guide</a></li>
<li><a href="#faq" >frequently asked questions</a></li>
</ul>
</div>
<div class="author-note">
<p>
A note to potential editors:
don't even <strong>consider</strong> modifying this document with an HTML editor.
That would destroy the internal formatting,
and make patches unmanageable.
</p>
</div>
<!-- ...............................................................User's Guide -->
<hr>
<h1><a name="users_guide">user's guide</a></h1>
<div class="author-note">
<p>
Strings in mozilla are a world apart from <span class="code">char*</span>s.
If you don't know why they are different,
this section is the place for you to start.
If you're already familiar with the hierarchy of string classes in mozilla,
then you might want to skip ahead to the <a href="#implementors_guide">implementor's guide</a>
or the <a href="#faq">FAQ</a>.
</p>
</div>
<div class="contents">
<ul>
<li><a href="#users_guide_introduction">introduction</a></li>
<li><a href="#users_guide_how_to" >using the string classes correctly; using the correct string class</a></li>
<li><a href="#users_guide_iterators" >using string iterators</a></li>
<li><a href="#users_guide_summary" >summary</a></li>
</ul>
</div>
<h2><a name="users_guide_introduction">introduction</a></h2>
<h3>what is and what isn't a string?</h3>
<p>
A string is an opaque container holding a linear sequence of characters, possibly of zero length.
Understanding the implications of this statement is the foundation for understanding all of mozilla's string classes.
</p>
<h3>readable and writable</h3>
<h3>promises</h3>
<h3>flat strings</h3>
<h3>encoding</h3>
<h3>sharing</h3>
<h2><a name="users_guide_how_to">using the string classes correctly; using the correct string class</a></h2>
<h3>basic string operations</h3>
<h4>comparison</h4>
<h4>concatenation</h4>
<h4>substrings</h4>
<h4>find and replace</h4>
<h3>conversions</h3>
<h4>calling a function that expects a different kind of string</h4>
<h4>converting between string classes</h4>
<h4>converting between encodings</h4>
<h3>selecting the right string class</h3>
<h4>user string classes</h4>
<h4>selecting the right string class for a parameter</h4>
<h4>selecting the right string class for a local variable</h4>
<h4>selecting the right string class for a member variable</h4>
<h4>selecting the right string class for a return value</h4>
<h4>selecting the right string class in IDL</h4>
<h3>don'ts</h3>
<h2><a name="users_guide_iterators">using string iterators</a></h2>
<h3>what is an iterator?</h3>
<h3>reading iterators and writing iterators</h3>
<h3>`chunky' iterating for efficiency</h3>
<h3><span class="code">copy_string</span>, character sources and sinks</h3>
<h3>encoding conversion iterators</h3>
<h2><a name="users_guide_summary">summary</a></h2>
<!-- ........................................................Implementor's Guide -->
<hr>
<h1><a name="implementors_guide">implementor's guide</a></h1>
<div class="author-note">
<p>
</p>
</div>
<div class="contents">
<ul>
<!-- li></li -->
</ul>
</div>
<!-- ........................................................................FAQ -->
<hr>
<h1><a name="faq">frequently asked questions</a></h1>
<div class="author-note">
</div>
<div class="contents">
<ul>
<!--
<li>
I have a wide string, i.e., an instance of a class derived from <span class="code">nsAString</span>
<ul>
<li>I want a pointer to the characters</li>
<li>I want a narrow string</li>
<li>I want to <span class="code">printf</span> it</li>
</ul>
</li>
<li>
I have a <span class="code">PRUnichar*</span>
<ul>
<li>I want a wide string</li>
<li>I want a narrow string</li>
<li>I want to <span class="code">printf</span> it</li>
</ul>
</li>
<li>
I have a narrow string, i.e., an instance of a class derived from <span class="code">nsACString</span>
<ul>
<li>I want a pointer to the characters</li>
<li>I want a wide string</li>
<li>I want to <span class="code">printf</span> it</li>
</ul>
</li>
<li>
I have a <span class="code">char*</span>
<ul>
<li>I want a wide string</li>
<li>I want a narrow string</li>
</ul>
</li>
<li>
I have a literal character sequence, e.g., <span class="code">"Hello, World!\n"</span>
<ul>
<li>I want a wide string</li>
<li>I want a narrow string</li>
</ul>
</li>
<li>What's the best way to return a string?</li>
<li>How can I get a pointer to the characters in a string?</li>
<li>How can I <span class="code">printf</span> a string?</li>
-->
</ul>
</div>
<table class="chart">
<tr>
<th></th>
<th colspan="5">you have some <span class="code">char</span>s</th>
</tr>
<tr>
<th>you want</th>
<th><span class="code">'x'</span></th>
<th><span class="code">char c</span></th>
<th><span class="code">"foo"</span></th>
<th><span class="code">char* cp</span></th>
<th><span class="code">nsACString&amp; cs</span></th>
</tr>
<tr>
<th class="row-label"><span class="code">char</span></th>
<!-- 'x' --> <td>-</td>
<!-- char c --> <td>-</td>
<!-- "foo" --> <td><span class="code">[]</span></td>
<!-- char* cp --> <td><span class="code">[]</span></td>
<!-- nsACString& cs --> <td><a href="#faq_how_to_extract_a_character">how to extract a character</a></td>
</tr>
<tr>
<th class="row-label"><span class="code">PRUnichar</span></th>
<!-- 'x' --> <td><span class="code">PRUnichar('x')</span></td>
<!-- char c --> <td><span class="code">PRUnichar(c)</span></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">char*</span></th>
<!-- 'x' --> <td></td>
<!-- char c --> <td></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td><a href="#faq_how_to_get_a_pointer">how to get a pointer</a></td>
</tr>
<tr>
<th class="row-label"><span class="code">PRUnichar*</span></th>
<!-- 'x' --> <td></td>
<!-- char c --> <td></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">nsACString</span></th>
<!-- 'x' --> <td></td>
<!-- char c --> <td></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">nsAString</span></th>
<!-- 'x' --> <td></td>
<!-- char c --> <td></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td></td>
</tr>
<tr>
<th class="row-label">to call <span class="code">printf</span></th>
<!-- 'x' --> <td></td>
<!-- char c --> <td></td>
<!-- "foo" --> <td></td>
<!-- char* cp --> <td></td>
<!-- nsACString& cs --> <td></td>
</tr>
</table>
<table class="chart">
<tr>
<th></th>
<th colspan="3">you have some <span class="code">PRUnichar</span>s</th>
</tr>
<tr>
<th>you want</th>
<th><span class="code">PRUnichar w</span></th>
<th><span class="code">PRUnichar* wp</span></th>
<th><span class="code">nsAString&amp; s</span></th>
</tr>
<tr>
<th class="row-label"><span class="code">char</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">PRUnichar</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">char*</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">PRUnichar*</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">nsACString</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
<tr>
<th class="row-label"><span class="code">nsAString</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td><a href="#faq_how_to_get_a_pointer">how to get a pointer</a></td>
</tr>
<tr>
<th class="row-label">to call <span class="code">printf</span></th>
<!-- PRUnichar w --> <td></td>
<!-- PRUnichar* wp --> <td></td>
<!-- nsAString& s --> <td></td>
</tr>
</table>
<div class="faq">
<dl>
<dt>
is there any string doc?
</dt>
<dd>
Yes, you're soaking in it!
</dd>
<!-- getting a pointer -->
<dt>
<a name="faq_how_to_get_a_pointer">I have a string, how do I get a pointer to the characters?</a>
</dt>
<dd>
You want to avoid this situation.
In your own interfaces, prefer string types over raw pointers.
Any interface that wants to process a string using a single pointer is making two expensive assumptions.
First, that the string is stored in one contiguous hunk; and
second, that the string is zero-terminated.
If this isn't the case,
then to get a pointer, storage must be allocated and the entire string must be copied to it and zero-terminated.
You may not be able to avoid needing a pointer when interacting with system calls.
</dd>
<dd>
Some string classes guarantee that they are `flat'.
That is, that their data is stored in one contiguous zero-terminated hunk.
This <strong>does not</strong> imply that there are no embedded nulls. Caveat emptor.
All strings that explicitly promise flatness
inherit from the class <span class="code">nsAFlatString</span>
or <span class="code">nsAFlatCString</span>
and can produce a constant pointer to their data with the <span class="code">get()</span> member function.
Even strings that don't explicitly promise to be flat
may happen to be flat.
The helper function <span class="code">PromiseFlatString</span> will produce
a <span class="code">const</span> dependent string that is guaranteed to be flat.
If you use this on a string that already happens to be flat,
the result is simply a reference through to that string.
Otherwise,
<span class="code">PromiseFlatString</span> does the work to allocate, copy, terminate, and manage
a temporary flat string.
Since the result of <span class="code">PromiseFlatString</span> is a temporary,
you must be careful not to get and hold a pointer to its data for longer than the temporary itself lives.
</dd>
<dd>
<div class="source-code">
<pre>
/* I have a string, how do I get a pointer to the characters? */
extern void EvilNarrowOSFunction( const char* ); // evil OS routines that want a pointer
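extern void EvilWideOSFunction( const PRUnichar* );
extern void SomeOtherFunction( const PRUnichar* ); // any second routine that wants the same pointer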
void func( const nsAString&amp; aString, const nsACString&amp; aCString )
{
EvilWideOSFunction( NS_LITERAL_STRING("Hello, World!").<span class="notice">get()</span> );
// literal strings are flat already (as are |nsString|s, et al), just use |.get()|
EvilWideOSFunction( <span class="notice">PromiseFlatString(</span>aString<span class="notice">).get()</span> );
// for strings that don't explicitly guarantee flatness, use |PromiseFlatString|
// beware holding the pointer for longer than the life of the promise
<span class="warning">const PRUnichar* wp = PromiseFlatString(aString).get(); // BAD! |wp| dangles
EvilWideOSFunction(wp);</span>
// if you really need to use the pointer from |PromiseFlatString| in more than one expression...
const nsAFlatString&amp; flat = <span class="notice">PromiseFlatString(</span>aString<span class="notice">)</span>;
EvilWideOSFunction(flat.<span class="notice">get()</span>);
SomeOtherFunction(flat.<span class="notice">get()</span>);
// similarly for |char| strings
EvilNarrowOSFunction( <span class="notice">PromiseFlatCString(</span>aCString<span class="notice">).get()</span> );
}
</pre>
</div>
</dd>
<!-- extracting a character -->
<dt>
<a name="faq_how_to_extract_a_character">How do I get a particular character out of a string?</a>
</dt>
<dd>
Flat strings provide <span class="code">operator[]</span> and <span class="code">CharAt()</span>.
All strings provide <span class="code">First()</span>, <span class="code">Last()</span>, and access with iterators.
<strong>Don't</strong> promise a string flat just to do character indexing.
Prefer, instead, to get an iterator and <span class="code">advance</span> it to the position you care about.
</dd>
<dd>
<div class="source-code">
<pre>
PRUnichar Get5thCharacterOf( const nsAString&amp; aString )
{
if ( aString.Length() &gt;= 5 )
{
nsAString::const_iterator iter;
aString.BeginReading(iter); // make |iter| point to the beginning of |aString|
iter.advance(4); // step over the first four characters; |iter| now points at the fifth
return *iter;
}
return PRUnichar(0);
}
</pre>
</div>
</dd>
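<dd>
To see why indexing is not free on a string that is not flat, it can help to picture the storage as several separate pieces.
The following is only a minimal sketch under that assumption:
<span class="code">Fragment</span>, <span class="code">CharAtOffset</span>, and <span class="code">kHello</span> are hypothetical names invented for illustration, not mozilla/string classes,
and <span class="code">char16_t</span> stands in for <span class="code">PRUnichar</span>.
It assumes a C++14 compiler, and it shows only the kind of walk that reaching a given position may require.
</dd>
<dd>
<div class="source-code">
<pre>
```cpp
// Hypothetical stand-in for one contiguous piece of a string's storage.
// This is NOT a mozilla/string class; it exists only for this sketch.
struct Fragment
{
  const char16_t* data;   // the characters held by this piece
  unsigned        length; // how many characters this piece holds
};

// Return the character |offset| positions from the start of a string stored
// as |count| fragments, or 0 when |offset| is past the end.
constexpr char16_t CharAtOffset( const Fragment* frags, unsigned count, unsigned offset )
{
  for ( unsigned i = 0; i < count; ++i )
    {
      if ( offset < frags[i].length )
        return frags[i].data[offset]; // the position falls inside this piece
      offset -= frags[i].length;      // otherwise skip the whole piece
    }
  return 0;
}

// "Hello" stored as two pieces: "Hel" followed by "lo".
constexpr Fragment kHello[] = { { u"Hel", 3 }, { u"lo", 2 } };

static_assert( CharAtOffset(kHello, 2, 0) == u'H', "offset 0 is in the first piece" );
static_assert( CharAtOffset(kHello, 2, 4) == u'o', "offset 4 is in the second piece" );
static_assert( CharAtOffset(kHello, 2, 5) == 0,    "offset 5 is past the end" );
```
</pre>
</div>
</dd>
<dd>
With a single piece the loop finishes on its first pass,
which is one way to see why flat strings can afford to offer <span class="code">operator[]</span>
while other strings ask you to go through an iterator.
</dd>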
<!-- how to return a string -->
<dt>
What is the best way to return a string?
</dt>
<dd>
<p>
There are several reasonable ways to produce a string result from a function.
If you are already holding the answer as a sharable string,
you can simply return that string (pass-by-value).
Otherwise,
the most efficient and flexible way to return a string is
to assign your result into a non-<span class="code">const</span> reference parameter.
Don't bother to create a sharable string from scratch with your generated result.
</p>
<p>
Why?
The two things you want to minimize in string manipulation are,
in order of importance,
heap allocation, and
moving characters around.
</p>
</dd>
<dd>
<div class="source-code">
<pre>
/* What is the best way to return a string? */
class foo
{
public:
// ...
void GetShortName( nsAString&amp; aResult ) const;
nsCommonString GetFullName() const;
private:
nsCommonString mFullName;
const PRUnichar* mShortName;
PRUint32 mShortNameLength;
};
nsCommonString
foo::GetFullName() const
{
return mFullName;
}
void
foo::GetShortName( nsAString&amp; aResult ) const
{
aResult = DependentString(mShortName, mShortNameLength);
}
</pre>
</div>
</dd>
<dt>
If I have a <span class="code">PRUnichar *aKey</span> [or other representation of a wide] string,
what can I use (easily :)
to convert it
to a <span class="code">printf()</span> printable string?
Just for debugging...
</dt>
<dd>
If it's just for debugging,
you probably wouldn't care if something odd was printed in the case of a UCS2 character that didn't have
an ASCII equivalent.
The simplest thing in this case is to make a temporary conversion using <span class="code">NS_ConvertUCS2toUTF8</span>.
Remember not to hold onto the pointer you get out of this beyond the lifetime of the temporary.
</dd>
<dd>
<div class="source-code">
<pre>
const PRUnichar* aKey;
printf("%s\n", <span class="notice">NS_ConvertUCS2toUTF8(</span>aKey<span class="notice">).get()</span>); // GOOD
// the simplest way to get a |printf|-able |const char*| out of a string
// works just as well with a formal wide string type...
const nsAString&amp; aString = ...; // perhaps it's a parameter
printf("%s\n", <span class="notice">NS_ConvertUCS2toUTF8(</span>aString<span class="notice">).get()</span>);
// But don't hold onto the pointer longer than the lifetime of the temporary!
const char* cstring = NS_ConvertUCS2toUTF8(aKey).get(); <span class="warning">// BAD!</span>
printf("%s\n", cstring); <span class="warning">// |cstring| is dangling</span>
</pre>
</div>
</dd>
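<dd>
For a sense of what such a conversion involves, here is a minimal sketch of encoding one 16-bit code unit as UTF-8.
It is <strong>not</strong> the implementation of <span class="code">NS_ConvertUCS2toUTF8</span>:
<span class="code">Utf8Bytes</span> and <span class="code">EncodeUnit</span> are hypothetical names invented for illustration,
<span class="code">char16_t</span> stands in for <span class="code">PRUnichar</span>,
a C++14 compiler is assumed,
and surrogate pairs are deliberately left out.
</dd>
<dd>
<div class="source-code">
<pre>
```cpp
// Hypothetical result type: the UTF-8 bytes for a single 16-bit code unit.
struct Utf8Bytes
{
  unsigned char bytes[3]; // at most three bytes are needed for a 16-bit value
  unsigned      length;   // how many entries of |bytes| are in use
};

// Encode one code unit.  Surrogates (0xD800-0xDFFF) are NOT handled specially,
// so this sketch is only meaningful for characters outside that range.
constexpr Utf8Bytes EncodeUnit( char16_t c )
{
  Utf8Bytes out = { { 0, 0, 0 }, 0 };
  if ( c < 0x80 )
    {
      out.bytes[0] = c;                            // ASCII passes through unchanged
      out.length = 1;
    }
  else if ( c < 0x800 )
    {
      out.bytes[0] = 0xC0 | ( c >> 6 );            // 110xxxxx
      out.bytes[1] = 0x80 | ( c & 0x3F );          // 10xxxxxx
      out.length = 2;
    }
  else
    {
      out.bytes[0] = 0xE0 | ( c >> 12 );           // 1110xxxx
      out.bytes[1] = 0x80 | ( ( c >> 6 ) & 0x3F ); // 10xxxxxx
      out.bytes[2] = 0x80 | ( c & 0x3F );          // 10xxxxxx
      out.length = 3;
    }
  return out;
}

static_assert( EncodeUnit(u'A').length == 1,        "ASCII stays one byte" );
static_assert( EncodeUnit(0x00E9).bytes[0] == 0xC3, "U+00E9 begins with 0xC3" );
static_assert( EncodeUnit(0x20AC).length == 3,      "U+20AC needs three bytes" );
```
</pre>
</div>
</dd>
<dd>
Anything outside ASCII becomes two or three bytes,
so what <span class="code">printf</span> finally shows for such characters depends on how the terminal interprets those bytes.
</dd>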
</dl>
</div>
<!-- .................................................................End Matter -->
</body>
</html>