<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="http://www.germane-software.com/repositories/public/documentation/documentation.css"?>
<?xml-stylesheet alternative="yes" type="text/css" href="file:/home/ser/Work/documentation/documentation.css"?>
<?xml-stylesheet alternative="yes" type="text/xsl" href="http://www.germane-software.com/repositories/public/documentation/paged.xsl"?>
<!DOCTYPE documentation SYSTEM "http://www.germane-software.com/repositories/public/documentation/documentation.dtd">
<documentation>
<head>
<title>REXML</title>
<banner href="img/rexml.png" />
<version>@ANT_VERSION@</version>
<date>@ANT_DATE@</date>
<home>http://www.germane-software.com/software/rexml</home>
<base>rexml</base>
<language>ruby</language>
<author email="ser@germane-software.com"
href="http://www.ser1.net/" jabber="seanerussell@gmail.com">Sean
Russell</author>
</head>
<overview>
<purpose lang="en">
<p>REXML is a conformant XML processor for the Ruby programming
language. REXML passes 100% of the OASIS non-validating tests and
includes full XPath support. It is reasonably fast, and is implemented
in pure Ruby. Best of all, it has a clean, intuitive API. REXML is
included in the standard library of Ruby.</p>
<p>This software is distributed under the <link href="LICENSE.txt">Ruby
license</link>.</p>
</purpose>
<general>
<p>REXML arose out of a desire for a straightforward XML API, and is an
attempt at an API that doesn't require constant referencing of
documentation to do common tasks. "Keep the common case simple, and the
uncommon, possible."</p>
<p>REXML avoids The DOM API, which violates the maxim of simplicity. It
does provide <em>a</em> DOM model, but one that is Ruby-ized. It is an
XML API oriented for Ruby programmers, not for XML programmers coming
from Java.</p>
<p>One common difference is that the Ruby API relies on block
enumerations rather than iterators. For example, the Java code:</p>
<example>for (Enumeration e=parent.getChildren(); e.hasMoreElements(); ) {
Element child = (Element)e.nextElement(); // Do something with child
}</example>
<p>in Ruby becomes:</p>
<example>parent.each_child { |child|
  # Do something with child
}</example>
<p>Can't you feel the peace and contentment in this block of code? Ruby
is the language Buddha would have programmed in.</p>
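<p>A minimal runnable sketch of the same idea, assuming a small made-up
document (the element names and the <code>SAMPLE</code> constant are
hypothetical). It uses <code>each_child</code>, which visits every child
node, and <code>each_element</code>, which visits child elements only:</p>
<example><![CDATA[
```ruby
require "rexml/document"

# Hypothetical sample input, invented for this illustration.
SAMPLE = "<parent><a/>some text<b/></parent>"

# Collect the class of every child node (elements and text alike).
def child_classes(xml)
  parent = REXML::Document.new(xml).root
  classes = []
  parent.each_child { |child| classes << child.class }
  classes
end

# Collect the names of the child elements only.
def child_element_names(xml)
  parent = REXML::Document.new(xml).root
  names = []
  parent.each_element { |element| names << element.name }
  names
end

p child_classes(SAMPLE)
p child_element_names(SAMPLE)
```
]]></example>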
<p>One last thing. If you use and like this software, and you're in a
position of power in a company in Western Europe and are looking for a
software architect or developer, drop me a line. I took a lot of French
classes in college (all of which I've forgotten), and I lived in Munich
long enough that I was pretty fluent by the time I left, and I'd love to
get back over there.</p>
</general>
<features lang="en">
<item>Four intuitive parsing APIs.</item>
<item>Intuitive, powerful, and reasonably fast tree parsing API (a la
DOM)</item>
<item>Fast stream parsing API (a-la SAX)<footnote>This is not a SAX
API.</footnote></item>
<item>SAX2-based API<footnote>In addition to the native REXML streaming
API. This is slower than the native REXML API, but does a lot more work
for you.</footnote></item>
<item>Pull parsing API.</item>
<item>Small</item>
<item>Reasonably fast (for interpreted code)</item>
<item>Native Ruby</item>
<item>Full XPath support<footnote>Currently only available for the tree
API</footnote></item>
<item>XML 1.0 conformant<footnote>REXML passes all of the non-validating
OASIS tests. There are probably places where REXML isn't conformant, but
I try to fix them as they're reported.</footnote></item>
<item>ISO-8859-1, UNILE, UTF-16 and UTF-8 input and output; also,
support for any encoding that iconv supports.</item>
<item>Documentation</item>
</features>
</overview>
<operation lang="en">
<subsection title="Installation">
<p>You don't <em>have</em> to install anything; if you're running a
version of Ruby greater than 1.8, REXML is included. However, if you
choose to upgrade from the REXML distribution, run the command:
<code>ruby bin/install.rb</code>. By the way, you really should look at
these sorts of files before you run them as root. They could contain
anything, and since (in Ruby, at least) they tend to be mercifully
short, it doesn't hurt to glance over them. If you want to uninstall
REXML, run <code>ruby bin/install.rb -u</code>.</p>
</subsection>
<subsection title="Unit tests">
<p>If you have Test::Unit installed, you can run the unit test cases.
Run the command: <code>ruby bin/suite.rb</code>; it runs against the
distribution, not against the installed version.</p>
</subsection>
<subsection title="Benchmarks">
<p>There is a benchmark suite in <code>benchmarks/</code>. To run the
benchmarks, change into that directory and run <code>ruby
comparison.rb</code>. If you have nothing else installed, only the
benchmarks for REXML will be run. However, if you have any of the
following installed, benchmarks for those tools will also be run:</p>
<list>
<item>NQXML</item>
<item>XMLParser</item>
<item>Electric XML (you must copy <code>EXML.jar</code> into the
<code>benchmarks</code> directory and compile
<code>flatbench.java</code> before running the test)</item>
</list>
<p>The results will be written to <code>index.html</code>.</p>
</subsection>
<subsection title="General Usage">
<p>Please see <link href="docs/tutorial.html">the Tutorial</link>.</p>
<p>The API documentation is available <link
href="http://www.germane-software.com/software/XML/rexml/doc">on-line</link>,
or it can be downloaded as an archive <link
href="http://www.germane-software.com/software/archives/rexml_api_@ANT_VERSION@.tgz">in
tgz format (~70Kb)</link> or (if you're a masochist) <link
href="http://www.germane-software.com/software/archives/rexml_api_@ANT_VERSION@.zip">in
zip format (~280Kb)</link>. The best solution is to download and install
Dave Thomas' most excellent <link
href="http://rdoc.sourceforge.net">rdoc</link> and generate the API docs
yourself; then you'll be sure to have the latest API docs and won't have
to keep downloading the doc archive.</p>
<p>The unit tests in <code>test/</code> and the benchmarking code in
<code>benchmarks/</code> provide additional examples of using REXML. The
Tutorial provides examples with commentary. The documentation unpacks
into <link href="doc/index.html"><code>rexml/doc</code></link>.</p>
<p>Kouhei Sutou maintains a <link
href="http://www.germane-software.com/software/rexml_doc_ja/current/index.html">Japanese
version</link> of the REXML API docs. <link
href="http://www.germane-software.com/software/rexml_doc_ja/current/japanese_documentation.html">Kou's
documentation page</link> contains links to binary archives for various
versions of the documentation.</p>
</subsection>
</operation>
<status>
<subsection title="Speed and Completeness">
<p>Unfortunately, NQXML is the only package REXML can be compared
against; XMLParser uses expat, which is a native library, and really is
a different beast altogether. So in comparing NQXML and REXML you can
look at four things: speed, size, completeness, and API.</p>
<p><link href="benchmarks/index.html">Benchmarks</link></p>
<p>REXML is faster than NQXML in some things, and slower than NQXML in a
couple of things. You can see this for yourself by running the supplied
benchmarks. Most of the places where REXML is slower are due to the
convenience methods<footnote>For example,
<code>element.elements[index]</code> isn't really an array operation;
index can be an Integer or an XPath, and this feature is relatively time
expensive.</footnote>. On the positive side, most of the convenience
methods can be bypassed if you know what you are doing. Check the <link
href="benchmarks/index.html"> benchmark comparison page</link> for a
<em>general</em> comparison. You can look at the benchmark code yourself
to decide how much salt to take with them.</p>
<p>The sizes of the XML parsers are close<footnote>As measured with
<code>ruby -nle 'print unless /^\s*(#.*|)$/' *.rb | wc -l</code>
</footnote>. NQXML 1.1.3 has 1580 non-blank, non-comment lines of code;
REXML 2.0 has 2340<footnote>REXML started out with about 1200, but that
number has been steadily increasing as features are added. XPath
accounts for 541 lines of that code, so the core REXML has about 1800
LOC.</footnote>.</p>
<p>REXML is a conformant XML 1.0 parser. It supports multiple language
encodings, and internal processing uses the required UTF-8 and UTF-16
encodings. It passes 100% of the OASIS non-validating tests.
Furthermore, it provides a full implementation of XPath, a SAX2 API, and
a PullParser API.</p>
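<p>To make the difference between the APIs concrete, here is a small
sketch that reads the same input once through the stream API and once
through the tree API. The <code>TagRecorder</code> class and the helper
method names are hypothetical, invented for this illustration; only
<code>REXML::StreamListener</code>, <code>Document.parse_stream</code>
and <code>REXML::XPath.match</code> come from REXML itself.</p>
<example><![CDATA[
```ruby
require "rexml/document"
require "rexml/streamlistener"

# Hypothetical listener: records the name of every start tag it sees.
class TagRecorder
  include REXML::StreamListener

  attr_reader :tags

  def initialize
    @tags = []
  end

  # Called by the stream parser for each start tag.
  def tag_start(name, _attributes)
    @tags << name
  end
end

# Stream API: events are pushed to the listener while parsing.
def stream_tag_names(xml)
  recorder = TagRecorder.new
  REXML::Document.parse_stream(xml, recorder)
  recorder.tags
end

# Tree API: the whole document is built first, then queried with XPath.
def tree_tag_names(xml)
  doc = REXML::Document.new(xml)
  REXML::XPath.match(doc, "//*").map(&:name)
end

xml = "<a><b/><c>hi</c></a>"
p stream_tag_names(xml)
p tree_tag_names(xml)
```
]]></example>
<p>Both helpers should report the same element names; the stream version
never holds the whole document in memory, while the tree version can be
queried again after parsing.</p>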
</subsection>
<subsection title="XPath">
<p>As of release 2.0, XPath 1.0 is fully implemented.</p>
<p>I fully expect bugs to crop up from time to time, so if you see any
bogus XPath results, please let me know. That said, since I'm now
following the XPath grammar and spec fairly closely, I suspect that you
won't be surprised by REXML's XPath very often, and it should become
rock solid fairly quickly.</p>
<p>Check the "bugs" section for known problems; there are little bits of
XPath here and there that are not yet implemented, but I'll get to them
soon.</p>
<p>Namespace support is rather odd, but it isn't my fault. I can only do
so much and still conform to the specs. In particular, XPath attempts to
help as much as possible. Therefore, in the trivial cases, you can pass
namespace prefixes to Element.elements[...] and so on -- in these cases,
XPath will use the namespace environment of the base element you're
starting your XPath search from. However, if you want to do something
more complex, like pass in your own namespace environment, you have to
use the XPath first(), each(), and match() methods. Also, default
namespaces <em>force</em> you to use the XPath methods, rather than the
convenience methods, because there is no way for XPath to know what the
mappings for the default namespaces should be. This is exactly why I
loathe namespaces -- a pox on the person(s) who thought them up!</p>
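<p>A sketch of the two cases described above, under the assumption of a
made-up document with one default and one prefixed namespace (the URIs,
the <code>d</code> prefix and the helper names are hypothetical). The
prefixed lookup goes through the convenience method; the default
namespace lookup passes its own namespace mapping to
<code>REXML::XPath.first</code>.</p>
<example><![CDATA[
```ruby
require "rexml/document"

# Hypothetical document: one default namespace, one prefixed namespace.
NS_SAMPLE = <<~XML
  <root xmlns="http://example.com/default" xmlns:x="http://example.com/x">
    <item>default</item>
    <x:item>prefixed</x:item>
  </root>
XML

# Trivial case: the prefix is declared in the document, so the
# convenience method can resolve it from the element's own environment.
def prefixed_item_text(xml)
  doc = REXML::Document.new(xml)
  doc.root.elements["x:item"].text
end

# Default namespace: bind a prefix of our own choosing ("d") to the
# namespace URI and pass that mapping to the XPath method explicitly.
def default_item_text(xml)
  doc = REXML::Document.new(xml)
  mapping = { "d" => "http://example.com/default" }
  REXML::XPath.first(doc, "//d:item", mapping).text
end

puts prefixed_item_text(NS_SAMPLE)
puts default_item_text(NS_SAMPLE)
```
]]></example>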
</subsection>
<subsection title="Namespaces">
<p>Namespace support is now fairly stable. One thing to be aware of is
that REXML is not (yet) a validating parser. This means that some
invalid namespace declarations are not caught.</p>
</subsection>
<subsection title="Mailing list">
<p>There is a low-volume mailing list dedicated to REXML. To subscribe,
send an empty email to <link
href="mailto:ser-rexml-subscribe@germane-software.com">ser-rexml-subscribe@germane-software.com</link>.
This list is more or less spam proof. To unsubscribe, similarly send a
message to <link
href="mailto:ser-rexml-unsubscribe@germane-software.com">ser-rexml-unsubscribe@germane-software.com</link>.</p>
</subsection>
<subsection title="RSS">
<p>An <link
href="http://www.germane-software.com/projects/rexml/timeline?ticket=on&amp;max=50&amp;daysback=90&amp;format=rss">RSS
file</link> for REXML is now being generated from the change log. This
allows you to be alerted to bug fixes and feature additions via "pull".
<link href="http://www.germane-software.com/software/rexml/rss.xml">Another
RSS</link> is available which contains a single item: the release notice
for the most recent release. This is an abuse of the RSS
mechanism, which was intended to be a distribution system for headlines
linked back to full articles, but it works. The headline for REXML is
the version number, and the description is the change log. The links all
link back to the REXML home page. The URL for the RSS itself is
http://www.germane-software.com/software/rexml/rss.xml.</p>
<p>The <link href="release.html">changelog itself is here</link>.</p>
<p>For those who are interested, there's a <link
href="docs/sloccount.txt">SLOCCount</link> (by David A. Wheeler) file
with stats on the REXML source code. Note that the SLOCCount output
includes the files in the test/, benchmarks/, and bin/ directories, as
well as the main source code for REXML itself.</p>
</subsection>
<subsection title="Applications that use REXML">
<list>
<item><link
href="http://www.pablotron.org/software/raggle/">Raggle</link> is a
console-based RSS aggregator.</item>
<item><link
href="http://www.zweknu.org/technical/index.rhtml?s=p|10/">getrss</link>
is an RSS aggregator</item>
<item>Ned Konz's <link
href="http://www.bikenomad.microship.com/ruby/">ruby-htmltools</link>
uses REXML</item>
<item>Hiroshi NAKAMURA's <link
href="http://www.ruby-lang.org/en/raa-list.rhtml?name=SOAP4R">SOAP4R</link>
package can use REXML as the XML processor.</item>
<item>Chris Morris' <link href="http://clabs.org/clxmlserial.htm">XML
Serializer</link>. XML Serializer provides a serialization mechanism
for Ruby that provides a bidirectional mapping between Ruby classes
and XML documents.</item>
<item>Much of the <link href="http://www.rubyxml.com">RubyXML</link>
site is generated with scripts that use REXML. RubyXML is a great
place to find information about the intersection between Ruby and
XML.</item>
</list>
</subsection>
<bugs lang="en">
<p>You can submit bug reports and feature requests, and view the list of
known bugs, at the <link
href="http://www.germane-software.com/projects/rexml">REXML bug report
page.</link> Please do submit bug reports. If you really want your bug
fixed fast, include an runit or Test::Unit method (or methods) that
illustrates the problem. At the very least, send me some XML that REXML
doesn't process properly.</p>
<p>You don't have to send an entire test suite -- just the unit test
methods. If you don't send me a unit test, I'll have to write one
myself, which will mean that your bug will take longer to fix.</p>
<p>When submitting bug reports, please include the version of Ruby and
of REXML that you're using, and the operating system you're running on.
Just run: <code>ruby -vrrexml/rexml -e 'p
REXML::VERSION,RUBY_PLATFORM'</code> and paste the results in your bug
report. Include your email if you want a response about the bug.</p>
<item>Attributes are not handled internally as nodes, so you can't
perform node functions on them. This will have to change. It'll also
probably mean that, rather than returning attribute values, XPath will
return the Attribute nodes.</item>
<item>Some of the XPath <em>functions</em> are untested<footnote>Mike
Stok has been testing, debugging, and implementing some of these
Functions (and he's been doing a good job) so there's steady improvement
in this area.</footnote>. Any XPath functions that don't work are also
bugs... please report them. If you send a unit test that illustrates the
problem, I'll try to fix the problem within a couple of days (if I can)
and send you a patch, personally.</item>
<item>Accessing prefixes for which there is no defined namespace in an
XPath should throw an exception. It currently doesn't -- it just fails
to match.</item>
</bugs>
<todo lang="en">
<item>Reparsing a tree with a pull/SAX parser</item>
<item>Better namespace support in SAX</item>
<item>Lazy tree parsing</item>
<item>Segregate parsers, for optimized minimal distributions</item>
<item>XML &lt;-&gt; Ruby</item>
<item>Validation support</item>
<item>True XML character support</item>
<item>Add XPath support for streaming APIs</item>
<item status="request">XQuery support</item>
<item status="request">XUpdate support</item>
<item>Make sure namespaces are supported in pull parser</item>
<item status="request">Add document start and entity replacement events
in pull parser</item>
<item>Better stream parsing exception handling</item>
<item>I'd like to hack XMLRPC4R to use REXML, for my own
purposes.</item>
</todo>
</status>
<faq>
<q>REXML is hanging while parsing one of my XML files.</q>
<a>Your XML is probably malformed. Some malformed XML, especially XML that
contains literal '&lt;' embedded in the document, causes REXML to hang.
REXML should be throwing an exception, but it doesn't; this is a bug. I'm
aware that it is an extremely annoying bug, and it is one I'm trying to
solve in a way that doesn't significantly reduce REXML's parsing
speed.</a>
<q>I'm using the XPath '//foo' on an XML branch node X, and keep getting
all of the 'foo' elements in the entire document. Why? Shouldn't it return
only the 'foo' element descendants of X?</q>
<a>No. XPath specifies that '/' returns the document root, regardless of
the context node. '//' also starts at the document root. If you want to
limit your search to a branch, you need to use the self:: axis, e.g.
'self::node()//foo', or the shorthand './/foo'.</a>
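<a>A small sketch of the difference, assuming a made-up document with two
branches (the element names, the <code>id</code> attributes and the
<code>foo_ids</code> helper are hypothetical):
<example><![CDATA[
```ruby
require "rexml/document"

# Hypothetical document: one 'foo' under branch x, one under branch y.
BRANCHES = "<root><x><foo id='1'/></x><y><foo id='2'/></y></root>"

# Evaluate the given XPath with branch x as the context node and
# return the id attribute of every matching element.
def foo_ids(xml, path)
  doc = REXML::Document.new(xml)
  branch = doc.root.elements["x"]
  ids = []
  REXML::XPath.each(branch, path) { |foo| ids << foo.attributes["id"] }
  ids
end

p foo_ids(BRANCHES, "//foo")   # searches from the document root
p foo_ids(BRANCHES, ".//foo")  # searches below the context node only
```
]]></example>
</a>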
<q>I want to parse a document both as a tree, and as a stream. Can I do
this?</q>
<a>Yes, and no. There is no mechanism that directly supports this in
REXML. However, aside from writing your own traversal layer, there is a
way of doing this. To turn a tree into a stream, just turn the branch you
want to process as a stream back into a string, and re-parse it with your
preferred API. E.g.: <code>pp = PullParser.new( some_element.to_s )</code>. The other
direction is more difficult; you basically have to build a tree from the
events. REXML will have one of these builders, eventually, but it doesn't
currently exist.</a>
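<a>The tree-to-stream direction might look like the following sketch. It
assumes a made-up document and a hypothetical helper named
<code>branch_start_tags</code>; the branch is serialized with
<code>to_s</code> and handed to a pull parser, as described above.
<example><![CDATA[
```ruby
require "rexml/document"
require "rexml/parsers/pullparser"

# Parse the document as a tree, pick one branch, then re-parse only
# that branch as a stream of pull events.
def branch_start_tags(xml, branch_name)
  doc = REXML::Document.new(xml)
  branch = doc.root.elements[branch_name]

  parser = REXML::Parsers::PullParser.new(branch.to_s)
  names = []
  while parser.has_next?
    event = parser.pull
    # For a start-element event, index 0 holds the tag name.
    names << event[0] if event.start_element?
  end
  names
end

# Hypothetical input: only the 'keep' branch is streamed.
p branch_start_tags("<root><skip/><keep><a/><b>t</b></keep></root>", "keep")
```
]]></example>
Re-serializing costs an extra parse of the branch, so this is a
convenience rather than an optimization.</a>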
<q>Why is Element.elements indexed off of '1' instead of '0'?</q>
<a>Because of XPath. The XPath specification states that the index of the
first child node is '1'. Although it may be counter-intuitive to base
elements on 1, it is more undesirable to have element.elements[0] ==
element.elements[ 'node()[1]' ]. Since I can't change the XPath
specification, the result is that Element.elements[1] is the first child
element.</a>
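<a>A short sketch of the indexing, assuming a made-up two-element document
and a hypothetical helper name. The integer index and the XPath predicate
agree with each other, while a plain Ruby array obtained with
<code>to_a</code> is zero-based as usual:
<example><![CDATA[
```ruby
require "rexml/document"

# Return the name of the first child element, looked up three ways.
def first_child_names(xml)
  root = REXML::Document.new(xml).root
  [
    root.elements[1].name,        # REXML integer index, 1-based
    root.elements["*[1]"].name,   # XPath position predicate, 1-based
    root.elements.to_a[0].name    # ordinary Ruby Array, 0-based
  ]
end

p first_child_names("<r><first/><second/></r>")
```
]]></example>
</a>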
<q>Why isn't REXML a validating parser?</q>
<a>Because validating parsers must include code that parses and interprets
DTDs. I hate DTDs. REXML supports the barest minimum of DTD parsing, and
even that isn't complete. There is DTD parsing code in the works, but I
only work on it when I'm really, really bored. Rumor has it that a
contributor is working on a DTD parser for REXML; rest assured that any
such contribution will be included with REXML as soon as it is
available.</a>
<q>I'm trying to create an ISO-8859-1 document, but when I add text to the
document it isn't being properly encoded.</q>
<a>Regardless of what the encoding of your document is, when you add text
programmatically to a REXML document you <em>must</em> ensure that you are
only adding UTF-8 to the tree. In particular, you can't add ISO-8859-1
encoded text that contains characters above 0x80 to REXML trees -- you
must convert it to UTF-8 before doing so. Luckily, this is easy:
<code>text.unpack('C*').pack('U*')</code> will do the trick. 7-bit ASCII
text is already valid UTF-8, so if your text is pure ASCII you won't need
to worry about this.</a>
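<a>A sketch of the conversion in context, assuming the input arrives as raw
ISO-8859-1 bytes (the helper names and the <code>word</code> element are
hypothetical):
<example><![CDATA[
```ruby
require "rexml/document"

# Treat each ISO-8859-1 byte as a Unicode code point and emit UTF-8.
def latin1_to_utf8(text)
  text.unpack("C*").pack("U*")
end

# Build an element whose text originates from ISO-8859-1 input.
def element_from_latin1(text)
  element = REXML::Element.new("word")
  element.text = latin1_to_utf8(text)
  element
end

# "caf" followed by byte 0xE9 (e-acute in ISO-8859-1), as raw bytes.
latin1 = "caf\xE9".b
puts element_from_latin1(latin1).text
```
]]></example>
On Ruby 1.9 and later, <code>text.encode("UTF-8", "ISO-8859-1")</code>
should give the same result.</a>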
<q>How do I get the tag name of an Element?</q>
<a>You take a look at the APIs, and notice that <code>Element</code>
includes <code>Namespace</code>. Then you click on the
<code>Namespace</code> link and look at the methods that
<code>Element</code> includes from <code>Namespace</code>. One of these is
<code>name()</code>. Another is <code>expanded_name()</code>. Yet another
is <code>prefix()</code>. Then, you email the author of rdoc and ask him
to extend rdoc so that it lists methods in the API that are included from
other files, so that you don't have to do all of that looking around for
your method.</a>
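<a>For the impatient, a minimal sketch of the three methods named above,
using a made-up prefixed element (the namespace URI and the helper name
are hypothetical):
<example><![CDATA[
```ruby
require "rexml/document"

# Return the local name, the prefix, and the prefixed name of the root.
def naming_parts(xml)
  element = REXML::Document.new(xml).root
  [element.name, element.prefix, element.expanded_name]
end

p naming_parts("<x:item xmlns:x='http://example.com/x'/>")
```
]]></example>
</a>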
</faq>
<credits>
<p>I've had help from a number of resources; if I haven't listed you here,
it means that I just haven't gotten around to adding you, or that I'm a
dork and have forgotten. In either case, feel free to write me and
complain.</p>
<list>
<item>Mike Stok has been very active, sending not only fixes for bugs
(especially in Functions), but also by providing unit tests and making
sure REXML runs under Ruby 1.7. He also sent the most awesome hand
knitted tea cozy, with "REXML" and the Ruby knitted into it.</item>
<item>Kouhei Sutou translated the REXML API documentation to Japanese!
Links are in the API docs section of the main documentation. He has also
contributed a large number of bug reports and patches to fix bugs in
REXML.</item>
<item>Erik Terpstra heard my pleas and submitted several logos for
REXML. After sagely procrastinating for several weeks, I finally forced
my poor slave of a wife to pick one (this is what we call "delegation").
She did, with caveats; Erik quickly made the changes, and the result is
what you now see at the top of this page. He also supplied a <link
href="img/rexml_50p.png">smaller version</link> that you can include
with your projects that use REXML, if you'd like.</item>
<item>Ernest Ellingson contributed the sourcecode for turning UTF16 and
UNILE encodings into UTF8, which allowed REXML to get the 100% OASIS
valid tests rating.</item>
<item>Ian Macdonald provided me with a comprehensive, well written RPM
spec file.</item>
<item>Oliver M. Bolzer is maintaining a Debian package distribution of
REXML. He also has provided good feedback and bug reports about
namespace support.</item>
<item>Michael Granger supplied a patch for REXML that makes the unit
tests pass under Ruby 1.7.</item>
<item>James Britt contributed code that makes
Document.parse_stream easier to use by allowing it to be passed a
Source, File, or String.</item>
<item>Tobias Reif: Numerous bug reports, and suggestions for
improvement.</item>
<item>Stefan Scholl, who provided a lot of feedback and bug reports
while I was trying to get ISO-8859-1 support working.</item>
<item>Steven E Lumos for volunteering information about XPath
particulars.</item>
<item>Fumitoshi UKAI provided some bug fixes for CData metacharacter
quoting.</item>
<item>TAKAHASHI Masayoshi, for information on UTF</item>
<item>Robert Feldt: Bug reports and suggestions/recommendations about
improving REXML. Testing is one of the most important aspects of
software development.</item>
<item><link
href="http://www.themindelectric.com/exml/index.html">Electric
XML</link>: This was, after all, the inspiration for REXML. Originally,
I was just going to do a straight port, and although REXML doesn't in
any way, shape or form resemble Electric XML, still the basic framework
and philosophy was inspired by E-XML. And I still use E-XML in my Java
projects.</item>
<item><link
href="http://www.io.com/~jimm/downloads/nqxml/index.html">NQXML</link>:
While I may complain about the NQXML API, I wrote a few applications
using it that wouldn't have been written otherwise, and it was very
useful to me. It also encouraged me to write REXML. Never complain about
free software *slap*.</item>
<item>See my <link
href="http://www.germane-software.com/~ser/technology.html">technologies
page</link> for a more comprehensive list of computer technologies that
I depend on for my day-to-day work.</item>
<item>rdoc, an excellent JavaDoc analog<footnote>When I was first
working on REXML, rdoc wasn't, IMO, very good, so I wrote API2XML.
API2XML was good enough for a while, and then there was a flurry of work
on rdoc, and it quickly surpassed API2XML in features. Since I was never
really interested in maintaining a JavaDoc analog, I stopped support of
API2XML, and am now recommending that people use
rdoc.</footnote>.</item>
<item>Many, many other people who've submitted bug reports, suggestions,
and positive feedback. You're all co-developers!</item>
</list>
</credits>
</documentation>