
<HTML>
<HEAD>
<TITLE>mozilla cross-reference: search help</TITLE>
</HEAD>
<BODY   BGCOLOR="#FFFFFF" TEXT="#000000"
	LINK="#0000EE" VLINK="#551A8B" ALINK="#FF0000">

<TABLE BGCOLOR="#000000" WIDTH="100%" BORDER=0 CELLPADDING=0 CELLSPACING=0>
<TR><TD><A HREF="http://www.mozilla.org/"><IMG
 SRC="http://www.mozilla.org/images/mozilla-banner.gif" ALT=""
 BORDER=0 WIDTH=600 HEIGHT=58></A></TD></TR></TABLE>

<H1 ALIGN=CENTER>search help<BR>
<FONT SIZE=3>
for the<br>
<A HREF="./"><I>mozilla cross-reference</I></A>
</FONT></H1>

<P><BR>
<BLOCKQUOTE><BLOCKQUOTE>
<I>
This text is derived from the Glimpse manual page.
For more information on glimpse, see the 
<A HREF="http://glimpse.cs.arizona.edu">Glimpse homepage</A>.
</I>
</BLOCKQUOTE></BLOCKQUOTE>

<A NAME="Patterns"></A><H2>Patterns</H2>
<UL>
Glimpse supports a large variety of patterns, including simple
strings, strings with classes of characters, sets of strings, 
wild cards, and regular expressions (see <A HREF="#Limitations">Limitations</A>).

</UL>
<P> <H3>Strings</H3>
<UL>

Strings are any sequence of characters, including the special symbols
`^' for beginning of line and `$' for end of line.  The special
characters `$', `^', `*', `[', `|', `(', `)', `!', and `\', as well as
the meta characters special to glimpse (and agrep), `;', `,', `#',
`&gt;', `&lt;', `-', and `.', should be preceded by `\' if they are to
be matched as regular characters.  For example,
\^abc\\\\ corresponds to the string ^abc\\, whereas ^abc corresponds
to the string abc at the beginning of a line.

</UL>
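<P>
The escaping rule above can be sketched in a few lines of Python. This is
only an illustration: the character set is copied from the list on this
page, and the name <I>escape_literal</I> is hypothetical, not part of
glimpse or agrep.

```python
# Characters this page lists as needing a backslash to be matched literally.
# The set is copied from the text above; it is not read from glimpse itself.
GLIMPSE_SPECIAL = set("$^*[|()!\\;,#><-.")


def escape_literal(text: str) -> str:
    """Return text with each listed special character preceded by a backslash."""
    return "".join("\\" + ch if ch in GLIMPSE_SPECIAL else ch for ch in text)


if __name__ == "__main__":
    # The input below is the six-character string ^abc\\ from the example above.
    print(escape_literal("^abc\\\\"))
    print(escape_literal("search->for"))
```

<P>
Characters outside the set pass through unchanged, so an ordinary
identifier is returned as it was given.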
<P> <H3>Classes of characters</H3>
<UL>

A list of characters inside [] (in order) corresponds to any character
from the list.  For example, [a-ho-z] is any character between a and h
or between o and z.  The symbol `^' inside [] complements the list.
For example, [^i-n] denotes any character in the character set except
the characters 'i' through 'n'.
The symbol `^' thus has two meanings, but this is consistent with
egrep.
The symbol `.' (don't care) stands for any symbol (except for the
newline symbol).

</UL>
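<P>
The two classes from the text can be tried out with Python's <I>re</I>
module, which uses the same bracket notation. Python is only a stand-in
here; the helper name <I>matching_chars</I> is hypothetical, and glimpse
itself is not invoked.

```python
import re

# Python's re module is used only as a stand-in for glimpse.
in_ranges = re.compile(r"[a-ho-z]")    # a through h, or o through z
not_i_to_n = re.compile(r"[^i-n]")     # anything except i through n


def matching_chars(pattern: re.Pattern, candidates: str) -> str:
    """Return the characters of candidates that the one-character class accepts."""
    return "".join(ch for ch in candidates if pattern.fullmatch(ch))


if __name__ == "__main__":
    print(matching_chars(in_ranges, "ahijnoz5"))   # letters in a-h or o-z
    print(matching_chars(not_i_to_n, "ahijnoz5"))  # everything outside i-n
    print(re.fullmatch(r".", "\n"))                # '.' skips the newline
```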
<P> <H3>Boolean operations</H3>
<UL>

Glimpse
supports an `AND' operation denoted by the symbol `;',
an `OR' operation denoted by the symbol `,',
a limited version of a `NOT' operation (starting at version 4.0B1)
denoted by the symbol `~',
or any combination.
For example, 'pizza;cheeseburger' will output all lines containing
both patterns.
'{political,computer};science' will match 'political science'
or 'science of computers'.

</UL>
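<P>
A simplified model of the `AND' and `OR' operators, written in Python, may
make the two examples easier to follow. It assumes that `;' separates
AND-ed groups, that braces only group the alternatives of one group, and
that a plain substring test stands in for matching. It ignores `~' and
backslash escapes. The names <I>parse</I> and <I>line_matches</I> are
hypothetical.

```python
def parse(pattern: str) -> list[list[str]]:
    """Split a pattern into AND-ed groups, each a list of OR-ed alternatives."""
    groups = []
    for term in pattern.split(";"):
        term = term.strip()
        # Braces are assumed to wrap the alternatives of a single group.
        if term.startswith("{") and term.endswith("}"):
            term = term[1:-1]
        groups.append(term.split(","))
    return groups


def line_matches(pattern: str, line: str) -> bool:
    """True if every AND-ed group has at least one alternative in the line."""
    return all(any(alt in line for alt in group) for group in parse(pattern))


if __name__ == "__main__":
    for text in ("political science", "science of computers", "computer games"):
        print(text, line_matches("{political,computer};science", text))
```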
<P><H3>Wild cards</H3>
<UL>

The symbol '#' is used to denote a sequence 
of any number (including 0) 
of arbitrary characters (see <A HREF="#Limitations">Limitations</A>).  
The symbol # is equivalent to .* in egrep.
In fact, .* will work too, because it is a valid regular expression
(see below), but unless this is part of an actual regular expression,
# will work faster. 
(Currently glimpse is experiencing some problems with #.)

</UL>
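<P>
The stated equivalence between # and .* can be illustrated by translating
a wild-card pattern into a Python regular expression. This sketch assumes
every character other than # is meant literally and does not handle an
escaped \#. The function names are hypothetical.

```python
import re


def wildcard_to_regex(pattern: str) -> str:
    """Translate '#' to '.*' and treat every other character literally."""
    return "".join(".*" if ch == "#" else re.escape(ch) for ch in pattern)


def wildcard_search(pattern: str, line: str) -> bool:
    """True if the translated pattern is found anywhere in the line."""
    return re.search(wildcard_to_regex(pattern), line) is not None


if __name__ == "__main__":
    print(wildcard_to_regex("foo#bar"))
    print(wildcard_search("foo#bar", "foobar"))        # '#' may match nothing
    print(wildcard_search("foo#bar", "foo_init_bar"))
    print(wildcard_search("foo#bar", "bar foo"))       # order matters
```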
<P><H3>Combination of exact and approximate matching</H3>
<UL>

Any pattern inside angle brackets &lt;&gt; must match the text exactly, even
when the rest of the pattern is matched with errors.  For example, &lt;mathemat&gt;ics matches
mathematical with one error (replacing the last s with an a), but
mathe&lt;matics&gt; does not match mathematical no matter how many errors are
allowed. (This option is buggy at the moment.)

</UL>
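<P>
The error count in the example can be checked with a short Python sketch of
approximate substring matching (a standard dynamic-programming edit
distance in which a match may start anywhere in the text). This is not
glimpse's or agrep's own algorithm. The exact-part check is deliberately
simplified: it only requires the bracketed part to occur verbatim somewhere
in the text, without checking where. All names are hypothetical.

```python
def min_errors(pattern: str, text: str) -> int:
    """Smallest edit distance between pattern and any substring of text."""
    # prev[i]: fewest errors aligning pattern[:i] with text ending at the
    # previous text position.
    prev = list(range(len(pattern) + 1))
    best = prev[-1]
    for ch in text:
        cur = [0]  # a match may start at any position in the text
        for i, pc in enumerate(pattern, start=1):
            cur.append(min(prev[i - 1] + (pc != ch),  # match or substitution
                           prev[i] + 1,               # extra text character
                           cur[i - 1] + 1))           # missing pattern character
        best = min(best, cur[-1])
        prev = cur
    return best


def matches_with_exact_part(prefix: str, exact: str, suffix: str,
                            text: str, max_errors: int) -> bool:
    """Simplified: exact must occur verbatim; the whole pattern may have errors."""
    whole = prefix + exact + suffix
    return exact in text and not min_errors(whole, text) > max_errors


if __name__ == "__main__":
    print(min_errors("mathematics", "mathematical"))
    print(matches_with_exact_part("", "mathemat", "ics", "mathematical", 1))
    print(matches_with_exact_part("mathe", "matics", "", "mathematical", 5))
```

<P>
The second call fails regardless of the error budget because the literal
string matics never occurs in mathematical.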
<H3>Regular expressions</H3>
<UL>

Since the index is word based, a regular expression must match words
that appear in the index for glimpse to find it.  Glimpse first strips
all non-alphabetic characters from the regular expression, and
searches the index for all remaining words.  It then applies the
regular expression matching algorithm to the files found in the index.
For example, glimpse 'abc.*xyz' will search the index for all files
that contain both 'abc' and 'xyz', and then search directly for
'abc.*xyz' in those files.  (If you use glimpse -w 'abc.*xyz', then
'abcxyz' will not be found, because glimpse will think that abc and
xyz need to be matched as whole words.)  The syntax of regular
expressions in glimpse is in general the same as that for agrep.  The
union operation `|', Kleene closure `*', and parentheses () are all
supported.  Currently '+' is not supported.  Regular expressions are
currently limited to approximately 30 characters (generally excluding
meta characters). The maximal number of errors
for regular expressions that use '*' or '|' is 4.

</UL>
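<P>
The two-step search described above (index lookup first, then regular
expression matching in the candidate files) can be sketched in Python with
a toy in-memory index. The file names and contents are made up, Python's
<I>re</I> stands in for the matching step, and the index lookup is assumed
to be whole-word, which corresponds to the -w behaviour described above.

```python
import re

# Hypothetical in-memory "files"; a real glimpse index lives on disk.
FILES = {
    "a.txt": "abc comes before xyz here",
    "b.txt": "only abc appears",
    "c.txt": "xyz then abc",
    "d.txt": "abcxyz",
}


def build_index(files: dict[str, str]) -> dict[str, set[str]]:
    """Map each alphabetic word to the set of files that contain it."""
    index: dict[str, set[str]] = {}
    for name, text in files.items():
        for word in re.findall(r"[A-Za-z]+", text):
            index.setdefault(word, set()).add(name)
    return index


def candidate_files(regex: str, index: dict[str, set[str]],
                    files: dict[str, str]) -> set[str]:
    """Step 1: strip non-alphabetic characters, look up the remaining words."""
    candidates = set(files)
    for word in re.findall(r"[A-Za-z]+", regex):
        candidates = candidates.intersection(index.get(word, set()))
    return candidates


def search(regex: str, files: dict[str, str]) -> list[str]:
    """Step 2: apply the full regular expression to the candidate files only."""
    index = build_index(files)
    names = candidate_files(regex, index, files)
    return sorted(n for n in names if re.search(regex, files[n]))


if __name__ == "__main__":
    print(search("abc.*xyz", FILES))
```

<P>
In this toy model d.txt never becomes a candidate, because its only
indexed word is abcxyz, and c.txt is a candidate that the second step
rejects.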
<A NAME="Limitations"></A><H2>Limitations</H2>
<UL>

The index of glimpse is word based.  A pattern that contains more than
one word cannot be found in the index.  The way glimpse overcomes this
weakness is by splitting any multi-word pattern into its set of words
and looking for all of them in the index.
For example, <I>'linear programming'</I> will first consult the index
to find all files containing both <I>linear</I> and <I>programming</I>,
and then apply agrep to find the combined pattern.
This is usually an effective solution, but it can be slow for
cases where both words are very common, but their combination is not.

<P>
As was mentioned in the section on <A HREF="#Patterns">Patterns</A> above, some characters
serve as meta characters for glimpse and need to be
preceded by '\' to search for them (typed as '\\' on a shell
command line, as in the examples below).  The most common
examples are the characters '.' (which stands for a wild card),
and '*' (the Kleene closure).
So, "glimpse ab.de" will match abcde, but "glimpse ab\\.de"
will not, and "glimpse ab*de" will not match ab*de, but
"glimpse ab\\*de" will.
The meta character - is translated automatically to a hyphen
unless it appears between [] (in which case it denotes a range of
characters).

<P>
Search patterns are limited to 29 characters.
Lines are limited to 1024 characters.

</UL>
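<P>
The effect of escaping '.' and '*', together with the 29-character limit,
can be illustrated in Python. Python's <I>re</I> is only a stand-in for
glimpse, so a single backslash is used. Counting every character of the
pattern toward the limit is an assumption, and the function name
<I>matches</I> is hypothetical.

```python
import re

MAX_PATTERN_LENGTH = 29  # pattern limit stated on this page


def matches(pattern: str, line: str) -> bool:
    """Reject over-long patterns, then search the line for the pattern."""
    # Assumption: every character, including backslashes, counts.
    if len(pattern) > MAX_PATTERN_LENGTH:
        raise ValueError("search patterns are limited to 29 characters")
    return re.search(pattern, line) is not None


if __name__ == "__main__":
    print(matches("ab.de", "abcde"))     # '.' acts as a wild card
    print(matches(r"ab\.de", "abcde"))   # escaped '.' needs a real dot
    print(matches("ab*de", "ab*de"))     # '*' is the Kleene closure
    print(matches(r"ab\*de", "ab*de"))   # escaped '*' is a literal star
```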
<P>
<HR>

<ADDRESS>
  <A HREF="mailto:lxr@linux.no">
    Arne Georg Gleditsch and Per Kristian Gjermshus</A>
</ADDRESS>

</BODY>
</HTML>
<!--
Revision history:
1998-06-13 01:42:37 +04:00  linked in dawn's search help doc
1999-09-10 23:03:39 +04:00  Rework the init subroutine in Common.pm so glimpse doesn't go through the normal http_wash stuff so now you can do regexp searches and search->for pointers. Reworked the search form so that searching for literals is the default. Meta characters automatically escaped unless the regexp button is pushed. Changed the docs so it no longer says to use double backslashes to escape metachars. That's only needed from the shell.
1999-10-19 06:14:09 +04:00  search patterns are limited to 29 characters due to the -n option in glimpse
2001-07-17 00:15:03 +04:00  Bugzilla Bug 90704 there is a correct spelling for gobbeldygook and this is not it r=terry [C=WP, A=OED]
-->