pjs/webtools/bonsai/cvsregexp.html

<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="Author" CONTENT="lloyd tabb">
   <META NAME="GENERATOR" CONTENT="Mozilla/4.0 [en] (WinNT; I) [Netscape]">
   <TITLE>Regular expressions in the cvs query tool</TITLE>
</HEAD>
<BODY>

<H1>
Description of MySQL regular expression syntax.</H1>
Regular expressions are a powerful way of specifying complex searches.

<P><B>MySQL</B> uses Henry Spencer's implementation of regular expressions.
And that is aimed to conform to POSIX 1003.2. <B>MySQL</B> uses the extended
version.

<P>To get more exact information see Henry Spencer's regex.7 manual.

<P>This is a simplistic reference that skips the details. From here on
a regular expression is called a regexp.

<P>A regular expression describes a set of strings. The simplest case is
one that has no special characters in it. For example the regexp <TT>hello</TT>
matches <TT>hello</TT> and nothing else.

<P>Nontrivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regexp <TT>hello|word</TT>
matches either the string <TT>hello</TT> or the string <TT>word</TT>.

<P>And a more complex example regexp <TT>B[an]*s</TT> matches any of the
strings <TT>Bananas</TT>, <TT>Baaaaas</TT>, <TT>Bs</TT> and all other string
starting with a <TT>B</TT> and continuing with any number of <TT>a</TT>
<TT>n</TT> and ending with a <TT>s</TT>.

<P>The following special characters/constructs are known.
<DL COMPACT>
<DT>
<TT>^</TT></DT>

<DD>
Start of whole string.</DD>

<PRE>mysql> select "fo\nfo" regexp "^fo$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "fofo" regexp "^fo";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>$</TT></DT>

<DD>
End of whole string.</DD>

<PRE>mysql> select "fo\no" regexp "^fo\no$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "fo\no" regexp "^fo$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>.</TT></DT>

<DD>
Any character (including newline).</DD>

<PRE>mysql> select "fofo" regexp "^f.*";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "fo\nfo" regexp "^f.*";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>a*</TT></DT>

<DD>
Any sequence of zero or more a's.</DD>

<PRE>mysql> select "Ban" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Baaan" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Bn" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>a+</TT></DT>

<DD>
Any sequence of one or more a's.</DD>

<PRE>mysql> select "Ban" regexp "^Ba+n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Bn" regexp "^Ba+n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>a?</TT></DT>

<DD>
Either zero or one a.</DD>

<PRE>mysql> select "Bn" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Ban" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Baan" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>de|abc</TT></DT>

<DD>
Either the sequence <TT>de</TT> or <TT>abc</TT>.</DD>

<PRE>mysql> select "pi" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "axe" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "apa" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "apa" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pi" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pix" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>(abc)*</TT></DT>

<DD>
Zero or more times the sequence <TT>abc</TT>.</DD>

<PRE>mysql> select "pi" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pip" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "pipi" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>{1}</TT></DT>

<DT>
<TT>{2,3}</TT></DT>

<DD>
There is a more general way of writing regexps that match many occurrences.</DD>

<DL COMPACT>
<DT>
<TT>a*</TT></DT>

<DD>
Can be written as <TT>a{0,}</TT>.</DD>

<DT>
<TT>+</TT></DT>

<DD>
Can be written as <TT>a{1,}</TT>.</DD>

<DT>
<TT>?</TT></DT>

<DD>
Can be written as <TT>a{0,1}</TT>.</DD>
</DL>
To be more precise, an atom followed by a bound containing one integer <TT>i</TT>
and no comma matches a sequence of exactly <TT>i</TT> matches of the atom.
An atom followed by a bound containing one integer <TT>i</TT> and a comma
matches a sequence of <TT>i</TT> or more matches of the atom. An atom followed
by a bound containing two integers <TT>i</TT> and <TT>j</TT> matches a
sequence of <TT>i</TT> through <TT>j</TT> (inclusive) matches of the atom.
Both arguments must <TT>0 >= value &lt;= RE_DUP_MAX (default 255)</TT>,
and if there are two of them, the second must be bigger or equal to the
first.
<DT>
<TT>[a-dX]</TT></DT>

<DT>
<TT>[^a-dX]</TT></DT>

<DD>
Any character which is (not if ^ is used) either <TT>a</TT>, <TT>b</TT>,
<TT>c</TT>, <TT>d</TT> or <TT>X</TT>. To include <TT>]</TT> it has to be
written first. To include <TT>-</TT> it has to be written first or last.
So <TT>[0-9]</TT> matches any decimal digit. All character that does not
have a defined meaning inside a <TT>[]</TT> pair has no special meaning
and matches only itself.</DD>

<PRE>mysql> select "aXbc" regexp "[a-dXYZ]";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "aXbc" regexp "^[a-dXYZ]$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "aXbc" regexp "^[a-dXYZ]+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "aXbc" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "gheis" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp;&nbsp; -> 1
mysql> select "gheisa" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>[[.characters.]]</TT></DT>

<DD>
The sequence of characters of that collating element. The sequence is a
single element of the bracket expression's list. A bracket expression containing
a multi-character collating element can thus match more than one character,
e.g. if the collating sequence includes a <TT>ch</TT> collating element,
then the RE <TT>[[.ch.]]*c</TT> matches the first five characters of <TT>chchcc</TT>.</DD>

<DT>
<TT>[=character-class=]</TT></DT>

<DD>
An equivalence class, standing for the sequences of characters of all collating
elements equivalent to that one, including itself. For example, if <TT>o</TT>
and <TT>(+)</TT> are the members of an equivalence class, then <TT>[[=o=]]</TT>,
<TT>[[=(+)=]]</TT>, and <TT>[o(+)]</TT> are all synonymous. An equivalence
class may not be an endpoint of a range.</DD>

<DT>
<TT>[:character_class:]</TT></DT>

<DD>
Within a bracket expression, the name of a character class enclosed in
<TT>[:</TT> and <TT>:]</TT> stands for the list of all characters belonging
to that class. Standard character class names are:</DD>

<TABLE BORDER WIDTH="100%" NOSAVE >
<TR>
<TD>alnum&nbsp;</TD>

<TD>digit&nbsp;</TD>

<TD>punct&nbsp;</TD>
</TR>

<TR>
<TD>alpha&nbsp;</TD>

<TD>graph&nbsp;</TD>

<TD>space&nbsp;</TD>
</TR>

<TR>
<TD>blank&nbsp;</TD>

<TD>lower&nbsp;</TD>

<TD>upper&nbsp;</TD>
</TR>

<TR>
<TD>cntrl&nbsp;</TD>

<TD>print&nbsp;</TD>

<TD>xdigit&nbsp;</TD>
</TR>
</TABLE>
These stand for the character classes defined in ctype(3). A locale may
provide others. A character class may not be used as an endpoint of a range.
<PRE>mysql> select "justalnums" regexp "[[:alnum:]]+";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "!!" regexp "[[:alnum:]]+";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<LI>
[[:&lt;:]]</LI>

<LI>
[[:>:]] These match the null string at the beginning and end of a word
respectively. A word is defined as a sequence of word characters which
is neither preceded nor followed by word characters. A word character is
an alnum character (as defined by ctype(3)) or an underscore.</LI>

<PRE>mysql> select "a word a" regexp "[[:&lt;:]]word[[:>:]]";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "a xword a" regexp "[[:&lt;:]]word[[:>:]]";&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>
</DL>

<PRE>mysql> select "weeknights" regexp "^(wee|week)(knights|nights)$"; -> 1</PRE>
&nbsp;
</BODY>
</HTML>