pjs/webtools/bonsai/cvsregexp.html

<HTML>
<HEAD>
   <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1">
   <META NAME="Author" CONTENT="lloyd tabb">
   <META NAME="GENERATOR" CONTENT="Mozilla/4.0 [en] (WinNT; I) [Netscape]">
   <TITLE>Regular expressions in the cvs query tool</TITLE>
</HEAD>
<BODY>

<H1>
Description of MySQL regular expression syntax.</H1>
Regular expressions are a powerful way of specifying complex searches.

<P><B>MySQL</B> uses Henry Spencer's implementation of regular expressions,
which aims to conform to POSIX 1003.2. <B>MySQL</B> uses the extended
version.

<P>For more precise information, see Henry Spencer's regex(7) manual page.

<P>This is a simplified reference that skips the details. From here on,
a regular expression is called a regexp.

<P>A regular expression describes a set of strings. The simplest case is
one that has no special characters in it. For example, the regexp <TT>hello</TT>
matches <TT>hello</TT> and nothing else.

<P>Nontrivial regular expressions use certain special constructs so that
they can match more than one string. For example, the regexp <TT>hello|word</TT>
matches either the string <TT>hello</TT> or the string <TT>word</TT>.

<P>As a more complex example, the regexp <TT>B[an]*s</TT> matches any of the
strings <TT>Bananas</TT>, <TT>Baaaaas</TT>, <TT>Bs</TT> and any other string
starting with a <TT>B</TT>, continuing with any number of <TT>a</TT> or
<TT>n</TT> characters, and ending with an <TT>s</TT>.
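
<P>As a rough illustration of the <TT>B[an]*s</TT> example above, the queries
below anchor the pattern with <TT>^</TT> and <TT>$</TT> (both described further
down) so that the whole string has to match. The test strings are chosen only
for illustration, and the results are those implied by the syntax described
here rather than captured server output.
<PRE>mysql> select "Bananas" regexp "^B[an]*s$";     -> 1
mysql> select "Bs" regexp "^B[an]*s$";          -> 1
mysql> select "Bonus" regexp "^B[an]*s$";       -> 0</PRE>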

<P>The following special characters/constructs are known.
<DL COMPACT>
<DT>
<TT>^</TT></DT>

<DD>
Start of whole string.</DD>

<PRE>mysql> select "fo\nfo" regexp "^fo$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "fofo" regexp "^fo";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>$</TT></DT>

<DD>
End of whole string.</DD>

<PRE>mysql> select "fo\no" regexp "^fo\no$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "fo\no" regexp "^fo$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>.</TT></DT>

<DD>
Any character (including newline).</DD>

<PRE>mysql> select "fofo" regexp "^f.*";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "fo\nfo" regexp "^f.*";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>a*</TT></DT>

<DD>
Any sequence of zero or more a's.</DD>

<PRE>mysql> select "Ban" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Baaan" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Bn" regexp "^Ba*n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>a+</TT></DT>

<DD>
Any sequence of one or more a's.</DD>

<PRE>mysql> select "Ban" regexp "^Ba+n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Bn" regexp "^Ba+n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>a?</TT></DT>

<DD>
Either zero or one a.</DD>

<PRE>mysql> select "Bn" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Ban" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "Baan" regexp "^Ba?n";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>de|abc</TT></DT>

<DD>
Either the sequence <TT>de</TT> or <TT>abc</TT>.</DD>

<PRE>mysql> select "pi" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "axe" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "apa" regexp "pi|apa";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "apa" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pi" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pix" regexp "^(pi|apa)$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>

<DT>
<TT>(abc)*</TT></DT>

<DD>
Zero or more times the sequence <TT>abc</TT>.</DD>

<PRE>mysql> select "pi" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "pip" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "pipi" regexp "^(pi)+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1</PRE>

<DT>
<TT>{1}</TT></DT>

<DT>
<TT>{2,3}</TT></DT>

<DD>
This is a more general way of writing regexps that match many occurrences
of the preceding atom.</DD>

<DL COMPACT>
<DT>
<TT>a*</TT></DT>

<DD>
Can be written as <TT>a{0,}</TT>.</DD>

<DT>
<TT>a+</TT></DT>

<DD>
Can be written as <TT>a{1,}</TT>.</DD>

<DT>
<TT>a?</TT></DT>

<DD>
Can be written as <TT>a{0,1}</TT>.</DD>
</DL>
To be more precise, an atom followed by a bound containing one integer <TT>i</TT>
and no comma matches a sequence of exactly <TT>i</TT> matches of the atom.
An atom followed by a bound containing one integer <TT>i</TT> and a comma
matches a sequence of <TT>i</TT> or more matches of the atom. An atom followed
by a bound containing two integers <TT>i</TT> and <TT>j</TT> matches a
sequence of <TT>i</TT> through <TT>j</TT> (inclusive) matches of the atom.
Both arguments must satisfy <TT>0 &lt;= value &lt;= RE_DUP_MAX (default 255)</TT>,
and if there are two of them, the second must be greater than or equal to the
first.
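
<DD>
A short sketch of how bounds behave, with test strings chosen only for
illustration. The results are those implied by the rules above rather than
captured server output.</DD>

<PRE>mysql> select "abcde" regexp "a[bcd]{2}e";      -> 0
mysql> select "abcde" regexp "a[bcd]{3}e";      -> 1
mysql> select "abcde" regexp "a[bcd]{1,10}e";   -> 1
mysql> select "aaa" regexp "^a{2,3}$";          -> 1
mysql> select "aaaa" regexp "^a{2,3}$";         -> 0
mysql> select "a" regexp "^a{2,}$";             -> 0</PRE>
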
<DT>
<TT>[a-dX]</TT></DT>

<DT>
<TT>[^a-dX]</TT></DT>

<DD>
Any character which is (or is not, if <TT>^</TT> is used) either <TT>a</TT>, <TT>b</TT>,
<TT>c</TT>, <TT>d</TT> or <TT>X</TT>. To include a literal <TT>]</TT>, it must be
written first. To include a literal <TT>-</TT>, it must be written first or last.
For example, <TT>[0-9]</TT> matches any decimal digit. Any character that does not
have a defined meaning inside a <TT>[]</TT> pair has no special meaning
and matches only itself.</DD>

<PRE>mysql> select "aXbc" regexp "[a-dXYZ]";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "aXbc" regexp "^[a-dXYZ]$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "aXbc" regexp "^[a-dXYZ]+$";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "aXbc" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp;&nbsp;&nbsp; -> 0
mysql> select "gheis" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp;&nbsp; -> 1
mysql> select "gheisa" regexp "^[^a-dXYZ]+$";&nbsp;&nbsp; -> 0</PRE>
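
<DD>
The placement rules for a literal <TT>]</TT> and <TT>-</TT> can be sketched as
follows. The test strings are illustrative, and the results are those implied
by the rules above rather than captured server output.</DD>

<PRE>mysql> select "a]b" regexp "^a[]]b$";           -> 1
mysql> select "a-b" regexp "^a[x-]b$";          -> 1
mysql> select "a-b" regexp "^a[-x]b$";          -> 1
mysql> select "ayb" regexp "^a[x-]b$";          -> 0
mysql> select "5" regexp "^[0-9]$";             -> 1</PRE>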

<DT>
<TT>[[.characters.]]</TT></DT>

<DD>
The sequence of characters of that collating element. The sequence is a
single element of the bracket expression's list. A bracket expression containing
a multi-character collating element can thus match more than one character,
e.g. if the collating sequence includes a <TT>ch</TT> collating element,
then the RE <TT>[[.ch.]]*c</TT> matches the first five characters of <TT>chchcc</TT>.</DD>

<DT>
<TT>[=character-class=]</TT></DT>

<DD>
An equivalence class, standing for the sequences of characters of all collating
elements equivalent to that one, including itself. For example, if <TT>o</TT>
and <TT>(+)</TT> are the members of an equivalence class, then <TT>[[=o=]]</TT>,
<TT>[[=(+)=]]</TT>, and <TT>[o(+)]</TT> are all synonymous. An equivalence
class may not be an endpoint of a range.</DD>

<DT>
<TT>[:character_class:]</TT></DT>

<DD>
Within a bracket expression, the name of a character class enclosed in
<TT>[:</TT> and <TT>:]</TT> stands for the list of all characters belonging
to that class. Standard character class names are:</DD>

<TABLE BORDER WIDTH="100%" NOSAVE >
<TR>
<TD>alnum&nbsp;</TD>

<TD>digit&nbsp;</TD>

<TD>punct&nbsp;</TD>
</TR>

<TR>
<TD>alpha&nbsp;</TD>

<TD>graph&nbsp;</TD>

<TD>space&nbsp;</TD>
</TR>

<TR>
<TD>blank&nbsp;</TD>

<TD>lower&nbsp;</TD>

<TD>upper&nbsp;</TD>
</TR>

<TR>
<TD>cntrl&nbsp;</TD>

<TD>print&nbsp;</TD>

<TD>xdigit&nbsp;</TD>
</TR>
</TABLE>
These stand for the character classes defined in ctype(3). A locale may
provide others. A character class may not be used as an endpoint of a range.
<PRE>mysql> select "justalnums" regexp "[[:alnum:]]+";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "!!" regexp "[[:alnum:]]+";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>
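
<DD>
Other class names from the table are used the same way. A small sketch with
<TT>digit</TT> and <TT>space</TT>, using illustrative test strings; the results
are those implied by the class definitions rather than captured server output.</DD>

<PRE>mysql> select "2024" regexp "^[[:digit:]]+$";           -> 1
mysql> select "20x4" regexp "^[[:digit:]]+$";           -> 0
mysql> select "a b" regexp "[[:space:]]";               -> 1
mysql> select "ab" regexp "[[:space:]]";                -> 0</PRE>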

<DT>
<TT>[[:&lt;:]]</TT></DT>

<DT>
<TT>[[:&gt;:]]</TT></DT>

<DD>
These match the null string at the beginning and end of a word,
respectively. A word is defined as a sequence of word characters which
is neither preceded nor followed by word characters. A word character is
an alnum character (as defined by ctype(3)) or an underscore.</DD>

<PRE>mysql> select "a word a" regexp "[[:&lt;:]]word[[:>:]]";&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; -> 1
mysql> select "a xword a" regexp "[[:&lt;:]]word[[:>:]]";&nbsp;&nbsp;&nbsp;&nbsp; -> 0</PRE>
</DL>

<PRE>mysql> select "weeknights" regexp "^(wee|week)(knights|nights)$"; -> 1</PRE>
&nbsp;
</BODY>
</HTML>
Bonsai and Tinderbox have been freed. 1998-06-17 01:43:24 +04:00