mirror of
https://github.com/c64scene-ar/llvm-6502.git
synced 2024-11-05 13:09:10 +00:00
757 lines
18 KiB
Groff
757 lines
18 KiB
Groff
|
.\" $OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
|
||
|
.\"
|
||
|
.\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
|
||
|
.\"
|
||
|
.\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
|
||
|
.\" Copyright (c) 1992, 1993, 1994
|
||
|
.\" The Regents of the University of California. All rights reserved.
|
||
|
.\"
|
||
|
.\" This code is derived from software contributed to Berkeley by
|
||
|
.\" Henry Spencer.
|
||
|
.\"
|
||
|
.\" Redistribution and use in source and binary forms, with or without
|
||
|
.\" modification, are permitted provided that the following conditions
|
||
|
.\" are met:
|
||
|
.\" 1. Redistributions of source code must retain the above copyright
|
||
|
.\" notice, this list of conditions and the following disclaimer.
|
||
|
.\" 2. Redistributions in binary form must reproduce the above copyright
|
||
|
.\" notice, this list of conditions and the following disclaimer in the
|
||
|
.\" documentation and/or other materials provided with the distribution.
|
||
|
.\" 3. Neither the name of the University nor the names of its contributors
|
||
|
.\" may be used to endorse or promote products derived from this software
|
||
|
.\" without specific prior written permission.
|
||
|
.\"
|
||
|
.\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
|
||
|
.\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||
|
.\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||
|
.\" ARE DISCLAIMED. IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
|
||
|
.\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
|
||
|
.\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
|
||
|
.\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
|
||
|
.\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
|
||
|
.\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
|
||
|
.\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
|
||
|
.\" SUCH DAMAGE.
|
||
|
.\"
|
||
|
.\" @(#)re_format.7 8.3 (Berkeley) 3/20/94
|
||
|
.\"
|
||
|
.Dd $Mdocdate: May 31 2007 $
|
||
|
.Dt RE_FORMAT 7
|
||
|
.Os
|
||
|
.Sh NAME
|
||
|
.Nm re_format
|
||
|
.Nd POSIX regular expressions
|
||
|
.Sh DESCRIPTION
|
||
|
Regular expressions (REs),
|
||
|
as defined in
|
||
|
.St -p1003.1-2004 ,
|
||
|
come in two forms:
|
||
|
basic regular expressions
|
||
|
(BREs)
|
||
|
and extended regular expressions
|
||
|
(EREs).
|
||
|
Both forms of regular expressions are supported
|
||
|
by the interfaces described in
|
||
|
.Xr regex 3 .
|
||
|
Applications dealing with regular expressions
|
||
|
may use one or the other form
|
||
|
(or indeed both).
|
||
|
For example,
|
||
|
.Xr ed 1
|
||
|
uses BREs,
|
||
|
whilst
|
||
|
.Xr egrep 1
|
||
|
talks EREs.
|
||
|
Consult the manual page for the specific application to find out which
|
||
|
it uses.
|
||
|
.Pp
|
||
|
POSIX leaves some aspects of RE syntax and semantics open;
|
||
|
.Sq **
|
||
|
marks decisions on these aspects that
|
||
|
may not be fully portable to other POSIX implementations.
|
||
|
.Pp
|
||
|
This manual page first describes regular expressions in general,
|
||
|
specifically extended regular expressions,
|
||
|
and then discusses differences between them and basic regular expressions.
|
||
|
.Sh EXTENDED REGULAR EXPRESSIONS
|
||
|
An ERE is one** or more non-empty**
|
||
|
.Em branches ,
|
||
|
separated by
|
||
|
.Sq \*(Ba .
|
||
|
It matches anything that matches one of the branches.
|
||
|
.Pp
|
||
|
A branch is one** or more
|
||
|
.Em pieces ,
|
||
|
concatenated.
|
||
|
It matches a match for the first, followed by a match for the second, etc.
|
||
|
.Pp
|
||
|
A piece is an
|
||
|
.Em atom
|
||
|
possibly followed by a single**
|
||
|
.Sq * ,
|
||
|
.Sq + ,
|
||
|
.Sq ?\& ,
|
||
|
or
|
||
|
.Em bound .
|
||
|
An atom followed by
|
||
|
.Sq *
|
||
|
matches a sequence of 0 or more matches of the atom.
|
||
|
An atom followed by
|
||
|
.Sq +
|
||
|
matches a sequence of 1 or more matches of the atom.
|
||
|
An atom followed by
|
||
|
.Sq ?\&
|
||
|
matches a sequence of 0 or 1 matches of the atom.
|
||
|
.Pp
|
||
|
A bound is
|
||
|
.Sq {
|
||
|
followed by an unsigned decimal integer,
|
||
|
possibly followed by
|
||
|
.Sq ,\&
|
||
|
possibly followed by another unsigned decimal integer,
|
||
|
always followed by
|
||
|
.Sq } .
|
||
|
The integers must lie between 0 and
|
||
|
.Dv RE_DUP_MAX
|
||
|
(255**) inclusive,
|
||
|
and if there are two of them, the first may not exceed the second.
|
||
|
An atom followed by a bound containing one integer
|
||
|
.Ar i
|
||
|
and no comma matches
|
||
|
a sequence of exactly
|
||
|
.Ar i
|
||
|
matches of the atom.
|
||
|
An atom followed by a bound
|
||
|
containing one integer
|
||
|
.Ar i
|
||
|
and a comma matches
|
||
|
a sequence of
|
||
|
.Ar i
|
||
|
or more matches of the atom.
|
||
|
An atom followed by a bound
|
||
|
containing two integers
|
||
|
.Ar i
|
||
|
and
|
||
|
.Ar j
|
||
|
matches a sequence of
|
||
|
.Ar i
|
||
|
through
|
||
|
.Ar j
|
||
|
(inclusive) matches of the atom.
|
||
|
.Pp
|
||
|
An atom is a regular expression enclosed in
|
||
|
.Sq ()
|
||
|
(matching a part of the regular expression),
|
||
|
an empty set of
|
||
|
.Sq ()
|
||
|
(matching the null string)**,
|
||
|
a
|
||
|
.Em bracket expression
|
||
|
(see below),
|
||
|
.Sq .\&
|
||
|
(matching any single character),
|
||
|
.Sq ^
|
||
|
(matching the null string at the beginning of a line),
|
||
|
.Sq $
|
||
|
(matching the null string at the end of a line),
|
||
|
a
|
||
|
.Sq \e
|
||
|
followed by one of the characters
|
||
|
.Sq ^.[$()|*+?{\e
|
||
|
(matching that character taken as an ordinary character),
|
||
|
a
|
||
|
.Sq \e
|
||
|
followed by any other character**
|
||
|
(matching that character taken as an ordinary character,
|
||
|
as if the
|
||
|
.Sq \e
|
||
|
had not been present**),
|
||
|
or a single character with no other significance (matching that character).
|
||
|
A
|
||
|
.Sq {
|
||
|
followed by a character other than a digit is an ordinary character,
|
||
|
not the beginning of a bound**.
|
||
|
It is illegal to end an RE with
|
||
|
.Sq \e .
|
||
|
.Pp
|
||
|
A bracket expression is a list of characters enclosed in
|
||
|
.Sq [] .
|
||
|
It normally matches any single character from the list (but see below).
|
||
|
If the list begins with
|
||
|
.Sq ^ ,
|
||
|
it matches any single character
|
||
|
.Em not
|
||
|
from the rest of the list
|
||
|
(but see below).
|
||
|
If two characters in the list are separated by
|
||
|
.Sq - ,
|
||
|
this is shorthand for the full
|
||
|
.Em range
|
||
|
of characters between those two (inclusive) in the
|
||
|
collating sequence, e.g.\&
|
||
|
.Sq [0-9]
|
||
|
in ASCII matches any decimal digit.
|
||
|
It is illegal** for two ranges to share an endpoint, e.g.\&
|
||
|
.Sq a-c-e .
|
||
|
Ranges are very collating-sequence-dependent,
|
||
|
and portable programs should avoid relying on them.
|
||
|
.Pp
|
||
|
To include a literal
|
||
|
.Sq ]\&
|
||
|
in the list, make it the first character
|
||
|
(following a possible
|
||
|
.Sq ^ ) .
|
||
|
To include a literal
|
||
|
.Sq - ,
|
||
|
make it the first or last character,
|
||
|
or the second endpoint of a range.
|
||
|
To use a literal
|
||
|
.Sq -
|
||
|
as the first endpoint of a range,
|
||
|
enclose it in
|
||
|
.Sq [.
|
||
|
and
|
||
|
.Sq .]
|
||
|
to make it a collating element (see below).
|
||
|
With the exception of these and some combinations using
|
||
|
.Sq [
|
||
|
(see next paragraphs),
|
||
|
all other special characters, including
|
||
|
.Sq \e ,
|
||
|
lose their special significance within a bracket expression.
|
||
|
.Pp
|
||
|
Within a bracket expression, a collating element
|
||
|
(a character,
|
||
|
a multi-character sequence that collates as if it were a single character,
|
||
|
or a collating-sequence name for either)
|
||
|
enclosed in
|
||
|
.Sq [.
|
||
|
and
|
||
|
.Sq .]
|
||
|
stands for the sequence of characters of that collating element.
|
||
|
The sequence is a single element of the bracket expression's list.
|
||
|
A bracket expression containing a multi-character collating element
|
||
|
can thus match more than one character,
|
||
|
e.g. if the collating sequence includes a
|
||
|
.Sq ch
|
||
|
collating element,
|
||
|
then the RE
|
||
|
.Sq [[.ch.]]*c
|
||
|
matches the first five characters of
|
||
|
.Sq chchcc .
|
||
|
.Pp
|
||
|
Within a bracket expression, a collating element enclosed in
|
||
|
.Sq [=
|
||
|
and
|
||
|
.Sq =]
|
||
|
is an equivalence class, standing for the sequences of characters
|
||
|
of all collating elements equivalent to that one, including itself.
|
||
|
(If there are no other equivalent collating elements,
|
||
|
the treatment is as if the enclosing delimiters were
|
||
|
.Sq [.
|
||
|
and
|
||
|
.Sq .] . )
|
||
|
For example, if
|
||
|
.Sq x
|
||
|
and
|
||
|
.Sq y
|
||
|
are the members of an equivalence class,
|
||
|
then
|
||
|
.Sq [[=x=]] ,
|
||
|
.Sq [[=y=]] ,
|
||
|
and
|
||
|
.Sq [xy]
|
||
|
are all synonymous.
|
||
|
An equivalence class may not** be an endpoint of a range.
|
||
|
.Pp
|
||
|
Within a bracket expression, the name of a
|
||
|
.Em character class
|
||
|
enclosed
|
||
|
in
|
||
|
.Sq [:
|
||
|
and
|
||
|
.Sq :]
|
||
|
stands for the list of all characters belonging to that class.
|
||
|
Standard character class names are:
|
||
|
.Bd -literal -offset indent
|
||
|
alnum digit punct
|
||
|
alpha graph space
|
||
|
blank lower upper
|
||
|
cntrl print xdigit
|
||
|
.Ed
|
||
|
.Pp
|
||
|
These stand for the character classes defined in
|
||
|
.Xr ctype 3 .
|
||
|
A locale may provide others.
|
||
|
A character class may not be used as an endpoint of a range.
|
||
|
.Pp
|
||
|
There are two special cases** of bracket expressions:
|
||
|
the bracket expressions
|
||
|
.Sq [[:<:]]
|
||
|
and
|
||
|
.Sq [[:>:]]
|
||
|
match the null string at the beginning and end of a word, respectively.
|
||
|
A word is defined as a sequence of
|
||
|
characters starting and ending with a word character
|
||
|
which is neither preceded nor followed by
|
||
|
word characters.
|
||
|
A word character is an
|
||
|
.Em alnum
|
||
|
character (as defined by
|
||
|
.Xr ctype 3 )
|
||
|
or an underscore.
|
||
|
This is an extension,
|
||
|
compatible with but not specified by POSIX,
|
||
|
and should be used with
|
||
|
caution in software intended to be portable to other systems.
|
||
|
.Pp
|
||
|
In the event that an RE could match more than one substring of a given
|
||
|
string,
|
||
|
the RE matches the one starting earliest in the string.
|
||
|
If the RE could match more than one substring starting at that point,
|
||
|
it matches the longest.
|
||
|
Subexpressions also match the longest possible substrings, subject to
|
||
|
the constraint that the whole match be as long as possible,
|
||
|
with subexpressions starting earlier in the RE taking priority over
|
||
|
ones starting later.
|
||
|
Note that higher-level subexpressions thus take priority over
|
||
|
their lower-level component subexpressions.
|
||
|
.Pp
|
||
|
Match lengths are measured in characters, not collating elements.
|
||
|
A null string is considered longer than no match at all.
|
||
|
For example,
|
||
|
.Sq bb*
|
||
|
matches the three middle characters of
|
||
|
.Sq abbbc ;
|
||
|
.Sq (wee|week)(knights|nights)
|
||
|
matches all ten characters of
|
||
|
.Sq weeknights ;
|
||
|
when
|
||
|
.Sq (.*).*
|
||
|
is matched against
|
||
|
.Sq abc ,
|
||
|
the parenthesized subexpression matches all three characters;
|
||
|
and when
|
||
|
.Sq (a*)*
|
||
|
is matched against
|
||
|
.Sq bc ,
|
||
|
both the whole RE and the parenthesized subexpression match the null string.
|
||
|
.Pp
|
||
|
If case-independent matching is specified,
|
||
|
the effect is much as if all case distinctions had vanished from the
|
||
|
alphabet.
|
||
|
When an alphabetic that exists in multiple cases appears as an
|
||
|
ordinary character outside a bracket expression, it is effectively
|
||
|
transformed into a bracket expression containing both cases,
|
||
|
e.g.\&
|
||
|
.Sq x
|
||
|
becomes
|
||
|
.Sq [xX] .
|
||
|
When it appears inside a bracket expression,
|
||
|
all case counterparts of it are added to the bracket expression,
|
||
|
so that, for example,
|
||
|
.Sq [x]
|
||
|
becomes
|
||
|
.Sq [xX]
|
||
|
and
|
||
|
.Sq [^x]
|
||
|
becomes
|
||
|
.Sq [^xX] .
|
||
|
.Pp
|
||
|
No particular limit is imposed on the length of REs**.
|
||
|
Programs intended to be portable should not employ REs longer
|
||
|
than 256 bytes,
|
||
|
as an implementation can refuse to accept such REs and remain
|
||
|
POSIX-compliant.
|
||
|
.Pp
|
||
|
The following is a list of extended regular expressions:
|
||
|
.Bl -tag -width Ds
|
||
|
.It Ar c
|
||
|
Any character
|
||
|
.Ar c
|
||
|
not listed below matches itself.
|
||
|
.It \e Ns Ar c
|
||
|
Any backslash-escaped character
|
||
|
.Ar c
|
||
|
matches itself.
|
||
|
.It \&.
|
||
|
Matches any single character that is not a newline
|
||
|
.Pq Sq \en .
|
||
|
.It Bq Ar char-class
|
||
|
Matches any single character in
|
||
|
.Ar char-class .
|
||
|
To include a
|
||
|
.Ql \&]
|
||
|
in
|
||
|
.Ar char-class ,
|
||
|
it must be the first character.
|
||
|
A range of characters may be specified by separating the end characters
|
||
|
of the range with a
|
||
|
.Ql - ;
|
||
|
e.g.\&
|
||
|
.Ar a-z
|
||
|
specifies the lower case characters.
|
||
|
The following literal expressions can also be used in
|
||
|
.Ar char-class
|
||
|
to specify sets of characters:
|
||
|
.Bd -unfilled -offset indent
|
||
|
[:alnum:] [:cntrl:] [:lower:] [:space:]
|
||
|
[:alpha:] [:digit:] [:print:] [:upper:]
|
||
|
[:blank:] [:graph:] [:punct:] [:xdigit:]
|
||
|
.Ed
|
||
|
.Pp
|
||
|
If
|
||
|
.Ql -
|
||
|
appears as the first or last character of
|
||
|
.Ar char-class ,
|
||
|
then it matches itself.
|
||
|
All other characters in
|
||
|
.Ar char-class
|
||
|
match themselves.
|
||
|
.Pp
|
||
|
Patterns in
|
||
|
.Ar char-class
|
||
|
of the form
|
||
|
.Eo [.
|
||
|
.Ar col-elm
|
||
|
.Ec .]\&
|
||
|
or
|
||
|
.Eo [=
|
||
|
.Ar col-elm
|
||
|
.Ec =]\& ,
|
||
|
where
|
||
|
.Ar col-elm
|
||
|
is a collating element, are interpreted according to
|
||
|
.Xr setlocale 3
|
||
|
.Pq not currently supported .
|
||
|
.It Bq ^ Ns Ar char-class
|
||
|
Matches any single character, other than newline, not in
|
||
|
.Ar char-class .
|
||
|
.Ar char-class
|
||
|
is defined as above.
|
||
|
.It ^
|
||
|
If
|
||
|
.Sq ^
|
||
|
is the first character of a regular expression, then it
|
||
|
anchors the regular expression to the beginning of a line.
|
||
|
Otherwise, it matches itself.
|
||
|
.It $
|
||
|
If
|
||
|
.Sq $
|
||
|
is the last character of a regular expression,
|
||
|
it anchors the regular expression to the end of a line.
|
||
|
Otherwise, it matches itself.
|
||
|
.It [[:<:]]
|
||
|
Anchors the single character regular expression or subexpression
|
||
|
immediately following it to the beginning of a word.
|
||
|
.It [[:>:]]
|
||
|
Anchors the single character regular expression or subexpression
|
||
|
immediately following it to the end of a word.
|
||
|
.It Pq Ar re
|
||
|
Defines a subexpression
|
||
|
.Ar re .
|
||
|
Any set of characters enclosed in parentheses
|
||
|
matches whatever the set of characters without parentheses matches
|
||
|
(that is a long-winded way of saying the constructs
|
||
|
.Sq (re)
|
||
|
and
|
||
|
.Sq re
|
||
|
match identically).
|
||
|
.It *
|
||
|
Matches the single character regular expression or subexpression
|
||
|
immediately preceding it zero or more times.
|
||
|
If
|
||
|
.Sq *
|
||
|
is the first character of a regular expression or subexpression,
|
||
|
then it matches itself.
|
||
|
The
|
||
|
.Sq *
|
||
|
operator sometimes yields unexpected results.
|
||
|
For example, the regular expression
|
||
|
.Ar b*
|
||
|
matches the beginning of the string
|
||
|
.Qq abbb
|
||
|
(as opposed to the substring
|
||
|
.Qq bbb ) ,
|
||
|
since a null match is the only leftmost match.
|
||
|
.It +
|
||
|
Matches the singular character regular expression
|
||
|
or subexpression immediately preceding it
|
||
|
one or more times.
|
||
|
.It ?
|
||
|
Matches the singular character regular expression
|
||
|
or subexpression immediately preceding it
|
||
|
0 or 1 times.
|
||
|
.Sm off
|
||
|
.It Xo
|
||
|
.Pf { Ar n , m No }\ \&
|
||
|
.Pf { Ar n , No }\ \&
|
||
|
.Pf { Ar n No }
|
||
|
.Xc
|
||
|
.Sm on
|
||
|
Matches the single character regular expression or subexpression
|
||
|
immediately preceding it at least
|
||
|
.Ar n
|
||
|
and at most
|
||
|
.Ar m
|
||
|
times.
|
||
|
If
|
||
|
.Ar m
|
||
|
is omitted, then it matches at least
|
||
|
.Ar n
|
||
|
times.
|
||
|
If the comma is also omitted, then it matches exactly
|
||
|
.Ar n
|
||
|
times.
|
||
|
.It \*(Ba
|
||
|
Used to separate patterns.
|
||
|
For example,
|
||
|
the pattern
|
||
|
.Sq cat\*(Badog
|
||
|
matches either
|
||
|
.Sq cat
|
||
|
or
|
||
|
.Sq dog .
|
||
|
.El
|
||
|
.Sh BASIC REGULAR EXPRESSIONS
|
||
|
Basic regular expressions differ in several respects:
|
||
|
.Bl -bullet -offset 3n
|
||
|
.It
|
||
|
.Sq \*(Ba ,
|
||
|
.Sq + ,
|
||
|
and
|
||
|
.Sq ?\&
|
||
|
are ordinary characters and there is no equivalent
|
||
|
for their functionality.
|
||
|
.It
|
||
|
The delimiters for bounds are
|
||
|
.Sq \e{
|
||
|
and
|
||
|
.Sq \e} ,
|
||
|
with
|
||
|
.Sq {
|
||
|
and
|
||
|
.Sq }
|
||
|
by themselves ordinary characters.
|
||
|
.It
|
||
|
The parentheses for nested subexpressions are
|
||
|
.Sq \e(
|
||
|
and
|
||
|
.Sq \e) ,
|
||
|
with
|
||
|
.Sq (
|
||
|
and
|
||
|
.Sq )\&
|
||
|
by themselves ordinary characters.
|
||
|
.It
|
||
|
.Sq ^
|
||
|
is an ordinary character except at the beginning of the
|
||
|
RE or** the beginning of a parenthesized subexpression.
|
||
|
.It
|
||
|
.Sq $
|
||
|
is an ordinary character except at the end of the
|
||
|
RE or** the end of a parenthesized subexpression.
|
||
|
.It
|
||
|
.Sq *
|
||
|
is an ordinary character if it appears at the beginning of the
|
||
|
RE or the beginning of a parenthesized subexpression
|
||
|
(after a possible leading
|
||
|
.Sq ^ ) .
|
||
|
.It
|
||
|
Finally, there is one new type of atom, a
|
||
|
.Em back-reference :
|
||
|
.Sq \e
|
||
|
followed by a non-zero decimal digit
|
||
|
.Ar d
|
||
|
matches the same sequence of characters matched by the
|
||
|
.Ar d Ns th
|
||
|
parenthesized subexpression
|
||
|
(numbering subexpressions by the positions of their opening parentheses,
|
||
|
left to right),
|
||
|
so that, for example,
|
||
|
.Sq \e([bc]\e)\e1
|
||
|
matches
|
||
|
.Sq bb\&
|
||
|
or
|
||
|
.Sq cc
|
||
|
but not
|
||
|
.Sq bc .
|
||
|
.El
|
||
|
.Pp
|
||
|
The following is a list of basic regular expressions:
|
||
|
.Bl -tag -width Ds
|
||
|
.It Ar c
|
||
|
Any character
|
||
|
.Ar c
|
||
|
not listed below matches itself.
|
||
|
.It \e Ns Ar c
|
||
|
Any backslash-escaped character
|
||
|
.Ar c ,
|
||
|
except for
|
||
|
.Sq { ,
|
||
|
.Sq } ,
|
||
|
.Sq \&( ,
|
||
|
and
|
||
|
.Sq \&) ,
|
||
|
matches itself.
|
||
|
.It \&.
|
||
|
Matches any single character that is not a newline
|
||
|
.Pq Sq \en .
|
||
|
.It Bq Ar char-class
|
||
|
Matches any single character in
|
||
|
.Ar char-class .
|
||
|
To include a
|
||
|
.Ql \&]
|
||
|
in
|
||
|
.Ar char-class ,
|
||
|
it must be the first character.
|
||
|
A range of characters may be specified by separating the end characters
|
||
|
of the range with a
|
||
|
.Ql - ;
|
||
|
e.g.\&
|
||
|
.Ar a-z
|
||
|
specifies the lower case characters.
|
||
|
The following literal expressions can also be used in
|
||
|
.Ar char-class
|
||
|
to specify sets of characters:
|
||
|
.Bd -unfilled -offset indent
|
||
|
[:alnum:] [:cntrl:] [:lower:] [:space:]
|
||
|
[:alpha:] [:digit:] [:print:] [:upper:]
|
||
|
[:blank:] [:graph:] [:punct:] [:xdigit:]
|
||
|
.Ed
|
||
|
.Pp
|
||
|
If
|
||
|
.Ql -
|
||
|
appears as the first or last character of
|
||
|
.Ar char-class ,
|
||
|
then it matches itself.
|
||
|
All other characters in
|
||
|
.Ar char-class
|
||
|
match themselves.
|
||
|
.Pp
|
||
|
Patterns in
|
||
|
.Ar char-class
|
||
|
of the form
|
||
|
.Eo [.
|
||
|
.Ar col-elm
|
||
|
.Ec .]\&
|
||
|
or
|
||
|
.Eo [=
|
||
|
.Ar col-elm
|
||
|
.Ec =]\& ,
|
||
|
where
|
||
|
.Ar col-elm
|
||
|
is a collating element, are interpreted according to
|
||
|
.Xr setlocale 3
|
||
|
.Pq not currently supported .
|
||
|
.It Bq ^ Ns Ar char-class
|
||
|
Matches any single character, other than newline, not in
|
||
|
.Ar char-class .
|
||
|
.Ar char-class
|
||
|
is defined as above.
|
||
|
.It ^
|
||
|
If
|
||
|
.Sq ^
|
||
|
is the first character of a regular expression, then it
|
||
|
anchors the regular expression to the beginning of a line.
|
||
|
Otherwise, it matches itself.
|
||
|
.It $
|
||
|
If
|
||
|
.Sq $
|
||
|
is the last character of a regular expression,
|
||
|
it anchors the regular expression to the end of a line.
|
||
|
Otherwise, it matches itself.
|
||
|
.It [[:<:]]
|
||
|
Anchors the single character regular expression or subexpression
|
||
|
immediately following it to the beginning of a word.
|
||
|
.It [[:>:]]
|
||
|
Anchors the single character regular expression or subexpression
|
||
|
immediately following it to the end of a word.
|
||
|
.It \e( Ns Ar re Ns \e)
|
||
|
Defines a subexpression
|
||
|
.Ar re .
|
||
|
Subexpressions may be nested.
|
||
|
A subsequent backreference of the form
|
||
|
.Pf \e Ns Ar n ,
|
||
|
where
|
||
|
.Ar n
|
||
|
is a number in the range [1,9], expands to the text matched by the
|
||
|
.Ar n Ns th
|
||
|
subexpression.
|
||
|
For example, the regular expression
|
||
|
.Ar \e(.*\e)\e1
|
||
|
matches any string consisting of identical adjacent substrings.
|
||
|
Subexpressions are ordered relative to their left delimiter.
|
||
|
.It *
|
||
|
Matches the single character regular expression or subexpression
|
||
|
immediately preceding it zero or more times.
|
||
|
If
|
||
|
.Sq *
|
||
|
is the first character of a regular expression or subexpression,
|
||
|
then it matches itself.
|
||
|
The
|
||
|
.Sq *
|
||
|
operator sometimes yields unexpected results.
|
||
|
For example, the regular expression
|
||
|
.Ar b*
|
||
|
matches the beginning of the string
|
||
|
.Qq abbb
|
||
|
(as opposed to the substring
|
||
|
.Qq bbb ) ,
|
||
|
since a null match is the only leftmost match.
|
||
|
.Sm off
|
||
|
.It Xo
|
||
|
.Pf \e{ Ar n , m No \e}\ \&
|
||
|
.Pf \e{ Ar n , No \e}\ \&
|
||
|
.Pf \e{ Ar n No \e}
|
||
|
.Xc
|
||
|
.Sm on
|
||
|
Matches the single character regular expression or subexpression
|
||
|
immediately preceding it at least
|
||
|
.Ar n
|
||
|
and at most
|
||
|
.Ar m
|
||
|
times.
|
||
|
If
|
||
|
.Ar m
|
||
|
is omitted, then it matches at least
|
||
|
.Ar n
|
||
|
times.
|
||
|
If the comma is also omitted, then it matches exactly
|
||
|
.Ar n
|
||
|
times.
|
||
|
.El
|
||
|
.Sh SEE ALSO
|
||
|
.Xr ctype 3 ,
|
||
|
.Xr regex 3
|
||
|
.Sh STANDARDS
|
||
|
.St -p1003.1-2004 :
|
||
|
Base Definitions, Chapter 9 (Regular Expressions).
|
||
|
.Sh BUGS
|
||
|
Having two kinds of REs is a botch.
|
||
|
.Pp
|
||
|
The current POSIX spec says that
|
||
|
.Sq )\&
|
||
|
is an ordinary character in the absence of an unmatched
|
||
|
.Sq ( ;
|
||
|
this was an unintentional result of a wording error,
|
||
|
and change is likely.
|
||
|
Avoid relying on it.
|
||
|
.Pp
|
||
|
Back-references are a dreadful botch,
|
||
|
posing major problems for efficient implementations.
|
||
|
They are also somewhat vaguely defined
|
||
|
(does
|
||
|
.Sq a\e(\e(b\e)*\e2\e)*d
|
||
|
match
|
||
|
.Sq abbbd ? ) .
|
||
|
Avoid using them.
|
||
|
.Pp
|
||
|
POSIX's specification of case-independent matching is vague.
|
||
|
The
|
||
|
.Dq one case implies all cases
|
||
|
definition given above
|
||
|
is the current consensus among implementors as to the right interpretation.
|
||
|
.Pp
|
||
|
The syntax for word boundaries is incredibly ugly.
|