	implementation. git-svn-id: https://llvm.org/svn/llvm-project/llvm/trunk@80493 91177308-0d34-0410-b5e6-96231b3b80d8
		
			
				
	
	
		
			757 lines
		
	
	
		
			18 KiB
		
	
	
	
		
			Groff
		
	
	
	
	
	
			
		
		
	
	
| .\"	$OpenBSD: re_format.7,v 1.14 2007/05/31 19:19:30 jmc Exp $
 | |
| .\"
 | |
| .\" Copyright (c) 1997, Phillip F Knaack. All rights reserved.
 | |
| .\"
 | |
| .\" Copyright (c) 1992, 1993, 1994 Henry Spencer.
 | |
| .\" Copyright (c) 1992, 1993, 1994
 | |
| .\"	The Regents of the University of California.  All rights reserved.
 | |
| .\"
 | |
| .\" This code is derived from software contributed to Berkeley by
 | |
| .\" Henry Spencer.
 | |
| .\"
 | |
| .\" Redistribution and use in source and binary forms, with or without
 | |
| .\" modification, are permitted provided that the following conditions
 | |
| .\" are met:
 | |
| .\" 1. Redistributions of source code must retain the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer.
 | |
| .\" 2. Redistributions in binary form must reproduce the above copyright
 | |
| .\"    notice, this list of conditions and the following disclaimer in the
 | |
| .\"    documentation and/or other materials provided with the distribution.
 | |
| .\" 3. Neither the name of the University nor the names of its contributors
 | |
| .\"    may be used to endorse or promote products derived from this software
 | |
| .\"    without specific prior written permission.
 | |
| .\"
 | |
| .\" THIS SOFTWARE IS PROVIDED BY THE REGENTS AND CONTRIBUTORS ``AS IS'' AND
 | |
| .\" ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 | |
| .\" IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 | |
| .\" ARE DISCLAIMED.  IN NO EVENT SHALL THE REGENTS OR CONTRIBUTORS BE LIABLE
 | |
| .\" FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 | |
| .\" DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 | |
| .\" OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 | |
| .\" HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 | |
| .\" LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 | |
| .\" OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 | |
| .\" SUCH DAMAGE.
 | |
| .\"
 | |
| .\"	@(#)re_format.7	8.3 (Berkeley) 3/20/94
 | |
| .\"
 | |
| .Dd $Mdocdate: May 31 2007 $
 | |
| .Dt RE_FORMAT 7
 | |
| .Os
 | |
| .Sh NAME
 | |
| .Nm re_format
 | |
| .Nd POSIX regular expressions
 | |
| .Sh DESCRIPTION
 | |
| Regular expressions (REs),
 | |
| as defined in
 | |
| .St -p1003.1-2004 ,
 | |
| come in two forms:
 | |
| basic regular expressions
 | |
| (BREs)
 | |
| and extended regular expressions
 | |
| (EREs).
 | |
| Both forms of regular expressions are supported
 | |
| by the interfaces described in
 | |
| .Xr regex 3 .
 | |
| Applications dealing with regular expressions
 | |
| may use one or the other form
 | |
| (or indeed both).
 | |
| For example,
 | |
| .Xr ed 1
 | |
| uses BREs,
 | |
| whilst
 | |
| .Xr egrep 1
 | |
| uses EREs.
 | |
| Consult the manual page for the specific application to find out which
 | |
| it uses.
 | |
| .Pp
 | |
| POSIX leaves some aspects of RE syntax and semantics open;
 | |
| .Sq **
 | |
| marks decisions on these aspects that
 | |
| may not be fully portable to other POSIX implementations.
 | |
| .Pp
 | |
| This manual page first describes regular expressions in general,
 | |
| specifically extended regular expressions,
 | |
| and then discusses differences between them and basic regular expressions.
 | |
| .Sh EXTENDED REGULAR EXPRESSIONS
 | |
| An ERE is one** or more non-empty**
 | |
| .Em branches ,
 | |
| separated by
 | |
| .Sq \*(Ba .
 | |
| It matches anything that matches one of the branches.
 | |
| .Pp
 | |
| A branch is one** or more
 | |
| .Em pieces ,
 | |
| concatenated.
 | |
| It matches a match for the first piece, followed by a match for the second, and so on.
 | |
| .Pp
 | |
| A piece is an
 | |
| .Em atom
 | |
| possibly followed by a single**
 | |
| .Sq * ,
 | |
| .Sq + ,
 | |
| .Sq ?\& ,
 | |
| or
 | |
| .Em bound .
 | |
| An atom followed by
 | |
| .Sq *
 | |
| matches a sequence of 0 or more matches of the atom.
 | |
| An atom followed by
 | |
| .Sq +
 | |
| matches a sequence of 1 or more matches of the atom.
 | |
| An atom followed by
 | |
| .Sq ?\&
 | |
| matches a sequence of 0 or 1 matches of the atom.
 | |
| .Pp
 | |
| A bound is
 | |
| .Sq {
 | |
| followed by an unsigned decimal integer,
 | |
| possibly followed by
 | |
| .Sq ,\&
 | |
| possibly followed by another unsigned decimal integer,
 | |
| always followed by
 | |
| .Sq } .
 | |
| The integers must lie between 0 and
 | |
| .Dv RE_DUP_MAX
 | |
| (255**) inclusive,
 | |
| and if there are two of them, the first may not exceed the second.
 | |
| An atom followed by a bound containing one integer
 | |
| .Ar i
 | |
| and no comma matches
 | |
| a sequence of exactly
 | |
| .Ar i
 | |
| matches of the atom.
 | |
| An atom followed by a bound
 | |
| containing one integer
 | |
| .Ar i
 | |
| and a comma matches
 | |
| a sequence of
 | |
| .Ar i
 | |
| or more matches of the atom.
 | |
| An atom followed by a bound
 | |
| containing two integers
 | |
| .Ar i
 | |
| and
 | |
| .Ar j
 | |
| matches a sequence of
 | |
| .Ar i
 | |
| through
 | |
| .Ar j
 | |
| (inclusive) matches of the atom.
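.Pp
The bound forms described above can be tried out with the interfaces in
regex(3).
The following is a minimal sketch, not part of the original page;
the helper name ere_matches is hypothetical, and the patterns are anchored
with ^ and $ so that the repetition count is checked exactly.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the ERE `pattern` matches somewhere
 * in `text`, 0 if it does not, and -1 if the pattern fails to compile.
 */
int
ere_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	/* REG_EXTENDED selects ERE syntax; REG_NOSUB skips offset reporting. */
	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

With this helper, ere_matches("^ab{2}c$", "abbc") returns 1, while
ere_matches("^ab{1,3}c$", "abbbbc") returns 0 because four repetitions
exceed the upper bound.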
 | |
| .Pp
 | |
| An atom is a regular expression enclosed in
 | |
| .Sq ()
 | |
| (matching whatever the enclosed regular expression matches),
 | |
| an empty set of
 | |
| .Sq ()
 | |
| (matching the null string)**,
 | |
| a
 | |
| .Em bracket expression
 | |
| (see below),
 | |
| .Sq .\&
 | |
| (matching any single character),
 | |
| .Sq ^
 | |
| (matching the null string at the beginning of a line),
 | |
| .Sq $
 | |
| (matching the null string at the end of a line),
 | |
| a
 | |
| .Sq \e
 | |
| followed by one of the characters
 | |
| .Sq ^.[$()|*+?{\e
 | |
| (matching that character taken as an ordinary character),
 | |
| a
 | |
| .Sq \e
 | |
| followed by any other character**
 | |
| (matching that character taken as an ordinary character,
 | |
| as if the
 | |
| .Sq \e
 | |
| had not been present**),
 | |
| or a single character with no other significance (matching that character).
 | |
| A
 | |
| .Sq {
 | |
| followed by a character other than a digit is an ordinary character,
 | |
| not the beginning of a bound**.
 | |
| It is illegal to end an RE with
 | |
| .Sq \e .
 | |
| .Pp
 | |
| A bracket expression is a list of characters enclosed in
 | |
| .Sq [] .
 | |
| It normally matches any single character from the list (but see below).
 | |
| If the list begins with
 | |
| .Sq ^ ,
 | |
| it matches any single character
 | |
| .Em not
 | |
| from the rest of the list
 | |
| (but see below).
 | |
| If two characters in the list are separated by
 | |
| .Sq - ,
 | |
| this is shorthand for the full
 | |
| .Em range
 | |
| of characters between those two (inclusive) in the
 | |
| collating sequence, e.g.\&
 | |
| .Sq [0-9]
 | |
| in ASCII matches any decimal digit.
 | |
| It is illegal** for two ranges to share an endpoint, e.g.\&
 | |
| .Sq a-c-e .
 | |
| Ranges are very collating-sequence-dependent,
 | |
| and portable programs should avoid relying on them.
 | |
| .Pp
 | |
| To include a literal
 | |
| .Sq ]\&
 | |
| in the list, make it the first character
 | |
| (following a possible
 | |
| .Sq ^ ) .
 | |
| To include a literal
 | |
| .Sq - ,
 | |
| make it the first or last character,
 | |
| or the second endpoint of a range.
 | |
| To use a literal
 | |
| .Sq -
 | |
| as the first endpoint of a range,
 | |
| enclose it in
 | |
| .Sq [.
 | |
| and
 | |
| .Sq .]
 | |
| to make it a collating element (see below).
 | |
| With the exception of these and some combinations using
 | |
| .Sq [
 | |
| (see next paragraphs),
 | |
| all other special characters, including
 | |
| .Sq \e ,
 | |
| lose their special significance within a bracket expression.
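.Pp
The placement rules for a literal ] and - and the loss of special meaning
for a backslash can be checked with a short program.
This is an illustrative sketch only; the helper name bracket_matches is
hypothetical, and the doubled backslashes are C string escapes, not part
of the regular expression.

```c
#include <regex.h>
#include <stddef.h>

/*
 * Hypothetical helper: returns 1 if the ERE `pattern` matches `text`,
 * 0 if it does not, and -1 if the pattern fails to compile.
 */
int
bracket_matches(const char *pattern, const char *text)
{
	regex_t re;
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
		return -1;
	rc = regexec(&re, text, 0, NULL, 0);
	regfree(&re);
	return rc == 0;
}
```

For example, bracket_matches("^[]a]$", "]") returns 1 because the ]
comes first in the list, and bracket_matches("^[\\]$", "\\") returns 1
because the backslash is an ordinary list member.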
 | |
| .Pp
 | |
| Within a bracket expression, a collating element
 | |
| (a character,
 | |
| a multi-character sequence that collates as if it were a single character,
 | |
| or a collating-sequence name for either)
 | |
| enclosed in
 | |
| .Sq [.
 | |
| and
 | |
| .Sq .]
 | |
| stands for the sequence of characters of that collating element.
 | |
| The sequence is a single element of the bracket expression's list.
 | |
| A bracket expression containing a multi-character collating element
 | |
| can thus match more than one character,
 | |
| e.g. if the collating sequence includes a
 | |
| .Sq ch
 | |
| collating element,
 | |
| then the RE
 | |
| .Sq [[.ch.]]*c
 | |
| matches the first five characters of
 | |
| .Sq chchcc .
 | |
| .Pp
 | |
| Within a bracket expression, a collating element enclosed in
 | |
| .Sq [=
 | |
| and
 | |
| .Sq =]
 | |
| is an equivalence class, standing for the sequences of characters
 | |
| of all collating elements equivalent to that one, including itself.
 | |
| (If there are no other equivalent collating elements,
 | |
| the treatment is as if the enclosing delimiters were
 | |
| .Sq [.
 | |
| and
 | |
| .Sq .] . )
 | |
| For example, if
 | |
| .Sq x
 | |
| and
 | |
| .Sq y
 | |
| are the members of an equivalence class,
 | |
| then
 | |
| .Sq [[=x=]] ,
 | |
| .Sq [[=y=]] ,
 | |
| and
 | |
| .Sq [xy]
 | |
| are all synonymous.
 | |
| An equivalence class may not** be an endpoint of a range.
 | |
| .Pp
 | |
| Within a bracket expression, the name of a
 | |
| .Em character class
 | |
| enclosed
 | |
| in
 | |
| .Sq [:
 | |
| and
 | |
| .Sq :]
 | |
| stands for the list of all characters belonging to that class.
 | |
| Standard character class names are:
 | |
| .Bd -literal -offset indent
 | |
| alnum	digit	punct
 | |
| alpha	graph	space
 | |
| blank	lower	upper
 | |
| cntrl	print	xdigit
 | |
| .Ed
 | |
| .Pp
 | |
| These stand for the character classes defined in
 | |
| .Xr ctype 3 .
 | |
| A locale may provide others.
 | |
| A character class may not be used as an endpoint of a range.
 | |
| .Pp
 | |
| There are two special cases** of bracket expressions:
 | |
| the bracket expressions
 | |
| .Sq [[:<:]]
 | |
| and
 | |
| .Sq [[:>:]]
 | |
| match the null string at the beginning and end of a word, respectively.
 | |
| A word is defined as a sequence of
 | |
| characters starting and ending with a word character
 | |
| which is neither preceded nor followed by
 | |
| word characters.
 | |
| A word character is an
 | |
| .Em alnum
 | |
| character (as defined by
 | |
| .Xr ctype 3 )
 | |
| or an underscore.
 | |
| This is an extension,
 | |
| compatible with but not specified by POSIX,
 | |
| and should be used with
 | |
| caution in software intended to be portable to other systems.
 | |
| .Pp
 | |
| In the event that an RE could match more than one substring of a given
 | |
| string,
 | |
| the RE matches the one starting earliest in the string.
 | |
| If the RE could match more than one substring starting at that point,
 | |
| it matches the longest.
 | |
| Subexpressions also match the longest possible substrings, subject to
 | |
| the constraint that the whole match be as long as possible,
 | |
| with subexpressions starting earlier in the RE taking priority over
 | |
| ones starting later.
 | |
| Note that higher-level subexpressions thus take priority over
 | |
| their lower-level component subexpressions.
 | |
| .Pp
 | |
| Match lengths are measured in characters, not collating elements.
 | |
| A null string is considered longer than no match at all.
 | |
| For example,
 | |
| .Sq bb*
 | |
| matches the three middle characters of
 | |
| .Sq abbbc ;
 | |
| .Sq (wee|week)(knights|nights)
 | |
| matches all ten characters of
 | |
| .Sq weeknights ;
 | |
| when
 | |
| .Sq (.*).*
 | |
| is matched against
 | |
| .Sq abc ,
 | |
| the parenthesized subexpression matches all three characters;
 | |
| and when
 | |
| .Sq (a*)*
 | |
| is matched against
 | |
| .Sq bc ,
 | |
| both the whole RE and the parenthesized subexpression match the null string.
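.Pp
The leftmost-longest rule can be observed through the match offsets
reported by regexec(3).
The sketch below is an illustration under assumptions: the helper name
match_span is hypothetical, and only the offsets of the whole match are
inspected, since implementations are known to differ in how they report
subexpression offsets.

```c
#include <regex.h>

/*
 * Hypothetical helper: finds the first match of the ERE `pattern` in
 * `text` and stores its byte offsets in *start and *end.
 * Returns 1 on a match, 0 on no match, -1 if the pattern does not compile.
 */
int
match_span(const char *pattern, const char *text, int *start, int *end)
{
	regex_t re;
	regmatch_t m[1];
	int rc;

	if (regcomp(&re, pattern, REG_EXTENDED) != 0)
		return -1;
	/* m[0] receives the offsets of the whole match. */
	rc = regexec(&re, text, 1, m, 0);
	regfree(&re);
	if (rc != 0)
		return 0;
	*start = (int)m[0].rm_so;
	*end = (int)m[0].rm_eo;
	return 1;
}
```

Matching bb* against abbbc yields offsets 1 and 4 (the three middle
characters), and matching (a*)* against bc yields offsets 0 and 0
(the null string at the start).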
 | |
| .Pp
 | |
| If case-independent matching is specified,
 | |
| the effect is much as if all case distinctions had vanished from the
 | |
| alphabet.
 | |
| When an alphabetic character that exists in multiple cases appears as an
 | |
| ordinary character outside a bracket expression, it is effectively
 | |
| transformed into a bracket expression containing both cases,
 | |
| e.g.\&
 | |
| .Sq x
 | |
| becomes
 | |
| .Sq [xX] .
 | |
| When it appears inside a bracket expression,
 | |
| all case counterparts of it are added to the bracket expression,
 | |
| so that, for example,
 | |
| .Sq [x]
 | |
| becomes
 | |
| .Sq [xX]
 | |
| and
 | |
| .Sq [^x]
 | |
| becomes
 | |
| .Sq [^xX] .
 | |
| .Pp
 | |
| No particular limit is imposed on the length of REs**.
 | |
| Programs intended to be portable should not employ REs longer
 | |
| than 256 bytes,
 | |
| as an implementation can refuse to accept such REs and remain
 | |
| POSIX-compliant.
 | |
| .Pp
 | |
| The following is a list of extended regular expressions:
 | |
| .Bl -tag -width Ds
 | |
| .It Ar c
 | |
| Any character
 | |
| .Ar c
 | |
| not listed below matches itself.
 | |
| .It \e Ns Ar c
 | |
| Any backslash-escaped character
 | |
| .Ar c
 | |
| matches itself.
 | |
| .It \&.
 | |
| Matches any single character that is not a newline
 | |
| .Pq Sq \en .
 | |
| .It Bq Ar char-class
 | |
| Matches any single character in
 | |
| .Ar char-class .
 | |
| To include a
 | |
| .Ql \&]
 | |
| in
 | |
| .Ar char-class ,
 | |
| it must be the first character.
 | |
| A range of characters may be specified by separating the end characters
 | |
| of the range with a
 | |
| .Ql - ;
 | |
| e.g.\&
 | |
| .Ar a-z
 | |
| specifies the lowercase letters.
 | |
| The following literal expressions can also be used in
 | |
| .Ar char-class
 | |
| to specify sets of characters:
 | |
| .Bd -unfilled -offset indent
 | |
| [:alnum:] [:cntrl:] [:lower:] [:space:]
 | |
| [:alpha:] [:digit:] [:print:] [:upper:]
 | |
| [:blank:] [:graph:] [:punct:] [:xdigit:]
 | |
| .Ed
 | |
| .Pp
 | |
| If
 | |
| .Ql -
 | |
| appears as the first or last character of
 | |
| .Ar char-class ,
 | |
| then it matches itself.
 | |
| All other characters in
 | |
| .Ar char-class
 | |
| match themselves.
 | |
| .Pp
 | |
| Patterns in
 | |
| .Ar char-class
 | |
| of the form
 | |
| .Eo [.
 | |
| .Ar col-elm
 | |
| .Ec .]\&
 | |
| or
 | |
| .Eo [=
 | |
| .Ar col-elm
 | |
| .Ec =]\& ,
 | |
| where
 | |
| .Ar col-elm
 | |
| is a collating element, are interpreted according to
 | |
| .Xr setlocale 3
 | |
| .Pq not currently supported .
 | |
| .It Bq ^ Ns Ar char-class
 | |
| Matches any single character, other than newline, not in
 | |
| .Ar char-class .
 | |
| .Ar char-class
 | |
| is defined as above.
 | |
| .It ^
 | |
| If
 | |
| .Sq ^
 | |
| is the first character of a regular expression, then it
 | |
| anchors the regular expression to the beginning of a line.
 | |
| Otherwise, it matches itself.
 | |
| .It $
 | |
| If
 | |
| .Sq $
 | |
| is the last character of a regular expression,
 | |
| it anchors the regular expression to the end of a line.
 | |
| Otherwise, it matches itself.
 | |
| .It [[:<:]]
 | |
| Anchors the single character regular expression or subexpression
 | |
| immediately following it to the beginning of a word.
 | |
| .It [[:>:]]
 | |
| Anchors the single character regular expression or subexpression
 | |
| immediately following it to the end of a word.
 | |
| .It Pq Ar re
 | |
| Defines a subexpression
 | |
| .Ar re .
 | |
| Any set of characters enclosed in parentheses
 | |
| matches whatever the set of characters without parentheses matches
 | |
| (that is a long-winded way of saying the constructs
 | |
| .Sq (re)
 | |
| and
 | |
| .Sq re
 | |
| match identically).
 | |
| .It *
 | |
| Matches the single character regular expression or subexpression
 | |
| immediately preceding it zero or more times.
 | |
| If
 | |
| .Sq *
 | |
| is the first character of a regular expression or subexpression,
 | |
| then it matches itself.
 | |
| The
 | |
| .Sq *
 | |
| operator sometimes yields unexpected results.
 | |
| For example, the regular expression
 | |
| .Ar b*
 | |
| matches the beginning of the string
 | |
| .Qq abbb
 | |
| (as opposed to the substring
 | |
| .Qq bbb ) ,
 | |
| since a null match is the only leftmost match.
 | |
| .It +
 | |
| Matches the single character regular expression
 | |
| or subexpression immediately preceding it
 | |
| one or more times.
 | |
| .It ?
 | |
| Matches the single character regular expression
 | |
| or subexpression immediately preceding it
 | |
| 0 or 1 times.
 | |
| .Sm off
 | |
| .It Xo
 | |
| .Pf { Ar n , m No }\ \&
 | |
| .Pf { Ar n , No }\ \&
 | |
| .Pf { Ar n No }
 | |
| .Xc
 | |
| .Sm on
 | |
| Matches the single character regular expression or subexpression
 | |
| immediately preceding it at least
 | |
| .Ar n
 | |
| and at most
 | |
| .Ar m
 | |
| times.
 | |
| If
 | |
| .Ar m
 | |
| is omitted, then it matches at least
 | |
| .Ar n
 | |
| times.
 | |
| If the comma is also omitted, then it matches exactly
 | |
| .Ar n
 | |
| times.
 | |
| .It \*(Ba
 | |
| Used to separate patterns.
 | |
| For example,
 | |
| the pattern
 | |
| .Sq cat\*(Badog
 | |
| matches either
 | |
| .Sq cat
 | |
| or
 | |
| .Sq dog .
 | |
| .El
 | |
| .Sh BASIC REGULAR EXPRESSIONS
 | |
| Basic regular expressions differ in several respects:
 | |
| .Bl -bullet -offset 3n
 | |
| .It
 | |
| .Sq \*(Ba ,
 | |
| .Sq + ,
 | |
| and
 | |
| .Sq ?\&
 | |
| are ordinary characters and there is no equivalent
 | |
| for their functionality.
 | |
| .It
 | |
| The delimiters for bounds are
 | |
| .Sq \e{
 | |
| and
 | |
| .Sq \e} ,
 | |
| with
 | |
| .Sq {
 | |
| and
 | |
| .Sq }
 | |
| by themselves ordinary characters.
 | |
| .It
 | |
| The parentheses for nested subexpressions are
 | |
| .Sq \e(
 | |
| and
 | |
| .Sq \e) ,
 | |
| with
 | |
| .Sq (
 | |
| and
 | |
| .Sq )\&
 | |
| by themselves ordinary characters.
 | |
| .It
 | |
| .Sq ^
 | |
| is an ordinary character except at the beginning of the
 | |
| RE or** the beginning of a parenthesized subexpression.
 | |
| .It
 | |
| .Sq $
 | |
| is an ordinary character except at the end of the
 | |
| RE or** the end of a parenthesized subexpression.
 | |
| .It
 | |
| .Sq *
 | |
| is an ordinary character if it appears at the beginning of the
 | |
| RE or the beginning of a parenthesized subexpression
 | |
| (after a possible leading
 | |
| .Sq ^ ) .
 | |
| .It
 | |
| Finally, there is one new type of atom, a
 | |
| .Em back-reference :
 | |
| .Sq \e
 | |
| followed by a non-zero decimal digit
 | |
| .Ar d
 | |
| matches the same sequence of characters matched by the
 | |
| .Ar d Ns th
 | |
| parenthesized subexpression
 | |
| (numbering subexpressions by the positions of their opening parentheses,
 | |
| left to right),
 | |
| so that, for example,
 | |
| .Sq \e([bc]\e)\e1
 | |
| matches
 | |
| .Sq bb\&
 | |
| or
 | |
| .Sq cc
 | |
| but not
 | |
| .Sq bc .
 | |
| .El
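.Pp
Several of the differences listed above can be checked by compiling
patterns without REG_EXTENDED, which selects BRE syntax in regex(3).
This is a sketch under assumptions rather than a reference test: the
helper name bre_matches is hypothetical, and the doubled backslashes are
C string escapes.

```c
#include <regex.h>

/*
 * Hypothetical helper: compiles `pattern` as a BRE (no REG_EXTENDED)
 * and returns 1 if it matches `text`, 0 if not, -1 on a compile error.
 */
int
bre_matches(const char *pattern, const char *text)
{
	regex_t re;
	regmatch_t m[2];
	int rc;

	if (regcomp(&re, pattern, 0) != 0)
		return -1;
	/* Room for the whole match and the first subexpression. */
	rc = regexec(&re, text, 2, m, 0);
	regfree(&re);
	return rc == 0;
}
```

The back-reference pattern gives bre_matches("^\\([bc]\\)\\1$", "bb")
equal to 1 and bre_matches("^\\([bc]\\)\\1$", "bc") equal to 0.
An unescaped + is an ordinary character, so
bre_matches("^a+$", "a+") returns 1 while bre_matches("^a+$", "aa")
returns 0.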
 | |
| .Pp
 | |
| The following is a list of basic regular expressions:
 | |
| .Bl -tag -width Ds
 | |
| .It Ar c
 | |
| Any character
 | |
| .Ar c
 | |
| not listed below matches itself.
 | |
| .It \e Ns Ar c
 | |
| Any backslash-escaped character
 | |
| .Ar c ,
 | |
| except for
 | |
| .Sq { ,
 | |
| .Sq } ,
 | |
| .Sq \&( ,
 | |
| and
 | |
| .Sq \&) ,
 | |
| matches itself.
 | |
| .It \&.
 | |
| Matches any single character that is not a newline
 | |
| .Pq Sq \en .
 | |
| .It Bq Ar char-class
 | |
| Matches any single character in
 | |
| .Ar char-class .
 | |
| To include a
 | |
| .Ql \&]
 | |
| in
 | |
| .Ar char-class ,
 | |
| it must be the first character.
 | |
| A range of characters may be specified by separating the end characters
 | |
| of the range with a
 | |
| .Ql - ;
 | |
| e.g.\&
 | |
| .Ar a-z
 | |
| specifies the lowercase letters.
 | |
| The following literal expressions can also be used in
 | |
| .Ar char-class
 | |
| to specify sets of characters:
 | |
| .Bd -unfilled -offset indent
 | |
| [:alnum:] [:cntrl:] [:lower:] [:space:]
 | |
| [:alpha:] [:digit:] [:print:] [:upper:]
 | |
| [:blank:] [:graph:] [:punct:] [:xdigit:]
 | |
| .Ed
 | |
| .Pp
 | |
| If
 | |
| .Ql -
 | |
| appears as the first or last character of
 | |
| .Ar char-class ,
 | |
| then it matches itself.
 | |
| All other characters in
 | |
| .Ar char-class
 | |
| match themselves.
 | |
| .Pp
 | |
| Patterns in
 | |
| .Ar char-class
 | |
| of the form
 | |
| .Eo [.
 | |
| .Ar col-elm
 | |
| .Ec .]\&
 | |
| or
 | |
| .Eo [=
 | |
| .Ar col-elm
 | |
| .Ec =]\& ,
 | |
| where
 | |
| .Ar col-elm
 | |
| is a collating element, are interpreted according to
 | |
| .Xr setlocale 3
 | |
| .Pq not currently supported .
 | |
| .It Bq ^ Ns Ar char-class
 | |
| Matches any single character, other than newline, not in
 | |
| .Ar char-class .
 | |
| .Ar char-class
 | |
| is defined as above.
 | |
| .It ^
 | |
| If
 | |
| .Sq ^
 | |
| is the first character of a regular expression, then it
 | |
| anchors the regular expression to the beginning of a line.
 | |
| Otherwise, it matches itself.
 | |
| .It $
 | |
| If
 | |
| .Sq $
 | |
| is the last character of a regular expression,
 | |
| it anchors the regular expression to the end of a line.
 | |
| Otherwise, it matches itself.
 | |
| .It [[:<:]]
 | |
| Anchors the single character regular expression or subexpression
 | |
| immediately following it to the beginning of a word.
 | |
| .It [[:>:]]
 | |
| Anchors the single character regular expression or subexpression
 | |
| immediately following it to the end of a word.
 | |
| .It \e( Ns Ar re Ns \e)
 | |
| Defines a subexpression
 | |
| .Ar re .
 | |
| Subexpressions may be nested.
 | |
| A subsequent backreference of the form
 | |
| .Pf \e Ns Ar n ,
 | |
| where
 | |
| .Ar n
 | |
| is a number in the range [1,9], expands to the text matched by the
 | |
| .Ar n Ns th
 | |
| subexpression.
 | |
| For example, the regular expression
 | |
| .Ar \e(.*\e)\e1
 | |
| matches any string consisting of two identical adjacent substrings.
 | |
| Subexpressions are ordered relative to their left delimiter.
 | |
| .It *
 | |
| Matches the single character regular expression or subexpression
 | |
| immediately preceding it zero or more times.
 | |
| If
 | |
| .Sq *
 | |
| is the first character of a regular expression or subexpression,
 | |
| then it matches itself.
 | |
| The
 | |
| .Sq *
 | |
| operator sometimes yields unexpected results.
 | |
| For example, the regular expression
 | |
| .Ar b*
 | |
| matches the beginning of the string
 | |
| .Qq abbb
 | |
| (as opposed to the substring
 | |
| .Qq bbb ) ,
 | |
| since a null match is the only leftmost match.
 | |
| .Sm off
 | |
| .It Xo
 | |
| .Pf \e{ Ar n , m No \e}\ \&
 | |
| .Pf \e{ Ar n , No \e}\ \&
 | |
| .Pf \e{ Ar n No \e}
 | |
| .Xc
 | |
| .Sm on
 | |
| Matches the single character regular expression or subexpression
 | |
| immediately preceding it at least
 | |
| .Ar n
 | |
| and at most
 | |
| .Ar m
 | |
| times.
 | |
| If
 | |
| .Ar m
 | |
| is omitted, then it matches at least
 | |
| .Ar n
 | |
| times.
 | |
| If the comma is also omitted, then it matches exactly
 | |
| .Ar n
 | |
| times.
 | |
| .El
 | |
| .Sh SEE ALSO
 | |
| .Xr ctype 3 ,
 | |
| .Xr regex 3
 | |
| .Sh STANDARDS
 | |
| .St -p1003.1-2004 :
 | |
| Base Definitions, Chapter 9 (Regular Expressions).
 | |
| .Sh BUGS
 | |
| Having two kinds of REs is a botch.
 | |
| .Pp
 | |
| The current POSIX spec says that
 | |
| .Sq )\&
 | |
| is an ordinary character in the absence of an unmatched
 | |
| .Sq ( ;
 | |
| this was an unintentional result of a wording error,
 | |
| and a change is likely.
 | |
| Avoid relying on it.
 | |
| .Pp
 | |
| Back-references are a dreadful botch,
 | |
| posing major problems for efficient implementations.
 | |
| They are also somewhat vaguely defined
 | |
| (does
 | |
| .Sq a\e(\e(b\e)*\e2\e)*d
 | |
| match
 | |
| .Sq abbbd ? ) .
 | |
| Avoid using them.
 | |
| .Pp
 | |
| POSIX's specification of case-independent matching is vague.
 | |
| The
 | |
| .Dq one case implies all cases
 | |
| definition given above
 | |
| is the current consensus among implementors as to the right interpretation.
 | |
| .Pp
 | |
| The syntax for word boundaries is incredibly ugly.
 |