| 1 | '\" | 
|---|
| 2 | '\" Copyright (c) 1998 Sun Microsystems, Inc. | 
|---|
| 3 | '\" Copyright (c) 1999 Scriptics Corporation | 
|---|
| 4 | '\" | 
|---|
| 5 | '\" See the file "license.terms" for information on usage and redistribution | 
|---|
| 6 | '\" of this file, and for a DISCLAIMER OF ALL WARRANTIES. | 
|---|
| 7 | '\"  | 
|---|
| 8 | '\" RCS: @(#) $Id: re_syntax.n,v 1.18 2007/12/13 15:22:33 dgp Exp $ | 
|---|
| 9 | '\" | 
|---|
| 10 | .so man.macros | 
|---|
| 11 | .TH re_syntax n "8.1" Tcl "Tcl Built-In Commands" | 
|---|
| 12 | .BS | 
|---|
| 13 | .SH NAME | 
|---|
| 14 | re_syntax \- Syntax of Tcl regular expressions | 
|---|
| 15 | .BE | 
|---|
| 16 | .SH DESCRIPTION | 
|---|
| 17 | .PP | 
|---|
| 18 | A \fIregular expression\fR describes strings of characters. | 
|---|
| 19 | It's a pattern that matches certain strings and does not match others. | 
|---|
| 20 | .SH "DIFFERENT FLAVORS OF REs" | 
|---|
| 21 | Regular expressions | 
|---|
| 22 | .PQ RE s , | 
|---|
| 23 | as defined by POSIX, come in two flavors: \fIextended\fR REs | 
|---|
| 24 | .PQ ERE s | 
|---|
| 25 | and \fIbasic\fR REs | 
|---|
| 26 | .PQ BRE s . | 
|---|
| 27 | EREs are roughly those of the traditional \fIegrep\fR, while BREs are | 
|---|
| 28 | roughly those of the traditional \fIed\fR. This implementation adds | 
|---|
| 29 | a third flavor, \fIadvanced\fR REs | 
|---|
| 30 | .PQ ARE s , | 
|---|
| 31 | basically EREs with some significant extensions. | 
|---|
| 32 | .PP | 
|---|
| 33 | This manual page primarily describes AREs. BREs mostly exist for | 
|---|
| 34 | backward compatibility in some old programs; they will be discussed at | 
|---|
| 35 | the end. POSIX EREs are almost an exact subset of AREs. Features of | 
|---|
| 36 | AREs that are not present in EREs will be indicated. | 
|---|
| 37 | .SH "REGULAR EXPRESSION SYNTAX" | 
|---|
| 38 | .PP | 
|---|
| 39 | Tcl regular expressions are implemented using the package written by | 
|---|
| 40 | Henry Spencer, based on the 1003.2 spec and some (not quite all) of | 
|---|
| 41 | the Perl5 extensions (thanks, Henry!). Much of the description of | 
|---|
| 42 | regular expressions below is copied verbatim from his manual entry. | 
|---|
| 43 | .PP | 
|---|
| 44 | An ARE is one or more \fIbranches\fR, | 
|---|
| 45 | separated by | 
|---|
| 46 | .QW \fB|\fR , | 
|---|
| 47 | matching anything that matches any of the branches. | 
|---|
| 48 | .PP | 
|---|
| 49 | A branch is zero or more \fIconstraints\fR or \fIquantified atoms\fR, | 
|---|
| 50 | concatenated. | 
|---|
| 51 | It matches a match for the first, followed by a match for the second, etc.; | 
|---|
| 52 | an empty branch matches the empty string. | 
|---|
| 53 | .SS QUANTIFIERS | 
|---|
| 54 | A quantified atom is an \fIatom\fR possibly followed | 
|---|
| 55 | by a single \fIquantifier\fR. | 
|---|
| 56 | Without a quantifier, it matches a single match for the atom. | 
|---|
| 57 | The quantifiers, | 
|---|
| 58 | and what a so-quantified atom matches, are: | 
|---|
| 59 | .RS 2 | 
|---|
| 60 | .TP 6 | 
|---|
| 61 | \fB*\fR | 
|---|
| 62 | . | 
|---|
| 63 | a sequence of 0 or more matches of the atom | 
|---|
| 64 | .TP | 
|---|
| 65 | \fB+\fR | 
|---|
| 66 | . | 
|---|
| 67 | a sequence of 1 or more matches of the atom | 
|---|
| 68 | .TP | 
|---|
| 69 | \fB?\fR | 
|---|
| 70 | . | 
|---|
| 71 | a sequence of 0 or 1 matches of the atom | 
|---|
| 72 | .TP | 
|---|
| 73 | \fB{\fIm\fB}\fR | 
|---|
| 74 | . | 
|---|
| 75 | a sequence of exactly \fIm\fR matches of the atom | 
|---|
| 76 | .TP | 
|---|
| 77 | \fB{\fIm\fB,}\fR | 
|---|
| 78 | . | 
|---|
| 79 | a sequence of \fIm\fR or more matches of the atom | 
|---|
| 80 | .TP | 
|---|
| 81 | \fB{\fIm\fB,\fIn\fB}\fR | 
|---|
| 82 | . | 
|---|
| 83 | a sequence of \fIm\fR through \fIn\fR (inclusive) matches of the atom; | 
|---|
| 84 | \fIm\fR may not exceed \fIn\fR | 
|---|
| 85 | .TP | 
|---|
| 86 | \fB*?  +?  ??  {\fIm\fB}?  {\fIm\fB,}?  {\fIm\fB,\fIn\fB}?\fR | 
|---|
| 87 | . | 
|---|
| 88 | \fInon-greedy\fR quantifiers, which match the same possibilities, | 
|---|
| 89 | but prefer the smallest number rather than the largest number | 
|---|
| 90 | of matches (see \fBMATCHING\fR) | 
|---|
| 91 | .RE | 
|---|
| 92 | .PP | 
|---|
| 93 | The forms using \fB{\fR and \fB}\fR are known as \fIbound\fRs. The | 
|---|
| 94 | numbers \fIm\fR and \fIn\fR are unsigned decimal integers with | 
|---|
| 95 | permissible values from 0 to 255 inclusive. | 
|---|
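| | .PP | 
|---|
| | As a minimal sketch (assuming the \fBregexp\fR command with a match variable | 
|---|
| | named \fBm\fR, chosen here only for illustration), greedy and non-greedy | 
|---|
| | quantifiers differ as follows: | 
|---|
| | .CS | 
|---|
| | # m is set to "aaa": the greedy bound takes as many as allowed | 
|---|
| | regexp {a{2,3}} caaaat m | 
|---|
| | # m is set to "aa": the non-greedy bound takes as few as allowed | 
|---|
| | regexp {a{2,3}?} caaaat m | 
|---|
| | # m is set to "<a><b>" by the first command and to "<a>" by the second | 
|---|
| | regexp {<.+>} {<a><b>} m | 
|---|
| | regexp {<.+?>} {<a><b>} m | 
|---|
| | .CE | 
|---|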
| 96 | .SS ATOMS | 
|---|
| 97 | An atom is one of: | 
|---|
| 98 | .RS 2 | 
|---|
| 99 | .IP \fB(\fIre\fB)\fR 6 | 
|---|
| 100 | matches a match for \fIre\fR (\fIre\fR is any regular expression) with | 
|---|
| 101 | the match noted for possible reporting | 
|---|
| 102 | .IP \fB(?:\fIre\fB)\fR | 
|---|
| 103 | as previous, but does no reporting (a | 
|---|
| 104 | .QW non-capturing | 
|---|
| 105 | set of parentheses) | 
|---|
| 106 | .IP \fB()\fR | 
|---|
| 107 | matches an empty string, noted for possible reporting | 
|---|
| 108 | .IP \fB(?:)\fR | 
|---|
| 109 | matches an empty string, without reporting | 
|---|
| 110 | .IP \fB[\fIchars\fB]\fR | 
|---|
| 111 | a \fIbracket expression\fR, matching any one of the \fIchars\fR (see | 
|---|
| 112 | \fBBRACKET EXPRESSIONS\fR for more detail) | 
|---|
| 113 | .IP \fB.\fR | 
|---|
| 114 | matches any single character | 
|---|
| 115 | .IP \fB\e\fIk\fR | 
|---|
| 116 | matches the non-alphanumeric character \fIk\fR | 
|---|
| 117 | taken as an ordinary character, e.g. \fB\e\e\fR matches a backslash | 
|---|
| 118 | character | 
|---|
| 119 | .IP \fB\e\fIc\fR | 
|---|
| 120 | where \fIc\fR is alphanumeric (possibly followed by other characters), | 
|---|
| 121 | an \fIescape\fR (AREs only), see \fBESCAPES\fR below | 
|---|
| 122 | .IP \fB{\fR | 
|---|
| 123 | when followed by a character other than a digit, matches the | 
|---|
| 124 | left-brace character | 
|---|
| 125 | .QW \fB{\fR ; | 
|---|
| 126 | when followed by a digit, it is the beginning of a \fIbound\fR (see above) | 
|---|
| 127 | .IP \fIx\fR | 
|---|
| 128 | where \fIx\fR is a single character with no other significance, | 
|---|
| 129 | matches that character. | 
|---|
| 130 | .RE | 
|---|
| 131 | .SS CONSTRAINTS | 
|---|
| 132 | A \fIconstraint\fR matches an empty string when specific conditions | 
|---|
| 133 | are met. A constraint may not be followed by a quantifier. The | 
|---|
| 134 | simple constraints are as follows; some more constraints are described | 
|---|
| 135 | later, under \fBESCAPES\fR. | 
|---|
| 136 | .RS 2 | 
|---|
| 137 | .TP 8 | 
|---|
| 138 | \fB^\fR | 
|---|
| 139 | . | 
|---|
| 140 | matches at the beginning of a line | 
|---|
| 141 | .TP | 
|---|
| 142 | \fB$\fR | 
|---|
| 143 | . | 
|---|
| 144 | matches at the end of a line | 
|---|
| 145 | .TP | 
|---|
| 146 | \fB(?=\fIre\fB)\fR | 
|---|
| 147 | . | 
|---|
| 148 | \fIpositive lookahead\fR (AREs only), matches at any point where a | 
|---|
| 149 | substring matching \fIre\fR begins | 
|---|
| 150 | .TP | 
|---|
| 151 | \fB(?!\fIre\fB)\fR | 
|---|
| 152 | . | 
|---|
| 153 | \fInegative lookahead\fR (AREs only), matches at any point where no | 
|---|
| 154 | substring matching \fIre\fR begins | 
|---|
| 155 | .RE | 
|---|
| 156 | .PP | 
|---|
| 157 | The lookahead constraints may not contain back references (see later), | 
|---|
| 158 | and all parentheses within them are considered non-capturing. | 
|---|
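| | .PP | 
|---|
| | For illustration, assuming the \fBregexp\fR command (which returns 1 on a | 
|---|
| | match and 0 otherwise), lookahead constraints test what follows without | 
|---|
| | making it part of the match: | 
|---|
| | .CS | 
|---|
| | # returns 1, and m is "foo": "bar" follows but is not consumed | 
|---|
| | regexp {foo(?=bar)} foobar m | 
|---|
| | # returns 0: the only "foo" is followed by "bar" | 
|---|
| | regexp {foo(?!bar)} foobar | 
|---|
| | # returns 1: "baz" follows, not "bar" | 
|---|
| | regexp {foo(?!bar)} foobaz | 
|---|
| | .CE | 
|---|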
| 159 | .PP | 
|---|
| 160 | An RE may not end with | 
|---|
| 161 | .QW \fB\e\fR . | 
|---|
| 162 | .SH "BRACKET EXPRESSIONS" | 
|---|
| 163 | A \fIbracket expression\fR is a list of characters enclosed in | 
|---|
| 164 | .QW \fB[\|]\fR . | 
|---|
| 165 | It normally matches any single character from the list | 
|---|
| 166 | (but see below). If the list begins with | 
|---|
| 167 | .QW \fB^\fR , | 
|---|
| 168 | it matches any single character (but see below) \fInot\fR from the | 
|---|
| 169 | rest of the list. | 
|---|
| 170 | .PP | 
|---|
| 171 | If two characters in the list are separated by | 
|---|
| 172 | .QW \fB\-\fR , | 
|---|
| 173 | this is shorthand for the full \fIrange\fR of characters between those two | 
|---|
| 174 | (inclusive) in the collating sequence, e.g. | 
|---|
| 175 | .QW \fB[0\-9]\fR | 
|---|
| 176 | in Unicode matches any conventional decimal digit. Two ranges may not share an | 
|---|
| 177 | endpoint, so e.g. | 
|---|
| 178 | .QW \fBa\-c\-e\fR | 
|---|
| 179 | is illegal. Ranges in Tcl always use the | 
|---|
| 180 | Unicode collating sequence, but other programs may use other collating | 
|---|
| 181 | sequences and this can be a source of incompatibility between programs. | 
|---|
| 182 | .PP | 
|---|
| 183 | To include a literal \fB]\fR or \fB\-\fR in the list, the simplest | 
|---|
| 184 | method is to enclose it in \fB[.\fR and \fB.]\fR to make it a | 
|---|
| 185 | collating element (see below). Alternatively, make it the first | 
|---|
| 186 | character (following a possible | 
|---|
| 187 | .QW \fB^\fR ), | 
|---|
| 188 | or (AREs only) precede it with | 
|---|
| 189 | .QW \fB\e\fR . | 
|---|
| 190 | Alternatively, for | 
|---|
| 191 | .QW \fB\-\fR , | 
|---|
| 192 | make it the last character, or the second endpoint of a range. To use | 
|---|
| 193 | a literal \fB\-\fR as the first endpoint of a range, make it a | 
|---|
| 194 | collating element or (AREs only) precede it with | 
|---|
| 195 | .QW \fB\e\fR . | 
|---|
| 196 | With the exception of | 
|---|
| 197 | these, some combinations using \fB[\fR (see next paragraphs), and | 
|---|
| 198 | escapes, all other special characters lose their special significance | 
|---|
| 199 | within a bracket expression. | 
|---|
| 200 | .SS "CHARACTER CLASSES" | 
|---|
| 201 | Within a bracket expression, the name of a \fIcharacter class\fR | 
|---|
| 202 | enclosed in \fB[:\fR and \fB:]\fR stands for the list of all | 
|---|
| 203 | characters (not all collating elements!) belonging to that class. | 
|---|
| 204 | Standard character classes are: | 
|---|
| 205 | .IP \fBalpha\fR 8 | 
|---|
| 206 | A letter. | 
|---|
| 207 | .IP \fBupper\fR 8 | 
|---|
| 208 | An upper-case letter. | 
|---|
| 209 | .IP \fBlower\fR 8 | 
|---|
| 210 | A lower-case letter. | 
|---|
| 211 | .IP \fBdigit\fR 8 | 
|---|
| 212 | A decimal digit. | 
|---|
| 213 | .IP \fBxdigit\fR 8 | 
|---|
| 214 | A hexadecimal digit. | 
|---|
| 215 | .IP \fBalnum\fR 8 | 
|---|
| 216 | An alphanumeric (letter or digit). | 
|---|
| 217 | .IP \fBprint\fR 8 | 
|---|
| 218 | A "printable" (same as graph, except also including space). | 
|---|
| 219 | .IP \fBblank\fR 8 | 
|---|
| 220 | A space or tab character. | 
|---|
| 221 | .IP \fBspace\fR 8 | 
|---|
| 222 | A character producing white space in displayed text. | 
|---|
| 223 | .IP \fBpunct\fR 8 | 
|---|
| 224 | A punctuation character. | 
|---|
| 225 | .IP \fBgraph\fR 8 | 
|---|
| 226 | A character with a visible representation (includes both alnum and punct). | 
|---|
| 227 | .IP \fBcntrl\fR 8 | 
|---|
| 228 | A control character. | 
|---|
| 229 | .PP | 
|---|
| 230 | A locale may provide others. A character class may not be used as an endpoint | 
|---|
| 231 | of a range. | 
|---|
| 232 | .RS | 
|---|
| 233 | .PP | 
|---|
| 234 | (\fINote:\fR the current Tcl implementation has only one locale, the Unicode | 
|---|
| 235 | locale, which supports exactly the above classes.) | 
|---|
| 236 | .RE | 
|---|
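| | .PP | 
|---|
| | A short sketch of character classes inside bracket expressions, assuming the | 
|---|
| | \fBregexp\fR command and an illustrative match variable \fBm\fR: | 
|---|
| | .CS | 
|---|
| | # m is set to "World" | 
|---|
| | regexp {[[:upper:]][[:lower:]]+} {hello World} m | 
|---|
| | # returns 1 for the first command and 0 for the second | 
|---|
| | # (G is not a hexadecimal digit) | 
|---|
| | regexp {^[[:xdigit:]]+$} 7fFF | 
|---|
| | regexp {^[[:xdigit:]]+$} 7fG | 
|---|
| | .CE | 
|---|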
| 237 | .SS "BRACKETED CONSTRAINTS" | 
|---|
| 238 | There are two special cases of bracket expressions: the bracket | 
|---|
| 239 | expressions | 
|---|
| 240 | .QW \fB[[:<:]]\fR | 
|---|
| 241 | and | 
|---|
| 242 | .QW \fB[[:>:]]\fR | 
|---|
| 243 | are constraints, matching empty strings at the beginning and end of a word | 
|---|
| 244 | respectively. | 
|---|
| 245 | .\" note, discussion of escapes below references this definition of word | 
|---|
| 246 | A word is defined as a sequence of word characters that is neither preceded | 
|---|
| 247 | nor followed by word characters. A word character is an \fIalnum\fR character | 
|---|
| 248 | or an underscore | 
|---|
| 249 | .PQ \fB_\fR "" . | 
|---|
| 250 | These special bracket expressions are deprecated; users of AREs should use | 
|---|
| 251 | constraint escapes instead (see below). | 
|---|
| 252 | .SS "COLLATING ELEMENTS" | 
|---|
| 253 | Within a bracket expression, a collating element (a character, a | 
|---|
| 254 | multi-character sequence that collates as if it were a single | 
|---|
| 255 | character, or a collating-sequence name for either) enclosed in | 
|---|
| 256 | \fB[.\fR and \fB.]\fR stands for the sequence of characters of that | 
|---|
| 257 | collating element. The sequence is a single element of the bracket | 
|---|
| 258 | expression's list. A bracket expression in a locale that has | 
|---|
| 259 | multi-character collating elements can thus match more than one | 
|---|
| 260 | character. So (insidiously), a bracket expression that starts with | 
|---|
| 261 | \fB^\fR can match multi-character collating elements even if none of | 
|---|
| 262 | them appear in the bracket expression! | 
|---|
| 263 | .RS | 
|---|
| 264 | .PP | 
|---|
| 265 | (\fINote:\fR Tcl has no multi-character collating elements. This information | 
|---|
| 266 | is only for illustration.) | 
|---|
| 267 | .RE | 
|---|
| 268 | .PP | 
|---|
| 269 | For example, assume the collating sequence includes a \fBch\fR multi-character | 
|---|
| 270 | collating element. Then the RE | 
|---|
| 271 | .QW \fB[[.ch.]]*c\fR | 
|---|
| 272 | (zero or more | 
|---|
| 273 | .QW \fBch\fRs | 
|---|
| 274 | followed by | 
|---|
| 275 | .QW \fBc\fR ) | 
|---|
| 276 | matches the first five characters of | 
|---|
| 277 | .QW \fBchchcc\fR . | 
|---|
| 278 | Also, the RE | 
|---|
| 279 | .QW \fB[^c]b\fR | 
|---|
| 280 | matches all of | 
|---|
| 281 | .QW \fBchb\fR | 
|---|
| 282 | (because | 
|---|
| 283 | .QW \fB[^c]\fR | 
|---|
| 284 | matches the multi-character | 
|---|
| 285 | .QW \fBch\fR ). | 
|---|
| 286 | .SS "EQUIVALENCE CLASSES" | 
|---|
| 287 | Within a bracket expression, a collating element enclosed in \fB[=\fR | 
|---|
| 288 | and \fB=]\fR is an equivalence class, standing for the sequences of | 
|---|
| 289 | characters of all collating elements equivalent to that one, including | 
|---|
| 290 | itself. (If there are no other equivalent collating elements, the | 
|---|
| 291 | treatment is as if the enclosing delimiters were | 
|---|
| 292 | .QW \fB[.\fR \& | 
|---|
| 293 | and | 
|---|
| 294 | .QW \fB.]\fR .) | 
|---|
| 295 | For example, if \fBo\fR and \fB\N'244'\fR are the members of an | 
|---|
| 296 | equivalence class, then | 
|---|
| 297 | .QW \fB[[=o=]]\fR , | 
|---|
| 298 | .QW \fB[[=\N'244'=]]\fR , | 
|---|
| 299 | and | 
|---|
| 300 | .QW \fB[o\N'244']\fR \& | 
|---|
| 301 | are all synonymous. An equivalence class may not be an endpoint of a range. | 
|---|
| 302 | .RS | 
|---|
| 303 | .PP | 
|---|
| 304 | (\fINote:\fR Tcl implements only the Unicode locale. It does not define any | 
|---|
| 305 | equivalence classes. The examples above are just illustrations.) | 
|---|
| 306 | .RE | 
|---|
| 307 | .SH ESCAPES | 
|---|
| 308 | Escapes (AREs only), which begin with a \fB\e\fR followed by an | 
|---|
| 309 | alphanumeric character, come in several varieties: character entry, | 
|---|
| 310 | class shorthands, constraint escapes, and back references. A \fB\e\fR | 
|---|
| 311 | followed by an alphanumeric character but not constituting a valid | 
|---|
| 312 | escape is illegal in AREs. In EREs, there are no escapes: outside a | 
|---|
| 313 | bracket expression, a \fB\e\fR followed by an alphanumeric character | 
|---|
| 314 | merely stands for that character as an ordinary character, and inside | 
|---|
| 315 | a bracket expression, \fB\e\fR is an ordinary character. (The latter | 
|---|
| 316 | is the one actual incompatibility between EREs and AREs.) | 
|---|
| 317 | .SS "CHARACTER-ENTRY ESCAPES" | 
|---|
| 318 | Character-entry escapes (AREs only) exist to make it easier to specify | 
|---|
| 319 | non-printing and otherwise inconvenient characters in REs: | 
|---|
| 320 | .RS 2 | 
|---|
| 321 | .TP 5 | 
|---|
| 322 | \fB\ea\fR | 
|---|
| 323 | . | 
|---|
| 324 | alert (bell) character, as in C | 
|---|
| 325 | .TP | 
|---|
| 326 | \fB\eb\fR | 
|---|
| 327 | . | 
|---|
| 328 | backspace, as in C | 
|---|
| 329 | .TP | 
|---|
| 330 | \fB\eB\fR | 
|---|
| 331 | . | 
|---|
| 332 | synonym for \fB\e\fR to help reduce backslash doubling in some | 
|---|
| 333 | applications where there are multiple levels of backslash processing | 
|---|
| 334 | .TP | 
|---|
| 335 | \fB\ec\fIX\fR | 
|---|
| 336 | . | 
|---|
| 337 | (where \fIX\fR is any character) the character whose low-order 5 bits | 
|---|
| 338 | are the same as those of \fIX\fR, and whose other bits are all zero | 
|---|
| 339 | .TP | 
|---|
| 340 | \fB\ee\fR | 
|---|
| 341 | . | 
|---|
| 342 | the character whose collating-sequence name is | 
|---|
| 343 | .QW \fBESC\fR , | 
|---|
| 344 | or failing that, the character with octal value 033 | 
|---|
| 345 | .TP | 
|---|
| 346 | \fB\ef\fR | 
|---|
| 347 | . | 
|---|
| 348 | formfeed, as in C | 
|---|
| 349 | .TP | 
|---|
| 350 | \fB\en\fR | 
|---|
| 351 | . | 
|---|
| 352 | newline, as in C | 
|---|
| 353 | .TP | 
|---|
| 354 | \fB\er\fR | 
|---|
| 355 | . | 
|---|
| 356 | carriage return, as in C | 
|---|
| 357 | .TP | 
|---|
| 358 | \fB\et\fR | 
|---|
| 359 | . | 
|---|
| 360 | horizontal tab, as in C | 
|---|
| 361 | .TP | 
|---|
| 362 | \fB\eu\fIwxyz\fR | 
|---|
| 363 | . | 
|---|
| 364 | (where \fIwxyz\fR is exactly four hexadecimal digits) the Unicode | 
|---|
| 365 | character \fBU+\fIwxyz\fR in the local byte ordering | 
|---|
| 366 | .TP | 
|---|
| 367 | \fB\eU\fIstuvwxyz\fR | 
|---|
| 368 | . | 
|---|
| 369 | (where \fIstuvwxyz\fR is exactly eight hexadecimal digits) reserved | 
|---|
| 370 | for a somewhat-hypothetical Unicode extension to 32 bits | 
|---|
| 371 | .TP | 
|---|
| 372 | \fB\ev\fR | 
|---|
| 373 | . | 
|---|
| 374 | vertical tab, as in C | 
|---|
| 375 | .TP | 
|---|
| 376 | \fB\ex\fIhhh\fR | 
|---|
| 377 | . | 
|---|
| 378 | (where \fIhhh\fR is any sequence of hexadecimal digits) the character | 
|---|
| 379 | whose hexadecimal value is \fB0x\fIhhh\fR (a single character no | 
|---|
| 380 | matter how many hexadecimal digits are used) | 
|---|
| 381 | .TP | 
|---|
| 382 | \fB\e0\fR | 
|---|
| 383 | . | 
|---|
| 384 | the character whose value is \fB0\fR | 
|---|
| 385 | .TP | 
|---|
| 386 | \fB\e\fIxy\fR | 
|---|
| 387 | . | 
|---|
| 388 | (where \fIxy\fR is exactly two octal digits, and is not a \fIback | 
|---|
| 389 | reference\fR (see below)) the character whose octal value is | 
|---|
| 390 | \fB0\fIxy\fR | 
|---|
| 391 | .TP | 
|---|
| 392 | \fB\e\fIxyz\fR | 
|---|
| 393 | . | 
|---|
| 394 | (where \fIxyz\fR is exactly three octal digits, and is not a back | 
|---|
| 395 | reference (see below)) the character whose octal value is | 
|---|
| 396 | \fB0\fIxyz\fR | 
|---|
| 397 | .RE | 
|---|
| 398 | .PP | 
|---|
| 399 | Hexadecimal digits are | 
|---|
| 400 | .QR \fB0\fR \fB9\fR , | 
|---|
| 401 | .QR \fBa\fR \fBf\fR , | 
|---|
| 402 | and | 
|---|
| 403 | .QR \fBA\fR \fBF\fR . | 
|---|
| 404 | Octal digits are | 
|---|
| 405 | .QR \fB0\fR \fB7\fR . | 
|---|
| 406 | .PP | 
|---|
| 407 | The character-entry escapes are always taken as ordinary characters. | 
|---|
| 408 | For example, \fB\e135\fR is \fB]\fR in Unicode, but \fB\e135\fR does | 
|---|
| 409 | not terminate a bracket expression. Beware, however, that some | 
|---|
| 410 | applications (e.g., C compilers and the Tcl interpreter if the regular | 
|---|
| 411 | expression is not quoted with braces) interpret such sequences | 
|---|
| 412 | themselves before the regular-expression package gets to see them, | 
|---|
| 413 | which may require doubling (quadrupling, etc.) the | 
|---|
| 414 | .QW \fB\e\fR . | 
|---|
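| | .PP | 
|---|
| | The doubling issue can be sketched with the \fBregexp\fR command (assumed | 
|---|
| | here for illustration): braces pass the backslash through untouched, while | 
|---|
| | double quotes let the Tcl interpreter process it first. | 
|---|
| | .CS | 
|---|
| | # returns 1: the regular-expression package sees \ed | 
|---|
| | regexp {\ed} 5 | 
|---|
| | # returns 0: Tcl reduces \ed to d before the package sees it | 
|---|
| | regexp "\ed" 5 | 
|---|
| | # returns 1: the doubled backslash reaches the package as \ed | 
|---|
| | regexp "\e\ed" 5 | 
|---|
| | .CE | 
|---|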
| 415 | .SS "CLASS-SHORTHAND ESCAPES" | 
|---|
| 416 | Class-shorthand escapes (AREs only) provide shorthands for certain | 
|---|
| 417 | commonly-used character classes: | 
|---|
| 418 | .RS 2 | 
|---|
| 419 | .TP 10 | 
|---|
| 420 | \fB\ed\fR | 
|---|
| 421 | . | 
|---|
| 422 | \fB[[:digit:]]\fR | 
|---|
| 423 | .TP | 
|---|
| 424 | \fB\es\fR | 
|---|
| 425 | . | 
|---|
| 426 | \fB[[:space:]]\fR | 
|---|
| 427 | .TP | 
|---|
| 428 | \fB\ew\fR | 
|---|
| 429 | . | 
|---|
| 430 | \fB[[:alnum:]_]\fR (note underscore) | 
|---|
| 431 | .TP | 
|---|
| 432 | \fB\eD\fR | 
|---|
| 433 | . | 
|---|
| 434 | \fB[^[:digit:]]\fR | 
|---|
| 435 | .TP | 
|---|
| 436 | \fB\eS\fR | 
|---|
| 437 | . | 
|---|
| 438 | \fB[^[:space:]]\fR | 
|---|
| 439 | .TP | 
|---|
| 440 | \fB\eW\fR | 
|---|
| 441 | . | 
|---|
| 442 | \fB[^[:alnum:]_]\fR (note underscore) | 
|---|
| 443 | .RE | 
|---|
| 444 | .PP | 
|---|
| 445 | Within bracket expressions, | 
|---|
| 446 | .QW \fB\ed\fR , | 
|---|
| 447 | .QW \fB\es\fR , | 
|---|
| 448 | and | 
|---|
| 449 | .QW \fB\ew\fR \& | 
|---|
| 450 | lose their outer brackets, and | 
|---|
| 451 | .QW \fB\eD\fR , | 
|---|
| 452 | .QW \fB\eS\fR , | 
|---|
| 453 | and | 
|---|
| 454 | .QW \fB\eW\fR \& | 
|---|
| 455 | are illegal. (So, for example, | 
|---|
| 456 | .QW \fB[a-c\ed]\fR | 
|---|
| 457 | is equivalent to | 
|---|
| 458 | .QW \fB[a-c[:digit:]]\fR . | 
|---|
| 459 | Also, | 
|---|
| 460 | .QW \fB[a-c\eD]\fR , | 
|---|
| 461 | which is equivalent to | 
|---|
| 462 | .QW \fB[a-c^[:digit:]]\fR , | 
|---|
| 463 | is illegal.) | 
|---|
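| | .PP | 
|---|
| | As a hedged illustration (the \fBregexp\fR command and the variable name | 
|---|
| | \fBm\fR are assumptions made for the example), the shorthands behave like | 
|---|
| | the bracket expressions they abbreviate: | 
|---|
| | .CS | 
|---|
| | # m is set to "66" | 
|---|
| | regexp {\ed+} {order 66 shipped} m | 
|---|
| | # m is set to "foo_bar": \ew includes the underscore | 
|---|
| | regexp {\ew+} {foo_bar baz} m | 
|---|
| | # returns 1: inside brackets, \ed contributes the digits | 
|---|
| | regexp {^[a\-c\ed]+$} ab12c | 
|---|
| | .CE | 
|---|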
| 464 | .SS "CONSTRAINT ESCAPES" | 
|---|
| 465 | A constraint escape (AREs only) is a constraint, matching the empty | 
|---|
| 466 | string if specific conditions are met, written as an escape: | 
|---|
| 467 | .RS 2 | 
|---|
| 468 | .TP 6 | 
|---|
| 469 | \fB\eA\fR | 
|---|
| 470 | . | 
|---|
| 471 | matches only at the beginning of the string (see \fBMATCHING\fR, | 
|---|
| 472 | below, for how this differs from | 
|---|
| 473 | .QW \fB^\fR ) | 
|---|
| 474 | .TP | 
|---|
| 475 | \fB\em\fR | 
|---|
| 476 | . | 
|---|
| 477 | matches only at the beginning of a word | 
|---|
| 478 | .TP | 
|---|
| 479 | \fB\eM\fR | 
|---|
| 480 | . | 
|---|
| 481 | matches only at the end of a word | 
|---|
| 482 | .TP | 
|---|
| 483 | \fB\ey\fR | 
|---|
| 484 | . | 
|---|
| 485 | matches only at the beginning or end of a word | 
|---|
| 486 | .TP | 
|---|
| 487 | \fB\eY\fR | 
|---|
| 488 | . | 
|---|
| 489 | matches only at a point that is not the beginning or end of a word | 
|---|
| 490 | .TP | 
|---|
| 491 | \fB\eZ\fR | 
|---|
| 492 | . | 
|---|
| 493 | matches only at the end of the string (see \fBMATCHING\fR, below, for | 
|---|
| 494 | how this differs from | 
|---|
| 495 | .QW \fB$\fR ) | 
|---|
| 496 | .TP | 
|---|
| 497 | \fB\e\fIm\fR | 
|---|
| 498 | . | 
|---|
| 499 | (where \fIm\fR is a nonzero digit) a \fIback reference\fR, see below | 
|---|
| 500 | .TP | 
|---|
| 501 | \fB\e\fImnn\fR | 
|---|
| 502 | . | 
|---|
| 503 | (where \fIm\fR is a nonzero digit, and \fInn\fR is some more digits, | 
|---|
| 504 | and the decimal value \fImnn\fR is not greater than the number of | 
|---|
| 505 | closing capturing parentheses seen so far) a \fIback reference\fR, see | 
|---|
| 506 | below | 
|---|
| 507 | .RE | 
|---|
| 508 | .PP | 
|---|
| 509 | A word is defined as in the specification of | 
|---|
| 510 | .QW \fB[[:<:]]\fR | 
|---|
| 511 | and | 
|---|
| 512 | .QW \fB[[:>:]]\fR | 
|---|
| 513 | above. Constraint escapes are illegal within bracket expressions. | 
|---|
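| | .PP | 
|---|
| | A minimal sketch of the word constraints, assuming the \fBregexp\fR command | 
|---|
| | and its \fB\-all\fR switch (which makes it return the number of matches): | 
|---|
| | .CS | 
|---|
| | # returns 4: every occurrence of "cat" is counted | 
|---|
| | regexp \-all {cat} {cat concat cats cat} | 
|---|
| | # returns 2: only the two standalone words qualify | 
|---|
| | regexp \-all {\emcat\eM} {cat concat cats cat} | 
|---|
| | .CE | 
|---|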
| 514 | .SS "BACK REFERENCES" | 
|---|
| 515 | A back reference (AREs only) matches the same string matched by the | 
|---|
| 516 | parenthesized subexpression specified by the number, so that (e.g.) | 
|---|
| 517 | .QW \fB([bc])\e1\fR | 
|---|
| 518 | matches | 
|---|
| 519 | .QW \fBbb\fR | 
|---|
| 520 | or | 
|---|
| 521 | .QW \fBcc\fR | 
|---|
| 522 | but not | 
|---|
| 523 | .QW \fBbc\fR . | 
|---|
| 524 | The subexpression must entirely precede the back reference in the RE. | 
|---|
| 525 | Subexpressions are numbered in the order of their leading parentheses. | 
|---|
| 526 | Non-capturing parentheses do not define subexpressions. | 
|---|
| 527 | .PP | 
|---|
| 528 | There is an inherent historical ambiguity between octal | 
|---|
| 529 | character-entry escapes and back references, which is resolved by | 
|---|
| 530 | heuristics, as hinted at above. A leading zero always indicates an | 
|---|
| 531 | octal escape. A single non-zero digit, not followed by another digit, | 
|---|
| 532 | is always taken as a back reference. A multi-digit sequence not | 
|---|
| 533 | starting with a zero is taken as a back reference if it comes after a | 
|---|
| 534 | suitable subexpression (i.e. the number is in the legal range for a | 
|---|
| 535 | back reference), and otherwise is taken as octal. | 
|---|
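| | .PP | 
|---|
| | For illustration only (the \fBregexp\fR command and the variable name | 
|---|
| | \fBm\fR are assumed), back references can be exercised like this: | 
|---|
| | .CS | 
|---|
| | # returns 1, and m is "bb" | 
|---|
| | regexp {([bc])\e1} abbd m | 
|---|
| | # returns 0: neither b nor c is immediately repeated | 
|---|
| | regexp {([bc])\e1} abcd | 
|---|
| | # m is set to "is is": a doubled word | 
|---|
| | regexp {\em(\ew+) \e1\eM} {it is is fine} m | 
|---|
| | .CE | 
|---|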
| 536 | .SH "METASYNTAX" | 
|---|
| 537 | In addition to the main syntax described above, there are some special | 
|---|
| 538 | forms and miscellaneous syntactic facilities available. | 
|---|
| 539 | .PP | 
|---|
| 540 | Normally the flavor of RE being used is specified by | 
|---|
| 541 | application-dependent means. However, this can be overridden by a | 
|---|
| 542 | \fIdirector\fR. If an RE of any flavor begins with | 
|---|
| 543 | .QW \fB***:\fR , | 
|---|
| 544 | the rest of the RE is an ARE. If an RE of any flavor begins with | 
|---|
| 545 | .QW \fB***=\fR , | 
|---|
| 546 | the rest of the RE is taken to be a literal string, with | 
|---|
| 547 | all characters considered ordinary characters. | 
|---|
| 548 | .PP | 
|---|
| 549 | An ARE may begin with \fIembedded options\fR: a sequence | 
|---|
| 550 | \fB(?\fIxyz\fB)\fR (where \fIxyz\fR is one or more alphabetic | 
|---|
| 551 | characters) specifies options affecting the rest of the RE. These | 
|---|
| 552 | supplement, and can override, any options specified by the | 
|---|
| 553 | application. The available option letters are: | 
|---|
| 554 | .RS 2 | 
|---|
| 555 | .TP 3 | 
|---|
| 556 | \fBb\fR | 
|---|
| 557 | . | 
|---|
| 558 | rest of RE is a BRE | 
|---|
| 559 | .TP 3 | 
|---|
| 560 | \fBc\fR | 
|---|
| 561 | . | 
|---|
| 562 | case-sensitive matching (usual default) | 
|---|
| 563 | .TP 3 | 
|---|
| 564 | \fBe\fR | 
|---|
| 565 | . | 
|---|
| 566 | rest of RE is an ERE | 
|---|
| 567 | .TP 3 | 
|---|
| 568 | \fBi\fR | 
|---|
| 569 | . | 
|---|
| 570 | case-insensitive matching (see \fBMATCHING\fR, below) | 
|---|
| 571 | .TP 3 | 
|---|
| 572 | \fBm\fR | 
|---|
| 573 | . | 
|---|
| 574 | historical synonym for \fBn\fR | 
|---|
| 575 | .TP 3 | 
|---|
| 576 | \fBn\fR | 
|---|
| 577 | . | 
|---|
| 578 | newline-sensitive matching (see \fBMATCHING\fR, below) | 
|---|
| 579 | .TP 3 | 
|---|
| 580 | \fBp\fR | 
|---|
| 581 | . | 
|---|
| 582 | partial newline-sensitive matching (see \fBMATCHING\fR, below) | 
|---|
| 583 | .TP 3 | 
|---|
| 584 | \fBq\fR | 
|---|
| 585 | . | 
|---|
| 586 | rest of RE is a literal | 
|---|
| 587 | .PQ quoted | 
|---|
| 588 | string, all ordinary characters | 
|---|
| 589 | .TP 3 | 
|---|
| 590 | \fBs\fR | 
|---|
| 591 | . | 
|---|
| 592 | non-newline-sensitive matching (usual default) | 
|---|
| 593 | .TP 3 | 
|---|
| 594 | \fBt\fR | 
|---|
| 595 | . | 
|---|
| 596 | tight syntax (usual default; see below) | 
|---|
| 597 | .TP 3 | 
|---|
| 598 | \fBw\fR | 
|---|
| 599 | . | 
|---|
| 600 | inverse partial newline-sensitive | 
|---|
| 601 | .PQ weird | 
|---|
| 602 | matching (see \fBMATCHING\fR, below) | 
|---|
| 603 | .TP 3 | 
|---|
| 604 | \fBx\fR | 
|---|
| 605 | . | 
|---|
| 606 | expanded syntax (see below) | 
|---|
| 607 | .RE | 
|---|
| 608 | .PP | 
|---|
| 609 | Embedded options take effect at the \fB)\fR terminating the sequence. | 
|---|
| 610 | They are available only at the start of an ARE, and may not be used | 
|---|
| 611 | later within it. | 
|---|
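| | .PP | 
|---|
| | The director and embedded-option forms can be sketched as follows, assuming | 
|---|
| | the \fBregexp\fR command is the application supplying the RE: | 
|---|
| | .CS | 
|---|
| | # returns 1: the embedded i option makes matching case-insensitive | 
|---|
| | regexp {(?i)tcl} {I like TCL} | 
|---|
| | # returns 1 for the first command and 0 for the second: | 
|---|
| | # after ***= the "." is an ordinary character | 
|---|
| | regexp {***=a.c} a.c | 
|---|
| | regexp {***=a.c} abc | 
|---|
| | .CE | 
|---|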
| 612 | .PP | 
|---|
| 613 | In addition to the usual (\fItight\fR) RE syntax, in which all | 
|---|
| 614 | characters are significant, there is an \fIexpanded\fR syntax, | 
|---|
| 615 | available in all flavors of RE with the \fB\-expanded\fR switch, or in | 
|---|
| 616 | AREs with the embedded x option. In the expanded syntax, white-space | 
|---|
| 617 | characters are ignored and all characters between a \fB#\fR and the | 
|---|
| 618 | following newline (or the end of the RE) are ignored, permitting | 
|---|
| 619 | paragraphing and commenting a complex RE. There are three exceptions | 
|---|
| 620 | to that basic rule: | 
|---|
| 621 | .IP \(bu 3 | 
|---|
| 622 | a white-space character or | 
|---|
| 623 | .QW \fB#\fR | 
|---|
| 624 | preceded by | 
|---|
| 625 | .QW \fB\e\fR | 
|---|
| 626 | is retained | 
|---|
| 627 | .IP \(bu 3 | 
|---|
| 628 | white space or | 
|---|
| 629 | .QW \fB#\fR | 
|---|
| 630 | within a bracket expression is retained | 
|---|
| 631 | .IP \(bu 3 | 
|---|
| 632 | white space and comments are illegal within multi-character symbols | 
|---|
| 633 | like the ARE | 
|---|
| 634 | .QW \fB(?:\fR | 
|---|
| 635 | or the BRE | 
|---|
| 636 | .QW \fB\e(\fR | 
|---|
| 637 | .PP | 
|---|
| 638 | Expanded-syntax white-space characters are blank, tab, newline, and | 
|---|
| 639 | any character that belongs to the \fIspace\fR character class. | 
|---|
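| | .PP | 
|---|
| | As a rough sketch of the expanded syntax (the variable names \fBre\fR, | 
|---|
| | \fBall\fR, \fBa\fR and \fBb\fR are hypothetical, and the \fBregexp\fR | 
|---|
| | command is assumed), an RE can be laid out over several commented lines: | 
|---|
| | .CS | 
|---|
| | set re {(?x) | 
|---|
| |     (\ed{3})   # three digits | 
|---|
| |     \-         # a literal hyphen | 
|---|
| |     (\ed{4})   # four digits | 
|---|
| | } | 
|---|
| | # returns 1; all is "555\-1234", a is "555", b is "1234" | 
|---|
| | regexp $re 555\-1234 all a b | 
|---|
| | .CE | 
|---|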
| 640 | .PP | 
|---|
| 641 | Finally, in an ARE, outside bracket expressions, the sequence | 
|---|
| 642 | .QW \fB(?#\fIttt\fB)\fR | 
|---|
| 643 | (where \fIttt\fR is any text not containing a | 
|---|
| 644 | .QW \fB)\fR ) | 
|---|
| 645 | is a comment, completely ignored. Again, this is not | 
|---|
| 646 | allowed between the characters of multi-character symbols like | 
|---|
| 647 | .QW \fB(?:\fR . | 
|---|
| 648 | Such comments are more a historical artifact than a useful facility, | 
|---|
| 649 | and their use is deprecated; use the expanded syntax instead. | 
|---|
| 650 | .PP | 
|---|
| 651 | \fINone\fR of these metasyntax extensions is available if the | 
|---|
| 652 | application (or an initial | 
|---|
| 653 | .QW \fB***=\fR | 
|---|
| 654 | director) has specified that the | 
|---|
| 655 | user's input be treated as a literal string rather than as an RE. | 
|---|
| 656 | .SH MATCHING | 
|---|
| 657 | In the event that an RE could match more than one substring of a given | 
|---|
| 658 | string, the RE matches the one starting earliest in the string. If | 
|---|
| 659 | the RE could match more than one substring starting at that point, its | 
|---|
| 660 | choice is determined by its \fIpreference\fR: either the longest | 
|---|
| 661 | substring, or the shortest. | 
|---|
| 662 | .PP | 
|---|
| 663 | Most atoms, and all constraints, have no preference. A parenthesized | 
|---|
| 664 | RE has the same preference (possibly none) as the RE. A quantified | 
|---|
| 665 | atom with quantifier \fB{\fIm\fB}\fR or \fB{\fIm\fB}?\fR has the same | 
|---|
| 666 | preference (possibly none) as the atom itself. A quantified atom with | 
|---|
| 667 | other normal quantifiers (including \fB{\fIm\fB,\fIn\fB}\fR with | 
|---|
| 668 | \fIm\fR equal to \fIn\fR) prefers longest match. A quantified atom | 
|---|
| 669 | with other non-greedy quantifiers (including \fB{\fIm\fB,\fIn\fB}?\fR | 
|---|
| 670 | with \fIm\fR equal to \fIn\fR) prefers shortest match. A branch has | 
|---|
| 671 | the same preference as the first quantified atom in it which has a | 
|---|
| 672 | preference. An RE consisting of two or more branches connected by the | 
|---|
| 673 | \fB|\fR operator prefers longest match. | 
|---|
| 674 | .PP | 
|---|
| 675 | Subject to the constraints imposed by the rules for matching the whole | 
|---|
| 676 | RE, subexpressions also match the longest or shortest possible | 
|---|
| 677 | substrings, based on their preferences, with subexpressions starting | 
|---|
| 678 | earlier in the RE taking priority over ones starting later. Note that | 
|---|
| 679 | outer subexpressions thus take priority over their component | 
|---|
| 680 | subexpressions. | 
|---|
| 681 | .PP | 
|---|
| 682 | Note that the quantifiers \fB{1,1}\fR and \fB{1,1}?\fR can be used to | 
|---|
| 683 | force longest and shortest preference, respectively, on a | 
|---|
| 684 | subexpression or a whole RE. | 
|---|
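
As a hedged sketch of that technique, wrapping an RE in a non-capturing group and quantifying the group with \fB{1,1}\fR or \fB{1,1}?\fR overrides the preference it would otherwise have. The sample strings and variable names are illustrative assumptions.

```tcl
# Without forcing, the leading non-greedy .*? makes the whole RE
# prefer the shortest match.
set plain  [regexp -inline {.*?\d+} abc123]             ;# abc1

# {1,1} forces longest preference on the whole RE.
set forced [regexp -inline {(?:.*?\d+){1,1}} abc123]    ;# abc123

# {1,1}? forces shortest preference on an otherwise greedy RE.
set short  [regexp -inline {(?:a+){1,1}?} aaa]          ;# a

puts [list $plain $forced $short]
```
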
| 685 | .PP | 
|---|
| 686 | Match lengths are measured in characters, not collating elements. An | 
|---|
| 687 | empty string is considered longer than no match at all. For example, | 
|---|
| 688 | .QW \fBbb*\fR | 
|---|
| 689 | matches the three middle characters of | 
|---|
| 690 | .QW \fBabbbc\fR ; | 
|---|
| 691 | .QW \fB(week|wee)(night|knights)\fR | 
|---|
| 692 | matches all ten characters of | 
|---|
| 693 | .QW \fBweeknights\fR ; | 
|---|
| 694 | when | 
|---|
| 695 | .QW \fB(.*).*\fR | 
|---|
| 696 | is matched against | 
|---|
| 697 | .QW \fBabc\fR , | 
|---|
| 698 | the parenthesized subexpression matches all three characters; and when | 
|---|
| 699 | .QW \fB(a*)*\fR | 
|---|
| 700 | is matched against | 
|---|
| 701 | .QW \fBbc\fR , | 
|---|
| 702 | both the whole RE and the parenthesized subexpression match an empty string. | 
|---|
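
The four examples above can be reproduced with regexp(n) and its \fB\-inline\fR switch, which returns the overall match followed by each parenthesized subexpression. The variable names are assumptions for illustration.

```tcl
# bb* matches the three middle characters of abbbc.
set r1 [regexp -inline {bb*} abbbc]                            ;# bbb

# The whole RE takes all ten characters, which forces the
# subexpressions to be "wee" and "knights".
set r2 [regexp -inline {(week|wee)(night|knights)} weeknights] ;# weeknights wee knights

# The earlier subexpression (.*) takes priority over the later .*
set r3 [regexp -inline {(.*).*} abc]                           ;# abc abc

# Both the whole RE and the subexpression match an empty string.
set r4 [regexp -inline {(a*)*} bc]                             ;# two empty strings

puts [list $r1 $r2 $r3 $r4]
```
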
| 703 | .PP | 
|---|
| 704 | If case-independent matching is specified, the effect is much as if | 
|---|
| 705 | all case distinctions had vanished from the alphabet. When an | 
|---|
| 706 | alphabetic character that exists in multiple cases appears as an ordinary | 
|---|
| 707 | character outside a bracket expression, it is effectively transformed | 
|---|
| 708 | into a bracket expression containing both cases, so that \fBx\fR | 
|---|
| 709 | becomes | 
|---|
| 710 | .QW \fB[xX]\fR . | 
|---|
| 711 | When it appears inside a bracket expression, | 
|---|
| 712 | all case counterparts of it are added to the bracket expression, so | 
|---|
| 713 | that | 
|---|
| 714 | .QW \fB[x]\fR | 
|---|
| 715 | becomes | 
|---|
| 716 | .QW \fB[xX]\fR | 
|---|
| 717 | and | 
|---|
| 718 | .QW \fB[^x]\fR | 
|---|
| 719 | becomes | 
|---|
| 720 | .QW \fB[^xX]\fR . | 
|---|
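
A small sketch of the effect, using the \fB\-nocase\fR switch of regexp(n) to request case-independent matching. The variable names are illustrative assumptions.

```tcl
# Case-sensitive by default.
set plain   [regexp {x} X]                 ;# 0

# Outside a bracket expression, x behaves like [xX].
set nocase  [regexp -nocase {x} X]         ;# 1

# Inside a bracket expression, [x] behaves like [xX] ...
set inClass [regexp -nocase {[x]} X]       ;# 1

# ... and [^x] behaves like [^xX], so it rejects both cases.
set negated [regexp -nocase {[^x]} X]      ;# 0

puts [list $plain $nocase $inClass $negated]
```
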
| 721 | .PP | 
|---|
| 722 | If newline-sensitive matching is specified, \fB.\fR and bracket | 
|---|
| 723 | expressions using \fB^\fR will never match the newline character (so | 
|---|
| 724 | that matches will never cross newlines unless the RE explicitly | 
|---|
| 725 | arranges it) and \fB^\fR and \fB$\fR will match the empty string after | 
|---|
| 726 | and before a newline respectively, in addition to matching at | 
|---|
| 727 | beginning and end of string respectively. The ARE constraint escapes \fB\eA\fR and \fB\eZ\fR | 
|---|
| 728 | continue to match beginning or end of string \fIonly\fR. | 
|---|
| 729 | .PP | 
|---|
| 730 | If partial newline-sensitive matching is specified, this affects | 
|---|
| 731 | \fB.\fR and bracket expressions as with newline-sensitive matching, | 
|---|
| 732 | but not \fB^\fR and \fB$\fR. | 
|---|
| 733 | .PP | 
|---|
| 734 | If inverse partial newline-sensitive matching is specified, this | 
|---|
| 735 | affects \fB^\fR and \fB$\fR as with newline-sensitive matching, but | 
|---|
| 736 | not \fB.\fR and bracket expressions. This is not very useful but is | 
|---|
| 737 | provided for symmetry. | 
|---|
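
The three newline-sensitive modes can be compared side by side. This sketch assumes the \fB\-line\fR, \fB\-linestop\fR and \fB\-lineanchor\fR switches of regexp(n) as the way to select full, partial and inverse partial newline-sensitive matching; the sample string and variable names are illustrative.

```tcl
set s "ab\ncd"

# Default: . matches a newline, and ^ matches only at the start of the string.
regexp {a.*} $s dotDefault                     ;# whole string, newline included
set caretDefault [regexp {^cd} $s]             ;# 0

# -line: full newline-sensitive matching.
regexp -line {a.*} $s dotLine                  ;# ab
set caretLine [regexp -line {^cd} $s]          ;# 1
set startLine [regexp -line {\Acd} $s]         ;# 0, \A is start of string only

# -linestop: partial (affects . and bracket expressions, not ^ and $).
regexp -linestop {a.*} $s dotStop              ;# ab
set caretStop [regexp -linestop {^cd} $s]      ;# 0

# -lineanchor: inverse partial (affects ^ and $, not . and bracket expressions).
regexp -lineanchor {a.*} $s dotAnchor          ;# whole string, newline included
set caretAnchor [regexp -lineanchor {^cd} $s]  ;# 1
```
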
| 738 | .SH "LIMITS AND COMPATIBILITY" | 
|---|
| 739 | No particular limit is imposed on the length of REs. Programs | 
|---|
| 740 | intended to be highly portable should not employ REs longer than 256 | 
|---|
| 741 | bytes, as a POSIX-compliant implementation can refuse to accept such | 
|---|
| 742 | REs. | 
|---|
| 743 | .PP | 
|---|
| 744 | The only feature of AREs that is actually incompatible with POSIX EREs | 
|---|
| 745 | is that \fB\e\fR does not lose its special significance inside bracket | 
|---|
| 746 | expressions. All other ARE features use syntax which is illegal or | 
|---|
| 747 | has undefined or unspecified effects in POSIX EREs; the \fB***\fR | 
|---|
| 748 | syntax of directors likewise is outside the POSIX syntax for both BREs | 
|---|
| 749 | and EREs. | 
|---|
| 750 | .PP | 
|---|
| 751 | Many of the ARE extensions are borrowed from Perl, but some have been | 
|---|
| 752 | changed to clean them up, and a few Perl extensions are not present. | 
|---|
| 753 | Incompatibilities of note include | 
|---|
| 754 | .QW \fB\eb\fR , | 
|---|
| 755 | .QW \fB\eB\fR , | 
|---|
| 756 | the lack of special treatment for a trailing newline, the addition of | 
|---|
| 757 | complemented bracket expressions to the things affected by | 
|---|
| 758 | newline-sensitive matching, the restrictions on parentheses and back | 
|---|
| 759 | references in lookahead constraints, and the longest/shortest-match | 
|---|
| 760 | (rather than first-match) matching semantics. | 
|---|
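
Two of these incompatibilities are easy to trip over when porting a Perl pattern, and can be sketched as follows. The sample strings and variable names are assumptions for illustration.

```tcl
# In AREs \b is a backspace character, not a word boundary,
# so a Perl-style pattern silently fails to match.
set perlStyle [regexp {\bfoo\b} "a foo b"]     ;# 0

# The ARE word-boundary constraint is \y.
set tclStyle  [regexp {\yfoo\y} "a foo b"]     ;# 1

# A trailing newline gets no special treatment: $ matches only at
# the very end of the string unless newline-sensitive matching is on.
set trailing  [regexp {foo$} "foo\n"]          ;# 0

puts [list $perlStyle $tclStyle $trailing]
```
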
| 761 | .PP | 
|---|
| 762 | The matching rules for REs containing both normal and non-greedy | 
|---|
| 763 | quantifiers have changed since early beta-test versions of this | 
|---|
| 764 | package. (The new rules are much simpler and cleaner, but do not work | 
|---|
| 765 | as hard at guessing the user's real intentions.) | 
|---|
| 766 | .PP | 
|---|
| 767 | Henry Spencer's original 1986 \fIregexp\fR package, still in | 
|---|
| 768 | widespread use (e.g., in pre-8.1 releases of Tcl), implemented an | 
|---|
| 769 | early version of today's EREs. There are four incompatibilities | 
|---|
| 770 | between \fIregexp\fR's near-EREs | 
|---|
| 771 | .PQ RREs " for short" | 
|---|
| 772 | and AREs. In roughly increasing order of significance: | 
|---|
| 773 | .IP \(bu 3 | 
|---|
| 774 | In AREs, \fB\e\fR followed by an alphanumeric character is either an | 
|---|
| 775 | escape or an error, while in RREs, it was just another way of writing | 
|---|
| 776 | the alphanumeric. This should not be a problem because there was no | 
|---|
| 777 | reason to write such a sequence in RREs. | 
|---|
| 778 | .IP \(bu 3 | 
|---|
| 779 | \fB{\fR followed by a digit in an ARE is the beginning of a bound, | 
|---|
| 780 | while in RREs, \fB{\fR was always an ordinary character. Such | 
|---|
| 781 | sequences should be rare, and will often result in an error because | 
|---|
| 782 | following characters will not look like a valid bound. | 
|---|
| 783 | .IP \(bu 3 | 
|---|
| 784 | In AREs, \fB\e\fR remains a special character within | 
|---|
| 785 | .QW \fB[\|]\fR , | 
|---|
| 786 | so a literal \fB\e\fR within \fB[\|]\fR must be written | 
|---|
| 787 | .QW \fB\e\e\fR . | 
|---|
| 788 | \fB\e\e\fR also gives a literal \fB\e\fR within \fB[\|]\fR in RREs, | 
|---|
| 789 | but only truly paranoid programmers routinely doubled the backslash. | 
|---|
| 790 | .IP \(bu 3 | 
|---|
| 791 | AREs report the longest/shortest match for the RE, rather than the | 
|---|
| 792 | first found in a specified search order. This may affect some RREs | 
|---|
| 793 | which were written in the expectation that the first match would be | 
|---|
| 794 | reported. (The careful crafting of RREs to optimize the search order | 
|---|
| 795 | for fast matching is obsolete, because AREs examine all possible matches in | 
|---|
| 796 | parallel and their performance is largely insensitive to their | 
|---|
| 797 | complexity; but cases where the search order was exploited to | 
|---|
| 798 | deliberately find a match which was \fInot\fR the longest/shortest | 
|---|
| 799 | will need rewriting.) | 
|---|
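
A brief sketch of how the second and third points look in ARE syntax. Only ARE behavior is shown, since the old RRE package is not available for comparison; the sample strings and variable names are illustrative assumptions.

```tcl
# { followed by a digit begins a bound.
set bound   [regexp -inline {a{2}} aaa]        ;# aa

# To match a literal { it can be escaped.
set literal [regexp {a\{2} "a{2"]              ;# 1

# \ stays special inside a bracket expression, so escapes work there ...
set digit   [regexp {[\d]} 5]                  ;# 1

# ... and a literal backslash inside brackets must be doubled.
set bslash  [regexp {[\\]} "\\"]               ;# 1

puts [list $bound $literal $digit $bslash]
```
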
| 800 | .SH "BASIC REGULAR EXPRESSIONS" | 
|---|
| 801 | BREs differ from EREs in several respects. In BREs, | 
|---|
| 802 | .QW \fB|\fR , | 
|---|
| 803 | .QW \fB+\fR , | 
|---|
| 804 | and \fB?\fR are ordinary characters and there is no equivalent for their | 
|---|
| 805 | functionality. The delimiters for bounds are \fB\e{\fR and | 
|---|
| 806 | .QW \fB\e}\fR , | 
|---|
| 807 | with \fB{\fR and \fB}\fR by themselves ordinary characters. The | 
|---|
| 808 | parentheses for nested subexpressions are \fB\e(\fR and | 
|---|
| 809 | .QW \fB\e)\fR , | 
|---|
| 810 | with \fB(\fR and \fB)\fR by themselves ordinary | 
|---|
| 811 | characters. \fB^\fR is an ordinary character except at the beginning | 
|---|
| 812 | of the RE or the beginning of a parenthesized subexpression, \fB$\fR | 
|---|
| 813 | is an ordinary character except at the end of the RE or the end of a | 
|---|
| 814 | parenthesized subexpression, and \fB*\fR is an ordinary character if | 
|---|
| 815 | it appears at the beginning of the RE or the beginning of a | 
|---|
| 816 | parenthesized subexpression (after a possible leading | 
|---|
| 817 | .QW \fB^\fR ). | 
|---|
| 818 | Finally, single-digit back references are available, and \fB\e<\fR and | 
|---|
| 819 | \fB\e>\fR are synonyms for | 
|---|
| 820 | .QW \fB[[:<:]]\fR | 
|---|
| 821 | and | 
|---|
| 822 | .QW \fB[[:>:]]\fR | 
|---|
| 823 | respectively; no other escapes are available. | 
|---|
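
These differences can be sketched by selecting BRE syntax with the embedded option \fB(?b)\fR, which this example assumes is the way to mark the rest of an RE as a BRE. The sample strings and variable names are illustrative.

```tcl
# + is an ordinary character in a BRE, so this matches the literal text "a+".
set plus  [regexp -inline {(?b)a+} aa+]          ;# a+

# | is ordinary too: the RE below is the three-character string "a|b".
set bar1  [regexp {(?b)a|b} a]                   ;# 0
set bar2  [regexp {(?b)a|b} "a|b"]               ;# 1

# Bounds are delimited by \{ and \}.
set bound [regexp -inline {(?b)a\{2\}} aaa]      ;# aa

# Subexpressions use \( and \), and single-digit back references work.
set back  [regexp -inline {(?b)\(ab\)\1} abab]   ;# abab ab

# \< and \> match at the beginning and end of a word.
set word1 [regexp {(?b)\<cat\>} "a cat sat"]     ;# 1
set word2 [regexp {(?b)\<cat\>} "concat"]        ;# 0

puts [list $plus $bar1 $bar2 $bound $back $word1 $word2]
```
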
| 824 | .SH "SEE ALSO" | 
|---|
| 825 | RegExp(3), regexp(n), regsub(n), lsearch(n), switch(n), text(n) | 
|---|
| 826 | .SH KEYWORDS | 
|---|
| 827 | match, regular expression, string | 
|---|
| 828 | .\" Local Variables: | 
|---|
| 829 | .\" mode: nroff | 
|---|
| 830 | .\" End: | 
|---|