'\"
'\" Copyright (c) 1998 Sun Microsystems, Inc.
'\"
'\" See the file "license.terms" for information on usage and redistribution
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
'\"
'\" RCS: @(#) $Id: regexp.n,v 1.28 2007/12/13 15:22:33 dgp Exp $
'\"
.so man.macros
.TH regexp n 8.3 Tcl "Tcl Built-In Commands"
.BS
'\" Note:  do not modify the .SH NAME line immediately below!
.SH NAME
regexp \- Match a regular expression against a string

.SH SYNOPSIS
\fBregexp \fR?\fIswitches\fR? \fIexp string \fR?\fImatchVar\fR? ?\fIsubMatchVar subMatchVar ...\fR?
.BE

.SH DESCRIPTION
.PP
Determines whether the regular expression \fIexp\fR matches part or
all of \fIstring\fR and returns 1 if it does, 0 if it does not, unless
\fB\-inline\fR is specified (see below).
(Regular expression matching is described in the \fBre_syntax\fR
reference page.)
.LP
If additional arguments are specified after \fIstring\fR then they
are treated as the names of variables in which to return
information about which part(s) of \fIstring\fR matched \fIexp\fR.
\fIMatchVar\fR will be set to the range of \fIstring\fR that
matched all of \fIexp\fR.  The first \fIsubMatchVar\fR will contain
the characters in \fIstring\fR that matched the leftmost parenthesized
subexpression within \fIexp\fR, the next \fIsubMatchVar\fR will
contain the characters that matched the next parenthesized
subexpression to the right in \fIexp\fR, and so on.
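.PP
As an illustration of the match variables described above, here is a minimal sketch. The input string and the variable names (whole, w, h) are hypothetical and chosen only for this example.

```tcl
# Hypothetical input string, used only for illustration.
set line "width=640 height=480"

# "whole" plays the role of matchVar; "w" and "h" are subMatchVars that
# receive the text matched by the first and second parenthesized groups.
set found [regexp {width=(\d+) height=(\d+)} $line whole w h]

puts "found=$found w=$w h=$h"
```

Here the return value reports whether a match occurred, while the variables carry the matched text.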
.PP
If the initial arguments to \fBregexp\fR start with \fB\-\fR then
they are treated as switches.  The following switches are
currently supported:
.TP 15
\fB\-about\fR
Instead of attempting to match the regular expression, returns a list
containing information about the regular expression.  The first
element of the list is a subexpression count.  The second element is a
list of property names that describe various attributes of the regular
expression.  This switch is primarily intended for debugging purposes.
.TP 15
\fB\-expanded\fR
Enables use of the expanded regular expression syntax where
whitespace and comments are ignored.  This is the same as specifying
the \fB(?x)\fR embedded option (see the \fBre_syntax\fR manual page).
.TP 15
\fB\-indices\fR
Changes what is stored in \fImatchVar\fR and the \fIsubMatchVar\fRs.
Instead of storing the matching characters from \fIstring\fR,
each variable
will contain a list of two decimal strings giving the indices
in \fIstring\fR of the first and last characters in the matching
range of characters.
.TP 15
\fB\-line\fR
Enables newline-sensitive matching.  By default, newline is a
completely ordinary character with no special meaning.  With this
flag,
.QW [^
bracket expressions and
.QW .
never match newline,
.QW ^
matches an empty string after any newline in addition to its normal
function, and
.QW $
matches an empty string before any newline in
addition to its normal function.  This flag is equivalent to
specifying both \fB\-linestop\fR and \fB\-lineanchor\fR, or the
\fB(?n)\fR embedded option (see the \fBre_syntax\fR manual page).
.TP 15
\fB\-linestop\fR
Changes the behavior of
.QW [^
bracket expressions and
.QW .
so that they
stop at newlines.  This is the same as specifying the \fB(?p)\fR
embedded option (see the \fBre_syntax\fR manual page).
.TP 15
\fB\-lineanchor\fR
Changes the behavior of
.QW ^
and
.QW $
(the
.QW anchors )
so they match the
beginning and end of a line respectively.  This is the same as
specifying the \fB(?w)\fR embedded option (see the \fBre_syntax\fR
manual page).
.TP 15
\fB\-nocase\fR
Causes upper-case characters in \fIstring\fR to be treated as
lower case during the matching process.
.TP 15
\fB\-all\fR
Causes the regular expression to be matched as many times as possible
in the string, returning the total number of matches found.  If this
is specified with match variables, they will contain information for
the last match only.
.TP 15
\fB\-inline\fR
Causes the command to return, as a list, the data that would otherwise
be placed in match variables.  When using \fB\-inline\fR,
match variables may not be specified.  If used with \fB\-all\fR, the
list will be concatenated at each iteration, such that a flat list is
always returned.  For each match iteration, the command will append the
overall match data, plus one element for each subexpression in the
regular expression.  Examples are:
.CS
\fBregexp\fR -inline -- {\ew(\ew)} " inlined "
      \fI\(-> in n\fR
\fBregexp\fR -all -inline -- {\ew(\ew)} " inlined "
      \fI\(-> in n li i ne e\fR
.CE
.TP 15
\fB\-start\fR \fIindex\fR
Specifies a character index offset into the string at which to start
matching the regular expression.
.VS 8.5
The \fIindex\fR value is interpreted in the same manner
as the \fIindex\fR argument to \fBstring index\fR.
.VE 8.5
When using this switch,
.QW ^
will not match the beginning of the line, and \eA will still
match the start of the string at \fIindex\fR.  If \fB\-indices\fR
is specified, the indices will be indexed starting from the
absolute beginning of the input string.
\fIindex\fR will be constrained to the bounds of the input string.
.TP 15
\fB\-\|\-\fR
Marks the end of switches.  The argument following this one will
be treated as \fIexp\fR even if it starts with a \fB\-\fR.
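.PP
The effect of newline-sensitive matching can be sketched as follows. This is only an illustration, assuming a hypothetical three-line input. The variable names are not part of the command.

```tcl
# Hypothetical three-line input, used only for illustration.
set text "alpha\nbeta\ngamma"

# Default matching: "." also matches newline, so .+ spans the whole string.
set whole [regexp -inline {.+} $text]

# With -line, "." stops at newline, so .+ covers only the first line.
set firstLine [regexp -inline -line {.+} $text]

# With -line, ^ and $ also anchor at line boundaries, so combining it
# with -all and -inline collects one word per line.
set words [regexp -all -inline -line {^\w+$} $text]
```

Each result is a list because of the -inline switch. The first two hold a single element and the last holds one element per line.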
.PP
If there are more \fIsubMatchVar\fRs than parenthesized
subexpressions within \fIexp\fR, or if a particular subexpression
in \fIexp\fR does not match the string (e.g. because it was in a
portion of the expression that was not matched), then the corresponding
\fIsubMatchVar\fR will be set to
.QW "\fB\-1 \-1\fR"
if \fB\-indices\fR has been specified or to an empty string otherwise.
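.PP
The handling of unmatched subexpressions and surplus variables can be seen in a small sketch. The input string and all variable names are hypothetical. The alternation is used only so that one group cannot take part in the match.

```tcl
# Hypothetical input; only one branch of the alternation can take part.
set str "key=42"

# Plain matching: the group that did not participate yields an empty
# string, and so does "extra", which has no corresponding subexpression.
regexp {(\d+)|([a-z]+)} $str all digits letters extra

# With -indices the same positions hold index pairs, or "-1 -1".
regexp -indices {(\d+)|([a-z]+)} $str iAll iDigits iLetters iExtra

puts "digits='$digits' letters='$letters' extra='$extra'"
puts "iDigits='$iDigits' iLetters='$iLetters' iExtra='$iExtra'"
```

The match starts at the leftmost position, so the second branch matches and the first group is left unset.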
.SH EXAMPLES
Find the first occurrence of a word starting with \fBfoo\fR in a
string that is not actually an instance of \fBfoobar\fR, and get the
letters following it up to the end of the word into a variable:
.CS
\fBregexp\fR {\emfoo(?!bar\eM)(\ew*)} $string \-> restOfWord
.CE
Note that the whole matched substring has been placed in the variable
\fB\->\fR which is a name chosen to look nice given that we are not
actually interested in its contents.
.PP
Find the index of the word \fBbadger\fR (in any case) within a string
and store that in the variable \fBlocation\fR:
.CS
\fBregexp\fR \-indices {(?i)\embadger\eM} $string location
.CE
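.PP
As a hedged follow-up to the index example, the sketch below shows one way the stored index pair might be used afterwards. The sample string and the variables first, last and word are hypothetical. The \-nocase switch is used here in place of the (?i) embedded option.

```tcl
# Hypothetical input string, used only for illustration.
set string "Mushroom, BADGER, snake"

if {[regexp -indices -nocase {\mbadger\M} $string location]} {
    # location is a two-element list: first and last character index.
    set first [lindex $location 0]
    set last  [lindex $location 1]
    # string range takes inclusive indices, so the pair can be used as-is.
    set word  [string range $string $first $last]
    puts "found '$word' at $first..$last"
}
```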
.PP
Count the number of octal digits in a string:
.CS
\fBregexp\fR \-all {[0\-7]} $string
.CE
.PP
List all words (consisting of all sequences of non-whitespace
characters) in a string:
.CS
\fBregexp\fR \-all \-inline {\eS+} $string
.CE
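.PP
The \-start and \-indices switches can be combined to walk through a string one match at a time. The following is a minimal sketch under stated assumptions: the input string and the variables start, range and numbers are hypothetical, and the pattern never matches an empty string, so the loop always advances.

```tcl
# Hypothetical input string, used only for illustration.
set text "a1 b22 c333"
set start 0
set numbers {}

# -indices reports positions relative to the beginning of the whole
# string, so the last index of one match gives the next start offset.
while {[regexp -start $start -indices {\d+} $text range]} {
    lappend numbers [string range $text [lindex $range 0] [lindex $range 1]]
    set start [expr {[lindex $range 1] + 1}]
}

puts $numbers
```

When only the matched text is needed, the \-all and \-inline switches give the same list in a single call. The explicit loop is useful when each match must be processed before looking for the next one.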

.SH "SEE ALSO"
re_syntax(n), regsub(n),
.VS 8.5
string(n)
.VE


.SH KEYWORDS
match, regular expression, string