| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
|---|
| 2 | <html> |
|---|
| 3 | <head> |
|---|
| 4 | <title>Boost.Regex: POSIX API Compatibility Functions</title> |
|---|
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
|---|
| 6 | <link rel="stylesheet" type="text/css" href="../../../boost.css"> |
|---|
| 7 | </head> |
|---|
| 8 | <body> |
|---|
| 9 | <P> |
|---|
| 10 | <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0"> |
|---|
| 11 | <TR> |
|---|
| 12 | <td valign="top" width="300"> |
|---|
| 13 | <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3> |
|---|
| 14 | </td> |
|---|
| 15 | <TD width="353"> |
|---|
| 16 | <H1 align="center">Boost.Regex</H1> |
|---|
| 17 | <H2 align="center">POSIX API Compatibility Functions</H2> |
|---|
| 18 | </TD> |
|---|
| 19 | <td width="50"> |
|---|
| 20 | <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3> |
|---|
| 21 | </td> |
|---|
| 22 | </TR> |
|---|
| 23 | </TABLE> |
|---|
| 24 | </P> |
|---|
| 25 | <HR> |
|---|
| 26 | <p></p> |
|---|
| 27 | <PRE>#include &lt;boost/cregex.hpp&gt; |
|---|
| 28 | <I>or</I>: |
|---|
| 29 | #include &lt;boost/regex.h&gt;</PRE> |
|---|
| 30 | <P>The following functions are available for users who need a POSIX compatible C |
|---|
| 31 | library. They are available in both Unicode and narrow character versions; the |
|---|
| 32 | standard POSIX API names are macros that expand to one version or the other |
|---|
| 33 | depending upon whether UNICODE is defined. |
|---|
| 34 | </P> |
|---|
| 35 | <P><B>Important</B>: All the symbols defined here are enclosed inside |
|---|
| 36 | namespace <I>boost</I> when used in C++ programs. If you use #include |
|---|
| 37 | &lt;boost/regex.h&gt; instead, the symbols are still defined in |
|---|
| 38 | namespace boost, but are made available in the global namespace as well.</P> |
|---|
| 39 | <P>The functions are defined as: |
|---|
| 40 | </P> |
|---|
| 41 | <PRE>extern "C" { |
|---|
| 42 | <B>int</B> regcompA(regex_tA*, <B>const</B> <B>char</B>*, <B>int</B>); |
|---|
| 43 | <B>unsigned</B> <B>int</B> regerrorA(<B>int</B>, <B>const</B> regex_tA*, <B>char</B>*, <B>unsigned</B> <B>int</B>); |
|---|
| 44 | <B>int</B> regexecA(<B>const</B> regex_tA*, <B>const</B> <B>char</B>*, <B>unsigned</B> <B>int</B>, regmatch_t*, <B>int</B>); |
|---|
| 45 | <B>void</B> regfreeA(regex_tA*); |
|---|
| 46 | |
|---|
| 47 | <B>int</B> regcompW(regex_tW*, <B>const</B> <B>wchar_t</B>*, <B>int</B>); |
|---|
| 48 | <B>unsigned</B> <B>int</B> regerrorW(<B>int</B>, <B>const</B> regex_tW*, <B>wchar_t</B>*, <B>unsigned</B> <B>int</B>); |
|---|
| 49 | <B>int</B> regexecW(<B>const</B> regex_tW*, <B>const</B> <B>wchar_t</B>*, <B>unsigned</B> <B>int</B>, regmatch_t*, <B>int</B>); |
|---|
| 50 | <B>void</B> regfreeW(regex_tW*); |
|---|
| 51 | |
|---|
| 52 | #ifdef UNICODE |
|---|
| 53 | #define regcomp regcompW |
|---|
| 54 | #define regerror regerrorW |
|---|
| 55 | #define regexec regexecW |
|---|
| 56 | #define regfree regfreeW |
|---|
| 57 | #define regex_t regex_tW |
|---|
| 58 | #else |
|---|
| 59 | #define regcomp regcompA |
|---|
| 60 | #define regerror regerrorA |
|---|
| 61 | #define regexec regexecA |
|---|
| 62 | #define regfree regfreeA |
|---|
| 63 | #define regex_t regex_tA |
|---|
| 64 | #endif |
|---|
| 65 | }</PRE> |
|---|
| 66 | <P>All the functions operate on structure <B>regex_t</B>, which exposes two public |
|---|
| 67 | members: |
|---|
| 68 | </P> |
|---|
| 69 | <P><B>unsigned int re_nsub</B>: filled in by <B>regcomp</B>; indicates |
|---|
| 70 | the number of sub-expressions contained in the regular expression. |
|---|
| 71 | </P> |
|---|
| 72 | <P><B>const TCHAR* re_endp</B>: points to the end of the expression to compile when |
|---|
| 73 | the flag REG_PEND is set. |
|---|
| 74 | </P> |
|---|
| 75 | <P><I>Footnote: regex_t is actually a #define: it is either regex_tA or regex_tW |
|---|
| 76 | depending upon whether UNICODE is defined. Likewise, TCHAR is either char or |
|---|
| 77 | wchar_t depending upon the macro UNICODE.</I> |
|---|
| 78 | </P> |
|---|
| 79 | <H3>regcomp</H3> |
|---|
| 80 | <P><B>regcomp</B> takes a pointer to a <B>regex_t</B>, a pointer to the expression |
|---|
| 81 | to compile, and a flags parameter which can be a combination of: |
|---|
| 82 | <BR> |
|---|
| 83 | |
|---|
| 84 | </P> |
|---|
| 85 | <P> |
|---|
| 86 | <TABLE id="Table2" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
|---|
| 87 | <TR> |
|---|
| 88 | <TD width="5%"> </TD> |
|---|
| 89 | <TD vAlign="top" width="45%">REG_EXTENDED</TD> |
|---|
| 90 | <TD vAlign="top" width="45%">Compiles modern regular expressions. Equivalent to |
|---|
| 91 | regbase::char_classes | regbase::intervals | regbase::bk_refs.</TD> |
|---|
| 92 | <TD width="5%"> </TD> |
|---|
| 93 | </TR> |
|---|
| 94 | <TR> |
|---|
| 95 | <TD width="5%"> </TD> |
|---|
| 96 | <TD vAlign="top" width="45%">REG_BASIC</TD> |
|---|
| 97 | <TD vAlign="top" width="45%">Compiles basic (obsolete) regular expression syntax. |
|---|
| 98 | Equivalent to regbase::char_classes | regbase::intervals | regbase::limited_ops |
|---|
| 99 | | regbase::bk_braces | regbase::bk_parens | regbase::bk_refs.</TD> |
|---|
| 100 | <TD width="5%"> </TD> |
|---|
| 101 | </TR> |
|---|
| 102 | <TR> |
|---|
| 103 | <TD width="5%"> </TD> |
|---|
| 104 | <TD vAlign="top" width="45%">REG_NOSPEC</TD> |
|---|
| 105 | <TD vAlign="top" width="45%">All characters are ordinary, the expression is a |
|---|
| 106 | literal string.</TD> |
|---|
| 107 | <TD width="5%"> </TD> |
|---|
| 108 | </TR> |
|---|
| 109 | <TR> |
|---|
| 110 | <TD width="5%"> </TD> |
|---|
| 111 | <TD vAlign="top" width="45%">REG_ICASE</TD> |
|---|
| 112 | <TD vAlign="top" width="45%">Compiles for matching that ignores character case.</TD> |
|---|
| 113 | <TD width="5%"> </TD> |
|---|
| 114 | </TR> |
|---|
| 115 | <TR> |
|---|
| 116 | <TD width="5%"> </TD> |
|---|
| 117 | <TD vAlign="top" width="45%">REG_NOSUB</TD> |
|---|
| 118 | <TD vAlign="top" width="45%">Has no effect in this library.</TD> |
|---|
| 119 | <TD width="5%"> </TD> |
|---|
| 120 | </TR> |
|---|
| 121 | <TR> |
|---|
| 122 | <TD width="5%"> </TD> |
|---|
| 123 | <TD vAlign="top" width="45%">REG_NEWLINE</TD> |
|---|
| 124 | <TD vAlign="top" width="45%">When this flag is set a dot does not match the |
|---|
| 125 | newline character.</TD> |
|---|
| 126 | <TD width="5%"> </TD> |
|---|
| 127 | </TR> |
|---|
| 128 | <TR> |
|---|
| 129 | <TD width="5%"> </TD> |
|---|
| 130 | <TD vAlign="top" width="45%">REG_PEND</TD> |
|---|
| 131 | <TD vAlign="top" width="45%">When this flag is set the re_endp parameter of the |
|---|
| 132 | regex_t structure must point to the end of the regular expression to compile.</TD> |
|---|
| 133 | <TD width="5%"> </TD> |
|---|
| 134 | </TR> |
|---|
| 135 | <TR> |
|---|
| 136 | <TD width="5%"> </TD> |
|---|
| 137 | <TD vAlign="top" width="45%">REG_NOCOLLATE</TD> |
|---|
| 138 | <TD vAlign="top" width="45%">When this flag is set then locale dependent collation |
|---|
| 139 | for character ranges is turned off.</TD> |
|---|
| 140 | <TD width="5%"> </TD> |
|---|
| 141 | </TR> |
|---|
| 142 | <TR> |
|---|
| 143 | <TD width="5%"> </TD> |
|---|
| 144 | <TD vAlign="top" width="45%">REG_ESCAPE_IN_LISTS |
|---|
| 146 | </TD> |
|---|
| 147 | <TD vAlign="top" width="45%">When this flag is set, then escape sequences are |
|---|
| 148 | permitted in bracket expressions (character sets).</TD> |
|---|
| 149 | <TD width="5%"> </TD> |
|---|
| 150 | </TR> |
|---|
| 151 | <TR> |
|---|
| 152 | <TD width="5%"> </TD> |
|---|
| 153 | <TD vAlign="top" width="45%">REG_NEWLINE_ALT </TD> |
|---|
| 154 | <TD vAlign="top" width="45%">When this flag is set then the newline character is |
|---|
| 155 | equivalent to the alternation operator |.</TD> |
|---|
| 156 | <TD width="5%"> </TD> |
|---|
| 157 | </TR> |
|---|
| 158 | <TR> |
|---|
| 159 | <TD width="5%"> </TD> |
|---|
| 160 | <TD vAlign="top" width="45%">REG_PERL</TD> |
|---|
| 161 | <TD vAlign="top" width="45%">Compiles Perl-like regular expressions.</TD> |
|---|
| 162 | <TD width="5%"> </TD> |
|---|
| 163 | </TR> |
|---|
| 164 | <TR> |
|---|
| 165 | <TD width="5%"> </TD> |
|---|
| 166 | <TD vAlign="top" width="45%">REG_AWK</TD> |
|---|
| 167 | <TD vAlign="top" width="45%">A shortcut for awk-like behavior: REG_EXTENDED | |
|---|
| 168 | REG_ESCAPE_IN_LISTS</TD> |
|---|
| 169 | <TD width="5%"> </TD> |
|---|
| 170 | </TR> |
|---|
| 171 | <TR> |
|---|
| 172 | <TD width="5%"> </TD> |
|---|
| 173 | <TD vAlign="top" width="45%">REG_GREP</TD> |
|---|
| 174 | <TD vAlign="top" width="45%">A shortcut for grep-like behavior: REG_BASIC | |
|---|
| 175 | REG_NEWLINE_ALT</TD> |
|---|
| 176 | <TD width="5%"> </TD> |
|---|
| 177 | </TR> |
|---|
| 178 | <TR> |
|---|
| 179 | <TD width="5%"> </TD> |
|---|
| 180 | <TD vAlign="top" width="45%">REG_EGREP</TD> |
|---|
| 181 | <TD vAlign="top" width="45%">A shortcut for egrep-like behavior: |
|---|
| 182 | REG_EXTENDED | REG_NEWLINE_ALT</TD> |
|---|
| 183 | <TD width="5%"> </TD> |
|---|
| 184 | </TR> |
|---|
| 185 | </TABLE> |
|---|
| 186 | </P> |
|---|
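<P>As a rough sketch of how the compile flags above combine, the helper below ORs REG_EXTENDED with REG_ICASE before compiling. It assumes the system's POSIX &lt;regex.h&gt; as a stand-in for &lt;boost/regex.h&gt; (the call sequence is intended to be the same), and <I>matches_ignoring_case</I> is a hypothetical name used only for illustration.</P>

```c
#include <regex.h>   /* assumption: system POSIX header, standing in for <boost/regex.h> */
#include <stddef.h>

/* Hypothetical helper: returns 1 if `text` matches `pattern` regardless of
 * character case, 0 if it does not match, and -1 if the pattern fails to compile. */
int matches_ignoring_case(const char *pattern, const char *text)
{
    regex_t re;

    /* Flags are combined with bitwise OR. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_ICASE) != 0)
        return -1;   /* nothing to free: compilation did not succeed */

    /* No match positions are requested, so the count is 0 and the array is NULL. */
    int rc = regexec(&re, text, 0, NULL, 0);

    regfree(&re);    /* release what regcomp allocated */
    return rc == 0 ? 1 : 0;
}
```

<P>A call such as <I>matches_ignoring_case("^bo+st$", "BOOST")</I> would be expected to report a match, since REG_ICASE makes the comparison case-insensitive.</P>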
| 187 | <H3>regerror</H3> |
|---|
| 188 | <P><B>regerror</B> maps an error code to a human-readable string. It takes the |
|---|
| 189 | following parameters: |
|---|
| 190 | <BR> |
|---|
| 191 | </P> |
|---|
| 192 | <P> |
|---|
| 193 | <TABLE id="Table3" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
|---|
| 194 | <TR> |
|---|
| 195 | <TD width="5%"> </TD> |
|---|
| 196 | <TD vAlign="top" width="50%">int code</TD> |
|---|
| 197 | <TD vAlign="top" width="50%">The error code.</TD> |
|---|
| 198 | <TD width="5%"> </TD> |
|---|
| 199 | </TR> |
|---|
| 200 | <TR> |
|---|
| 201 | <TD> </TD> |
|---|
| 202 | <TD vAlign="top" width="50%">const regex_t* e</TD> |
|---|
| 203 | <TD vAlign="top" width="50%">The regular expression (can be null).</TD> |
|---|
| 204 | <TD> </TD> |
|---|
| 205 | </TR> |
|---|
| 206 | <TR> |
|---|
| 207 | <TD> </TD> |
|---|
| 208 | <TD vAlign="top" width="50%">char* buf</TD> |
|---|
| 209 | <TD vAlign="top" width="50%">The buffer to fill in with the error message.</TD> |
|---|
| 210 | <TD> </TD> |
|---|
| 211 | </TR> |
|---|
| 212 | <TR> |
|---|
| 213 | <TD> </TD> |
|---|
| 214 | <TD vAlign="top" width="50%">unsigned int buf_size</TD> |
|---|
| 215 | <TD vAlign="top" width="50%">The length of buf.</TD> |
|---|
| 216 | <TD> </TD> |
|---|
| 217 | </TR> |
|---|
| 218 | </TABLE> |
|---|
| 219 | </P> |
|---|
| 220 | <P>If the error code is OR'ed with REG_ITOA then the message that results is the |
|---|
| 221 | printable name of the code rather than a message, for example "REG_BADPAT". If |
|---|
| 222 | the code is REG_ATOI then <B>e</B> must not be null and <B>e->re_endp</B> must |
|---|
| 223 | point to the printable name of an error code; the return value is then the |
|---|
| 224 | value of the error code. For any other value of <B>code</B>, the return value |
|---|
| 225 | is the number of characters in the error message. If the return value is |
|---|
| 226 | greater than or equal to <B>buf_size</B> then <B>regerror</B> must be |
|---|
| 227 | called again with a larger buffer.</P> |
|---|
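<P>The retry rule described above can be sketched as a small loop: call regerror, and if the returned length does not fit, allocate a larger buffer and call again. This is only an illustration that assumes the system's POSIX &lt;regex.h&gt; in place of &lt;boost/regex.h&gt;; <I>error_message</I> is a hypothetical helper name, and the initial size of 16 is arbitrary.</P>

```c
#include <regex.h>   /* assumption: system POSIX header, standing in for <boost/regex.h> */
#include <stdlib.h>

/* Hypothetical helper: returns a heap-allocated message for `code`,
 * or NULL if memory runs out. The caller must free() the result. */
char *error_message(int code, const regex_t *re)
{
    size_t size = 16;              /* arbitrary starting size */
    char *buf = malloc(size);

    while (buf != NULL) {
        size_t needed = regerror(code, re, buf, size);
        if (needed < size)
            return buf;            /* the whole message fitted */

        /* Return value >= buffer size: retry with a larger buffer. */
        free(buf);
        size = needed + 1;
        buf = malloc(size);
    }
    return NULL;
}
```

<P>Growing to <I>needed + 1</I> leaves room for the terminator whether or not the returned length already counts it, so the loop runs at most twice.</P>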
| 228 | <H3>regexec</H3> |
|---|
| 229 | <P><B>regexec</B> finds the first occurrence of expression <B>e</B> within string <B>buf</B>. |
|---|
| 230 | If <B>len</B> is non-zero then *<B>m</B> is filled in with what matched the |
|---|
| 231 | regular expression: <B>m[0]</B> contains what matched the whole expression, <B>m[1]</B> |
|---|
| 232 | the first sub-expression, and so on; see <B>regmatch_t</B> in the header file |
|---|
| 233 | declaration for more details. The <B>eflags</B> parameter can be a combination |
|---|
| 234 | of: |
|---|
| 235 | <BR> |
|---|
| 236 | |
|---|
| 237 | </P> |
|---|
| 238 | <P> |
|---|
| 239 | <TABLE id="Table4" cellSpacing="0" cellPadding="7" width="100%" border="0"> |
|---|
| 240 | <TR> |
|---|
| 241 | <TD width="5%"> </TD> |
|---|
| 242 | <TD vAlign="top" width="50%">REG_NOTBOL</TD> |
|---|
| 243 | <TD vAlign="top" width="50%">Parameter <B>buf </B>does not represent the start of |
|---|
| 244 | a line.</TD> |
|---|
| 245 | <TD width="5%"> </TD> |
|---|
| 246 | </TR> |
|---|
| 247 | <TR> |
|---|
| 248 | <TD> </TD> |
|---|
| 249 | <TD vAlign="top" width="50%">REG_NOTEOL</TD> |
|---|
| 250 | <TD vAlign="top" width="50%">Parameter <B>buf</B> does not terminate at the end of |
|---|
| 251 | a line.</TD> |
|---|
| 252 | <TD> </TD> |
|---|
| 253 | </TR> |
|---|
| 254 | <TR> |
|---|
| 255 | <TD> </TD> |
|---|
| 256 | <TD vAlign="top" width="50%">REG_STARTEND</TD> |
|---|
| 257 | <TD vAlign="top" width="50%">The string searched starts at buf + m[0].rm_so |
|---|
| 258 | and ends at buf + m[0].rm_eo.</TD> |
|---|
| 259 | <TD> </TD> |
|---|
| 260 | </TR> |
|---|
| 261 | </TABLE> |
|---|
| 262 | </P> |
|---|
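<P>To illustrate how the match array is filled in, the sketch below asks regexec for two entries and copies out the text of the first sub-expression. It assumes the system's POSIX &lt;regex.h&gt; as a stand-in for &lt;boost/regex.h&gt;; <I>first_group</I> is a hypothetical helper, not part of the library.</P>

```c
#include <regex.h>   /* assumption: system POSIX header, standing in for <boost/regex.h> */
#include <string.h>

/* Hypothetical helper: copies the text matched by the first sub-expression
 * of `pattern` into `out`. Returns 1 on success, 0 otherwise. */
int first_group(const char *pattern, const char *text, char *out, size_t out_size)
{
    regex_t re;
    regmatch_t m[2];   /* m[0]: whole match, m[1]: first sub-expression */
    int found = 0;

    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return 0;

    /* re_nsub tells us whether the pattern has a sub-expression at all. */
    if (re.re_nsub >= 1 && regexec(&re, text, 2, m, 0) == 0 && m[1].rm_so != -1) {
        size_t len = (size_t)(m[1].rm_eo - m[1].rm_so);
        if (len < out_size) {
            /* rm_so and rm_eo are offsets from the start of `text`. */
            memcpy(out, text + m[1].rm_so, len);
            out[len] = '\0';
            found = 1;
        }
    }

    regfree(&re);
    return found;
}
```

<P>The offsets in each entry are relative to the start of the searched string, which is why the copy starts at <I>text + m[1].rm_so</I>.</P>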
| 263 | <H3>regfree</H3> |
|---|
| 264 | <P>Finally <B>regfree</B> frees all the memory that was allocated by regcomp. |
|---|
| 265 | </P> |
|---|
| 266 | <P><I>Footnote: this is an abridged reference to the POSIX API functions. It is |
|---|
| 267 | provided for compatibility with other libraries, rather than as an API to be used |
|---|
| 268 | in new code (unless you need access from a language other than C++). This |
|---|
| 269 | version of these functions should also happily coexist with other versions, as |
|---|
| 270 | the names used are macros that expand to the actual function names.</I> |
|---|
| 271 | <P> |
|---|
| 272 | <HR> |
|---|
| 273 | <P></P> |
|---|
| 274 | <p>Revised |
|---|
| 275 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> |
|---|
| 276 | 24 Oct 2003 |
|---|
| 277 | <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p> |
|---|
| 278 | <p><i>© Copyright John Maddock 1998- |
|---|
| 279 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%Y" startspan --> |
|---|
| 280 | 2003<!--webbot bot="Timestamp" endspan i-checksum="39359" --></i></p> |
|---|
| 281 | <P><I>Use, modification and distribution are subject to the Boost Software License, |
|---|
| 282 | Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A> |
|---|
| 283 | or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> |
|---|
| 284 | </body> |
|---|
| 285 | </html> |
|---|
| 286 | |
|---|