| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> |
|---|
| 2 | <html> |
|---|
| 3 | <head> |
|---|
| 4 | <title>Boost.Regex: Working With Unicode and ICU String Types</title> |
|---|
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1"> |
|---|
| 6 | <LINK href="../../../boost.css" type="text/css" rel="stylesheet"></head> |
|---|
| 7 | <body> |
|---|
| 8 | <P> |
|---|
| 9 | <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0"> |
|---|
| 10 | <TR> |
|---|
| 11 | <td vAlign="top" width="300"> |
|---|
| 12 | <h3><A href="../../../index.htm"><IMG height="86" alt="C++ Boost" src="../../../boost.png" width="277" border="0"></A></h3> |
|---|
| 13 | </td> |
|---|
| 14 | <TD width="353"> |
|---|
| 15 | <H1 align="center">Boost.Regex</H1> |
|---|
| 16 | <H2 align="center">Working With Unicode and ICU String Types.</H2> |
|---|
| 17 | </TD> |
|---|
| 18 | <td width="50"> |
|---|
| 19 | <h3><A href="index.html"><IMG height="45" alt="Boost.Regex Index" src="uarrow.gif" width="43" border="0"></A></h3> |
|---|
| 20 | </td> |
|---|
| 21 | </TR> |
|---|
| 22 | </TABLE> |
|---|
| 23 | </P> |
|---|
| 24 | <HR> |
|---|
| 25 | <p></p> |
|---|
| 26 | <H3>Contents</H3> |
|---|
| 27 | <dl class="index"> |
|---|
| 28 | <dt><a href="#introduction">Introduction</a></dt> |
|---|
| 29 | <dt><a href="#types">Unicode regular expression types</a></dt> |
|---|
| 30 | <dt><a href="#algo">Regular Expression Algorithms</a> |
|---|
| 31 | <dd> |
|---|
| 32 | <dl class="index"> |
|---|
| 33 | <dt><a href="#u32regex_match">u32regex_match</a></dt> |
|---|
| 34 | <dt><a href="#u32regex_search">u32regex_search</a></dt> |
|---|
| 35 | <dt><a href="#u32regex_replace">u32regex_replace</a></dt> |
|---|
| 36 | </dl> |
|---|
| 37 | </dd> |
|---|
| 38 | </dt> |
|---|
| 39 | <dt><a href="#iterators">Iterators</a> |
|---|
| 40 | <dd> |
|---|
| 41 | <dl class="index"> |
|---|
| 42 | <dt><a href="#u32regex_iterator">u32regex_iterator</a></dt> |
|---|
| 43 | <dt><a href="#u32regex_token_iterator">u32regex_token_iterator</a></dt> |
|---|
| 44 | </dl> |
|---|
| 45 | </dd> |
|---|
| 46 | </dt> |
|---|
| 47 | </dl> |
|---|
| 48 | <H3><A name="introduction"></A>Introduction</H3> |
|---|
| 49 | <P>The header:</P> |
|---|
| 50 | <PRE><boost/regex/icu.hpp></PRE> |
|---|
| 51 | <P>contains the data types and algorithms necessary for working with regular |
|---|
| 52 | expressions in a Unicode aware environment. |
|---|
| 53 | </P> |
|---|
| 54 | <P>In order to use this header you will need <A href="http://www.ibm.com/software/globalization/icu/"> |
|---|
| 55 | the ICU library</A>, and you will need to have built the Boost.Regex library |
|---|
| 56 | with <A href="install.html#unicode">ICU support enabled</A>.</P> |
|---|
| 57 | <P>The header will enable you to:</P> |
|---|
| 58 | <UL> |
|---|
| 59 | <LI> |
|---|
| 60 | Create regular expressions that treat Unicode strings as sequences of UTF-32 |
|---|
| 61 | code points. |
|---|
| 62 | <LI> |
|---|
| 63 | Create regular expressions that support various Unicode data properties, |
|---|
| 64 | including character classification. |
|---|
| 65 | <LI> |
|---|
| 66 | Transparently search Unicode strings that are encoded as either UTF-8, UTF-16 |
|---|
| 67 | or UTF-32.</LI></UL> |
|---|
| 68 | <H3><A name="types"></A>Unicode regular expression types</H3> |
|---|
| 69 | <P>Header <boost/regex/icu.hpp> provides a regular expression traits |
|---|
| 70 | class that handles UTF-32 characters:</P> |
|---|
| 71 | <PRE>class icu_regex_traits;</PRE> |
|---|
| 72 | <P>and a regular expression type based upon that:</P> |
|---|
| 73 | <PRE>typedef basic_regex<UChar32,icu_regex_traits> u32regex;</PRE> |
|---|
| 74 | <P>The type <EM>u32regex</EM> is regular expression type to use for all Unicode |
|---|
| 75 | regular expressions; internally it uses UTF-32 code points, but can be created |
|---|
| 76 | from, and used to search, either UTF-8, or UTF-16 encoded strings as well as |
|---|
| 77 | UTF-32 ones.</P> |
|---|
| 78 | <P>The <A href="basic_regex.html#c2">constructors</A>, and <A href="basic_regex.html#a1"> |
|---|
| 79 | assign</A> member functions of u32regex, require UTF-32 encoded strings, but |
|---|
| 80 | there are a series of overloaded algorithms called make_u32regex which allow |
|---|
| 81 | regular expressions to be created from UTF-8, UTF-16, or UTF-32 encoded |
|---|
| 82 | strings:</P> |
|---|
| 83 | <PRE>template <class InputIterator> |
|---|
| 84 | u32regex make_u32regex(InputIterator i, InputIterator j, boost::regex_constants::syntax_option_type opt); |
|---|
| 85 | </PRE> |
|---|
| 86 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the iterator |
|---|
| 87 | sequence [i,j). The character encoding of the sequence is determined based upon <code> |
|---|
| 88 | sizeof(*i)</code>: 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.</P> |
|---|
| 89 | <PRE>u32regex make_u32regex(const char* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl); |
|---|
| 90 | </PRE> |
|---|
| 91 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the |
|---|
| 92 | Null-terminated UTF-8 characater sequence <EM>p</EM>.</P> |
|---|
| 93 | <PRE>u32regex make_u32regex(const unsigned char* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);</PRE> |
|---|
| 94 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the |
|---|
| 95 | Null-terminated UTF-8 characater sequence <EM>p</EM>.u32regex |
|---|
| 96 | make_u32regex(const wchar_t* p, boost::regex_constants::syntax_option_type opt |
|---|
| 97 | = boost::regex_constants::perl);</P> |
|---|
| 98 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the |
|---|
| 99 | Null-terminated characater sequence <EM>p</EM>. The character encoding of |
|---|
| 100 | the sequence is determined based upon <CODE>sizeof(wchar_t)</CODE>: 1 implies |
|---|
| 101 | UTF-8, 2 implies UTF-16, and 4 implies UTF-32.</P> |
|---|
| 102 | <PRE>u32regex make_u32regex(const UChar* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);</PRE> |
|---|
| 103 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the |
|---|
| 104 | Null-terminated UTF-16 characater sequence <EM>p</EM>.</P> |
|---|
| 105 | <PRE>template<class C, class T, class A> |
|---|
| 106 | u32regex make_u32regex(const std::basic_string<C, T, A>& s, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);</PRE> |
|---|
| 107 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the string <EM>s</EM>. |
|---|
| 108 | The character encoding of the string is determined based upon <CODE>sizeof(C)</CODE>: |
|---|
| 109 | 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.</P> |
|---|
| 110 | <PRE>u32regex make_u32regex(const UnicodeString& s, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);</PRE> |
|---|
| 111 | <P><STRONG>Effects:</STRONG> Creates a regular expression object from the UTF-16 |
|---|
| 112 | encoding string <EM>s</EM>.</P> |
|---|
| 113 | <H3><A name="algo"></A>Regular Expression Algorithms</H3> |
|---|
| 114 | <P>The regular expression algorithms <A href="regex_match.html">regex_match</A>, <A href="regex_search.html"> |
|---|
| 115 | regex_search</A> and <A href="regex_replace.html">regex_replace</A> all |
|---|
| 116 | expect that the character sequence upon which they operate, is encoded in the |
|---|
| 117 | same character encoding as the regular expression object with which they are |
|---|
| 118 | used. For Unicode regular expressions that behavior is undesirable: while |
|---|
| 119 | we may want to process the data in UTF-32 "chunks", the actual data is much |
|---|
| 120 | more likely to encoded as either UTF-8 or UTF-16. Therefore the header |
|---|
| 121 | <boost/regex/icu.hpp> provides a series of thin wrappers around these |
|---|
| 122 | algorithms, called u32regex_match, u32regex_search, and u32regex_replace. |
|---|
| 123 | These wrappers use iterator-adapters internally to make external UTF-8 or |
|---|
| 124 | UTF-16 data look as though it's really a UTF-32 sequence, that can then be |
|---|
| 125 | passed on to the "real" algorithm.</P> |
|---|
| 126 | <H4><A name="u32regex_match"></A>u32regex_match</H4> |
|---|
| 127 | <P>For each <A href="regex_match.html">regex_match</A> algorithm defined by |
|---|
| 128 | <boost/regex.hpp>, then <boost/regex/icu.hpp> defines an overloaded |
|---|
| 129 | algorithm that takes the same arguments, but which is called <EM>u32regex_match</EM>, |
|---|
| 130 | and which will accept UTF-8, UTF-16 or UTF-32 encoded data, as well as an |
|---|
| 131 | ICU UnicodeString as input.</P> |
|---|
| 132 | <P><STRONG>Example: </STRONG>match a password, encoded in a UTF-16 UnicodeString:</P> |
|---|
| 133 | <PRE>// |
|---|
| 134 | // Find out if *password* meets our password requirements, |
|---|
| 135 | // as defined by the regular expression *requirements*. |
|---|
| 136 | // |
|---|
| 137 | bool is_valid_password(const UnicodeString& password, const UnicodeString& requirements) |
|---|
| 138 | { |
|---|
| 139 | return boost::u32regex_match(password, boost::make_u32regex(requirements)); |
|---|
| 140 | } |
|---|
| 141 | </PRE> |
|---|
| 142 | <P> |
|---|
| 143 | <P><STRONG>Example: </STRONG>match a UTF-8 encoded filename:</P> |
|---|
| 144 | <PRE>// |
|---|
| 145 | // Extract filename part of a path from a UTF-8 encoded std::string and return the result |
|---|
| 146 | // as another std::string: |
|---|
| 147 | // |
|---|
| 148 | std::string get_filename(const std::string& path) |
|---|
| 149 | { |
|---|
| 150 | boost::u32regex r = boost::make_u32regex("(?:\\A|.*\\\\)([^\\\\]+)"); |
|---|
| 151 | boost::smatch what; |
|---|
| 152 | if(boost::u32regex_match(path, what, r)) |
|---|
| 153 | { |
|---|
| 154 | // extract $1 as a CString: |
|---|
| 155 | return what.str(1); |
|---|
| 156 | } |
|---|
| 157 | else |
|---|
| 158 | { |
|---|
| 159 | throw std::runtime_error("Invalid pathname"); |
|---|
| 160 | } |
|---|
| 161 | } |
|---|
| 162 | </PRE> |
|---|
| 163 | <H4><A name="u32regex_search"></A>u32regex_search</H4> |
|---|
| 164 | <P>For each <A href="regex_search.html">regex_search</A> algorithm defined by |
|---|
| 165 | <boost/regex.hpp>, then <boost/regex/icu.hpp> defines an overloaded |
|---|
| 166 | algorithm that takes the same arguments, but which is called <EM>u32regex_search</EM>, |
|---|
| 167 | and which will accept UTF-8, UTF-16 or UTF-32 encoded data, as well as an |
|---|
| 168 | ICU UnicodeString as input.</P> |
|---|
| 169 | <P><STRONG>Example: </STRONG>search for a character sequence in a specific |
|---|
| 170 | language block: |
|---|
| 171 | </P> |
|---|
| 172 | <PRE>UnicodeString extract_greek(const UnicodeString& text) |
|---|
| 173 | { |
|---|
| 174 | // searches through some UTF-16 encoded text for a block encoded in Greek, |
|---|
| 175 | // this expression is imperfect, but the best we can do for now - searching |
|---|
| 176 | // for specific scripts is actually pretty hard to do right. |
|---|
| 177 | // |
|---|
| 178 | // Here we search for a character sequence that begins with a Greek letter, |
|---|
| 179 | // and continues with characters that are either not-letters ( [^[:L*:]] ) |
|---|
| 180 | // or are characters in the Greek character block ( [\\x{370}-\\x{3FF}] ). |
|---|
| 181 | // |
|---|
| 182 | boost::u32regex r = boost::make_u32regex(L"[\\x{370}-\\x{3FF}](?:[^[:L*:]]|[\\x{370}-\\x{3FF}])*"); |
|---|
| 183 | boost::u16match what; |
|---|
| 184 | if(boost::u32regex_search(text, what, r)) |
|---|
| 185 | { |
|---|
| 186 | // extract $0 as a CString: |
|---|
| 187 | return UnicodeString(what[0].first, what.length(0)); |
|---|
| 188 | } |
|---|
| 189 | else |
|---|
| 190 | { |
|---|
| 191 | throw std::runtime_error("No Greek found!"); |
|---|
| 192 | } |
|---|
| 193 | }</PRE> |
|---|
| 194 | <H4><A name="u32regex_replace"></A>u32regex_replace</H4> |
|---|
| 195 | <P>For each <A href="regex_replace.html">regex_replace</A> algorithm defined by |
|---|
| 196 | <boost/regex.hpp>, then <boost/regex/icu.hpp> defines an overloaded |
|---|
| 197 | algorithm that takes the same arguments, but which is called <EM>u32regex_replace</EM>, |
|---|
| 198 | and which will accept UTF-8, UTF-16 or UTF-32 encoded data, as well as an |
|---|
| 199 | ICU UnicodeString as input. The input sequence and the format string |
|---|
| 200 | specifier passed to the algorithm, can be encoded independently (for example |
|---|
| 201 | one can be UTF-8, the other in UTF-16), but the result string / output iterator |
|---|
| 202 | argument must use the same character encoding as the text being searched.</P> |
|---|
| 203 | <P><STRONG>Example: </STRONG>Credit card number reformatting:</P> |
|---|
| 204 | <PRE>// |
|---|
| 205 | // Take a credit card number as a string of digits, |
|---|
| 206 | // and reformat it as a human readable string with "-" |
|---|
| 207 | // separating each group of four digit;, |
|---|
| 208 | // note that we're mixing a UTF-32 regex, with a UTF-16 |
|---|
| 209 | // string and a UTF-8 format specifier, and it still all |
|---|
| 210 | // just works: |
|---|
| 211 | // |
|---|
| 212 | const boost::u32regex e = boost::make_u32regex("\\A(\\d{3,4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\z"); |
|---|
| 213 | const char* human_format = "$1-$2-$3-$4"; |
|---|
| 214 | |
|---|
| 215 | UnicodeString human_readable_card_number(const UnicodeString& s) |
|---|
| 216 | { |
|---|
| 217 | return boost::u32regex_replace(s, e, human_format); |
|---|
| 218 | }</PRE> |
|---|
| 219 | <P> |
|---|
| 220 | <H2><A name="iterators"></A>Iterators</H2> |
|---|
| 221 | <H3><A name="u32regex_iterator"></A>u32regex_iterator</H3> |
|---|
| 222 | <P>Type u32regex_iterator is in all respects the same as <A href="regex_iterator.html"> |
|---|
| 223 | regex_iterator</A> except that since the regular expression type is always |
|---|
| 224 | u32regex it only takes one template parameter (the iterator type). It also |
|---|
| 225 | calls u32regex_search internally, allowing it to interface correctly with |
|---|
| 226 | UTF-8, UTF-16, and UTF-32 data:</P> |
|---|
| 227 | <PRE> |
|---|
| 228 | template <class BidirectionalIterator> |
|---|
| 229 | class u32regex_iterator |
|---|
| 230 | { |
|---|
| 231 | // for members see <A href="regex_iterator.html">regex_iterator</A> |
|---|
| 232 | }; |
|---|
| 233 | |
|---|
| 234 | typedef u32regex_iterator<const char*> utf8regex_iterator; |
|---|
| 235 | typedef u32regex_iterator<const UChar*> utf16regex_iterator; |
|---|
| 236 | typedef u32regex_iterator<const UChar32*> utf32regex_iterator; |
|---|
| 237 | </PRE> |
|---|
| 238 | <P>In order to simplify the construction of a u32regex_iterator from a string, |
|---|
| 239 | there are a series of non-member helper functions called |
|---|
| 240 | make_u32regex_iterator:</P> |
|---|
| 241 | <PRE> |
|---|
| 242 | u32regex_iterator<const char*> |
|---|
| 243 | make_u32regex_iterator(const char* s, |
|---|
| 244 | const u32regex& e, |
|---|
| 245 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 246 | |
|---|
| 247 | u32regex_iterator<const wchar_t*> |
|---|
| 248 | make_u32regex_iterator(const wchar_t* s, |
|---|
| 249 | const u32regex& e, |
|---|
| 250 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 251 | |
|---|
| 252 | u32regex_iterator<const UChar*> |
|---|
| 253 | make_u32regex_iterator(const UChar* s, |
|---|
| 254 | const u32regex& e, |
|---|
| 255 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 256 | |
|---|
| 257 | template <class charT, class Traits, class Alloc> |
|---|
| 258 | u32regex_iterator<typename std::basic_string<charT, Traits, Alloc>::const_iterator> |
|---|
| 259 | make_u32regex_iterator(const std::basic_string<charT, Traits, Alloc>& s, |
|---|
| 260 | const u32regex& e, |
|---|
| 261 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 262 | |
|---|
| 263 | u32regex_iterator<const UChar*> |
|---|
| 264 | make_u32regex_iterator(const UnicodeString& s, |
|---|
| 265 | const u32regex& e, |
|---|
| 266 | regex_constants::match_flag_type m = regex_constants::match_default);</PRE> |
|---|
| 267 | <P> |
|---|
| 268 | <P>Each of these overloads returns an iterator that enumerates all occurrences of |
|---|
| 269 | expression <EM>e</EM>, in text <EM>s</EM>, using match_flags <EM>m.</EM></P> |
|---|
| 270 | <P><STRONG>Example</STRONG>: search for international currency symbols, along with |
|---|
| 271 | their associated numeric value:</P> |
|---|
| 272 | <PRE> |
|---|
| 273 | void enumerate_currencies(const std::string& text) |
|---|
| 274 | { |
|---|
| 275 | // enumerate and print all the currency symbols, along |
|---|
| 276 | // with any associated numeric values: |
|---|
| 277 | const char* re = |
|---|
| 278 | "([[:Sc:]][[:Cf:][:Cc:][:Z*:]]*)?" |
|---|
| 279 | "([[:Nd:]]+(?:[[:Po:]][[:Nd:]]+)?)?" |
|---|
| 280 | "(?(1)" |
|---|
| 281 | "|(?(2)" |
|---|
| 282 | "[[:Cf:][:Cc:][:Z*:]]*" |
|---|
| 283 | ")" |
|---|
| 284 | "[[:Sc:]]" |
|---|
| 285 | ")"; |
|---|
| 286 | boost::u32regex r = boost::make_u32regex(re); |
|---|
| 287 | boost::u32regex_iterator<std::string::const_iterator> i(boost::make_u32regex_iterator(text, r)), j; |
|---|
| 288 | while(i != j) |
|---|
| 289 | { |
|---|
| 290 | std::cout << (*i)[0] << std::endl; |
|---|
| 291 | ++i; |
|---|
| 292 | } |
|---|
| 293 | }</PRE> |
|---|
| 294 | <P> |
|---|
| 295 | <P>Calling |
|---|
| 296 | </P> |
|---|
| 297 | <PRE>enumerate_currencies(" $100.23 or £198.12 ");</PRE> |
|---|
| 298 | <P>Yields the output:</P> |
|---|
| 299 | <PRE>$100.23<BR>£198.12</PRE> |
|---|
| 300 | <P>Provided of course that the input is encoded as UTF-8.</P> |
|---|
| 301 | <H3><A name="u32regex_token_iterator"></A>u32regex_token_iterator</H3> |
|---|
| 302 | <P>Type u32regex_token_iterator is in all respects the same as <A href="regex_token_iterator.html"> |
|---|
| 303 | regex_token_iterator</A> except that since the regular expression type is |
|---|
| 304 | always u32regex it only takes one template parameter (the iterator type). |
|---|
| 305 | It also calls u32regex_search internally, allowing it to interface correctly |
|---|
| 306 | with UTF-8, UTF-16, and UTF-32 data:</P> |
|---|
| 307 | <PRE>template <class BidirectionalIterator> |
|---|
| 308 | class u32regex_token_iterator |
|---|
| 309 | { |
|---|
| 310 | // for members see <A href="regex_token_iterator.html">regex_token_iterator</A> |
|---|
| 311 | }; |
|---|
| 312 | |
|---|
| 313 | typedef u32regex_token_iterator<const char*> utf8regex_token_iterator; |
|---|
| 314 | typedef u32regex_token_iterator<const UChar*> utf16regex_token_iterator; |
|---|
| 315 | typedef u32regex_token_iterator<const UChar32*> utf32regex_token_iterator; |
|---|
| 316 | </PRE> |
|---|
| 317 | <P>In order to simplify the construction of a u32regex_token_iterator from a |
|---|
| 318 | string, there are a series of non-member helper functions called |
|---|
| 319 | make_u32regex_token_iterator:</P> |
|---|
| 320 | <PRE> |
|---|
| 321 | u32regex_token_iterator<const char*> |
|---|
| 322 | make_u32regex_token_iterator(const char* s, |
|---|
| 323 | const u32regex& e, |
|---|
| 324 | int sub, |
|---|
| 325 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 326 | |
|---|
| 327 | u32regex_token_iterator<const wchar_t*> |
|---|
| 328 | make_u32regex_token_iterator(const wchar_t* s, |
|---|
| 329 | const u32regex& e, |
|---|
| 330 | int sub, |
|---|
| 331 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 332 | |
|---|
| 333 | u32regex_token_iterator<const UChar*> |
|---|
| 334 | make_u32regex_token_iterator(const UChar* s, |
|---|
| 335 | const u32regex& e, |
|---|
| 336 | int sub, |
|---|
| 337 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 338 | |
|---|
| 339 | template <class charT, class Traits, class Alloc> |
|---|
| 340 | u32regex_token_iterator<typename std::basic_string<charT, Traits, Alloc>::const_iterator> |
|---|
| 341 | make_u32regex_token_iterator(const std::basic_string<charT, Traits, Alloc>& s, |
|---|
| 342 | const u32regex& e, |
|---|
| 343 | int sub, |
|---|
| 344 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 345 | |
|---|
| 346 | u32regex_token_iterator<const UChar*> |
|---|
| 347 | make_u32regex_token_iterator(const UnicodeString& s, |
|---|
| 348 | const u32regex& e, |
|---|
| 349 | int sub, |
|---|
| 350 | regex_constants::match_flag_type m = regex_constants::match_default);</PRE> |
|---|
| 351 | <P> |
|---|
| 352 | <P>Each of these overloads returns an iterator that enumerates all occurrences of |
|---|
| 353 | marked sub-expression <EM>sub</EM> in regular expression <EM>e</EM>, found |
|---|
| 354 | in text <EM>s</EM>, using match_flags <EM>m.</EM></P> |
|---|
| 355 | <PRE> |
|---|
| 356 | template <std::size_t N> |
|---|
| 357 | u32regex_token_iterator<const char*> |
|---|
| 358 | make_u32regex_token_iterator(const char* p, |
|---|
| 359 | const u32regex& e, |
|---|
| 360 | const int (&submatch)[N], |
|---|
| 361 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 362 | |
|---|
| 363 | template <std::size_t N> |
|---|
| 364 | u32regex_token_iterator<const wchar_t*> |
|---|
| 365 | make_u32regex_token_iterator(const wchar_t* p, |
|---|
| 366 | const u32regex& e, |
|---|
| 367 | const int (&submatch)[N], |
|---|
| 368 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 369 | |
|---|
| 370 | template <std::size_t N> |
|---|
| 371 | u32regex_token_iterator<const UChar*> |
|---|
| 372 | make_u32regex_token_iterator(const UChar* p, |
|---|
| 373 | const u32regex& e, |
|---|
| 374 | const int (&submatch)[N], |
|---|
| 375 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 376 | |
|---|
| 377 | template <class charT, class Traits, class Alloc, std::size_t N> |
|---|
| 378 | u32regex_token_iterator<typename std::basic_string<charT, Traits, Alloc>::const_iterator> |
|---|
| 379 | make_u32regex_token_iterator(const std::basic_string<charT, Traits, Alloc>& p, |
|---|
| 380 | const u32regex& e, |
|---|
| 381 | const int (&submatch)[N], |
|---|
| 382 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 383 | |
|---|
| 384 | template <std::size_t N> |
|---|
| 385 | u32regex_token_iterator<const UChar*> |
|---|
| 386 | make_u32regex_token_iterator(const UnicodeString& s, |
|---|
| 387 | const u32regex& e, |
|---|
| 388 | const int (&submatch)[N], |
|---|
| 389 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 390 | </PRE> |
|---|
| 391 | <P>Each of these overloads returns an iterator that enumerates one sub-expression |
|---|
| 392 | for each <EM>submatch</EM> in regular expression <EM>e</EM>, found in |
|---|
| 393 | text <EM>s</EM>, using match_flags <EM>m.</EM></P> |
|---|
| 394 | <PRE> |
|---|
| 395 | u32regex_token_iterator<const char*> |
|---|
| 396 | make_u32regex_token_iterator(const char* p, |
|---|
| 397 | const u32regex& e, |
|---|
| 398 | const std::vector<int>& submatch, |
|---|
| 399 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 400 | |
|---|
| 401 | u32regex_token_iterator<const wchar_t*> |
|---|
| 402 | make_u32regex_token_iterator(const wchar_t* p, |
|---|
| 403 | const u32regex& e, |
|---|
| 404 | const std::vector<int>& submatch, |
|---|
| 405 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 406 | |
|---|
| 407 | u32regex_token_iterator<const UChar*> |
|---|
| 408 | make_u32regex_token_iterator(const UChar* p, |
|---|
| 409 | const u32regex& e, |
|---|
| 410 | const std::vector<int>& submatch, |
|---|
| 411 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 412 | |
|---|
| 413 | template <class charT, class Traits, class Alloc> |
|---|
| 414 | u32regex_token_iterator<typename std::basic_string<charT, Traits, Alloc>::const_iterator> |
|---|
| 415 | make_u32regex_token_iterator(const std::basic_string<charT, Traits, Alloc>& p, |
|---|
| 416 | const u32regex& e, |
|---|
| 417 | const std::vector<int>& submatch, |
|---|
| 418 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 419 | |
|---|
| 420 | u32regex_token_iterator<const UChar*> |
|---|
| 421 | make_u32regex_token_iterator(const UnicodeString& s, |
|---|
| 422 | const u32regex& e, |
|---|
| 423 | const std::vector<int>& submatch, |
|---|
| 424 | regex_constants::match_flag_type m = regex_constants::match_default); |
|---|
| 425 | </PRE> |
|---|
| 426 | <P>Each of these overloads returns an iterator that enumerates one sub-expression |
|---|
| 427 | for each <EM>submatch</EM> in regular expression <EM>e</EM>, found in |
|---|
| 428 | text <EM>s</EM>, using match_flags <EM>m.</EM></P> |
|---|
| 429 | <P><STRONG>Example</STRONG>: search for international currency symbols, along with |
|---|
| 430 | their associated numeric value:</P> |
|---|
| 431 | <PRE> |
|---|
| 432 | void enumerate_currencies2(const std::string& text) |
|---|
| 433 | { |
|---|
| 434 | // enumerate and print all the currency symbols, along |
|---|
| 435 | // with any associated numeric values: |
|---|
| 436 | const char* re = |
|---|
| 437 | "([[:Sc:]][[:Cf:][:Cc:][:Z*:]]*)?" |
|---|
| 438 | "([[:Nd:]]+(?:[[:Po:]][[:Nd:]]+)?)?" |
|---|
| 439 | "(?(1)" |
|---|
| 440 | "|(?(2)" |
|---|
| 441 | "[[:Cf:][:Cc:][:Z*:]]*" |
|---|
| 442 | ")" |
|---|
| 443 | "[[:Sc:]]" |
|---|
| 444 | ")"; |
|---|
| 445 | boost::u32regex r = boost::make_u32regex(re); |
|---|
| 446 | boost::u32regex_token_iterator<std::string::const_iterator> |
|---|
| 447 | i(boost::make_u32regex_token_iterator(text, r, 1)), j; |
|---|
| 448 | while(i != j) |
|---|
| 449 | { |
|---|
| 450 | std::cout << *i << std::endl; |
|---|
| 451 | ++i; |
|---|
| 452 | } |
|---|
| 453 | } |
|---|
| 454 | </PRE> |
|---|
| 455 | <P> |
|---|
| 456 | <HR> |
|---|
| 457 | <p>Revised |
|---|
| 458 | <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan --> |
|---|
| 459 | 05 Jan 2005 |
|---|
| 460 | <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p> |
|---|
| 461 | <p><i>© Copyright John Maddock 2005</i></p> |
|---|
| 462 | <P><I>Use, modification and distribution are subject to the Boost Software License, |
|---|
| 463 | Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A> |
|---|
| 464 | or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P> |
|---|
| 465 | </body> |
|---|
| 466 | </html> |
|---|
| 467 | |
|---|
| 468 | |
|---|