<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<html>
   <head>
      <title>Boost.Regex: Unicode Regular Expressions</title>
      <meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
      <link rel="stylesheet" type="text/css" href="../../../boost.css">
   </head>
   <body>
      <P>
         <TABLE id="Table1" cellSpacing="1" cellPadding="1" width="100%" border="0">
            <TR>
               <td valign="top" width="300">
                  <h3><a href="../../../index.htm"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
               </td>
               <TD width="353">
                  <H1 align="center">Boost.Regex</H1>
                  <H2 align="center">Unicode Regular Expressions</H2>
               </TD>
               <td width="50">
                  <h3><a href="index.html"><img height="45" width="43" alt="Boost.Regex Index" src="uarrow.gif" border="0"></a></h3>
               </td>
            </TR>
         </TABLE>
      </P>
      <HR>
      <p></p>
      <P>There are two ways to use Boost.Regex with Unicode strings:</P>
      <H3>Rely on wchar_t</H3>
      <P>If your platform's wchar_t type can hold Unicode characters, <EM>and</EM> your
         platform's C/C++ runtime correctly handles wide characters (when they are
         passed to std::iswspace, std::iswlower, etc.), then you can use boost::wregex to
         process Unicode. However, there are several disadvantages to this
         approach:</P>
      <UL>
         <LI>
            It's not portable: there's no guarantee on the width of wchar_t, or even
            on whether the runtime treats wide characters as Unicode at all. Most Windows
            compilers do so, but many Unix systems do not.</LI>
         <LI>
            There's no support for Unicode-specific character classes such as [[:Nd:]],
            [[:Po:]], etc.</LI>
         <LI>
            You can only search strings that are encoded as sequences of wide characters;
            it is not possible to search UTF-8, or even UTF-16 on many platforms.</LI></UL>
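      <P>The width concern in the first point can be checked at compile time. The
         following is a minimal sketch, using only the C++ standard library, that
         reports whether a single wchar_t is wide enough to hold any Unicode code
         point (up to U+10FFFF). The name wchar_holds_any_code_point is a
         hypothetical helper and not part of Boost.Regex. Passing this check says
         nothing about whether the runtime's std::iswspace and related functions
         treat wide characters as Unicode.</P>

```cpp
#include <limits>

// Hypothetical helper (not part of Boost.Regex): true when one wchar_t can
// represent every Unicode code point, i.e. every value up to U+10FFFF.
constexpr bool wchar_holds_any_code_point()
{
   // Compare as unsigned long long so the check is valid whether wchar_t
   // is a signed or an unsigned type on this platform.
   return static_cast<unsigned long long>(std::numeric_limits<wchar_t>::max())
          >= 0x10FFFFull;
}
```

      <P>On a platform with a 16-bit wchar_t the function yields false, which
         means code points above U+FFFF cannot be stored in a single wide
         character.</P>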
      <H3>Use a Unicode Aware Regular Expression Type</H3>
      <P>If you have the <A href="http://www.ibm.com/software/globalization/icu/">ICU
            library</A>, then Boost.Regex can be <A href="install.html#unicode">configured
            to make use of it</A> and provide a distinct regular expression type
         (boost::u32regex) that supports both Unicode-specific character properties
         and the searching of text that is encoded in UTF-8, UTF-16, or
         UTF-32. See: <A href="icu_strings.html">ICU string class support</A>.</P>
      <P>
         <HR>
      </P>
      <P></P>
      <p>Revised
         <!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->
         04 Jan 2005
         <!--webbot bot="Timestamp" endspan i-checksum="39359" --></p>
      <p><i>© Copyright John Maddock 2005</i></p>
      <P><I>Use, modification and distribution are subject to the Boost Software License,
            Version 1.0. (See accompanying file <A href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</A>
            or copy at <A href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</A>)</I></P>
   </body>
</html>