<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">
<section id="string_algo.usage" last-revision="$Date: 2005/12/01 13:42:02 $">
    <title>Usage</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::algorithm"/>


    <section>
        <title>First Example</title>

        <para>
            Using the algorithms is straightforward. Let us have a look at the first example:
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string.hpp&gt;
    using namespace std;
    using namespace boost;

    // ...

    string str1(" hello world! ");
    to_upper(str1);  // str1 == " HELLO WORLD! "
    trim(str1);      // str1 == "HELLO WORLD!"

    string str2=
       to_lower_copy(
          ireplace_first_copy(
             str1,"hello","goodbye")); // str2 == "goodbye world!"
        </programlisting>
        <para>
            This example converts str1 to upper case and trims spaces from the start and the end
            of the string. str2 is then created as a copy of str1 in which "hello" is replaced with "goodbye"
            (ignoring case), and the result is converted to lower case.
            This example demonstrates several important concepts used in the library:
        </para>
        <itemizedlist>
            <listitem>
                <para><emphasis role="bold">Container parameters:</emphasis>
                    Unlike the STL algorithms, the algorithms in this library do not take their input
                    only as a pair of iterators. The STL convention allows for great flexibility,
                    but it has several limitations. It is not possible to <emphasis>stack</emphasis> algorithms together,
                    because a container is passed in two parameters, so the return value of one algorithm
                    cannot be passed directly to another. It is also considerably easier to write
                    <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>.
                </para>
                <para>
                    The magic of <ulink url="../../libs/range/index.html">Boost.Range</ulink>
                    provides a uniform way of handling different string types.
                    If there is a need to pass a pair of iterators,
                    <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>
                    can be used to package iterators into a structure with a compatible interface.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
                    Many algorithms in the library perform a transformation of the input.
                    The transformation can be done in-place, mutating the input sequence, or a copy
                    of the transformed input can be created, leaving the input intact. Neither of
                    these possibilities is superior to the other; each has its own
                    advantages and disadvantages. For this reason, the library provides both.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Algorithm stacking:</emphasis>
                    Copy versions return the transformed input as a result, which allows simple chaining of
                    transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>).
                    Mutable versions have a <code>void</code> return type, to avoid misuse.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Naming:</emphasis>
                    Naming follows the conventions of the Standard C++ Library. If there is both a
                    copy and a mutable version of the same algorithm, the mutable version has no suffix
                    and the copy version has the suffix <emphasis>_copy</emphasis>.
                    Some algorithms have the prefix <emphasis>i</emphasis>
                    (e.g. <functionname>ifind_first()</functionname>).
                    This prefix indicates that the algorithm works in a case-insensitive manner.
                </para>
            </listitem>
        </itemizedlist>
        <para>
            To use the library, include the <headername>boost/algorithm/string.hpp</headername> header.
            If the regex-related functions are needed, include the
            <headername>boost/algorithm/string_regex.hpp</headername> header.
        </para>
    </section>
    <section>
        <title>Case conversion</title>

        <para>
            STL has a nice way of converting character case. Unfortunately, it works only
            for a single character, and we want to convert a whole string:
        </para>
        <programlisting>
    string str1("HeLlO WoRld!");
    to_upper(str1); // str1=="HELLO WORLD!"
        </programlisting>
        <para>
            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of
            characters in a string using a specified locale. If no locale is given, a default-constructed
            <code>std::locale</code> (a copy of the global locale) is used.
        </para>
        <para>
            For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Predicates and Classification</title>
        <para>
            A part of the library deals with string-related predicates. Consider this example:
        </para>
        <programlisting>
    bool is_executable( const string&amp; filename )
    {
        return
            iends_with(filename, ".exe") ||
            iends_with(filename, ".com");
    }

    // ...
    string str1("command.com");
    cout
        &lt;&lt; str1
        &lt;&lt; (is_executable(str1)? " is": " is not")
        &lt;&lt; " an executable"
        &lt;&lt; endl; // prints "command.com is an executable"

    //..
    char text1[]="hello world!";
    // ' ' and '!' are not lower-case letters, so they are accepted explicitly
    cout
        &lt;&lt; text1
        &lt;&lt; (all( text1, is_lower() || is_space() || is_punct() )? " is": " is not")
        &lt;&lt; " written in the lower case"
        &lt;&lt; endl; // prints "hello world! is written in the lower case"
        </programlisting>
        <para>
            The predicates determine whether a substring is contained in the input string
            under various conditions. The conditions are: the string starts with the substring,
            ends with the substring,
            simply contains the substring, or both strings are equal. See the reference for
            <headername>boost/algorithm/string/predicate.hpp</headername> for more details.
        </para>
        <para>
            In addition, the algorithm <functionname>all()</functionname> checks
            whether all elements of a container satisfy a condition specified by a predicate.
            This predicate can be any unary predicate, but the library provides a number of
            useful string-related predicates and combinators ready for use.
            These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header.
            Classification predicates can be combined using logical combinators to form
            more complex expressions, for example <code>is_from_range('a','z') || is_digit()</code>.
        </para>
    </section>
    <section>
        <title>Trimming</title>

        <para>
            When parsing input from a user, strings often have unwanted leading or trailing
            characters. To get rid of them, we need trim functions:
        </para>
        <programlisting>
    string str1="     hello world!     ";
    string str2=trim_left_copy(str1);   // str2 == "hello world!     "
    string str3=trim_right_copy(str1);  // str3 == "     hello world!"
    trim(str1);                         // str1 == "hello world!"

    string phone="00423333444";
    // remove leading zeros from the phone number
    trim_left_if(phone,is_any_of("0")); // phone == "423333444"
        </programlisting>
        <para>
            It is possible to trim the spaces on the right, on the left or on both sides of a string.
            For cases when something other than blank space needs to be removed, there
            are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor that
            selects the <emphasis>space</emphasis> to be removed. Classification predicates
            like <functionname>is_digit()</functionname>, mentioned in the previous section, can be used here.
            See the reference for <headername>boost/algorithm/string/trim.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Find algorithms</title>

        <para>
            The library contains a set of find algorithms. Here is an example:
        </para>
        <programlisting>
    char text[]="hello dolly!";
    iterator_range&lt;char*&gt; result=find_last(text,"ll");

    transform( result.begin(), result.end(), result.begin(), bind2nd(plus&lt;char&gt;(), 1) );
    // text == "hello dommy!"

    to_upper(result); // text == "hello doMMy!"

    // iterator_range is convertible to bool
    // (text no longer contains "dolly" at this point)
    if(find_first(text, "doMMy"))
    {
        cout &lt;&lt; "doMMy is there" &lt;&lt; endl;
    }
        </programlisting>
        <para>
            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
            The result is returned as a <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>.
            This range delimits the
            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".

            As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be a
            char[], because this type is supported by
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>.

            The following lines transform the result. Because the range refers into <code>text</code>,
            these modifications change <code>text</code> itself. Notice that
            <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> has the familiar
            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
            It is also convertible to bool, which makes the find algorithms easy to use for simple containment checks.
        </para>
        <para>
            Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Replace Algorithms</title>
        <para>
            Find algorithms can be used for searching for a specific part of a string. Replace goes one step
            further. After a matching part is found, it is substituted with something else. The substitution is computed
            from the original, using some transformation.
        </para>
        <programlisting>
    string str1="Hello  Dolly,   Hello World!";
    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello  Jane,   Hello World!"
    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello  Jane,   Goodbye World!"
    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 5);                       // str1 == "Jane,GoodbyeWorld!"
        </programlisting>
        <para>
            For the complete list of replace and erase functions see the
            <link linkend="string_algo.reference">reference</link>.
            There are many predefined functions for common usage; however, the library also allows you to
            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname>
            function which takes, in addition to the input, two parameters.
            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is
            a <link linkend="string_algo.formatter_concept">Formatter</link> object.
            The Finder object is a functor which searches for the part to be replaced. The Formatter object
            takes the result of the Finder (usually a reference to the found substring) and creates a
            substitute for it. The replace algorithm puts these two together and makes the desired substitution.
        </para>
        <para>
            Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and
            <headername>boost/algorithm/string/find_format.hpp</headername> for reference.
        </para>
    </section>
    <section>
        <title>Find Iterator</title>

        <para>
            An extension to the find algorithms is the Find Iterator. Instead of searching for just one part of a string,
            the find iterator allows us to iterate over the substrings matching the specified criteria.
            This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally
            search the string.
            Dereferencing a find iterator yields a <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>
            object that delimits the current match.
        </para>
        <para>
            Two iterators are provided: <classname>find_iterator</classname> and
            <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified
            Finder. The latter iterates over the gaps between these substrings.
        </para>
        <programlisting>
    string str1("abc-*-ABC-*-aBc");
    // Find all 'abc' substrings (ignoring the case)
    // Create a find_iterator
    typedef find_iterator&lt;string::iterator&gt; string_find_iterator;
    for(string_find_iterator It=
            make_find_iterator(str1, first_finder("abc", is_iequal()));
        It!=string_find_iterator();
        ++It)
    {
        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc

    // Split the string at the '-*-' separators
    // Create a split_iterator
    typedef split_iterator&lt;string::iterator&gt; string_split_iterator;
    for(string_split_iterator It=
        make_split_iterator(str1, first_finder("-*-", is_iequal()));
        It!=string_split_iterator();
        ++It)
    {
        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc
        </programlisting>
        <para>
            Note that the find iterators have only one template parameter: the base iterator type.
            The Finder is specified at runtime. This allows us to typedef a find iterator for
            common string types and reuse it. Additionally, the make_*_iterator functions help
            to construct a find iterator for a particular range.
        </para>
        <para>
            See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Split</title>

        <para>
            Split algorithms are an extension to the find iterator for one common usage scenario.
            These algorithms use a find iterator and store all matches into the provided
            container. This container must be able to hold copies (e.g. <code>std::string</code>) or
            references (e.g. <code>iterator_range</code>) of the extracted substrings.
        </para>
        <para>
            Two algorithms are provided. <functionname>find_all()</functionname> finds all occurrences
            of a string in the input. <functionname>split()</functionname> splits the input into parts.
        </para>

        <programlisting>
    string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef vector&lt; iterator_range&lt;string::iterator&gt; &gt; find_vector_type;

    find_vector_type FindVec; // #1: Search for all "abc" substrings
    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

    typedef vector&lt; string &gt; split_vector_type;

    split_vector_type SplitVec; // #2: Search for tokens
    split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
        </programlisting>
        <para>
            In the comments above, a bracketed string such as <code>[abc]</code> designates an
            <code>iterator_range</code> delimiting that substring.
        </para>
        <para>
            The first example shows how to construct a container to hold references to all extracted
            substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references
            to all substrings that are equal to "abc" in a case-insensitive manner.
        </para>
        <para>
            The second example uses <functionname>split()</functionname> to split the string str1 into parts
            separated by the characters '-' or '*'. These parts are then put into SplitVec.
            It is possible to specify whether adjacent separators are concatenated or not:
            <code>token_compress_on</code> treats a run of adjacent separators as a single one, whereas by default
            an empty token is produced between each pair of adjacent separators.
        </para>
        <para>
            More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>.
        </para>
   </section>
</section>