source: downloads/boost_1_34_1/libs/algorithm/string/doc/usage.xml @ 29

Last change on this file since 29 was 29, checked in by landauf, 17 years ago

updated boost from 1_33_1 to 1_34_1

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">

<!-- Copyright (c) 2002-2006 Pavol Droba.
     Subject to the Boost Software License, Version 1.0.
     (See accompanying file LICENSE-1.0 or  http://www.boost.org/LICENSE-1.0)
-->

<section id="string_algo.usage" last-revision="$Date: 2007/01/30 07:58:35 $">
    <title>Usage</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::algorithm"/>


    <section>
        <title>First Example</title>

        <para>
            Using the algorithms is straightforward. Let us have a look at the first example:
        </para>
        <programlisting>
    #include &lt;boost/algorithm/string.hpp&gt;
    using namespace std;
    using namespace boost;

    // ...

    string str1(" hello world! ");
    to_upper(str1);  // str1 == " HELLO WORLD! "
    trim(str1);      // str1 == "HELLO WORLD!"

    string str2=
       to_lower_copy(
          ireplace_first_copy(
             str1,"hello","goodbye")); // str2 == "goodbye world!"
        </programlisting>
        <para>
            This example converts str1 to upper case and trims spaces from the start and the end
            of the string. str2 is then created as a lower-case copy of str1 in which the first
            occurrence of "hello", matched case-insensitively, is replaced with "goodbye".
            This example demonstrates several important concepts used in the library:
        </para>
        <itemizedlist>
            <listitem>
                <para><emphasis role="bold">Container parameters:</emphasis>
                    Unlike the STL algorithms, the algorithms in this library do not take their
                    input only as a pair of iterators. The STL convention allows for great flexibility,
                    but it has limitations. Because a container is passed as two parameters, it is
                    not possible to <emphasis>stack</emphasis> algorithms together: the return value
                    of one algorithm cannot be passed directly to another. It is also considerably easier to write
                    <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>.
                </para>
                <para>
                    <ulink url="../../libs/range/index.html">Boost.Range</ulink>
                    provides a uniform way of handling different string types.
                    If there is a need to pass a pair of iterators,
                    <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>
                    can be used to package iterators into a structure with a compatible interface.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Copy vs. Mutable:</emphasis>
                    Many algorithms in the library transform their input.
                    The transformation can be done in place, mutating the input sequence, or a
                    transformed copy can be created, leaving the input intact. Neither
                    approach is superior; each has its own advantages and disadvantages.
                    For this reason, the library provides both.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Algorithm stacking:</emphasis>
                    Copy versions return the transformed input as their result and thus allow simple chaining of
                    transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>).
                    Mutable versions return <code>void</code> to avoid misuse.
                </para>
            </listitem>
            <listitem>
                <para><emphasis role="bold">Naming:</emphasis>
                    Naming follows the conventions of the Standard C++ Library. If there is a
                    copy and a mutable version of the same algorithm, the mutable version has no suffix
                    and the copy version has the suffix <emphasis>_copy</emphasis>.
                    Some algorithms have the prefix <emphasis>i</emphasis>
                    (e.g. <functionname>ifind_first()</functionname>).
                    This prefix indicates that the algorithm works in a case-insensitive manner.
                </para>
            </listitem>
        </itemizedlist>
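        <para>
            The split between copy and mutable versions, and the stacking it enables, can be
            illustrated with a small sketch that uses only the standard library. The names
            <code>upper_inplace</code>, <code>upper_copy</code> and <code>trim_copy_sketch</code>
            are hypothetical helpers invented for this illustration. They are not part of Boost
            and do not show how the library is implemented.
        </para>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Mutable variant: modifies its argument and returns void,
// so it cannot be nested inside another call by mistake.
void upper_inplace(std::string& s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
}

// Copy variant: takes the input by value, so the caller's string
// stays intact, and returns the transformed copy.
std::string upper_copy(std::string s)
{
    upper_inplace(s);
    return s;
}

// Copy variant of trimming: strips leading and trailing spaces.
std::string trim_copy_sketch(const std::string& s)
{
    const std::string::size_type first = s.find_first_not_of(' ');
    if (first == std::string::npos)
        return std::string();          // empty or spaces only
    const std::string::size_type last = s.find_last_not_of(' ');
    return s.substr(first, last - first + 1);
}
```

        <para>
            With these helpers, <code>trim_copy_sketch(upper_copy(s))</code> yields the transformed
            copy in one expression and leaves <code>s</code> untouched, whereas
            <code>upper_inplace(s)</code> changes <code>s</code> itself.
        </para>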
        <para>
            To use the library, include the <headername>boost/algorithm/string.hpp</headername> header.
            If the regex related functions are needed, include the
            <headername>boost/algorithm/string_regex.hpp</headername> header.
        </para>
    </section>
    <section>
        <title>Case conversion</title>

        <para>
            The STL has a nice way of converting character case. Unfortunately, it works only
            for a single character, and we want to convert a whole string:
        </para>
        <programlisting>
    string str1("HeLlO WoRld!");
    to_upper(str1); // str1=="HELLO WORLD!"
        </programlisting>
        <para>
            <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of
            characters in a string using a specified locale.
        </para>
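        <para>
            To make the role of the locale concrete, here is a minimal sketch of a whole-string
            conversion built on the single-character <code>std::toupper(c, loc)</code> from the
            standard <code>locale</code> header. The function name <code>to_upper_sketch</code>
            is hypothetical, and the code only approximates the idea; it is not the library's implementation.
        </para>

```cpp
#include <locale>
#include <string>

// Upper-case every character of s in place, using the given locale.
// std::toupper(c, loc) from <locale> converts one character at a time.
// When no locale is passed, a copy of the current global locale is used.
void to_upper_sketch(std::string& s, const std::locale& loc = std::locale())
{
    for (std::string::iterator it = s.begin(); it != s.end(); ++it)
        *it = std::toupper(*it, loc);
}
```

        <para>
            Calling <code>to_upper_sketch(s)</code> uses the global locale, while
            <code>to_upper_sketch(s, std::locale::classic())</code> forces the "C" locale.
        </para>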
        <para>
            For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Predicates and Classification</title>
        <para>
            A part of the library deals with string related predicates. Consider this example:
        </para>
        <programlisting>
    // Takes a const reference so that string literals and temporaries can be passed.
    bool is_executable( const string&amp; filename )
    {
        return
            iends_with(filename, ".exe") ||
            iends_with(filename, ".com");
    }

    // ...
    string str1("command.com");
    // The conditional expression is parenthesized because ?: binds less tightly than &lt;&lt;.
    cout
        &lt;&lt; str1
        &lt;&lt; (is_executable(str1)? " is ": " is not ")
        &lt;&lt; "an executable"
        &lt;&lt; endl; // prints "command.com is an executable"

    //..
    // Every character must be lower case; a space or punctuation mark would fail the test.
    char text1[]="hello";
    cout
        &lt;&lt; text1
        &lt;&lt; (all( text1, is_lower() )? " is": " is not")
        &lt;&lt; " written in the lower case"
        &lt;&lt; endl; // prints "hello is written in the lower case"
        </programlisting>
        <para>
            The predicates determine whether a substring is contained in the input string
            under various conditions: the string starts with the substring,
            ends with the substring, simply contains the substring, or both strings are equal.
            See the reference for
            <headername>boost/algorithm/string/predicate.hpp</headername> for more details.
        </para>
        <para>
            In addition, the algorithm <functionname>all()</functionname> checks
            whether all elements of a container satisfy a condition specified by a predicate.
            This predicate can be any unary predicate, but the library provides a set of
            useful string-related predicates and combinators ready for use.
            These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header.
            Classification predicates can be combined using logical combinators to form
            more complex expressions, for example <code>is_from_range('a','z') || is_digit()</code>.
        </para>
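        <para>
            What such a combined predicate checks can be sketched with the standard library alone.
            The function below behaves roughly like applying <code>all()</code> with
            <code>is_from_range('a','z') || is_digit()</code> to a string. The names
            <code>in_range_a_z</code>, <code>is_digit_char</code> and <code>all_lower_or_digit</code>
            are hypothetical, and the sketch ignores locales.
        </para>

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Character-level predicates.
inline bool in_range_a_z(char c) { return c >= 'a' && c <= 'z'; }
inline bool is_digit_char(char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; }

// True if every character is a letter from 'a' to 'z' or a digit.
// For an empty string the result is true, because no character fails the test.
bool all_lower_or_digit(const std::string& s)
{
    return std::all_of(s.begin(), s.end(),
                       [](char c) { return in_range_a_z(c) || is_digit_char(c); });
}
```

        <para>
            For instance, <code>all_lower_or_digit("abc123")</code> holds, while a string that
            contains a space, a punctuation mark or an upper-case letter fails the check.
        </para>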
    </section>
    <section>
        <title>Trimming</title>

        <para>
            When parsing the input from a user, strings usually have unwanted leading or trailing
            characters. To get rid of them, we need trim functions:
        </para>
        <programlisting>
    string str1="     hello world!     ";
    string str2=trim_left_copy(str1);   // str2 == "hello world!     "
    string str3=trim_right_copy(str1);  // str3 == "     hello world!"
    trim(str1);                         // str1 == "hello world!"

    string phone="00423333444";
    // remove leading 0 from the phone number
    trim_left_if(phone,is_any_of("0")); // phone == "423333444"
        </programlisting>
        <para>
            It is possible to trim the spaces on the right, on the left or on both sides of a string.
            For those cases when there is a need to remove something other than blank space, there
            are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which will
            select the <emphasis>space</emphasis> to be removed. It is possible to use classification
            predicates like <functionname>is_digit()</functionname> mentioned in the previous section.
            See the reference for <headername>boost/algorithm/string/trim.hpp</headername>.
        </para>
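        <para>
            The idea behind the <emphasis>_if</emphasis> variants, a functor that decides which
            characters count as removable, can be sketched using only the standard library.
            The names <code>trim_left_if_sketch</code> and <code>trim_right_if_sketch</code> are
            hypothetical; this is an illustration of the technique, not the library's code.
        </para>

```cpp
#include <algorithm>
#include <string>

// Remove leading characters for which pred returns true (in place).
template <typename Pred>
void trim_left_if_sketch(std::string& s, Pred pred)
{
    s.erase(s.begin(), std::find_if_not(s.begin(), s.end(), pred));
}

// Remove trailing characters for which pred returns true (in place).
// The search runs backwards; base() converts the reverse iterator into
// the position just past the last character that must be kept.
template <typename Pred>
void trim_right_if_sketch(std::string& s, Pred pred)
{
    s.erase(std::find_if_not(s.rbegin(), s.rend(), pred).base(), s.end());
}
```

        <para>
            Passing a predicate that accepts only <code>'0'</code> to
            <code>trim_left_if_sketch</code> strips the leading zeros of the phone number
            from the example above.
        </para>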
    </section>
    <section>
        <title>Find algorithms</title>

        <para>
            The library contains a set of find algorithms. Here is an example:
        </para>
        <programlisting>
    char text[]="hello dolly!";
    iterator_range&lt;char*&gt; result=find_last(text,"ll");

    transform( result.begin(), result.end(), result.begin(), bind2nd(plus&lt;char&gt;(), 1) );
    // text == "hello dommy!"

    to_upper(result); // text == "hello doMMy!"

    // iterator_range is convertible to bool
    // (text was modified above, so we search for what it contains now)
    if(find_first(text, "doMMy"))
    {
        cout &lt;&lt; "doMMy is there" &lt;&lt; endl;
    }
        </programlisting>
        <para>
            We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll".
            The result is given as a <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>.
            This range delimits the
            part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll".

            As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be
            a char[], because this type is supported by
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>.

            The following lines transform the result. Notice that
            <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> has the familiar
            <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container.
            It is also convertible to bool, so it is easy to use find algorithms for simple containment checking.
        </para>
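        <para>
            The notion of a find result that delimits part of the input, can be modified in place
            and can be tested for success may be easier to see in a standard-library-only sketch.
            Here a plain pair of iterators stands in for the range; <code>range_sketch</code>,
            <code>find_last_sketch</code> and <code>found</code> are hypothetical names, not Boost facilities.
        </para>

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <utility>

// A pair of iterators [first, second) stands in for the found range.
typedef std::pair<std::string::iterator, std::string::iterator> range_sketch;

// Find the last occurrence of pattern in text.
// Returns the range of the match, or an empty range at text.end().
range_sketch find_last_sketch(std::string& text, const std::string& pattern)
{
    std::string::iterator b =
        std::find_end(text.begin(), text.end(), pattern.begin(), pattern.end());
    if (b == text.end() || pattern.empty())
        return range_sketch(text.end(), text.end());
    return range_sketch(b, b + pattern.size());
}

// A range counts as "found" when it is not empty.
bool found(const range_sketch& r) { return r.first != r.second; }
```

        <para>
            Because the returned iterators point into the original string, writing through them
            (for example with <code>std::transform</code>) changes the string itself.
        </para>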
        <para>
            Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Replace Algorithms</title>
        <para>
            Find algorithms can be used for searching for a specific part of a string. Replace goes one step
            further. After a matching part is found, it is substituted with something else. The substitution is computed
            from the original, using some transformation.
        </para>
        <programlisting>
    string str1="Hello  Dolly,   Hello World!";
    replace_first(str1, "Dolly", "Jane");      // str1 == "Hello  Jane,   Hello World!"
    replace_last(str1, "Hello", "Goodbye");    // str1 == "Hello  Jane,   Goodbye World!"
    erase_all(str1, " ");                      // str1 == "HelloJane,GoodbyeWorld!"
    erase_head(str1, 5);                       // str1 == "Jane,GoodbyeWorld!"
        </programlisting>
        <para>
            For the complete list of replace and erase functions see the
            <link linkend="string_algo.reference">reference</link>.
            There are many predefined functions for common use; in addition, the library allows you to
            define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname>
            function which, besides the input, takes two parameters.
            The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is
            a <link linkend="string_algo.formatter_concept">Formatter</link> object.
            The Finder object is a functor which searches for the part to be replaced. The Formatter object
            takes the result of the Finder (usually a reference to the found substring) and creates a
            substitute for it. The replace algorithm puts these two together and makes the desired substitution.
        </para>
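        <para>
            The division of labour between a Finder and a Formatter can be sketched with the
            standard library alone. In this illustration the callable shapes, the
            <code>match_sketch</code> typedef and the function <code>find_format_sketch</code>
            are all assumptions made for the example; the real concepts are defined in the
            library reference and may differ in detail.
        </para>

```cpp
#include <algorithm>
#include <string>
#include <utility>

// A pair of iterators [first, second) describing the found part.
typedef std::pair<std::string::iterator, std::string::iterator> match_sketch;

// Finder and Formatter are any callables with these shapes (assumed for this sketch):
//   match_sketch finder(std::string::iterator begin, std::string::iterator end);
//   std::string  formatter(const match_sketch& match);
// The function replaces the first match reported by finder with the formatter's output.
template <typename Finder, typename Formatter>
void find_format_sketch(std::string& input, Finder finder, Formatter formatter)
{
    const match_sketch m = finder(input.begin(), input.end());
    if (m.first == m.second)
        return;                                   // nothing found, input stays as it is
    const std::string replacement = formatter(m); // substitute computed from the match
    input.replace(m.first, m.second, replacement);
}
```

        <para>
            A finder built on <code>std::search</code> combined with a formatter that returns a
            fixed string gives the behaviour of a plain "replace first"; a formatter that inspects
            the match can instead compute the substitute from the text it replaces.
        </para>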
        <para>
            Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and
            <headername>boost/algorithm/string/find_format.hpp</headername> for reference.
        </para>
    </section>
    <section>
        <title>Find Iterator</title>

        <para>
            An extension to the find algorithms is the Find Iterator. Instead of searching for just one part of a string,
            the find iterator allows us to iterate over the substrings matching the specified criteria.
            This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally
            search the string.
            Dereferencing a find iterator yields a <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>
            object that delimits the current match.
        </para>
        <para>
            There are two iterators provided: <classname>find_iterator</classname> and
            <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified
            Finder. The latter iterates over the gaps between these substrings.
        </para>
        <programlisting>
    string str1("abc-*-ABC-*-aBc");
    // Find all 'abc' substrings (ignoring the case)
    // Create a find_iterator
    typedef find_iterator&lt;string::iterator&gt; string_find_iterator;
    for(string_find_iterator It=
            make_find_iterator(str1, first_finder("abc", is_iequal()));
        It!=string_find_iterator();
        ++It)
    {
        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc

    // Iterate over the parts between the "-*-" separators
    typedef split_iterator&lt;string::iterator&gt; string_split_iterator;
    for(string_split_iterator It=
            make_split_iterator(str1, first_finder("-*-", is_iequal()));
        It!=string_split_iterator();
        ++It)
    {
        cout &lt;&lt; copy_range&lt;std::string&gt;(*It) &lt;&lt; endl;
    }

    // Output will be:
    // abc
    // ABC
    // aBc
        </programlisting>
        <para>
            Note that the find iterators have only one template parameter: the base iterator type.
            The Finder is specified at runtime. This allows us to typedef a find iterator for
            common string types and reuse it. Additionally, the make_*_iterator functions help
            to construct a find iterator for a particular range.
        </para>
        <para>
            See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>.
        </para>
    </section>
    <section>
        <title>Split</title>

        <para>
            Split algorithms are an extension to the find iterator for one common usage scenario.
            These algorithms use a find iterator and store all matches into the provided
            container. This container must be able to hold copies (e.g. <code>std::string</code>) or
            references (e.g. <code>iterator_range</code>) of the extracted substrings.
        </para>
        <para>
            Two algorithms are provided. <functionname>find_all()</functionname> finds all copies
            of a string in the input. <functionname>split()</functionname> splits the input into parts.
        </para>

        <programlisting>
    string str1("hello abc-*-ABC-*-aBc goodbye");

    typedef vector&lt; iterator_range&lt;string::iterator&gt; &gt; find_vector_type;

    find_vector_type FindVec; // #1: Find all occurrences of "abc", ignoring case
    ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] }

    typedef vector&lt; string &gt; split_vector_type;

    split_vector_type SplitVec; // #2: Split at the separator characters
    // token_compress_on merges adjacent separators, so no empty parts are produced
    split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" }
        </programlisting>
        <para>
            <code>[abc]</code> designates an <code>iterator_range</code> delimiting this substring.
        </para>
        <para>
            The first example shows how to construct a container to hold references to all extracted
            substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references
            to all substrings that are equal to "abc" when compared case-insensitively.
        </para>
        <para>
            The second example uses <functionname>split()</functionname> to split string str1 into parts
            separated by the characters '-' or '*'. These parts are then put into SplitVec.
            It is possible to specify whether adjacent separators are merged into one or not. Here
            <code>token_compress_on</code> merges them; without it, each pair of adjacent separators
            would produce an empty part.
        </para>
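        <para>
            The difference between merging adjacent separators and treating each one separately
            can be shown with a small standard-library-only sketch. The function
            <code>split_sketch</code> and its <code>compress</code> flag are hypothetical and only
            mimic the idea; they are not the library's implementation.
        </para>

```cpp
#include <string>
#include <vector>

// Split text at any character contained in seps.
// With compress == true, a run of adjacent separators counts as one.
std::vector<std::string> split_sketch(const std::string& text,
                                      const std::string& seps,
                                      bool compress)
{
    std::vector<std::string> parts;
    std::string::size_type start = 0;
    for (;;)
    {
        const std::string::size_type pos = text.find_first_of(seps, start);
        if (pos == std::string::npos)
        {
            parts.push_back(text.substr(start));   // last part
            return parts;
        }
        parts.push_back(text.substr(start, pos - start));
        start = pos + 1;
        if (compress)
        {
            // skip the rest of the separator run
            start = text.find_first_not_of(seps, start);
            if (start == std::string::npos)
                start = text.size();
        }
    }
}
```

        <para>
            Applied to "hello abc-*-ABC-*-aBc goodbye" with the separators "-*", the sketch returns
            three parts when <code>compress</code> is true and seven parts, four of them empty, when it is false.
        </para>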
        <para>
            More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>.
        </para>
   </section>
</section>