1 | <?xml version="1.0" encoding="utf-8"?> |
---|
2 | <!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN" |
---|
3 | "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"> |
---|
4 | |
---|
5 | <!-- Copyright (c) 2002-2006 Pavol Droba. |
---|
6 | Subject to the Boost Software License, Version 1.0. |
---|
7 | (See accompanying file LICENSE-1.0 or http://www.boost.org/LICENSE-1.0) |
---|
8 | --> |
---|
9 | |
---|
10 | <section id="string_algo.usage" last-revision="$Date: 2007/01/30 07:58:35 $"> |
---|
11 | <title>Usage</title> |
---|
12 | |
---|
13 | <using-namespace name="boost"/> |
---|
14 | <using-namespace name="boost::algorithm"/> |
---|
15 | |
---|
16 | |
---|
17 | <section> |
---|
18 | <title>First Example</title> |
---|
19 | |
---|
20 | <para> |
---|
21 | Using the algorithms is straightforward. Let us have a look at the first example: |
---|
22 | </para> |
---|
23 | <programlisting> |
---|
24 | #include <boost/algorithm/string.hpp> |
---|
25 | using namespace std; |
---|
26 | using namespace boost; |
---|
27 | |
---|
28 | // ... |
---|
29 | |
---|
30 | string str1(" hello world! "); |
---|
31 | to_upper(str1); // str1 == " HELLO WORLD! " |
---|
32 | trim(str1); // str1 == "HELLO WORLD!" |
---|
33 | |
---|
34 | string str2= |
---|
35 | to_lower_copy( |
---|
36 | ireplace_first_copy( |
---|
37 | str1,"hello","goodbye")); // str2 == "goodbye world!" |
---|
38 | </programlisting> |
---|
39 | <para> |
---|
40 | This example converts str1 to upper case and trims spaces from the start and the end |
---|
41 | of the string. str2 is then created as a lower-case copy of str1 with "hello" replaced by "goodbye" (the search is case-insensitive). |
---|
42 | This example demonstrates several important concepts used in the library: |
---|
43 | </para> |
---|
44 | <itemizedlist> |
---|
45 | <listitem> |
---|
46 | <para><emphasis role="bold">Container parameters:</emphasis> |
---|
47 | Unlike the STL algorithms, these algorithms do not take their parameters only in the form |
---|
48 | of iterators. The STL convention allows for great flexibility, |
---|
49 | but it has several limitations. It is not possible to <emphasis>stack</emphasis> algorithms together, |
---|
50 | because a container is passed as two parameters, so the return value of one algorithm |
---|
51 | cannot be used directly as the input of another. It is also considerably easier to write |
---|
52 | <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>. |
---|
53 | </para> |
---|
54 | <para> |
---|
55 | The magic of <ulink url="../../libs/range/index.html">Boost.Range</ulink> |
---|
56 | provides a uniform way of handling different string types. |
---|
57 | If there is a need to pass a pair of iterators, |
---|
58 | <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> |
---|
59 | can be used to package iterators into a structure with a compatible interface. |
---|
60 | </para> |
---|
61 | </listitem> |
---|
62 | <listitem> |
---|
63 | <para><emphasis role="bold">Copy vs. Mutable:</emphasis> |
---|
64 | Many algorithms in the library perform a transformation of the input. |
---|
65 | The transformation can be done in-place, mutating the input sequence, or a copy |
---|
66 | of the transformed input can be created, leaving the input intact. Neither of |
---|
67 | these possibilities is superior to the other; each has its own |
---|
68 | advantages and disadvantages. For this reason, both are provided by the library. |
---|
69 | </para> |
---|
70 | </listitem> |
---|
71 | <listitem> |
---|
72 | <para><emphasis role="bold">Algorithm stacking:</emphasis> |
---|
73 | Copy versions return the transformed input as a result, and thus allow simple chaining of |
---|
74 | transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>). |
---|
75 | Mutable versions return <code>void</code> to avoid misuse. |
---|
76 | </para> |
---|
77 | </listitem> |
---|
78 | <listitem> |
---|
79 | <para><emphasis role="bold">Naming:</emphasis> |
---|
80 | Naming follows the conventions from the Standard C++ Library. If there is a |
---|
81 | copy and a mutable version of the same algorithm, the mutable version has no suffix |
---|
82 | and the copy version has the suffix <emphasis>_copy</emphasis>. |
---|
83 | Some algorithms have the prefix <emphasis>i</emphasis> |
---|
84 | (e.g. <functionname>ifind_first()</functionname>). |
---|
85 | This prefix indicates that the algorithm works in a case-insensitive manner. |
---|
86 | </para> |
---|
87 | </listitem> |
---|
88 | </itemizedlist> |
---|
89 | <para> |
---|
90 | To use the library, include the <headername>boost/algorithm/string.hpp</headername> header. |
---|
91 | If the regex related functions are needed, include the |
---|
92 | <headername>boost/algorithm/string_regex.hpp</headername> header. |
---|
93 | </para> |
---|
94 | </section> |
---|
95 | <section> |
---|
96 | <title>Case conversion</title> |
---|
97 | |
---|
98 | <para> |
---|
99 | The STL has a nice way of converting character case. Unfortunately, it works only |
---|
100 | for a single character, whereas we want to convert a whole string: |
---|
101 | </para> |
---|
102 | <programlisting> |
---|
103 | string str1("HeLlO WoRld!"); |
---|
104 | to_upper(str1); // str1=="HELLO WORLD!" |
---|
105 | </programlisting> |
---|
106 | <para> |
---|
107 | <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of |
---|
108 | characters in a string using a specified locale. |
---|
109 | </para> |
---|
110 | <para> |
---|
111 | For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>. |
---|
112 | </para> |
---|
113 | </section> |
---|
114 | <section> |
---|
115 | <title>Predicates and Classification</title> |
---|
116 | <para> |
---|
117 | Part of the library deals with string-related predicates. Consider this example: |
---|
118 | </para> |
---|
119 | <programlisting> |
---|
120 | bool is_executable( const string& filename ) |
---|
121 | { |
---|
122 | return |
---|
123 | iends_with(filename, ".exe") || |
---|
124 | iends_with(filename, ".com"); |
---|
125 | } |
---|
126 | |
---|
127 | // ... |
---|
128 | string str1("command.com"); |
---|
129 | cout |
---|
130 | << str1 |
---|
131 | << (is_executable(str1)? " is ": " is not ") |
---|
132 | << "an executable" |
---|
133 | << endl; // prints "command.com is an executable" |
---|
134 | |
---|
135 | //.. |
---|
136 | char text1[]="hello world!"; |
---|
137 | cout |
---|
138 | << text1 |
---|
139 | << (all( text1, is_lower() )? " is": " is not") |
---|
140 | << " written in the lower case" |
---|
141 | << endl; // prints "hello world! is not written in the lower case": ' ' and '!' do not satisfy is_lower() |
---|
142 | </programlisting> |
---|
143 | <para> |
---|
144 | The predicates determine whether a substring is contained in the input string |
---|
145 | under various conditions. The conditions are: the string starts with the substring, |
---|
146 | ends with the substring, |
---|
147 | simply contains the substring, or both strings are equal. See the reference for |
---|
148 | <headername>boost/algorithm/string/predicate.hpp</headername> for more details. |
---|
149 | </para> |
---|
150 | <para> |
---|
151 | In addition, the algorithm <functionname>all()</functionname> checks whether |
---|
152 | all elements of a container satisfy a condition specified by a predicate. |
---|
153 | This predicate can be any unary predicate, but the library provides a set of |
---|
154 | useful string-related predicates and combinators ready for use. |
---|
155 | These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header. |
---|
156 | Classification predicates can be combined using logical combinators to form |
---|
157 | more complex expressions. For example: <code>is_from_range('a','z') || is_digit()</code> |
---|
158 | </para> |
---|
159 | </section> |
---|
160 | <section> |
---|
161 | <title>Trimming</title> |
---|
162 | |
---|
163 | <para> |
---|
164 | When parsing the input from a user, strings usually have unwanted leading or trailing |
---|
165 | characters. To get rid of them, we need trim functions: |
---|
166 | </para> |
---|
167 | <programlisting> |
---|
168 | string str1=" hello world! "; |
---|
169 | string str2=trim_left_copy(str1); // str2 == "hello world! " |
---|
170 | string str3=trim_right_copy(str1); // str3 == " hello world!" |
---|
171 | trim(str1); // str1 == "hello world!" |
---|
172 | |
---|
173 | string phone="00423333444"; |
---|
174 | // remove leading zeros from the phone number |
---|
175 | trim_left_if(phone,is_any_of("0")); // phone == "423333444" |
---|
176 | </programlisting> |
---|
177 | <para> |
---|
178 | It is possible to trim the spaces on the right, on the left or on both sides of a string. |
---|
179 | For those cases when something other than blank space needs to be removed, there |
---|
180 | are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which will |
---|
181 | select the <emphasis>space</emphasis> to be removed. It is possible to use classification |
---|
182 | predicates like <functionname>is_digit()</functionname> mentioned in the previous section. |
---|
183 | See the reference for the <headername>boost/algorithm/string/trim.hpp</headername>. |
---|
184 | </para> |
---|
185 | </section> |
---|
186 | <section> |
---|
187 | <title>Find algorithms</title> |
---|
188 | |
---|
189 | <para> |
---|
190 | The library contains a set of find algorithms. Here is an example: |
---|
191 | </para> |
---|
192 | <programlisting> |
---|
193 | char text[]="hello dolly!"; |
---|
194 | iterator_range<char*> result=find_last(text,"ll"); |
---|
195 | |
---|
196 | transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) ); |
---|
197 | // text == "hello dommy!" |
---|
198 | |
---|
199 | to_upper(result); // text == "hello doMMy!" |
---|
200 | |
---|
201 | // iterator_range is convertible to bool; text is now "hello doMMy!" |
---|
202 | if(find_first(text, "doMMy")) |
---|
203 | { |
---|
204 | cout << "doMMy is there" << endl; |
---|
205 | } |
---|
206 | </programlisting> |
---|
207 | <para> |
---|
208 | We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll". |
---|
209 | The result is given in the <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>. |
---|
210 | This range delimits the |
---|
211 | part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll". |
---|
212 | |
---|
213 | As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be a |
---|
214 | char[] because this type is supported by |
---|
215 | <ulink url="../../libs/range/index.html">Boost.Range</ulink>. |
---|
216 | |
---|
217 | The following lines transform the result. Notice that |
---|
218 | <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> has familiar |
---|
219 | <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container. |
---|
220 | It is also convertible to bool, so find algorithms can easily be used for simple containment checks. |
---|
221 | </para> |
---|
222 | <para> |
---|
223 | Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>. |
---|
224 | </para> |
---|
225 | </section> |
---|
226 | <section> |
---|
227 | <title>Replace Algorithms</title> |
---|
228 | <para> |
---|
229 | Find algorithms can be used for searching for a specific part of a string. Replace goes one step |
---|
230 | further. After a matching part is found, it is substituted with something else. The substitution is computed |
---|
231 | from the original, using some transformation. |
---|
232 | </para> |
---|
233 | <programlisting> |
---|
234 | string str1="Hello Dolly, Hello World!"; |
---|
235 | replace_first(str1, "Dolly", "Jane"); // str1 == "Hello Jane, Hello World!" |
---|
236 | replace_last(str1, "Hello", "Goodbye"); // str1 == "Hello Jane, Goodbye World!" |
---|
237 | erase_all(str1, " "); // str1 == "HelloJane,GoodbyeWorld!" |
---|
238 | erase_head(str1, 5); // str1 == "Jane,GoodbyeWorld!" |
---|
239 | </programlisting> |
---|
240 | <para> |
---|
241 | For the complete list of replace and erase functions see the |
---|
242 | <link linkend="string_algo.reference">reference</link>. |
---|
243 | There are many predefined functions for common use cases; however, the library also allows you to |
---|
244 | define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname> |
---|
245 | function which, in addition to the input, takes two parameters. |
---|
246 | The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is |
---|
247 | a <link linkend="string_algo.formatter_concept">Formatter</link> object. |
---|
248 | The Finder object is a functor which performs the searching for the replacement part. The Formatter object |
---|
249 | takes the result of the Finder (usually a reference to the found substring) and creates a |
---|
250 | substitute for it. The replace algorithm puts these two together and makes the desired substitution. |
---|
251 | </para> |
---|
252 | <para> |
---|
253 | Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and |
---|
254 | <headername>boost/algorithm/string/find_format.hpp</headername> for reference. |
---|
255 | </para> |
---|
256 | </section> |
---|
257 | <section> |
---|
258 | <title>Find Iterator</title> |
---|
259 | |
---|
260 | <para> |
---|
261 | An extension to find algorithms is the Find Iterator. Instead of searching for just one part of a string, |
---|
262 | the find iterator allows us to iterate over the substrings matching the specified criteria. |
---|
263 | This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally |
---|
264 | search the string. |
---|
265 | Dereferencing a find iterator yields an <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> |
---|
266 | object that delimits the current match. |
---|
267 | </para> |
---|
268 | <para> |
---|
269 | There are two iterators provided: <classname>find_iterator</classname> and |
---|
270 | <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified |
---|
271 | Finder. The latter iterates over the gaps between these substrings. |
---|
272 | </para> |
---|
273 | <programlisting> |
---|
274 | string str1("abc-*-ABC-*-aBc"); |
---|
275 | // Find all 'abc' substrings (ignoring the case) |
---|
276 | // Create a find_iterator |
---|
277 | typedef find_iterator<string::iterator> string_find_iterator; |
---|
278 | for(string_find_iterator It= |
---|
279 | make_find_iterator(str1, first_finder("abc", is_iequal())); |
---|
280 | It!=string_find_iterator(); |
---|
281 | ++It) |
---|
282 | { |
---|
283 | cout << copy_range<std::string>(*It) << endl; |
---|
284 | } |
---|
285 | |
---|
286 | // Output will be: |
---|
287 | // abc |
---|
288 | // ABC |
---|
289 | // aBc |
---|
290 | |
---|
291 | typedef split_iterator<string::iterator> string_split_iterator; |
---|
292 | for(string_split_iterator It= |
---|
293 | make_split_iterator(str1, first_finder("-*-", is_iequal())); |
---|
294 | It!=string_split_iterator(); |
---|
295 | ++It) |
---|
296 | { |
---|
297 | cout << copy_range<std::string>(*It) << endl; |
---|
298 | } |
---|
299 | |
---|
300 | // Output will be: |
---|
301 | // abc |
---|
302 | // ABC |
---|
303 | // aBc |
---|
304 | </programlisting> |
---|
305 | <para> |
---|
306 | Note that the find iterators have only one template parameter. It is the base iterator type. |
---|
307 | The Finder is specified at runtime. This allows us to typedef a find iterator for |
---|
308 | common string types and reuse it. Additionally, the make_*_iterator functions help |
---|
309 | to construct a find iterator for a particular range. |
---|
310 | </para> |
---|
311 | <para> |
---|
312 | See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>. |
---|
313 | </para> |
---|
314 | </section> |
---|
315 | <section> |
---|
316 | <title>Split</title> |
---|
317 | |
---|
318 | <para> |
---|
319 | Split algorithms are an extension to the find iterator for one common usage scenario. |
---|
320 | These algorithms use a find iterator and store all matches into the provided |
---|
321 | container. This container must be able to hold copies (e.g. <code>std::string</code>) or |
---|
322 | references (e.g. <code>iterator_range</code>) of the extracted substrings. |
---|
323 | </para> |
---|
324 | <para> |
---|
325 | Two algorithms are provided. <functionname>find_all()</functionname> finds all copies |
---|
326 | of a string in the input. <functionname>split()</functionname> splits the input into parts. |
---|
327 | </para> |
---|
328 | |
---|
329 | <programlisting> |
---|
330 | string str1("hello abc-*-ABC-*-aBc goodbye"); |
---|
331 | |
---|
332 | typedef vector< iterator_range<string::iterator> > find_vector_type; |
---|
333 | |
---|
334 | find_vector_type FindVec; // #1: Search for separators |
---|
335 | ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] } |
---|
336 | |
---|
337 | typedef vector< string > split_vector_type; |
---|
338 | |
---|
339 | split_vector_type SplitVec; // #2: Search for tokens |
---|
340 | split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" } |
---|
341 | </programlisting> |
---|
342 | <para> |
---|
343 | <code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring. |
---|
344 | </para> |
---|
345 | <para> |
---|
346 | The first example shows how to construct a container to hold references to all extracted |
---|
347 | substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references |
---|
348 | to all substrings that are equal to "abc" in a case-insensitive manner. |
---|
349 | </para> |
---|
350 | <para> |
---|
351 | The second example uses <functionname>split()</functionname> to split string str1 into parts |
---|
352 | separated by characters '-' or '*'. These parts are then put into the SplitVec. |
---|
353 | The optional last argument specifies whether adjacent separators are merged (<code>token_compress_on</code>) or not (<code>token_compress_off</code>, the default). |
---|
354 | </para> |
---|
355 | <para> |
---|
356 | More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>. |
---|
357 | </para> |
---|
358 | </section> |
---|
359 | </section> |
---|