| 1 | <?xml version="1.0" encoding="utf-8"?> |
|---|
| 2 | <!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN" |
|---|
| 3 | "http://www.boost.org/tools/boostbook/dtd/boostbook.dtd"> |
|---|
| 4 | <section id="string_algo.usage" last-revision="$Date: 2005/12/01 13:42:02 $"> |
|---|
| 5 | <title>Usage</title> |
|---|
| 6 | |
|---|
| 7 | <using-namespace name="boost"/> |
|---|
| 8 | <using-namespace name="boost::algorithm"/> |
|---|
| 9 | |
|---|
| 10 | |
|---|
| 11 | <section> |
|---|
| 12 | <title>First Example</title> |
|---|
| 13 | |
|---|
| 14 | <para> |
|---|
| 15 | Using the algorithms is straightforward. Let us have a look at the first example: |
|---|
| 16 | </para> |
|---|
| 17 | <programlisting> |
|---|
| 18 | #include <boost/algorithm/string.hpp> |
|---|
| 19 | using namespace std; |
|---|
| 20 | using namespace boost; |
|---|
| 21 | |
|---|
| 22 | // ... |
|---|
| 23 | |
|---|
| 24 | string str1(" hello world! "); |
|---|
| 25 | to_upper(str1); // str1 == " HELLO WORLD! " |
|---|
| 26 | trim(str1); // str1 == "HELLO WORLD!" |
|---|
| 27 | |
|---|
| 28 | string str2= |
|---|
| 29 | to_lower_copy( |
|---|
| 30 | ireplace_first_copy( |
|---|
| 31 | str1,"hello","goodbye")); // str2 == "goodbye world!" |
|---|
| 32 | </programlisting> |
|---|
| 33 | <para> |
|---|
| 34 | This example converts str1 to upper case and trims spaces from the start and the end |
|---|
| 35 | of the string. str2 is then created as a lower-case copy of str1 with "hello" replaced (ignoring case) by "goodbye". |
|---|
| 36 | This example demonstrates several important concepts used in the library: |
|---|
| 37 | </para> |
|---|
| 38 | <itemizedlist> |
|---|
| 39 | <listitem> |
|---|
| 40 | <para><emphasis role="bold">Container parameters:</emphasis> |
|---|
| 41 | Unlike in the STL algorithms, parameters are not specified only in the form |
|---|
| 42 | of iterators. The STL convention allows for great flexibility, |
|---|
| 43 | but it has several limitations. It is not possible to <emphasis>stack</emphasis> algorithms together, |
|---|
| 44 | because a container is passed in two parameters. Therefore it is not possible to use |
|---|
| 45 | a return value from another algorithm. It is considerably easier to write |
|---|
| 46 | <code>to_lower(str1)</code> than <code>to_lower(str1.begin(), str1.end())</code>. |
|---|
| 47 | </para> |
|---|
| 48 | <para> |
|---|
| 49 | The magic of <ulink url="../../libs/range/index.html">Boost.Range</ulink> |
|---|
| 50 | provides a uniform way of handling different string types. |
|---|
| 51 | If there is a need to pass a pair of iterators, |
|---|
| 52 | <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> |
|---|
| 53 | can be used to package iterators into a structure with a compatible interface. |
|---|
| 54 | </para> |
|---|
| 55 | </listitem> |
|---|
| 56 | <listitem> |
|---|
| 57 | <para><emphasis role="bold">Copy vs. Mutable:</emphasis> |
|---|
| 58 | Many algorithms in the library perform a transformation of the input. |
|---|
| 59 | The transformation can be done in-place, mutating the input sequence, or a copy |
|---|
| 60 | of the transformed input can be created, leaving the input intact. Neither of |
|---|
| 61 | these possibilities is superior to the other; each has its own |
|---|
| 62 | advantages and disadvantages. For this reason, the library provides both. |
|---|
| 63 | </para> |
|---|
| 64 | </listitem> |
|---|
| 65 | <listitem> |
|---|
| 66 | <para><emphasis role="bold">Algorithm stacking:</emphasis> |
|---|
| 67 | Copy versions return the transformed input as a result, which allows simple chaining of |
|---|
| 68 | transformations within one expression (e.g. one can write <code>trim_copy(to_upper_copy(s))</code>). |
|---|
| 69 | Mutable versions return <code>void</code>, to avoid misuse. |
|---|
| 70 | </para> |
|---|
| 71 | </listitem> |
|---|
| 72 | <listitem> |
|---|
| 73 | <para><emphasis role="bold">Naming:</emphasis> |
|---|
| 74 | Naming follows the conventions from the Standard C++ Library. If there is a |
|---|
| 75 | copy and a mutable version of the same algorithm, the mutable version has no suffix |
|---|
| 76 | and the copy version has the suffix <emphasis>_copy</emphasis>. |
|---|
| 77 | Some algorithms have the prefix <emphasis>i</emphasis> |
|---|
| 78 | (e.g. <functionname>ifind_first()</functionname>). |
|---|
| 79 | This prefix identifies that the algorithm works in a case-insensitive manner. |
|---|
| 80 | </para> |
|---|
| 81 | </listitem> |
|---|
| 82 | </itemizedlist> |
|---|
| 83 | <para> |
|---|
| 84 | To use the library, include the <headername>boost/algorithm/string.hpp</headername> header. |
|---|
| 85 | If the regex related functions are needed, include the |
|---|
| 86 | <headername>boost/algorithm/string_regex.hpp</headername> header. |
|---|
| 87 | </para> |
|---|
| 88 | </section> |
|---|
| 89 | <section> |
|---|
| 90 | <title>Case conversion</title> |
|---|
| 91 | |
|---|
| 92 | <para> |
|---|
| 93 | The STL has a nice way of converting character case. Unfortunately, it works only |
|---|
| 94 | for a single character, and we want to convert a whole string: |
|---|
| 95 | </para> |
|---|
| 96 | <programlisting> |
|---|
| 97 | string str1("HeLlO WoRld!"); |
|---|
| 98 | to_upper(str1); // str1=="HELLO WORLD!" |
|---|
| 99 | </programlisting> |
|---|
| 100 | <para> |
|---|
| 101 | <functionname>to_upper()</functionname> and <functionname>to_lower()</functionname> convert the case of |
|---|
| 102 | characters in a string using a specified locale. |
|---|
| 103 | </para> |
|---|
| 104 | <para> |
|---|
| 105 | For more information see the reference for <headername>boost/algorithm/string/case_conv.hpp</headername>. |
|---|
| 106 | </para> |
|---|
| 107 | </section> |
|---|
| 108 | <section> |
|---|
| 109 | <title>Predicates and Classification</title> |
|---|
| 110 | <para> |
|---|
| 111 | A part of the library deals with string related predicates. Consider this example: |
|---|
| 112 | </para> |
|---|
| 113 | <programlisting> |
|---|
| 114 | bool is_executable( const string& filename ) |
|---|
| 115 | { |
|---|
| 116 | return |
|---|
| 117 | iends_with(filename, ".exe") || |
|---|
| 118 | iends_with(filename, ".com"); |
|---|
| 119 | } |
|---|
| 120 | |
|---|
| 121 | // ... |
|---|
| 122 | string str1("command.com"); |
|---|
| 123 | cout |
|---|
| 124 | << str1 |
|---|
| 125 | << (is_executable(str1)? " is": " is not") |
|---|
| 126 | << " an executable" |
|---|
| 127 | << endl; // prints "command.com is an executable" |
|---|
| 128 | |
|---|
| 129 | //.. |
|---|
| 130 | char text1[]="hello world!"; |
|---|
| 131 | cout |
|---|
| 132 | << text1 |
|---|
| 133 | << (all( text1, is_lower() )? " is": " is not") |
|---|
| 134 | << " written in the lower case" |
|---|
| 135 | << endl; // prints "hello world! is not written in the lower case" (' ' and '!' are not lower-case letters) |
|---|
| 136 | </programlisting> |
|---|
| 137 | <para> |
|---|
| 138 | The predicates determine whether a substring is contained in the input string |
|---|
| 139 | under various conditions. The conditions are: the string starts with the substring, |
|---|
| 140 | ends with the substring, |
|---|
| 141 | simply contains the substring, or both strings are equal. See the reference for |
|---|
| 142 | <headername>boost/algorithm/string/predicate.hpp</headername> for more details. |
|---|
| 143 | </para> |
|---|
| 144 | <para> |
|---|
| 145 | In addition, the algorithm <functionname>all()</functionname> checks |
|---|
| 146 | whether all elements of a container satisfy a condition specified by a predicate. |
|---|
| 147 | This predicate can be any unary predicate, but the library provides a number of |
|---|
| 148 | useful string-related predicates and combinators ready for use. |
|---|
| 149 | These are located in the <headername>boost/algorithm/string/classification.hpp</headername> header. |
|---|
| 150 | Classification predicates can be combined using logical combinators to form |
|---|
| 151 | more complex expressions. For example: <code>is_from_range('a','z') || is_digit()</code> |
|---|
| 152 | </para> |
|---|
| 153 | </section> |
|---|
| 154 | <section> |
|---|
| 155 | <title>Trimming</title> |
|---|
| 156 | |
|---|
| 157 | <para> |
|---|
| 158 | When parsing the input from a user, strings usually have unwanted leading or trailing |
|---|
| 159 | characters. To get rid of them, we need trim functions: |
|---|
| 160 | </para> |
|---|
| 161 | <programlisting> |
|---|
| 162 | string str1=" hello world! "; |
|---|
| 163 | string str2=trim_left_copy(str1); // str2 == "hello world! " |
|---|
| 164 | string str3=trim_right_copy(str1); // str3 == " hello world!" |
|---|
| 165 | trim(str1); // str1 == "hello world!" |
|---|
| 166 | |
|---|
| 167 | string phone="00423333444"; |
|---|
| 168 | // remove leading 0 from the phone number |
|---|
| 169 | trim_left_if(phone,is_any_of("0")); // phone == "423333444" |
|---|
| 170 | </programlisting> |
|---|
| 171 | <para> |
|---|
| 172 | It is possible to trim the spaces on the right, on the left or on both sides of a string. |
|---|
| 173 | For those cases when there is a need to remove something other than blank space, there |
|---|
| 174 | are <emphasis>_if</emphasis> variants. Using these, a user can specify a functor which will |
|---|
| 175 | select the <emphasis>space</emphasis> to be removed. It is possible to use classification |
|---|
| 176 | predicates like <functionname>is_digit()</functionname> mentioned in the previous section. |
|---|
| 177 | See the reference for <headername>boost/algorithm/string/trim.hpp</headername>. |
|---|
| 178 | </para> |
|---|
| 179 | </section> |
|---|
| 180 | <section> |
|---|
| 181 | <title>Find algorithms</title> |
|---|
| 182 | |
|---|
| 183 | <para> |
|---|
| 184 | The library contains a set of find algorithms. Here is an example: |
|---|
| 185 | </para> |
|---|
| 186 | <programlisting> |
|---|
| 187 | char text[]="hello dolly!"; |
|---|
| 188 | iterator_range<char*> result=find_last(text,"ll"); |
|---|
| 189 | |
|---|
| 190 | transform( result.begin(), result.end(), result.begin(), bind2nd(plus<char>(), 1) ); |
|---|
| 191 | // text = "hello dommy!" |
|---|
| 192 | |
|---|
| 193 | to_upper(result); // text == "hello doMMy!" |
|---|
| 194 | |
|---|
| 195 | // iterator_range is convertible to bool; text is now "hello doMMy!", so "dolly" is not found here |
|---|
| 196 | if(find_first(text, "dolly")) |
|---|
| 197 | { |
|---|
| 198 | cout << "Dolly is there" << endl; |
|---|
| 199 | } |
|---|
| 200 | </programlisting> |
|---|
| 201 | <para> |
|---|
| 202 | We have used <functionname>find_last()</functionname> to search the <code>text</code> for "ll". |
|---|
| 203 | The result is given in the <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink>. |
|---|
| 204 | This range delimits the |
|---|
| 205 | part of the input which satisfies the find criteria. In our example it is the last occurrence of "ll". |
|---|
| 206 | |
|---|
| 207 | As we can see, the input of the <functionname>find_last()</functionname> algorithm can also be a |
|---|
| 208 | char[], because this type is supported by |
|---|
| 209 | <ulink url="../../libs/range/index.html">Boost.Range</ulink>. |
|---|
| 210 | |
|---|
| 211 | The following lines transform the result. Notice that |
|---|
| 212 | <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> has familiar |
|---|
| 213 | <code>begin()</code> and <code>end()</code> methods, so it can be used like any other STL container. |
|---|
| 214 | It is also convertible to bool, so it is easy to use find algorithms for simple containment checking. |
|---|
| 215 | </para> |
|---|
| 216 | <para> |
|---|
| 217 | Find algorithms are located in <headername>boost/algorithm/string/find.hpp</headername>. |
|---|
| 218 | </para> |
|---|
| 219 | </section> |
|---|
| 220 | <section> |
|---|
| 221 | <title>Replace Algorithms</title> |
|---|
| 222 | <para> |
|---|
| 223 | Find algorithms can be used for searching for a specific part of a string. Replace goes one step |
|---|
| 224 | further. After a matching part is found, it is substituted with something else. The substitution is computed |
|---|
| 225 | from the original, using some transformation. |
|---|
| 226 | </para> |
|---|
| 227 | <programlisting> |
|---|
| 228 | string str1="Hello Dolly, Hello World!"; |
|---|
| 229 | replace_first(str1, "Dolly", "Jane"); // str1 == "Hello Jane, Hello World!" |
|---|
| 230 | replace_last(str1, "Hello", "Goodbye"); // str1 == "Hello Jane, Goodbye World!" |
|---|
| 231 | erase_all(str1, " "); // str1 == "HelloJane,GoodbyeWorld!" |
|---|
| 232 | erase_head(str1, 5); // str1 == "Jane,GoodbyeWorld!" |
|---|
| 233 | </programlisting> |
|---|
| 234 | <para> |
|---|
| 235 | For the complete list of replace and erase functions see the |
|---|
| 236 | <link linkend="string_algo.reference">reference</link>. |
|---|
| 237 | There are many predefined functions for common usage; however, the library also allows you to |
|---|
| 238 | define a custom <code>replace()</code> that suits a specific need. There is a generic <functionname>find_format()</functionname> |
|---|
| 239 | function which takes two parameters in addition to the input. |
|---|
| 240 | The first one is a <link linkend="string_algo.finder_concept">Finder</link> object, the second one is |
|---|
| 241 | a <link linkend="string_algo.formatter_concept">Formatter</link> object. |
|---|
| 242 | The Finder object is a functor which performs the searching for the replacement part. The Formatter object |
|---|
| 243 | takes the result of the Finder (usually a reference to the found substring) and creates a |
|---|
| 244 | substitute for it. The replace algorithm puts these two together and makes the desired substitution. |
|---|
| 245 | </para> |
|---|
| 246 | <para> |
|---|
| 247 | Check <headername>boost/algorithm/string/replace.hpp</headername>, <headername>boost/algorithm/string/erase.hpp</headername> and |
|---|
| 248 | <headername>boost/algorithm/string/find_format.hpp</headername> for reference. |
|---|
| 249 | </para> |
|---|
| 250 | </section> |
|---|
| 251 | <section> |
|---|
| 252 | <title>Find Iterator</title> |
|---|
| 253 | |
|---|
| 254 | <para> |
|---|
| 255 | An extension to the find algorithms is the Find Iterator. Instead of searching for just one part of a string, |
|---|
| 256 | the find iterator allows us to iterate over the substrings matching the specified criteria. |
|---|
| 257 | This facility uses the <link linkend="string_algo.finder_concept">Finder</link> to incrementally |
|---|
| 258 | search the string. |
|---|
| 259 | Dereferencing a find iterator yields an <ulink url="../../libs/range/doc/utility_class.html"><code>boost::iterator_range</code></ulink> |
|---|
| 260 | object that delimits the current match. |
|---|
| 261 | </para> |
|---|
| 262 | <para> |
|---|
| 263 | Two iterators are provided: <classname>find_iterator</classname> and |
|---|
| 264 | <classname>split_iterator</classname>. The former iterates over substrings that are found using the specified |
|---|
| 265 | Finder. The latter iterates over the gaps between these substrings. |
|---|
| 266 | </para> |
|---|
| 267 | <programlisting> |
|---|
| 268 | string str1("abc-*-ABC-*-aBc"); |
|---|
| 269 | // Find all 'abc' substrings (ignoring the case) |
|---|
| 270 | // Create a find_iterator |
|---|
| 271 | typedef find_iterator<string::iterator> string_find_iterator; |
|---|
| 272 | for(string_find_iterator It= |
|---|
| 273 | make_find_iterator(str1, first_finder("abc", is_iequal())); |
|---|
| 274 | It!=string_find_iterator(); |
|---|
| 275 | ++It) |
|---|
| 276 | { |
|---|
| 277 | cout << copy_range<std::string>(*It) << endl; |
|---|
| 278 | } |
|---|
| 279 | |
|---|
| 280 | // Output will be: |
|---|
| 281 | // abc |
|---|
| 282 | // ABC |
|---|
| 283 | // aBc |
|---|
| 284 | |
|---|
| 285 | typedef split_iterator<string::iterator> string_split_iterator; |
|---|
| 286 | for(string_split_iterator It= |
|---|
| 287 | make_split_iterator(str1, first_finder("-*-", is_iequal())); |
|---|
| 288 | It!=string_split_iterator(); |
|---|
| 289 | ++It) |
|---|
| 290 | { |
|---|
| 291 | cout << copy_range<std::string>(*It) << endl; |
|---|
| 292 | } |
|---|
| 293 | |
|---|
| 294 | // Output will be: |
|---|
| 295 | // abc |
|---|
| 296 | // ABC |
|---|
| 297 | // aBc |
|---|
| 298 | </programlisting> |
|---|
| 299 | <para> |
|---|
| 300 | Note that the find iterators have only one template parameter. It is the base iterator type. |
|---|
| 301 | The Finder is specified at runtime. This allows us to typedef a find iterator for |
|---|
| 302 | common string types and reuse it. Additionally make_*_iterator functions help |
|---|
| 303 | to construct a find iterator for a particular range. |
|---|
| 304 | </para> |
|---|
| 305 | <para> |
|---|
| 306 | See the reference in <headername>boost/algorithm/string/find_iterator.hpp</headername>. |
|---|
| 307 | </para> |
|---|
| 308 | </section> |
|---|
| 309 | <section> |
|---|
| 310 | <title>Split</title> |
|---|
| 311 | |
|---|
| 312 | <para> |
|---|
| 313 | Split algorithms are an extension to the find iterator for one common usage scenario. |
|---|
| 314 | These algorithms use a find iterator and store all matches into the provided |
|---|
| 315 | container. This container must be able to hold copies (e.g. <code>std::string</code>) or |
|---|
| 316 | references (e.g. <code>iterator_range</code>) of the extracted substrings. |
|---|
| 317 | </para> |
|---|
| 318 | <para> |
|---|
| 319 | Two algorithms are provided. <functionname>find_all()</functionname> finds all copies |
|---|
| 320 | of a string in the input. <functionname>split()</functionname> splits the input into parts. |
|---|
| 321 | </para> |
|---|
| 322 | |
|---|
| 323 | <programlisting> |
|---|
| 324 | string str1("hello abc-*-ABC-*-aBc goodbye"); |
|---|
| 325 | |
|---|
| 326 | typedef vector< iterator_range<string::iterator> > find_vector_type; |
|---|
| 327 | |
|---|
| 328 | find_vector_type FindVec; // #1: Search for separators |
|---|
| 329 | ifind_all( FindVec, str1, "abc" ); // FindVec == { [abc],[ABC],[aBc] } |
|---|
| 330 | |
|---|
| 331 | typedef vector< string > split_vector_type; |
|---|
| 332 | |
|---|
| 333 | split_vector_type SplitVec; // #2: Search for tokens |
|---|
| 334 | split( SplitVec, str1, is_any_of("-*"), token_compress_on ); // SplitVec == { "hello abc","ABC","aBc goodbye" } |
|---|
| 335 | </programlisting> |
|---|
| 336 | <para> |
|---|
| 337 | <code>[hello]</code> designates an <code>iterator_range</code> delimiting this substring. |
|---|
| 338 | </para> |
|---|
| 339 | <para> |
|---|
| 340 | The first example shows how to construct a container to hold references to all extracted |
|---|
| 341 | substrings. The algorithm <functionname>ifind_all()</functionname> puts into FindVec references |
|---|
| 342 | to all substrings that are equal to "abc" in a case-insensitive manner. |
|---|
| 343 | </para> |
|---|
| 344 | <para> |
|---|
| 345 | The second example uses <functionname>split()</functionname> to split the string str1 into parts |
|---|
| 346 | separated by the characters '-' or '*'. These parts are then put into SplitVec. |
|---|
| 347 | It is possible to specify whether adjacent separators are concatenated (<code>token_compress_on</code>) or not (<code>token_compress_off</code>, the default). |
|---|
| 348 | </para> |
|---|
| 349 | <para> |
|---|
| 350 | More information can be found in the reference: <headername>boost/algorithm/string/split.hpp</headername>. |
|---|
| 351 | </para> |
|---|
| 352 | </section> |
|---|
| 353 | </section> |
|---|