| 1 | <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"> |
|---|
| 2 | |
|---|
| 3 | <html> |
|---|
| 4 | <head> |
|---|
| 5 | <meta http-equiv="Content-Language" content="en-us"> |
|---|
| 6 | <meta http-equiv="Content-Type" content="text/html; charset=us-ascii"> |
|---|
| 7 | <meta name="GENERATOR" content="Microsoft FrontPage 6.0"> |
|---|
| 8 | <meta name="ProgId" content="FrontPage.Editor.Document"> |
|---|
| 9 | |
|---|
| 10 | <title>Boost Char Separator</title> |
|---|
| 11 | </head> |
|---|
| 12 | |
|---|
| 13 | <body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink= |
|---|
| 14 | "#FF0000"> |
|---|
| 15 | <p><img src="../../boost.png" alt="C++ Boost" width="277" height= |
|---|
| 16 | "86"><br></p> |
|---|
| 17 | |
|---|
| 18 | <h1>char_separator<Char, Traits></h1> |
|---|
| 19 | |
|---|
| 20 | <p>The <tt>char_separator</tt> class breaks a sequence of characters into |
|---|
| 21 | tokens based on character delimiters much in the same way that |
|---|
| 22 | <tt>strtok()</tt> does (but without all the evils of non-reentrancy and |
|---|
| 23 | destruction of the input sequence).</p> |
|---|
| 24 | |
|---|
| 25 | <p>The <tt>char_separator</tt> class is used in conjunction with the |
|---|
| 26 | <a href="token_iterator.htm"><tt>token_iterator</tt></a> or <a href= |
|---|
| 27 | "tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p> |
|---|
| 28 | |
|---|
| 29 | <h2>Definitions</h2> |
|---|
| 30 | |
|---|
| 31 | <p>The <tt>strtok()</tt> function does not include matches with the |
|---|
| 32 | character delimiters in the output sequence of tokens. However, sometimes |
|---|
| 33 | it is useful to have the delimiters show up in the output sequence, |
|---|
| 34 | so <tt>char_separator</tt> provides this as an option. We refer to |
|---|
| 35 | delimiters that show up as output tokens as <b><i>kept delimiters</i></b> |
|---|
| 36 | and delimiters that do not show up as output tokens as <b><i>dropped |
|---|
| 37 | delimiters</i></b>.</p> |
|---|
| 38 | |
|---|
| 39 | <p>When two delimiters appear next to each other in the input sequence, |
|---|
| 40 | there is the question of whether to output an <b><i>empty token</i></b> or |
|---|
| 41 | to skip ahead. The behaviour of <tt>strtok()</tt> is to skip ahead. The |
|---|
| 42 | <tt>char_separator</tt> class provides both options.</p> |
|---|
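<p>The definitions above can be modelled in a few lines of standard C++ (C++11 or later). This is only a simplified sketch of the rules as described here, not Boost's implementation; the names <tt>EmptyPolicy</tt> and <tt>model_split</tt> are hypothetical.</p>

```cpp
#include <string>
#include <vector>

// Hypothetical names for illustration only; neither is part of Boost.
enum EmptyPolicy { DropEmpty, KeepEmpty };

// Simplified model of kept delimiters, dropped delimiters and empty tokens.
std::vector<std::string> model_split(const std::string& input,
                                     const std::string& dropped,
                                     const std::string& kept,
                                     EmptyPolicy policy)
{
    std::vector<std::string> out;
    std::string current;

    // Emit the pending token unless it is empty and empty tokens are dropped.
    auto flush = [&]() {
        if (!current.empty() || policy == KeepEmpty)
            out.push_back(current);
        current.clear();
    };

    for (char c : input) {
        if (dropped.find(c) != std::string::npos) {
            flush();                           // ends the token, then vanishes
        } else if (kept.find(c) != std::string::npos) {
            flush();                           // ends the token ...
            out.push_back(std::string(1, c));  // ... and is itself a token
        } else {
            current += c;                      // ordinary character
        }
    }
    flush();                                   // text after the last delimiter
    return out;
}
```

<p>With dropped delimiters <tt>"-;"</tt>, kept delimiter <tt>"|"</tt> and <tt>KeepEmpty</tt>, the input <tt>"a;;b|"</tt> produces <tt>a</tt>, an empty token, <tt>b</tt>, <tt>|</tt> and a final empty token.</p>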
| 43 | |
|---|
| 44 | <h2>Examples</h2> |
|---|
| 45 | |
|---|
| 46 | <p>This first example shows how to use <tt>char_separator</tt> as a |
|---|
| 47 | replacement for the <tt>strtok()</tt> function. We've specified three |
|---|
| 48 | character delimiters, and they will not show up as output tokens. We have |
|---|
| 49 | not specified any kept delimiters, and by default any empty tokens will be |
|---|
| 50 | ignored.</p> |
|---|
| 51 | |
|---|
| 52 | <blockquote> |
|---|
| 53 | <pre> |
|---|
| 54 | // char_sep_example_1.cpp |
|---|
| 55 | #include <iostream> |
|---|
| 56 | #include <boost/tokenizer.hpp> |
|---|
| 57 | #include <string> |
|---|
| 58 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 59 | int main() |
|---|
| 60 | { |
|---|
| 61 |   std::string str = ";;Hello|world||-foo--bar;yow;baz|"; |
|---|
| 62 |   typedef boost::tokenizer<boost::char_separator<char> > |
|---|
| 63 |     tokenizer; |
|---|
| 64 |   boost::char_separator<char> sep("-;|"); |
|---|
| 65 |   tokenizer tokens(str, sep); |
|---|
| 66 |   for (tokenizer::iterator tok_iter = tokens.begin(); |
|---|
| 67 |        tok_iter != tokens.end(); ++tok_iter) |
|---|
| 68 |     std::cout << "<" << *tok_iter << "> "; |
|---|
| 69 |   std::cout << "\n"; |
|---|
| 70 |   return EXIT_SUCCESS; |
|---|
| 71 | } |
|---|
| 72 | </pre> |
|---|
| 73 | </blockquote><p>The output is:</p> |
|---|
| 74 | |
|---|
| 75 | <blockquote> |
|---|
| 76 | <pre> |
|---|
| 77 | <Hello> <world> <foo> <bar> <yow> <baz> |
|---|
| 78 | </pre> |
|---|
| 79 | </blockquote> |
|---|
| 80 | |
|---|
| 81 | <p>The next example shows tokenizing with two dropped delimiters '-' and |
|---|
| 82 | ';' and a single kept delimiter '|'. We also specify that empty tokens |
|---|
| 83 | should show up in the output when two delimiters are next to each |
|---|
| 84 | other.</p> |
|---|
| 85 | |
|---|
| 86 | <blockquote> |
|---|
| 87 | <pre> |
|---|
| 88 | // char_sep_example_2.cpp |
|---|
| 89 | #include <iostream> |
|---|
| 90 | #include <boost/tokenizer.hpp> |
|---|
| 91 | #include <string> |
|---|
| 92 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 93 | int main() |
|---|
| 94 | { |
|---|
| 95 |   std::string str = ";;Hello|world||-foo--bar;yow;baz|"; |
|---|
| 96 |   typedef boost::tokenizer<boost::char_separator<char> > |
|---|
| 97 |     tokenizer; |
|---|
| 98 |   boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens); |
|---|
| 99 |   tokenizer tokens(str, sep); |
|---|
| 100 |   for (tokenizer::iterator tok_iter = tokens.begin(); |
|---|
| 101 |        tok_iter != tokens.end(); ++tok_iter) |
|---|
| 102 |     std::cout << "<" << *tok_iter << "> "; |
|---|
| 103 |   std::cout << "\n"; |
|---|
| 104 |   return EXIT_SUCCESS; |
|---|
| 105 | } |
|---|
| 106 | </pre> |
|---|
| 107 | </blockquote><p>The output is:</p> |
|---|
| 108 | |
|---|
| 109 | <blockquote> |
|---|
| 110 | <pre> |
|---|
| 111 | <> <> <Hello> <|> <world> <|> <> <|> <> <foo> <> <bar> <yow> <baz> <|> <> |
|---|
| 112 | </pre> |
|---|
| 113 | </blockquote> |
|---|
| 114 | |
|---|
| 115 | <p>The final example shows tokenizing on punctuation and whitespace |
|---|
| 116 | characters using the default constructor of the |
|---|
| 117 | <tt>char_separator</tt>.</p> |
|---|
| 118 | |
|---|
| 119 | <blockquote> |
|---|
| 120 | <pre> |
|---|
| 121 | // char_sep_example_3.cpp |
|---|
| 122 | #include <iostream> |
|---|
| 123 | #include <boost/tokenizer.hpp> |
|---|
| 124 | #include <string> |
|---|
| 125 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 126 | int main() |
|---|
| 127 | { |
|---|
| 128 |   std::string str = "This is, a test"; |
|---|
| 129 |   typedef boost::tokenizer<boost::char_separator<char> > Tok; |
|---|
| 130 |   boost::char_separator<char> sep; // default constructed |
|---|
| 131 |   Tok tok(str, sep); |
|---|
| 132 |   for (Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter) |
|---|
| 133 |     std::cout << "<" << *tok_iter << "> "; |
|---|
| 134 |   std::cout << "\n"; |
|---|
| 135 |   return EXIT_SUCCESS; |
|---|
| 136 | } |
|---|
| 137 | </pre> |
|---|
| 138 | </blockquote><p>The output is:</p> |
|---|
| 139 | |
|---|
| 140 | <blockquote> |
|---|
| 141 | <pre> |
|---|
| 142 | <This> <is> <,> <a> <test> |
|---|
| 143 | </pre> |
|---|
| 144 | </blockquote> |
|---|
| 145 | |
|---|
| 146 | <h2>Template parameters</h2> |
|---|
| 147 | |
|---|
| 148 | <table border summary=""> |
|---|
| 149 | <tr> |
|---|
| 150 | <th>Parameter</th> |
|---|
| 151 | |
|---|
| 152 | <th>Description</th> |
|---|
| 153 | |
|---|
| 154 | <th>Default</th> |
|---|
| 155 | </tr> |
|---|
| 156 | |
|---|
| 157 | <tr> |
|---|
| 158 | <td><tt>Char</tt></td> |
|---|
| 159 | |
|---|
| 160 | <td>The type of elements within a token, typically <tt>char</tt>.</td> |
|---|
| 161 | |
|---|
| 162 | <td> </td> |
|---|
| 163 | </tr> |
|---|
| 164 | |
|---|
| 165 | <tr> |
|---|
| 166 | <td><tt>Traits</tt></td> |
|---|
| 167 | |
|---|
| 168 | <td>The <tt>char_traits</tt> for the character type.</td> |
|---|
| 169 | |
|---|
| 170 | <td><tt>std::char_traits<Char></tt></td> |
|---|
| 171 | </tr> |
|---|
| 172 | </table> |
|---|
| 173 | |
|---|
| 174 | <h2>Model of</h2><a href="tokenizerfunction.htm">Tokenizer Function</a> |
|---|
| 175 | |
|---|
| 176 | <h2>Members</h2> |
|---|
| 177 | <hr> |
|---|
| 178 | <pre> |
|---|
| 179 | explicit char_separator(const Char* dropped_delims, |
|---|
| 180 | const Char* kept_delims = "", |
|---|
| 181 | empty_token_policy empty_tokens = drop_empty_tokens) |
|---|
| 182 | </pre> |
|---|
| 183 | |
|---|
| 184 | <p>This creates a <tt>char_separator</tt> object, which can then be used to |
|---|
| 185 | create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or |
|---|
| 186 | <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The |
|---|
| 187 | <tt>dropped_delims</tt> and <tt>kept_delims</tt> are strings of characters |
|---|
| 188 | where each character is used as a delimiter during tokenizing. Whenever a |
|---|
| 189 | delimiter is seen in the input sequence, the current token is finished, and |
|---|
| 190 | a new token begins. The delimiters in <tt>dropped_delims</tt> do not show |
|---|
| 191 | up as tokens in the output whereas the delimiters in <tt>kept_delims</tt> |
|---|
| 192 | do show up as tokens. If <tt>empty_tokens</tt> is |
|---|
| 193 | <tt>drop_empty_tokens</tt>, then empty tokens will not show up in the |
|---|
| 194 | output. If <tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt>, then empty |
|---|
| 195 | tokens will show up in the output.</p> |
|---|
| 196 | <hr> |
|---|
| 197 | <pre> |
|---|
| 198 | explicit char_separator() |
|---|
| 199 | </pre> |
|---|
| 200 | |
|---|
| 201 | <p>With this default constructor, <tt>std::isspace()</tt> is used to identify dropped |
|---|
| 202 | delimiters and <tt>std::ispunct()</tt> is used to identify kept delimiters. |
|---|
| 203 | In addition, empty tokens are dropped.</p> |
|---|
| 204 | <hr> |
|---|
| 205 | <pre> |
|---|
| 206 | template <typename InputIterator, typename Token> |
|---|
| 207 | bool operator()(InputIterator& next, InputIterator end, Token& tok) |
|---|
| 208 | </pre> |
|---|
| 209 | |
|---|
| 210 | <p>This function is called by the <a href= |
|---|
| 211 | "token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing. The |
|---|
| 212 | user typically does not call this function directly.</p> |
|---|
| 213 | <hr> |
|---|
| 214 | |
|---|
| 215 | <p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src= |
|---|
| 216 | "http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01 Transitional" |
|---|
| 217 | height="31" width="88"></a></p> |
|---|
| 218 | |
|---|
| 219 | <p>Revised |
|---|
| 220 | <!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25 |
|---|
| 221 | December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p> |
|---|
| 222 | |
|---|
| 223 | <p><i>Copyright © 2001-2002 Jeremy Siek and John R. Bandela</i></p> |
|---|
| 224 | |
|---|
| 225 | <p><i>Distributed under the Boost Software License, Version 1.0. (See |
|---|
| 226 | accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or |
|---|
| 227 | copy at <a href= |
|---|
| 228 | "http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p> |
|---|
| 229 | </body> |
|---|
| 230 | </html> |
|---|