| 1 | <html> |
|---|
| 2 | |
|---|
| 3 | <head> |
|---|
| 4 | <meta http-equiv="Content-Type" |
|---|
| 5 | content="text/html; charset=iso-8859-1"> |
|---|
| 6 | <meta name="GENERATOR" content="Microsoft FrontPage Express 2.0"> |
|---|
| 7 | <title>Boost Char Separator</title> |
|---|
| 8 | <!-- |
|---|
| 9 | -- Copyright © Jeremy Siek and John Bandela 2001-2002 |
|---|
| 10 | -- |
|---|
| 11 | -- Permission to use, copy, modify, distribute and sell this software |
|---|
| 12 | -- and its documentation for any purpose is hereby granted without fee, |
|---|
| 13 | -- provided that the above copyright notice appears in all copies and |
|---|
| 14 | -- that both that copyright notice and this permission notice appear |
|---|
| 15 | -- in supporting documentation. Jeremy Siek makes no |
|---|
| 16 | -- representations about the suitability of this software for any |
|---|
| 17 | -- purpose. It is provided "as is" without express or implied warranty. |
|---|
| 18 | --> |
|---|
| 19 | </head> |
|---|
| 20 | |
|---|
| 21 | <body bgcolor="#FFFFFF" text="#000000" link="#0000EE" |
|---|
| 22 | vlink="#551A8B" alink="#FF0000"> |
|---|
| 23 | |
|---|
| 24 | <p><img src="../../boost.png" alt="C++ Boost" width="277" |
|---|
| 25 | height="86"> <br> |
|---|
| 26 | </p> |
|---|
| 27 | |
|---|
| 28 | <h1> |
|---|
| 29 | char_separator<Char, Traits> |
|---|
| 30 | </h1> |
|---|
| 31 | |
|---|
| 32 | <p> |
|---|
| 33 | The <tt>char_separator</tt> class breaks a sequence of characters into |
|---|
| 34 | tokens based on character delimiters much in the same way that |
|---|
| 35 | <tt>strtok()</tt> does (but without all the evils of non-reentrancy |
|---|
| 36 | and destruction of the input sequence). |
|---|
| 37 | </p> |
|---|
| 38 | |
|---|
| 39 | <p> |
|---|
| 40 | The <tt>char_separator</tt> class is used in conjunction with the <a |
|---|
| 41 | href="token_iterator.htm"><tt>token_iterator</tt></a> or <a |
|---|
| 42 | href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. |
|---|
| 43 | </p> |
|---|
| 44 | |
|---|
| 45 | <h2>Definitions</h2> |
|---|
| 46 | |
|---|
| 47 | <p> |
|---|
| 48 | The <tt>strtok()</tt> function does not include matches with the |
|---|
| 49 | character delimiters in the output sequence of tokens. However, |
|---|
| 50 | sometimes it is useful to have the delimiters show up in the output |
|---|
| 51 | sequence, so <tt>char_separator</tt> provides this as an |
|---|
| 52 | option. We refer to delimiters that show up as output tokens as |
|---|
| 53 | <b><i>kept delimiters</i></b> and delimiters that do not show up as |
|---|
| 54 | output tokens as <b><i>dropped delimiters</i></b>. |
|---|
| 55 | </p> |
|---|
| 56 | |
|---|
| 57 | <p> |
|---|
| 58 | When two delimiters appear next to each other in the input sequence, |
|---|
| 59 | there is the question of whether to output an <b><i>empty |
|---|
| 60 | token</i></b> or to skip ahead. The behaviour of <tt>strtok()</tt> is |
|---|
| 61 | to skip ahead. The <tt>char_separator</tt> class provides both |
|---|
| 62 | options. |
|---|
| 63 | </p> |
|---|
| 64 | |
|---|
| 65 | |
|---|
| 66 | <h2>Examples</h2> |
|---|
| 67 | |
|---|
| 68 | <p> |
|---|
| 69 | This first example shows how to use <tt>char_separator</tt> as a |
|---|
| 70 | replacement for the <tt>strtok()</tt> function. We've specified three |
|---|
| 71 | character delimiters, and they will not show up as output tokens. We |
|---|
| 72 | have not specified any kept delimiters, and by default any empty |
|---|
| 73 | tokens will be ignored. |
|---|
| 74 | </p> |
|---|
| 75 | |
|---|
| 76 | <blockquote> |
|---|
| 77 | <pre>// char_sep_example_1.cpp |
|---|
| 78 | #include <iostream> |
|---|
| 79 | #include <boost/tokenizer.hpp> |
|---|
| 80 | #include <string> |
|---|
| 81 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 82 | int main() |
|---|
| 83 | { |
|---|
| 84 | std::string str = ";;Hello|world||-foo--bar;yow;baz|"; |
|---|
| 85 | typedef boost::tokenizer<boost::char_separator<char> > |
|---|
| 86 | tokenizer; |
|---|
| 87 | boost::char_separator<char> sep("-;|"); |
|---|
| 88 | tokenizer tokens(str, sep); |
|---|
| 89 | for (tokenizer::iterator tok_iter = tokens.begin(); |
|---|
| 90 | tok_iter != tokens.end(); ++tok_iter) |
|---|
| 91 | std::cout << "<" << *tok_iter << "> "; |
|---|
| 92 | std::cout << "\n"; |
|---|
| 93 | return EXIT_SUCCESS; |
|---|
| 94 | } |
|---|
| 95 | </pre> |
|---|
| 96 | </blockquote> |
|---|
| 97 | The output is: |
|---|
| 98 | <blockquote> |
|---|
| 99 | <pre> |
|---|
| 100 | <Hello> <world> <foo> <bar> <yow> <baz> |
|---|
| 101 | </pre> |
|---|
| 102 | </blockquote> |
|---|
| 103 | |
|---|
| 104 | |
|---|
| 105 | <p> |
|---|
| 106 | The next example shows tokenizing with two dropped delimiters '-' and |
|---|
| 107 | ';' and a single kept delimiter '|'. We also specify that empty tokens |
|---|
| 108 | should show up in the output when two delimiters are next to each |
|---|
| 109 | other. |
|---|
| 110 | </p> |
|---|
| 111 | |
|---|
| 112 | <blockquote> |
|---|
| 113 | <pre>// char_sep_example_2.cpp |
|---|
| 114 | #include <iostream> |
|---|
| 115 | #include <boost/tokenizer.hpp> |
|---|
| 116 | #include <string> |
|---|
| 117 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 118 | int main() |
|---|
| 119 | { |
|---|
| 120 | std::string str = ";;Hello|world||-foo--bar;yow;baz|"; |
|---|
| 121 | typedef boost::tokenizer<boost::char_separator<char> > |
|---|
| 122 | tokenizer; |
|---|
| 123 | boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens); |
|---|
| 124 | tokenizer tokens(str, sep); |
|---|
| 125 | for (tokenizer::iterator tok_iter = tokens.begin(); |
|---|
| 126 | tok_iter != tokens.end(); ++tok_iter) |
|---|
| 127 | std::cout << "<" << *tok_iter << "> "; |
|---|
| 128 | std::cout << "\n"; |
|---|
| 129 | return EXIT_SUCCESS; |
|---|
| 130 | } |
|---|
| 131 | </pre> |
|---|
| 132 | </blockquote> |
|---|
| 133 | The output is: |
|---|
| 134 | <blockquote> |
|---|
| 135 | <pre> |
|---|
| 136 | <> <> <Hello> <|> <world> <|> <> <|> <> <foo> <> <bar> <yow> <baz> <|> <> |
|---|
| 137 | </pre> |
|---|
| 138 | </blockquote> |
|---|
| 139 | |
|---|
| 140 | <p> |
|---|
| 141 | The final example shows tokenizing on punctuation and whitespace |
|---|
| 142 | characters using the default constructor of the |
|---|
| 143 | <tt>char_separator</tt>. |
|---|
| 144 | </p> |
|---|
| 145 | |
|---|
| 146 | <blockquote> |
|---|
| 147 | <pre>// char_sep_example_3.cpp |
|---|
| 148 | #include <iostream> |
|---|
| 149 | #include <boost/tokenizer.hpp> |
|---|
| 150 | #include <string> |
|---|
| 151 | #include <cstdlib> // for EXIT_SUCCESS |
|---|
| 152 | int main() |
|---|
| 153 | { |
|---|
| 154 | std::string str = "This is, a test"; |
|---|
| 155 | typedef boost::tokenizer<boost::char_separator<char> > Tok; |
|---|
| 156 | boost::char_separator<char> sep; // default constructed |
|---|
| 157 | Tok tok(str, sep); |
|---|
| 158 | for(Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter) |
|---|
| 159 | std::cout << "<" << *tok_iter << "> "; |
|---|
| 160 | std::cout << "\n"; |
|---|
| 161 | return EXIT_SUCCESS; |
|---|
| 162 | } |
|---|
| 163 | </pre> |
|---|
| 164 | </blockquote> |
|---|
| 165 | The output is: |
|---|
| 166 | <blockquote> |
|---|
| 167 | <pre> |
|---|
| 168 | <This> <is> <,> <a> <test> |
|---|
| 169 | </pre> |
|---|
| 170 | </blockquote> |
|---|
| 171 | |
|---|
| 172 | <h2>Template parameters</h2> |
|---|
| 173 | |
|---|
| 174 | <P> |
|---|
| 175 | <table border> |
|---|
| 176 | <TR> |
|---|
| 177 | <th>Parameter</th><th>Description</th><th>Default</th> |
|---|
| 178 | </tr> |
|---|
| 179 | |
|---|
| 180 | <TR><TD><TT>Char</TT></TD> |
|---|
| 181 | <TD>The type of elements within a token, typically <tt>char</tt>.</TD> |
|---|
| 182 | <TD> </TD> |
|---|
| 183 | </TR> |
|---|
| 184 | |
|---|
| 185 | <TR><TD><TT>Traits</TT></TD> |
|---|
| 186 | <TD>The <tt>char_traits</tt> for the character type.</TD> |
|---|
| 187 | <TD><tt>std::char_traits<Char></tt></TD> |
|---|
| 188 | </TR> |
|---|
| 189 | |
|---|
| 190 | </table> |
|---|
| 191 | |
|---|
| 192 | <h2>Model of</h2> |
|---|
| 193 | |
|---|
| 194 | <a href="tokenizerfunction.htm">Tokenizer Function</a> |
|---|
| 195 | |
|---|
| 196 | |
|---|
| 197 | <h2>Members</h2> |
|---|
| 198 | |
|---|
| 199 | <hr> |
|---|
| 200 | <pre> |
|---|
| 201 | explicit char_separator(const Char* dropped_delims, |
|---|
| 202 | const Char* kept_delims = "", |
|---|
| 203 | empty_token_policy empty_tokens = drop_empty_tokens) |
|---|
| 204 | </pre> |
|---|
| 205 | |
|---|
| 206 | <p> |
|---|
| 207 | This creates a <tt>char_separator</tt> object, which can then be used |
|---|
| 208 | to create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> |
|---|
| 209 | or <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform |
|---|
| 210 | tokenizing. The <tt>dropped_delims</tt> and <tt>kept_delims</tt> are |
|---|
| 211 | strings of characters where each character is used as a delimiter during |
|---|
| 212 | tokenizing. Whenever a delimiter is seen in the input sequence, the |
|---|
| 213 | current token is finished, and a new token begins. |
|---|
| 214 | |
|---|
| 215 | The delimiters in <tt>dropped_delims</tt> do not show up as tokens in |
|---|
| 216 | the output whereas the delimiters in <tt>kept_delims</tt> do show up |
|---|
| 217 | as tokens. If <tt>empty_tokens</tt> is <tt>drop_empty_tokens</tt>, |
|---|
| 218 | then empty tokens will not show up in the output. If |
|---|
| 219 | <tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt> then empty tokens |
|---|
| 220 | will show up in the output. |
|---|
| 221 | </p> |
|---|
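<p>
As a rough illustration of the rules described above, the following
standalone sketch mimics the dropped/kept/empty-token behaviour using only
the standard library. The function name <tt>split_sketch</tt> and its
<tt>keep_empty</tt> flag are hypothetical and are not part of Boost; the
real class works incrementally through iterators instead of returning a
vector, so treat this only as a model of the observable output.
</p>

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): splits `input` at every character
// found in `dropped` or `kept`. Kept delimiters are emitted as one-character
// tokens; dropped delimiters are discarded. Empty fields are emitted only
// when `keep_empty` is true.
// Assumption: a character listed in both sets is treated as kept here; this
// sketch makes no claim about how Boost resolves that case.
std::vector<std::string> split_sketch(const std::string& input,
                                      const std::string& dropped,
                                      const std::string& kept,
                                      bool keep_empty)
{
  std::vector<std::string> out;
  std::string field;  // characters collected since the last delimiter
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    const char c = input[i];
    const bool is_kept = kept.find(c) != std::string::npos;
    const bool is_dropped = dropped.find(c) != std::string::npos;
    if (is_kept || is_dropped) {
      // A delimiter finishes the current field.
      if (keep_empty || !field.empty())
        out.push_back(field);
      field.clear();
      // Only kept delimiters appear in the output.
      if (is_kept)
        out.push_back(std::string(1, c));
    } else {
      field += c;
    }
  }
  // The text after the last delimiter is the final field.
  if (keep_empty || !field.empty())
    out.push_back(field);
  return out;
}
```

<p>
Called with the input string from the examples, dropped delimiters
<tt>"-;"</tt>, kept delimiter <tt>"|"</tt>, and <tt>keep_empty</tt> set to
<tt>true</tt>, this sketch produces the same sixteen tokens listed as the
output of the second example.
</p>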
| 222 | |
|---|
| 223 | <hr> |
|---|
| 224 | |
|---|
| 225 | <pre> |
|---|
| 226 | explicit char_separator() |
|---|
| 227 | </pre> |
|---|
| 228 | <p> |
|---|
| 229 | The function <tt>std::isspace()</tt> is used to identify dropped |
|---|
| 230 | delimiters and <tt>std::ispunct()</tt> is used to identify kept |
|---|
| 231 | delimiters. In addition, empty tokens are dropped. |
|---|
| 232 | </p> |
|---|
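<p>
The classification used by the default constructor can be modelled the same
way. The sketch below is an assumption-laden illustration, not Boost code:
<tt>split_default_sketch</tt> is a hypothetical name, and the function simply
treats whitespace as dropped delimiters, punctuation as kept delimiters, and
discards empty tokens, as the description above states.
</p>

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper (not a Boost API): whitespace characters are dropped
// delimiters, punctuation characters are kept delimiters, and empty tokens
// are never emitted.
std::vector<std::string> split_default_sketch(const std::string& input)
{
  std::vector<std::string> out;
  std::string field;  // characters collected since the last delimiter
  for (std::string::size_type i = 0; i < input.size(); ++i) {
    // The <cctype> functions require a value representable as unsigned char.
    const unsigned char c = static_cast<unsigned char>(input[i]);
    const bool is_dropped = std::isspace(c) != 0;
    const bool is_kept = std::ispunct(c) != 0;
    if (is_dropped || is_kept) {
      if (!field.empty())
        out.push_back(field);
      field.clear();
      if (is_kept)
        out.push_back(std::string(1, input[i]));
    } else {
      field += input[i];
    }
  }
  if (!field.empty())
    out.push_back(field);
  return out;
}
```

<p>
For the string <tt>"This is, a test"</tt> this yields five tokens, with the
comma appearing as a token of its own, which agrees with the output of the
third example.
</p>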
| 233 | |
|---|
| 234 | <hr> |
|---|
| 235 | |
|---|
| 236 | <pre> |
|---|
| 237 | template <typename InputIterator, typename Token> |
|---|
| 238 | bool operator()(InputIterator& next, InputIterator end, Token& tok) |
|---|
| 239 | </pre> |
|---|
| 240 | |
|---|
| 241 | <p> |
|---|
| 242 | This function is called by the <a |
|---|
| 243 | href="token_iterator.htm"><tt>token_iterator</tt></a> to perform |
|---|
| 244 | tokenizing. The user typically does not call this function directly. |
|---|
| 245 | </p> |
|---|
| 246 | |
|---|
| 247 | |
|---|
| 248 | <hr> |
|---|
| 249 | |
|---|
| 250 | <p>© Copyright Jeremy Siek and John R. Bandela 2001-2002. Permission |
|---|
| 251 | to copy, use, modify, sell and distribute this document is granted |
|---|
| 252 | provided this copyright notice appears in all copies. This document is |
|---|
| 253 | provided "as is" without express or implied warranty, and |
|---|
| 254 | with no claim as to its suitability for any purpose.</p> |
|---|
| 255 | </body> |
|---|
| 256 | </html> |
|---|