<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">

<html>
<head>
<meta http-equiv="Content-Language" content="en-us">
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="GENERATOR" content="Microsoft FrontPage 6.0">
<meta name="ProgId" content="FrontPage.Editor.Document">

<title>Boost Char Separator</title>
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE" vlink="#551A8B" alink=
"#FF0000">
<p><img src="../../boost.png" alt="C++ Boost" width="277" height=
"86"><br></p>

<h1>char_separator&lt;Char, Traits&gt;</h1>

<p>The <tt>char_separator</tt> class breaks a sequence of characters into
tokens based on character delimiters, in much the same way as
<tt>strtok()</tt>, but without <tt>strtok()</tt>'s drawbacks:
<tt>char_separator</tt> is reentrant and does not modify the input
sequence.</p>

<p>The <tt>char_separator</tt> class is used in conjunction with
<a href="token_iterator.htm"><tt>token_iterator</tt></a> or <a href=
"tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.</p>

<h2>Definitions</h2>

<p>The <tt>strtok()</tt> function does not include the matched character
delimiters in its output sequence of tokens. However, it is sometimes
useful to have the delimiters appear in the output sequence, so
<tt>char_separator</tt> provides this as an option. We refer to delimiters
that show up as output tokens as <b><i>kept delimiters</i></b> and to
delimiters that do not show up as output tokens as <b><i>dropped
delimiters</i></b>.</p>

<p>When two delimiters appear next to each other in the input sequence,
there is the question of whether to output an <b><i>empty token</i></b>
between them or to skip ahead. The same question arises when a delimiter
appears at the very beginning or end of the input. <tt>strtok()</tt> skips
ahead; the <tt>char_separator</tt> class supports both behaviours.</p>

<h2>Examples</h2>

<p>This first example shows how to use <tt>char_separator</tt> as a
replacement for the <tt>strtok()</tt> function. We've specified three
dropped delimiters ('-', ';' and '|'), so they will not show up as output
tokens. We have not specified any kept delimiters, and by default empty
tokens are ignored.</p>

<blockquote>
<pre>
// char_sep_example_1.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt;  // EXIT_SUCCESS

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  boost::char_separator&lt;char&gt; sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>

<p>The output is:</p>

<blockquote>
<pre>
&lt;Hello&gt; &lt;world&gt; &lt;foo&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt;
</pre>
</blockquote>

<p>The next example shows tokenizing with two dropped delimiters, '-' and
';', and a single kept delimiter, '|'. We also specify that empty tokens
should show up in the output when two delimiters are next to each other,
or when a delimiter begins or ends the input.</p>

<blockquote>
<pre>
// char_sep_example_2.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt;  // EXIT_SUCCESS

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt;
    tokenizer;
  boost::char_separator&lt;char&gt; sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>

<p>The output is:</p>

<blockquote>
<pre>
&lt;&gt; &lt;&gt; &lt;Hello&gt; &lt;|&gt; &lt;world&gt; &lt;|&gt; &lt;&gt; &lt;|&gt; &lt;&gt; &lt;foo&gt; &lt;&gt; &lt;bar&gt; &lt;yow&gt; &lt;baz&gt; &lt;|&gt; &lt;&gt;
</pre>
</blockquote>

<p>The final example uses the default constructor of
<tt>char_separator</tt>, which tokenizes on whitespace (dropped) and
punctuation (kept) characters.</p>

<blockquote>
<pre>
// char_sep_example_3.cpp
#include &lt;iostream&gt;
#include &lt;boost/tokenizer.hpp&gt;
#include &lt;string&gt;
#include &lt;cstdlib&gt;  // EXIT_SUCCESS

int main()
{
  std::string str = "This is, a test";
  typedef boost::tokenizer&lt;boost::char_separator&lt;char&gt; &gt; Tok;
  boost::char_separator&lt;char&gt; sep; // default constructed
  Tok tok(str, sep);
  for (Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
    std::cout &lt;&lt; "&lt;" &lt;&lt; *tok_iter &lt;&lt; "&gt; ";
  std::cout &lt;&lt; "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>

<p>The output is:</p>

<blockquote>
<pre>
&lt;This&gt; &lt;is&gt; &lt;,&gt; &lt;a&gt; &lt;test&gt;
</pre>
</blockquote>

<h2>Template parameters</h2>

<table border summary="">
<tr>
<th>Parameter</th>

<th>Description</th>

<th>Default</th>
</tr>

<tr>
<td><tt>Char</tt></td>

<td>The type of elements within a token, typically <tt>char</tt>.</td>

<td>&nbsp;</td>
</tr>

<tr>
<td><tt>Traits</tt></td>

<td>The <tt>char_traits</tt> for the character type.</td>

<td><tt>std::char_traits&lt;Char&gt;</tt></td>
</tr>
</table>

<h2>Model of</h2>

<p><a href="tokenizerfunction.htm">Tokenizer Function</a></p>

<h2>Members</h2>
<hr>
<pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

<p>This creates a <tt>char_separator</tt> object, which can then be used to
create a <a href="token_iterator.htm"><tt>token_iterator</tt></a> or
<a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing. The
<tt>dropped_delims</tt> and <tt>kept_delims</tt> arguments are strings of
characters in which each character is used as a delimiter on its own (so
<tt>"-;"</tt> means '-' or ';', not the two-character sequence). Whenever a
delimiter is seen in the input sequence, the current token is finished and
a new token begins. The delimiters in <tt>dropped_delims</tt> do not show up
as tokens in the output, whereas the delimiters in <tt>kept_delims</tt> do.
If <tt>empty_tokens</tt> is <tt>drop_empty_tokens</tt>, empty tokens do not
show up in the output; if it is <tt>keep_empty_tokens</tt>, they do.</p>
<hr>
<pre>
explicit char_separator()
</pre>

<p>This creates a <tt>char_separator</tt> that uses <tt>std::isspace()</tt>
to identify dropped delimiters and <tt>std::ispunct()</tt> to identify kept
delimiters. Empty tokens are dropped.</p>
<hr>
<pre>
template &lt;typename InputIterator, typename Token&gt;
bool operator()(InputIterator&amp; next, InputIterator end, Token&amp; tok)
</pre>

<p>This function is called by the <a href=
"token_iterator.htm"><tt>token_iterator</tt></a> to perform tokenizing.
Each call extracts the next token from the range [<tt>next</tt>,
<tt>end</tt>) into <tt>tok</tt>, advances <tt>next</tt> past it, and
returns <tt>false</tt> when no token remains. Users typically do not call
this function directly.</p>
<hr>

<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
"http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01 Transitional"
height="31" width="88"></a></p>

<p>Revised
<!--webbot bot="Timestamp" s-type="EDITED" s-format="%d %B, %Y" startspan -->25
December, 2006<!--webbot bot="Timestamp" endspan i-checksum="38518" --></p>

<p><i>Copyright &copy; 2001-2002 Jeremy Siek and John R. Bandela</i></p>

<p><i>Distributed under the Boost Software License, Version 1.0. (See
accompanying file <a href="../../LICENSE_1_0.txt">LICENSE_1_0.txt</a> or
copy at <a href=
"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
</body>
</html>
