source: downloads/boost_1_33_1/libs/tokenizer/char_separator.htm @ 25

Last change on this file since 25 was 12, checked in by landauf, 18 years ago
added boost
File size: 7.6 KB

<html>

<head>
<meta http-equiv="Content-Type"
content="text/html; charset=iso-8859-1">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
<title>Boost Char Separator</title>
<!--
-- Copyright © Jeremy Siek and John Bandela 2001-2002
--
-- Permission to use, copy, modify, distribute and sell this software
-- and its documentation for any purpose is hereby granted without fee,
-- provided that the above copyright notice appears in all copies and
-- that both that copyright notice and this permission notice appear
-- in supporting documentation. Jeremy Siek makes no
-- representations about the suitability of this software for any
-- purpose. It is provided "as is" without express or implied warranty.
-->
</head>

<body bgcolor="#FFFFFF" text="#000000" link="#0000EE"
vlink="#551A8B" alink="#FF0000">

<p><img src="../../boost.png" alt="C++ Boost" width="277"
height="86"> <br>
</p>

<h1>
char_separator<Char, Traits>
</h1>

<p>
The <tt>char_separator</tt> class breaks a sequence of characters into
tokens based on character delimiters much in the same way that
<tt>strtok()</tt> does (but without all the evils of non-reentrancy
and destruction of the input sequence).
</p>
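
<p>
To make the "destruction of the input sequence" concrete, here is a
minimal sketch that uses only the C++ standard library. The helper name
<tt>strtok_tokens</tt> is hypothetical and is not part of Boost; it
simply runs <tt>std::strtok()</tt> over a caller-owned buffer so that
the side effect on that buffer can be observed afterwards.
</p>

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not part of Boost): tokenizes `buffer` in place
// with std::strtok and returns copies of the tokens it found.
// The buffer is taken by non-const reference because strtok writes to it.
std::vector<std::string> strtok_tokens(std::string& buffer, const char* delims)
{
  std::vector<std::string> tokens;
  if (buffer.empty())
    return tokens;
  // The first call passes the buffer; later calls pass a null pointer and
  // rely on strtok's hidden static state (the source of non-reentrancy).
  for (char* tok = std::strtok(&buffer[0], delims); tok != nullptr;
       tok = std::strtok(nullptr, delims))
    tokens.push_back(tok);
  return tokens;
}
```

<p>
Calling <tt>strtok_tokens(s, ";")</tt> on a string <tt>s</tt> holding
<tt>"a;b"</tt> yields the tokens <tt>a</tt> and <tt>b</tt>, but
afterwards <tt>s[1]</tt> is a null character rather than <tt>';'</tt>:
the delimiter has been overwritten in the caller's data.
</p>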

<p>
The <tt>char_separator</tt> class is used in conjunction with the <a
href="token_iterator.htm"><tt>token_iterator</tt></a> or <a
href="tokenizer.htm"><tt>tokenizer</tt></a> to perform tokenizing.
</p>

<h2>Definitions</h2>

<p>
The <tt>strtok()</tt> function does not include matches with the
character delimiters in the output sequence of tokens. However, it is
sometimes useful to have the delimiters show up in the output
sequence, so <tt>char_separator</tt> provides this as an
option. We refer to delimiters that show up as output tokens as
<b><i>kept delimiters</i></b> and delimiters that do not show up as
output tokens as <b><i>dropped delimiters</i></b>.
</p>

<p>
When two delimiters appear next to each other in the input sequence,
there is the question of whether to output an <b><i>empty
token</i></b> or to skip ahead. The behaviour of <tt>strtok()</tt> is
to skip ahead. The <tt>char_separator</tt> class provides both
options.
</p>


<h2>Examples</h2>

<p>
This first example shows how to use <tt>char_separator</tt> as a
replacement for the <tt>strtok()</tt> function. We've specified three
character delimiters, and they will not show up as output tokens. We
have not specified any kept delimiters, and by default any empty
tokens will be ignored.
</p>

<blockquote>
<pre>// char_sep_example_1.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> >
    tokenizer;
  // All three characters are dropped delimiters.
  boost::char_separator<char> sep("-;|");
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
<Hello> <world> <foo> <bar> <yow> <baz>
</pre>
</blockquote>


<p>
The next example shows tokenizing with two dropped delimiters '-' and
';' and a single kept delimiter '|'. We also specify that empty tokens
should show up in the output when two delimiters are next to each
other.
</p>

<blockquote>
<pre>// char_sep_example_2.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
  std::string str = ";;Hello|world||-foo--bar;yow;baz|";
  typedef boost::tokenizer<boost::char_separator<char> >
    tokenizer;
  // '-' and ';' are dropped, '|' is kept, and empty tokens are reported.
  boost::char_separator<char> sep("-;", "|", boost::keep_empty_tokens);
  tokenizer tokens(str, sep);
  for (tokenizer::iterator tok_iter = tokens.begin();
       tok_iter != tokens.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
<> <> <Hello> <|> <world> <|> <> <|> <> <foo> <> <bar> <yow> <baz> <|> <>
</pre>
</blockquote>

<p>
The final example shows tokenizing on punctuation and whitespace
characters using the default constructor of the
<tt>char_separator</tt>.
</p>

<blockquote>
<pre>// char_sep_example_3.cpp
#include <cstdlib>   // EXIT_SUCCESS
#include <iostream>
#include <boost/tokenizer.hpp>
#include <string>

int main()
{
  std::string str = "This is, a test";
  typedef boost::tokenizer<boost::char_separator<char> > Tok;
  boost::char_separator<char> sep; // default constructed
  Tok tok(str, sep);
  for (Tok::iterator tok_iter = tok.begin(); tok_iter != tok.end(); ++tok_iter)
    std::cout << "<" << *tok_iter << "> ";
  std::cout << "\n";
  return EXIT_SUCCESS;
}
</pre>
</blockquote>
The output is:
<blockquote>
<pre>
<This> <is> <,> <a> <test>
</pre>
</blockquote>

<h2>Template parameters</h2>

<P>
<table border>
<TR>
<th>Parameter</th><th>Description</th><th>Default</th>
</tr>

<TR><TD><TT>Char</TT></TD>
<TD>The type of elements within a token, typically <tt>char</tt>.</TD>
<TD> </TD>
</TR>

<TR><TD><TT>Traits</TT></TD>
<TD>The <tt>char_traits</tt> for the character type.</TD>
<TD><tt>std::char_traits<Char></tt></TD>
</TR>

</table>

<h2>Model of</h2>

<a href="tokenizerfunction.htm">Tokenizer Function</a>


<h2>Members</h2>

<hr>
<pre>
explicit char_separator(const Char* dropped_delims,
                        const Char* kept_delims = "",
                        empty_token_policy empty_tokens = drop_empty_tokens)
</pre>

<p>
This creates a <tt>char_separator</tt> object, which can then be used
to create a <a href="token_iterator.htm"><tt>token_iterator</tt></a>
or <a href="tokenizer.htm"><tt>tokenizer</tt></a> to perform
tokenizing. The <tt>dropped_delims</tt> and <tt>kept_delims</tt> are
strings of characters where each character is used as a delimiter during
tokenizing. Whenever a delimiter is seen in the input sequence, the
current token is finished, and a new token begins.

The delimiters in <tt>dropped_delims</tt> do not show up as tokens in
the output whereas the delimiters in <tt>kept_delims</tt> do show up
as tokens. If <tt>empty_tokens</tt> is <tt>drop_empty_tokens</tt>,
then empty tokens will not show up in the output. If
<tt>empty_tokens</tt> is <tt>keep_empty_tokens</tt> then empty tokens
will show up in the output.
</p>
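
<p>
The rules in the paragraph above can be modelled in a few lines of
standard C++. The following is only an illustrative sketch and not
Boost's implementation: the function name <tt>split_model</tt> and its
<tt>bool</tt> policy flag are assumptions made for this example, and
edge cases such as an empty input may behave differently in the
library.
</p>

```cpp
#include <string>
#include <vector>

// Simplified model of the dropped/kept/empty-token rules; NOT Boost's code.
std::vector<std::string> split_model(const std::string& input,
                                     const std::string& dropped_delims,
                                     const std::string& kept_delims,
                                     bool keep_empty_tokens)
{
  std::vector<std::string> tokens;
  std::string current;
  // Emits the token collected so far, honouring the empty-token policy.
  auto flush = [&]() {
    if (!current.empty() || keep_empty_tokens)
      tokens.push_back(current);
    current.clear();
  };
  for (char c : input) {
    const bool kept = kept_delims.find(c) != std::string::npos;
    const bool dropped = dropped_delims.find(c) != std::string::npos;
    if (kept || dropped) {
      flush();                                  // a delimiter finishes the current token
      if (kept)
        tokens.push_back(std::string(1, c));    // kept delimiters are tokens themselves
    } else {
      current += c;                             // ordinary character: extend the token
    }
  }
  flush();                                      // text after the last delimiter
  return tokens;
}
```

<p>
With the input string from the examples section,
<tt>split_model(str, "-;|", "", false)</tt> gives the six words of the
first example, and <tt>split_model(str, "-;", "|", true)</tt> gives the
sixteen tokens listed for the second example, including the leading and
trailing empty tokens.
</p>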

<hr>

<pre>
explicit char_separator()
</pre>
<p>
The function <tt>std::isspace()</tt> is used to identify dropped
delimiters and <tt>std::ispunct()</tt> is used to identify kept
delimiters. In addition, empty tokens are dropped.
</p>
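
<p>
As a rough illustration of this classification, the sketch below applies
<tt>std::isspace()</tt> and <tt>std::ispunct()</tt> by hand. The name
<tt>split_default_model</tt> is hypothetical, and the cast to
<tt>unsigned char</tt> is a precaution added for this example; it is a
model of the described behaviour under those assumptions, not the
library's code.
</p>

```cpp
#include <cctype>
#include <string>
#include <vector>

// Model of the default-constructed behaviour described above; NOT Boost's code.
std::vector<std::string> split_default_model(const std::string& input)
{
  std::vector<std::string> tokens;
  std::string current;
  for (char ch : input) {
    // <cctype> functions require a value representable as unsigned char.
    const unsigned char c = static_cast<unsigned char>(ch);
    if (std::isspace(c) || std::ispunct(c)) {
      if (!current.empty())
        tokens.push_back(current);              // empty tokens are dropped
      current.clear();
      if (std::ispunct(c))
        tokens.push_back(std::string(1, ch));   // punctuation is a kept delimiter
    } else {
      current += ch;
    }
  }
  if (!current.empty())
    tokens.push_back(current);                  // final token, if any
  return tokens;
}
```

<p>
For the input <tt>"This is, a test"</tt> this model produces
<tt>This</tt>, <tt>is</tt>, <tt>,</tt>, <tt>a</tt> and <tt>test</tt>,
which agrees with the output shown for the third example.
</p>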

<hr>

<pre>
template <typename InputIterator, typename Token>
bool operator()(InputIterator& next, InputIterator end, Token& tok)
</pre>

<p>
This function is called by the <a
href="token_iterator.htm"><tt>token_iterator</tt></a> to perform
tokenizing. The user typically does not call this function directly.
</p>
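
<p>
To show how a function object with this signature might be driven, here
is a small sketch in standard C++. Both <tt>comma_splitter</tt> and
<tt>collect_tokens</tt> are hypothetical names invented for this
example; the loop only approximates what an iterator adaptor could do
and is not the actual <tt>token_iterator</tt>. The <tt>reset()</tt>
member is included on the assumption that the Tokenizer Function
concept asks for one.
</p>

```cpp
#include <string>
#include <vector>

// Hypothetical functor with the same call signature as
// char_separator::operator(); it splits on ',' and drops empty tokens.
struct comma_splitter
{
  void reset() {}  // stateless, so there is nothing to clear

  template <typename InputIterator, typename Token>
  bool operator()(InputIterator& next, InputIterator end, Token& tok)
  {
    tok = Token();
    while (next != end && *next == ',')   // skip leading delimiters
      ++next;
    if (next == end)
      return false;                       // no further token available
    while (next != end && *next != ',') { // copy up to the next delimiter
      tok += *next;
      ++next;
    }
    return true;                          // tok holds one token; next was advanced
  }
};

// Calls the functor repeatedly until it reports that the input is exhausted.
template <typename Splitter>
std::vector<std::string> collect_tokens(const std::string& input, Splitter f)
{
  std::vector<std::string> out;
  std::string::const_iterator next = input.begin();
  std::string tok;
  f.reset();
  while (f(next, input.end(), tok))
    out.push_back(tok);
  return out;
}
```

<p>
For instance, <tt>collect_tokens(",a,,b,", comma_splitter())</tt>
returns the two tokens <tt>a</tt> and <tt>b</tt>. Note that
<tt>next</tt> is passed by reference, so each call resumes where the
previous one stopped.
</p>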


<hr>

<p>© Copyright Jeremy Siek and John R. Bandela 2001-2002. Permission
to copy, use, modify, sell and distribute this document is granted
provided this copyright notice appears in all copies. This document is
provided "as is" without express or implied warranty, and
with no claim as to its suitability for any purpose.</p>
</body>
</html>
