<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE library PUBLIC "-//Boost//DTD BoostBook XML V1.0//EN"
"http://www.boost.org/tools/boostbook/dtd/boostbook.dtd">

<!-- Copyright (c) 2002-2006 Pavol Droba.
     Subject to the Boost Software License, Version 1.0.
     (See accompanying file LICENSE-1.0 or  http://www.boost.org/LICENSE-1.0)
-->

<section id="string_algo.design" last-revision="$Date: 2006/08/16 07:10:48 $">
    <title>Design Topics</title>

    <using-namespace name="boost"/>
    <using-namespace name="boost::algorithm"/>

    <section id="string_algo.string">
        <title>String Representation</title>
        <para>
            As the name suggests, this library works mainly with strings. However, in the context of this library,
            a string is not restricted to any particular implementation (like <code>std::basic_string</code>);
            rather, it is a concept. This allows the algorithms in this library to be reused for any string type
            that satisfies the given requirements.
        </para>
        <para>
            <emphasis role="bold">Definition:</emphasis> A string is a
            <ulink url="../../libs/range/doc/range.html">range</ulink> of characters accessible in a sequential,
            ordered fashion. A character is any value type with "cheap" copying and assignment.
        </para>
        <para>
            The first requirement of a string type is that it must be accessible using
            <ulink url="../../libs/range/index.html">Boost.Range</ulink>. This facility provides access to
            the elements of the string in a uniform, iterator-based fashion.
            This is sufficient for most of the library; the additional requirements of some algorithms are described below.
        </para>
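        <para>
            As an illustration of the range concept, the sketch below writes one generic function that needs nothing
            but begin/end access, so the same template accepts a <code>std::string</code>, a
            <code>std::vector&lt;char&gt;</code> and a plain character array. It is a minimal sketch that assumes C++11
            and uses the standard <code>std::begin</code>/<code>std::end</code> in place of Boost.Range; the name
            <code>count_char</code> is hypothetical and not part of this library.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper (not part of this library): counts how often `ch`
// occurs in any character range. It relies only on begin/end access,
// mirroring the "a string is a range" concept.
template <typename RangeT, typename CharT>
std::size_t count_char(const RangeT& input, CharT ch)
{
    std::size_t count = 0;
    for (auto it = std::begin(input); it != std::end(input); ++it) {
        if (*it == ch) {
            ++count;
        }
    }
    return count;
}
```
]]></programlisting>
        <para>
            For example, <code>count_char(std::string("banana"), 'a')</code> returns 3, and the same template
            also compiles for a <code>std::vector&lt;char&gt;</code> or a <code>char</code> array.
        </para>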
        <para>
            The second requirement concerns the way in which the characters are stored in the string. Algorithms in
            this library assume that copying a character is cheaper than allocating extra
            storage to cache results. This is a natural assumption for common character types. Algorithms will
            still work if this requirement is not satisfied, but at the cost of degraded performance.
        </para>
        <para>
            In addition, some algorithms place further requirements on the string type. In particular, when
            an algorithm needs to create a new string of the given type, the type must satisfy
            the sequence requirements (Std &sect;23.1.1).
        </para>
        <para>
            In both the reference and the code, the requirements on a string type are indicated by the name of
            the template argument. <code>RangeT</code> means that the basic range requirements must hold.
            <code>SequenceT</code> designates the extended sequence requirements.
        </para>
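        <para>
            The two naming conventions can be illustrated with a pair of hypothetical function templates, assuming
            C++11. <code>contains_digit</code> only reads its input, so a <code>RangeT</code> parameter is enough;
            <code>upper_copy</code> must build a new object of the same type from an iterator pair, which relies on
            the sequence requirements and is therefore written against <code>SequenceT</code>. Neither function
            belongs to the library.
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <iterator>
#include <list>
#include <string>

// Hypothetical, RangeT style: only read access through begin/end is needed;
// nothing new is created.
template <typename RangeT>
bool contains_digit(const RangeT& input)
{
    return std::any_of(std::begin(input), std::end(input), [](char c) {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    });
}

// Hypothetical, SequenceT style: a new object of the same type is constructed
// from an iterator pair, which relies on the sequence requirements.
template <typename SequenceT>
SequenceT upper_copy(const SequenceT& input)
{
    SequenceT result(input.begin(), input.end());
    std::transform(result.begin(), result.end(), result.begin(), [](char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    });
    return result;
}
```
]]></programlisting>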
    </section>

    <section id="string_algo.sequence_traits">
        <title>Sequence Traits</title>

        <para>
            The major difference between <code>std::list</code> and <code>std::vector</code> is not in the interfaces
            they provide, but rather in the internal details of the classes and the way they perform
            various operations. For example, inserting an element in the middle of a <code>std::list</code> takes
            constant time, while a <code>std::vector</code> has to move all the elements that follow.
            This difference cannot be inferred from the class definitions without some special mechanism,
            yet some algorithms can run significantly faster when they know the properties
            of a particular container.
        </para>
        <para>
            Sequence traits make it possible to specify additional properties of a sequence container (see Std. &sect;23.2).
            These properties are then used by algorithms to select optimized handling for some operations.
            The sequence traits are declared in the header
            <headername>boost/algorithm/string/sequence_traits.hpp</headername>.
        </para>
        <para>
            In the following table, C denotes a container type and c is an object of type C.
        </para>
        <table>
            <title>Sequence Traits</title>
            <tgroup cols="2" align="left">
                <thead>
                    <row>
                        <entry>Trait</entry>
                        <entry>Description</entry>
                    </row>
                </thead>
                <tbody>
                    <row>
                        <entry><classname>has_native_replace&lt;C&gt;</classname>::value</entry>
                        <entry>Specifies that the sequence has a <code>std::string</code>-like <code>replace</code> method.</entry>
                    </row>
                    <row>
                        <entry><classname>has_stable_iterators&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the sequence has stable iterators, i.e. operations
                            like <code>insert</code>/<code>erase</code>/<code>replace</code>
                            do not invalidate iterators.
                        </entry>
                    </row>
                    <row>
                        <entry><classname>has_const_time_insert&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the <code>insert</code> method of the sequence has
                            constant time complexity.
                        </entry>
                    </row>
                    <row>
                        <entry><classname>has_const_time_erase&lt;C&gt;</classname>::value</entry>
                        <entry>
                            Specifies that the <code>erase</code> method of the sequence has constant time complexity.
                        </entry>
                    </row>
                </tbody>
            </tgroup>
        </table>

        <para>
            The current implementation contains specializations for <code>std::list&lt;T&gt;</code> and
            <code>std::basic_string&lt;T&gt;</code> from the standard library, and for SGI's
            <code>std::rope&lt;T&gt;</code> and <code>std::slist&lt;T&gt;</code>.
        </para>
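        <para>
            The following is a minimal sketch, assuming C++11, of how a trait of this kind can be declared and
            consulted. The name mirrors <code>has_const_time_insert</code> from the table, but it lives in a
            hypothetical <code>sketch</code> namespace and is not the library's actual definition: the primary
            template answers <code>false</code>, and a partial specialization opts <code>std::list</code> in.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <list>
#include <string>
#include <type_traits>
#include <vector>

namespace sketch {  // hypothetical namespace, not boost::algorithm

// Primary template: by default, assume insert is not constant time.
template <typename C>
struct has_const_time_insert : std::false_type {};

// std::list links a new node in O(1) once the position is known.
template <typename T, typename Alloc>
struct has_const_time_insert<std::list<T, Alloc>> : std::true_type {};

// An algorithm can consult the trait; the condition is a compile-time constant.
template <typename C>
const char* insert_strategy()
{
    return has_const_time_insert<C>::value ? "insert in place"
                                           : "build a new sequence";
}

} // namespace sketch
```
]]></programlisting>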
    </section>
    <section id="string_algo.find">
        <title>Find Algorithms</title>

        <para>
            Find algorithms have functionality similar to the <code>std::search()</code> algorithm, but they provide
            a different interface that is more suitable for common string operations.
            Instead of returning just the start of the matching subsequence, they return a range, which is necessary
            when the length of the matching subsequence is not known beforehand.
            This feature also allows the input sequence to be partitioned into three
            parts: a prefix, a substring, and a suffix.
        </para>
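        <para>
            The sketch below, assuming C++11, shows why returning a range matters: with both ends of the match
            available, the input splits directly into prefix, match and suffix. It uses <code>std::search</code>
            on <code>std::string</code> and represents the range as a <code>std::pair</code> of iterators;
            <code>find_first_sketch</code>, <code>partition_sketch</code> and <code>Parts</code> are hypothetical
            names (the library's own find functions return an <code>iterator_range</code>).
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

using Range = std::pair<std::string::const_iterator, std::string::const_iterator>;

// Hypothetical find_first: returns [begin, end) of the first match of
// `search`, or an empty range at input.end() when there is no match.
Range find_first_sketch(const std::string& input, const std::string& search)
{
    auto first = std::search(input.begin(), input.end(), search.begin(), search.end());
    if (first == input.end()) {
        return Range(input.end(), input.end());
    }
    return Range(first, first + search.size());
}

struct Parts {
    std::string prefix, match, suffix;
};

// Because the match is a range, prefix, match and suffix fall out directly.
Parts partition_sketch(const std::string& input, const std::string& search)
{
    Range m = find_first_sketch(input, search);
    return Parts{std::string(input.begin(), m.first),
                 std::string(m.first, m.second),
                 std::string(m.second, input.end())};
}
```
]]></programlisting>
        <para>
            For example, <code>partition_sketch("hello world", "lo w")</code> yields the prefix <code>"hel"</code>,
            the match <code>"lo w"</code> and the suffix <code>"orld"</code>.
        </para>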
        <para>
            Another difference is the addition of various search methods besides find_first, such as find_regex.
        </para>
        <para>
            In the library, find algorithms are implemented in terms of
            <link linkend="string_algo.finder_concept">Finders</link>. Finders are also used by other facilities
            (replace, split).
            For convenience, there are also function wrappers for these finders that simplify find operations.
        </para>
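        <para>
            A Finder can be pictured as a function object that receives the input as a pair of iterators and
            returns the matching sub-range. The sketch below, assuming C++11, defines one such functor for the
            first occurrence of a fixed pattern and a thin wrapper that accepts any finder. Both
            <code>first_finder_sketch</code> and <code>find_sketch</code> are hypothetical; the library's finder
            generators and wrappers are more general.
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

using It = std::string::const_iterator;
using Range = std::pair<It, It>;

// Hypothetical finder: a function object that locates the first occurrence
// of a fixed pattern inside [begin, end) and returns it as a range.
struct first_finder_sketch {
    std::string pattern;

    Range operator()(It begin, It end) const
    {
        It found = std::search(begin, end, pattern.begin(), pattern.end());
        if (found == end) {
            return Range(end, end);  // empty range: no match
        }
        return Range(found, found + pattern.size());
    }
};

// Hypothetical convenience wrapper: any finder can be plugged in.
template <typename FinderT>
Range find_sketch(const std::string& input, FinderT finder)
{
    return finder(input.begin(), input.end());
}
```
]]></programlisting>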
        <para>
            Currently, the library contains only a naive implementation of the find algorithms, with complexity
            O(n * m), where n is the size of the input sequence and m is the size of the search sequence.
            Algorithms with complexity O(n) exist, but for short sequences their constant overhead is
            rather large. When m &lt;&lt; n (m is orders of magnitude smaller than n), the current implementation
            provides acceptable efficiency.
            Even the C++ standard specifies the required complexity of the <code>std::search</code> algorithm as O(n * m).
            A future version of the library may also offer algorithms with linear
            complexity as an option.
        </para>
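        <para>
            To make the O(n * m) bound concrete, here is a textbook naive search (not the library's source) that
            also counts character comparisons. For a text of n = 8 <code>'a'</code> characters and the pattern
            <code>"aab"</code> (m = 3), each of the n - m + 1 = 6 start positions compares all 3 pattern characters
            before failing, for 6 * 3 = 18 comparisons. The function name <code>naive_find</code> is hypothetical.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Naive search: try every start position and compare up to m characters.
// Returns the index of the first match (or std::string::npos) and reports
// the number of character comparisons performed through `comparisons`.
std::size_t naive_find(const std::string& text, const std::string& pattern,
                       std::size_t& comparisons)
{
    comparisons = 0;
    const std::size_t n = text.size();
    const std::size_t m = pattern.size();
    if (m > n) {
        return std::string::npos;
    }
    for (std::size_t i = 0; i + m <= n; ++i) {
        std::size_t j = 0;
        while (j < m) {
            ++comparisons;
            if (text[i + j] != pattern[j]) {
                break;  // mismatch: move on to the next start position
            }
            ++j;
        }
        if (j == m) {
            return i;  // full match at position i
        }
    }
    return std::string::npos;
}
```
]]></programlisting>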
    </section>
    <section id="string_algo.replace">
        <title>Replace Algorithms</title>

        <para>
            The implementation of replace algorithms follows the layered structure of the library. The
            lower layer implements generic substitution of a range in the input sequence.
            This layer takes a <link linkend="string_algo.finder_concept">Finder</link> object and a
            <link linkend="string_algo.formatter_concept">Formatter</link> object as input. These two
            functors define what to replace and what to replace it with. The upper-layer functions
            simply wrap calls to the lower layer. Finders are shared with the find and split facilities.
        </para>
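        <para>
            A compressed sketch of this layering, assuming C++11 and working on <code>std::string</code> only: the
            lower-layer <code>find_format_sketch</code> receives a finder (what to replace) and a formatter (what to
            put in its place), and the upper-layer <code>replace_first_sketch</code> only wires concrete functors
            into it. All names are hypothetical and simplified compared with the library.
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>

using It = std::string::const_iterator;
using Range = std::pair<It, It>;

// Lower layer (hypothetical): generic substitution of the range selected by
// `finder` with the text produced by `formatter`.
template <typename FinderT, typename FormatterT>
std::string find_format_sketch(const std::string& input, FinderT finder, FormatterT formatter)
{
    Range match = finder(input.begin(), input.end());
    if (match.first == match.second) {
        return input;  // empty range: nothing to replace, return a copy
    }
    std::string result(input.begin(), match.first);  // prefix
    result += formatter(match);                      // replacement
    result.append(match.second, input.end());        // suffix
    return result;
}

// Upper layer (hypothetical): wires a concrete finder and formatter together.
std::string replace_first_sketch(const std::string& input, const std::string& search,
                                 const std::string& with)
{
    auto finder = [&search](It b, It e) -> Range {
        It f = std::search(b, e, search.begin(), search.end());
        return f == e ? Range(e, e) : Range(f, f + search.size());
    };
    auto formatter = [&with](const Range&) { return with; };
    return find_format_sketch(input, finder, formatter);
}
```
]]></programlisting>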
        <para>
            As usual, the implementation of the lower layer is designed to work with a generic sequence while
            taking advantage of specific features where possible
            (by using <link linkend="string_algo.sequence_traits">Sequence traits</link>).
        </para>
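        <para>
            As one way of taking advantage of a sequence property when it is available, the sketch below, assuming
            C++11, dispatches on a hypothetical <code>has_native_replace</code>-style trait:
            <code>std::basic_string</code> uses its own <code>replace</code> member, while other sequences fall back
            to <code>erase</code> followed by <code>insert</code>. Tag dispatch ensures that each branch is only
            instantiated for types that support it. The trait, the <code>sketch</code> namespace and
            <code>replace_range</code> are illustrative names, not the library's implementation.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>
#include <string>
#include <type_traits>

namespace sketch {  // hypothetical namespace, not boost::algorithm

// Hypothetical trait: true only for std::basic_string.
template <typename C>
struct has_native_replace : std::false_type {};

template <typename CharT, typename Traits, typename Alloc>
struct has_native_replace<std::basic_string<CharT, Traits, Alloc>> : std::true_type {};

// Native path: the container offers a std::string-like replace method.
template <typename C>
void replace_range_impl(C& seq, std::size_t pos, std::size_t len, const C& with, std::true_type)
{
    seq.replace(pos, len, with);
}

// Generic path: erase the old elements, then insert the new ones.
template <typename C>
void replace_range_impl(C& seq, std::size_t pos, std::size_t len, const C& with, std::false_type)
{
    auto first = std::next(seq.begin(), pos);
    auto last = std::next(first, len);
    auto where = seq.erase(first, last);
    seq.insert(where, with.begin(), with.end());
}

// Replaces `len` elements starting at `pos`; the path is chosen at compile time.
template <typename C>
void replace_range(C& seq, std::size_t pos, std::size_t len, const C& with)
{
    replace_range_impl(seq, pos, len, with, has_native_replace<C>{});
}

} // namespace sketch
```
]]></programlisting>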
    </section>
    <section id="string_algo.split">
        <title>Find Iterators &amp; Split Algorithms</title>

        <para>
            Find iterators are a logical extension of the <link linkend="string_algo.find">find facility</link>.
            Instead of searching for a single match, they iteratively search the whole input for multiple matches.
            The results of the search are then used to partition the input. Which parts are returned
            depends on the iterator: either the matching parts (<classname>find_iterator</classname>) or the parts in
            between (<classname>split_iterator</classname>).
        </para>
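        <para>
            The two views can be sketched with one loop that repeatedly searches the rest of the input: the
            matches it collects correspond to what a <code>find_iterator</code> visits, and the pieces between them
            to what a <code>split_iterator</code> visits. This is a hand-written stand-in based on
            <code>std::search</code>, assuming C++11 and non-overlapping matches, not the library's iterator
            classes; <code>scan_sketch</code> is a hypothetical name.
        </para>
        <programlisting><![CDATA[
```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: walks the input once. Every non-overlapping match of
// `sep` goes to `matches` (the find_iterator view) and every piece between
// matches goes to `pieces` (the split_iterator view).
void scan_sketch(const std::string& input, const std::string& sep,
                 std::vector<std::string>& matches, std::vector<std::string>& pieces)
{
    if (sep.empty()) {
        pieces.push_back(input);  // an empty separator would never advance
        return;
    }
    auto pos = input.begin();
    while (true) {
        auto hit = std::search(pos, input.end(), sep.begin(), sep.end());
        pieces.emplace_back(pos, hit);  // text before the match, or the tail
        if (hit == input.end()) {
            break;
        }
        matches.emplace_back(hit, hit + sep.size());
        pos = hit + sep.size();  // continue after the match
    }
}
```
]]></programlisting>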
        <para>
            In addition, split algorithms such as <functionname>find_all()</functionname> and <functionname>split()</functionname>
            simplify common operations. They use a find iterator to search the whole input and copy the
            resulting parts into the supplied container.
        </para>
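        <para>
            Reduced to its essence, a split algorithm searches the whole input and copies each piece into a
            container supplied by the caller. The sketch below, assuming C++11, separates on any character accepted
            by a predicate; <code>split_sketch</code> and its signature are hypothetical, and in this version
            adjacent separators produce empty pieces.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical split: clears `result`, then copies every piece of `input`
// between separator characters (as decided by `is_separator`) into it.
template <typename PredicateT>
std::vector<std::string>& split_sketch(std::vector<std::string>& result,
                                       const std::string& input, PredicateT is_separator)
{
    result.clear();
    std::string::size_type start = 0;
    for (std::string::size_type i = 0; i <= input.size(); ++i) {
        // A separator, or the end of the input, closes the current piece.
        if (i == input.size() || is_separator(input[i])) {
            result.push_back(input.substr(start, i - start));
            start = i + 1;
        }
    }
    return result;
}
```
]]></programlisting>
        <para>
            For example, splitting <code>"a,b;;c"</code> on <code>','</code> and <code>';'</code> with this sketch
            yields <code>"a"</code>, <code>"b"</code>, <code>""</code> and <code>"c"</code>.
        </para>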
    </section>
    <section id="string_algo.exception">
        <title>Exception Safety</title>

        <para>
            The library requires that all operations on types used as template
            or function arguments provide the <emphasis>basic exception-safety guarantee</emphasis>.
            In turn, all functions and algorithms in this library, except where stated
            otherwise, provide the <emphasis>basic exception-safety guarantee</emphasis>.
            In other words, the library maintains its invariants and does not leak resources in
            the face of exceptions. Some library operations give stronger
            guarantees, which are documented on an individual basis.
        </para>

        <para>
            Some functions can provide the <emphasis>strong exception-safety guarantee</emphasis>.
            That means that the following statements are true:
            <itemizedlist>
                <listitem>
                    If an exception is thrown, there are no effects other than those
                    of the function.
                </listitem>
                <listitem>
                    If an exception is thrown other than by the function, there are no effects.
                </listitem>
            </itemizedlist>
            This guarantee can be provided under the condition that the operations
            on the types used as arguments to these functions either
            provide the strong exception guarantee or do not alter the global state.
        </para>
        <para>
            In the reference, the term <emphasis>strong exception-safety guarantee</emphasis> refers to the
            guarantee as defined above.
        </para>
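        <para>
            A common general technique for obtaining the strong guarantee, shown here only as an illustration and
            not as this library's implementation, is to perform all work on a copy and commit it with a
            non-throwing swap. In the sketch, which assumes C++11, the hypothetical <code>modify_strong</code>
            applies a caller-supplied operation; if that operation throws, the original string is left exactly as
            it was.
        </para>
        <programlisting><![CDATA[
```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: applies `op` to a copy of `target` and commits the
// result only if `op` completes. std::string::swap does not throw, so the
// commit cannot fail and an exception leaves `target` unchanged.
template <typename OperationT>
void modify_strong(std::string& target, OperationT op)
{
    std::string work(target);  // may throw (allocation); target untouched
    op(work);                  // may throw; only the copy is affected
    target.swap(work);         // commit step, no-throw
}
```
]]></programlisting>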
        <para>
            For more information about exception safety, see
            <ulink url="../../more/generic_exception_safety.html">this article</ulink>.
        </para>
    </section>
</section>