<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
(C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com .
Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<title>Serialization - Rationale</title>
</head>
<body link="#0000ff" vlink="#800080">
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary=
    "header">
  <tr> 
    <td valign="top" width="300"> 
      <h3><a href="http://www.boost.org"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
    </td>
    <td valign="top"> 
      <h1 align="center">Serialization</h1>
      <h2 align="center">Rationale</h2>
    </td>
  </tr>
</table>
<hr>
<dl class="index">
  <dt><a href="#serialization">The term "serialization" is preferred to "persistence"</a></dt>
  <dt><a href="#archives">Archives are not streams</a></dt>
  <dt><a href="#primitives">Archive members are templates rather than virtual functions</a></dt>
  <dt><a href="#strings">Strings are treated specially in text archives</a></dt>
  <dt><a href="#typeid"><code style="white-space: normal">typeid</code> information is not included in archives</a></dt>
  <dt><a href="#trap">Compile time trap when saving a non-const value</a></dt>
  <!--
  <dt><a href="#footnotes">Footnotes</a></dt>
  -->
</dl>
<h2><a name="serialization"></a>The term "serialization" is preferred to "persistence"</h2>
<p>
I found that the term "persistence" is often used to refer
to something quite different, for example, the storage of class
instances (objects) in a database schema <a href="bibliography.html#4">[4]</a>.
This library will be useful in other contexts besides implementing persistence. The
most obvious case is that of marshalling data for transmission to another system.
<h2><a name="archives"></a>Archives are not streams</h2>
<p>
Archive classes are <strong>NOT</strong> derived from
streams even though they have similar syntax rules.
<ul>
    <li>Archive classes are not kinds of streams, though they
    are implemented in terms of streams. This
    distinction is addressed in <a href="bibliography.html#5">[5]</a>, item 41.
    <li>We don't want users to insert/extract data
    directly into/from the stream. This could
    create a corrupted archive. Were archives
    derived from streams, it would be possible to
    do this accidentally. So archive classes
    only define operations which are safe and necessary.
    <li>The usage of streams to implement the archive classes that
    are included in the library is merely convenient, not necessary.
    Library users may well want to define their own archive format
    which doesn't use streams at all.
</ul>
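<p>
The sketch below illustrates this point in plain standard C++. It is not the library's code. The class <code style="white-space: normal">toy_text_oarchive</code> and the function <code style="white-space: normal">write_record</code> are hypothetical names chosen for this example. The archive holds a reference to a <code style="white-space: normal">std::ostream</code> rather than inheriting from it, so only the operations it chooses to expose are available to callers.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical toy archive, not the library's text_oarchive. It is
// implemented in terms of a stream but is not derived from one.
class toy_text_oarchive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}

    // Only complete values can be written, each followed by a separator,
    // so callers cannot inject stray characters into the archive format.
    toy_text_oarchive& operator<<(int i) {
        os_ << i << ' ';
        return *this;
    }
    toy_text_oarchive& operator<<(const std::string& s) {
        // The length prefix tells a loader how many characters follow.
        os_ << s.size() << ' ' << s << ' ';
        return *this;
    }

private:
    std::ostream& os_;  // used, not inherited from
};

// Writes a small record and returns the archive text.
std::string write_record() {
    std::ostringstream os;
    toy_text_oarchive ar(os);
    ar << 42 << std::string("abc");
    // Stream members such as ar.put('x') or ar.write(...) do not exist
    // on the archive, so they cannot be called by accident.
    return os.str();
}
```

<p>
Here <code style="white-space: normal">write_record()</code> returns <code style="white-space: normal">"42 3 abc "</code>. Because the stream is a private member, nothing in the archive's interface lets a caller write raw data into the middle of the archive.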
<h2><a name="primitives"></a>Archive Members are Templates
Rather than Virtual Functions</h2>
<p>
The previous version of this library defined virtual functions for all
primitive types. These were overridden by each archive class. There were
two issues related to this:
<ul>
    <li>Some disliked virtual functions because of the added execution time
    overhead.
    <li>This caused implementation difficulties since the set of primitive
    data types varies between platforms. Attempting to define the correct
    set of virtual functions (think <code style="white-space: normal">long long</code>,
    <code style="white-space: normal">__int64</code>,
    etc.) resulted in messy and fragile code. Replacing this with templates
    and letting the compiler generate the code for the primitive types actually
    used resolved this problem. Of course, the ripple effects of this design
    change were significant, but in the end they led to smaller, faster, more
    maintainable code.
</ul>
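<p>
The following is a rough sketch of the template approach, written in standard C++ and not taken from the library. The names <code style="white-space: normal">toy_oarchive</code> and <code style="white-space: normal">save_primitives</code> are hypothetical. A single member template stands in for a family of virtual functions such as <code style="white-space: normal">save(int)</code>, <code style="white-space: normal">save(long)</code>, and <code style="white-space: normal">save(long long)</code>. The compiler instantiates it only for the types the program actually saves.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical toy archive: one member template replaces a hand-written
// virtual function per primitive type.
class toy_oarchive {
public:
    // Instantiated only for the types actually saved, whatever set of
    // primitive types the platform happens to provide.
    template <class T>
    toy_oarchive& operator<<(const T& t) {
        os_ << t << ' ';
        return *this;
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

std::string save_primitives() {
    toy_oarchive ar;
    long long big = 9000000000LL;  // no platform-specific overload needed
    ar << 1 << 2.5 << big;
    return ar.str();
}
```

<p>
<code style="white-space: normal">save_primitives()</code> returns <code style="white-space: normal">"1 2.5 9000000000 "</code>. Because no virtual functions are involved, there is no fixed list of types to keep in sync across platforms.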
<h2><a name="strings"></a><code style="white-space: normal">std::string</code>s are treated specially in text archives</h2>
<p>
Treating strings as STL vectors would result in minimal code size. This was
not done because:
<ul>
     <li>In text archives it is convenient to be able to view strings. Our text
     implementation stores single characters as integers. Storing strings
     as a vector of characters would waste space and render the archives
     inconvenient for debugging.
     <li>Stream implementations have special functions for <code style="white-space: normal">std::string</code>
     and <code style="white-space: normal">std::wstring</code>.
     Presumably they optimize appropriately.
     <li>Other specializations of <code style="white-space: normal">std::basic_string</code> are in fact handled
     as vectors of the element type.
</ul>
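<p>
To make the space and readability argument concrete, here is a small sketch of two possible text encodings of the same string. The helper names <code style="white-space: normal">save_as_text</code> and <code style="white-space: normal">save_as_char_vector</code> are hypothetical, and the exact layout is only an assumption modeled on the description above, not the library's archive format.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Special-cased encoding: a length prefix followed by the characters.
std::string save_as_text(const std::string& s) {
    std::ostringstream os;
    os << s.size() << ' ' << s;
    return os.str();
}

// Generic encoding: the string treated as a vector of characters, each
// stored as an integer (as single characters are in the text archives).
std::string save_as_char_vector(const std::string& s) {
    std::ostringstream os;
    os << s.size();
    for (char c : s) {
        os << ' ' << static_cast<int>(c);
    }
    return os.str();
}
```

<p>
On an ASCII-based platform, <code style="white-space: normal">"hello"</code> becomes <code style="white-space: normal">"5 hello"</code> with the first encoding but <code style="white-space: normal">"5 104 101 108 108 111"</code> with the second. The second is longer and much harder to read when debugging.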
<h2><a name="typeid"></a><code style="white-space: normal">typeid</code> information is not included in archives</h2>
<p>
I originally thought that I had to save the name of the class specified by <code style="white-space: normal">std::type_info::name()</code>
in the archive. This created difficulties, as <code style="white-space: normal">std::type_info::name()</code> is not portable and
not guaranteed to return the class name. This makes it almost useless for implementing
archive portability. This topic is explained in much more detail in
<a href="bibliography.html#6">[7] page 206</a>. As it turned out, saving this name was not necessary.
As long as objects are loaded in exactly the same sequence as they were saved, the type
is available when loading. The only exception to this is the case of polymorphic
pointers never before loaded or saved. This is addressed with the <code style="white-space: normal">register_type()</code>
and/or <code style="white-space: normal">export</code> facilities described in the reference.
In effect, <code style="white-space: normal">export</code> generates a portable equivalent to
<code style="white-space: normal">typeid</code> information.

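<p>
The idea of a "portable equivalent to <code style="white-space: normal">typeid</code> information" can be sketched as a registry that maps a key string chosen by the program to a factory for the class. This is only an analogy in standard C++, not how <code style="white-space: normal">export</code> is implemented in the library. The names <code style="white-space: normal">base</code>, <code style="white-space: normal">derived</code>, <code style="white-space: normal">class_registry</code>, <code style="white-space: normal">register_class</code>, and <code style="white-space: normal">round_trip_kind</code> are all hypothetical. <code style="white-space: normal">typeid</code> is used only inside the running program to find the key; it is never written out.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>
#include <typeindex>
#include <typeinfo>

struct base {
    virtual ~base() = default;
    virtual std::string kind() const = 0;
};
struct derived : base {
    std::string kind() const override { return "derived"; }
};

// Hypothetical registry: the key is chosen by the program, so it is the
// same on every platform, unlike the implementation-defined
// std::type_info::name().
class class_registry {
public:
    template <class T>
    void register_class(const std::string& key) {
        factories_[key] = [] { return std::unique_ptr<base>(new T); };
        keys_[std::type_index(typeid(T))] = key;
    }
    // Saving side: the dynamic type of the object gives the portable key.
    std::string key_of(const base& b) const {
        return keys_.at(std::type_index(typeid(b)));
    }
    // Loading side: the key read back from the archive creates the object.
    std::unique_ptr<base> create(const std::string& key) const {
        return factories_.at(key)();
    }

private:
    std::map<std::string, std::function<std::unique_ptr<base>()>> factories_;
    std::map<std::type_index, std::string> keys_;
};

std::string round_trip_kind() {
    class_registry reg;
    reg.register_class<derived>("derived");
    derived d;
    const base& b = d;                           // seen through a base reference
    std::string key = reg.key_of(b);             // what would be archived
    std::unique_ptr<base> p = reg.create(key);   // what loading would do
    return p->kind();
}
```

<p>
Only the key string would need to be stored in an archive. The registration step plays the role that <code style="white-space: normal">register_type()</code> or <code style="white-space: normal">export</code> plays for polymorphic pointers.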
<h2><a name="trap"></a>Compile time trap when saving a non-const value</h2>
<p>
The following code will fail to compile unless the tracking_level serialization
trait is set to "track_never". The failure will occur on a line with a
<code style="white-space: normal">BOOST_STATIC_ASSERT</code>.
Here, we refer to this as a compile time trap.
<code style="white-space: normal"><pre>
T t;
ar &lt;&lt; t;
</pre></code>

By contrast, the following will compile without problem:

<code style="white-space: normal"><pre>
const T t;
ar &lt;&lt; t;
</pre></code>

Likewise, the following code will trap at compile time
if the tracking_level serialization trait is set to "track_never":

<code style="white-space: normal"><pre>
T * t;
ar &gt;&gt; t;
</pre></code>
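<p>
As a rough sketch of how such a trap can work, the archive's saving operator can take its argument by non-const reference. The template parameter is then deduced as <code style="white-space: normal">const X</code> whenever a const object is passed, which a static assertion can inspect. This sketch uses C++11 <code style="white-space: normal">static_assert</code> in place of <code style="white-space: normal">BOOST_STATIC_ASSERT</code>. The definitions of <code style="white-space: normal">tracking_level</code>, <code style="white-space: normal">track_never</code>, and <code style="white-space: normal">track_selectively</code> are local stand-ins that mirror the names used in the text, not the library's definitions. <code style="white-space: normal">widget</code>, <code style="white-space: normal">counter</code>, <code style="white-space: normal">trapping_oarchive</code>, and <code style="white-space: normal">save_examples</code> are hypothetical.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Local stand-ins for the tracking trait; not the library's definitions.
enum tracking_type { track_never, track_selectively };

template <class T>
struct tracking_level {
    static const tracking_type value = track_selectively;  // default
};

struct widget { int value; };
struct counter { int value; };

// counter is marked "track_never", so saving it non-const is allowed.
template <>
struct tracking_level<counter> {
    static const tracking_type value = track_never;
};

class trapping_oarchive {
public:
    // T is deduced as "const X" when a const object is saved.
    template <class T>
    trapping_oarchive& operator<<(T& t) {
        // The compile time trap: a non-const object of a class that may be tracked.
        static_assert(std::is_const<T>::value ||
                          tracking_level<typename std::remove_const<T>::type>::value
                              == track_never,
                      "saving a non-const object of a tracked class");
        os_ << t.value << ' ';
        return *this;
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

std::string save_examples() {
    trapping_oarchive ar;
    const widget w{7};
    ar << w;          // compiles: T is deduced as const widget
    counter c{3};
    ar << c;          // compiles: counter is marked track_never
    // widget v{7};
    // ar << v;       // would fail the static_assert (the "trap")
    return ar.str();
}
```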
<p>
This behavior has been controversial and may be revised in the future. The criticism
is that it will flag code that is in fact correct and force users to insert
<code style="white-space: normal">const_cast</code>. My view is that:
<ul>
  <li>The trap is useful in detecting a certain class of programming errors.
  <li>Such errors would otherwise be difficult to detect.
  <li>The inconvenience caused by including this trap is very small in relation
  to its benefits.
</ul>

The following case illustrates my position. It was originally used as an example on the
mailing list by Peter Dimov.

<code style="white-space: normal"><pre>
class construct_from
{
    ...
};

int main(){
    ...
    Y y;
    construct_from x(y);
    ar &lt;&lt; x;
}
</pre></code>

Suppose that there is no trap as described above.
<ol>
  <li>This example compiles and executes fine. No tracking is done because
  construct_from has never been serialized through a pointer. Now, some time
  later, the next programmer (2) comes along and makes an enhancement. He
  wants the archive to serve as a sort of log.

<code style="white-space: normal"><pre>
int main(){
    ...
    Y y;
    construct_from x(y);
    ar &lt;&lt; x;
    ...
    x.f(); // change x in some way
    ...
    ar &lt;&lt; x;
}
</pre></code>
  <p>
  Again no problem. He gets two copies in the archive, each one different.
  That is, he gets exactly what he expects and is naturally delighted.
  <p>
  <li>Now, some time later, a third programmer (3) sees construct_from and says,
  "Oh cool, just what I need." He writes a function in a totally disjoint
  module. (The project is so big, he doesn't even realize the existence of
  the original usage.) He writes something like:

<code style="white-space: normal"><pre>
class K {
    shared_ptr &lt;construct_from&gt; z;
    template &lt;class Archive&gt; 
    void serialize(Archive & ar, const unsigned version){
        ar &lt;&lt; z;
    }
};
</pre></code>

  <p>
  He builds and runs the program and tests his new functionality. It works
  great and he's delighted.
  <p>
  <li>Things continue smoothly as before. A month goes by and it's
  discovered that when loading the archives made in the last month (reading the
  log), things don't work. The second log entry is always the same as the
  first. After a series of very long and increasingly acrimonious email exchanges,
  it's discovered
  that programmer (3) accidentally broke programmer (2)'s code. This is because, by
  serializing via a pointer, the "log" object is now being tracked. This happens because
  the default tracking behavior is "track_selectively". This means that class
  instances are tracked only if they are serialized through pointers anywhere in
  the program. Now multiple saves from the same address result in only the first one
  being written to the archive. Subsequent saves only add the address, even though the
  data might have been changed. When it comes time to load the data, all instances of the log record show the same data.
  In this way, the behavior of a functioning piece of code is changed due to the side
  effect of a change in an otherwise disjoint module.
  Worse yet, the data has been lost and cannot now be recovered from the archives.
  People are really upset and disappointed with Boost (at least the serialization system).
  <p>
  <li>
  After a lot of investigation, the source of the problem is discovered
  and class construct_from is marked "track_never" by including:
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
  <li>Now everything works again. Or so it seems.
  <p>
  <li><code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code>
is not going to have a single raw pointer shared amongst the instances. Each loaded
<code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code> is going to
have its own distinct raw pointer. This will break
<code style="white-space: normal">shared_ptr</code> and cause a memory leak. Again,
the cause of this problem is very far removed from the point of discovery. It may
well be that the problem is not even discovered until after the archives are loaded.
Now we not only have a program bug that is difficult to find and fix, but we also have a bunch of
invalid archives and lost data.
</ol>
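<p>
The loss of the log entries can be reproduced with a toy archive that identifies objects by their address. This is a simplified model of what tracking does, not the library's implementation or archive format. The names <code style="white-space: normal">tracking_oarchive</code>, <code style="white-space: normal">log_entry</code> (a stand-in for <code style="white-space: normal">construct_from</code>), and <code style="white-space: normal">write_log</code> are hypothetical, as are the <code style="white-space: normal">obj:</code> and <code style="white-space: normal">ref:</code> markers.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical toy archive behaving as if the class were tracked: each
// object is identified by its address, and only the first save from a
// given address writes the object's data.
class tracking_oarchive {
public:
    template <class T>
    void save(const T& t) {
        const void* addr = &t;
        auto it = ids_.find(addr);
        if (it != ids_.end()) {
            // Seen before: only a reference to the first copy is written,
            // even if the object has changed since then.
            os_ << "ref:" << it->second << ' ';
            return;
        }
        const int id = static_cast<int>(ids_.size());
        ids_[addr] = id;
        os_ << "obj:" << id << '=' << t.value << ' ';
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
    std::map<const void*, int> ids_;
};

// Hypothetical stand-in for construct_from used as a log record.
struct log_entry { int value; };

// Programmer (2)'s log: save, modify, save again.
std::string write_log() {
    tracking_oarchive ar;
    log_entry x{1};
    ar.save(x);
    x.value = 2;   // change x in some way
    ar.save(x);    // same address, so the new value is never written
    return ar.str();
}
```

<p>
<code style="white-space: normal">write_log()</code> returns <code style="white-space: normal">"obj:0=1 ref:0 "</code>. The value <code style="white-space: normal">2</code> never reaches the archive, so no loader could recover it.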

Now consider what happens when the trap is enabled:

<ol>
  <li>Right away, the program traps at
<code style="white-space: normal"><pre>
ar &lt;&lt; x;
</pre></code>
  <p>
  <li>The programmer curses (another %^&amp;*&amp;* hoop to jump through). He's in a
  hurry (and who isn't?) and would prefer not to use <code style="white-space: normal">const_cast</code>
  because it looks bad. So he just makes the following change and moves on.
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
ar &lt;&lt; x;
</pre></code>
  <p>
  Things work fine and he moves on.
  <p>
  <li>Now programmer (2) wants to make his change, and again runs into an
  annoying const issue:
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
...
x.f(); // change x in some way: compile error, f() is not const
...
ar &lt;&lt; x;
</pre></code>
  <p>
  He's mildly annoyed now, and he tries the following:
  <ul>
    <li>He considers making f() const, but presumably that shifts the const
    error to somewhere else. And he doesn't want to fiddle with "his" code to
    work around a quirk in the serialization system.
    <p>
    <li>He removes the <code style="white-space: normal">const</code>
    from <code style="white-space: normal">const construct_from</code> above. Damn, now he
    gets the trap. If he looks at the comment in the code where the
    <code style="white-space: normal">BOOST_STATIC_ASSERT</code>
    occurs, he'll do one of two things:
    <ol>
      <li>This is just crazy. It's making my life needlessly difficult and flagging
      code that is just fine. So I'll fix this with a <code style="white-space: normal">const_cast</code>
      and fire off a complaint to the list, and maybe they will fix it.
      In this case, the story branches off to the previous scenario.
      <p>
      <li>Oh, this trap is suggesting that the default serialization isn't really
      what I want. Of course, in this particular program it doesn't matter. But
      then, the trap can't really evaluate code in other modules (which
      might not even be written yet). OK, I'll add the following to my
      construct_from.hpp to solve the problem.
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
    </ol>
  </ul>
  <p>
  <li>Now programmer (3) comes along and makes his change. The behavior of the
  original (and distant) module remains unchanged because the
  <code style="white-space: normal">construct_from</code> trait has been set to
  "track_never", so he should always get copies and the log should be what we expect.
  <p>
  <li>But now he gets another trap: trying to save an object of a
  class marked "track_never" through a pointer. So he goes back to
  construct_from.hpp and comments out the
  <code style="white-space: normal">BOOST_SERIALIZATION_TRACKING</code> that
  was inserted. Now the second trap is avoided, but damn, the first trap is
  popping up again. Eventually, after some code restructuring, the differing
  requirements of serializing <code style="white-space: normal">construct_from</code>
  are reconciled.
</ol>
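<p>
A toy model of the trait-driven behavior in this second scenario follows. It is a sketch under stated assumptions, not the library's code. The definitions of <code style="white-space: normal">tracking_level</code>, <code style="white-space: normal">track_never</code>, and <code style="white-space: normal">track_selectively</code> are local stand-ins that mirror the names in the text. The specialization for <code style="white-space: normal">construct_from</code> plays the role of the tracking declaration added to construct_from.hpp. <code style="white-space: normal">trait_oarchive</code>, <code style="white-space: normal">save_pointer</code>, and <code style="white-space: normal">write_log</code> are hypothetical. Objects of an untracked class are written in full on every save. Saving one through a pointer is rejected at compile time, which is the second trap programmer (3) runs into.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Local stand-ins for the tracking trait; not the library's definitions.
enum tracking_type { track_never, track_selectively };

template <class T>
struct tracking_level {
    static const tracking_type value = track_selectively;  // default
};

struct construct_from { int value; };

// Analogue of marking construct_from "track_never" in construct_from.hpp.
template <>
struct tracking_level<construct_from> {
    static const tracking_type value = track_never;
};

class trait_oarchive {
public:
    // A track_never class is always written in full; a tracked class is
    // written once per address and only referenced afterwards.
    template <class T>
    void save(const T& t) {
        if (tracking_level<T>::value != track_never &&
            !seen_.insert(static_cast<const void*>(&t)).second) {
            os_ << "ref ";
            return;
        }
        os_ << t.value << ' ';
    }

    // The second trap: an untracked object cannot be shared correctly
    // among several pointers, so saving one through a pointer is rejected.
    template <class T>
    void save_pointer(const T* p) {
        static_assert(tracking_level<T>::value != track_never,
                      "saving a track_never class through a pointer");
        save(*p);
    }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
    std::set<const void*> seen_;
};

// Programmer (2)'s log once construct_from is marked track_never.
std::string write_log() {
    trait_oarchive ar;
    construct_from x{1};
    ar.save(x);
    x.value = 2;   // change x in some way
    ar.save(x);    // written in full again: both entries survive
    // ar.save_pointer(&x);  // would fail the static_assert
    return ar.str();
}
```

<p>
Here <code style="white-space: normal">write_log()</code> returns <code style="white-space: normal">"1 2 "</code>, so both log entries are kept. The conflicting use through a pointer is reported by the compiler rather than discovered later in corrupted archives.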
Note that in this second scenario:
<ul>
  <li>all errors are trapped at compile time.
  <li>no invalid archives are created.
  <li>no data is lost.
  <li>no runtime errors occur.
</ul>

It's true that these traps may sometimes flag code that is currently correct and
that this may be annoying to some programmers. However, this example illustrates
my view that these traps are useful and that any such annoyance is a small price to
pay to avoid particularly vexing programming errors.

<!--
<h2><a name="footnotes"></a>Footnotes</h2>
<dl>
  <dt><a name="footnote1" class="footnote">(1)</a> {{text}}</dt>
  <dt><a name="footnote2" class="footnote">(2)</a> {{text}}</dt>
</dl>
-->
<hr>
<p><i>&copy; Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
Distributed under the Boost Software License, Version 1.0. (See
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
</i></p>
</body>
</html>