1 | <!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd"> |
---|
2 | <html> |
---|
3 | <!-- |
---|
4 | (C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com . |
---|
5 | Use, modification and distribution is subject to the Boost Software |
---|
6 | License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at |
---|
7 | http://www.boost.org/LICENSE_1_0.txt) |
---|
8 | --> |
---|
9 | <head> |
---|
10 | <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"> |
---|
11 | <link rel="stylesheet" type="text/css" href="../../../boost.css"> |
---|
12 | <link rel="stylesheet" type="text/css" href="style.css"> |
---|
13 | <title>Serialization - Rationale</title> |
---|
14 | </head> |
---|
15 | <body link="#0000ff" vlink="#800080"> |
---|
16 | <table border="0" cellpadding="7" cellspacing="0" width="100%" summary= |
---|
17 | "header"> |
---|
18 | <tr> |
---|
19 | <td valign="top" width="300"> |
---|
20 | <h3><a href="http://www.boost.org"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3> |
---|
21 | </td> |
---|
22 | <td valign="top"> |
---|
23 | <h1 align="center">Serialization</h1> |
---|
24 | <h2 align="center">Rationale</h2> |
---|
25 | </td> |
---|
26 | </tr> |
---|
27 | </table> |
---|
28 | <hr> |
---|
29 | <dl class="index"> |
---|
30 | <dt><a href="#serialization">The term "serialization" is preferred to "persistence"</a></dt> |
---|
31 | <dt><a href="#archives">Archives are not streams</a></dt> |
---|
32 | <dt><a href="#strings">Strings are treated specially in text archives</a></dt> |
---|
33 | <dt><a href="#typeid"><code style="white-space: normal">typeid</code> information is not included in archives</a></dt> |
---|
34 | <dt><a href="#trap">Compile time trap when saving a non-const value</a></dt> |
---|
35 | <!-- |
---|
36 | <dt><a href="#footnotes">Footnotes</a></dt> |
---|
37 | --> |
---|
38 | </dl> |
---|
39 | <h2><a name="serialization"></a>The term "serialization" is preferred to "persistence"</h2> |
---|
40 | <p> |
---|
41 | I found that persistence is often used to refer |
---|
42 | to something quite different. An example is the storage of class |
---|
43 | instances (objects) in a database schema <a href="bibliography.html#4">[4]</a>. |
---|
44 | This library will be useful in other contexts besides implementing persistence. The |
---|
45 | most obvious case is that of marshalling data for transmission to another system. |
---|
46 | <h2><a name="archives"></a>Archives are not streams</h2> |
---|
47 | <p> |
---|
48 | Archive classes are <strong>NOT</strong> derived from |
---|
49 | streams even though they have similar syntax rules. |
---|
50 | <ul> |
---|
51 | <li>Archive classes are not kinds of streams though they |
---|
52 | are implemented in terms of streams. This |
---|
53 | distinction is addressed in <a href="bibliography.html#5">[5]</a>, item 41. |
---|
54 | <li>We don't want users to insert/extract data |
---|
55 | directly into/from the stream. This could |
---|
56 | create a corrupted archive. Were archives |
---|
57 | derived from streams, it would be possible to |
---|
58 | accidentally do this. So archive classes |
---|
59 | only define operations which are safe and necessary. |
---|
60 | <li>The usage of streams to implement the archive classes that |
---|
61 | are included in the library is merely convenient - not necessary. |
---|
62 | Library users may well want to define their own archive format |
---|
63 | which doesn't use streams at all. |
---|
64 | </ul> |
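<p>
The points above can be sketched in a few lines. This is only an illustration: <code style="white-space: normal">toy_text_oarchive</code> and <code style="white-space: normal">archive_two_ints</code> are hypothetical names, not part of the library. The archive holds a reference to a stream instead of deriving from it, so the only operations available to the caller are the ones the archive chooses to expose.
</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical toy archive: it *holds* a reference to a stream rather than
// deriving from one, so callers cannot write raw data around the archive.
class toy_text_oarchive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}

    // The only way to put data in: every item is followed by a separator
    // so that it could be read back unambiguously.
    template <class T>
    toy_text_oarchive& operator<<(const T& value) {
        os_ << value << ' ';
        return *this;
    }

private:
    std::ostream& os_;  // implementation detail, not part of the interface
};

// Helper for illustration: archive two values and return the text produced.
inline std::string archive_two_ints(int a, int b) {
    std::ostringstream os;
    toy_text_oarchive ar(os);
    ar << a << b;
    return os.str();
}
```

<p>
Because the stream is a private member, replacing it with some other storage would not change the interface seen by users of the archive.
</p>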
---|
65 | <h2><a name="primitives"></a>Archive Members are Templates |
---|
66 | Rather than Virtual Functions</h2> |
---|
67 | The previous version of this library defined virtual functions for all |
---|
68 | primitive types. These were overridden by each archive class. There were |
---|
69 | two issues related to this: |
---|
70 | <ul> |
---|
71 | <li>Some disliked virtual functions because of the added execution time |
---|
72 | overhead. |
---|
73 | <li>This caused implementation difficulties since the set of primitive |
---|
74 | data types varies between platforms. Attempting to define the correct |
---|
75 | set of virtual functions, (think <code style="white-space: normal">long long</code>, |
---|
76 | <code style="white-space: normal">__int64</code>, |
---|
77 | etc.) resulted in messy and fragile code. Replacing this with templates |
---|
78 | and letting the compiler generate the code for the primitive types actually |
---|
79 | used, resolved this problem. Of course, the ripple effects of this design |
---|
80 | change were significant, but in the end led to smaller, faster, more |
---|
81 | maintainable code. |
---|
82 | </ul> |
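<p>
A minimal sketch of the template approach, assuming a hypothetical <code style="white-space: normal">toy_template_oarchive</code> class that is not the library's own archive. A single member template stands in for what would otherwise be one virtual function per primitive type, and the compiler instantiates it only for the types that are actually saved.
</p>

```cpp
#include <sstream>
#include <string>

// Hypothetical toy archive. A virtual-function design would need one
// function per primitive type (int, long, long long, __int64, ...), and
// that list differs between platforms. A member template lets the compiler
// generate code only for the types actually used.
class toy_template_oarchive {
public:
    template <class T>
    void save(const T& value) {
        os_ << value << ' ';
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

// Helper for illustration: save three different primitive types.
inline std::string save_mixed_primitives() {
    toy_template_oarchive ar;
    ar.save(42);    // instantiates save<int>
    ar.save(7LL);   // instantiates save<long long>
    ar.save(1.5);   // instantiates save<double>
    return ar.str();
}
```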
---|
83 | <h2><a name="strings"></a><code style="white-space: normal">std::string</code>s are treated specially in text archives</h2> |
---|
84 | <p> |
---|
85 | Treating strings as STL vectors would result in minimal code size. This was |
---|
86 | not done because: |
---|
87 | <ul> |
---|
88 | <li>In text archives it is convenient to be able to view strings. Our text |
---|
89 | implementation stores single characters as integers. Storing strings |
---|
90 | as a vector of characters would waste space and render the archives |
---|
91 | inconvenient for debugging. |
---|
92 | <li>Stream implementations have special functions for <code style="white-space: normal">std::string</code> |
---|
93 | and <code style="white-space: normal">std::wstring</code>. |
---|
94 | Presumably they optimize appropriately. |
---|
95 | <li>Other specializations of <code style="white-space: normal">std::basic_string</code> are in fact handled |
---|
96 | as vectors of the element type. |
---|
97 | </ul> |
---|
98 | </p> |
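<p>
The difference in readability can be illustrated with two hypothetical helper functions. The layouts below are simplified stand-ins chosen for this example and are not the library's actual text format: one writes a string element by element with each character as an integer, the other writes the characters themselves.
</p>

```cpp
#include <sstream>
#include <string>

// Element-wise form: a count followed by each character as an integer,
// which is how a vector of chars would look when characters are stored
// as integers.
inline std::string save_as_vector(const std::string& s) {
    std::ostringstream os;
    os << s.size();
    for (unsigned char c : s) os << ' ' << static_cast<int>(c);
    return os.str();
}

// Special-cased form: a count followed by the characters themselves.
inline std::string save_as_string(const std::string& s) {
    std::ostringstream os;
    os << s.size() << ' ' << s;
    return os.str();
}
```

<p>
For the input <code style="white-space: normal">"hi"</code> the first form yields <code style="white-space: normal">2 104 105</code> and the second yields <code style="white-space: normal">2 hi</code>, which is both shorter and easier to read when debugging.
</p>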
---|
99 | <h2><a name="typeid"></a><code style="white-space: normal">typeid</code> information is not included in archives</h2> |
---|
100 | <p> |
---|
101 | I originally thought that I had to save the name of the class specified by <code style="white-space: normal">std::type_info::name()</code> |
---|
102 | in the archive. This created difficulties as <code style="white-space: normal">std::type_info::name()</code> is not portable and |
---|
103 | not guaranteed to return the class name. This makes it almost useless for implementing |
---|
104 | archive portability. This topic is explained in much more detail in |
---|
105 | <a href="bibliography.html#6">[7] page 206</a>. It turned out that it was not necessary. |
---|
106 | As long as objects are loaded in the exact sequence as they were saved, the type |
---|
107 | is available when loading. The only exception to this is the case of polymorphic |
---|
108 | pointers never before loaded/saved. This is addressed with the <code style="white-space: normal">register_type()</code> |
---|
109 | and/or <code style="white-space: normal">export</code> facilities described in the reference. |
---|
110 | In effect, <code style="white-space: normal">export</code> generates a portable equivalent to |
---|
111 | <code style="white-space: normal">typeid</code> information. |
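The idea of a portable equivalent can be sketched as a registry that maps each type to a key chosen by the programmer. This is a hedged illustration only: toy_type_registry, register_key and key_of are hypothetical names and do not describe how the library implements export.

```cpp
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical registry: maps a type to a key chosen by the programmer.
// Unlike std::type_info::name(), the key is the same on every compiler.
class toy_type_registry {
public:
    template <class T>
    void register_key(const std::string& key) {
        keys_[std::type_index(typeid(T))] = key;
    }

    // Returns the registered key, or an empty string if T was never registered.
    template <class T>
    std::string key_of() const {
        auto it = keys_.find(std::type_index(typeid(T)));
        return it == keys_.end() ? std::string() : it->second;
    }

private:
    std::map<std::type_index, std::string> keys_;
};

struct circle {};
struct square {};

// Helper for illustration: only circle is given a portable key.
inline toy_type_registry make_registry() {
    toy_type_registry r;
    r.register_key<circle>("circle");
    return r;
}
```

Here typeid is used only inside one program run to look up the key; the string written to an archive would be the programmer-chosen key, never the compiler-specific name.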
---|
112 | </p> |
---|
113 | <h2><a name="trap"></a>Compile time trap when saving a non-const value</h2> |
---|
114 | |
---|
115 | The following code will fail to compile. The failure will occur on a line with a |
---|
116 | <code style="white-space: normal">BOOST_STATIC_ASSERT</code>. |
---|
117 | Here, we refer to this as a compile time trap. |
---|
118 | <code style="white-space: normal"><pre> |
---|
119 | T t; |
---|
120 | ar << t; |
---|
121 | </pre></code> |
---|
122 | |
---|
123 | unless the tracking_level serialization trait is set to "track_never". The following |
---|
124 | will compile without problem: |
---|
125 | |
---|
126 | <code style="white-space: normal"><pre> |
---|
127 | const T t; |
---|
128 | ar << t; |
---|
129 | </pre></code> |
---|
130 | |
---|
131 | Likewise, the following code will trap at compile time: |
---|
132 | |
---|
133 | <code style="white-space: normal"><pre> |
---|
134 | T * t; |
---|
135 | ar >> t; |
---|
136 | </pre></code> |
---|
137 | |
---|
138 | if the tracking_level serialization trait is set to "track_never". |
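<p>
The saving half of this rule can be sketched with <code style="white-space: normal">static_assert</code>. Everything here is hypothetical and simplified: <code style="white-space: normal">toy_track_never</code> stands in for the tracking_level trait, and <code style="white-space: normal">toy_trapping_oarchive</code> is not the library's archive. The sketch only shows how constness of the argument and a per-class trait can be combined into a compile time check.
</p>

```cpp
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical per-class trait; the default means "may be tracked".
template <class T>
struct toy_track_never : std::false_type {};

class toy_trapping_oarchive {
public:
    // Taking T& lets template deduction keep the const qualifier, so the
    // trap can tell "T t;" from "const T t;".
    template <class T>
    toy_trapping_oarchive& operator<<(T& value) {
        using bare = typename std::remove_const<T>::type;
        static_assert(std::is_const<T>::value || toy_track_never<bare>::value,
                      "saving a non-const object of a type that may be tracked");
        os_ << value << ' ';
        return *this;
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

struct point { int x; };
inline std::ostream& operator<<(std::ostream& os, const point& p) { return os << p.x; }

struct counter { int n; };
inline std::ostream& operator<<(std::ostream& os, const counter& c) { return os << c.n; }

// counter opts out of tracking, so non-const saves are allowed for it.
template <>
struct toy_track_never<counter> : std::true_type {};

inline std::string save_examples() {
    toy_trapping_oarchive ar;
    const point p{3};
    ar << p;                    // OK: const object
    counter c{4};
    ar << c;                    // OK: counter is marked "never tracked"
    // point q{5}; ar << q;     // would fail to compile: the trap fires
    return ar.str();
}
```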
---|
139 | <p> |
---|
140 | |
---|
141 | This behavior has been controversial and may be revised in the future. The criticism |
---|
142 | is that it will flag code that is in fact correct and force users to insert |
---|
143 | <code style="white-space: normal">const_cast</code>. My view is that: |
---|
144 | <ul> |
---|
145 | <li>The trap is useful in detecting a certain class of programming errors. |
---|
146 | <li>Such errors would otherwise be difficult to detect. |
---|
147 | <li>The inconvenience caused by including this trap is very small in relation |
---|
148 | to its benefits. |
---|
149 | </ul> |
---|
150 | |
---|
151 | The following case illustrates my position. It was originally used as an example in the |
---|
152 | mailing list by Peter Dimov. |
---|
153 | |
---|
154 | <code style="white-space: normal"><pre> |
---|
155 | class construct_from |
---|
156 | { |
---|
157 | ... |
---|
158 | }; |
---|
159 | |
---|
160 | int main(){ |
---|
161 | ... |
---|
162 | Y y; |
---|
163 | construct_from x(y); |
---|
164 | ar << x; |
---|
165 | } |
---|
166 | </pre></code> |
---|
167 | |
---|
168 | Suppose that there is no trap as described above. |
---|
169 | <ol> |
---|
170 | <li>This example compiles and executes fine. No tracking is done because |
---|
171 | construct_from has never been serialized through a pointer. Some time |
---|
172 | later, the next programmer (2) comes along and makes an enhancement. He |
---|
173 | wants the archive to be a sort of log. |
---|
174 | |
---|
175 | <code style="white-space: normal"><pre> |
---|
176 | int main(){ |
---|
177 | ... |
---|
178 | Y y; |
---|
179 | construct_from x(y); |
---|
180 | ar << x; |
---|
181 | ... |
---|
182 | x.f(); // change x in some way |
---|
183 | ... |
---|
184 | ar << x; |
---|
185 | } |
---|
186 | </pre></code> |
---|
187 | <p> |
---|
188 | Again no problem. He gets two copies in the archive, and each one is different. |
---|
189 | That is, he gets exactly what he expects and is naturally delighted. |
---|
190 | <p> |
---|
191 | <li>Sometime later, a third programmer (3) sees construct_from and says - |
---|
192 | oh cool, just what I need. Working in a totally disjoint module (the |
---|
193 | project is so big that he doesn't even realize the original usage |
---|
194 | exists), he writes something like: |
---|
195 | |
---|
196 | <code style="white-space: normal"><pre> |
---|
197 | class K { |
---|
198 | shared_ptr&lt;construct_from&gt; z; |
---|
199 | template &lt;class Archive&gt; |
---|
200 | void serialize(Archive & ar, const unsigned version){ |
---|
201 | ar << z; |
---|
202 | } |
---|
203 | }; |
---|
204 | </pre></code> |
---|
205 | |
---|
206 | <p> |
---|
207 | He builds and runs the program and tests his new functionality. It works |
---|
208 | great and he's delighted. |
---|
209 | <p> |
---|
210 | <li>Things continue smoothly as before. A month goes by and it's |
---|
211 | discovered that loading the archives made in the last month (reading the |
---|
212 | log) doesn't work. The second log entry is always the same as the |
---|
213 | first. After a series of very long and increasingly acrimonious email exchanges, |
---|
214 | it's discovered |
---|
215 | that programmer (3) accidentally broke programmer (2)'s code. By |
---|
216 | serializing via a pointer, he caused the "log" object to be tracked, because |
---|
217 | the default tracking behavior is "track_selectively". This means that class |
---|
218 | instances are tracked only if they are serialized through pointers anywhere in |
---|
219 | the program. Now multiple saves from the same address result in only the first one |
---|
220 | being written to the archive. Subsequent saves only add the address - even though the |
---|
221 | data might have been changed. When it comes time to load the data, all instances of the log record show the same data. |
---|
222 | In this way, the behavior of a functioning piece of code is changed due to the side |
---|
223 | effect of a change in an otherwise disjoint module. |
---|
224 | Worse yet, the data has been lost and cannot be recovered from the archives. |
---|
225 | People are really upset and disappointed with boost (at least the serialization system). |
---|
226 | <p> |
---|
227 | <li> |
---|
228 | After a lot of investigation, the source of the problem is discovered |
---|
229 | and class construct_from is marked "track_never" by including: |
---|
230 | <code style="white-space: normal"><pre> |
---|
231 | BOOST_SERIALIZATION_TRACKING(construct_from, track_never) |
---|
232 | </pre></code> |
---|
233 | <li>Now everything works again. Or - so it seems. |
---|
234 | <p> |
---|
235 | <li>Because tracking is now disabled, <code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code> |
---|
236 | is not going to have a single raw pointer shared amongst the instances. Each loaded |
---|
237 | <code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code> is going to |
---|
238 | have its own distinct raw pointer. This will break |
---|
239 | <code style="white-space: normal">shared_ptr</code> and cause a memory leak. Again, |
---|
240 | the cause of this problem is very far removed from the point of discovery. It could |
---|
241 | well be that the problem is not even discovered until after the archives are loaded. |
---|
242 | Now we not only have a program bug that is difficult to find and fix, but we also have a bunch of |
---|
243 | invalid archives and lost data. |
---|
244 | </ol> |
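<p>
The loss of the second log entry in the scenario above can be reproduced with a toy model of address-based tracking. This is a sketch under simplifying assumptions: <code style="white-space: normal">toy_tracking_oarchive</code> is hypothetical, it handles only <code style="white-space: normal">int</code>, and it tracks every object unconditionally, whereas the library decides per class.
</p>

```cpp
#include <set>
#include <sstream>
#include <string>

// Hypothetical toy archive that tracks objects by address, as a stand-in
// for what happens once a class becomes tracked.
class toy_tracking_oarchive {
public:
    void save(const int& value) {
        const void* addr = &value;
        if (seen_.insert(addr).second) {
            os_ << "obj(" << value << ") ";  // first save: full data
        } else {
            os_ << "ref ";                   // later saves: only a back-reference
        }
    }
    std::string str() const { return os_.str(); }

private:
    std::set<const void*> seen_;
    std::ostringstream os_;
};

// Helper for illustration: the "log" use case from the scenario.
inline std::string log_twice() {
    toy_tracking_oarchive ar;
    int x = 1;
    ar.save(x);
    x = 2;        // the object changes between the two log entries
    ar.save(x);   // same address: the new value 2 is never written
    return ar.str();
}
```

<p>
The archive produced by <code style="white-space: normal">log_twice()</code> contains the value 1 and a back-reference, so the value 2 cannot be recovered when loading.
</p>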
---|
245 | |
---|
246 | Now consider what happens when the trap is enabled: |
---|
247 | |
---|
248 | <ol> |
---|
249 | <p> |
---|
250 | <li>Right away, the program traps at |
---|
251 | <code style="white-space: normal"><pre> |
---|
252 | ar << x; |
---|
253 | </pre></code> |
---|
254 | <p> |
---|
255 | <li>The programmer curses (another %^&*&* hoop to jump through). He's in a |
---|
256 | hurry (and who isn't) and would prefer not to use <code style="white-space: normal">const_cast</code> |
---|
257 | - because it looks bad. So he'll just make the following change and move on. |
---|
258 | <code style="white-space: normal"><pre> |
---|
259 | Y y; |
---|
260 | const construct_from x(y); |
---|
261 | ar << x; |
---|
262 | </pre></code> |
---|
263 | <p> |
---|
264 | Things work fine and he moves on. |
---|
265 | <p> |
---|
266 | <li>Now programmer (2) wants to make his change - and again runs into another |
---|
267 | annoying const issue: |
---|
268 | <code style="white-space: normal"><pre> |
---|
269 | Y y; |
---|
270 | const construct_from x(y); |
---|
271 | ... |
---|
272 | x.f(); // change x in some way ; compile error f() is not const |
---|
273 | ... |
---|
274 | ar << x; |
---|
275 | </pre></code> |
---|
276 | <p> |
---|
277 | He's mildly annoyed now. He tries the following: |
---|
278 | <ul> |
---|
279 | <li>He considers making f() const - but presumably that shifts the const |
---|
280 | error to somewhere else. And he doesn't want to fiddle with "his" code to |
---|
281 | work around a quirk in the serialization system. |
---|
282 | <p> |
---|
283 | <li>He removes the <code style="white-space: normal">const</code> |
---|
284 | from <code style="white-space: normal">const construct_from</code> above - damn, now he |
---|
285 | gets the trap. If he looks at the comments in the code where the |
---|
286 | <code style="white-space: normal">BOOST_STATIC_ASSERT</code> |
---|
287 | occurs, he'll do one of two things: |
---|
288 | <ol> |
---|
289 | <p> |
---|
290 | <li>This is just crazy. It's making my life needlessly difficult and flagging |
---|
291 | code that is just fine. So I'll fix this with a <code style="white-space: normal">const_cast</code> |
---|
292 | and fire off a complaint to the list and maybe they will fix it. |
---|
293 | In this case, the story branches off to the previous scenario. |
---|
294 | <p> |
---|
295 | <li>Oh, this trap is suggesting that the default serialization isn't really |
---|
296 | what I want. Of course in this particular program it doesn't matter. But |
---|
297 | then the code in the trap can't really evaluate code in other modules (which |
---|
298 | might not even be written yet). OK, I'll add the following to my |
---|
299 | construct_from.hpp to solve the problem. |
---|
300 | <code style="white-space: normal"><pre> |
---|
301 | BOOST_SERIALIZATION_TRACKING(construct_from, track_never) |
---|
302 | </pre></code> |
---|
303 | </ol> |
---|
304 | </ul> |
---|
305 | <p> |
---|
306 | <li>Now programmer (3) comes along and makes his change. The behavior of the |
---|
307 | original (and distant) module remains unchanged because the |
---|
308 | <code style="white-space: normal">construct_from</code> trait has been set to |
---|
309 | "track_never" so he should always get copies and the log should be what we expect. |
---|
310 | <p> |
---|
311 | <li>But now he gets another trap - trying to save an object of a |
---|
312 | class marked "track_never" through a pointer. So he goes back to |
---|
313 | construct_from.hpp and comments out the |
---|
314 | <code style="white-space: normal">BOOST_SERIALIZATION_TRACKING</code> that |
---|
315 | was inserted. Now the second trap is avoided. But damn - the first trap is |
---|
316 | popping up again. Eventually, after some code restructuring, the differing |
---|
317 | requirements of serializing <code style="white-space: normal">construct_from</code> |
---|
318 | are reconciled. |
---|
319 | </ol> |
---|
320 | Note that in this second scenario |
---|
321 | <ul> |
---|
322 | <li>all errors are trapped at compile time. |
---|
323 | <li>no invalid archives are created. |
---|
324 | <li>no data is lost. |
---|
325 | <li>no runtime errors occur. |
---|
326 | </ul> |
---|
327 | |
---|
328 | It's true that these traps may sometimes flag code that is currently correct and |
---|
329 | that this may be annoying to some programmers. However, this example illustrates |
---|
330 | my view that these traps are useful and that any such annoyance is a small price to |
---|
331 | pay to avoid particularly vexing programming errors. |
---|
332 | |
---|
333 | <!-- |
---|
334 | <h2><a name="footnotes"></a>Footnotes</h2> |
---|
335 | <dl> |
---|
336 | <dt><a name="footnote1" class="footnote">(1)</a> {{text}}</dt> |
---|
337 | <dt><a name="footnote2" class="footnote">(2)</a> {{text}}</dt> |
---|
338 | </dl> |
---|
339 | --> |
---|
340 | <hr> |
---|
341 | <p><i>© Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004. |
---|
342 | Distributed under the Boost Software License, Version 1.0. (See |
---|
343 | accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt) |
---|
344 | </i></p> |
---|
345 | </body> |
---|
346 | </html> |
---|