<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
(C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com .
Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<title>Serialization - Rationale</title>
</head>
<body link="#0000ff" vlink="#800080">
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary=
    "header">
  <tr>
    <td valign="top" width="300">
      <h3><a href="http://www.boost.org"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
    </td>
    <td valign="top">
      <h1 align="center">Serialization</h1>
      <h2 align="center">Rationale</h2>
    </td>
  </tr>
</table>
<hr>
<dl class="index">
  <dt><a href="#serialization">The term "serialization" is preferred to "persistence"</a></dt>
  <dt><a href="#archives">Archives are not streams</a></dt>
  <dt><a href="#primitives">Archive members are templates rather than virtual functions</a></dt>
  <dt><a href="#strings">Strings are treated specially in text archives</a></dt>
  <dt><a href="#typeid"><code style="white-space: normal">typeid</code> information is not included in archives</a></dt>
  <dt><a href="#trap">Compile time trap when saving a non-const value</a></dt>
  <!--
  <dt><a href="#footnotes">Footnotes</a></dt>
  -->
</dl>
<h2><a name="serialization"></a>The term "serialization" is preferred to "persistence"</h2>
<p>
I found that "persistence" is often used to refer
to something quite different, such as the storage of class
instances (objects) in a database schema <a href="bibliography.html#4">[4]</a>.
This library will be useful in other contexts besides implementing persistence. The
most obvious case is that of marshalling data for transmission to another system.
<h2><a name="archives"></a>Archives are not streams</h2>
<p>
Archive classes are <strong>NOT</strong> derived from
streams even though they have similar syntax rules.
<ul>
    <li>Archive classes are not kinds of streams though they
    are implemented in terms of streams. This
    distinction is addressed in <a href="bibliography.html#5">[5]</a>, item 41.
    <li>We don't want users to insert/extract data
    directly into/from the stream. This could
    create a corrupted archive. Were archives
    derived from streams, it would be possible to
    do this accidentally. So archive classes
    only define operations which are safe and necessary.
    <li>The usage of streams to implement the archive classes that
    are included in the library is merely convenient, not necessary.
    Library users may well want to define their own archive format
    which doesn't use streams at all.
</ul>
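<p>
As a minimal sketch of the "implemented in terms of, not derived from" relationship described above,
the hypothetical class <code style="white-space: normal">toy_text_oarchive</code> below (not one of
the library's archive classes) holds a reference to a <code style="white-space: normal">std::ostream</code>
as a private member and exposes only an insertion operator for <code style="white-space: normal">int</code>.
Because it does not inherit from the stream, the archive offers no way to write arbitrary data into the
underlying stream.
</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical toy archive (not the library's text_oarchive): it is
// implemented in terms of a stream but is not itself a stream.
class toy_text_oarchive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}

    // The only way to write is through this operator, so every item
    // gets the separator that a matching loader would expect.
    toy_text_oarchive& operator<<(int value) {
        os_ << value << ' ';
        return *this;
    }

private:
    std::ostream& os_;  // held privately: callers cannot write to it directly
};

// Writes three integers through the archive interface and returns the text.
std::string save_three(int a, int b, int c) {
    std::ostringstream buffer;
    toy_text_oarchive ar(buffer);
    ar << a << b << c;
    return buffer.str();
}
```

<p>
A real archive would also need the matching loading operations; the sketch shows only the saving side.
</p>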
<h2><a name="primitives"></a>Archive Members are Templates
Rather than Virtual Functions</h2>
<p>
The previous version of this library defined virtual functions for all
primitive types. These were overridden by each archive class. There were
two issues related to this:
<ul>
    <li>Some disliked virtual functions because of the added execution time
    overhead.
    <li>This caused implementation difficulties since the set of primitive
    data types varies between platforms. Attempting to define the correct
    set of virtual functions (think <code style="white-space: normal">long long</code>,
    <code style="white-space: normal">__int64</code>,
    etc.) resulted in messy and fragile code. Replacing this with templates
    and letting the compiler generate the code for the primitive types actually
    used resolved this problem. Of course, the ripple effects of this design
    change were significant, but in the end they led to smaller, faster, more
    maintainable code.
</ul>
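<p>
The sketch below illustrates the template approach with a hypothetical
<code style="white-space: normal">toy_primitive_oarchive</code>; it is not the library's code.
A single member template handles every arithmetic type, and the compiler instantiates it only
for the types a program actually saves, so <code style="white-space: normal">long long</code>
needs no hand-written overload or virtual function.
</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical toy archive: one member template replaces a hand-written
// virtual function per primitive type.
class toy_primitive_oarchive {
public:
    template <class T>
    void save(const T& value) {
        // Restrict the sketch to primitive (arithmetic) types.
        static_assert(std::is_arithmetic<T>::value,
                      "this sketch only handles arithmetic types");
        os_ << value << ' ';
    }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

std::string save_mixed() {
    toy_primitive_oarchive ar;
    ar.save(42);   // instantiates save<int>
    ar.save(7LL);  // instantiates save<long long>
    ar.save(2.5);  // instantiates save<double>
    return ar.str();
}
```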
<h2><a name="strings"></a><code style="white-space: normal">std::string</code>s are treated specially in text archives</h2>
<p>
Treating strings as STL vectors would result in minimal code size. This was
not done because:
<ul>
     <li>In text archives it is convenient to be able to view strings. Our text
     implementation stores single characters as integers. Storing strings
     as a vector of characters would waste space and render the archives
     inconvenient for debugging.
     <li>Stream implementations have special functions for <code style="white-space: normal">std::string</code>
     and <code style="white-space: normal">std::wstring</code>.
     Presumably they optimize appropriately.
     <li>Other specializations of <code style="white-space: normal">std::basic_string</code> are in fact handled
     as vectors of the element type.
</ul>
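<p>
To make the first point concrete, the two hypothetical helpers below encode the same string in
text form: once element by element, with each character written as an integer, and once as a
length prefix followed by the characters themselves. The formats are invented for illustration
and are not the library's actual archive layout. For the string "hi", the first produces
"2 104 105" and the second "2 hi".
</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Element-by-element encoding: each character is written as an integer,
// the way a text archive stores single characters.
std::string save_as_char_vector(const std::string& s) {
    std::ostringstream os;
    os << s.size();
    for (char c : s) {
        os << ' ' << static_cast<int>(static_cast<unsigned char>(c));
    }
    return os.str();
}

// Special-cased encoding: a length prefix followed by the characters,
// which keeps the archive compact and readable.
std::string save_as_string(const std::string& s) {
    std::ostringstream os;
    os << s.size() << ' ' << s;
    return os.str();
}
```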
<h2><a name="typeid"></a><code style="white-space: normal">typeid</code> information is not included in archives</h2>
<p>
I originally thought that I had to save the name of the class specified by <code style="white-space: normal">std::type_info::name()</code>
in the archive. This created difficulties because <code style="white-space: normal">std::type_info::name()</code> is not portable and
is not guaranteed to return the class name. This makes it almost useless for implementing
archive portability. This topic is explained in much more detail in
<a href="bibliography.html#6">[7] page 206</a>. It turned out that saving the name was not necessary.
As long as objects are loaded in exactly the same sequence in which they were saved, the type
is available when loading. The only exception to this is the case of polymorphic
pointers to derived types that have not previously been loaded or saved. This is addressed with the <code style="white-space: normal">register_type()</code>
and/or <code style="white-space: normal">export</code> facilities described in the reference.
In effect, <code style="white-space: normal">export</code> generates a portable equivalent to
<code style="white-space: normal">typeid</code> information.
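<p>
The idea that <code style="white-space: normal">export</code> provides a portable substitute for
<code style="white-space: normal">typeid</code> information can be sketched with a registry keyed
by programmer-chosen names. The classes <code style="white-space: normal">base</code>,
<code style="white-space: normal">derived_a</code> and <code style="white-space: normal">toy_type_registry</code>
below are hypothetical and only approximate what the library's facilities do: a loader would read a
key from the archive and ask the registry to create an object of the matching type. Unlike
<code style="white-space: normal">typeid(T).name()</code>, whose result is implementation-defined,
the key is the same on every compiler.
</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string kind() const = 0;
};

struct derived_a : base {
    std::string kind() const override { return "derived_a"; }
};

// Hypothetical registry: maps a portable, programmer-chosen key to a factory.
class toy_type_registry {
public:
    using factory = std::function<std::unique_ptr<base>()>;

    template <class T>
    void register_as(const std::string& key) {
        factories_[key] = [] { return std::unique_ptr<base>(new T()); };
    }

    // A loader would call this after reading the key from the archive.
    std::unique_ptr<base> create(const std::string& key) const {
        auto it = factories_.find(key);
        if (it == factories_.end()) return nullptr;  // unregistered type
        return it->second();
    }

private:
    std::map<std::string, factory> factories_;
};
```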

<h2><a name="trap"></a>Compile time trap when saving a non-const value</h2>
<p>
The following code will fail to compile unless the tracking_level serialization
trait of <code style="white-space: normal">T</code> is set to "track_never".
The failure will occur on a line with a
<code style="white-space: normal">BOOST_STATIC_ASSERT</code>.
Here, we refer to this as a compile time trap.
<code style="white-space: normal"><pre>
T t;
ar << t;
</pre></code>

The following will compile without problem:

<code style="white-space: normal"><pre>
const T t;
ar << t;
</pre></code>

Likewise, if the tracking_level serialization trait of <code style="white-space: normal">T</code>
is set to "track_never", the following code will trap at compile time:

<code style="white-space: normal"><pre>
T * t;
ar >> t;
</pre></code>
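<p>
The saving rule above can be modeled in a few lines of standard C++. In the hypothetical sketch
below, <code style="white-space: normal">tracking_level</code>, <code style="white-space: normal">toy_oarchive</code>
and the enumerators are defined locally for illustration (they mirror the names used in this section
but are not the library's declarations), and a plain <code style="white-space: normal">static_assert</code>
stands in for <code style="white-space: normal">BOOST_STATIC_ASSERT</code>. Saving a
<code style="white-space: normal">const</code> object, or a non-const object of a class marked
"track_never", compiles; saving a non-const object of a tracked class stops at the assertion.
</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Locally defined stand-ins for the tracking trait (hypothetical).
enum tracking_type { track_never, track_selectively, track_always };

template <class T>
struct tracking_level {
    static constexpr tracking_type value = track_selectively;  // default
};

struct log_entry { int value; };

// Marks log_entry as never tracked.
template <>
struct tracking_level<log_entry> {
    static constexpr tracking_type value = track_never;
};

struct counter { int value; };  // keeps the default (tracked) level

class toy_oarchive {
public:
    // Taking T& lets the non-const case reach the static_assert below.
    template <class T>
    toy_oarchive& operator<<(T& t) {
        using plain = typename std::remove_const<T>::type;
        static_assert(std::is_const<T>::value ||
                          tracking_level<plain>::value == track_never,
                      "saving a non-const tracked object: make it const "
                      "or mark its class track_never");
        os_ << t.value << ' ';
        return *this;
    }

    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};
```

<p>
With this sketch, adding <code style="white-space: normal">counter n{3}; ar << n;</code> to a program
would fail to compile at the <code style="white-space: normal">static_assert</code>.
</p>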
<p>

This behavior has been controversial and may be revised in the future. The criticism
is that it will flag code that is in fact correct and force users to insert a
<code style="white-space: normal">const_cast</code>. My view is that:
<ul>
  <li>The trap is useful in detecting a certain class of programming errors.
  <li>Such errors would otherwise be difficult to detect.
  <li>The inconvenience caused by including this trap is very small in relation
  to its benefits.
</ul>

The following case illustrates my position. It was originally used as an example on the
mailing list by Peter Dimov.

<code style="white-space: normal"><pre>
class construct_from
{
    ...
};

int main(){
    ...
    Y y;
    construct_from x(y);
    ar << x;
}
</pre></code>

Suppose that there is no trap as described above.
<ol>
  <li>This example compiles and executes fine. No tracking is done because
  construct_from has never been serialized through a pointer. Some time
  later, the next programmer (2) comes along and makes an enhancement. He
  wants the archive to serve as a sort of log.

<code style="white-space: normal"><pre>
int main(){
    ...
    Y y;
    construct_from x(y);
    ar << x;
    ...
    x.f(); // change x in some way
    ...
    ar << x;
}
</pre></code>
  <p>
  Again no problem. He gets two copies of x in the archive, each reflecting x's state at the time it was saved.
  That is, he gets exactly what he expects and is naturally delighted.
  <p>
  <li>Some time later, a third programmer (3) sees construct_from and says,
  oh cool, just what I need. He writes a function in a totally disjoint
  module. (The project is so big that he doesn't even realize the original usage
  exists.) He writes something like:

<code style="white-space: normal"><pre>
class K {
    friend class boost::serialization::access;
    shared_ptr &lt;construct_from&gt; z;
    template &lt;class Archive&gt;
    void serialize(Archive & ar, const unsigned int version){
        ar & z;
    }
};
</pre></code>

  <p>
  He builds and runs the program and tests his new functionality. It works
  great and he's delighted.
  <p>
  <li>Things continue smoothly as before. A month goes by, and then it's
  discovered that the archives made in the last month don't load correctly: when the
  log is read back, the second log entry is always the same as the
  first. After a series of very long and increasingly acrimonious email exchanges,
  it's discovered
  that programmer (3) accidentally broke programmer (2)'s code. By
  serializing via a pointer, he caused the "log" object to be tracked. This happens because
  the default tracking behavior is "track_selectively", which means that class
  instances are tracked only if they are serialized through pointers anywhere in
  the program. Now multiple saves from the same address result in only the first one
  being written to the archive. Subsequent saves only add the address, even though the
  data might have been changed. When it comes time to load the data, all instances of the log record show the same data.
  In this way, the behavior of a functioning piece of code is changed due to the side
  effect of a change in an otherwise disjoint module.
  Worse yet, the data has been lost and cannot now be recovered from the archives.
  People are really upset and disappointed with boost (at least the serialization system).
  <p>
  <li>
  After a lot of investigation, the source of the problem is discovered,
  and class construct_from is marked "track_never" by including:
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
  <li>Now everything works again. Or so it seems.
  <p>
  <li><code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code>
is not going to have a single raw pointer shared amongst the instances. Each loaded
<code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code> is going to
have its own distinct raw pointer. This will break
<code style="white-space: normal">shared_ptr</code> and cause a memory leak. Again,
the cause of this problem is very far removed from the point of discovery. It may
well be that the problem is not even discovered until after the archives are loaded.
Now we not only have a program bug that is difficult to find and fix, but we also have a bunch of
invalid archives and lost data.
</ol>
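<p>
The runtime effect in the scenario above can be reproduced with a toy model of address-based
tracking. The class <code style="white-space: normal">toy_tracking_oarchive</code>, the function
<code style="white-space: normal">write_log</code> and the output format below are hypothetical,
not the library's implementation: with tracking on, the first save of an address writes the
object's data, and later saves of the same address write only a reference to it. With tracking
off, <code style="white-space: normal">write_log(false)</code> yields "1 2 "; with tracking on,
<code style="white-space: normal">write_log(true)</code> yields "obj0=1 ref0 ", so the modified
value is never written.
</p>

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical toy model of address-based object tracking.
class toy_tracking_oarchive {
public:
    explicit toy_tracking_oarchive(bool tracking) : tracking_(tracking) {}

    void save(const int& object) {
        if (tracking_) {
            auto it = ids_.find(&object);
            if (it != ids_.end()) {
                // Address seen before: write only a reference, not the data.
                os_ << "ref" << it->second << ' ';
                return;
            }
            const int id = static_cast<int>(ids_.size());
            ids_[&object] = id;
            os_ << "obj" << id << '=';
        }
        os_ << object << ' ';
    }

    std::string str() const { return os_.str(); }

private:
    bool tracking_;
    std::map<const void*, int> ids_;  // address -> object id
    std::ostringstream os_;
};

// Programmer (2)'s "log": save x, change it, then save it again.
std::string write_log(bool tracking) {
    toy_tracking_oarchive ar(tracking);
    int x = 1;
    ar.save(x);
    x = 2;  // change x in some way
    ar.save(x);
    return ar.str();
}
```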

Now consider what happens when the trap is enabled:

<ol>
  <p>
  <li>Right away, the program traps at
<code style="white-space: normal"><pre>
ar << x;
</pre></code>
  <p>
  <li>The programmer curses (another %^&amp;*&amp;* hoop to jump through). If he's in a
  hurry (and who isn't?) and would prefer not to use <code style="white-space: normal">const_cast</code>
  because it looks bad, he'll just make the following change and move on.
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
ar << x;
</pre></code>
  <p>
  Things work fine and he moves on.
  <p>
  <li>Now programmer (2) wants to make his change, and again runs into an
  annoying const issue:
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
...
x.f(); // change x in some way: compile error, f() is not const
...
ar << x;
</pre></code>
  <p>
  He's mildly annoyed now. He tries the following:
  <ul>
    <li>He considers making f() a const member function, but presumably that just shifts the const
    error somewhere else. And he doesn't want to fiddle with "his" code to
    work around a quirk in the serialization system.
    <p>
    <li>He removes the <code style="white-space: normal">const</code>
    from <code style="white-space: normal">const construct_from</code> above. Damn, now he
    gets the trap. If he looks at the comment in the code where the
    <code style="white-space: normal">BOOST_STATIC_ASSERT</code>
    occurs, he'll do one of two things:
    <ol>
      <p>
      <li>This is just crazy. It's making my life needlessly difficult and flagging
      code that is just fine. So I'll fix this with a <code style="white-space: normal">const_cast</code>
      and fire off a complaint to the list, and maybe they will fix it.
      In this case, the story branches off to the previous scenario.
      <p>
      <li>Oh, this trap is suggesting that the default serialization isn't really
      what I want. Of course, in this particular program it doesn't matter. But
      then the code in the trap can't really evaluate code in other modules (which
      might not even be written yet). OK, I'll add the following to my
      construct_from.hpp to solve the problem.
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
    </ol>
  </ul>
  <p>
  <li>Now programmer (3) comes along and makes his change. The behavior of the
  original (and distant) module remains unchanged because the
  <code style="white-space: normal">construct_from</code> trait has been set to
  "track_never", so he should always get copies and the log should be what we expect.
  <p>
  <li>But now he gets another trap: trying to save an object of a
  class marked "track_never" through a pointer. So he goes back to
  construct_from.hpp and comments out the
  <code style="white-space: normal">BOOST_SERIALIZATION_TRACKING</code> that
  was inserted. Now the second trap is avoided, but damn, the first trap is
  popping up again. Eventually, after some code restructuring, the differing
  requirements of serializing <code style="white-space: normal">construct_from</code>
  are reconciled.
</ol>
Note that in this second scenario:
<ul>
  <li>all errors are trapped at compile time.
  <li>no invalid archives are created.
  <li>no data is lost.
  <li>no runtime errors occur.
</ul>

It's true that these traps may sometimes flag code that is currently correct and
that this may be annoying to some programmers. However, this example illustrates
my view that these traps are useful and that any such annoyance is a small price to
pay to avoid particularly vexing programming errors.

<!--
<h2><a name="footnotes"></a>Footnotes</h2>
<dl>
  <dt><a name="footnote1" class="footnote">(1)</a> {{text}}</dt>
  <dt><a name="footnote2" class="footnote">(2)</a> {{text}}</dt>
</dl>
-->
<hr>
<p><i>© Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
Distributed under the Boost Software License, Version 1.0. (See
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
</i></p>
</body>
</html>