<!doctype HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<!--
(C) Copyright 2002-4 Robert Ramey - http://www.rrsd.com .
Use, modification and distribution is subject to the Boost Software
License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at
http://www.boost.org/LICENSE_1_0.txt)
-->
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<link rel="stylesheet" type="text/css" href="../../../boost.css">
<link rel="stylesheet" type="text/css" href="style.css">
<title>Serialization - Rationale</title>
</head>
<body link="#0000ff" vlink="#800080">
<table border="0" cellpadding="7" cellspacing="0" width="100%" summary=
"header">
<tr>
<td valign="top" width="300">
<h3><a href="http://www.boost.org"><img height="86" width="277" alt="C++ Boost" src="../../../boost.png" border="0"></a></h3>
</td>
<td valign="top">
<h1 align="center">Serialization</h1>
<h2 align="center">Rationale</h2>
</td>
</tr>
</table>
<hr>
<dl class="index">
<dt><a href="#serialization">The term "serialization" is preferred to "persistence"</a></dt>
<dt><a href="#archives">Archives are not streams</a></dt>
<dt><a href="#primitives">Archive members are templates rather than virtual functions</a></dt>
<dt><a href="#strings">Strings are treated specially in text archives</a></dt>
<dt><a href="#typeid"><code style="white-space: normal">typeid</code> information is not included in archives</a></dt>
<dt><a href="#trap">Compile time trap when saving a non-const value</a></dt>
<!--
<dt><a href="#footnotes">Footnotes</a></dt>
-->
</dl>
<h2><a name="serialization"></a>The term "serialization" is preferred to "persistence"</h2>
<p>
I found that the term "persistence" is often used to refer
to something quite different, for example the storage of class
instances (objects) in a database schema <a href="bibliography.html#4">[4]</a>.
This library will be useful in other contexts besides implementing persistence. The
most obvious case is marshalling data for transmission to another system.
</p>
<h2><a name="archives"></a>Archives are not streams</h2>
<p>
Archive classes are <strong>NOT</strong> derived from
streams even though they have similar syntax rules.
</p>
<ul>
<li>Archive classes are not kinds of streams, though they
are implemented in terms of streams. This
distinction is addressed in <a href="bibliography.html#5">[5]</a> item 41.
<li>We don't want users to insert/extract data
directly into/from the stream. This could
create a corrupted archive. Were archives
derived from streams, it would be possible to
do this accidentally. So archive classes
only define operations which are safe and necessary.
<li>The usage of streams to implement the archive classes that
are included in the library is merely convenient, not necessary.
Library users may well want to define their own archive format
which doesn't use streams at all.
</ul>
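<p>The design point can be sketched in a few lines of C++. The class <code style="white-space: normal">text_oarchive_sketch</code> below is hypothetical and far simpler than the library's archives: it is implemented in terms of a <code style="white-space: normal">std::ostream</code> by holding a reference to one rather than deriving from it, so its only public operation is saving a value, and user code has no way to write raw characters into the middle of the archive.</p>

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical archive, not the library's text_oarchive: it is implemented
// in terms of a stream (it holds a reference to one) but is NOT derived
// from std::ostream, so none of the stream's members are exposed.
class text_oarchive_sketch {
public:
    explicit text_oarchive_sketch(std::ostream & os) : os_(os) {}

    // The only operation offered: save one arithmetic value followed by a
    // separator the loader can rely on. Anything else (e.g. a raw string
    // literal) is rejected at compile time.
    template <class T>
    text_oarchive_sketch & operator<<(const T & t) {
        static_assert(std::is_arithmetic<T>::value,
                      "this sketch only saves arithmetic values");
        os_ << t << ' ';
        return *this;
    }

private:
    std::ostream & os_;  // not reachable from user code
};

// Not a stream, so it cannot be handed to code that writes raw characters.
static_assert(!std::is_base_of<std::ostream, text_oarchive_sketch>::value,
              "an archive is not a kind of stream");

// Usage: save two values into an in-memory buffer.
inline std::string save_two_values() {
    std::ostringstream buffer;
    text_oarchive_sketch ar(buffer);
    ar << 42 << 2.5;
    return buffer.str();
}
```

<p>Had the sketch derived from <code style="white-space: normal">std::ostream</code>, a call such as <code style="white-space: normal">ar.write("x", 1)</code> would compile and silently corrupt the archive; with composition that call is simply not available.</p>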
<h2><a name="primitives"></a>Archive Members are Templates
Rather than Virtual Functions</h2>
<p>
The previous version of this library defined virtual functions for all
primitive types. These were overridden by each archive class. There were
two issues related to this:
</p>
<ul>
<li>Some disliked virtual functions because of the added execution time
overhead.
<li>This caused implementation difficulties since the set of primitive
data types varies between platforms. Attempting to define the correct
set of virtual functions (think <code style="white-space: normal">long long</code>,
<code style="white-space: normal">__int64</code>,
etc.) resulted in messy and fragile code. Replacing this with templates
and letting the compiler generate the code for the primitive types actually
used resolved this problem. Of course, the ripple effects of this design
change were significant, but in the end they led to smaller, faster, more
maintainable code.
</ul>
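<p>The following sketch contrasts the two approaches using two hypothetical classes; neither is the library's actual archive interface. The virtual-function version has to enumerate every primitive type in advance, while the member template lets the compiler instantiate <code style="white-space: normal">save</code> only for the types the program actually uses, including ones such as <code style="white-space: normal">long long</code> that a fixed list might have missed.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <type_traits>

// Old style (hypothetical): one virtual function per primitive type.
// Every type must be anticipated, and each call goes through dynamic dispatch.
class virtual_oarchive {
public:
    virtual ~virtual_oarchive() = default;
    virtual void save(int t) = 0;
    virtual void save(double t) = 0;
    // ... long long? __int64? each platform needs a different list
};

// New style (hypothetical): a single member template. The compiler
// generates save<T> only for the primitive types actually used, and the
// calls are resolved at compile time.
class template_oarchive {
public:
    template <class T>
    void save(const T & t) {
        static_assert(std::is_arithmetic<T>::value, "primitive types only");
        os_ << t << ' ';
    }
    std::string str() const { return os_.str(); }

private:
    std::ostringstream os_;
};

inline std::string save_mixed_primitives() {
    template_oarchive ar;
    ar.save(1);                      // int
    ar.save(2LL);                    // long long: no extra code to write
    ar.save(static_cast<short>(3));  // short
    return ar.str();
}
```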
<h2><a name="strings"></a><code style="white-space: normal">std::string</code>s are treated specially in text archives</h2>
<p>
Treating strings as STL vectors would result in minimal code size. This was
not done because:
</p>
<ul>
<li>In text archives it is convenient to be able to view strings. Our text
implementation stores single characters as integers. Storing strings
as a vector of characters would waste space and render the archives
inconvenient for debugging.
<li>Stream implementations have special functions for <code style="white-space: normal">std::string</code>
and <code style="white-space: normal">std::wstring</code>.
Presumably they optimize appropriately.
<li>Other specializations of <code style="white-space: normal">std::basic_string</code> are in fact handled
as vectors of the element type.
</ul>
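<p>To make the space argument concrete, the sketch below compares two hypothetical encodings, assuming (as stated above) that the text archive writes each single character as an integer. The layout used here, a length followed by the contents, is an illustration and not the library's actual text format.</p>

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Generic path (hypothetical): the string is saved as a vector of chars,
// and each char is written as an integer.
inline std::string save_as_char_vector(const std::string & s) {
    std::ostringstream os;
    os << s.size();
    for (char c : s)
        os << ' ' << static_cast<int>(static_cast<unsigned char>(c));
    return os.str();
}

// Special case (hypothetical): the length, then the characters themselves,
// using the stream's own std::string support.
inline std::string save_as_string(const std::string & s) {
    std::ostringstream os;
    os << s.size() << ' ' << s;
    return os.str();
}
```

<p>For <code style="white-space: normal">"hello"</code> the vector form produces <code style="white-space: normal">5 104 101 108 108 111</code>, while the string form produces <code style="white-space: normal">5 hello</code>, which is both shorter and readable when debugging.</p>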
<h2><a name="typeid"></a><code style="white-space: normal">typeid</code> information is not included in archives</h2>
<p>
I originally thought that I had to save in the archive the class name returned by
<code style="white-space: normal">std::type_info::name()</code>.
This created difficulties because <code style="white-space: normal">std::type_info::name()</code> is not portable and
is not guaranteed to return the class name, which makes it almost useless for implementing
archive portability. This topic is explained in much more detail in
<a href="bibliography.html#6">[7] page 206</a>. It turned out that saving this name was not necessary.
As long as objects are loaded in exactly the same sequence as they were saved, the type
is available when loading. The only exception to this is the case of polymorphic
pointers to types never before loaded/saved. This is addressed with the <code style="white-space: normal">register_type()</code>
and/or <code style="white-space: normal">export</code> facilities described in the reference.
In effect, <code style="white-space: normal">export</code> generates a portable equivalent of
<code style="white-space: normal">typeid</code> information.
</p>
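<p>The idea behind <code style="white-space: normal">export</code> can be illustrated with a hypothetical registry that maps a portable, programmer-chosen key to a factory function. The key, rather than <code style="white-space: normal">typeid(T).name()</code> (whose result is implementation-defined), is what would be written to the archive and used to construct the right derived type when loading a polymorphic pointer. The names <code style="white-space: normal">registry</code>, <code style="white-space: normal">register_class</code> and <code style="white-space: normal">create</code> are assumptions for this sketch, not the library's API.</p>

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <string>

struct base {
    virtual ~base() = default;
    virtual std::string kind() const = 0;
};
struct derived_a : base { std::string kind() const override { return "a"; } };
struct derived_b : base { std::string kind() const override { return "b"; } };

// Hypothetical registry: a portable string key replaces typeid(T).name(),
// which differs between compilers (e.g. "9derived_a" vs "struct derived_a").
class registry {
public:
    using factory = std::function<std::unique_ptr<base>()>;

    template <class T>
    void register_class(const std::string & key) {
        factories_[key] = []() -> std::unique_ptr<base> {
            return std::make_unique<T>();
        };
    }

    // When loading a polymorphic pointer, the key read from the archive
    // selects the most-derived type to construct.
    std::unique_ptr<base> create(const std::string & key) const {
        auto it = factories_.find(key);
        if (it == factories_.end())
            return nullptr;  // unregistered key: the loader cannot proceed
        return it->second();
    }

private:
    std::map<std::string, factory> factories_;
};

inline registry make_registry() {
    registry r;
    r.register_class<derived_a>("derived_a");  // same key on every platform
    r.register_class<derived_b>("derived_b");
    return r;
}
```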
<h2><a name="trap"></a>Compile time trap when saving a non-const value</h2>
<p>
The following code will fail to compile unless the tracking_level serialization
trait of <code style="white-space: normal">T</code> is set to "track_never".
The failure will occur on a line with a
<code style="white-space: normal">BOOST_STATIC_ASSERT</code>;
here, we refer to this as a compile time trap.
<code style="white-space: normal"><pre>
T t;
ar &lt;&lt; t;
</pre></code>
The following will compile without problem:
<code style="white-space: normal"><pre>
const T t;
ar &lt;&lt; t;
</pre></code>
Likewise, the following code will trap at compile time
if the tracking_level serialization trait of <code style="white-space: normal">T</code> is set to "track_never":
<code style="white-space: normal"><pre>
T * t;
ar &gt;&gt; t;
</pre></code>
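<p>The logic of the trap can be modeled with a type trait and a <code style="white-space: normal">static_assert</code>. In this hypothetical sketch (the names <code style="white-space: normal">tracking</code>, <code style="white-space: normal">tracking_of</code>, <code style="white-space: normal">trap_fires</code> and <code style="white-space: normal">sketch_oarchive</code> are assumptions, not the library's traits), saving a non-const object is rejected at compile time unless its class is marked as never tracked, while saving a <code style="white-space: normal">const</code> object is always accepted. The line that would fail to compile appears only as a comment.</p>

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical tracking trait; every class defaults to selective tracking.
enum class tracking { never, selectively };

template <class T>
struct tracking_of : std::integral_constant<tracking, tracking::selectively> {};

struct tracked_t { int v = 1; };
struct untracked_t { int v = 0; };

// Plays the role of marking untracked_t "track_never".
template <>
struct tracking_of<untracked_t> : std::integral_constant<tracking, tracking::never> {};

// True when saving an object of deduced type T must be rejected:
// the object is non-const and its class may be tracked.
template <class T>
constexpr bool trap_fires() {
    using U = typename std::remove_reference<T>::type;
    return !std::is_const<U>::value &&
           tracking_of<typename std::remove_const<U>::type>::value != tracking::never;
}

struct sketch_oarchive {
    int saved = 0;  // sum of saved values, so callers can observe the saves

    template <class T>
    sketch_oarchive & operator<<(T && t) {
        static_assert(!trap_fires<T>(),
                      "saving a non-const tracked object: make it const "
                      "or mark its class track_never");
        saved += t.v;
        return *this;
    }
};

// tracked_t x;  ar << x;   // would not compile: the trap fires
```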
<p>
This behavior has been controversial and may be revised in the future. The criticism
is that it will flag code that is in fact correct and force users to insert a
<code style="white-space: normal">const_cast</code>. My view is that:
<ul>
<li>The trap is useful in detecting a certain class of programming errors.
<li>Such errors would otherwise be difficult to detect.
<li>The inconvenience caused by including this trap is very small in relation
to its benefits.
</ul>
<p>
The following case illustrates my position. It was originally posted as an example to the
mailing list by Peter Dimov.
<code style="white-space: normal"><pre>
class construct_from
{
    ...
};

int main(){
    ...
    Y y;
    construct_from x(y);
    ar &lt;&lt; x;
}
</pre></code>
<p>
Suppose that there is no trap as described above.
<ol>
<li>This example compiles and executes fine. No tracking is done because
construct_from has never been serialized through a pointer. Now some time
later, the next programmer (2) comes along and makes an enhancement. He
wants the archive to be a sort of log.
<code style="white-space: normal"><pre>
int main(){
    ...
    Y y;
    construct_from x(y);
    ar &lt;&lt; x;
    ...
    x.f(); // change x in some way
    ...
    ar &lt;&lt; x;
}
</pre></code>
<p>
Again no problem. He gets two copies in the archive, each one different.
That is, he gets exactly what he expects and is naturally delighted.
<p>
<li>Now, sometime later, a third programmer (3) sees construct_from and says,
"Oh cool, just what I need." He writes a function in a totally disjoint
module (the project is so big he doesn't even realize the existence of
the original usage), something like:
<code style="white-space: normal"><pre>
class K {
    shared_ptr&lt;construct_from&gt; z;
    template &lt;class Archive&gt;
    void serialize(Archive &amp; ar, const unsigned version){
        ar &lt;&lt; z;
    }
};
</pre></code>
<p>
He builds and runs the program and tests his new functionality. It works
great and he's delighted.
<p>
<li>Things continue smoothly as before. A month goes by, and it's
discovered that when loading the archives made in the last month (reading the
log), things don't work. The second log entry is always the same as the
first. After a series of very long and increasingly acrimonious email exchanges,
it's discovered
that programmer (3) accidentally broke programmer (2)'s code. This is because, by
serializing via a pointer, the "log" object is now being tracked. The default
tracking behavior is "track_selectively", which means that class
instances are tracked only if they are serialized through pointers anywhere in
the program. Now multiple saves from the same address result in only the first one
being written to the archive. Subsequent saves only add the address, even though the
data might have been changed. When it comes time to load the data, all instances of the log record show the same data.
In this way, the behavior of a functioning piece of code is changed due to the side
effect of a change in an otherwise disjoint module.
Worse yet, the data has been lost and cannot now be recovered from the archives.
People are really upset and disappointed with boost (at least the serialization system).
<p>
<li>
After a lot of investigation, the source of the problem is discovered,
and class construct_from is marked "track_never" by including:
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
<li>Now everything works again. Or so it seems.
<p>
<li>When loaded, the <code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code>
instances no longer share a single raw pointer. Each loaded
<code style="white-space: normal">shared_ptr&lt;construct_from&gt;</code> is going to
have its own distinct raw pointer. This will break
<code style="white-space: normal">shared_ptr</code> and cause a memory leak. Again,
the cause of this problem is very far removed from the point of discovery. It could
well be that the problem is not even discovered until after the archives are loaded.
Now we not only have a program bug that is difficult to find and fix, but we also have a bunch of
invalid archives and lost data.
</ol>
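<p>The failure in the scenario above can be reproduced with a toy model of address tracking. The sketch is a hypothetical simplification (the names <code style="white-space: normal">tracking_log</code>, <code style="white-space: normal">save</code>, <code style="white-space: normal">load</code> and <code style="white-space: normal">write_log</code> are assumptions, and the real archive format differs): when tracking is on, an object's data is written only the first time its address is seen, and later saves record just a reference to that first entry. Saving the same log object twice, with a change in between, then loads back two identical entries.</p>

```cpp
#include <cassert>
#include <map>
#include <vector>

struct construct_from { int value = 0; };  // stand-in for the class in the story

// One archive entry: either the object's data or a back-reference
// to the index of an earlier entry.
struct entry { bool is_ref; int data_or_index; };

class tracking_log {
public:
    explicit tracking_log(bool track) : track_(track) {}

    void save(const construct_from & x) {
        if (track_) {
            auto it = seen_.find(&x);
            if (it != seen_.end()) {
                // Already saved from this address: store only a reference,
                // even if x has changed since then.
                entries_.push_back({true, it->second});
                return;
            }
            seen_[&x] = static_cast<int>(entries_.size());
        }
        entries_.push_back({false, x.value});
    }

    // Rebuild the saved values in order, resolving references.
    std::vector<int> load() const {
        std::vector<int> out;
        for (const entry & e : entries_) {
            int v = e.is_ref ? out[e.data_or_index] : e.data_or_index;
            out.push_back(v);
        }
        return out;
    }

private:
    bool track_;
    std::map<const construct_from *, int> seen_;
    std::vector<entry> entries_;
};

inline std::vector<int> write_log(bool track) {
    tracking_log ar(track);
    construct_from x;
    x.value = 1;
    ar.save(x);
    x.value = 2;  // x.f(): change x in some way
    ar.save(x);
    return ar.load();
}
```

<p>Passing <code style="white-space: normal">false</code> models "track_never": every save writes a full copy, so the log loads back as 1 then 2. The same rule is why the <code style="white-space: normal">shared_ptr</code> case then fails: two saves of one shared object are also written, and later loaded, as two separate copies.</p>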
<p>
Now consider what happens when the trap is enabled:
<ol>
<li>Right away, the program traps at
<code style="white-space: normal"><pre>
ar &lt;&lt; x;
</pre></code>
<p>
<li>The programmer curses (another %^&amp;*&amp;* hoop to jump through). He's in a
hurry (and who isn't?) and would prefer not to use <code style="white-space: normal">const_cast</code>
because it looks bad. So he just makes the following change:
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
ar &lt;&lt; x;
</pre></code>
<p>
Things work fine and he moves on.
<p>
<li>Now programmer (2) wants to make his change, and again runs into an
annoying const issue:
<code style="white-space: normal"><pre>
Y y;
const construct_from x(y);
...
x.f(); // change x in some way; compile error: f() is not const
...
ar &lt;&lt; x;
</pre></code>
<p>
He's mildly annoyed now, and tries the following:
<ul>
<li>He considers making f() a const member function, but presumably that just shifts the const
error somewhere else. And he doesn't want to fiddle with "his" code to
work around a quirk in the serialization system.
<p>
<li>He removes the <code style="white-space: normal">const</code>
from <code style="white-space: normal">const construct_from</code> above - damn, now he
gets the trap. If he looks at the comment in the code where the
<code style="white-space: normal">BOOST_STATIC_ASSERT</code>
occurs, he'll do one of two things:
<ol>
<li>This is just crazy. It's making my life needlessly difficult and flagging
code that is just fine. So I'll fix this with a <code style="white-space: normal">const_cast</code>
and fire off a complaint to the list, and maybe they will fix it.
In this case, the story branches off to the previous scenario.
<p>
<li>Oh, this trap is suggesting that the default serialization isn't really
what I want. Of course in this particular program it doesn't matter. But
then the code in the trap can't really evaluate code in other modules (which
might not even be written yet). OK, I'll add the following to my
construct_from.hpp to solve the problem.
<code style="white-space: normal"><pre>
BOOST_SERIALIZATION_TRACKING(construct_from, track_never)
</pre></code>
</ol>
</ul>
<p>
<li>Now programmer (3) comes along and makes his change. The behavior of the
original (and distant) module remains unchanged because the
<code style="white-space: normal">construct_from</code> trait has been set to
"track_never", so programmer (2)'s log always gets copies and is what we expect.
<p>
<li>But now he gets another trap, for trying to save an object of a
class marked "track_never" through a pointer. So he goes back to
construct_from.hpp and comments out the
<code style="white-space: normal">BOOST_SERIALIZATION_TRACKING</code> that
was inserted. Now the second trap is avoided, but damn, the first trap is
popping up again. Eventually, after some code restructuring, the differing
requirements for serializing <code style="white-space: normal">construct_from</code>
are reconciled.
</ol>
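<p>The second trap in this scenario, saving an object of a "track_never" class through a pointer, pulls in the opposite direction from the first, and can be modeled the same way. In this hypothetical sketch (the names <code style="white-space: normal">tracking_of</code>, <code style="white-space: normal">pointer_save_allowed</code> and <code style="white-space: normal">sketch_oarchive</code> are assumptions, not the library's traits), a pointer save is rejected at compile time when the pointee's class is marked as never tracked, since without tracking the loader could not restore shared pointers as shared.</p>

```cpp
#include <cassert>
#include <type_traits>

enum class tracking { never, selectively };

// Hypothetical tracking trait; every class defaults to selective tracking.
template <class T>
struct tracking_of : std::integral_constant<tracking, tracking::selectively> {};

struct construct_from { int v = 0; };  // stand-in for the class in the story
struct other_t { int v = 0; };

// Plays the role of marking construct_from "track_never".
template <>
struct tracking_of<construct_from> : std::integral_constant<tracking, tracking::never> {};

template <class T>
constexpr bool pointer_save_allowed() {
    return tracking_of<typename std::remove_cv<T>::type>::value != tracking::never;
}

struct sketch_oarchive {
    int pointers_saved = 0;

    template <class T>
    sketch_oarchive & operator<<(T * p) {
        static_assert(pointer_save_allowed<T>(),
                      "cannot save a pointer to a track_never class");
        (void)p;  // a real archive would save *p once per address
        ++pointers_saved;
        return *this;
    }
};

// construct_from * q = nullptr;  ar << q;  // would not compile: second trap
```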
<p>
Note that in this second scenario:
<ul>
<li>all errors are trapped at compile time.
<li>no invalid archives are created.
<li>no data is lost.
<li>no runtime errors occur.
</ul>
<p>
It's true that these traps may sometimes flag code that is currently correct and
that this may be annoying to some programmers. However, this example illustrates
my view that these traps are useful and that any such annoyance is a small price to
pay to avoid particularly vexing programming errors.
</p>
<!--
<h2><a name="footnotes"></a>Footnotes</h2>
<dl>
<dt><a name="footnote1" class="footnote">(1)</a> {{text}}</dt>
<dt><a name="footnote2" class="footnote">(2)</a> {{text}}</dt>
</dl>
-->
<hr>
<p><i>© Copyright <a href="http://www.rrsd.com">Robert Ramey</a> 2002-2004.
Distributed under the Boost Software License, Version 1.0. (See
accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
</i></p>
</body>
</html>