source: downloads/boost_1_33_1/doc/html/program_options/design.html @ 12

Last change on this file since 12 was 12, checked in by landauf, 18 years ago

added boost

File size: 10.8 KB
1<html>
2<head>
3<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
4<title>Design Discussion</title>
5<link rel="stylesheet" href="../boostbook.css" type="text/css">
6<meta name="generator" content="DocBook XSL Stylesheets V1.69.1">
7<link rel="start" href="../index.html" title="The Boost C++ Libraries">
8<link rel="up" href="../program_options.html" title="Chapter 7. Boost.Program_options">
9<link rel="prev" href="howto.html" title="How To">
10<link rel="next" href="s06.html" title="Acknowledgements">
11</head>
12<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
13<table cellpadding="2" width="100%">
14<td valign="top"><img alt="boost.png (6897 bytes)" width="277" height="86" src="../../../boost.png"></td>
15<td align="center"><a href="../../../index.htm">Home</a></td>
16<td align="center"><a href="../../../libs/libraries.htm">Libraries</a></td>
17<td align="center"><a href="../../../people/people.htm">People</a></td>
18<td align="center"><a href="../../../more/faq.htm">FAQ</a></td>
19<td align="center"><a href="../../../more/index.htm">More</a></td>
20</table>
21<hr>
22<div class="spirit-nav">
23<a accesskey="p" href="howto.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../program_options.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="s06.html"><img src="../images/next.png" alt="Next"></a>
24</div>
25<div class="section" lang="en">
26<div class="titlepage"><div><div><h3 class="title">
27<a name="program_options.design"></a>Design Discussion</h3></div></div></div>
28<div class="toc"><dl><dt><span class="section"><a href="design.html#program_options.design.unicode">Unicode Support</a></span></dt></dl></div>
29<p>This section focuses on some of the design questions.
30  </p>
31<div class="section" lang="en">
32<div class="titlepage"><div><div><h4 class="title">
33<a name="program_options.design.unicode"></a>Unicode Support</h4></div></div></div>
34<p>Unicode support was one of the features specifically requested
35      during the formal review. Throughout this document "Unicode support" is
36      a synonym for "wchar_t" support, assuming that "wchar_t" always uses
37      Unicode encoding.  Also, when talking about "ascii" (in lowercase) we
38      do not mean strict 7-bit ASCII encoding, but rather "char" strings in the
39      local 8-bit encoding.
40    </p>
41<p>
42      Generally, "Unicode support" can mean
43      many things, but for the program_options library it means that:
44
45      </p>
46<div class="itemizedlist"><ul type="disc">
47<li><p>Each parser should accept either <code class="computeroutput">char*</code>
48          or <code class="computeroutput">wchar_t*</code>, correctly split the input into option
49          names and option values and return the data.
50          </p></li>
51<li><p>For each option, it should be possible to specify whether the conversion
52            from string to value uses ascii or Unicode.
53          </p></li>
54<li>
55<p>The library guarantees that:
56            </p>
57<div class="itemizedlist"><ul type="circle">
58<li><p>ascii input is passed to an ascii value without change
59                </p></li>
60<li><p>Unicode input is passed to a Unicode value without change</p></li>
61<li><p>ascii input passed to a Unicode value, and Unicode input
62                  passed to an ascii value will be converted using a codecvt
63                  facet (which may be specified by the
64                  user)
65                </p></li>
66</ul></div>
67</li>
68</ul></div>
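<p>As a rough illustration of the last guarantee, the sketch below widens a
      <code class="computeroutput">char</code> string with the standard
      <code class="computeroutput">std::codecvt&lt;wchar_t, char, std::mbstate_t&gt;</code> facet of a
      given locale. The helper name <code class="computeroutput">widen</code> is hypothetical and
      is not part of the library; the library's own conversion code may differ.
    </p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the library's API): convert a local 8-bit string
// to wchar_t using the codecvt facet installed in 'loc'.
std::wstring widen(const std::string& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (in.empty())
        return std::wstring();

    // One output character per input byte is always enough room.
    std::wstring out(in.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r = cvt.in(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    // This sketch treats every result other than 'ok' as a failure.
    if (r != std::codecvt_base::ok)
        throw std::runtime_error("input cannot be converted in this locale");

    assert(to_next >= &out[0]);
    out.resize(static_cast<std::wstring::size_type>(to_next - &out[0]));
    return out;
}
```

<p>In the classic "C" locale only plain 7-bit text is certain to convert;
      bytes outside that range may be rejected, so a caller would normally pass
      a locale that matches the encoding of the input.
    </p>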
69<p>The important point is that it's possible to have some "ascii
70      options" together with "Unicode options". There are two reasons for
71      this. First, for a given type you might not have the code to extract the
72      value from a Unicode string, and it's not good to require that such code be written.
73      Second, imagine a reusable library which has some options and exposes
74      options description in its interface. If <span class="emphasis"><em>all</em></span>
75      options are either ascii or Unicode, and the library does not use any
76      Unicode strings, then the author will likely use ascii options, which
77      would make the library unusable inside Unicode
78      applications. Essentially, it would be necessary to provide two versions
79      of the library -- ascii and Unicode.
80    </p>
81<p>Another important point is that ascii strings are passed through
82      without modification. In other words, it's not possible to just convert
83      ascii to Unicode and process the Unicode further. The problem is that the
84      default conversion mechanism -- the <code class="computeroutput">codecvt</code> facet -- might
85      not work with 8-bit input without additional setup.
86    </p>
87<p>The Unicode support outlined above is not complete. For example, we
88      don't plan to allow Unicode in option names. Unicode support is hard and
89      requires a Boost-wide solution. Even comparing two arbitrary Unicode
90      strings is non-trivial. Finally, using Unicode in option names is
91      related to internationalization, which has its own
92      complexities. E.g. if option names depend on the current locale, then
93      every part of the program, and anything else that uses the names, must be
94      internationalized too.
95    </p>
96<p>The primary question in implementing the Unicode support is whether
97      to use templates and <code class="computeroutput">std::basic_string</code> or to use some
98      internal encoding and convert between internal and external encodings on
99      the interface boundaries.           
100    </p>
101<p>The choice, mostly, is between code size and execution
102      speed. A templated solution would either link library code into every
103      application that uses the library (thereby making a shared library
104      impossible), or provide explicit instantiations in the shared library
105      (increasing its size). The solution based on an internal encoding would
106      necessarily perform conversions in a number of places and would be somewhat slower.
107      Since speed is generally not an issue for this library, the second
108      solution looks more attractive, but we'll take a closer look at
109      individual components.
110    </p>
111<p>For the parsers component, we have three choices:
112      </p>
113<div class="itemizedlist"><ul type="disc">
114<li><p>Use a fully templated implementation: given a string of a
115            certain type, a parser will return a <code class="computeroutput">parsed_options</code> instance
116            with strings of the same type (i.e. the <code class="computeroutput">parsed_options</code> class
117            will be templated).</p></li>
118<li><p>Use internal encoding: same as above, but strings will be converted to and
119            from the internal encoding.</p></li>
120<li><p>Use and partly expose the internal encoding: same as above,
121            but the strings in the <code class="computeroutput">parsed_options</code> instance will be in the
122            internal encoding. This might avoid a conversion if
123            the <code class="computeroutput">parsed_options</code> instance is passed directly to other components,
124            but can also be dangerous or confusing for a user.
125          </p></li>
126</ul></div>
127<p>The second solution appears to be the best -- it does not increase
128    the code size much and is cleaner than the third. To avoid extra
129    conversions, the Unicode version of <code class="computeroutput">parsed_options</code> can also store
130    strings in the internal encoding.
131    </p>
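<p>A minimal sketch of the second approach, under stated assumptions: the
      core parser is written once against <code class="computeroutput">std::string</code>, and the
      <code class="computeroutput">wchar_t</code> entry point converts only on the interface boundary.
      All names here (<code class="computeroutput">encode_utf8</code>, <code class="computeroutput">decode_utf8</code>,
      <code class="computeroutput">parse_token</code>, <code class="computeroutput">narrow_option</code>) are hypothetical,
      UTF-8 stands in for the internal encoding, each <code class="computeroutput">wchar_t</code> is
      assumed to hold one code point, and the input is assumed to be well-formed.
    </p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical helper: encode a wide string as UTF-8.
// Assumes each wchar_t holds one Unicode code point (no surrogate pairs).
std::string encode_utf8(const std::wstring& s)
{
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        unsigned long c = static_cast<unsigned long>(s[i]);
        if (c < 0x80) {
            out += static_cast<char>(c);
        } else if (c < 0x800) {
            out += static_cast<char>(0xC0 | (c >> 6));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else if (c < 0x10000) {
            out += static_cast<char>(0xE0 | (c >> 12));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        } else {
            out += static_cast<char>(0xF0 | (c >> 18));
            out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (c & 0x3F));
        }
    }
    return out;
}

// Hypothetical helper: decode well-formed UTF-8 into a wide string.
std::wstring decode_utf8(const std::string& s)
{
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char b = static_cast<unsigned char>(s[i]);
        unsigned long c;
        std::size_t extra;  // number of continuation bytes that follow
        if (b < 0x80)      { c = b;        extra = 0; }
        else if (b < 0xE0) { c = b & 0x1F; extra = 1; }
        else if (b < 0xF0) { c = b & 0x0F; extra = 2; }
        else               { c = b & 0x07; extra = 3; }
        ++i;
        for (; extra > 0 && i < s.size(); --extra, ++i) {
            unsigned char cont = static_cast<unsigned char>(s[i]);
            assert((cont & 0xC0) == 0x80);  // well-formed input is assumed
            c = (c << 6) | (cont & 0x3F);
        }
        out += static_cast<wchar_t>(c);
    }
    return out;
}

typedef std::pair<std::string, std::string> narrow_option;  // (name, value)

// Core parser: splits "--name=value"; works on char strings only.
narrow_option parse_token(const std::string& token)
{
    std::string body = token.compare(0, 2, "--") == 0 ? token.substr(2) : token;
    std::string::size_type eq = body.find('=');
    if (eq == std::string::npos)
        return narrow_option(body, std::string());
    return narrow_option(body.substr(0, eq), body.substr(eq + 1));
}

// Wide front end: convert in, reuse the core parser, convert back out.
std::pair<std::wstring, std::wstring> parse_token(const std::wstring& token)
{
    narrow_option o = parse_token(encode_utf8(token));
    return std::make_pair(decode_utf8(o.first), decode_utf8(o.second));
}
```

<p>Only the two small conversion helpers and the thin wide overload are
      added for <code class="computeroutput">wchar_t</code> callers; the splitting logic itself is
      compiled once.
    </p>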
132<p>For the options descriptions component, we don't have much
133      choice. Since it's not desirable to have either all options use ascii or all
134      of them use Unicode, but rather have some ascii and some Unicode options, the
135      interface of the <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> must work with both. The only way is
136      to pass an additional flag indicating whether the strings use ascii or the internal encoding.
137      The instance of <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> can then convert into some
138      other encoding if needed.
139    </p>
140<p>For the storage component, the only affected function is <code class="computeroutput"><a href="../id2349650.html" title="Function store">store</a></code>.
141      For Unicode input, the <code class="computeroutput"><a href="../id2349650.html" title="Function store">store</a></code> function should convert the value to the
142      internal encoding.  It should also inform the <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> class
143      about the encoding used.
144    </p>
145<p>Finally, what internal encoding should we use? The
146    alternatives are:
147    <code class="computeroutput">std::wstring</code> (using UCS-4 encoding) and
148    <code class="computeroutput">std::string</code> (using UTF-8 encoding). The differences between the
149    alternatives are:
150      </p>
151<div class="itemizedlist"><ul type="disc">
152<li><p>Speed: UTF-8 is a bit slower</p></li>
153<li><p>Space: UTF-8 takes less space when input is ascii</p></li>
154<li><p>Code size: UTF-8 requires additional conversion code. However,
155            it allows one to use existing parsers without converting them to
156            <code class="computeroutput">std::wstring</code> and such conversion is likely to create a
157            number of new instantiations.
158          </p></li>
159</ul></div>
160<p>
161      There's no clear leader, but the last point seems important, so UTF-8
162      will be used.     
163    </p>
164<p>Choosing the UTF-8 encoding allows the use of existing parsers,
165      because 7-bit ascii characters retain their values in UTF-8,
166      so searching for 7-bit strings is simple. However, there are
167      two subtle issues:
168      </p>
169<div class="itemizedlist"><ul type="disc">
170<li><p>We need to assume the character literals use ascii encoding
171          and that inputs use Unicode encoding.</p></li>
172<li><p>A Unicode character (say '=') can be followed by a 'composing
173          character' and the combination is not the same as just '=', so a
174          simple search for '=' might find the wrong character.
175          </p></li>
176</ul></div>
177<p>
178      Neither of these issues appears to be critical in practice, since ascii is
179      an almost universal encoding and since composing characters following '=' (and
180      other characters with special meaning to the library) are not likely to appear.
181    </p>
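<p>Both points can be sketched in a few lines. The function names below are
      hypothetical and are not part of the library; the combining-mark check is
      deliberately limited to the U+0300..U+036F block and ignores other
      combining ranges.
    </p>

```cpp
#include <cassert>
#include <string>

// Position of the first '=' byte in a UTF-8 string, or npos.
// Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a byte equal to
// '=' (0x3D) can only be a real U+003D.
std::string::size_type find_separator(const std::string& utf8)
{
    return utf8.find('=');
}

// True if a character from U+0300..U+036F (Combining Diacritical Marks)
// starts at 'pos'. In UTF-8 that block is 0xCC 0x80 .. 0xCD 0xAF.
bool combining_mark_at(const std::string& utf8, std::string::size_type pos)
{
    assert(pos <= utf8.size());
    if (pos + 1 >= utf8.size())
        return false;
    unsigned char b0 = static_cast<unsigned char>(utf8[pos]);
    unsigned char b1 = static_cast<unsigned char>(utf8[pos + 1]);
    return (b0 == 0xCC && b1 >= 0x80 && b1 <= 0xBF)
        || (b0 == 0xCD && b1 >= 0x80 && b1 <= 0xAF);
}
```

<p>For instance, U+013D is encoded in UTF-8 as the bytes 0xC4 0xBD, neither
      of which equals 0x3D, so a byte search for '=' does not match it. A parser
      that wanted to be strict about the second issue could call something like
      <code class="computeroutput">combining_mark_at</code> on the position just after the
      separator and treat a match as "not a plain '='".
    </p>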
182</div>
183</div>
184<table width="100%"><tr>
185<td align="left"></td>
186<td align="right"><small>Copyright © 2002-2004 Vladimir Prus</small></td>
187</tr></table>
188<hr>
189<div class="spirit-nav">
190<a accesskey="p" href="howto.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../program_options.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="s06.html"><img src="../images/next.png" alt="Next"></a>
191</div>
192</body>
193</html>