1 | <html> |
---|
2 | <head> |
---|
3 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> |
---|
4 | <title>Design Discussion</title> |
---|
5 | <link rel="stylesheet" href="../boostbook.css" type="text/css"> |
---|
6 | <meta name="generator" content="DocBook XSL Stylesheets V1.69.1"> |
---|
7 | <link rel="start" href="../index.html" title="The Boost C++ Libraries"> |
---|
8 | <link rel="up" href="../program_options.html" title="Chapter 7. Boost.Program_options"> |
---|
9 | <link rel="prev" href="howto.html" title="How To"> |
---|
10 | <link rel="next" href="s06.html" title="Acknowledgements"> |
---|
11 | </head> |
---|
12 | <body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF"> |
---|
13 | <table cellpadding="2" width="100%"> |
---|
14 | <td valign="top"><img alt="boost.png (6897 bytes)" width="277" height="86" src="../../../boost.png"></td> |
---|
15 | <td align="center"><a href="../../../index.htm">Home</a></td> |
---|
16 | <td align="center"><a href="../../../libs/libraries.htm">Libraries</a></td> |
---|
17 | <td align="center"><a href="../../../people/people.htm">People</a></td> |
---|
18 | <td align="center"><a href="../../../more/faq.htm">FAQ</a></td> |
---|
19 | <td align="center"><a href="../../../more/index.htm">More</a></td> |
---|
20 | </table> |
---|
21 | <hr> |
---|
22 | <div class="spirit-nav"> |
---|
23 | <a accesskey="p" href="howto.html"><img src="../images/prev.png" alt="Prev"></a><a accesskey="u" href="../program_options.html"><img src="../images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../images/home.png" alt="Home"></a><a accesskey="n" href="s06.html"><img src="../images/next.png" alt="Next"></a> |
---|
24 | </div> |
---|
25 | <div class="section" lang="en"> |
---|
26 | <div class="titlepage"><div><div><h3 class="title"> |
---|
27 | <a name="program_options.design"></a>Design Discussion</h3></div></div></div> |
---|
28 | <div class="toc"><dl><dt><span class="section"><a href="design.html#program_options.design.unicode">Unicode Support</a></span></dt></dl></div> |
---|
29 | <p>This section focuses on some of the design questions. |
---|
30 | </p> |
---|
31 | <div class="section" lang="en"> |
---|
32 | <div class="titlepage"><div><div><h4 class="title"> |
---|
33 | <a name="program_options.design.unicode"></a>Unicode Support</h4></div></div></div> |
---|
34 | <p>Unicode support was one of the features specifically requested |
---|
35 | during the formal review. Throughout this document "Unicode support" is |
---|
36 | a synonym for "wchar_t" support, assuming that "wchar_t" always uses |
---|
37 | Unicode encoding. Also, when talking about "ascii" (in lowercase) we'll |
---|
38 | not mean strict 7-bit ASCII encoding, but rather "char" strings in local |
---|
39 | 8-bit encoding. |
---|
40 | </p> |
---|
41 | <p> |
---|
42 | Generally, "Unicode support" can mean |
---|
43 | many things, but for the program_options library it means that: |
---|
44 | |
---|
45 | </p> |
---|
46 | <div class="itemizedlist"><ul type="disc"> |
---|
47 | <li><p>Each parser should accept either <code class="computeroutput">char*</code> |
---|
48 | or <code class="computeroutput">wchar_t*</code>, correctly split the input into option |
---|
49 | names and option values and return the data. |
---|
50 | </p></li> |
---|
51 | <li><p>For each option, it should be possible to specify whether the conversion |
---|
52 | from string to value uses ascii or Unicode. |
---|
53 | </p></li> |
---|
54 | <li> |
---|
55 | <p>The library guarantees that: |
---|
56 | </p> |
---|
57 | <div class="itemizedlist"><ul type="circle"> |
---|
58 | <li><p>ascii input is passed to an ascii value without change |
---|
59 | </p></li> |
---|
60 | <li><p>Unicode input is passed to a Unicode value without change</p></li> |
---|
61 | <li><p>ascii input passed to a Unicode value, and Unicode input |
---|
62 | passed to an ascii value will be converted using a codecvt |
---|
63 | facet (which can be |
---|
64 | specified by the user) |
---|
65 | </p></li> |
---|
66 | </ul></div> |
---|
67 | </li> |
---|
68 | </ul></div> |
---|
69 | <p>The important point is that it's possible to have some "ascii |
---|
70 | options" together with "Unicode options". There are two reasons for |
---|
71 | this. First, for a given type you might not have the code to extract the |
---|
72 | value from a Unicode string, and it's not good to require that such code be written. |
---|
73 | Second, imagine a reusable library which has some options and exposes |
---|
74 | options description in its interface. If <span class="emphasis"><em>all</em></span> |
---|
75 | options are either ascii or Unicode, and the library does not use any |
---|
76 | Unicode strings, then the author will likely use ascii options, which |
---|
77 | would make the library unusable inside Unicode |
---|
78 | applications. Essentially, it would be necessary to provide two versions |
---|
79 | of the library -- ascii and Unicode. |
---|
80 | </p> |
---|
81 | <p>Another important point is that ascii strings are passed through |
---|
82 | without modification. In other words, it's not possible to just convert |
---|
83 | ascii to Unicode and process the Unicode further. The problem is that the |
---|
84 | default conversion mechanism -- the <code class="computeroutput">codecvt</code> facet -- might |
---|
85 | not work with 8-bit input without additional setup. |
---|
86 | </p> |
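<p>To illustrate the conversion step discussed above, here is a minimal sketch that widens a <code class="computeroutput">char</code> string through the <code class="computeroutput">codecvt</code> facet of a given locale. The function name <code class="computeroutput">widen_with_codecvt</code> is hypothetical and is not part of the library; only standard C++ calls are used. With the classic "C" locale, pure 7-bit input converts as expected, while the treatment of bytes above 0x7F depends on the locale and the platform, which is the "additional setup" problem mentioned above.</p>

```cpp
#include <cassert>
#include <cwchar>
#include <locale>
#include <stdexcept>
#include <string>

// Hypothetical helper (not a library function): convert a narrow string in
// the local 8-bit encoding to wchar_t using the codecvt facet of `loc`.
// Throws std::runtime_error if the facet cannot convert the input.
std::wstring widen_with_codecvt(const std::string& in, const std::locale& loc)
{
    typedef std::codecvt<wchar_t, char, std::mbstate_t> facet_type;
    const facet_type& cvt = std::use_facet<facet_type>(loc);

    if (in.empty())
        return std::wstring();

    // A conversion never yields more wide characters than input bytes.
    std::wstring out(in.size(), L'\0');
    std::mbstate_t state = std::mbstate_t();
    const char* from_next = 0;
    wchar_t* to_next = 0;

    std::codecvt_base::result r = cvt.in(
        state,
        in.data(), in.data() + in.size(), from_next,
        &out[0], &out[0] + out.size(), to_next);

    if (r != std::codecvt_base::ok)
        throw std::runtime_error("codecvt facet could not convert the input");

    // On success the whole input has been consumed.
    assert(from_next == in.data() + in.size());
    out.resize(static_cast<std::wstring::size_type>(to_next - &out[0]));
    return out;
}
```

<p>A caller would typically pass a locale whose <code class="computeroutput">codecvt</code> facet matches the actual 8-bit encoding of the input, for example a locale supplied by the user.</p>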
---|
87 | <p>The Unicode support outlined above is not complete. For example, we |
---|
88 | don't plan to allow Unicode in option names. Unicode support is hard and |
---|
89 | requires a Boost-wide solution. Even comparing two arbitrary Unicode |
---|
90 | strings is non-trivial. Finally, using Unicode in option names is |
---|
91 | related to internationalization, which has its own |
---|
92 | complexities. E.g. if option names depend on the current locale, then all |
---|
93 | parts of the program, and anything else that uses those names, must be |
---|
94 | internationalized too. |
---|
95 | </p> |
---|
96 | <p>The primary question in implementing the Unicode support is whether |
---|
97 | to use templates and <code class="computeroutput">std::basic_string</code> or to use some |
---|
98 | internal encoding and convert between internal and external encodings on |
---|
99 | the interface boundaries. |
---|
100 | </p> |
---|
101 | <p>The choice, mostly, is between code size and execution |
---|
102 | speed. A templated solution would either link library code into every |
---|
103 | application that uses the library (thereby making a shared library |
---|
104 | impossible), or provide explicit instantiations in the shared library |
---|
105 | (increasing its size). The solution based on internal encoding would |
---|
106 | necessarily make conversions in a number of places and would be somewhat slower. |
---|
107 | Since speed is generally not an issue for this library, the second |
---|
108 | solution looks more attractive, but we'll take a closer look at |
---|
109 | individual components. |
---|
110 | </p> |
---|
111 | <p>For the parsers component, we have three choices: |
---|
112 | </p> |
---|
113 | <div class="itemizedlist"><ul type="disc"> |
---|
114 | <li><p>Use a fully templated implementation: given a string of a |
---|
115 | certain type, a parser will return a <code class="computeroutput">parsed_options</code> instance |
---|
116 | with strings of the same type (i.e. the <code class="computeroutput">parsed_options</code> class |
---|
117 | will be templated).</p></li> |
---|
118 | <li><p>Use internal encoding: same as above, but strings will be converted to and |
---|
119 | from the internal encoding.</p></li> |
---|
120 | <li><p>Use and partly expose the internal encoding: same as above, |
---|
121 | but the strings in the <code class="computeroutput">parsed_options</code> instance will be in the |
---|
122 | internal encoding. This might avoid a conversion if the |
---|
123 | <code class="computeroutput">parsed_options</code> instance is passed directly to other components, |
---|
124 | but can also be dangerous or confusing for a user. |
---|
125 | </p></li> |
---|
126 | </ul></div> |
---|
127 | <p>The second solution appears to be the best -- it does not increase |
---|
128 | the code size much and is cleaner than the third. To avoid extra |
---|
129 | conversions, the Unicode version of <code class="computeroutput">parsed_options</code> can also store |
---|
130 | strings in internal encoding. |
---|
131 | </p> |
---|
132 | <p>For the options descriptions component, we don't have much |
---|
133 | choice. Since it's not desirable to have either all options use ascii or all |
---|
134 | of them use Unicode, but rather have some ascii and some Unicode options, the |
---|
135 | interface of the <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> must work with both. The only way is |
---|
136 | to pass an additional flag telling if strings use ascii or internal encoding. |
---|
137 | The instance of <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> can then convert into some |
---|
138 | other encoding if needed. |
---|
139 | </p> |
---|
140 | <p>For the storage component, the only affected function is <code class="computeroutput"><a href="../id2349650.html" title="Function store">store</a></code>. |
---|
141 | For Unicode input, the <code class="computeroutput"><a href="../id2349650.html" title="Function store">store</a></code> function should convert the value to the |
---|
142 | internal encoding. It should also inform the <code class="computeroutput"><a href="../value_semantic.html" title="Class value_semantic">value_semantic</a></code> class |
---|
143 | about the encoding used. |
---|
144 | </p> |
---|
145 | <p>Finally, what internal encoding should we use? The |
---|
146 | alternatives are: |
---|
147 | <code class="computeroutput">std::wstring</code> (using UCS-4 encoding) and |
---|
148 | <code class="computeroutput">std::string</code> (using UTF-8 encoding). The differences between the |
---|
149 | alternatives are: |
---|
150 | </p> |
---|
151 | <div class="itemizedlist"><ul type="disc"> |
---|
152 | <li><p>Speed: UTF-8 is a bit slower</p></li> |
---|
153 | <li><p>Space: UTF-8 takes less space when input is ascii</p></li> |
---|
154 | <li><p>Code size: UTF-8 requires additional conversion code. However, |
---|
155 | it allows one to use existing parsers without converting them to |
---|
156 | <code class="computeroutput">std::wstring</code> and such conversion is likely to create a |
---|
157 | number of new instantiations. |
---|
158 | </p></li> |
---|
159 | </ul></div> |
---|
160 | <p> |
---|
161 | There's no clear leader, but the last point seems important, so UTF-8 |
---|
162 | will be used. |
---|
163 | </p> |
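<p>With UTF-8 as the internal encoding, Unicode input has to be converted at the interface boundary. The following is a minimal sketch of such a conversion, not the library's actual code. The names <code class="computeroutput">to_utf8</code> and <code class="computeroutput">append_byte</code> are hypothetical, and the sketch assumes, as this document does, that each <code class="computeroutput">wchar_t</code> holds one complete Unicode code point. On platforms with a 16-bit <code class="computeroutput">wchar_t</code>, surrogate pairs would need extra handling that is not shown here.</p>

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Append the low 8 bits of `value` to `out` as a single byte.
static void append_byte(std::string& out, unsigned long value)
{
    assert(value <= 0xFF);
    out += static_cast<char>(static_cast<unsigned char>(value));
}

// Hypothetical helper: encode a wide string as UTF-8, treating every
// wchar_t as one Unicode code point (an assumption, see the text above).
std::string to_utf8(const std::wstring& in)
{
    std::string out;
    for (std::wstring::size_type i = 0; i < in.size(); ++i) {
        unsigned long cp = static_cast<unsigned long>(in[i]);
        if (cp >= 0xD800 && cp <= 0xDFFF)
            throw std::invalid_argument("surrogate values are not handled");

        if (cp < 0x80) {                       // 1 byte: same value as ASCII
            append_byte(out, cp);
        } else if (cp < 0x800) {               // 2 bytes
            append_byte(out, 0xC0 | (cp >> 6));
            append_byte(out, 0x80 | (cp & 0x3F));
        } else if (cp < 0x10000) {             // 3 bytes
            append_byte(out, 0xE0 | (cp >> 12));
            append_byte(out, 0x80 | ((cp >> 6) & 0x3F));
            append_byte(out, 0x80 | (cp & 0x3F));
        } else if (cp <= 0x10FFFF) {           // 4 bytes
            append_byte(out, 0xF0 | (cp >> 18));
            append_byte(out, 0x80 | ((cp >> 12) & 0x3F));
            append_byte(out, 0x80 | ((cp >> 6) & 0x3F));
            append_byte(out, 0x80 | (cp & 0x3F));
        } else {
            throw std::invalid_argument("not a Unicode code point");
        }
    }
    return out;
}
```

<p>Note that 7-bit characters come out as the same single byte, so input that is pure ascii takes no extra space in the internal encoding.</p>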
---|
164 | <p>Choosing the UTF-8 encoding allows the use of existing parsers, |
---|
165 | because 7-bit ascii characters retain their values in UTF-8, |
---|
166 | so searching for 7-bit strings is simple. However, there are |
---|
167 | two subtle issues: |
---|
168 | </p> |
---|
169 | <div class="itemizedlist"><ul type="disc"> |
---|
170 | <li><p>We need to assume the character literals use ascii encoding |
---|
171 | and that inputs use Unicode encoding.</p></li> |
---|
172 | <li><p>A Unicode character (say '=') can be followed by a 'composing |
---|
173 | character' and the combination is not the same as just '=', so a |
---|
174 | simple search for '=' might find the wrong character. |
---|
175 | </p></li> |
---|
176 | </ul></div> |
---|
177 | <p> |
---|
178 | Neither of these issues appears to be critical in practice, since ascii is |
---|
179 | an almost universal encoding and since composing characters following '=' (and |
---|
180 | other characters with special meaning to the library) are not likely to appear. |
---|
181 | </p> |
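<p>The points above can be sketched in code. In UTF-8 every byte of a multi-byte sequence has its high bit set, so a plain byte search for '=' can only match a real '=' character. The helper names below are hypothetical and only illustrate the idea; <code class="computeroutput">followed_by_combining_mark</code> checks just the Combining Diacritical Marks block (U+0300 to U+036F), which is a simplifying assumption and not a complete test for composing characters.</p>

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper: split a UTF-8 encoded "name=value" token at the
// first '='. If there is no '=', the whole token is the name.
std::pair<std::string, std::string> split_option(const std::string& utf8_token)
{
    std::string::size_type eq = utf8_token.find('=');
    if (eq == std::string::npos)
        return std::make_pair(utf8_token, std::string());
    return std::make_pair(utf8_token.substr(0, eq),
                          utf8_token.substr(eq + 1));
}

// Hypothetical helper for the second issue: returns true if the character
// at byte offset `pos` is directly followed by a code point in the range
// U+0300..U+036F, encoded in UTF-8 as 0xCC 0x80 .. 0xCD 0xAF.
bool followed_by_combining_mark(const std::string& s,
                                std::string::size_type pos)
{
    assert(pos < s.size());
    if (pos + 2 >= s.size())
        return false;                          // fewer than two bytes follow
    unsigned char b1 = static_cast<unsigned char>(s[pos + 1]);
    unsigned char b2 = static_cast<unsigned char>(s[pos + 2]);
    if (b1 == 0xCC)
        return b2 >= 0x80 && b2 <= 0xBF;       // U+0300..U+033F
    if (b1 == 0xCD)
        return b2 >= 0x80 && b2 <= 0xAF;       // U+0340..U+036F
    return false;
}
```

<p>A parser that wanted to be strict could call the second helper on the position returned by the search and reject or skip such a match; as noted above, this case is unlikely to matter in practice.</p>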
---|
182 | </div> |
---|
183 | </div> |
---|
184 | <table width="100%"><tr> |
---|
185 | <td align="left"></td> |
---|
186 | <td align="right"><small>Copyright © 2002-2004 Vladimir Prus</small></td> |
---|
187 | </tr></table> |
---|
188 | <hr> |
---|
192 | </body> |
---|
193 | </html> |
---|