1 | <HTML> |
---|
2 | <HEAD> |
---|
3 | <META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=iso-8859-1"> |
---|
4 | <LINK REL="stylesheet" TYPE="text/css" HREF="../../../../boost.css"> |
---|
5 | <TITLE>Boost Numeric Conversion Library - Definitions</TITLE> |
---|
6 | </HEAD> |
---|
7 | <BODY BGCOLOR="#FFFFFF" TEXT="#000000" LINK="#0000ff" VLINK="#800080"> |
---|
8 | <TABLE BORDER="0" CELLPADDING="7" CELLSPACING="0" WIDTH="100%" |
---|
9 | SUMMARY="header"> |
---|
10 | <TR> |
---|
11 | <TH VALIGN="top" WIDTH="300"> |
---|
12 | <H3><A HREF="../../../../index.htm"><IMG HEIGHT="86" WIDTH="277" |
---|
13 | ALT="C++ Boost" SRC="../../../../boost.png" BORDER="0"></A></H3> </TH> |
---|
14 | <TH VALIGN="top"> |
---|
15 | <H1 ALIGN="center">Boost Numeric Conversion Library</H1> |
---|
16 | <H1 ALIGN="center">Definitions</H1> |
---|
17 | </TH> |
---|
18 | </TR> |
---|
19 | </TABLE> |
---|
20 | <HR> |
---|
21 | <H2>Contents</H2> |
---|
22 | <DL CLASS="page-index"> |
---|
23 | <dt><A HREF="#intro">Introduction</A></dt> |
---|
24 | <dt><A HREF="#typeval">Types and Values</A></dt> |
---|
25 | <dt><A HREF="#stdtypes">C++ Arithmetic Types</A></dt> |
---|
26 | <dt><A HREF="#numtypes">Numeric Types</A></dt> |
---|
27 | <dt><A HREF="#range">Range and Precision</A></dt> |
---|
28 | <dt><A HREF="#roundoff">Exact, Correctly Rounded and Out-Of-Range Representations</A></dt> |
---|
29 | <dt><A HREF="#stdconv">Standard (numeric) Conversions</A></dt> |
---|
30 | <dt><A HREF="#subranged">Subranged Conversion Direction, Subtype and Supertype</A></dt> |
---|
31 | </DL> |
---|
32 | |
---|
33 | |
---|
34 | |
---|
35 | <h2><A NAME="intro">Introduction</A></h2> |
---|
36 | <P>This section provides definitions of terms used in the Numeric Conversion library.</p> |
---|
37 | <p><b>Notation:</b> |
---|
38 | <li><u>underlined text</u> denotes terms defined in the C++ standard.</li> |
---|
39 | <li><b>bold face</b> denotes terms defined here but not in the standard.</li> |
---|
40 | <p></p> |
---|
41 | |
---|
42 | |
---|
43 | |
---|
44 | <hr> |
---|
45 | <h2><A NAME="typeval">Types and Values</A></h2> |
---|
46 | <p>As defined by the <u>C++ Object Model</u> (§1.7) the <u>storage</u> or |
---|
47 | memory on which a C++ program runs is a contiguous sequence of <u>bytes</u> |
---|
48 | where each byte is a contiguous sequence of <u>bits</u>.<br> |
---|
49 | An <u>object</u> is a region of storage (§1.8) and has a type (§3.9).<br> |
---|
50 | A <u>type</u> is a discrete set of values. <br> |
---|
51 | An object of type T has an <u>object representation</u> which is the sequence |
---|
52 | of bytes stored in the object (§3.9/4)<br> |
---|
53 | An object of type T has a <u>value representation</u> which is the set of bits |
---|
54 | that determine the <i>value</i> of an object of that type (§3.9/4). For |
---|
55 | <u>POD</u> types (§3.9/10), this bitset is given by the object representation, |
---|
56 | but not all the bits in the storage need to participate in the value representation |
---|
57 | (except for character types): for example, some bits might be used for padding |
---|
58 | or there may be trap-bits.</p> |
---|
59 | <p>The <b>typed value</b> that is held by an object is |
---|
60 | the value which is determined by its value representation.<br> |
---|
61 | An <b>abstract value</b> (untyped) is |
---|
62 | the conceptual information that is represented in a type |
---|
63 | (e.g. the number π).<br> |
---|
64 | The <b>intrinsic value</b> of an object is |
---|
65 | the binary value of the sequence of unsigned characters which form its object representation.</p> |
---|
66 | <p><i>Abstract values</i> can be <b>represented</b> in a given type.<br> |
---|
67 | To <b>represent</b> an abstract value 'V' in a type 'T' |
---|
68 | is to obtain a typed value 'v' which <i>corresponds</i> to the abstract value 'V'.<br> |
---|
69 | The operation is denoted using the 'rep()' operator, as in: <code>v=rep(V)</code>.<br> |
---|
70 | 'v' is the <b>representation</b> of 'V' in the type 'T'.<br> |
---|
71 | For example, the abstract value π can be represented in the type <code>'double'</code> as the |
---|
72 | 'double value M_PI' and in the type <code>'int'</code> as the 'int value 3'.</p> |
---|
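<p>The representation of π described above can be sketched in code. This is only an illustration, not part of the library: the function names are hypothetical, and a decimal literal stands in for M_PI because that macro is not guaranteed by the C++ standard.</p>

```cpp
#include <cassert>

// Hypothetical helper: "represents" the abstract value pi in the type double.
// The literal is itself only an approximation of pi, so this representation
// is inexact.
inline double rep_pi_as_double() { return 3.14159265358979323846; }

// Hypothetical helper: "represents" pi in the type int. The conversion
// discards the fractional part, which gives the int value 3.
inline int rep_pi_as_int() { return static_cast<int>(rep_pi_as_double()); }

// abt() cannot be computed by a program; the closest a program can get is to
// compare typed values, here the distance between the two representations.
inline double gap_between_representations() {
    return rep_pi_as_double() - rep_pi_as_int();
}
```

<p>Both functions represent the same abstract value, yet they return different typed values; the gap between them is a little over 0.14.</p>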
73 | <p>Conversely, <i>typed values</i> can be <b>abstracted</b>.<br> |
---|
74 | To <b>abstract</b> a typed value 'v' of type 'T' is to obtain the |
---|
75 | abstract value 'V' whose representation in 'T' is 'v'.<br> |
---|
76 | The operation is denoted using the 'abt()' operator, as in: <code>V=abt(v)</code>.<br> |
---|
77 | 'V' is the <b>abstraction</b> of 'v' of type 'T'.<br> |
---|
78 | Abstraction is a purely conceptual operation (a program cannot carry it out), but it is defined nevertheless |
---|
79 | because it will be used to give the definitions in the rest of this document.</p> |
---|
80 | |
---|
81 | |
---|
82 | |
---|
83 | |
---|
84 | <hr> |
---|
85 | <h2><A NAME="stdtypes">C++ Arithmetic Types</A></h2> |
---|
86 | <P>The C++ language defines <u>fundamental types</u> (§3.9.1). The following |
---|
87 | subsets of the fundamental types are intended to represent <i>numbers</i>:</p> |
---|
88 | <li><u>signed integer types</u> (§3.9.1/2):<br> |
---|
89 | <blockquote> |
---|
90 | <code>{signed char, signed short int, signed int, signed long int}</code><br> |
---|
91 | Can be used to represent general integer numbers (both negative and positive). |
---|
92 | </blockquote> |
---|
93 | </li> |
---|
94 | <li><u>unsigned integer types</u> (§3.9.1/3):<br> |
---|
95 | <blockquote> |
---|
96 | <code>{unsigned char, unsigned short int, unsigned int, unsigned long int}</code><br> |
---|
97 | Can be used to represent non-negative integer numbers <u>with modulo-arithmetic</u>.<br> |
---|
98 | </blockquote></li> |
---|
99 | <li><u>floating-point types</u> (§3.9.1/8):<br> |
---|
100 | <blockquote> |
---|
101 | <code>{float,double,long double}</code><br> |
---|
102 | Can be used to represent real numbers. |
---|
103 | </blockquote> |
---|
104 | </li> |
---|
105 | <li><u>integral or integer types</u> (§3.9.1/7):<br> |
---|
106 | <blockquote> |
---|
107 | <code>{{signed integers},{unsigned integers}, bool, char and wchar_t}</code> |
---|
108 | </blockquote> |
---|
109 | </li> |
---|
110 | <li><u>arithmetic types</u> (§3.9.1/8):<br> |
---|
111 | <blockquote> |
---|
112 | <code>{{integer types},{floating types}}</code> |
---|
113 | </blockquote> |
---|
114 | </li> |
---|
115 | <P>The integer types are required to have a <i>binary</i> value representation.<br> |
---|
116 | Additionally, the signed/unsigned integer types of the same base type (short, int or long) |
---|
117 | are required to have the same value representation, that is:</P> |
---|
118 | <pre> int i = -3 ; // suppose value representation is: 10011 (sign bit + 4 magnitude bits) |
---|
119 | unsigned int u = i ; // u is required to have the same 10011 as its value representation. |
---|
120 | </pre> |
---|
121 | <P>In other words, the integer types signed/unsigned X use the same value representation |
---|
122 | but a different <i>interpretation</i> of it; that is, their <i>typed values</i> |
---|
123 | might differ.<br> |
---|
124 | Another consequence of this is that the range of non-negative values of signed X is always a subset |
---|
125 | of the range of unsigned X, as required by §3.9.1/3.</P> |
---|
126 | <P>Note: always remember that unsigned types, unlike signed types, have modulo-arithmetic; |
---|
127 | that is, they do not overflow.<br> |
---|
128 | This means that: |
---|
129 | <li> Always be extra careful when mixing signed/unsigned types</li> |
---|
130 | <li> Use unsigned types only when you need modulo arithmetic or very large numbers. |
---|
131 | Don't use unsigned types just because you intend to deal with positive values only |
---|
132 | (you can do this with signed types as well).</li> |
---|
133 | <p></P> |
---|
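<p>The pitfalls of mixing signed and unsigned types noted above can be illustrated with a short sketch. The helper names are hypothetical; the explicit cast in <code>naive_less</code> spells out the conversion that the language would otherwise apply implicitly in a mixed comparison.</p>

```cpp
#include <cassert>
#include <limits>

// Unsigned arithmetic is modulo 2^N: subtracting past zero wraps around
// instead of overflowing.
inline unsigned int wrap_below_zero() {
    unsigned int zero = 0u;
    return zero - 1u;  // the largest unsigned int value
}

// What a mixed comparison effectively does: the int operand is converted to
// unsigned int first, so a negative value becomes a very large one.
inline bool naive_less(int a, unsigned int b) {
    return static_cast<unsigned int>(a) < b;
}

// A comparison that respects the abstract values: any negative int is lower
// than every unsigned value.
inline bool safe_less(int a, unsigned int b) {
    return a < 0 || static_cast<unsigned int>(a) < b;
}
```

<p>With these definitions, <code>naive_less(-1, 1u)</code> is false even though -1 is lower than 1 as a number, while <code>safe_less(-1, 1u)</code> is true.</p>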
134 | |
---|
135 | |
---|
136 | |
---|
137 | |
---|
138 | |
---|
139 | <hr> |
---|
140 | <h2><A NAME="numtypes">Numeric Types</A></h2> |
---|
141 | <p>This section introduces the following definitions intended to integrate arithmetic |
---|
142 | types with user-defined types which behave like numbers. Some definitions are |
---|
143 | purposely broad in order to include a vast variety of user-defined number |
---|
144 | types.</p> |
---|
145 | <p>Within this library, the term <i>number</i> refers to an abstract numeric value.</p> |
---|
146 | <p>A type is <b>numeric</b> if:</p> |
---|
147 | <li>It is an arithmetic type, or,</li> |
---|
148 | <li>It is a user-defined type which</li> |
---|
149 | <blockquote> |
---|
150 | <li>Represents numeric abstract values (i.e. numbers).</li> |
---|
151 | |
---|
152 | <li>Can be converted (either implicitly or explicitly) to/from at least one |
---|
153 | arithmetic type.</li> |
---|
154 | <li>Has <a href="#range">range</a> (possibly unbounded) and <a href="#range">precision</a> |
---|
155 | (possibly dynamic or unlimited).</li> |
---|
156 | <li>Provides a specialization of <code>std::numeric_limits</code>.</li> |
---|
157 | </blockquote> |
---|
158 | <p></p> |
---|
159 | <p>A numeric type is <b>signed</b> if the abstract values it represents include negative numbers.<br> |
---|
160 | A numeric type is <b>unsigned</b> if the abstract values it represents exclude negative numbers.<br> |
---|
161 | A numeric type is <b>modulo</b> if it has modulo-arithmetic (does not overflow).<br> |
---|
162 | A numeric type is <b>integer</b> if the abstract values it represents are whole numbers.<br> |
---|
163 | A numeric type is <b>floating</b> if the abstract values it represents are real numbers.<br> |
---|
164 | An <b>arithmetic value</b> is the typed value of an arithmetic type.<br> |
---|
165 | A <b>numeric value</b> is the typed value of a numeric type.</p> |
---|
166 | <p></p> |
---|
167 | <p>These definitions simply generalize the standard notions of arithmetic types |
---|
168 | and values by introducing a superset called <b>numeric</b>. All arithmetic types |
---|
169 | and values are numeric types and values, but not vice versa, since user-defined |
---|
170 | numeric types are not arithmetic types.</p> |
---|
171 | <p>The following examples clarify the differences between arithmetic and numeric types (and values):</p> |
---|
172 | <pre>// A numeric type which is not an arithmetic type (is user-defined) |
---|
173 | // and which is intended to represent integer numbers (i.e., an 'integer' numeric type) |
---|
174 | struct MyInt |
---|
175 | { |
---|
176 | MyInt ( long long v ) ; |
---|
177 | long long to_builtin() const ; |
---|
178 | } ; |
---|
179 | namespace std { |
---|
180 | template<> class numeric_limits<MyInt> { ... } ; |
---|
181 | } |
---|
182 | |
---|
183 | // A 'floating' numeric type (double) which is also an arithmetic type (built-in), |
---|
184 | // with a floating numeric value. |
---|
185 | double pi = M_PI ; |
---|
186 | |
---|
187 | // A 'floating' numeric type with a whole numeric value. |
---|
188 | // NOTE: numeric values are typed values, hence, they are, for instance, |
---|
189 | // integer or floating, despite the value itself being whole or including |
---|
190 | // a fractional part. |
---|
191 | double two = 2.0 ; |
---|
192 | |
---|
193 | // An integer numeric type with an integer numeric value. |
---|
194 | MyInt i(1234); |
---|
195 | </pre> |
---|
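<p>A compilable version of the user-defined integer type could look like the sketch below. It is one possible arrangement, not a prescribed one: the data member <code>value_</code> and the choice to derive the <code>std::numeric_limits</code> specialization from the one for <code>long long</code> are assumptions made to keep the example short.</p>

```cpp
#include <cassert>
#include <limits>

// A user-defined 'integer' numeric type wrapping a built-in value.
class MyInt {
public:
    MyInt(long long v = 0) : value_(v) {}
    long long to_builtin() const { return value_; }
private:
    long long value_;  // hypothetical storage, chosen for this sketch
};

namespace std {
// Shortcut for illustration: reuse the constants of numeric_limits<long long>
// (is_integer, is_signed, digits, ...) and redefine only the boundary
// functions so that they return MyInt. Inherited functions such as epsilon()
// still return long long; a complete specialization would redefine them too.
template <>
class numeric_limits<MyInt> : public numeric_limits<long long> {
public:
    static MyInt min()    { return MyInt(std::numeric_limits<long long>::min()); }
    static MyInt max()    { return MyInt(std::numeric_limits<long long>::max()); }
    static MyInt lowest() { return MyInt(std::numeric_limits<long long>::lowest()); }
};
}
```

<p>With this in place <code>MyInt</code> satisfies the listed conditions: it converts to and from <code>long long</code>, has a range, and provides a <code>std::numeric_limits</code> specialization.</p>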
196 | |
---|
197 | |
---|
198 | |
---|
199 | |
---|
200 | <hr> |
---|
201 | <h2><A NAME="range">Range and Precision</A></h2> |
---|
202 | <p>Given a number set 'N', some of its elements are representable in a numeric type 'T'.<br> |
---|
203 | The set of representable values of type 'T', or numeric set of 'T', is a set of numeric values |
---|
204 | whose elements are the representation of some <i>subset</i> of 'N'.<br> |
---|
205 | For example, the interval of 'int' values [INT_MIN,INT_MAX] is the set of representable values |
---|
206 | of type 'int', i.e. the 'int' numeric set, and corresponds to the representation of the elements |
---|
207 | of the interval of abstract values [abt(INT_MIN),abt(INT_MAX)] from the integer numbers.<br> |
---|
208 | Similarly, the interval of 'double' values [-DBL_MAX,DBL_MAX] is the 'double' numeric set, |
---|
209 | which corresponds to the subset of the real numbers from abt(-DBL_MAX) to abt(DBL_MAX). |
---|
210 | </p> |
---|
211 | <p>Let <b>next(x)</b> denote the lowest numeric value greater than x.<br> |
---|
212 | Let <b>prev(x)</b> denote the highest numeric value lower than x.</p> |
---|
213 | <p>Let <code><b>v=prev(next(V))</b></code> and <code><b>v=next(prev(V))</b></code> be identities that relate a numeric |
---|
214 | typed value 'v' with a number 'V'.</p> |
---|
215 | <p>An ordered pair of numeric values <i>x,y</i> s.t. <i>x<y</i> are <b>consecutive</b> iff |
---|
216 | <code>next(x)==y</code>.</p> |
---|
217 | <p>The abstract distance between consecutive numeric values is usually referred |
---|
218 | to as a <u>Unit in the Last Place</u>, or <b>ulp</b> for short. A ulp is a quantity whose abstract |
---|
219 | magnitude is <i>relative</i> to the numeric values it corresponds to: If the numeric set is not evenly |
---|
220 | distributed, that is, if the abstract distance between consecutive numeric values varies along the set |
---|
221 | -as is the case with the floating-point types-, the magnitude of 1ulp after the numeric value x |
---|
222 | might be (usually is) different from the magnitude of 1ulp after the numeric value y for x!=y.</p> |
---|
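<p>For the built-in floating types, next(), prev() and the size of 1ulp can be approximated with <code>std::nextafter</code>. The sketch below assumes finite arguments strictly inside the range of <code>double</code>; the function names are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// next(x): the lowest double greater than x.
inline double next_value(double x) {
    return std::nextafter(x, std::numeric_limits<double>::infinity());
}

// prev(x): the highest double lower than x.
inline double prev_value(double x) {
    return std::nextafter(x, -std::numeric_limits<double>::infinity());
}

// Distance between x and the next numeric value, i.e. 1ulp after x.
inline double ulp_after(double x) {
    return next_value(x) - x;
}
```

<p>Because the numeric set of <code>double</code> is not evenly distributed, <code>ulp_after(1024.0)</code> is larger than <code>ulp_after(1.0)</code>, while <code>prev_value(next_value(x))</code> gives back <code>x</code>.</p>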
223 | <p>Since numbers are inherently ordered, a <b>numeric set</b> of type 'T' |
---|
224 | is an ordered sequence of numeric values (of type 'T') of the form: |
---|
225 | </p> |
---|
226 | <p><code>REP(T)={l,next(l),next(next(l)),...,prev(prev(h)),prev(h),h}</code> |
---|
227 | </p> |
---|
228 | <p>where 'l' and 'h' are respectively the lowest and highest values of type 'T', called the |
---|
229 | <b>boundary values</b> of type T.</p> |
---|
230 | <p>A numeric set is discrete. It has a <b>size</b> which is the number |
---|
231 | of numeric values in the set, a <b>width</b> which is the abstract difference between |
---|
232 | the highest and lowest boundary values: [abt(h)-abt(l)], and a <b>density</b> |
---|
233 | which is the relation between its size and width: 'density=size/width'.<br> |
---|
234 | The integer types have density 1, which means that there are no unrepresentable integer numbers |
---|
235 | between abt(l) and abt(h) (i.e. there are no gaps). On the other hand, |
---|
236 | floating types have density much smaller than 1, which means that there are |
---|
237 | real numbers unrepresented between consecutive floating values (i.e. there are gaps). |
---|
238 | </p> |
---|
239 | <p>The interval of <i>abstract values</i> [abt(l),abt(h)] is the <b>range</b> of the type 'T', |
---|
240 | denoted 'R(T)'.<br> |
---|
241 | A range is a set of abstract values and not a set of numeric values. In other |
---|
242 | documents, such as the C++ standard, the word 'range' is <i>sometimes</i> used |
---|
243 | as synonym for 'numeric set', that is, as the ordered sequence of numeric values |
---|
244 | from 'l' to 'h'. In this document, however, a range is an abstract interval |
---|
245 | which subtends the numeric set.<br> |
---|
246 | For example, the sequence [-DBL_MAX,DBL_MAX] is the numeric set of the type 'double', and |
---|
247 | the real interval [abt(-DBL_MAX),abt(DBL_MAX)] is its range.<br> |
---|
248 | Notice, for instance, that the range of a floating-point type is <i>continuous</i> unlike |
---|
249 | its numeric set.<br> |
---|
250 | This definition was chosen because: |
---|
251 | <li>(a) The discrete set of numeric values is already given by the numeric set.</li> |
---|
252 | <li>(b) Abstract intervals are easier to compare and overlap since only boundary values |
---|
253 | need to be considered.</li><br> |
---|
254 | This definition allows for a concise definition of 'subranged' as given in the last section.<br> |
---|
255 | The width of a numeric set, as defined, is exactly equivalent to the width of a range. |
---|
256 | <p></p> |
---|
257 | <p>The <b>precision</b> of a type is given by the width or density of the numeric set.<br> |
---|
258 | For integer types, which have density 1, the precision is conceptually equivalent to the range |
---|
259 | and is determined by the number of bits used in the value representation: The higher the |
---|
260 | number of bits the bigger the size of the numeric set, the wider the range, and the higher |
---|
261 | the precision.<br> |
---|
262 | For floating types, which have density <<1, the precision is given not by the |
---|
263 | width of the range but by the density. In a typical implementation, |
---|
264 | the range is determined by the number of bits used in the exponent, and the precision by |
---|
265 | the number of bits used in the mantissa (giving the maximum number of significant digits |
---|
266 | that can be exactly represented). The higher the number of exponent bits the |
---|
267 | wider the range, while the higher the number of mantissa bits, the higher the precision. |
---|
268 | </p> |
---|
269 | |
---|
270 | |
---|
271 | |
---|
272 | |
---|
273 | |
---|
274 | |
---|
275 | <hr> |
---|
276 | <h2><A NAME="roundoff">Exact, Correctly Rounded and Out-Of-Range Representations</A></h2> |
---|
277 | <p>Given an abstract value 'V' and a type 'T' with its corresponding range [abt(l),abt(h)]:</p> |
---|
278 | <p>If <code>V < abt(l)</code> or <code>V > abt(h)</code>, 'V' is <b>not representable</b> |
---|
279 | (cannot be represented) in the type T, or, equivalently, its representation in the type 'T' |
---|
280 | is <b>out of range</b>, or <b>overflows</b>.<br> |
---|
281 | If <code>V < abt(l)</code>, the <b>overflow is negative</b>.<br> |
---|
282 | If <code>V > abt(h)</code>, the <b>overflow is positive</b>. |
---|
283 | </p> |
---|
284 | <p>If <code>V ≥ abt(l)</code> and <code>V ≤ abt(h)</code>, 'V' is <b>representable</b> |
---|
285 | (can be represented) in the type T, or, equivalently, its representation in the type 'T' |
---|
286 | is <b>in range</b>, or <b>does not overflow</b>.</p> |
---|
287 | <p>Notice that a numeric type, such as a C++ unsigned type, can define that any 'V' does not |
---|
288 | overflow by always representing not 'V' itself but the abstract value <code>U = [ V % (abt(h)+1) ]</code>, |
---|
289 | which is always in range.</p> |
---|
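<p>The modulo representation used by unsigned types can be checked against the built-in conversion. The sketch below uses <code>unsigned char</code> as the type 'T' and computes the formula in <code>long long</code>; the function names are hypothetical. Note that the '%' in the formula is the mathematical modulo, whereas the C++ <code>%</code> operator can yield a negative result for a negative operand, hence the adjustment.</p>

```cpp
#include <cassert>
#include <limits>

// Computes U = V mod (abt(h)+1) for T = unsigned char, with a result that is
// always in [0, abt(h)].
inline long long modulo_rep(long long V) {
    const long long modulus =
        static_cast<long long>(std::numeric_limits<unsigned char>::max()) + 1;
    long long r = V % modulus;  // may be negative when V is negative
    return r < 0 ? r + modulus : r;
}

// The built-in conversion, which is required to apply the same rule.
inline unsigned char convert(long long V) {
    return static_cast<unsigned char>(V);
}
```

<p>For any <code>V</code>, <code>convert(V)</code> and <code>modulo_rep(V)</code> agree; for instance, -1 is represented as the highest value of the type.</p>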
290 | <p>Given an abstract value 'V' represented in the type 'T' as 'v', the <b>roundoff</b> error |
---|
291 | of the representation is the abstract difference: (abt(v)-V).<br> |
---|
292 | Notice that a representation is an <i>operation</i>, hence, the roundoff error corresponds to |
---|
293 | the representation operation and not to the numeric value itself (i.e. numeric values do not |
---|
294 | have any error themselves).<br> |
---|
295 | If the roundoff is 0, the representation is <b>exact</b>, and 'V' is <b>exactly representable</b> |
---|
296 | in the type T.<br> |
---|
297 | If the roundoff is not 0, the representation is <b>inexact</b>, and 'V' is <b>inexactly representable</b> |
---|
298 | in the type T.</p> |
---|
299 | <p>Given an abstract value 'V' representable in a type 'T', there are always two consecutive |
---|
300 | numeric values of type 'T', 'prev' and 'next', such that <code>abt(prev) ≤ V ≤ abt(next)</code>. |
---|
301 | These are called the <b>adjacents</b> of 'V' in the type 'T'.<br> |
---|
302 | If a representation 'v' in a type 'T' -either exact or inexact-, is any of the adjacents of 'V' |
---|
303 | in that type, that is, if <code>v==prev or v==next</code>, the representation is |
---|
304 | <b>faithfully rounded</b>. If the choice between 'prev' and 'next' |
---|
305 | matches a given <b>rounding direction</b>, it is <b>correctly rounded</b>.<br> |
---|
306 | All exact representations are correctly rounded, but not all inexact representations are. In particular, |
---|
307 | C++ requires numeric conversions (described below) and the result of arithmetic operations |
---|
308 | (not covered by this document) to be correctly rounded, but batch operations propagate roundoff, thus |
---|
309 | final results are usually incorrectly rounded, that is, the numeric value 'r' which is the computed |
---|
310 | result is neither of the adjacents of the abstract value 'R' which is the theoretical result.<br> |
---|
311 | Because a correctly rounded representation is always one of the adjacents of the abstract value being |
---|
312 | represented, the roundoff is guaranteed to be at most 1ulp.</p> |
---|
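<p>The adjacents and the roundoff can be explored for the type <code>float</code> with the sketch below. A <code>double</code> stands in for the abstract value 'V', which is itself an approximation; 'V' is assumed finite and strictly inside the range of <code>float</code>. The names <code>Adjacents</code>, <code>adjacents_of</code> and <code>roundoff</code> are hypothetical.</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct Adjacents {
    float prev;
    float next;
};

// Finds two consecutive float values such that prev <= V <= next.
inline Adjacents adjacents_of(double V) {
    const float inf = std::numeric_limits<float>::infinity();
    const float v = static_cast<float>(V);  // one of the two adjacents
    Adjacents a;
    if (static_cast<double>(v) <= V) {
        a.prev = v;
        a.next = std::nextafter(v, inf);
    } else {
        a.next = v;
        a.prev = std::nextafter(v, -inf);
    }
    return a;
}

// Roundoff of representing V in float: abt(v) - V, computed in double.
inline double roundoff(double V) {
    return static_cast<double>(static_cast<float>(V)) - V;
}
```

<p>For 0.1 the roundoff is nonzero but no larger than the distance between the two adjacents, while for 0.5, which is exactly representable, the roundoff is 0.</p>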
313 | <P>The following examples summarize the given definitions. Consider:</p> |
---|
314 | <li>A numeric type 'Int' representing integer numbers with a <i>numeric set</i>: {-2,-1,0,1,2} |
---|
315 | and <i>range</i>: [-2,2]</li> |
---|
316 | <li>A numeric type 'Cardinal' representing integer numbers with a <i>numeric set</i>: |
---|
317 | {0,1,2,3,4,5,6,7,8,9} and <i>range</i>: [0,9] (no modulo-arithmetic here)</li> |
---|
318 | <li>A numeric type 'Real' representing real numbers with a <i>numeric set</i>: |
---|
319 | {-2.0,-1.5,-1.0,-0.5,-0.0,+0.0,+0.5,+1.0,+1.5,+2.0} and <i>range</i>: [-2.0,+2.0]</li> |
---|
320 | <li>A numeric type 'Whole' representing real numbers with a <i>numeric set</i>: |
---|
321 | {-2.0,-1.0,0.0,+1.0,+2.0} and <i>range</i>: [-2.0,+2.0]</li> |
---|
322 | <p>First, notice that the types 'Real' and 'Whole' both represent real numbers, have the |
---|
323 | same range, but different precision.</p> |
---|
324 | <p>The integer number 1 (an abstract value) can be exactly represented in any of these types.<br> |
---|
325 | The integer number -1 can be exactly represented in 'Int', 'Real' and 'Whole', but cannot |
---|
326 | be represented in 'Cardinal', yielding negative overflow.<br> |
---|
327 | The real number 1.5 can be exactly represented in 'Real', and inexactly represented in the |
---|
328 | other types.<br> |
---|
329 | If 1.5 is represented as either 1 or 2 in any of the types (except Real), the |
---|
330 | representation is correctly rounded.<br> |
---|
331 | If 0.5 is represented as +1.5 in the type 'Real', it is incorrectly rounded.<br> |
---|
332 | (-2.0,-1.5) are the 'Real' adjacents of any real number in the interval [-2.0,-1.5], |
---|
333 | yet there are no 'Real' adjacents for x < -2.0, nor for x > +2.0. |
---|
334 | </p> |
---|
335 | |
---|
336 | |
---|
337 | |
---|
338 | |
---|
339 | <hr> |
---|
340 | <h2><A NAME="stdconv">Standard (numeric) Conversions</A></h2> |
---|
341 | <P>The C++ language defines <u>Standard Conversions</u> (§4) some of which are |
---|
342 | conversions between arithmetic types.<br> |
---|
343 | These are <u>Integral promotions</u> (§4.5), <u>Integral conversions</u> (§4.7), |
---|
344 | <u>Floating point promotions</u> (§4.6), <u>Floating point conversions</u> (§4.8) |
---|
345 | and <u>Floating-integral conversions</u> (§4.9).<br> |
---|
346 | In the sequel, integral and floating point promotions are called <b>arithmetic promotions</b>, |
---|
347 | and these plus integral, floating-point and floating-integral conversions are called |
---|
348 | <b>arithmetic conversions</b> (i.e., promotions are conversions). |
---|
349 | </P> |
---|
350 | <P>Promotions, both Integral and Floating point, are <i>value-preserving</i>, which means |
---|
351 | that the typed value is not changed with the conversion.</p> |
---|
352 | <p>In the sequel, consider a source typed value 's' of type 'S', the source abstract value 'N=abt(s)', |
---|
353 | a destination type 'T'; and whenever possible, a result typed value 't' of type 'T'.</p> |
---|
354 | <p>Integer to integer conversions are always defined:<br> |
---|
355 | If 'T' is unsigned, the abstract value which is effectively represented is not 'N' but |
---|
356 | 'M=[ N % ( abt(h) + 1 ) ]', where 'h' is the highest unsigned typed value of type 'T'.<br> |
---|
357 | If 'T' is signed and 'N' is not directly representable, the result 't' is |
---|
358 | <u>implementation-defined</u>, which means that the C++ implementation is required to produce |
---|
359 | a value 't' even if it is totally unrelated to 's'.</p> |
---|
360 | <p>Floating to Floating conversions are defined only if 'N' is representable; |
---|
361 | if it is not, the conversion has <u>undefined behavior.</u><br> |
---|
362 | If 'N' is exactly representable, 't' is required to be the exact representation.<br> |
---|
363 | If 'N' is inexactly representable, 't' is required to be one of the two adjacents, with |
---|
364 | an implementation-defined choice of rounding direction; that is, the conversion is required |
---|
365 | to be correctly rounded.</p> |
---|
366 | <p>Floating to Integer conversions represent not 'N' but 'M=trunc(N)', where trunc() is to truncate: i.e. |
---|
367 | to remove the fractional part, if any.<br> |
---|
368 | If 'M' is not representable in 'T', the conversion has <u>undefined behavior</u> |
---|
369 | (unless 'T' is bool, see §4.12).</p> |
---|
370 | <p>Integer to Floating conversions are always defined.<br> |
---|
371 | If 'N' is exactly representable, 't' is required to be the exact representation.<br> |
---|
372 | If 'N' is inexactly representable, 't' is required to be one of the two adjacents, with |
---|
373 | an implementation-defined choice of rounding direction; that is, the conversion is required |
---|
374 | to be correctly rounded.</p> |
---|
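<p>Since a floating to integer conversion has undefined behavior when 'M' is not representable, a range check has to come before the conversion. The following is a minimal sketch of such a check for <code>double</code> to <code>int</code>; it is not the library's implementation, the function name is hypothetical, and it assumes that INT_MIN and INT_MAX are exactly representable in <code>double</code> (as with a 32-bit <code>int</code> and a 64-bit IEEE <code>double</code>).</p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Converts d to int only when M = trunc(d) is representable in int.
// Returns false and leaves 'out' untouched otherwise.
inline bool checked_trunc_to_int(double d, int& out) {
    if (std::isnan(d)) {
        return false;  // NaN does not stand for any number
    }
    const double m = std::trunc(d);  // remove the fractional part, if any
    if (m < static_cast<double>(std::numeric_limits<int>::min()) ||
        m > static_cast<double>(std::numeric_limits<int>::max())) {
        return false;  // converting would be undefined behavior
    }
    out = static_cast<int>(m);
    return true;
}
```

<p>For example, 2.9 converts to 2 and -2.9 converts to -2, whereas 1e300 is rejected instead of being passed to the built-in conversion.</p>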
375 | |
---|
376 | |
---|
377 | |
---|
378 | |
---|
379 | |
---|
380 | |
---|
381 | <hr> |
---|
382 | <h2><A NAME="subranged">Subranged Conversion Direction, Subtype and Supertype</A></h2> |
---|
383 | <P>Given a source type 'S' and a destination type 'T', there is a <b>conversion direction</b> |
---|
384 | denoted: <code>'S->T'</code>.<br> |
---|
385 | For any two ranges the following <i>range relation</i> can be defined: A range |
---|
386 | 'X' can be <i>entirely contained</i> in a range 'Y', in which case it is said that |
---|
387 | 'X' is enclosed by 'Y'.<br> |
---|
388 | Formally: R(S) is <b>enclosed</b> by R(T) iff (R(S) intersection R(T)) == R(S).</P> |
---|
389 | <P>If the source type range, R(S), is <i>not enclosed</i> in the target type range, R(T); |
---|
390 | that is, if (R(S) & R(T)) != R(S), the conversion direction is said to be <b>subranged</b>, |
---|
391 | which means that R(S) is not entirely contained in R(T) and therefore there is |
---|
392 | some portion of the source range which falls outside the target range. In other words, |
---|
393 | if a conversion direction S->T is subranged, there are values in S which cannot be represented |
---|
394 | in T because they are out of range.<br> |
---|
395 | Notice that for S->T, the adjective subranged applies to 'T'.</p> |
---|
396 | <p>Examples:<br> |
---|
397 | Given the following numeric types all representing real numbers:<br> |
---|
398 | <br> |
---|
399 | X with numeric set {-2.0,-1.0,0.0,+1.0,+2.0} and range [-2.0,+2.0]<br> |
---|
400 | Y with numeric set {-2.0,-1.5,-1.0,-0.5,0.0,+0.5,+1.0,+1.5,+2.0} and range [-2.0,+2.0]<br> |
---|
401 | Z with numeric set {-1.0,0.0,+1.0} and range [-1.0,+1.0]<br> |
---|
402 | <br> |
---|
403 | For:<br> |
---|
404 | <br> |
---|
405 | (a) X->Y: |
---|
406 | <blockquote> |
---|
407 | R(X) & R(Y) == R(X), then X->Y is not subranged. |
---|
408 | Thus, all values of type X are representable in the type Y. |
---|
409 | </blockquote> |
---|
410 | (b) Y->X: |
---|
411 | <blockquote> |
---|
412 | R(Y) & R(X) == R(Y), then Y->X is not subranged. |
---|
413 | Thus, all values of type Y are representable in the type X, but in this case, some values |
---|
414 | are <i>inexactly</i> representable (all the halves).<br> |
---|
415 | (note: it is to permit this case that a range is an interval of abstract values |
---|
416 | and not an interval of typed values) |
---|
417 | </blockquote> |
---|
418 | (c) X->Z: |
---|
419 | <blockquote> |
---|
420 | R(X) & R(Z) != R(X), then X->Z is subranged. |
---|
421 | Thus, some values of type X are not representable in the type Z, they fall out of range |
---|
422 | (-2.0 and +2.0) |
---|
423 | </blockquote> |
---|
424 | <p></p> |
---|
425 | <p>It is possible that R(S) is not enclosed by R(T), while neither is R(T) enclosed |
---|
426 | by R(S); for example, UNSIG=[0,255] is not enclosed by SIG=[-128,127]; neither is SIG |
---|
427 | enclosed by UNSIG.<br> |
---|
428 | This implies that it is possible that a conversion direction is subranged both |
---|
429 | ways. This occurs when a mixture of signed/unsigned types are involved and indicates |
---|
430 | that in both directions there are values which can fall out of range.</P> |
---|
431 | <P>Given the range relation (subranged or not) of a conversion direction S->T, |
---|
432 | it is possible to classify 'S' and 'T' as <b>supertype</b> and <b>subtype</b>:<br> |
---|
433 | If the conversion is subranged, which means that 'T' cannot represent all possible values of type 'S', |
---|
434 | 'S' is the supertype and 'T' the subtype; otherwise, 'T' is the supertype and 'S' the subtype.<br> |
---|
435 | <br> |
---|
436 | For example:<br> |
---|
437 | R(float)=[-FLT_MAX,FLT_MAX] and R(double)=[-DBL_MAX,DBL_MAX].<br> |
---|
438 | If FLT_MAX < DBL_MAX:<br> |
---|
439 | 'double->float' is subranged and supertype=double, subtype=float.<br> |
---|
440 | 'float->double' is not subranged and supertype=double, subtype=float.<br> |
---|
441 | Notice that while 'double->float' is subranged, 'float->double' is not, |
---|
442 | which yields the same supertype,subtype for both directions.<br> |
---|
443 | <br> |
---|
444 | Now consider:<br> |
---|
445 | R(int)=[INT_MIN,INT_MAX] and R(unsigned int)=[0,UINT_MAX].<br> |
---|
446 | A C++ implementation is required to have UINT_MAX > INT_MAX (§3.9.1/3), so:<br> |
---|
447 | 'int->unsigned' is subranged (negative values fall out of range) and supertype=int, subtype=unsigned.<br> |
---|
448 | 'unsigned->int' is <em>also</em> subranged (high positive values fall out of range) |
---|
449 | and supertype=unsigned, subtype=int.<br> |
---|
450 | In this case, the conversion is subranged in both directions and the supertype,subtype pairs |
---|
451 | are not invariant (under inversion of direction). This indicates that neither of the types can |
---|
452 | represent all the values of the other.</p> |
---|
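<p>For built-in arithmetic types, the subranged relation can be approximated by comparing the boundary values reported by <code>std::numeric_limits</code>. This is a rough sketch, not the library's own traits: the function name <code>is_subranged</code> is hypothetical, and the boundaries are compared in <code>long double</code>, which may round the limits of wide integer types on some platforms.</p>

```cpp
#include <cassert>
#include <limits>

// True when R(S) is not enclosed by R(T), i.e. when some values of type S
// fall outside the range of type T.
template <class S, class T>
bool is_subranged() {
    const long double s_lo = std::numeric_limits<S>::lowest();
    const long double s_hi = std::numeric_limits<S>::max();
    const long double t_lo = std::numeric_limits<T>::lowest();
    const long double t_hi = std::numeric_limits<T>::max();
    // Only the boundary values need to be considered.
    return s_lo < t_lo || s_hi > t_hi;
}
```

<p>On an implementation where FLT_MAX is lower than DBL_MAX, <code>is_subranged&lt;double, float&gt;()</code> is true and <code>is_subranged&lt;float, double&gt;()</code> is false, whereas for <code>int</code> and <code>unsigned int</code> the result is true in both directions.</p>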
453 | <p>When the supertype is the same for both 'S->T' and 'T->S', it is effectively indicating |
---|
454 | a type which can represent all the values of the subtype.<br> |
---|
455 | Consequently, if a conversion X->Y is not subranged, but the opposite (Y->X) |
---|
456 | is, so that the supertype is always 'Y', it is said that the direction X->Y |
---|
457 | is <b>correctly rounded value preserving</b>, meaning that all such conversions |
---|
458 | are guaranteed to produce results in range and correctly rounded (even if inexact).<br> |
---|
459 | For example, all integer to floating conversions are correctly rounded value preserving. |
---|
460 | </p> |
---|
461 | <HR> |
---|
462 | <P>Back to <A HREF="index.html">Numeric Conversion library index</A></P> |
---|
463 | <HR> |
---|
464 | <P>Revised 23 June 2004</P> |
---|
465 | <p>© Copyright Fernando Luis Cacciola Carballal, 2004</p> |
---|
466 | <p> Use, modification, and distribution are subject to the Boost Software |
---|
467 | License, Version 1.0. (See accompanying file <a href="../../../../LICENSE_1_0.txt"> |
---|
468 | LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt"> |
---|
469 | www.boost.org/LICENSE_1_0.txt</a>)</p> |
---|
470 | </body> |
---|
471 | </HTML> |
---|