Planet

navi

home

PPS

about

screenshots

download

development

forum

Context Navigation

source: downloads/boost_1_34_1/libs/numeric/interval/doc/rounding.htm @ 47

Last change on this file since 47 was 29, checked in by landauf, 17 years ago
updated boost from 1_33_1 to 1_34_1
File size: 28.2 KB

Line
1	<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
2	"http://www.w3.org/TR/html4/loose.dtd">
3
4	<html>
5	<head>
6	<meta http-equiv="Content-Language" content="en-us">
7	<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
8	<link rel="stylesheet" type="text/css" href="../../../../boost.css">
9
10	<title>Rounding Policies</title>
11	</head>
12
13	<body lang="en">
14	<h1>Rounding Policies</h1>
15
16	<p>In order to be as general as possible, the library uses a class to
17	compute all the necessary functions rounded upward or downward. This class
18	is the first parameter of <code>policies</code>, it is also the type named
19	<code>rounding</code> in the policy definition of
20	<code>interval</code>.</p>
21
22	<p>By default, it is <code>interval_lib::rounded_math<T></code>. The
23	class <code>interval_lib::rounded_math</code> is already specialized for
24	the standard floating types (<code>float</code> , <code>double</code> and
25	<code>long double</code>). So if the base type of your intervals is not one
26	of these, a good solution would probably be to provide a specialization of
27	this class. But if the default specialization of
28	<code>rounded_math<T></code> for <code>float</code>,
29	<code>double</code>, or <code>long double</code> is not what you seek, or
30	you do not want to specialize
31	<code>interval_lib::rounded_math<T></code> (say because you prefer to
32	work in your own namespace) you can also define your own rounding policy
33	and pass it directly to <code>interval_lib::policies</code>.</p>
34
35	<h2>Requirements</h2>
36
37	<p>Here comes what the class is supposed to provide. The domains are
38	written next to their respective functions (as you can see, the functions
39	do not have to worry about invalid values, but they have to handle infinite
40	arguments).</p>
41	<pre>
42	/* Rounding requirements */
43	struct rounding {
44	// defaut constructor, destructor
45	rounding();
46	~rounding();
47	// mathematical operations
48	T add_down(T, T); // [-∞;+∞][-∞;+∞]
49	T add_up (T, T); // [-∞;+∞][-∞;+∞]
50	T sub_down(T, T); // [-∞;+∞][-∞;+∞]
51	T sub_up (T, T); // [-∞;+∞][-∞;+∞]
52	T mul_down(T, T); // [-∞;+∞][-∞;+∞]
53	T mul_up (T, T); // [-∞;+∞][-∞;+∞]
54	T div_down(T, T); // [-∞;+∞]([-∞;+∞]-{0})
55	T div_up (T, T); // [-∞;+∞]([-∞;+∞]-{0})
56	T sqrt_down(T); // ]0;+∞]
57	T sqrt_up (T); // ]0;+∞]
58	T exp_down(T); // [-∞;+∞]
59	T exp_up (T); // [-∞;+∞]
60	T log_down(T); // ]0;+∞]
61	T log_up (T); // ]0;+∞]
62	T cos_down(T); // [0;2π]
63	T cos_up (T); // [0;2π]
64	T tan_down(T); // ]-π/2;π/2[
65	T tan_up (T); // ]-π/2;π/2[
66	T asin_down(T); // [-1;1]
67	T asin_up (T); // [-1;1]
68	T acos_down(T); // [-1;1]
69	T acos_up (T); // [-1;1]
70	T atan_down(T); // [-∞;+∞]
71	T atan_up (T); // [-∞;+∞]
72	T sinh_down(T); // [-∞;+∞]
73	T sinh_up (T); // [-∞;+∞]
74	T cosh_down(T); // [-∞;+∞]
75	T cosh_up (T); // [-∞;+∞]
76	T tanh_down(T); // [-∞;+∞]
77	T tanh_up (T); // [-∞;+∞]
78	T asinh_down(T); // [-∞;+∞]
79	T asinh_up (T); // [-∞;+∞]
80	T acosh_down(T); // [1;+∞]
81	T acosh_up (T); // [1;+∞]
82	T atanh_down(T); // [-1;1]
83	T atanh_up (T); // [-1;1]
84	T median(T, T); // [-∞;+∞][-∞;+∞]
85	T int_down(T); // [-∞;+∞]
86	T int_up (T); // [-∞;+∞]
87	// conversion functions
88	T conv_down(U);
89	T conv_up (U);
90	// unprotected rounding class
91	typedef ... unprotected_rounding;
92	};
93	</pre>
94
95	<p>The constructor and destructor of the rounding class have a very
96	important semantic requirement: they are responsible for setting and
97	resetting the rounding modes of the computation on T. For instance, if T is
98	a standard floating point type and floating point computation is performed
99	according to the Standard IEEE 754, the constructor can save the current
100	rounding state, each <code>_up</code> (resp. <code>_down</code>) function
101	will round up (resp. down), and the destructor will restore the saved
102	rounding state. Indeed this is the behavior of the default rounding
103	policy.</p>
104
105	<p>The meaning of all the mathematical functions up until
106	<code>atanh_up</code> is clear: each function returns number representable
107	in the type <code>T</code> which is a lower bound (for <code>_down</code>)
108	or upper bound (for <code>_up</code>) on the true mathematical result of
109	the corresponding function. The function <code>median</code> computes the
110	average of its two arguments rounded to its nearest representable number.
111	The functions <code>int_down</code> and <code>int_up</code> compute the
112	nearest integer smaller or bigger than their argument. Finally,
113	<code>conv_down</code> and <code>conv_up</code> are responsible of the
114	conversions of values of other types to the base number type: the first one
115	must round down the value and the second one must round it up.</p>
116
117	<p>The type <code>unprotected_rounding</code> allows to remove all
118	controls. For reasons why one might to do this, see the <a href=
119	"#Protection">protection</a> paragraph below.</p>
120
121	<h2>Overview of the provided classes</h2>
122
123	<p>A lot of classes are provided. The classes are organized by level. At
124	the bottom is the class <code>rounding_control</code>. At the next level
125	come <code>rounded_arith_exact</code>, <code>rounded_arith_std</code> and
126	<code>rounded_arith_opp</code>. Then there are
127	<code>rounded_transc_dummy</code>, <code>rounded_transc_exact</code>,
128	<code>rounded_transc_std</code> and <code>rounded_transc_opp</code>. And
129	finally are <code>save_state</code> and <code>save_state_nothing</code>.
130	Each of these classes provide a set of members that are required by the
131	classes of the next level. For example, a <code>rounded_transc_...</code>
132	class needs the members of a <code>rounded_arith_...</code> class.</p>
133
134	<p>When they exist in two versions <code>_std</code> and <code>_opp</code>,
135	the first one does switch the rounding mode each time, and the second one
136	tries to keep it oriented toward plus infinity. The main purpose of the
137	<code>_opp</code> version is to speed up the computations through the use
138	of the "opposite trick" (see the <a href="#perf">performance notes</a>).
139	This version requires the rounding mode to be upward before entering any
140	computation functions of the class. It guarantees that the rounding mode
141	will still be upward at the exit of the functions.</p>
142
143	<p>Please note that it is really a very bad idea to mix the
144	<code>_opp</code> version with the <code>_std</code> since they do not have
145	compatible properties.</p>
146
147	<p>There is a third version named <code>_exact</code> which computes the
148	functions without changing the rounding mode. It is an "exact" version
149	because it is intended for a base type that produces exact results.</p>
150
151	<p>The last version is the <code>_dummy</code> version. It does not do any
152	computations but still produces compatible results.</p>
153
154	<p>Please note that it is possible to use the "exact" version for an
155	inexact base type, e.g. <code>float</code> or <code>double</code>. In that
156	case, the inclusion property is no longer guaranteed, but this can be
157	useful to speed up the computation when the inclusion property is not
158	desired strictly. For instance, in computer graphics, a small error due to
159	floating-point roundoff is acceptable as long as an approximate version of
160	the inclusion property holds.</p>
161
162	<p>Here comes what each class defines. Later, when they will be described
163	more thoroughly, these members will not be repeated. Please come back here
164	in order to see them. Inheritance is also used to avoid repetitions.</p>
165	<pre>
166	template <class T>
167	struct rounding_control
168	{
169	typedef ... rounding_mode;
170	void set_rounding_mode(rounding_mode);
171	void get_rounding_mode(rounding_mode&);
172	void downward ();
173	void upward ();
174	void to_nearest();
175	T to_int(T);
176	T force_rounding(T);
177	};
178
179	template <class T, class Rounding>
180	struct rounded_arith_... : Rounding
181	{
182	void init();
183	T add_down(T, T);
184	T add_up (T, T);
185	T sub_down(T, T);
186	T sub_up (T, T);
187	T mul_down(T, T);
188	T mul_up (T, T);
189	T div_down(T, T);
190	T div_up (T, T);
191	T sqrt_down(T);
192	T sqrt_up (T);
193	T median(T, T);
194	T int_down(T);
195	T int_up (T);
196	};
197
198	template <class T, class Rounding>
199	struct rounded_transc_... : Rounding
200	{
201	T exp_down(T);
202	T exp_up (T);
203	T log_down(T);
204	T log_up (T);
205	T cos_down(T);
206	T cos_up (T);
207	T tan_down(T);
208	T tan_up (T);
209	T asin_down(T);
210	T asin_up (T);
211	T acos_down(T);
212	T acos_up (T);
213	T atan_down(T);
214	T atan_up (T);
215	T sinh_down(T);
216	T sinh_up (T);
217	T cosh_down(T);
218	T cosh_up (T);
219	T tanh_down(T);
220	T tanh_up (T);
221	T asinh_down(T);
222	T asinh_up (T);
223	T acosh_down(T);
224	T acosh_up (T);
225	T atanh_down(T);
226	T atanh_up (T);
227	};
228
229	template <class Rounding>
230	struct save_state_... : Rounding
231	{
232	save_state_...();
233	~save_state_...();
234	typedef ... unprotected_rounding;
235	};
236	</pre>
237
238	<h2>Synopsis.</h2>
239	<pre>
240	namespace boost {
241	namespace numeric {
242	namespace interval_lib {
243
244	<span style="color: #FF0000">/* basic rounding control */</span>
245	template <class T> struct rounding_control;
246
247	<span style="color: #FF0000">/* arithmetic functions rounding */</span>
248	template <class T, class Rounding = rounding_control<T> > struct rounded_arith_exact;
249	template <class T, class Rounding = rounding_control<T> > struct rounded_arith_std;
250	template <class T, class Rounding = rounding_control<T> > struct rounded_arith_opp;
251
252	<span style="color: #FF0000">/* transcendental functions rounding */</span>
253	template <class T, class Rounding> struct rounded_transc_dummy;
254	template <class T, class Rounding = rounded_arith_exact<T> > struct rounded_transc_exact;
255	template <class T, class Rounding = rounded_arith_std<T> > struct rounded_transc_std;
256	template <class T, class Rounding = rounded_arith_opp<T> > struct rounded_transc_opp;
257
258	<span style="color: #FF0000">/* rounding-state-saving classes */</span>
259	template <class Rounding> struct save_state;
260	template <class Rounding> struct save_state_nothing;
261
262	<span style="color: #FF0000">/* default policy for type T */</span>
263	template <class T> struct rounded_math;
264	template <> struct rounded_math<float>;
265	template <> struct rounded_math<double>;
266
267	<span style=
268	"color: #FF0000">/* some metaprogramming to convert a protected to unprotected rounding */</span>
269	template <class I> struct unprotect;
270
271	} // namespace interval_lib
272	} // namespace numeric
273	} // namespace boost
274	</pre>
275
276	<h2>Description of the provided classes</h2>
277
278	<p>We now describe each class in the order they appear in the definition of
279	a rounding policy (this outermost-to-innermost order is the reverse order
280	from the synopsis).</p>
281
282	<h3 id="Protection">Protection control</h3>
283
284	<p>Protection refers to the fact that the interval operations will be
285	surrounded by rounding mode controls. Unprotecting a class means to remove
286	all the rounding controls. Each rounding policy provides a type
287	<code>unprotected_rounding</code>. The required type
288	<code>unprotected_rounding</code> gives another rounding class that enables
289	to work when nested inside rounding. For example, the first three lines
290	below should all produce the same result (because the first operation is
291	the rounding constructor, and the last is its destructor, which take care
292	of setting the rounding modes); and the last line is allowed to have an
293	undefined behavior (since no rounding constructor or destructor is ever
294	called).</p>
295	<pre>
296	T c; { rounding rnd; c = rnd.add_down(a, b); }
297	T c; { rounding rnd1; { rounding rnd2; c = rnd2.add_down(a, b); } }
298	T c; { rounding rnd1; { rounding::unprotected_rounding rnd2; c = rnd2.add_down(a, b); } }
299	T d; { rounding::unprotected_rounding rnd; d = rnd.add_down(a, b); }
300	</pre>
301
302	<p>Naturally <code>rounding::unprotected_rounding</code> may simply be
303	<code>rounding</code> itself. But it can improve performance if it is a
304	simplified version with empty constructor and destructor. In order to avoid
305	undefined behaviors, in the library, an object of type
306	<code>rounding::unprotected_rounding</code> is guaranteed to be created
307	only when an object of type <code>rounding</code> is already alive. See the
308	<a href="#perf">performance notes</a> for some additional details.</p>
309
310	<p>The support library defines a metaprogramming class template
311	<code>unprotect</code> which takes an interval type <code>I</code> and
312	returns an interval type <code>unprotect<I>::type</code> where the
313	rounding policy has been unprotected. Some information about the types:
314	<code>interval<T, interval_lib::policies<Rounding, _>
315	>::traits_type::rounding</code> <b>is</b> the same type as
316	<code>Rounding</code>, and <code>unprotect<interval<T,
317	interval_lib::policies<Rounding, _> > >::type</code> <b>is</b>
318	the same type as <code>interval<T,
319	interval_lib::policies<Rounding::unprotected, _> ></code>.</p>
320
321	<h3>State saving</h3>
322
323	<p>First comes <code>save_state</code>. This class is responsible for
324	saving the current rounding mode and calling init in its constructor, and
325	for restoring the saved rounding mode in its destructor. This class also
326	defines the <code>unprotected_rounding</code> type.</p>
327
328	<p>If the rounding mode does not require any state-saving or
329	initialization, <code>save_state_nothing</code> can be used instead of
330	<code>save_state</code>.</p>
331
332	<h3>Transcendental functions</h3>
333
334	<p>The classes <code>rounded_transc_exact</code>,
335	<code>rounded_transc_std</code> and <code>rounded_transc_opp</code> expect
336	the std namespace to provide the functions exp log cos tan acos asin atan
337	cosh sinh tanh acosh asinh atanh. For the <code>_std</code> and
338	<code>_opp</code> versions, all these functions should respect the current
339	rounding mode fixed by a call to downward or upward.</p>
340
341	<p><strong>Please note:</strong> Unfortunately, the latter is rarely the
342	case. It is the reason why a class <code>rounded_transc_dummy</code> is
343	provided which does not depend on the functions from the std namespace.
344	There is no magic, however. The functions of
345	<code>rounded_transc_dummy</code> do not compute anything. They only return
346	valid values. For example, <code>cos_down</code> always returns -1. In this
347	way, we do verify the inclusion property for the default implementation,
348	even if this has strictly no value for the user. In order to have useful
349	values, another policy should be used explicitely, which will most likely
350	lead to a violation of the inclusion property. In this way, we ensure that
351	the violation is clearly pointed out to the user who then knows what he
352	stands against. This class could have been used as the default
353	transcendental rounding class, but it was decided it would be better for
354	the compilation to fail due to missing declarations rather than succeed
355	thanks to valid but unusable functions.</p>
356
357	<h3>Basic arithmetic functions</h3>
358
359	<p>The classes <code>rounded_arith_std</code> and
360	<code>rounded_arith_opp</code> expect the operators + - * / and the
361	function <code>std::sqrt</code> to respect the current rounding mode.</p>
362
363	<p>The class <code>rounded_arith_exact</code> requires
364	<code>std::floor</code> and <code>std::ceil</code> to be defined since it
365	can not rely on <code>to_int</code>.</p>
366
367	<h3>Rounding control</h3>
368
369	<p>The functions defined by each of the previous classes did not need any
370	explanation. For example, the behavior of <code>add_down</code> is to
371	compute the sum of two numbers rounded downward. For
372	<code>rounding_control</code>, the situation is a bit more complex.</p>
373
374	<p>The basic function is <code>force_rounding</code> which returns its
375	argument correctly rounded accordingly to the current rounding mode if it
376	was not already the case. This function is necessary to handle delayed
377	rounding. Indeed, depending on the way the computations are done, the
378	intermediate results may be internaly stored in a more precise format and
379	it can lead to a wrong rounding. So the function enforces the rounding.
380	<a href="#extreg">Here</a> is an example of what happens when the rounding
381	is not enforced.</p>
382
383	<p>The function <code>get_rounding_mode</code> returns the current rounding
384	mode, <code>set_rounding_mode</code> sets the rounding mode back to a
385	previous value returned by <code>get_rounding_mode</code>.
386	<code>downward</code>, <code>upward</code> and <code>to_nearest</code> sets
387	the rounding mode in one of the three directions. This rounding mode should
388	be global to all the functions that use the type <code>T</code>. For
389	example, after a call to <code>downward</code>,
390	<code>force_rounding(x+y)</code> is expected to return the sum rounded
391	toward -∞.</p>
392
393	<p>The function <code>to_int</code> computes the nearest integer
394	accordingly to the current rounding mode.</p>
395
396	<p>The non-specialized version of <code>rounding_control</code> does not do
397	anything. The functions for the rounding mode are empty, and
398	<code>to_int</code> and <code>force_rounding</code> are identity functions.
399	The <code>pi_</code> constant functions return suitable integers (for
400	example, <code>pi_up</code> returns <code>T(4)</code>).</p>
401
402	<p>The class template <code>rounding_control</code> is specialized for
403	<code>float</code>, <code>double</code> and <code>long double</code> in
404	order to best use the floating point unit of the computer.</p>
405
406	<h2>Template class <tt>rounded_math</tt></h2>
407
408	<p>The default policy (aka <code>rounded_math<T></code>) is simply
409	defined as:</p>
410	<pre>
411	template <class T> struct rounded_math<T> : save_state_nothing<rounded_arith_exact<T> > {};
412	</pre>
413
414	<p>and the specializations for <code>float</code>, <code>double</code> and
415	<code>long double</code> use <code>rounded_arith_opp</code>, as in:</p>
416	<pre>
417	template <> struct rounded_math<float> : save_state<rounded_arith_opp<float> > {};
418	template <> struct rounded_math<double> : save_state<rounded_arith_opp<double> > {};
419	template <> struct rounded_math<long double> : save_state<rounded_arith_opp<long double> > {};
420	</pre>
421
422	<h2 id="perf">Performance Issues</h2>
423
424	<p>This paragraph deals mostly with the performance of the library with
425	intervals using the floating-point unit (FPU) of the computer. Let's
426	consider the sum of [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] as an
427	example. The result is [<code>down</code>(<i>a</i>+<i>c</i>),
428	<code>up</code>(<i>b</i>+<i>d</i>)], where <code>down</code> and
429	<code>up</code> indicate the rounding mode needed.</p>
430
431	<h3>Rounding Mode Switch</h3>
432
433	<p>If the FPU is able to use a different rounding mode for each operation,
434	there is no problem. For example, it's the case for the Alpha processor:
435	each floating-point instruction can specify a different rounding mode.
436	However, the IEEE-754 Standard does not require such a behavior. So most of
437	the FPUs only provide some instructions to set the rounding mode for all
438	subsequent operations. And generally, these instructions need to flush the
439	pipeline of the FPU.</p>
440
441	<p>In this situation, the time needed to sum [<i>a</i>,<i>b</i>] and
442	[<i>c</i>,<i>d</i>] is far worse than the time needed to calculate
443	<i>a</i>+<i>b</i> and <i>c</i>+<i>d</i> since the two additions cannot be
444	parallelized. Consequently, the objective is to diminish the number of
445	rounding mode switches.</p>
446
447	<p>If this library is not used to provide exact computations, but only for
448	pair arithmetic, the solution is quite simple: do not use rounding. In that
449	case, doing the sum [<i>a</i>,<i>b</i>] and [<i>c</i>,<i>d</i>] will be as
450	fast as computing <i>a</i>+<i>b</i> and <i>c</i>+<i>d</i>. Everything is
451	perfect.</p>
452
453	<p>However, if exact computations are required, such a solution is totally
454	unthinkable. So, are we penniless? No, there is still a trick available.
455	Indeed, down(<i>a</i>+<i>c</i>) = -up(-<i>a</i>-<i>c</i>) if the unary
456	minus is an exact operation. It is now possible to calculate the whole sum
457	with the same rounding mode. Generally, the cost of the mode switching is
458	worse than the cost of the sign changes.</p>
459
460	<h4>Speeding up consecutive operations</h4>
461
462	<p>The interval addition is not the only operation; most of the interval
463	operations can be computed by setting the rounding direction of the FPU
464	only once. So the operations of the floating point rounding policy assume
465	that the direction is correctly set. This assumption is usually not true in
466	a program (the user and the standard library expect the rounding direction
467	to be to nearest), so these operations have to be enclosed in a shell that
468	sets the floating point environment. This protection is done by the
469	constructor and destructor of the rounding policy.</p>
470
471	<p>Les us now consider the case of two consecutive interval additions:
472	[<i>a</i>,<i>b</i>] + [<i>c</i>,<i>d</i>] + [<i>e</i>,<i>f</i>]. The
473	generated code should look like:</p>
474	<pre>
475	init_rounding_mode(); // rounding object construction during the first addition
476	t1 = -(-a - c);
477	t2 = b + d;
478	restore_rounding_mode(); // rounding object destruction
479	init_rounding_mode(); // rounding object construction during the second addition
480	x = -(-t1 - e);
481	y = t2 + f;
482	restore_rounding_mode(); // rounding object destruction
483	// the result is the interval [x,y]
484	</pre>
485
486	<p>Between the two operations, the rounding direction is restored, and then
487	initialized again. Ideally, compilers should be able to optimize this
488	useless code away. But unfortunately they are not, and this slows the code
489	down by an order of magnitude. In order to avoid this bottleneck, the user
490	can tell to the interval operations that they do not need to be protected
491	anymore. It will then be up to the user to protect the interval
492	computations. The compiler will then be able to generate such a code:</p>
493	<pre>
494	init_rounding_mode(); // done by the user
495	x = -(-a - c - e);
496	y = b + d + f;
497	restore_rounding_mode(); // done by the user
498	</pre>
499
500	<p>The user will have to create a rounding object. And as long as this
501	object is alive, unprotected versions of the interval operations can be
502	used. They are selected by using an interval type with a specific rounding
503	policy. If the initial interval type is <code>I</code>, then
504	<code>I::traits_type::rounding</code> is the type of the rounding object,
505	and <code>interval_lib::unprotect<I>::type</code> is the type of the
506	unprotected interval type.</p>
507
508	<p>Because the rounding mode of the FPU is changed during the life of the
509	rounding object, any arithmetic floating point operation that does not
510	involve the interval library can lead to unexpected results. And
511	reciprocally, using unprotected interval operation when no rounding object
512	is alive will produce intervals that are not guaranteed anymore to contain
513	the real result.</p>
514
515	<h4 id="perfexp">Example</h4>
516
517	<p>Here is an example of Horner's scheme to compute the value of a polynom.
518	The rounding mode switches are disabled for the whole computation.</p>
519	<pre>
520	// I is an interval class, the polynom is a simple array
521	template<class I>
522	I horner(const I& x, const I p[], int n) {
523
524	// save and initialize the rounding mode
525	typename I::traits_type::rounding rnd;
526
527	// define the unprotected version of the interval type
528	typedef typename boost::numeric::interval_lib::unprotect<I>::type R;
529
530	const R& a = x;
531	R y = p[n - 1];
532	for(int i = n - 2; i >= 0; i--) {
533	y = y * a + (const R&)(p[i]);
534	}
535	return y;
536
537	// restore the rounding mode with the destruction of rnd
538	}
539	</pre>
540
541	<p>Please note that a rounding object is specially created in order to
542	protect all the interval computations. Each interval of type I is converted
543	in an interval of type R before any operations. If this conversion is not
544	done, the result is still correct, but the interest of this whole
545	optimization has disappeared. Whenever possible, it is good to convert to
546	<code>const R&</code> instead of <code>R</code>: indeed, the function
547	could already be called inside an unprotection block so the types
548	<code>R</code> and <code>I</code> would be the same interval, no need for a
549	conversion.</p>
550
551	<h4>Uninteresting remark</h4>
552
553	<p>It was said at the beginning that the Alpha processors can use a
554	specific rounding mode for each operation. However, due to the instruction
555	format, the rounding toward plus infinity is not available. Only the
556	rounding toward minus infinity can be used. So the trick using the change
557	of sign becomes essential, but there is no need to save and restore the
558	rounding mode on both sides of an operation.</p>
559
560	<h3 id="extreg">Extended Registers</h3>
561
562	<p>There is another problem besides the cost of the rounding mode switch.
563	Some FPUs use extended registers (for example, float computations will be
564	done with double registers, or double computations with long double
565	registers). Consequently, many problems can arise.</p>
566
567	<p>The first one is due to to the extended precision of the mantissa. The
568	rounding is also done on this extended precision. And consequently, we
569	still have down(<i>a</i>+<i>b</i>) = -up(-<i>a</i>-<i>b</i>) in the
570	extended registers. But back to the standard precision, we now have
571	down(<i>a</i>+<i>b</i>) < -up(-<i>a</i>-<i>b</i>) instead of an
572	equality. A solution could be not to use this method. But there still are
573	other problems, with the comparisons between numbers for example.</p>
574
575	<p>Naturally, there is also a problem with the extended precision of the
576	exponent. To illustrate this problem, let <i>m</i> be the biggest number
577	before +<i>inf</i>. If we calculate 2*[<i>m</i>,<i>m</i>], the answer
578	should be [<i>m</i>,<i>inf</i>]. But due to the extended registers, the FPU
579	will first store [<i>2m</i>,<i>2m</i>] and then convert it to
580	[<i>inf</i>,<i>inf</i>] at the end of the calculus (when the rounding mode
581	is toward +<i>inf</i>). So the answer is no more accurate.</p>
582
583	<p>There is only one solution: to force the FPU to convert the extended
584	values back to standard precision after each operation. Some FPUs provide
585	an instruction able to do this conversion (for example the PowerPC
586	processors). But for the FPUs that do not provide it (the x86 processors),
587	the only solution is to write the values to memory and read them back. Such
588	an operation is obviously very expensive.</p>
589
590	<h2>Some Examples</h2>
591
592	<p>Here come several cases:</p>
593
594	<ul>
595	<li>if you need precise computations with the <code>float</code> or
596	<code>double</code> types, use the default
597	<code>rounded_math<T></code>;</li>
598
599	<li>for fast wide intervals without any rounding nor precision, use
600	<code>save_state_nothing<rounded_transc_exact<T>
601	></code>;</li>
602
603	<li>for an exact type (like int or rational with a little help for
604	infinite and NaN values) without support for transcendental functions,
605	the solution could be
606	<code>save_state_nothing<rounded_transc_dummy<T,
607	rounded_arith_exact<T> > ></code> or directly
608	<code>save_state_nothing<rounded_arith_exact<T>
609	></code>;</li>
610
611	<li>if it is a more complex case than the previous ones, the best thing
612	is probably to directly define your own policy.</li>
613	</ul>
614	<hr>
615
616	<p><a href="http://validator.w3.org/check?uri=referer"><img border="0" src=
617	"http://www.w3.org/Icons/valid-html401" alt="Valid HTML 4.01 Transitional"
618	height="31" width="88"></a></p>
619
620	<p>Revised
621	<!--webbot bot="Timestamp" s-type="EDITED" s-format="%Y-%m-%d" startspan -->2006-12-24<!--webbot bot="Timestamp" endspan i-checksum="12172" --></p>
622
623	<p><i>Copyright © 2002 Guillaume Melquiond, Sylvain Pion, Hervé
624	Brönnimann, Polytechnic University<br>
625	Copyright © 2004-2005 Guillaume Melquiond, ENS Lyon</i></p>
626
627	<p><i>Distributed under the Boost Software License, Version 1.0. (See
628	accompanying file <a href="../../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a>
629	or copy at <a href=
630	"http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>
631	</body>
632	</html>

Note: See TracBrowser for help on using the repository browser.

Download in other formats: