<html>

<head>
<meta http-equiv="Content-Language" content="en-us">
<meta name="GENERATOR" content="Microsoft FrontPage 5.0">
<meta name="ProgId" content="FrontPage.Editor.Document">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252">
<title>Boost Filesystem Library Design</title>
</head>

<body bgcolor="#FFFFFF">

<h1>
<img border="0" src="../../../boost.png" align="center" width="277" height="86">Filesystem
Library Design</h1>

<p><a href="#Introduction">Introduction</a><br>
<a href="#Requirements">Requirements</a><br>
<a href="#Realities">Realities</a><br>
<a href="#Rationale">Rationale</a><br>
<a href="#Abandoned_Designs">Abandoned Designs</a><br>
<a href="#References">References</a></p>

<h2><a name="Introduction">Introduction</a></h2>

<p>The primary motivation for beginning work on the Filesystem Library was
frustration with Boost administrative tools.&nbsp; Scripts were written in
Python, Perl, Bash, and Windows command languages.&nbsp; There was no single
scripting language familiar and acceptable to all Boost administrators. Yet they
were all skilled C++ programmers - why couldn't C++ be used as the scripting
language?</p>

<p>The key feature C++ lacked for script-like applications was the ability to
perform portable filesystem operations on directories and their contents. The
Filesystem Library was developed to fill that void.</p>

<p>The intent is not to compete with traditional scripting languages, but to
provide a solution for situations where C++ is already the language
of choice.</p>

<h2><a name="Requirements">Requirements</a></h2>
<ul>
  <li>Be able to write portable script-style filesystem operations in modern
  C++.<br>
  <br>
  Rationale: This is a common programming need. It is both an
  embarrassment and a hardship that this is not possible with either the current
  C++ or Boost libraries.&nbsp; The need is particularly acute
  when C++ is the only toolset allowed in the tool chain.&nbsp; File system
  operations are provided by many languages&nbsp;used on multiple platforms,
  such as Perl and Python, as well as by many platform specific scripting
  languages. All operating systems provide some form of API for filesystem
  operations, and the POSIX bindings are increasingly available even on
  operating systems not normally associated with POSIX, such as the Mac, z/OS,
  or OS/390.<br>
&nbsp;</li>
  <li>Work within the <a href="#Realities">realities</a> described below.<br>
  <br>
  Rationale: This isn't a research project. The need is for something that works on
  today's platforms, including some of the embedded operating systems
  with limited file systems. Because of the emphasis on portability, such a
  library would be much more useful if standardized. That means being able to
  work with a much wider range of platforms than just Unix or Windows and their
  clones.<br>
&nbsp;</li>
  <li>Avoid dangerous programming practices. Particularly, all-too-easy-to-ignore error notifications
  and use of global variables.&nbsp;If a dangerous feature is provided, identify it as such.<br>
  <br>
  Rationale: Normally this would be covered by &quot;the usual Boost requirements...&quot;,
  but it is mentioned explicitly because the equivalent native platform and
  scripting language interfaces often depend on all-too-easy-to-ignore error
  notifications and global variables like &quot;current
  working directory&quot;.<br>
&nbsp;</li>
  <li>Structure the library so that it is still useful even if some functionality
  does not map well onto a given platform or directory tree. Particularly, much
  useful functionality should be portable even to flat
  (non-hierarchical) filesystems.<br>
  <br>
  Rationale: Much functionality which does not
  require a hierarchical directory structure is still useful on flat-structure
  filesystems.&nbsp; There are many systems, particularly embedded systems,
  where even very limited functionality is still useful.</li>
</ul>
<ul>
  <li>Interface smoothly with current C++ Standard Library input/output
  facilities.&nbsp; For example, paths should work in std::basic_fstream constructors.<br>
  <br>
  Rationale: One of the most common uses of file system functionality is to
  manipulate paths for eventual use in input/output operations.&nbsp; 
  Thus the need to interface smoothly with standard library I/O.<br>
&nbsp;</li>
  <li>Suitable for eventual standardization. The implication of this requirement
  is that the interface be close to minimal, and that great care be taken
  regarding portability.<br>
  <br>
  Rationale: The lack of file system operations is a serious hole
  in the current standard, with no other known candidates to fill that hole.
  Libraries with elaborate interfaces and difficult-to-port specifications are much less likely to be accepted for
  standardization.<br>
&nbsp;</li>
  <li>The usual Boost <a href="../../../more/lib_guide.htm">requirements and
  guidelines</a> apply.<br>
&nbsp;</li>
  <li>Encourage, but do not require, portability in path names.<br>
  <br>
  Rationale: For paths which originate from user input it is unreasonable to
  require portable path syntax.<br>
&nbsp;</li>
  <li>Avoid giving the illusion of portability where portability in fact does not
  exist.<br>
  <br>
  Rationale: Leaving important behavior unspecified or &quot;implementation defined&quot; does a
  great disservice to programmers using a library because it makes it appear
  that code relying on the behavior is portable, when in fact there is nothing
  portable about it. The only case where such under-specification is acceptable is when both users and implementors know from
  other sources exactly what behavior is required, yet for some reason it isn't
  possible to specify it exactly.</li>
</ul>
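<p>As an illustration of the requirement to avoid all-too-easy-to-ignore error
notifications, here is a minimal sketch that uses only the C++ Standard Library
rather than the Filesystem Library itself. The name <code>remove_or_throw</code>
is hypothetical and is not part of any library; the sketch merely shows how a
status code that a caller can discard may be turned into an exception that a
caller cannot overlook.</p>

```cpp
#include <cassert>
#include <cstdio>
#include <stdexcept>
#include <string>

// Hypothetical helper, not part of any library: wraps std::remove so that a
// failure cannot be silently ignored by the caller.
void remove_or_throw(const std::string& name)
{
    assert(!name.empty());  // precondition of this sketch

    // std::remove reports failure through a nonzero return value, which is
    // exactly the kind of notification that is easy to discard.
    if (std::remove(name.c_str()) != 0)
        throw std::runtime_error("could not remove file: " + name);
}
```

<p>A script-style program can then call <code>remove_or_throw( &quot;foo&quot; )</code>
without writing any checking code, and a failure still cannot pass unnoticed.</p>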
<h2><a name="Realities">Realities</a></h2>
<ul>
  <li>Some operating systems have a single directory tree root, others have
  multiple roots.<br>
&nbsp;</li>
  <li>Some file systems provide both a long and short form of filenames.<br>
&nbsp;</li>
  <li>Some file systems have different syntax for file paths and directory
  paths.<br>
&nbsp;</li>
  <li>Some file systems have different rules for valid file names and valid
  directory names.<br>
&nbsp;</li>
  <li>Some file systems (ISO-9660, level 1, for example) use very restricted
  (so-called 8.3) file names.<br>
&nbsp;</li>
  <li>Some operating systems allow file systems with different
  characteristics to be &quot;mounted&quot; within a directory tree.&nbsp; Thus an
  ISO-9660 or Windows
  file system may end up as a sub-tree of a POSIX directory tree.<br>
&nbsp;</li>
  <li>Wide-character versions of directory and file operations are available on some operating
  systems, and not available on others.<br>
&nbsp;</li>
  <li>There is no law that says directory hierarchies have to be specified in
  terms of left-to-right descent from the root.<br>
&nbsp;</li>
  <li>Some file systems have a concept of file &quot;version number&quot; or &quot;generation
  number&quot;.&nbsp; Some don't.<br>
&nbsp;</li>
  <li>Not all operating systems use single character separators in path names.&nbsp; Some use
  paired notations. A typical fully-specified OpenVMS filename
  might look something like this:<br>
  <br>
  <code>&nbsp;&nbsp; DISK$SCRATCH:[GEORGE.PROJECT1.DAT]BIG_DATA_FILE.NTP;5<br>
  </code><br>
  The general OpenVMS format is:<br>
  <br>
&nbsp;&nbsp;&nbsp;&nbsp; 
  <i>Device:[directories.dot.separated]filename.extension;version_number</i><br>
&nbsp;</li>
  <li>For common file systems, determining if two descriptors are for the same
  entity is extremely difficult or impossible.&nbsp; For example, the concept of
  equality can be different for each portion of a path - some portions may be
  case or locale sensitive, others not. Case sensitivity is a property of the
  pathname itself, and not the platform. Determining collating sequence is even
  worse.<br>
&nbsp;</li>
  <li>Race conditions may occur. Directory trees, directories, files, and file attributes are in effect shared between all threads, processes, and computers which have access to the
  filesystem.&nbsp; That may well include computers on the other side of the
  world or in orbit around the world. This implies that file system operations
  may fail in unexpected ways.&nbsp;For example:<br>
  <br>
  <code>&nbsp;&nbsp;&nbsp;&nbsp; assert( exists(&quot;foo&quot;) == exists(&quot;foo&quot;) );
  // may fail!<br>
&nbsp;&nbsp;&nbsp;&nbsp; assert( is_directory(&quot;foo&quot;) == is_directory(&quot;foo&quot;) );
  // may fail!<br>
  </code><br>
  In the first example, the file may have been deleted between calls to
  exists().&nbsp; In the second example, the file may have been deleted and then
  replaced by a directory of the same name between the calls to is_directory().<br>
&nbsp;</li>
  <li>Even though an application may be portable, it still will have to traffic
  in system specific paths occasionally; user provided input is a common
  example.<br>
&nbsp;</li>
  <li><a name="symbolic-link-use-case">Symbolic</a> links cause the canonical and
  normal forms of some paths to represent different files or directories. For
  example, given the directory hierarchy <code>/a/b/c</code>, with a symbolic
  link in <code>/a</code> named <code>x</code>&nbsp; pointing to <code>b/c</code>,
  then under POSIX Pathname Resolution rules a path of <code>&quot;/a/x/..&quot;</code> 
  should resolve to <code>&quot;/a/b&quot;</code>. If <code>&quot;/a/x/..&quot;</code> were first
  normalized to <code>&quot;/a&quot;</code>, it would resolve incorrectly. (Case supplied
  by Walter Landry.)</li>
</ul>
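<p>One consequence of the race conditions listed above is that the answer to an
existence test may already be stale when the program acts on it. A minimal
sketch of the alternative, attempting the operation and handling its failure,
is given here using only the C++ Standard Library. The name
<code>read_first_line</code> is hypothetical and is not part of the Filesystem
Library.</p>

```cpp
#include <cassert>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: returns the first line of a file, reporting failure by
// exception. It deliberately does not test for existence first, because the
// file could be removed between such a test and the attempt to open it.
std::string read_first_line(const std::string& name)
{
    assert(!name.empty());  // precondition of this sketch

    std::ifstream in(name.c_str());
    if (!in)
        throw std::runtime_error("could not open file: " + name);

    std::string line;
    std::getline(in, line);  // leaves line empty if the file has no content
    return line;
}
```

<p>The window for a race is not eliminated, since the file can still change
while it is being read, but the program no longer depends on two separate
observations of the filesystem agreeing with each other.</p>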
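<p>The symbolic link case in the list above can be made concrete with a sketch
of purely lexical normalization. The function <code>lexically_normalize</code>
is hypothetical, handles only absolute POSIX-style paths, and never consults
the filesystem. For <code>&quot;/a/x/..&quot;</code> it yields <code>&quot;/a&quot;</code>,
which is not the <code>&quot;/a/b&quot;</code> that POSIX Pathname Resolution would
reach when <code>x</code> is a symbolic link to <code>b/c</code>.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: lexical normalization of an absolute POSIX-style path.
// "." elements are dropped and each ".." cancels the preceding element.
// Symbolic links are not followed, because the filesystem is never examined.
std::string lexically_normalize(const std::string& path)
{
    assert(!path.empty() && path[0] == '/');  // absolute paths only

    std::vector<std::string> kept;
    std::size_t pos = 0;
    while (pos < path.size()) {
        std::size_t next = path.find('/', pos);
        if (next == std::string::npos)
            next = path.size();
        const std::string element = path.substr(pos, next - pos);
        if (element == "..") {
            if (!kept.empty())
                kept.pop_back();  // ".." applied to the root stays at the root
        } else if (!element.empty() && element != ".") {
            kept.push_back(element);
        }
        pos = next + 1;
    }

    std::string result;
    for (std::size_t i = 0; i < kept.size(); ++i)
        result += "/" + kept[i];
    return result.empty() ? "/" : result;
}
```

<p>Any normalization of this kind is therefore safe only when the caller knows
that no element of the path is a symbolic link.</p>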
<h2><a name="Rationale">Rationale</a></h2>

<p>The <a href="#Requirements">Requirements</a> and <a href="#Realities">
Realities</a> above drove much of the C++ interface design.&nbsp; In particular,
the desire to make script-like code straightforward caused a great deal of
effort to go into ensuring that apparently simple expressions like <i>exists( &quot;foo&quot; 
)</i> work as expected.</p>

<p>See the <a href="faq.htm">FAQ</a> for the rationale behind many detailed
design decisions.</p>

<p>Several key insights went into the <i>path</i> class design:</p>
<ul>
  <li>Decoupling of the input formats, internal conceptual (<i>vector&lt;string&gt;</i> 
  or other sequence)
  model, and output formats.</li>
  <li>Providing two input formats (generic and O/S specific) broke a major
  design deadlock.</li>
  <li>Providing several output formats solved another set of previously
  intractable problems.</li>
  <li>Several non-obvious functions (particularly decomposition and composition)
  are required to support portable code. (Peter Dimov, Thomas Witt, Glen
  Knowles, others.)</li>
</ul>
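<p>The decoupling described in the list above can be sketched as a small class
that parses one input format into a sequence of elements and produces more than
one output format from it. The class <code>sketch_path</code> and all of its
members are hypothetical; this is not the library's <i>path</i> class, and it
handles only relative paths with no root.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration only; not the library's path class.
class sketch_path
{
public:
    // Input format: generic, with '/' separating elements.
    static sketch_path from_generic(const std::string& text)
    {
        sketch_path p;
        std::size_t pos = 0;
        while (pos < text.size()) {
            std::size_t next = text.find('/', pos);
            if (next == std::string::npos)
                next = text.size();
            if (next > pos)  // skip empty elements such as those in "a//b"
                p.elements_.push_back(text.substr(pos, next - pos));
            pos = next + 1;
        }
        return p;
    }

    // Composition: append one element to the internal sequence.
    sketch_path& append(const std::string& element)
    {
        assert(!element.empty() && element.find('/') == std::string::npos);
        elements_.push_back(element);
        return *this;
    }

    // Decomposition: the last element, or an empty string for an empty path.
    std::string leaf() const
    {
        return elements_.empty() ? std::string() : elements_.back();
    }

    // Output format: the elements joined with a caller-chosen separator.
    std::string str(char separator) const
    {
        std::string out;
        for (std::size_t i = 0; i < elements_.size(); ++i) {
            if (i != 0)
                out += separator;
            out += elements_[i];
        }
        return out;
    }

    std::size_t size() const { return elements_.size(); }

private:
    std::vector<std::string> elements_;  // the internal conceptual model
};
```

<p>Because the separator never appears in the stored elements, the same object
can be written out as <code>docs/design</code> or as <code>docs\design</code>
without reparsing.</p>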

<p>Error checking was a particularly difficult area. One key insight was that
with file and directory names, portability isn't a universal truth.&nbsp; 
Rather, the programmer must think out the question &quot;What operating systems do I
want this path to be portable to?&quot;&nbsp; By providing support for several
answers to that question, the Filesystem Library alerts programmers to the need
to ask it in the first place.</p>
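<p>To illustrate what &quot;several answers&quot; to the portability question
might look like, here is a minimal sketch of two name checks written with the
C++ Standard Library alone. Both function names are hypothetical and are not
taken from the Filesystem Library. The first follows the POSIX portable filename
character set; the second rejects the characters that Windows reserves in file
names, along with names ending in a space or a period.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical check: true if every character of a single file name is in the
// POSIX portable filename character set (A-Z, a-z, 0-9, '.', '_', '-').
bool is_posix_portable_name(const std::string& name)
{
    static const std::string allowed =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "abcdefghijklmnopqrstuvwxyz"
        "0123456789._-";
    return !name.empty()
        && name.find_first_not_of(allowed) == std::string::npos;
}

// Hypothetical check: true if the name avoids the characters Windows reserves
// in file names and does not end with a space or a period.
bool is_windows_friendly_name(const std::string& name)
{
    static const std::string reserved = "<>:\"/\\|?*";
    if (name.empty() || name.find_first_of(reserved) != std::string::npos)
        return false;
    const char last = name[name.size() - 1];
    return last != ' ' && last != '.';
}

// Usage example: the same name can pass one check and fail the other.
void demonstrate_name_checks()
{
    assert(is_posix_portable_name("design.htm"));
    assert(is_windows_friendly_name("design.htm"));

    assert(!is_posix_portable_name("my file.txt"));   // space is not portable
    assert(is_windows_friendly_name("my file.txt"));

    assert(is_posix_portable_name("notes."));
    assert(!is_windows_friendly_name("notes."));      // trailing period
}
```

<p>Neither check is right in general; which one applies depends on the answer
the programmer gives to the question above.</p>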
<h2><a name="Abandoned_Designs">Abandoned Designs</a></h2>
<h3>operations.hpp</h3>
<p>Dietmar Kühl's original dir_it design and implementation supported
wide-character file and directory names. It was abandoned after extensive
discussions among Library Working Group members failed to identify portable
semantics for wide-character names on systems not providing native support. See
<a href="faq.htm#wide-character_names">FAQ</a>.</p>
<p>Previous iterations of the interface design used explicitly named functions providing a
large number of convenience operations, with no compile-time or run-time
options. There were so many function names that they were very confusing to use,
and the interface was much larger. Any benefits seemed theoretical rather than
real. </p>
<p>Designs based on compile time (rather than runtime) flag and option selection
(via policy, enum, or int template parameters) became so complicated that they
were abandoned, often after investing quite a bit of time and effort. The need
to qualify attribute or option names with namespaces, even aliases, made use in
template parameters ugly; that wasn't fully appreciated until actually writing
real code.</p>
<p>Yet another set of convenience functions (for example, <i>remove</i> with
permissive, prune, recurse, and other options, plus predicate, and possibly
other, filtering features) were abandoned because the details became both
complex and contentious.</p>

<p>What is left is a toolkit of low-level operations from which the user can
create more complex convenience operations, plus a very small number of
convenience functions which were found to be useful enough to justify inclusion.</p>

<h3>path.hpp</h3>

<p>There were so many abandoned path designs, I've lost track. Policy-based
class templates in several flavors, constructor-supplied runtime policies,
operation-specific runtime policies: they were all considered, often
implemented, and ultimately abandoned as far too complicated for any small
benefits observed.</p>

<h3>error checking</h3>

<p>A number of designs for the error checking machinery were abandoned, some
after experiments with implementations. Totally automatic error checking was
attempted in particular. But automatic error checking tended to make the overall
library design much more complicated.</p>

<p>Some designs associated error checking mechanisms with paths.&nbsp; Some with
operations functions.&nbsp; A policy-based error checking template design was
partially implemented, then abandoned as too complicated for everyday
script-like programs.</p>

<p>The final design, which depends partially on explicit error checking function
calls,&nbsp; is much simpler and more straightforward, although it does depend to
some extent on programmer discipline.&nbsp; But it should allow programmers who
are concerned about portability to be reasonably sure that their programs will
work correctly on their choice of target systems.</p>

<h2><a name="References">References</a></h2>

<p>[<a name="IBM-01">IBM-01</a>] IBM Corporation, <i>z/OS V1R3.0 C/C++ Run-Time
Library Reference</i>, SA22-7821-02, 2001,
<a href="http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/">
http://www-1.ibm.com/servers/eserver/zseries/zos/bkserv/</a></p>

<p>[<a name="ISO-9660">ISO-9660</a>] International Standards Organization, 1988.</p>

<p>[<a name="MSDN">MSDN</a>] Microsoft Platform SDK for Windows, Storage Start
Page,
<a href="http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp">
http://msdn.microsoft.com/library/en-us/fileio/base/storage_start_page.asp</a></p>

<p>[<a name="POSIX-01">POSIX-01</a>] IEEE Std 1003.1-2001/ISO/IEC 9945:2002,
<a href="http://www.unix-systems.org/version3/">
http://www.unix-systems.org/version3/</a>. The ISO JTC1/SC22/WG15 - POSIX
homepage is <a href="http://std.dkuug.dk/JTC1/SC22/WG15/">
http://std.dkuug.dk/JTC1/SC22/WG15/</a>.</p>

<p>[<a name="URI">URI</a>] RFC-2396, Uniform Resource Identifiers (URI): Generic
Syntax, <a href="http://www.ietf.org/rfc/rfc2396.txt">
http://www.ietf.org/rfc/rfc2396.txt</a></p>

<p>[<a name="Wulf-Shaw-73">Wulf-Shaw-73</a>] William Wulf, Mary Shaw, <i>Global
Variable Considered Harmful</i>, ACM SIGPLAN Notices, 8, 2, 1973, pp. 23-34</p>

<hr>
<p>Revised
<!--webbot bot="Timestamp" S-Type="EDITED" S-Format="%d %B, %Y" startspan -->02 August, 2005<!--webbot bot="Timestamp" endspan i-checksum="34600" --></p>

<p>© Copyright Beman Dawes, 2002</p>
<p> Use, modification, and distribution are subject to the Boost Software
License, Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt">
LICENSE_1_0.txt</a> or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">
www.boost.org/LICENSE_1_0.txt</a>)</p>

</body>

</html>