Planet
navi homePPSaboutscreenshotsdownloaddevelopmentforum

source: downloads/boost_1_33_1/libs/graph/doc/file_dependency_example.html @ 12

Last change on this file since 12 was 12, checked in by landauf, 18 years ago

added boost

File size: 14.1 KB
Line 
1<HTML>
2<!--
3  -- Copyright (c) Jeremy Siek 2000
4  --
5  -- Permission to use, copy, modify, distribute and sell this software
6  -- and its documentation for any purpose is hereby granted without fee,
7  -- provided that the above copyright notice appears in all copies and
8  -- that both that copyright notice and this permission notice appear
9  -- in supporting documentation.  Silicon Graphics makes no
10  -- representations about the suitability of this software for any
11  -- purpose.  It is provided "as is" without express or implied warranty.
12  -->
13<Head>
14<Title>File Dependency Example</Title>
15<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b" 
16        ALINK="#ff0000"> 
17<IMG SRC="../../../boost.png" 
18     ALT="C++ Boost" width="277" height="86"> 
19
20<BR Clear>
21
22
23<H1><A NAME="sec:file-depend-eg"></A>
24File Dependency Example
25</H1>
26
27<P>
28One of the most common uses of the graph abstraction in computer
29science is to track dependencies. An example of dependency tracking
30that we deal with on a day to day basis is the compilation
31dependencies for files in programs that we write. These dependencies
32are used inside programs such as <TT>make</TT> or in an IDE such as
33Visual C++ to minimize the number of files that must be recompiled
34after some changes have been made.
35
36<P>
37<A HREF="#fig:file-dep">Figure 1</A> shows a graph that has a vertex
38for each source file, object file, and library that is used in the
39<TT>killerapp</TT> program. The edges in the graph show which files
40are used in creating other files. The choice of which direction to
41point the arrows is somewhat arbitrary. As long as we are consistent
42in remembering that the arrows mean ``used by'' then things will work
43out. The opposite direction would mean ``depends on''.
44
45<P>
46
47<P></P>
48<DIV ALIGN="CENTER"><A NAME="fig:file-dep"></A>
49<TABLE>
50<CAPTION ALIGN="BOTTOM"><STRONG>Figure 1:</STRONG>
51A graph representing file dependencies.</CAPTION>
52<TR><TD><IMG SRC="./figs/file_dep.gif" width="331" height="351"></TD></TR>
53</TABLE>
54</DIV><P></P>
55
56<P>
57A compilation system such as <TT>make</TT> has to be able to answer a
58number of questions:
59
60<P>
61
62<OL>
63<LI>If we need to compile (or recompile) all of the files, what order
64   should that be done it?
65</LI>
66<LI>What files can be compiled in parallel?
67</LI>
68<LI>If a file is changed, which files must be recompiled?
69</LI>
70<LI>Are there any cycles in the dependencies? (which means the user
71  has made a mistake and an error should be emitted)
72</LI>
73</OL>
74
75<P>
76In the following examples we will formulate each of these questions in
77terms of the dependency graph, and then find a graph algorithm to
78provide the solution. The graph in <A HREF="#fig:file-dep">Figure
791</A> will be used in all of the following examples. The source code
80for this example can be found in the file <a
81href="../example/file_dependencies.cpp"><TT>examples/file_dependencies.cpp</TT></a>.
82
83<P>
84
85<H2>Graph Setup</H2>
86
87<P>
88Here we show the construction of the graph.  For simplicity we have
89constructed the graph &quot;by-hand&quot;. A compilation system such
90as <TT>make</TT> would instead parse a <TT>Makefile</TT> to get the
91list of files and to set-up the dependencies. We use the
92<TT>adjacency_list</TT> class to represent the graph. The
93<TT>vecS</TT> selector means that a <TT>std::vector</TT> will be used
94to represent each edge-list, which provides efficient traversal. The
95<TT>directedS</TT> selector means we want a directed graph, and the
96<TT>color_property</TT> attaches a color property to each vertex of the
97graph. The color property will be used in several of the algorithms in
98the following sections.
99
100<P>
101<PRE>
102  enum files_e { dax_h, yow_h, boz_h, zow_h, foo_cpp,
103                 foo_o, bar_cpp, bar_o, libfoobar_a,
104                 zig_cpp, zig_o, zag_cpp, zag_o,
105                 libzigzag_a, killerapp, N };
106  const char* name[] = { "dax.h", "yow.h", "boz.h", "zow.h", "foo.cpp",
107                         "foo.o", "bar.cpp", "bar.o", "libfoobar.a",
108                         "zig.cpp", "zig.o", "zag.cpp", "zag.o",
109                         "libzigzag.a", "killerapp" };
110
111  typedef std::pair&lt;int, int&gt; Edge;
112  Edge used_by[] = {
113    Edge(dax_h, foo_cpp), Edge(dax_h, bar_cpp), Edge(dax_h, yow_h),
114    Edge(yow_h, bar_cpp), Edge(yow_h, zag_cpp),
115    Edge(boz_h, bar_cpp), Edge(boz_h, zig_cpp), Edge(boz_h, zag_cpp),
116    Edge(zow_h, foo_cpp),
117    Edge(foo_cpp, foo_o),
118    Edge(foo_o, libfoobar_a),
119    Edge(bar_cpp, bar_o),
120    Edge(bar_o, libfoobar_a),
121    Edge(libfoobar_a, libzigzag_a),
122    Edge(zig_cpp, zig_o),
123    Edge(zig_o, libzigzag_a),
124    Edge(zag_cpp, zag_o),
125    Edge(zag_o, libzigzag_a),
126    Edge(libzigzag_a, killerapp)
127  };
128
129  using namespace boost;
130  typedef adjacency_list&lt;vecS, vecS, directedS,
131      property&lt;vertex_color_t, default_color_type&gt;,
132      property&lt;edge_weight_t, int&gt;
133    &gt; Graph;
134  Graph g(N, used_by, used_by + sizeof(used_by) / sizeof(Edge));
135  typedef graph_traits&lt;Graph&gt;::vertex_descriptor Vertex;
136</PRE>
137
138<P>
139
140<H2>Compilation Order (All Files)</H2>
141
142<P>
143On the first invocation of <TT>make</TT> for a particular project, all
144of the files must be compiled. Given the dependencies between the
145various files, what is the correct order in which to compile and link
146them? First we need to formulate this in terms of a graph. Finding a
147compilation order is the same as ordering the vertices in the graph.
148The constraint on the ordering is the file dependencies which we have
149represented as edges. So if there is an edge <i>(u,v)</i> in the graph
150then <i>v</i> better not come before <i>u</i> in the ordering. It
151turns out that this kind of constrained ordering is called a
152<I>topological sort</I>. Therefore, answering the question of
153compilation order is as easy as calling the BGL algorithm <a
154href="./topological_sort.html"><TT>topological_sort()</TT></a>. The
155traditional form of the output for topological sort is a linked-list
156of the sorted vertices.  The BGL algorithm instead puts the sorted
157vertices into any <a
158href="http://www.sgi.com/tech/stl/OutputIterator.html">OutputIterator</a>,
159which allows for much more flexibility.  Here we use the
160<TT>std::front_insert_iterator</TT> to create an output iterator that
161inserts the vertices on the front of a linked list.  Other possible
162options are writing the output to a file or inserting into a different
163STL or custom-made container.
164
165<P>
166<PRE>
167  typedef std::list&lt;Vertex&gt; MakeOrder;
168  MakeOrder make_order;
169  boost::topological_sort(g, std::front_inserter(make_order));
170   
171  std::cout &lt;&lt; "make ordering: ";
172  for (MakeOrder::iterator i = make_order.begin();
173       i != make_order.end(); ++i)
174    std::cout &lt;&lt; name[*i] &lt;&lt; " ";
175  std::cout &lt;&lt; std::endl;
176</PRE>
177The output of this is:
178<PRE>
179  make ordering: zow.h boz.h zig.cpp zig.o dax.h yow.h zag.cpp \
180  zag.o bar.cpp bar.o foo.cpp foo.o libfoobar.a libzigzag.a killerapp
181</PRE>
182
183<P>
184
185<H2><A NAME="sec:parallel-compilation"></A>
186Parallel Compilation
187</H2>
188
189<P>
190Another question the compilation system might need to answer is: what
191files can be compiled simultaneously? This would allow the system to
192spawn threads and utilize multiple processors to speed up the build.
193This question can also be put in a slightly different way: what is the
194earliest time that a file can be built assuming that an unlimited
195number of files can be built at the same time? The main criteria for
196when a file can be built is that all of the files it depends on must
197already be built. To simplify things for this example, we'll assume
198that each file takes 1 time unit to build (even header files). The
199main observation for determining the ``time slot'' for a file is that
200the time slot must be one more than the maximum time-slot of the files
201it depends on.
202
203<P>
204This idea of calculating a value based on the previously computed
205values of neighboring vertices is the same idea behind Dijkstra's
206single-source shortest paths algorithm (see <a
207href="./dijkstra_shortest_paths.html"><tt>dijkstra_shortest_paths()</tt></a>). The
208main difference between this situation and a shortest-path algorithm
209is that we want to use the maximum of the neighbors' values instead of
210the minimum.  In addition, we do not have a single source
211vertex. Instead we will want to treat all vertices with in-degree of
212zero as sources (i.e., vertices with no edges coming into them). So we
213use Dijkstra's algorithm with several extra parameters instead
214of relying on the defaults.
215
216<P>
217To use <TT>dijkstra_shortest_paths()</TT>, we must first set up the vertex
218and edge properties that will be used in the algorithm.  We will need
219a time property (replacing the distance property of Dijkstra's
220algorithm) and an edge weight property. We will use a
221<TT>std::vector</TT> to store the time. The weight property has already
222been attached to the graph via a plug-in so here we just declare an
223map for the internal weight property.
224
225<P>
226<PRE>
227  std::vector&lt;int&gt; time(N, 0);
228  typedef std::vector&lt;int&gt;::iterator Time;
229  using boost::edge_weight_t;
230  typedef boost::property_map&lt;Graph, edge_weight_t&gt;::type Weight;
231  Weight weight = get(edge_weight, g);
232</PRE>
233
234<P>
235The next step is to identify the vertices with zero in-degree which
236will be our ``source'' vertices from which to start the shortest path
237searches. The in-degrees can be calculated with the following loop.
238
239<P>
240<PRE>
241  std::vector&lt;int&gt; in_degree(N, 0);
242  Graph::vertex_iterator i, iend;
243  Graph::out_edge_iterator j, jend;
244  for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
245    for (boost::tie(j, jend) = out_edges(*i, g); j != jend; ++j)
246      in_degree[target(*j, g)] += 1;
247</PRE>
248
249<P>
250Next we need to define comparison of the &quot;cost&quot;. In this
251case we want each file to have a time stamp greater than any of its
252predecessors.  Therefore we define comparison with the
253<TT>std::greater&lt;int&gt;</TT> function object.  We also need to
254tell the algorithm that we want to use addition to combine time
255values, so we use <TT>std::plus&lt;int&gt;</TT>.
256
257<P>
258<PRE>
259  std::greater&lt;int&gt; compare;
260  std::plus&lt;int&gt; combine;
261</PRE>
262
263<P>
264We are now ready to call <TT>uniform_cost_search()</TT>. We just
265loop through all the vertices in the graph, and invoke the algorithm
266if the vertex has zero in-degree.
267
268<P>
269<PRE>
270  for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
271    if (in_degree[*i] == 0)
272      boost::dijkstra_shortest_paths(g, *i,
273                                     distance_map(&amp;time[0]).
274                                     weight_map(weight).
275                                     distance_compare(compare).
276                                     distance_combine(combine));
277</PRE>
278
279<P>
280Last, we output the time-slot that we've calculated for each vertex.
281
282<P>
283<PRE>
284  std::cout &lt;&lt; "parallel make ordering, " &lt;&lt; std::endl
285       &lt;&lt; "  vertices with same group number" &lt;&lt; std::endl
286       &lt;&lt; "  can be made in parallel" &lt;&lt; std::endl &lt;&lt; std::endl;
287  for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
288    std::cout &lt;&lt; "time_slot[" &lt;&lt; name[*i] &lt;&lt; "] = " &lt;&lt; time[*i] &lt;&lt; std::endl;
289</PRE>
290The output is:
291<PRE>
292  parallel make ordering,
293    vertices with same group number
294    can be made in parallel
295  time_slot[dax.h] = 0
296  time_slot[yow.h] = 1
297  time_slot[boz.h] = 0
298  time_slot[zow.h] = 0
299  time_slot[foo.cpp] = 1
300  time_slot[foo.o] = 2
301  time_slot[bar.cpp] = 2
302  time_slot[bar.o] = 3
303  time_slot[libfoobar.a] = 4
304  time_slot[zig.cpp] = 1
305  time_slot[zig.o] = 2
306  time_slot[zag.cpp] = 2
307  time_slot[zag.o] = 3
308  time_slot[libzigzag.a] = 5
309  time_slot[killerapp] = 6
310</PRE>
311
312<P>
313
314<H2><A NAME="SECTION001014000000000000000"></A>
315<A NAME="sec:cycles"></A>
316<BR>
317Cyclic Dependencies
318</H2>
319
320<P>
321Another question the compilation system needs to be able to answer is
322whether there are any cycles in the dependencies. If there are cycles,
323the system will need to report an error to the user so that the cycles
324can be removed. One easy way to detect a cycle is to run a <a
325href="./depth_first_search.html">depth-first search</a>, and if the
326search runs into an already discovered vertex (of the current search
327tree), then there is a cycle. The BGL graph search algorithms (which
328includes
329<TT>depth_first_search()</TT>) are all extensible via the
330<I>visitor</I> mechanism. A visitor is similar to a function object,
331but it has several methods instead of just the one
332<TT>operator()</TT>.  The visitor's methods are called at certain
333points within the algorithm, thereby giving the user a way to extend
334the functionality of the graph search algorithms. See Section <A
335HREF="visitor_concepts.html">Visitor Concepts</A>
336for a detailed description of visitors.
337
338<P>
339We will create a visitor class and fill in the <TT>back_edge()</TT>
340method, which is the <a href="./DFSVisitor.html">DFSVisitor</a> method
341that is called when DFS explores an edge to an already discovered
342vertex. A call to this method indicates the existence of a
343cycle. Inheriting from <TT>dfs_visitor&lt;&gt;</TT> 
344provides the visitor with empty versions of the other visitor methods.
345Once our visitor is created, we can construct and object and pass it
346to the BGL algorithm.  Visitor objects are passed by value inside of
347the BGL algorithms, so the <TT>has_cycle</TT> flag is stored by
348reference in this visitor.
349
350<P>
351<PRE>
352  struct cycle_detector : public dfs_visitor&lt;&gt;
353  {
354    cycle_detector( bool&amp; has_cycle)
355      : _has_cycle(has_cycle) { }
356
357    template &lt;class Edge, class Graph&gt;
358    void back_edge(Edge, Graph&amp;) {
359      _has_cycle = true;
360    }
361  protected:
362    bool&amp; _has_cycle;
363  };
364</PRE>
365
366<P>
367We can now invoke the BGL <TT>depth_first_search()</TT>
368algorithm and pass in the cycle detector visitor.
369
370<P>
371<PRE>
372  bool has_cycle = false;
373  cycle_detector vis(has_cycle);
374  boost::depth_first_search(g, visitor(vis));
375  std::cout &lt;&lt; "The graph has a cycle? " &lt;&lt; has_cycle &lt;&lt; std::endl;
376</PRE>
377
378<P>
379The graph in <A HREF="#fig:file-dep">Figure 1</A> (ignoring the dotted
380line) did not have any cycles, but if we add a dependency between
381<TT>bar.cpp</TT> and <TT>dax.h</TT> there there will be. Such a
382dependency would be flagged as a user error.
383
384<P>
385<PRE>
386  add_edge(bar_cpp, dax_h, g);
387</PRE>
388
389
390<br>
391<HR>
392<TABLE>
393<TR valign=top>
394<TD nowrap>Copyright &copy 2000-2001</TD><TD>
395<A HREF="../../../people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
396</TD></TR></TABLE>
397
398</BODY>
399</HTML> 
Note: See TracBrowser for help on using the repository browser.