Planet

navi

home

PPS

about

screenshots

download

development

forum

Context Navigation

source: downloads/boost_1_33_1/libs/graph/doc/file_dependency_example.html @ 12

Last change on this file since 12 was 12, checked in by landauf, 18 years ago
added boost
File size: 14.1 KB

Line
1	<HTML>
2	<!--
3	-- Copyright (c) Jeremy Siek 2000
4	--
5	-- Permission to use, copy, modify, distribute and sell this software
6	-- and its documentation for any purpose is hereby granted without fee,
7	-- provided that the above copyright notice appears in all copies and
8	-- that both that copyright notice and this permission notice appear
9	-- in supporting documentation. Silicon Graphics makes no
10	-- representations about the suitability of this software for any
11	-- purpose. It is provided "as is" without express or implied warranty.
12	-->
13	<Head>
14	<Title>File Dependency Example</Title>
15	<BODY BGCOLOR="#ffffff" LINK="#0000ee" TEXT="#000000" VLINK="#551a8b"
16	ALINK="#ff0000">
17	<IMG SRC="../../../boost.png"
18	ALT="C++ Boost" width="277" height="86">
19
20	<BR Clear>
21
22
23	<H1><A NAME="sec:file-depend-eg"></A>
24	File Dependency Example
25	</H1>
26
27	<P>
28	One of the most common uses of the graph abstraction in computer
29	science is to track dependencies. An example of dependency tracking
30	that we deal with on a day to day basis is the compilation
31	dependencies for files in programs that we write. These dependencies
32	are used inside programs such as <TT>make</TT> or in an IDE such as
33	Visual C++ to minimize the number of files that must be recompiled
34	after some changes have been made.
35
36	<P>
37	<A HREF="#fig:file-dep">Figure 1</A> shows a graph that has a vertex
38	for each source file, object file, and library that is used in the
39	<TT>killerapp</TT> program. The edges in the graph show which files
40	are used in creating other files. The choice of which direction to
41	point the arrows is somewhat arbitrary. As long as we are consistent
42	in remembering that the arrows mean ``used by'' then things will work
43	out. The opposite direction would mean ``depends on''.
44
45	<P>
46
47	<P></P>
48	<DIV ALIGN="CENTER"><A NAME="fig:file-dep"></A>
49	<TABLE>
50	<CAPTION ALIGN="BOTTOM"><STRONG>Figure 1:</STRONG>
51	A graph representing file dependencies.</CAPTION>
52	<TR><TD><IMG SRC="./figs/file_dep.gif" width="331" height="351"></TD></TR>
53	</TABLE>
54	</DIV><P></P>
55
56	<P>
57	A compilation system such as <TT>make</TT> has to be able to answer a
58	number of questions:
59
60	<P>
61
62	<OL>
63	<LI>If we need to compile (or recompile) all of the files, what order
64	should that be done it?
65	</LI>
66	<LI>What files can be compiled in parallel?
67	</LI>
68	<LI>If a file is changed, which files must be recompiled?
69	</LI>
70	<LI>Are there any cycles in the dependencies? (which means the user
71	has made a mistake and an error should be emitted)
72	</LI>
73	</OL>
74
75	<P>
76	In the following examples we will formulate each of these questions in
77	terms of the dependency graph, and then find a graph algorithm to
78	provide the solution. The graph in <A HREF="#fig:file-dep">Figure
79	1</A> will be used in all of the following examples. The source code
80	for this example can be found in the file <a
81	href="../example/file_dependencies.cpp"><TT>examples/file_dependencies.cpp</TT></a>.
82
83	<P>
84
85	<H2>Graph Setup</H2>
86
87	<P>
88	Here we show the construction of the graph. For simplicity we have
89	constructed the graph "by-hand". A compilation system such
90	as <TT>make</TT> would instead parse a <TT>Makefile</TT> to get the
91	list of files and to set-up the dependencies. We use the
92	<TT>adjacency_list</TT> class to represent the graph. The
93	<TT>vecS</TT> selector means that a <TT>std::vector</TT> will be used
94	to represent each edge-list, which provides efficient traversal. The
95	<TT>directedS</TT> selector means we want a directed graph, and the
96	<TT>color_property</TT> attaches a color property to each vertex of the
97	graph. The color property will be used in several of the algorithms in
98	the following sections.
99
100	<P>
101	<PRE>
102	enum files_e { dax_h, yow_h, boz_h, zow_h, foo_cpp,
103	foo_o, bar_cpp, bar_o, libfoobar_a,
104	zig_cpp, zig_o, zag_cpp, zag_o,
105	libzigzag_a, killerapp, N };
106	const char* name[] = { "dax.h", "yow.h", "boz.h", "zow.h", "foo.cpp",
107	"foo.o", "bar.cpp", "bar.o", "libfoobar.a",
108	"zig.cpp", "zig.o", "zag.cpp", "zag.o",
109	"libzigzag.a", "killerapp" };
110
111	typedef std::pair<int, int> Edge;
112	Edge used_by[] = {
113	Edge(dax_h, foo_cpp), Edge(dax_h, bar_cpp), Edge(dax_h, yow_h),
114	Edge(yow_h, bar_cpp), Edge(yow_h, zag_cpp),
115	Edge(boz_h, bar_cpp), Edge(boz_h, zig_cpp), Edge(boz_h, zag_cpp),
116	Edge(zow_h, foo_cpp),
117	Edge(foo_cpp, foo_o),
118	Edge(foo_o, libfoobar_a),
119	Edge(bar_cpp, bar_o),
120	Edge(bar_o, libfoobar_a),
121	Edge(libfoobar_a, libzigzag_a),
122	Edge(zig_cpp, zig_o),
123	Edge(zig_o, libzigzag_a),
124	Edge(zag_cpp, zag_o),
125	Edge(zag_o, libzigzag_a),
126	Edge(libzigzag_a, killerapp)
127	};
128
129	using namespace boost;
130	typedef adjacency_list<vecS, vecS, directedS,
131	property<vertex_color_t, default_color_type>,
132	property<edge_weight_t, int>
133	> Graph;
134	Graph g(N, used_by, used_by + sizeof(used_by) / sizeof(Edge));
135	typedef graph_traits<Graph>::vertex_descriptor Vertex;
136	</PRE>
137
138	<P>
139
140	<H2>Compilation Order (All Files)</H2>
141
142	<P>
143	On the first invocation of <TT>make</TT> for a particular project, all
144	of the files must be compiled. Given the dependencies between the
145	various files, what is the correct order in which to compile and link
146	them? First we need to formulate this in terms of a graph. Finding a
147	compilation order is the same as ordering the vertices in the graph.
148	The constraint on the ordering is the file dependencies which we have
149	represented as edges. So if there is an edge <i>(u,v)</i> in the graph
150	then <i>v</i> better not come before <i>u</i> in the ordering. It
151	turns out that this kind of constrained ordering is called a
152	<I>topological sort</I>. Therefore, answering the question of
153	compilation order is as easy as calling the BGL algorithm <a
154	href="./topological_sort.html"><TT>topological_sort()</TT></a>. The
155	traditional form of the output for topological sort is a linked-list
156	of the sorted vertices. The BGL algorithm instead puts the sorted
157	vertices into any <a
158	href="http://www.sgi.com/tech/stl/OutputIterator.html">OutputIterator</a>,
159	which allows for much more flexibility. Here we use the
160	<TT>std::front_insert_iterator</TT> to create an output iterator that
161	inserts the vertices on the front of a linked list. Other possible
162	options are writing the output to a file or inserting into a different
163	STL or custom-made container.
164
165	<P>
166	<PRE>
167	typedef std::list<Vertex> MakeOrder;
168	MakeOrder make_order;
169	boost::topological_sort(g, std::front_inserter(make_order));
170
171	std::cout << "make ordering: ";
172	for (MakeOrder::iterator i = make_order.begin();
173	i != make_order.end(); ++i)
174	std::cout << name[*i] << " ";
175	std::cout << std::endl;
176	</PRE>
177	The output of this is:
178	<PRE>
179	make ordering: zow.h boz.h zig.cpp zig.o dax.h yow.h zag.cpp \
180	zag.o bar.cpp bar.o foo.cpp foo.o libfoobar.a libzigzag.a killerapp
181	</PRE>
182
183	<P>
184
185	<H2><A NAME="sec:parallel-compilation"></A>
186	Parallel Compilation
187	</H2>
188
189	<P>
190	Another question the compilation system might need to answer is: what
191	files can be compiled simultaneously? This would allow the system to
192	spawn threads and utilize multiple processors to speed up the build.
193	This question can also be put in a slightly different way: what is the
194	earliest time that a file can be built assuming that an unlimited
195	number of files can be built at the same time? The main criteria for
196	when a file can be built is that all of the files it depends on must
197	already be built. To simplify things for this example, we'll assume
198	that each file takes 1 time unit to build (even header files). The
199	main observation for determining the ``time slot'' for a file is that
200	the time slot must be one more than the maximum time-slot of the files
201	it depends on.
202
203	<P>
204	This idea of calculating a value based on the previously computed
205	values of neighboring vertices is the same idea behind Dijkstra's
206	single-source shortest paths algorithm (see <a
207	href="./dijkstra_shortest_paths.html"><tt>dijkstra_shortest_paths()</tt></a>). The
208	main difference between this situation and a shortest-path algorithm
209	is that we want to use the maximum of the neighbors' values instead of
210	the minimum. In addition, we do not have a single source
211	vertex. Instead we will want to treat all vertices with in-degree of
212	zero as sources (i.e., vertices with no edges coming into them). So we
213	use Dijkstra's algorithm with several extra parameters instead
214	of relying on the defaults.
215
216	<P>
217	To use <TT>dijkstra_shortest_paths()</TT>, we must first set up the vertex
218	and edge properties that will be used in the algorithm. We will need
219	a time property (replacing the distance property of Dijkstra's
220	algorithm) and an edge weight property. We will use a
221	<TT>std::vector</TT> to store the time. The weight property has already
222	been attached to the graph via a plug-in so here we just declare an
223	map for the internal weight property.
224
225	<P>
226	<PRE>
227	std::vector<int> time(N, 0);
228	typedef std::vector<int>::iterator Time;
229	using boost::edge_weight_t;
230	typedef boost::property_map<Graph, edge_weight_t>::type Weight;
231	Weight weight = get(edge_weight, g);
232	</PRE>
233
234	<P>
235	The next step is to identify the vertices with zero in-degree which
236	will be our ``source'' vertices from which to start the shortest path
237	searches. The in-degrees can be calculated with the following loop.
238
239	<P>
240	<PRE>
241	std::vector<int> in_degree(N, 0);
242	Graph::vertex_iterator i, iend;
243	Graph::out_edge_iterator j, jend;
244	for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
245	for (boost::tie(j, jend) = out_edges(*i, g); j != jend; ++j)
246	in_degree[target(*j, g)] += 1;
247	</PRE>
248
249	<P>
250	Next we need to define comparison of the "cost". In this
251	case we want each file to have a time stamp greater than any of its
252	predecessors. Therefore we define comparison with the
253	<TT>std::greater<int></TT> function object. We also need to
254	tell the algorithm that we want to use addition to combine time
255	values, so we use <TT>std::plus<int></TT>.
256
257	<P>
258	<PRE>
259	std::greater<int> compare;
260	std::plus<int> combine;
261	</PRE>
262
263	<P>
264	We are now ready to call <TT>uniform_cost_search()</TT>. We just
265	loop through all the vertices in the graph, and invoke the algorithm
266	if the vertex has zero in-degree.
267
268	<P>
269	<PRE>
270	for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
271	if (in_degree[*i] == 0)
272	boost::dijkstra_shortest_paths(g, *i,
273	distance_map(&time[0]).
274	weight_map(weight).
275	distance_compare(compare).
276	distance_combine(combine));
277	</PRE>
278
279	<P>
280	Last, we output the time-slot that we've calculated for each vertex.
281
282	<P>
283	<PRE>
284	std::cout << "parallel make ordering, " << std::endl
285	<< " vertices with same group number" << std::endl
286	<< " can be made in parallel" << std::endl << std::endl;
287	for (boost::tie(i, iend) = vertices(g); i != iend; ++i)
288	std::cout << "time_slot[" << name[i] << "] = " << time[i] << std::endl;
289	</PRE>
290	The output is:
291	<PRE>
292	parallel make ordering,
293	vertices with same group number
294	can be made in parallel
295	time_slot[dax.h] = 0
296	time_slot[yow.h] = 1
297	time_slot[boz.h] = 0
298	time_slot[zow.h] = 0
299	time_slot[foo.cpp] = 1
300	time_slot[foo.o] = 2
301	time_slot[bar.cpp] = 2
302	time_slot[bar.o] = 3
303	time_slot[libfoobar.a] = 4
304	time_slot[zig.cpp] = 1
305	time_slot[zig.o] = 2
306	time_slot[zag.cpp] = 2
307	time_slot[zag.o] = 3
308	time_slot[libzigzag.a] = 5
309	time_slot[killerapp] = 6
310	</PRE>
311
312	<P>
313
314	<H2><A NAME="SECTION001014000000000000000"></A>
315	<A NAME="sec:cycles"></A>
316	<BR>
317	Cyclic Dependencies
318	</H2>
319
320	<P>
321	Another question the compilation system needs to be able to answer is
322	whether there are any cycles in the dependencies. If there are cycles,
323	the system will need to report an error to the user so that the cycles
324	can be removed. One easy way to detect a cycle is to run a <a
325	href="./depth_first_search.html">depth-first search</a>, and if the
326	search runs into an already discovered vertex (of the current search
327	tree), then there is a cycle. The BGL graph search algorithms (which
328	includes
329	<TT>depth_first_search()</TT>) are all extensible via the
330	<I>visitor</I> mechanism. A visitor is similar to a function object,
331	but it has several methods instead of just the one
332	<TT>operator()</TT>. The visitor's methods are called at certain
333	points within the algorithm, thereby giving the user a way to extend
334	the functionality of the graph search algorithms. See Section <A
335	HREF="visitor_concepts.html">Visitor Concepts</A>
336	for a detailed description of visitors.
337
338	<P>
339	We will create a visitor class and fill in the <TT>back_edge()</TT>
340	method, which is the <a href="./DFSVisitor.html">DFSVisitor</a> method
341	that is called when DFS explores an edge to an already discovered
342	vertex. A call to this method indicates the existence of a
343	cycle. Inheriting from <TT>dfs_visitor<></TT>
344	provides the visitor with empty versions of the other visitor methods.
345	Once our visitor is created, we can construct and object and pass it
346	to the BGL algorithm. Visitor objects are passed by value inside of
347	the BGL algorithms, so the <TT>has_cycle</TT> flag is stored by
348	reference in this visitor.
349
350	<P>
351	<PRE>
352	struct cycle_detector : public dfs_visitor<>
353	{
354	cycle_detector( bool& has_cycle)
355	: _has_cycle(has_cycle) { }
356
357	template <class Edge, class Graph>
358	void back_edge(Edge, Graph&) {
359	_has_cycle = true;
360	}
361	protected:
362	bool& _has_cycle;
363	};
364	</PRE>
365
366	<P>
367	We can now invoke the BGL <TT>depth_first_search()</TT>
368	algorithm and pass in the cycle detector visitor.
369
370	<P>
371	<PRE>
372	bool has_cycle = false;
373	cycle_detector vis(has_cycle);
374	boost::depth_first_search(g, visitor(vis));
375	std::cout << "The graph has a cycle? " << has_cycle << std::endl;
376	</PRE>
377
378	<P>
379	The graph in <A HREF="#fig:file-dep">Figure 1</A> (ignoring the dotted
380	line) did not have any cycles, but if we add a dependency between
381	<TT>bar.cpp</TT> and <TT>dax.h</TT> there there will be. Such a
382	dependency would be flagged as a user error.
383
384	<P>
385	<PRE>
386	add_edge(bar_cpp, dax_h, g);
387	</PRE>
388
389
390	<br>
391	<HR>
392	<TABLE>
393	<TR valign=top>
394	<TD nowrap>Copyright &copy 2000-2001</TD><TD>
395	<A HREF="../../../people/jeremy_siek.htm">Jeremy Siek</A>, Indiana University (<A HREF="mailto:jsiek@osl.iu.edu">jsiek@osl.iu.edu</A>)
396	</TD></TR></TABLE>
397
398	</BODY>
399	</HTML>

Note: See TracBrowser for help on using the repository browser.

Download in other formats: