<html>
<head>
<title>Regular Expression Performance Comparison</title>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
<meta name="vs_targetSchema" content="http://schemas.microsoft.com/intellisense/ie5">
<meta name="Template" content="C:\PROGRAM FILES\MICROSOFT OFFICE\OFFICE\html.dot">
<meta name="GENERATOR" content="Microsoft FrontPage Express 2.0">
</head>
<body bgcolor="#ffffff" link="#0000ff" vlink="#800080">
<h2>Regular Expression Performance Comparison</h2>
<p>
The following tables compare the performance of these regular
expression libraries:</p>
<p><a href="http://research.microsoft.com/projects/greta">GRETA</a>.</p>
<p><a href="http://www.boost.org/">The Boost regex library</a>.</p>
<p><a href="http://arglist.com/regex/">Henry Spencer's regular expression library</a>
- this is provided for comparison as a typical non-backtracking implementation.</p>
<P>Philip Hazel's <A href="http://www.pcre.org">PCRE</A> library.</P>
<H3>Details</H3>
<P>Machine: Intel Pentium 4 2.8GHz PC.</P>
<P>Compiler: %compiler%.</P>
<P>C++ Standard Library: %library%.</P>
<P>OS: %os%.</P>
<P>Boost version: %boost%.</P>
<P>PCRE version: %pcre%.</P>
<P>
As ever, care should be taken in interpreting the results. Only sensible regular
expressions (rather than pathological cases) are given; most are taken from the
Boost regex examples, or from the <a href="http://www.regxlib.com/">Library of
Regular Expressions</a>. In addition, some variation in the relative
performance of these libraries can be expected on other machines, as memory
access and processor caching effects can be quite large for most finite state
machine algorithms.</P>
<H3>Averages</H3>
<P>The following are the average relative scores for all the tests. A perfect
regular expression library would score 1; in practice, anything less than 2
is pretty good.</P>
<P>%averages%</P>
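<P>As a rough illustration of how such a relative score could be derived, the
sketch below divides each library's time for a test by the fastest time recorded
for that test, then averages the ratios over all tests. This is an assumption:
the text above states only that a perfect library scores 1, and the function
names <code>relative_scores</code> and <code>average_scores</code> are
hypothetical, not part of any of the libraries compared here.</P>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// For one test: divide each library's time by the fastest time, so the
// fastest library on that test scores exactly 1 (assumed normalisation).
std::vector<double> relative_scores(const std::vector<double>& test_times)
{
    assert(!test_times.empty());
    const double best = *std::min_element(test_times.begin(), test_times.end());
    assert(best > 0.0);  // times must be positive for the ratio to make sense
    std::vector<double> scores;
    scores.reserve(test_times.size());
    for (double t : test_times)
        scores.push_back(t / best);
    return scores;
}

// times[t][l] is the measured time of library l on test t (any consistent unit).
// Returns the average relative score of each library over all tests.
std::vector<double> average_scores(const std::vector<std::vector<double>>& times)
{
    assert(!times.empty());
    std::vector<double> sums(times.front().size(), 0.0);
    for (const auto& test_times : times) {
        assert(test_times.size() == sums.size());
        const std::vector<double> scores = relative_scores(test_times);
        for (std::size_t i = 0; i < sums.size(); ++i)
            sums[i] += scores[i];
    }
    for (double& s : sums)
        s /= static_cast<double>(times.size());
    return sums;
}
```

<P>For example, with two libraries timed at 1.0 and 2.0 on one test and at 4.0
and 2.0 on another, the per-test scores are (1, 2) and (2, 1), so both libraries
average 1.5 under this scheme.</P>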
<h3>Comparison 1: Long Search</h3>
<p>For each of the following regular expressions, the time taken to find all
occurrences of the expression within a long English language text was measured
(<a href="http://www.gutenberg.org/files/3200/old/mtent12.zip">mtent12.txt</a>
from <a href="http://promo.net/pg/">Project Gutenberg</a>, 19MB).</p>
<P>%long_twain_search%</P>
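<P>A minimal sketch of how one such "find all occurrences" measurement might be
taken. Assumptions: it uses <code>std::regex</code> from the C++ standard
library as a stand-in rather than any of the libraries compared here, it is not
the harness that produced these tables, and the names
<code>search_result</code> and <code>time_find_all</code> are hypothetical.</P>

```cpp
#include <chrono>
#include <cstddef>
#include <iterator>
#include <regex>
#include <string>

struct search_result {
    std::size_t matches;  // number of non-overlapping matches found
    double seconds;       // wall-clock time spent searching
};

// Walks every match of `re` in `text` and reports the count and elapsed time.
// The expression is compiled by the caller, so compilation is not timed.
search_result time_find_all(const std::regex& re, const std::string& text)
{
    const auto start = std::chrono::steady_clock::now();
    std::sregex_iterator first(text.begin(), text.end(), re);
    std::sregex_iterator last;  // default-constructed end-of-sequence iterator
    const std::size_t count =
        static_cast<std::size_t>(std::distance(first, last));
    const auto stop = std::chrono::steady_clock::now();
    return { count, std::chrono::duration<double>(stop - start).count() };
}
```

<P>In practice a single pass over a short text is too quick to time reliably, so
a real measurement would typically repeat the search many times and report the
mean; that refinement is left out here for brevity.</P>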
<h3>Comparison 2: Medium-Sized Search</h3>
<p>For each of the following regular expressions, the time taken to find all
occurrences of the expression within a medium-sized English language text was
measured (the first 50K from mtent12.txt).</p>
<P>%short_twain_search%</P>
<H3>Comparison 3: C++ Code Search</H3>
<P>For each of the following regular expressions, the time taken to find all
occurrences of the expression within the C++ source file <A href="../../../boost/crc.hpp">
boost/crc.hpp</A> was measured.</P>
<P>%code_search%</P>
<H3>Comparison 4: HTML Document Search</H3>
<P>For each of the following regular expressions, the time taken to find all
occurrences of the expression within the HTML file <A href="../../libraries.htm">libs/libraries.htm</A>
was measured.</P>
<P>%html_search%</P>
<H3>Comparison 5: Simple Matches</H3>
<p>
For each of the following regular expressions, the time taken to match against
the text indicated was measured.</p>
<P>%short_matches%</P>
<hr>
<p><i>© Copyright John Maddock 2003</i></p>
<p><i>Use, modification and distribution are subject to the Boost Software License,
Version 1.0. (See accompanying file <a href="../../../LICENSE_1_0.txt">LICENSE_1_0.txt</a>
or copy at <a href="http://www.boost.org/LICENSE_1_0.txt">http://www.boost.org/LICENSE_1_0.txt</a>)</i></p>

</body>
</html>