Add new performance numbers; no discussion yet. The two obvious
conclusions are that our PCH generation is way faster than gcc's, and
that the Python-based driver kills compile times.


git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@65980 91177308-0d34-0410-b5e6-96231b3b80d8
Daniel Dunbar 2009-03-04 00:04:28 +00:00
Parent d198abae59
Commit 3d1c9462e3
8 changed files with 3677 additions and 77 deletions


@@ -0,0 +1,134 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
<title>Clang - Performance</title>
<link type="text/css" rel="stylesheet" href="menu.css" />
<link type="text/css" rel="stylesheet" href="content.css" />
<style type="text/css">
</style>
</head>
<body>
<!--#include virtual="menu.html.incl"-->
<div id="content">
<!--*************************************************************************-->
<h1>Clang - Performance</h1>
<!--*************************************************************************-->
<p>This page tracks the compile time performance of Clang on two
interesting benchmarks:
<ul>
<li><i>Sketch</i>: The Objective-C example application shipped on
Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
"typical" Objective-C app. The source itself has a relatively
small amount of code (~7,500 lines of source code), but it relies
on the extensive Cocoa APIs to build its functionality. Like many
Objective-C applications, it includes
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
significant stress test of the front-end's performance on lexing,
preprocessing, parsing, and syntax analysis.</li>
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
large amount of C source code (~220,000 lines) with few system
dependencies. This stresses the back-end's performance on generating
assembly code and debug information.</li>
</ul>
</p>
<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<!--*************************************************************************-->
<p>Measurements are done by serially processing each file in the
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
order to track the performance of various subsystems the timings have
been broken down into separate stages where possible:
<ul>
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
produce any output. For gcc and llvm-gcc, the <tt>-MM</tt> option is used
as a rough equivalent to this step.</li>
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
but without semantic analysis or any output. gcc and llvm-gcc have
no equivalent for this option.</li>
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
analysis.</li>
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
converts to the LLVM intermediate representation but doesn't
generate native code.</li>
<li><tt>-S -O0</tt>: Perform actual code generation to produce a
native assembler file.</li>
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
the assembly output.</li>
</ul>
</p>
<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference of running with <tt>-fsyntax-only</tt> versus running
with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
llvm-gcc. This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
<p>These timings are chosen to break down the compilation process for
clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given -O level, so we only focus on -O0
compile time here.</p>
<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<!--*************************************************************************-->
<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<!--=======================================================================-->
<center><h4>Sketch</h4></center>
<img class="img_slide"
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>
<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis; over 90% faster on
-fsyntax-only. As expected, time spent in code generation for this
benchmark is relatively small. One caveat: Clang's debug information
generation for Objective-C is very incomplete; this means the <tt>-S
-O0 -g</tt> numbers are unfair since Clang is generating substantially
less output.</p>
<p>This chart also shows the effect of using precompiled headers (PCH)
on compiler time. gcc and llvm-gcc see a large performance improvement
with PCH; about 4x in wall time. Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>
<center><h4>176.gcc</h4></center>
<img class="img_slide"
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>
<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time, but the improved parsing and semantic analysis
performance means Clang still comes in at ~29% faster versus gcc
on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>
<p>These numbers indicate that Clang still has room for improvement in
several areas: notably, our LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>
</div>
</body>
</html>


@@ -19,7 +19,7 @@
<h1>Clang - Performance</h1>
<!--*************************************************************************-->
<p>This page tracks the compile time performance of Clang on two
<p>This page shows the compile time performance of Clang on two
interesting benchmarks:
<ul>
<li><i>Sketch</i>: The Objective-C example application shipped on
@@ -27,107 +27,85 @@ interesting benchmarks:
"typical" Objective-C app. The source itself has a relatively
small amount of code (~7,500 lines of source code), but it relies
on the extensive Cocoa APIs to build its functionality. Like many
Objective-C applications, it includes
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
significant stress test of the front-end's performance on lexing,
preprocessing, parsing, and syntax analysis.</li>
Objective-C applications, it includes <tt>Cocoa/Cocoa.h</tt> in
all of its source files, which represents a significant stress
test of the front-end's performance on lexing, preprocessing,
parsing, and syntax analysis.</li>
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
large amount of C source code (~220,000 lines) with few system
dependencies. This stresses the back-end's performance on generating
assembly code and debug information.</li>
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
large amount of C source code (~200,000 lines) with few system
dependencies. This stresses the back-end's performance on generating
assembly code and debug information.</li>
</ul>
</p>
<p>
For previous performance numbers, please
go <a href="performance-2008-10-31.html">here</a>.
</p>
<!--*************************************************************************-->
<h2><a name="enduser">Experiments</a></h2>
<h2><a name="experiments">Experiments</a></h2>
<!--*************************************************************************-->
<p>Measurements are done by serially processing each file in the
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
order to track the performance of various subsystems the timings have
been broken down into separate stages where possible:
<p>Measurements are done by running a full build (using xcodebuild or
make for Sketch and 176.gcc respectively) using Clang and gcc 4.2 as
compilers; gcc is run both with and without the new clang driver (ccc)
in order to evaluate the overhead of the driver itself.</p>
<p>In order to track the performance of various subsystems the timings
have been broken down into separate stages where possible. This is
done by overriding the CC environment variable used during the build
to point to one of a few simple shell scripts which may skip part of
the build.
<ul>
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
perform any output. For gcc and llvm-gcc, the -MM option is used
as a rough equivalent to this step.</li>
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
but without semantic analysis or any output. gcc and llvm-gcc have
no equivalent for this option.</li>
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
analysis.</li>
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
converts to the LLVM intermediate representation but doesn't
generate native code.</li>
<li><tt>-S -O0</tt>: Perform actual code generation to produce a
native assembler file.</li>
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
the assembly output.</li>
<li><tt>non-compiler</tt>: The overhead of the build system itself;
for Sketch this also includes the time to build/copy various
non-source code resource files.</li>
<li><tt>+ driver</tt>: Add execution of the driver, but do not execute any
commands (by using the -### driver option).</li>
<li><tt>+ pch gen</tt>: Add generation of PCH files.</li>
<li><tt>+ cpp</tt>: Add preprocessing of source files (this time is
included in syntax for gcc).</li>
<li><tt>+ parse</tt>: Add parsing of source files (this time is
included in syntax for gcc).</li>
<li><tt>+ syntax</tt>: Add semantic checking of source files (for
gcc, this includes preprocessing and parsing as well).</li>
<li><tt>+ IRgen</tt>: Add generation of LLVM IR (gcc has no
corresponding phase).</li>
<li><tt>+ codegen</tt>: Add generation of assembler files.</li>
<li><tt>+ assembler</tt>: Add assembler time to generate .o files.</li>
<li><tt>+ linker</tt>: Add linker time.</li>
</ul>
</p>
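<p>As a rough illustration of the CC override described above, a stage
wrapper could be as small as the sketch below. It is written in Python purely
for illustration: the page says the real wrappers are shell scripts, and their
contents are not shown here. The function name <tt>build_command</tt>, the
<tt>STAGE_FLAGS</tt> table, and the pairing of <tt>+ syntax</tt> with
<tt>-fsyntax-only</tt> are assumptions; only the use of <tt>-###</tt> for the
<tt>+ driver</tt> stage comes from the list above.</p>

```python
from typing import Dict, List

# Hypothetical table: extra flags a wrapper might add for a given stage.
# Only the "-###" entry is taken from the page; the rest is an assumption.
STAGE_FLAGS: Dict[str, List[str]] = {
    "+ driver": ["-###"],          # run the driver, print commands, execute none
    "+ syntax": ["-fsyntax-only"], # assumed: stop after semantic analysis
}


def build_command(real_cc: str, stage: str, args: List[str]) -> List[str]:
    """Return the command line a wrapper would run for one compiler invocation.

    real_cc is the compiler being measured, stage is one of the stage names,
    and args are the arguments the build system passed to CC. Stages that are
    not in STAGE_FLAGS pass the arguments through unchanged.
    """
    return [real_cc] + STAGE_FLAGS.get(stage, []) + list(args)
```

<p>A real wrapper would then execute the returned command. For example,
<tt>build_command("gcc", "+ driver", ["-c", "a.c"])</tt> yields
<tt>["gcc", "-###", "-c", "a.c"]</tt>.</p>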
<p>This set of stages is chosen to be approximately additive; that is,
each subsequent stage simply adds some additional processing. The
timings measure the delta of the given stage from the previous
one. For example, the timings for <tt>-fsyntax-only</tt> below show
the difference of running with <tt>-fsyntax-only</tt> versus running
with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
llvm-gcc. This amounts to a fairly accurate measure of only the time
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
<p>These timings are chosen to break down the compilation process for
clang as much as possible. The graphs below show these numbers
combined so that it is easy to see how the time for a particular task
is divided among various components. For example, <tt>-S -O0</tt>
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
faster than the GCC optimizers at a given -O level, so we only focus on -O0
compile time here.</p>
one. For example, the timings for <tt>+ syntax</tt> below show the
difference of running with <tt>+ syntax</tt> versus running with <tt>+
parse</tt> (for clang) or <tt>+ driver</tt> with gcc. This amounts to
a fairly accurate measure of only the time to perform semantic
analysis (and preprocessing/parsing, in the case of gcc).</p>
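<p>The delta computation described above can be sketched as follows,
assuming each build yields one cumulative wall-clock time per stage. The
helper name <tt>stage_deltas</tt> is hypothetical, and the numbers in the
usage note are made-up placeholders, not measurements from these
benchmarks.</p>

```python
from typing import Dict, List, Tuple

# Stage names in build order, as listed on this page.
STAGES = [
    "non-compiler", "+ driver", "+ pch gen", "+ cpp", "+ parse",
    "+ syntax", "+ IRgen", "+ codegen", "+ assembler", "+ linker",
]


def stage_deltas(cumulative: Dict[str, float]) -> List[Tuple[str, float]]:
    """Turn cumulative timings into per-stage deltas.

    cumulative maps a stage name to the total time of a build that stops
    after that stage. A stage with no measurement is skipped, so the next
    measured stage absorbs its cost (as "+ syntax" does for gcc, which has
    no separate "+ cpp" or "+ parse" timing).
    """
    deltas = []
    previous = 0.0
    for stage in STAGES:
        if stage not in cumulative:
            continue
        total = cumulative[stage]
        deltas.append((stage, total - previous))
        previous = total
    return deltas
```

<p>With the placeholder input
<tt>{"non-compiler": 1.0, "+ driver": 1.5, "+ syntax": 4.0}</tt>, the result
is <tt>[("non-compiler", 1.0), ("+ driver", 0.5), ("+ syntax", 2.5)]</tt>:
the <tt>+ syntax</tt> delta is taken against <tt>+ driver</tt> because no
intermediate stage was measured.</p>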
<!--*************************************************************************-->
<h2><a name="enduser">Timing Results</a></h2>
<h2><a name="timings">Timing Results</a></h2>
<!--*************************************************************************-->
<!--=======================================================================-->
<h3><a name="2008-10-31">2008-10-31</a></h3>
<h3><a name="2009-03-02">2009-03-02</a></h3>
<!--=======================================================================-->
<center><h4>Sketch</h4></center>
<a href="timing-data/2009-03-02/sketch.pdf">
<img class="img_slide"
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>
src="timing-data/2009-03-02/sketch.png" alt="Sketch Timings"/>
</a>
<p>This shows Clang's substantial performance improvements in
preprocessing and semantic analysis; over 90% faster on
-fsyntax-only. As expected, time spent in code generation for this
benchmark is relatively small. One caveat, Clang's debug information
generation for Objective-C is very incomplete; this means the <tt>-S
-O0 -g</tt> numbers are unfair since Clang is generating substantially
less output.</p>
<p>This chart also shows the effect of using precompiled headers (PCH)
on compiler time. gcc and llvm-gcc see a large performance improvement
with PCH; about 4x in wall time. Unfortunately, Clang does not yet
have an implementation of PCH-style optimizations, but we are actively
working to address this.</p>
<center><h4>176.gcc</h4></center>
<a href="timing-data/2009-03-02/176.gcc.pdf">
<img class="img_slide"
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>
<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
involves a large amount of code generation. The time spent in Clang's
LLVM IR generation and code generation is on par with gcc's code
generation time but the improved parsing & semantic analysis
performance means Clang still comes in at ~29% faster versus gcc
on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>
<p>These numbers indicate that Clang still has room for improvement in
several areas, notably our LLVM IR generation is significantly slower
than that of llvm-gcc, and both Clang and llvm-gcc incur a
significantly higher cost for adding debugging information compared to
gcc.</p>
src="timing-data/2009-03-02/176.gcc.png" alt="176.gcc Timings"/>
</a>
</div>
</body>

www/timing-data/2009-03-02/176.gcc.pdf Normal file

Binary file not shown.

www/timing-data/2009-03-02/176.gcc.png Normal file

Binary file not shown.

Size: 75 KiB

File diff suppressed because it is too large.

www/timing-data/2009-03-02/sketch.pdf Normal file

Binary file not shown.

www/timing-data/2009-03-02/sketch.png Normal file

Binary file not shown.

Size: 76 KiB

File diff suppressed because it is too large.