зеркало из https://github.com/microsoft/clang.git
Add new performance numbers; no discussion yet. Obvious two
conclusions are our PCH generation is way faster than gcc, and the Python based driver kills compile times. git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@65980 91177308-0d34-0410-b5e6-96231b3b80d8
This commit is contained in:
Родитель
d198abae59
Коммит
3d1c9462e3
|
@ -0,0 +1,134 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||||
"http://www.w3.org/TR/html4/strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1" />
|
||||
<title>Clang - Performance</title>
|
||||
<link type="text/css" rel="stylesheet" href="menu.css" />
|
||||
<link type="text/css" rel="stylesheet" href="content.css" />
|
||||
<style type="text/css">
|
||||
</style>
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<!--#include virtual="menu.html.incl"-->
|
||||
|
||||
<div id="content">
|
||||
|
||||
<!--*************************************************************************-->
|
||||
<h1>Clang - Performance</h1>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<p>This page tracks the compile time performance of Clang on two
|
||||
interesting benchmarks:
|
||||
<ul>
|
||||
<li><i>Sketch</i>: The Objective-C example application shipped on
|
||||
Mac OS X as part of Xcode. <i>Sketch</i> is indicative of a
|
||||
"typical" Objective-C app. The source itself has a relatively
|
||||
small amount of code (~7,500 lines of source code), but it relies
|
||||
on the extensive Cocoa APIs to build its functionality. Like many
|
||||
Objective-C applications, it includes
|
||||
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
|
||||
significant stress test of the front-end's performance on lexing,
|
||||
preprocessing, parsing, and syntax analysis.</li>
|
||||
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
|
||||
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
|
||||
large amount of C source code (~220,000 lines) with few system
|
||||
dependencies. This stresses the back-end's performance on generating
|
||||
assembly code and debug information.</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
||||
<!--*************************************************************************-->
|
||||
<h2><a name="enduser">Experiments</a></h2>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<p>Measurements are done by serially processing each file in the
|
||||
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
|
||||
order to track the performance of various subsystems the timings have
|
||||
been broken down into separate stages where possible:
|
||||
|
||||
<ul>
|
||||
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
|
||||
perform any output. For gcc and llvm-gcc, the -MM option is used
|
||||
as a rough equivalent to this step.</li>
|
||||
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
|
||||
but without semantic analysis or any output. gcc and llvm-gcc have
|
||||
no equivalent for this option.</li>
|
||||
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
|
||||
analysis.</li>
|
||||
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
|
||||
converts to the LLVM intermediate representation but doesn't
|
||||
generate native code.</li>
|
||||
<li><tt>-S -O0</tt>: Perform actual code generation to produce a
|
||||
native assembler file.</li>
|
||||
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
|
||||
the assembly output.</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
||||
<p>This set of stages is chosen to be approximately additive, that is
|
||||
each subsequent stage simply adds some additional processing. The
|
||||
timings measure the delta of the given stage from the previous
|
||||
one. For example, the timings for <tt>-fsyntax-only</tt> below show
|
||||
the difference of running with <tt>-fsyntax-only</tt> versus running
|
||||
with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
|
||||
llvm-gcc. This amounts to a fairly accurate measure of only the time
|
||||
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
|
||||
|
||||
<p>These timings are chosen to break down the compilation process for
|
||||
clang as much as possible. The graphs below show these numbers
|
||||
combined so that it is easy to see how the time for a particular task
|
||||
is divided among various components. For example, <tt>-S -O0</tt>
|
||||
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
|
||||
|
||||
<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
|
||||
faster than the GCC optimizers at a given -O level, so we only focus on -O0
|
||||
compile time here.</p>
|
||||
|
||||
<!--*************************************************************************-->
|
||||
<h2><a name="enduser">Timing Results</a></h2>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<!--=======================================================================-->
|
||||
<h3><a name="2008-10-31">2008-10-31</a></h3>
|
||||
<!--=======================================================================-->
|
||||
|
||||
<center><h4>Sketch</h4></center>
|
||||
<img class="img_slide"
|
||||
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>
|
||||
|
||||
<p>This shows Clang's substantial performance improvements in
|
||||
preprocessing and semantic analysis; over 90% faster on
|
||||
-fsyntax-only. As expected, time spent in code generation for this
|
||||
benchmark is relatively small. One caveat, Clang's debug information
|
||||
generation for Objective-C is very incomplete; this means the <tt>-S
|
||||
-O0 -g</tt> numbers are unfair since Clang is generating substantially
|
||||
less output.</p>
|
||||
|
||||
<p>This chart also shows the effect of using precompiled headers (PCH)
|
||||
on compiler time. gcc and llvm-gcc see a large performance improvement
|
||||
with PCH; about 4x in wall time. Unfortunately, Clang does not yet
|
||||
have an implementation of PCH-style optimizations, but we are actively
|
||||
working to address this.</p>
|
||||
|
||||
<center><h4>176.gcc</h4></center>
|
||||
<img class="img_slide"
|
||||
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>
|
||||
|
||||
<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
|
||||
involves a large amount of code generation. The time spent in Clang's
|
||||
LLVM IR generation and code generation is on par with gcc's code
|
||||
generation time but the improved parsing & semantic analysis
|
||||
performance means Clang still comes in at ~29% faster versus gcc
|
||||
on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>
|
||||
|
||||
<p>These numbers indicate that Clang still has room for improvement in
|
||||
several areas, notably our LLVM IR generation is significantly slower
|
||||
than that of llvm-gcc, and both Clang and llvm-gcc incur a
|
||||
significantly higher cost for adding debugging information compared to
|
||||
gcc.</p>
|
||||
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
|
@ -19,7 +19,7 @@
|
|||
<h1>Clang - Performance</h1>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<p>This page tracks the compile time performance of Clang on two
|
||||
<p>This page shows the compile time performance of Clang on two
|
||||
interesting benchmarks:
|
||||
<ul>
|
||||
<li><i>Sketch</i>: The Objective-C example application shipped on
|
||||
|
@ -27,107 +27,85 @@ interesting benchmarks:
|
|||
"typical" Objective-C app. The source itself has a relatively
|
||||
small amount of code (~7,500 lines of source code), but it relies
|
||||
on the extensive Cocoa APIs to build its functionality. Like many
|
||||
Objective-C applications, it includes
|
||||
<tt>Cocoa/Cocoa.h</tt> in all of its source files, which represents a
|
||||
significant stress test of the front-end's performance on lexing,
|
||||
preprocessing, parsing, and syntax analysis.</li>
|
||||
Objective-C applications, it includes <tt>Cocoa/Cocoa.h</tt> in
|
||||
all of its source files, which represents a significant stress
|
||||
test of the front-end's performance on lexing, preprocessing,
|
||||
parsing, and syntax analysis.</li>
|
||||
<li><i>176.gcc</i>: This is the gcc-2.7.2.2 code base as present in
|
||||
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
|
||||
large amount of C source code (~220,000 lines) with few system
|
||||
dependencies. This stresses the back-end's performance on generating
|
||||
assembly code and debug information.</li>
|
||||
SPECINT 2000. In contrast to Sketch, <i>176.gcc</i> consists of a
|
||||
large amount of C source code (~200,000 lines) with few system
|
||||
dependencies. This stresses the back-end's performance on generating
|
||||
assembly code and debug information.</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
||||
<p>
|
||||
For previous performance numbers, please
|
||||
go <a href="performance-2008-10-31.html">here</a>.
|
||||
</p>
|
||||
|
||||
<!--*************************************************************************-->
|
||||
<h2><a name="enduser">Experiments</a></h2>
|
||||
<h2><a name="experiments">Experiments</a></h2>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<p>Measurements are done by serially processing each file in the
|
||||
respective benchmark, using Clang, gcc, and llvm-gcc as compilers. In
|
||||
order to track the performance of various subsystems the timings have
|
||||
been broken down into separate stages where possible:
|
||||
<p>Measurements are done by running a full build (using xcodebuild or
|
||||
make for Sketch and 176.gcc respectively) using Clang and gcc 4.2 as
|
||||
compilers; gcc is run both with and without the new clang driver (ccc)
|
||||
in order to evaluate the overhead of the driver itself.</p>
|
||||
|
||||
<p>In order to track the performance of various subsystems the timings
|
||||
have been broken down into separate stages where possible. This is
|
||||
done by over-riding the CC environment variable used during the build
|
||||
to point to one of a few simple shell scripts which may skip part of
|
||||
the build.
|
||||
|
||||
<ul>
|
||||
<li><tt>-Eonly</tt>: This option runs the preprocessor but does not
|
||||
perform any output. For gcc and llvm-gcc, the -MM option is used
|
||||
as a rough equivalent to this step.</li>
|
||||
<li><tt>-parse-noop</tt>: This option runs the parser on the input,
|
||||
but without semantic analysis or any output. gcc and llvm-gcc have
|
||||
no equivalent for this option.</li>
|
||||
<li><tt>-fsyntax-only</tt>: This option runs the parser with semantic
|
||||
analysis.</li>
|
||||
<li><tt>-emit-llvm -O0</tt>: For Clang and llvm-gcc, this option
|
||||
converts to the LLVM intermediate representation but doesn't
|
||||
generate native code.</li>
|
||||
<li><tt>-S -O0</tt>: Perform actual code generation to produce a
|
||||
native assembler file.</li>
|
||||
<li><tt>-S -O0 -g</tt>: This adds emission of debug information to
|
||||
the assembly output.</li>
|
||||
<li><tt>non-compiler</tt>: The overhead of the build system itself;
|
||||
for Sketch this also includes the time to build/copy various
|
||||
non-source code resource files.</li>
|
||||
<li><tt>+ driver</tt>: Add execution of the driver, but do not execute any
|
||||
commands (by using the -### driver option).</li>
|
||||
<li><tt>+ pch gen</tt>: Add generation of PCH files.</li>
|
||||
<li><tt>+ cpp</tt>: Add preprocessing of source files (this time is
|
||||
include in syntax for gcc).</li>
|
||||
<li><tt>+ parse</tt>: Add parsing of source files (this time is
|
||||
include in syntax for gcc).</li>
|
||||
<li><tt>+ syntax</tt>: Add semantic checking of source files (for
|
||||
gcc, this includes preprocessing and parsing as well).</li>
|
||||
<li><tt>+ IRgen</tt>: Add generation of LLVM IR (gcc has no
|
||||
corresponding phase).</li>
|
||||
<li><tt>+ codegen</tt>: Add generation of assembler files.</li>
|
||||
<li><tt>+ assembler</tt>: Add assembler time to generate .o files.</li>
|
||||
<li><tt>+ linker</tt>: Add linker time.</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
||||
<p>This set of stages is chosen to be approximately additive, that is
|
||||
each subsequent stage simply adds some additional processing. The
|
||||
timings measure the delta of the given stage from the previous
|
||||
one. For example, the timings for <tt>-fsyntax-only</tt> below show
|
||||
the difference of running with <tt>-fsyntax-only</tt> versus running
|
||||
with <tt>-parse-noop</tt> (for clang) or <tt>-MM</tt> with gcc and
|
||||
llvm-gcc. This amounts to a fairly accurate measure of only the time
|
||||
to perform semantic analysis (and parsing, in the case of gcc and llvm-gcc).</p>
|
||||
|
||||
<p>These timings are chosen to break down the compilation process for
|
||||
clang as much as possible. The graphs below show these numbers
|
||||
combined so that it is easy to see how the time for a particular task
|
||||
is divided among various components. For example, <tt>-S -O0</tt>
|
||||
includes the time of <tt>-fsyntax-only</tt> and <tt>-emit-llvm -O0</tt>.</p>
|
||||
|
||||
<p>Note that we already know that the LLVM optimizers are substantially (30-40%)
|
||||
faster than the GCC optimizers at a given -O level, so we only focus on -O0
|
||||
compile time here.</p>
|
||||
one. For example, the timings for <tt>+ syntax</tt> below show the
|
||||
difference of running with <tt>+ syntax</tt> versus running with <tt>+
|
||||
parse</tt> (for clang) or <tt>+ driver</tt> with gcc. This amounts to
|
||||
a fairly accurate measure of only the time to perform semantic
|
||||
analysis (and preprocessing/parsing, in the case of gcc).</p>
|
||||
|
||||
<!--*************************************************************************-->
|
||||
<h2><a name="enduser">Timing Results</a></h2>
|
||||
<h2><a name="timings">Timing Results</a></h2>
|
||||
<!--*************************************************************************-->
|
||||
|
||||
<!--=======================================================================-->
|
||||
<h3><a name="2008-10-31">2008-10-31</a></h3>
|
||||
<h3><a name="2009-03-02">2009-03-02</a></h3>
|
||||
<!--=======================================================================-->
|
||||
|
||||
<center><h4>Sketch</h4></center>
|
||||
<a href="timing-data/2009-03-02/sketch.pdf">
|
||||
<img class="img_slide"
|
||||
src="timing-data/2008-10-31/sketch.png" alt="Sketch Timings"/>
|
||||
src="timing-data/2009-03-02/sketch.png" alt="Sketch Timings"/>
|
||||
</a>
|
||||
|
||||
<p>This shows Clang's substantial performance improvements in
|
||||
preprocessing and semantic analysis; over 90% faster on
|
||||
-fsyntax-only. As expected, time spent in code generation for this
|
||||
benchmark is relatively small. One caveat, Clang's debug information
|
||||
generation for Objective-C is very incomplete; this means the <tt>-S
|
||||
-O0 -g</tt> numbers are unfair since Clang is generating substantially
|
||||
less output.</p>
|
||||
|
||||
<p>This chart also shows the effect of using precompiled headers (PCH)
|
||||
on compiler time. gcc and llvm-gcc see a large performance improvement
|
||||
with PCH; about 4x in wall time. Unfortunately, Clang does not yet
|
||||
have an implementation of PCH-style optimizations, but we are actively
|
||||
working to address this.</p>
|
||||
|
||||
<center><h4>176.gcc</h4></center>
|
||||
<a href="timing-data/2009-03-02/176.gcc.pdf">
|
||||
<img class="img_slide"
|
||||
src="timing-data/2008-10-31/176.gcc.png" alt="176.gcc Timings"/>
|
||||
|
||||
<p>Unlike the <i>Sketch</i> timings, compilation of <i>176.gcc</i>
|
||||
involves a large amount of code generation. The time spent in Clang's
|
||||
LLVM IR generation and code generation is on par with gcc's code
|
||||
generation time but the improved parsing & semantic analysis
|
||||
performance means Clang still comes in at ~29% faster versus gcc
|
||||
on <tt>-S -O0 -g</tt> and ~20% faster versus llvm-gcc.</p>
|
||||
|
||||
<p>These numbers indicate that Clang still has room for improvement in
|
||||
several areas, notably our LLVM IR generation is significantly slower
|
||||
than that of llvm-gcc, and both Clang and llvm-gcc incur a
|
||||
significantly higher cost for adding debugging information compared to
|
||||
gcc.</p>
|
||||
src="timing-data/2009-03-02/176.gcc.png" alt="176.gcc Timings"/>
|
||||
</a>
|
||||
|
||||
</div>
|
||||
</body>
|
||||
|
|
Двоичный файл не отображается.
Двоичный файл не отображается.
После Ширина: | Высота: | Размер: 75 KiB |
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
Двоичный файл не отображается.
Двоичный файл не отображается.
После Ширина: | Высота: | Размер: 76 KiB |
Разница между файлами не показана из-за своего большого размера
Загрузить разницу
Загрузка…
Ссылка в новой задаче