mirror of https://github.com/microsoft/clang-1.git
docs: Convert some docs to reST.
Converts: LanguageExtensions LibASTMatchers LibTooling PCHInternals ThreadSanitizer Tooling Patch by Mykhailo Pustovit! (with minor edits by Dmitri Gribenko and Sean Silva) git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@170048 91177308-0d34-0410-b5e6-96231b3b80d8
Parent: b34ae9be52
Commit: 3872b46ba9
The diff for this file is not shown because it is too large.
The diff for this file is not shown because it is too large.
|
@ -1,130 +0,0 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||||
"http://www.w3.org/TR/html4/strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>Matching the Clang AST</title>
|
||||
<link type="text/css" rel="stylesheet" href="../menu.css" />
|
||||
<link type="text/css" rel="stylesheet" href="../content.css" />
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<!--#include virtual="../menu.html.incl"-->
|
||||
|
||||
<div id="content">
|
||||
|
||||
<h1>Matching the Clang AST</h1>
|
||||
<p>This document explains how to use Clang's LibASTMatchers to match interesting
|
||||
nodes of the AST and execute code that uses the matched nodes. Combined with
|
||||
<a href="LibTooling.html">LibTooling</a>, LibASTMatchers helps to write
|
||||
code-to-code transformation tools or query tools.</p>
|
||||
|
||||
<p>We assume basic knowledge about the Clang AST. See the
|
||||
<a href="IntroductionToTheClangAST.html">Introduction to the Clang AST</a> if
|
||||
you want to learn more about how the AST is structured.</p>
|
||||
|
||||
<!-- FIXME: create tutorial and link to the tutorial -->
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="intro">Introduction</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>LibASTMatchers provides a domain specific language to create predicates on Clang's
|
||||
AST. This DSL is written in and can be used from C++, allowing users to write
|
||||
a single program to both match AST nodes and access the node's C++ interface
|
||||
to extract attributes, source locations, or any other information provided on
|
||||
the AST level.</p>
|
||||
|
||||
<p>AST matchers are predicates on nodes in the AST. Matchers are created
|
||||
by calling creator functions that allow building up a tree of matchers, where
|
||||
inner matchers are used to make the match more specific.</p>
|
||||
|
||||
<p>For example, to create a matcher that matches all class or union declarations
|
||||
in the AST of a translation unit, you can call
|
||||
<a href="LibASTMatchersReference.html#recordDecl0Anchor">recordDecl()</a>.
|
||||
To narrow the match down, for example to find all class or union declarations with the name "Foo",
|
||||
insert a <a href="LibASTMatchersReference.html#hasName0Anchor">hasName</a>
|
||||
matcher: the call recordDecl(hasName("Foo")) returns a matcher that matches classes
|
||||
or unions that are named "Foo", in any namespace. By default, matchers that accept
|
||||
multiple inner matchers use an implicit <a href="LibASTMatchersReference.html#allOf0Anchor">allOf()</a>.
|
||||
This allows further narrowing down the match, for example to match all classes
|
||||
that are derived from "Bar": recordDecl(hasName("Foo"), isDerivedFrom("Bar")).</p>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="writing">How to create a matcher</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>With more than a thousand classes in the Clang AST, one can quickly get lost
|
||||
when trying to figure out how to create a matcher for a specific pattern. This
|
||||
section will teach you how to use a rigorous step-by-step pattern to build the
|
||||
matcher you are interested in. Note that there will always be matchers missing
|
||||
for some part of the AST. See the section about <a href="#writing">how to write
|
||||
your own AST matchers</a> later in this document.</p>
|
||||
|
||||
<p>The precondition to using the matchers is to understand how the AST
|
||||
for what you want to match looks like. The <a href="IntroductionToTheClangAST.html">Introduction to the Clang AST</a>
|
||||
teaches you how to dump a translation unit's AST into a human readable format.</p>
|
||||
|
||||
<!-- FIXME: Introduce link to ASTMatchersTutorial.html -->
|
||||
<!-- FIXME: Introduce link to ASTMatchersCookbook.html -->
|
||||
|
||||
<p>In general, the strategy to create the right matchers is:</p>
|
||||
<ol>
|
||||
<li>Find the outermost class in Clang's AST you want to match.</li>
|
||||
<li>Look at the <a href="LibASTMatchersReference.html">AST Matcher Reference</a> for matchers that either match the
|
||||
node you're interested in or narrow down attributes on the node.</li>
|
||||
<li>Create your outer match expression. Verify that it works as expected.</li>
|
||||
<li>Examine the matchers for what the next inner node you want to match is.</li>
|
||||
<li>Repeat until the matcher is finished.</li>
|
||||
</ol>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="binding">Binding nodes in match expressions</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>Matcher expressions allow you to specify which parts of the AST are interesting
|
||||
for a certain task. Often you will want to then do something with the nodes
|
||||
that were matched, like building source code transformations.</p>
|
||||
|
||||
<p>To that end, matchers that match specific AST nodes (so called node matchers)
|
||||
are bindable; for example, recordDecl(hasName("MyClass")).bind("id") will bind
|
||||
the matched recordDecl node to the string "id", to be later retrieved in the
|
||||
<a href="http://clang.llvm.org/doxygen/classclang_1_1ast__matchers_1_1MatchFinder_1_1MatchCallback.html">match callback</a>.</p>
|
||||
|
||||
<!-- FIXME: Introduce link to ASTMatchersTutorial.html -->
|
||||
<!-- FIXME: Introduce link to ASTMatchersCookbook.html -->
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="writing">Writing your own matchers</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>There are multiple different ways to define a matcher, depending on its
|
||||
type and flexibility.</p>
|
||||
<ul>
|
||||
<li><b>VariadicDynCastAllOfMatcher<Base, Derived></b><p>Those match all nodes
|
||||
of type <i>Base</i> if they can be dynamically casted to <i>Derived</i>. The
|
||||
names of those matchers are nouns, which closely resemble <i>Derived</i>.
|
||||
VariadicDynCastAllOfMatchers are the backbone of the matcher hierarchy. Most
|
||||
often, your match expression will start with one of them, and you can
|
||||
<a href="#binding">bind</a> the node they represent to ids for later processing.</p>
|
||||
<p>VariadicDynCastAllOfMatchers are callable classes that model variadic
|
||||
template functions in C++03. They take an arbitrary number of Matcher<Derived>
|
||||
and return a Matcher<Base>.</p></li>
|
||||
<li><b>AST_MATCHER_P(Type, Name, ParamType, Param)</b><p> Most matcher definitions
|
||||
use the matcher creation macros. Those define both the matcher of type Matcher<Type>
|
||||
itself, and a matcher-creation function named <i>Name</i> that takes a parameter
|
||||
of type <i>ParamType</i> and returns the corresponding matcher.</p>
|
||||
<p>There are multiple matcher definition macros that deal with polymorphic return
|
||||
values and different parameter counts. See <a href="http://clang.llvm.org/doxygen/ASTMatchersMacros_8h.html">ASTMatchersMacros.h</a>.
|
||||
</p></li>
|
||||
<li><b>Matcher creation functions</b><p>Matchers are generated by nesting
|
||||
calls to matcher creation functions. Most of the time those functions are either
|
||||
created by using VariadicDynCastAllOfMatcher or the matcher creation macros
|
||||
(see below). The free-standing functions are an indication that this matcher
|
||||
is just a combination of other matchers, as is for example the case with
|
||||
<a href="LibASTMatchersReference.html#callee1Anchor">callee</a>.</p></li>
|
||||
</ul>
|
||||
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,134 @@
|
|||
======================
|
||||
Matching the Clang AST
|
||||
======================
|
||||
|
||||
This document explains how to use Clang's LibASTMatchers to match interesting
|
||||
nodes of the AST and execute code that uses the matched nodes. Combined with
|
||||
:doc:`LibTooling`, LibASTMatchers helps to write code-to-code transformation
|
||||
tools or query tools.
|
||||
|
||||
We assume basic knowledge about the Clang AST. See the `Introduction to the
|
||||
Clang AST <IntroductionToTheClangAST.html>`_ if you want to learn more about
|
||||
how the AST is structured.
|
||||
|
||||
.. FIXME: create tutorial and link to the tutorial
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
LibASTMatchers provides a domain specific language to create predicates on
|
||||
Clang's AST. This DSL is written in and can be used from C++, allowing users
|
||||
to write a single program to both match AST nodes and access the node's C++
|
||||
interface to extract attributes, source locations, or any other information
|
||||
provided on the AST level.
|
||||
|
||||
AST matchers are predicates on nodes in the AST. Matchers are created by
|
||||
calling creator functions that allow building up a tree of matchers, where
|
||||
inner matchers are used to make the match more specific.
|
||||
|
||||
For example, to create a matcher that matches all class or union declarations
|
||||
in the AST of a translation unit, you can call `recordDecl()
|
||||
<LibASTMatchersReference.html#recordDecl0Anchor>`_. To narrow the match down,
|
||||
for example to find all class or union declarations with the name "``Foo``",
|
||||
insert a `hasName <LibASTMatchersReference.html#hasName0Anchor>`_ matcher: the
|
||||
call ``recordDecl(hasName("Foo"))`` returns a matcher that matches classes or
|
||||
unions that are named "``Foo``", in any namespace. By default, matchers that
|
||||
accept multiple inner matchers use an implicit `allOf()
|
||||
<LibASTMatchersReference.html#allOf0Anchor>`_. This allows further narrowing
|
||||
down the match, for example to match all classes that are derived from
|
||||
"``Bar``": ``recordDecl(hasName("Foo"), isDerivedFrom("Bar"))``.
|
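
The composition rules above can be modeled in a few lines of standalone C++.
This is a toy sketch, not LibASTMatchers: ``ToyRecord`` and the local
``recordDecl``, ``hasName``, and ``isDerivedFrom`` functions are hypothetical
stand-ins that only mimic how several inner matchers are combined through an
implicit conjunction, as ``allOf()`` does.

.. code-block:: c++

  #include <functional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for a class or union declaration (not clang::RecordDecl).
  struct ToyRecord {
    std::string Name;
    std::vector<std::string> Bases; // names of the direct base classes
  };

  // In this toy model a matcher is just a predicate over a node.
  using ToyMatcher = std::function<bool(const ToyRecord &)>;

  // Toy matcher: records with exactly this (unqualified) name.
  ToyMatcher hasName(std::string Name) {
    return [Name = std::move(Name)](const ToyRecord &R) { return R.Name == Name; };
  }

  // Toy matcher: records with a direct base of this name. The real
  // isDerivedFrom also considers indirect bases; this toy version does not.
  ToyMatcher isDerivedFrom(std::string Base) {
    return [Base = std::move(Base)](const ToyRecord &R) {
      for (const std::string &B : R.Bases)
        if (B == Base)
          return true;
      return false;
    };
  }

  // Toy node matcher: accepts any number of inner matchers and requires all
  // of them to match, mirroring the implicit allOf().
  template <typename... Inner>
  ToyMatcher recordDecl(Inner... Ms) {
    std::vector<ToyMatcher> All{ToyMatcher(std::move(Ms))...};
    return [All](const ToyRecord &R) {
      for (const ToyMatcher &M : All)
        if (!M(R))
          return false;
      return true;
    };
  }

In this model, ``recordDecl(hasName("Foo"), isDerivedFrom("Bar"))`` accepts a
record named ``Foo`` with a base ``Bar`` and rejects a ``Foo`` without that
base, while ``recordDecl()`` with no inner matchers accepts every record.
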
||||
|
||||
How to create a matcher
|
||||
-----------------------
|
||||
|
||||
With more than a thousand classes in the Clang AST, one can quickly get lost
|
||||
when trying to figure out how to create a matcher for a specific pattern. This
|
||||
section will teach you how to use a rigorous step-by-step pattern to build the
|
||||
matcher you are interested in. Note that there will always be matchers missing
|
||||
for some part of the AST. See the section about :ref:`how to write your own
|
||||
AST matchers <astmatchers-writing>` later in this document.
|
||||
|
||||
.. FIXME: why is it linking back to the same section?!
|
||||
|
||||
Before using the matchers, you need to understand what the AST looks like for
the code you want to match. The
|
||||
`Introduction to the Clang AST <IntroductionToTheClangAST.html>`_ teaches you
|
||||
how to dump a translation unit's AST into a human readable format.
|
||||
|
||||
.. FIXME: Introduce link to ASTMatchersTutorial.html
|
||||
.. FIXME: Introduce link to ASTMatchersCookbook.html
|
||||
|
||||
In general, the strategy to create the right matchers is:
|
||||
|
||||
#. Find the outermost class in Clang's AST you want to match.
|
||||
#. Look at the `AST Matcher Reference <LibASTMatchersReference.html>`_ for
|
||||
matchers that either match the node you're interested in or narrow down
|
||||
attributes on the node.
|
||||
#. Create your outer match expression. Verify that it works as expected.
|
||||
#. Examine the matchers available for the next inner node you want to match.
|
||||
#. Repeat until the matcher is finished.
|
||||
|
||||
.. _astmatchers-bind:
|
||||
|
||||
Binding nodes in match expressions
|
||||
----------------------------------
|
||||
|
||||
Matcher expressions allow you to specify which parts of the AST are interesting
|
||||
for a certain task. Often you will want to then do something with the nodes
|
||||
that were matched, like building source code transformations.
|
||||
|
||||
To that end, matchers that match specific AST nodes (so-called node matchers)
|
||||
are bindable; for example, ``recordDecl(hasName("MyClass")).bind("id")`` will
|
||||
bind the matched ``recordDecl`` node to the string "``id``", to be later
|
||||
retrieved in the `match callback
|
||||
<http://clang.llvm.org/doxygen/classclang_1_1ast__matchers_1_1MatchFinder_1_1MatchCallback.html>`_.
|
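
The idea behind binding can be sketched with a small standalone model. All
names here (``ToyNode``, ``BoundNodesMap``, ``BindableMatcher``,
``recordNamed``, ``runMatcher``) are hypothetical and only illustrate the
concept: a successful match records the node under its id, and the callback
looks the node up by that id. In the real library, the callback receives a
``MatchFinder::MatchResult`` and retrieves bound nodes through its ``Nodes``
member.

.. code-block:: c++

  #include <functional>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical stand-in for an AST node.
  struct ToyNode {
    std::string Kind; // e.g. "record" or "function"
    std::string Name;
  };

  // Toy model: ids mapped to the nodes bound under them during one match.
  using BoundNodesMap = std::map<std::string, const ToyNode *>;

  struct BindableMatcher {
    std::function<bool(const ToyNode &)> Pred;
    std::string Id; // empty means "do not bind"

    // Returns a copy of this matcher that binds matched nodes to NewId.
    BindableMatcher bind(std::string NewId) const {
      return BindableMatcher{Pred, std::move(NewId)};
    }

    // On success, records the matched node under Id (if any) in Bound.
    bool matches(const ToyNode &N, BoundNodesMap &Bound) const {
      if (!Pred(N))
        return false;
      if (!Id.empty())
        Bound[Id] = &N;
      return true;
    }
  };

  // Hypothetical toy node matcher: records with the given name.
  BindableMatcher recordNamed(std::string Name) {
    return BindableMatcher{[Name](const ToyNode &N) {
      return N.Kind == "record" && N.Name == Name;
    }, ""};
  }

  // Runs M over every node and invokes Callback once per match, passing the
  // nodes bound during that match.
  void runMatcher(const BindableMatcher &M, const std::vector<ToyNode> &Nodes,
                  const std::function<void(const BoundNodesMap &)> &Callback) {
    for (const ToyNode &N : Nodes) {
      BoundNodesMap Bound;
      if (M.matches(N, Bound))
        Callback(Bound);
    }
  }

Running ``recordNamed("MyClass").bind("id")`` over a node list invokes the
callback only for the record named ``MyClass``, and the callback finds that
node under ``"id"``.
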
||||
|
||||
.. FIXME: Introduce link to ASTMatchersTutorial.html
|
||||
.. FIXME: Introduce link to ASTMatchersCookbook.html
|
||||
|
||||
Writing your own matchers
|
||||
-------------------------
|
||||
|
||||
There are several ways to define a matcher, depending on its type
|
||||
and flexibility.
|
||||
|
||||
``VariadicDynCastAllOfMatcher<Base, Derived>``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
These match all nodes of type *Base* that can be dynamically cast to
|
||||
*Derived*. The names of those matchers are nouns, which closely resemble
|
||||
*Derived*. ``VariadicDynCastAllOfMatchers`` are the backbone of the matcher
|
||||
hierarchy. Most often, your match expression will start with one of them, and
|
||||
you can :ref:`bind <astmatchers-bind>` the node they represent to ids for later
|
||||
processing.
|
||||
|
||||
``VariadicDynCastAllOfMatchers`` are callable classes that model variadic
|
||||
template functions in C++03. They take an arbitrary number of
|
||||
``Matcher<Derived>`` and return a ``Matcher<Base>``.
|
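
The following standalone sketch models the same contract with a C++11
variadic template instead of the C++03 callable-class technique.
``ToyMatcher``, ``ToyVariadicDynCastAllOfMatcher``, and the ``Toy*Decl``
classes are hypothetical toy types, not Clang's: calling the object with any
number of ``ToyMatcher<Derived>`` yields a ``ToyMatcher<Base>`` that first
tries a ``dynamic_cast`` and then requires every inner matcher to accept the
derived node.

.. code-block:: c++

  #include <functional>
  #include <string>
  #include <utility>
  #include <vector>

  // Toy matcher type (not clang::ast_matchers::internal::Matcher).
  template <typename T>
  using ToyMatcher = std::function<bool(const T &)>;

  // Toy analogue of VariadicDynCastAllOfMatcher<Base, Derived>.
  template <typename Base, typename Derived>
  struct ToyVariadicDynCastAllOfMatcher {
    template <typename... Inner>
    ToyMatcher<Base> operator()(Inner... Ms) const {
      std::vector<ToyMatcher<Derived>> All{ToyMatcher<Derived>(Ms)...};
      return [All](const Base &Node) {
        // Only nodes whose dynamic type is (derived from) Derived can match.
        const auto *D = dynamic_cast<const Derived *>(&Node);
        if (!D)
          return false;
        for (const auto &M : All)
          if (!M(*D))
            return false;
        return true;
      };
    }
  };

  // Hypothetical toy node hierarchy; dynamic_cast needs a polymorphic base.
  struct ToyDecl {
    virtual ~ToyDecl() = default;
  };
  struct ToyRecordDecl : ToyDecl {
    std::string Name;
    explicit ToyRecordDecl(std::string N) : Name(std::move(N)) {}
  };
  struct ToyFunctionDecl : ToyDecl {};

  // Named with a noun resembling Derived, following the naming convention above.
  const ToyVariadicDynCastAllOfMatcher<ToyDecl, ToyRecordDecl> toyRecordDecl{};

  // Toy inner matcher on the derived type.
  ToyMatcher<ToyRecordDecl> toyHasName(std::string Name) {
    return [Name](const ToyRecordDecl &R) { return R.Name == Name; };
  }
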
||||
|
||||
``AST_MATCHER_P(Type, Name, ParamType, Param)``
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Most matcher definitions use the matcher creation macros. Those define both
|
||||
the matcher of type ``Matcher<Type>`` itself, and a matcher-creation function
|
||||
named *Name* that takes a parameter of type *ParamType* and returns the
|
||||
corresponding matcher.
|
||||
|
||||
There are multiple matcher definition macros that deal with polymorphic return
|
||||
values and different parameter counts. See `ASTMatchersMacros.h
|
||||
<http://clang.llvm.org/doxygen/ASTMatchersMacros_8h.html>`_.
|
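
To illustrate the shape of what such a macro produces, here is a deliberately
simplified, hypothetical ``TOY_MATCHER_P`` macro. It is not the real
``AST_MATCHER_P``, whose generated matcher also receives the match finder and
a builder for bound nodes; it only shows the pattern of a matcher class that
stores *Param* plus a creation function named *Name* that takes a *ParamType*
and returns a matcher on *Type*. The code written after the macro becomes the
body of the matcher, with ``Node`` and the parameter in scope.

.. code-block:: c++

  #include <functional>
  #include <string>

  // Toy matcher type (not clang::ast_matchers::internal::Matcher).
  template <typename T>
  using ToyMatcher = std::function<bool(const T &)>;

  // Hypothetical, simplified analogue of AST_MATCHER_P(Type, Name, ParamType,
  // Param): defines a class holding Param, a creation function Name, and
  // leaves the matches() body to be written after the macro invocation.
  #define TOY_MATCHER_P(Type, Name, ParamType, Param)                        \
    class Name##ToyMatcher {                                                 \
    public:                                                                  \
      explicit Name##ToyMatcher(const ParamType &P) : Param(P) {}            \
      bool matches(const Type &Node) const;                                  \
                                                                             \
    private:                                                                 \
      ParamType Param;                                                       \
    };                                                                       \
    inline ToyMatcher<Type> Name(const ParamType &Param) {                   \
      Name##ToyMatcher M(Param);                                             \
      return [M](const Type &Node) { return M.matches(Node); };              \
    }                                                                        \
    inline bool Name##ToyMatcher::matches(const Type &Node) const

  // Hypothetical toy node type.
  struct ToyRecord {
    std::string Name;
  };

  // Defines the toy matcher class and the creation function hasNameToy().
  TOY_MATCHER_P(ToyRecord, hasNameToy, std::string, ExpectedName) {
    return Node.Name == ExpectedName;
  }
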
||||
|
||||
.. _astmatchers-writing:
|
||||
|
||||
Matcher creation functions
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Matchers are generated by nesting calls to matcher creation functions. Most of
|
||||
the time those functions are either created by using
|
||||
``VariadicDynCastAllOfMatcher`` or the matcher creation macros (see above).
|
||||
The free-standing functions are an indication that this matcher is just a
|
||||
combination of other matchers, as is for example the case with `callee
|
||||
<LibASTMatchersReference.html#callee1Anchor>`_.
|
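
As a standalone illustration (not the real ``callee``, which is defined in the
library), the hypothetical ``toyCallsFunctionNamed`` function below contains
no matching logic of its own; it only plugs one toy matcher into another.
``ToyFunctionDecl``, ``ToyCallExpr``, ``toyHasName``, and ``toyHasCalleeDecl``
are likewise invented for this sketch.

.. code-block:: c++

  #include <functional>
  #include <string>
  #include <utility>

  // Toy matcher type (not clang::ast_matchers::internal::Matcher).
  template <typename T>
  using ToyMatcher = std::function<bool(const T &)>;

  // Hypothetical toy AST nodes (not Clang's FunctionDecl or CallExpr).
  struct ToyFunctionDecl {
    std::string Name;
  };
  struct ToyCallExpr {
    const ToyFunctionDecl *Callee; // nullptr if the callee is unknown
  };

  // Toy "primitive" matchers, each with its own matching logic.
  ToyMatcher<ToyFunctionDecl> toyHasName(std::string Name) {
    return [Name = std::move(Name)](const ToyFunctionDecl &F) {
      return F.Name == Name;
    };
  }

  ToyMatcher<ToyCallExpr> toyHasCalleeDecl(ToyMatcher<ToyFunctionDecl> Inner) {
    return [Inner = std::move(Inner)](const ToyCallExpr &Call) {
      return Call.Callee != nullptr && Inner(*Call.Callee);
    };
  }

  // Toy free-standing creation function: it adds no matching logic and only
  // combines the two matchers above.
  ToyMatcher<ToyCallExpr> toyCallsFunctionNamed(std::string Name) {
    return toyHasCalleeDecl(toyHasName(std::move(Name)));
  }
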
||||
|
||||
.. FIXME: "... macros (see below)" --- there isn't anything below
|
||||
|
|
@ -1,212 +0,0 @@
|
|||
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
|
||||
"http://www.w3.org/TR/html4/strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>LibTooling</title>
|
||||
<link type="text/css" rel="stylesheet" href="../menu.css">
|
||||
<link type="text/css" rel="stylesheet" href="../content.css">
|
||||
</head>
|
||||
<body>
|
||||
|
||||
<!--#include virtual="../menu.html.incl"-->
|
||||
|
||||
<div id="content">
|
||||
|
||||
<h1>LibTooling</h1>
|
||||
<p>LibTooling is a library to support writing standalone tools based on
|
||||
Clang. This document will provide a basic walkthrough of how to write
|
||||
a tool using LibTooling.</p>
|
||||
<p>For the information on how to setup Clang Tooling for LLVM see
|
||||
<a href="HowToSetupToolingForLLVM.html">HowToSetupToolingForLLVM.html</a></p>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="intro">Introduction</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>Tools built with LibTooling, like Clang Plugins, run
|
||||
<code>FrontendActions</code> over code.
|
||||
<!-- See FIXME for a tutorial on how to write FrontendActions. -->
|
||||
In this tutorial, we'll demonstrate the different ways of running clang's
|
||||
<code>SyntaxOnlyAction</code>, which runs a quick syntax check, over a bunch of
|
||||
code.</p>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="runoncode">Parsing a code snippet in memory.</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>If you ever wanted to run a <code>FrontendAction</code> over some sample
|
||||
code, for example to unit test parts of the Clang AST,
|
||||
<code>runToolOnCode</code> is what you looked for. Let me give you an example:
|
||||
<pre>
|
||||
#include "clang/Tooling/Tooling.h"
|
||||
|
||||
TEST(runToolOnCode, CanSyntaxCheckCode) {
|
||||
// runToolOnCode returns whether the action was correctly run over the
|
||||
// given code.
|
||||
EXPECT_TRUE(runToolOnCode(new clang::SyntaxOnlyAction, "class X {};"));
|
||||
}
|
||||
</pre>
|
||||
|
||||
<!-- ======================================================================= -->
|
||||
<h2 id="standalonetool">Writing a standalone tool.</h2>
|
||||
<!-- ======================================================================= -->
|
||||
|
||||
<p>Once you unit tested your <code>FrontendAction</code> to the point where it
|
||||
cannot possibly break, it's time to create a standalone tool. For a standalone
|
||||
tool to run clang, it first needs to figure out what command line arguments to
|
||||
use for a specified file. To that end we create a
|
||||
<code>CompilationDatabase</code>. There are different ways to create a
|
||||
compilation database, and we need to support all of them depending on
|
||||
command-line options. There's the <code>CommonOptionsParser</code> class
|
||||
that takes the responsibility to parse command-line parameters related to
|
||||
compilation databases and inputs, so that all tools share the implementation.
|
||||
</p>
|
||||
|
||||
<h3 id="parsingcommonoptions">Parsing common tools options.</h3>
|
||||
<p><code>CompilationDatabase</code> can be read from a build directory or the
|
||||
command line. Using <code>CommonOptionsParser</code> allows for explicit
|
||||
specification of a compile command line, specification of build path using the
|
||||
<code>-p</code> command-line option, and automatic location of the compilation
|
||||
database using source files paths.
|
||||
<pre>
|
||||
#include "clang/Tooling/CommonOptionsParser.h"
|
||||
|
||||
using namespace clang::tooling;
|
||||
|
||||
int main(int argc, const char **argv) {
|
||||
// CommonOptionsParser constructor will parse arguments and create a
|
||||
// CompilationDatabase. In case of error it will terminate the program.
|
||||
CommonOptionsParser OptionsParser(argc, argv);
|
||||
|
||||
// Use OptionsParser.GetCompilations() and OptionsParser.GetSourcePathList()
|
||||
// to retrieve CompilationDatabase and the list of input file paths.
|
||||
}
|
||||
</pre>
|
||||
</p>
|
||||
|
||||
<h3 id="tool">Creating and running a ClangTool.</h3>
|
||||
<p>Once we have a <code>CompilationDatabase</code>, we can create a
|
||||
<code>ClangTool</code> and run our <code>FrontendAction</code> over some code.
|
||||
For example, to run the <code>SyntaxOnlyAction</code> over the files "a.cc" and
|
||||
"b.cc" one would write:
|
||||
<pre>
|
||||
// A clang tool can run over a number of sources in the same process...
|
||||
std::vector<std::string> Sources;
|
||||
Sources.push_back("a.cc");
|
||||
Sources.push_back("b.cc");
|
||||
|
||||
// We hand the CompilationDatabase we created and the sources to run over into
|
||||
// the tool constructor.
|
||||
ClangTool Tool(OptionsParser.GetCompilations(), Sources);
|
||||
|
||||
// The ClangTool needs a new FrontendAction for each translation unit we run
|
||||
// on. Thus, it takes a FrontendActionFactory as parameter. To create a
|
||||
// FrontendActionFactory from a given FrontendAction type, we call
|
||||
// newFrontendActionFactory<clang::SyntaxOnlyAction>().
|
||||
int result = Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
|
||||
</pre>
|
||||
</p>
|
||||
|
||||
<h3 id="main">Putting it together - the first tool.</h3>
|
||||
<p>Now we combine the two previous steps into our first real tool. This example
|
||||
tool is also checked into the clang tree at tools/clang-check/ClangCheck.cpp.
|
||||
<pre>
|
||||
// Declares clang::SyntaxOnlyAction.
|
||||
#include "clang/Frontend/FrontendActions.h"
|
||||
#include "clang/Tooling/CommonOptionsParser.h"
|
||||
#include "clang/Tooling/Tooling.h"
|
||||
// Declares llvm::cl::extrahelp.
|
||||
#include "llvm/Support/CommandLine.h"
|
||||
|
||||
using namespace clang::tooling;
|
||||
using namespace llvm;
|
||||
|
||||
// CommonOptionsParser declares HelpMessage with a description of the common
|
||||
// command-line options related to the compilation database and input files.
|
||||
// It's nice to have this help message in all tools.
|
||||
static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
|
||||
|
||||
// A help message for this specific tool can be added afterwards.
|
||||
static cl::extrahelp MoreHelp("\nMore help text...");
|
||||
|
||||
int main(int argc, const char **argv) {
|
||||
CommonOptionsParser OptionsParser(argc, argv);
|
||||
ClangTool Tool(OptionsParser.GetCompilations(),
|
||||
OptionsParser.GetSourcePathList());
|
||||
return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
|
||||
}
|
||||
</pre>
|
||||
</p>
|
||||
|
||||
<h3 id="running">Running the tool on some code.</h3>
|
||||
<p>When you check out and build clang, clang-check is already built and
|
||||
available to you in bin/clang-check inside your build directory.</p>
|
||||
<p>You can run clang-check on a file in the llvm repository by specifying
|
||||
all the needed parameters after a "--" separator:
|
||||
<pre>
|
||||
$ cd /path/to/source/llvm
|
||||
$ export BD=/path/to/build/llvm
|
||||
$ $BD/bin/clang-check tools/clang/tools/clang-check/ClangCheck.cpp -- \
|
||||
clang++ -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS \
|
||||
-Itools/clang/include -I$BD/include -Iinclude -Itools/clang/lib/Headers -c
|
||||
</pre>
|
||||
</p>
|
||||
|
||||
<p>As an alternative, you can also configure cmake to output a compile command
|
||||
database into its build directory:
|
||||
<pre>
|
||||
# Alternatively to calling cmake, use ccmake, toggle to advanced mode and
|
||||
# set the parameter CMAKE_EXPORT_COMPILE_COMMANDS from the UI.
|
||||
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
|
||||
</pre>
|
||||
</p>
|
||||
<p>
|
||||
This creates a file called compile_commands.json in the build directory. Now
|
||||
you can run clang-check over files in the project by specifying the build path
|
||||
as first argument and some source files as further positional arguments:
|
||||
<pre>
|
||||
$ cd /path/to/source/llvm
|
||||
$ export BD=/path/to/build/llvm
|
||||
$ $BD/bin/clang-check -p $BD tools/clang/tools/clang-check/ClangCheck.cpp
|
||||
</pre>
|
||||
</p>
|
||||
|
||||
<h3 id="builtin">Builtin includes.</h3>
|
||||
<p>Clang tools need their builtin headers and search for them the same way clang
|
||||
does. Thus, the default location to look for builtin headers is in a path
|
||||
$(dirname /path/to/tool)/../lib/clang/3.2/include relative to the tool
|
||||
binary. This works out-of-the-box for tools running from llvm's toplevel
|
||||
binary directory after building clang-headers, or if the tool is running
|
||||
from the binary directory of a clang install next to the clang binary.</p>
|
||||
|
||||
<p>Tips: if your tool fails to find stddef.h or similar headers, call
|
||||
the tool with -v and look at the search paths it looks through.</p>
|
||||
|
||||
<h3 id="linking">Linking.</h3>
|
||||
<p>Please note that this presents the linking requirements at the time of this
|
||||
writing. For the most up-to-date information, look at one of the tools'
|
||||
Makefiles (for example
|
||||
<a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/tools/clang-check/Makefile?view=markup">clang-check/Makefile</a>).
|
||||
</p>
|
||||
|
||||
<p>To link a binary using the tooling infrastructure, link in the following
|
||||
libraries:
|
||||
<ul>
|
||||
<li>Tooling</li>
|
||||
<li>Frontend</li>
|
||||
<li>Driver</li>
|
||||
<li>Serialization</li>
|
||||
<li>Parse</li>
|
||||
<li>Sema</li>
|
||||
<li>Analysis</li>
|
||||
<li>Edit</li>
|
||||
<li>AST</li>
|
||||
<li>Lex</li>
|
||||
<li>Basic</li>
|
||||
</ul>
|
||||
</p>
|
||||
|
||||
</div>
|
||||
</body>
|
||||
</html>
|
||||
|
|
@ -0,0 +1,206 @@
|
|||
==========
|
||||
LibTooling
|
||||
==========
|
||||
|
||||
LibTooling is a library to support writing standalone tools based on Clang.
|
||||
This document will provide a basic walkthrough of how to write a tool using
|
||||
LibTooling.
|
||||
|
||||
For information on how to set up Clang Tooling for LLVM, see
`HowToSetupToolingForLLVM.html <HowToSetupToolingForLLVM.html>`_.
|
||||
|
||||
Introduction
|
||||
------------
|
||||
|
||||
Tools built with LibTooling, like Clang Plugins, run ``FrontendActions`` over
|
||||
code.
|
||||
|
||||
.. See FIXME for a tutorial on how to write FrontendActions.
|
||||
|
||||
In this tutorial, we'll demonstrate the different ways of running Clang's
|
||||
``SyntaxOnlyAction``, which runs a quick syntax check, over a bunch of code.
|
||||
|
||||
Parsing a code snippet in memory
|
||||
--------------------------------
|
||||
|
||||
If you ever wanted to run a ``FrontendAction`` over some sample code, for
|
||||
example to unit test parts of the Clang AST, ``runToolOnCode`` is what you
|
||||
are looking for. Here is an example:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
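// Declares clang::SyntaxOnlyAction.
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/Tooling.h"
#include "gtest/gtest.h"

using namespace clang::tooling;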
|
||||
|
||||
TEST(runToolOnCode, CanSyntaxCheckCode) {
|
||||
// runToolOnCode returns whether the action was correctly run over the
|
||||
// given code.
|
||||
EXPECT_TRUE(runToolOnCode(new clang::SyntaxOnlyAction, "class X {};"));
|
||||
}
|
||||
|
||||
Writing a standalone tool
|
||||
-------------------------
|
||||
|
||||
Once you have unit tested your ``FrontendAction`` to the point where it cannot
|
||||
possibly break, it's time to create a standalone tool. For a standalone tool
|
||||
to run clang, it first needs to figure out what command line arguments to use
|
||||
for a specified file. To that end we create a ``CompilationDatabase``. There
|
||||
are different ways to create a compilation database, and we need to support all
|
||||
of them depending on command-line options. The ``CommonOptionsParser`` class
is responsible for parsing command-line parameters related to
|
||||
compilation databases and inputs, so that all tools share the implementation.
|
||||
|
||||
Parsing common tool options
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
``CompilationDatabase`` can be read from a build directory or the command line.
|
||||
Using ``CommonOptionsParser`` allows for explicit specification of a compile
|
||||
command line, specification of build path using the ``-p`` command-line option,
|
||||
and automatic location of the compilation database using source file paths.
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
#include "clang/Tooling/CommonOptionsParser.h"
|
||||
|
||||
using namespace clang::tooling;
|
||||
|
||||
int main(int argc, const char **argv) {
|
||||
// CommonOptionsParser constructor will parse arguments and create a
|
||||
// CompilationDatabase. In case of error it will terminate the program.
|
||||
CommonOptionsParser OptionsParser(argc, argv);
|
||||
|
||||
// Use OptionsParser.GetCompilations() and OptionsParser.GetSourcePathList()
|
||||
// to retrieve CompilationDatabase and the list of input file paths.
|
||||
}
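
One of the sources mentioned above is an explicit compile command line, passed
after a ``--`` separator. As a rough model of that convention only, the
standalone sketch below splits ``argv`` at the first ``--``: arguments before
it belong to the tool (for example source paths or ``-p``), and everything
after it is the compile command. ``SplitArgs`` and ``splitAtDoubleDash`` are
hypothetical names; this is not ``CommonOptionsParser``'s implementation.

.. code-block:: c++

  #include <string>
  #include <vector>

  // Hypothetical result of splitting a tool's command line.
  struct SplitArgs {
    std::vector<std::string> ToolArgs;     // e.g. source paths, -p <build-path>
    std::vector<std::string> FixedCommand; // compile command after "--", if any
    bool HasFixedCommand = false;
  };

  // Toy model: splits argv at the first "--", skipping argv[0] (the program
  // name). Later "--" arguments are kept as part of the compile command.
  SplitArgs splitAtDoubleDash(int argc, const char **argv) {
    SplitArgs Result;
    for (int I = 1; I < argc; ++I) {
      std::string Arg = argv[I];
      if (!Result.HasFixedCommand && Arg == "--") {
        Result.HasFixedCommand = true;
        continue;
      }
      (Result.HasFixedCommand ? Result.FixedCommand : Result.ToolArgs)
          .push_back(Arg);
    }
    return Result;
  }
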
|
||||
|
||||
Creating and running a ClangTool
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Once we have a ``CompilationDatabase``, we can create a ``ClangTool`` and run
|
||||
our ``FrontendAction`` over some code. For example, to run the
|
||||
``SyntaxOnlyAction`` over the files ``a.cc`` and ``b.cc``, one would write:
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
// A clang tool can run over a number of sources in the same process...
|
||||
std::vector<std::string> Sources;
|
||||
Sources.push_back("a.cc");
|
||||
Sources.push_back("b.cc");
|
||||
|
||||
// We hand the CompilationDatabase we created and the sources to run over into
|
||||
// the tool constructor.
|
||||
ClangTool Tool(OptionsParser.GetCompilations(), Sources);
|
||||
|
||||
// The ClangTool needs a new FrontendAction for each translation unit we run
|
||||
// on. Thus, it takes a FrontendActionFactory as parameter. To create a
|
||||
// FrontendActionFactory from a given FrontendAction type, we call
|
||||
// newFrontendActionFactory<clang::SyntaxOnlyAction>().
|
||||
int result = Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
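
Why a factory is used rather than a single action object can be shown with a
small standalone model. ``ToyAction``, ``ToyActionFactory``,
``newToyActionFactory``, and ``runOverFiles`` are hypothetical stand-ins
loosely mirroring ``FrontendAction``, ``FrontendActionFactory``,
``newFrontendActionFactory``, and ``ClangTool::run``; their signatures are not
Clang's. The factory template only remembers the action type, and the driver
asks it for a fresh action for every file, so per-file state cannot leak from
one translation unit into the next.

.. code-block:: c++

  #include <memory>
  #include <string>
  #include <vector>

  // Hypothetical stand-in for a FrontendAction; may keep per-file state.
  class ToyAction {
  public:
    virtual ~ToyAction() = default;
    virtual bool runOnFile(const std::string &File) = 0;
  };

  // Hypothetical stand-in for a FrontendActionFactory.
  class ToyActionFactory {
  public:
    virtual ~ToyActionFactory() = default;
    virtual std::unique_ptr<ToyAction> create() = 0;
  };

  // Toy model: a factory that default-constructs a new T on every create().
  template <typename T>
  std::unique_ptr<ToyActionFactory> newToyActionFactory() {
    class SimpleFactory : public ToyActionFactory {
    public:
      std::unique_ptr<ToyAction> create() override {
        return std::make_unique<T>();
      }
    };
    return std::make_unique<SimpleFactory>();
  }

  // Toy driver: a fresh action per file; returns 0 if all files succeeded.
  int runOverFiles(ToyActionFactory &Factory,
                   const std::vector<std::string> &Files) {
    int Result = 0;
    for (const std::string &File : Files) {
      std::unique_ptr<ToyAction> Action = Factory.create();
      if (!Action->runOnFile(File))
        Result = 1;
    }
    return Result;
  }

  // Example toy action that fails if it is ever reused for a second file.
  class SingleUseAction : public ToyAction {
  public:
    bool runOnFile(const std::string &) override {
      if (Used)
        return false;
      Used = true;
      return true;
    }

  private:
    bool Used = false;
  };

Here ``SingleUseAction`` fails if it is reused, yet ``runOverFiles`` succeeds
on two files because each file gets its own instance.
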
|
||||
|
||||
Putting it together --- the first tool
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
Now we combine the two previous steps into our first real tool. This example
|
||||
tool is also checked into the clang tree at
|
||||
``tools/clang-check/ClangCheck.cpp``.
|
||||
|
||||
.. code-block:: c++
|
||||
|
||||
// Declares clang::SyntaxOnlyAction.
|
||||
#include "clang/Frontend/FrontendActions.h"
|
||||
#include "clang/Tooling/CommonOptionsParser.h"
|
||||
#include "clang/Tooling/Tooling.h"
|
||||
// Declares llvm::cl::extrahelp.
|
||||
#include "llvm/Support/CommandLine.h"
|
||||
|
||||
using namespace clang::tooling;
|
||||
using namespace llvm;
|
||||
|
||||
// CommonOptionsParser declares HelpMessage with a description of the common
|
||||
// command-line options related to the compilation database and input files.
|
||||
// It's nice to have this help message in all tools.
|
||||
static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
|
||||
|
||||
// A help message for this specific tool can be added afterwards.
|
||||
static cl::extrahelp MoreHelp("\nMore help text...");
|
||||
|
||||
int main(int argc, const char **argv) {
|
||||
CommonOptionsParser OptionsParser(argc, argv);
|
||||
ClangTool Tool(OptionsParser.GetCompilations(),
|
||||
OptionsParser.GetSourcePathList());
|
||||
return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
|
||||
}
|
||||
|
||||
Running the tool on some code
|
||||
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
|
||||
|
||||
When you check out and build Clang, :program:`clang-check` is already built and
available to you in ``bin/clang-check`` inside your build directory.
|
||||
|
||||
You can run clang-check on a file in the llvm repository by specifying all the
|
||||
needed parameters after a "``--``" separator:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ cd /path/to/source/llvm
|
||||
$ export BD=/path/to/build/llvm
|
||||
$ $BD/bin/clang-check tools/clang/tools/clang-check/ClangCheck.cpp -- \
|
||||
clang++ -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS \
|
||||
-Itools/clang/include -I$BD/include -Iinclude \
|
||||
-Itools/clang/lib/Headers -c
|
||||
|
||||
As an alternative, you can also configure cmake to output a compile command
|
||||
database into its build directory:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
# As an alternative to calling cmake, use ccmake, toggle to advanced mode and
|
||||
# set the parameter CMAKE_EXPORT_COMPILE_COMMANDS from the UI.
|
||||
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
|
||||
|
||||
This creates a file called ``compile_commands.json`` in the build directory.
|
||||
Now you can run :program:`clang-check` over files in the project by specifying
|
||||
the build path as first argument and some source files as further positional
|
||||
arguments:
|
||||
|
||||
.. code-block:: bash
|
||||
|
||||
$ cd /path/to/source/llvm
|
||||
$ export BD=/path/to/build/llvm
|
||||
$ $BD/bin/clang-check -p $BD tools/clang/tools/clang-check/ClangCheck.cpp
|
||||
|
||||
Builtin includes
|
||||
^^^^^^^^^^^^^^^^
|
||||
|
||||
Clang tools need their builtin headers and search for them the same way Clang
|
||||
does. Thus, the default location to look for builtin headers is in a path
|
||||
``$(dirname /path/to/tool)/../lib/clang/3.2/include`` relative to the tool
|
||||
binary. This works out-of-the-box for tools running from llvm's toplevel
|
||||
binary directory after building clang-headers, or if the tool is running from
|
||||
the binary directory of a clang install next to the clang binary.
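The lookup rule above can be sketched in C++ as follows. This is a minimal illustration only, assuming C++17 ``std::filesystem``. The helper name ``builtinIncludeDir`` is hypothetical. It is not part of Clang's API and does not reproduce how Clang itself locates the directory.

```cpp
#include <cassert>
#include <filesystem>
#include <string>

// Hypothetical helper, not part of Clang: builds the default builtin-header
// directory for a tool binary following the documented convention
//   $(dirname /path/to/tool)/../lib/clang/<version>/include
std::string builtinIncludeDir(const std::string &toolPath,
                              const std::string &clangVersion) {
  assert(!clangVersion.empty() && "a Clang version such as 3.2 is required");
  // parent_path() plays the role of dirname.
  const std::filesystem::path toolDir =
      std::filesystem::path(toolPath).parent_path();
  return (toolDir / ".." / "lib" / "clang" / clangVersion / "include").string();
}
```

For instance, ``builtinIncludeDir("/path/to/build/llvm/bin/clang-check", "3.2")`` returns ``/path/to/build/llvm/bin/../lib/clang/3.2/include`` on a POSIX system.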

Tip: if your tool fails to find ``stddef.h`` or similar headers, call the tool
with ``-v`` and inspect the header search paths it reports.

Linking
^^^^^^^

Note that this section describes the linking requirements at the time of
writing. For the most up-to-date information, look at one of the tools'
Makefiles (for example `clang-check/Makefile
<http://llvm.org/viewvc/llvm-project/cfe/trunk/tools/clang-check/Makefile?view=markup>`_).

To link a binary using the tooling infrastructure, link in the following
libraries:

* Tooling
* Frontend
* Driver
* Serialization
* Parse
* Sema
* Analysis
* Edit
* AST
* Lex
* Basic

@@ -1,658 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Precompiled Header and Modules Internals</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
<style type="text/css">
td {
vertical-align: top;
}
</style>
</head>

<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Precompiled Header and Modules Internals</h1>

<p>This document describes the design and implementation of Clang's
precompiled headers (PCH) and modules. If you are interested in the end-user
view, please see the <a
href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>

<p><b>Table of Contents</b></p>
<ul>
<li><a href="#usage">Using Precompiled Headers with
<tt>clang</tt></a></li>
<li><a href="#philosophy">Design Philosophy</a></li>
<li><a href="#contents">Serialized AST File Contents</a>
<ul>
<li><a href="#metadata">Metadata Block</a></li>
<li><a href="#sourcemgr">Source Manager Block</a></li>
<li><a href="#preprocessor">Preprocessor Block</a></li>
<li><a href="#types">Types Block</a></li>
<li><a href="#decls">Declarations Block</a></li>
<li><a href="#stmt">Statements and Expressions</a></li>
<li><a href="#idtable">Identifier Table Block</a></li>
<li><a href="#method-pool">Method Pool Block</a></li>
</ul>
</li>
<li><a href="#tendrils">AST Reader Integration Points</a></li>
<li><a href="#chained">Chained precompiled headers</a></li>
<li><a href="#modules">Modules</a></li>
</ul>

<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>

<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command-line
options for generating and using PCH files.</p>

<p>To generate PCH files using <tt>clang -cc1</tt>, use the option
<b><tt>-emit-pch</tt></b>:</p>

<pre> $ clang -cc1 test.h -emit-pch -o test.h.pch </pre>

<p>This option is transparently used by <tt>clang</tt> when generating
PCH files. The resulting PCH file contains the serialized form of the
compiler's internal representation after it has completed parsing and
semantic analysis. The PCH file can then be used as a prefix header
with the <b><tt>-include-pch</tt></b> option:</p>

<pre>
$ clang -cc1 -include-pch test.h.pch test.c -o test.s
</pre>

<h2 id="philosophy">Design Philosophy</h2>

<p>Precompiled headers are meant to improve overall compile times for
projects, so the design of precompiled headers is entirely driven by
performance concerns. The use case for precompiled headers is
relatively simple: when there is a common set of headers that is
included in nearly every source file in the project, we
<i>precompile</i> that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the
project, we load the PCH file first (as a prefix header), which acts
as a stand-in for that bundle of headers.</p>

<p>A precompiled header implementation improves performance when:</p>
<ul>
<li>Loading the PCH file is significantly faster than re-parsing the
bundle of headers stored within the PCH file. Thus, a precompiled
header design attempts to minimize the cost of reading the PCH
file. Ideally, this cost should not vary with the size of the
precompiled header file.</li>

<li>The cost of generating the PCH file initially is not so large
that it counters the per-source-file performance improvement due to
eliminating the need to parse the bundled headers in the first
place. This is particularly important on multi-core systems, because
PCH file generation serializes the build when all compilations
require the PCH file to be up-to-date.</li>
</ul>

<p>Modules, as implemented in Clang, use the same mechanisms as
precompiled headers to save a serialized AST file (one per module) and
to load those AST files. From an implementation standpoint, modules are
a generalization of precompiled headers, lifting a number of
restrictions placed on precompiled headers. In particular, a translation
unit can use only one precompiled header, and it must be included at the
beginning of the translation unit. The extensions to the AST file
format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>

<p>Clang's AST files are designed with a compact on-disk
representation, which minimizes both creation time and the time
required to initially load the AST file. The AST file itself contains
a serialized representation of Clang's abstract syntax trees and
supporting data structures, stored using the same compressed bitstream
as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
file format</a>.</p>

<p>Clang's AST files are loaded "lazily" from disk. When an
AST file is initially loaded, Clang reads only a small amount of data
from the AST file to establish where certain important data structures
are stored. The amount of data read in this initial load is
independent of the size of the AST file, so a larger AST file
does not lead to longer AST load times. The actual header data in the
AST file (macros, functions, variables, types, etc.) is loaded only
when it is referenced from the user's code, at which point only that
entity (and the entities it depends on) is deserialized from the
AST file. With this approach, the cost of using an AST file
for a translation unit is proportional to the amount of code actually
used from the AST file, rather than being proportional to the size of
the AST file itself.</p>
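<p>The lazy, ID-indexed loading described above can be illustrated with a small C++ sketch. It is not Clang's <code>ASTReader</code>. The class name <code>LazyEntityTable</code> is hypothetical, the "file" is an in-memory vector of records rather than a bitstream, and deserialization is reduced to copying a string. The read counter is analogous to the "N/M ... read" figures that <code>-print-stats</code> reports.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical sketch of lazy, ID-indexed loading (not Clang's ASTReader).
// The serialized records live in an in-memory vector instead of a bitstream,
// and "deserializing" a record just copies its string.
class LazyEntityTable {
public:
  // offsets[ID] gives the position of entity ID's record within records.
  LazyEntityTable(std::vector<std::string> records,
                  std::vector<std::size_t> offsets)
      : Records(std::move(records)), Offsets(std::move(offsets)),
        Cache(Offsets.size()) {}

  // Returns the entity with the given ID, deserializing it on first use.
  const std::string &get(std::uint32_t ID) {
    assert(ID < Offsets.size() && "unknown entity ID");
    if (!Cache[ID]) {
      assert(Offsets[ID] < Records.size() && "offset outside the file");
      Cache[ID] = Records[Offsets[ID]];
      ++NumRead;
    }
    return *Cache[ID];
  }

  // Counterparts of the "N/M ... read" statistics.
  std::size_t numRead() const { return NumRead; }
  std::size_t numTotal() const { return Offsets.size(); }

private:
  std::vector<std::string> Records;
  std::vector<std::size_t> Offsets;
  std::vector<std::optional<std::string>> Cache; // empty until loaded
  std::size_t NumRead = 0;
};
```

<p>Requesting an entity twice returns the cached copy. The read count therefore grows only with the number of distinct entities actually used, not with the size of the table.</p>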

<p>When given the <code>-print-stats</code> option, Clang produces
statistics describing how much of the AST file was actually
loaded from disk. For a simple "Hello, World!" program that includes
the Apple <code>Cocoa.h</code> header (which is built as a precompiled
header), this option illustrates how little of the actual precompiled
header is required:</p>

<pre>
*** PCH Statistics:
933 stat cache hits
4 stat cache misses
895/39981 source location entries read (2.238563%)
19/15315 types read (0.124061%)
20/82685 declarations read (0.024188%)
154/58070 identifiers read (0.265197%)
0/7260 selectors read (0.000000%)
0/30842 statements read (0.000000%)
4/8400 macros read (0.047619%)
1/4995 lexical declcontexts read (0.020020%)
0/4413 visible declcontexts read (0.000000%)
0/7230 method pool entries read (0.000000%)
0 method pool misses
</pre>

<p>For this small program, only a tiny fraction of the source
locations, types, declarations, identifiers, and macros were actually
deserialized from the precompiled header. These statistics can be
useful to determine whether the AST file implementation can
be improved by making more of the implementation lazy.</p>

<p>Precompiled headers can be chained. When you create a PCH while
including an existing PCH, Clang can create the new PCH by referencing
the original file and only writing the new data to the new file. For
example, you could create a PCH out of all the headers that are very
commonly used throughout your project, and then create a PCH for every
single source file in the project that includes the code that is
specific to that file, so that recompiling the file itself is very fast,
without duplicating the data from the common headers for every
file. The mechanisms behind chained precompiled headers are discussed
in a <a href="#chained">later section</a>.</p>

<h2 id="contents">AST File Contents</h2>

<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">

<p>Clang's AST files are organized into several different
blocks, each of which contains the serialized representation of a part
of Clang's internal representation. Each of the blocks corresponds to
either a block or a record within <a
href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream
format</a>. The contents of each of these logical blocks are described
below.</p>

<p>For a given AST file, the <a
href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
utility can be used to examine the actual structure of the bitstream
for the AST file. This information can be used both to help
understand the structure of the AST file and to isolate
areas where AST files can still be optimized, e.g., through
the introduction of abbreviations.</p>

<h3 id="metadata">Metadata Block</h3>

<p>The metadata block contains several records that provide
information about how the AST file was built. This metadata
is primarily used to validate the use of an AST file. For
example, a precompiled header built for a 32-bit x86 target cannot be used
when compiling for a 64-bit x86 target. The metadata block contains
information about:</p>

<dl>
<dt>Language options</dt>
<dd>Describes the particular language dialect used to compile the
AST file, including major options (e.g., Objective-C support) and more
minor options (e.g., support for "//" comments). The contents of this
record correspond to the <code>LangOptions</code> class.</dd>

<dt>Target architecture</dt>
<dd>The target triple that describes the architecture, platform, and
ABI for which the AST file was generated, e.g.,
<code>i386-apple-darwin9</code>.</dd>

<dt>AST version</dt>
<dd>The major and minor version numbers of the AST file
format. Changes in the minor version number should not affect backward
compatibility, while changes in the major version number imply that a
newer compiler cannot read an older precompiled header (and
vice versa).</dd>

<dt>Original file name</dt>
<dd>The full path of the header that was used to generate the
AST file.</dd>

<dt>Predefines buffer</dt>
<dd>Although not explicitly stored as part of the metadata, the
predefines buffer is used in the validation of the AST file.
The predefines buffer itself contains code generated by the compiler
to initialize the preprocessor state according to the current target,
platform, and command-line options. For example, the predefines buffer
will contain "<code>#define __STDC__ 1</code>" when we are compiling C
without Microsoft extensions. The predefines buffer itself is stored
within the <a href="#sourcemgr">source manager block</a>, but its
contents are verified along with the rest of the metadata.</dd>

</dl>
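<p>The validation rules above can be summarized in a short C++ sketch. The struct and its fields are hypothetical simplifications: a single string stands in for the serialized <code>LangOptions</code>. Ignoring the minor version entirely is a simplifying reading of the compatibility rule, not a description of the checks Clang actually performs.</p>

```cpp
#include <cassert>
#include <string>

// Hypothetical, simplified view of the metadata block. A single string stands
// in for the serialized LangOptions; field names are illustrative only.
struct ASTFileMetadata {
  unsigned VersionMajor;
  unsigned VersionMinor;
  std::string TargetTriple;    // e.g. "i386-apple-darwin9"
  std::string LanguageDialect; // stand-in for the LangOptions record
};

// Returns true if an AST file described by File may be used by a compiler
// whose own configuration is Current.
bool canUseASTFile(const ASTFileMetadata &File,
                   const ASTFileMetadata &Current) {
  assert(!Current.TargetTriple.empty() && "the compiler knows its target");
  // A different major version breaks compatibility in both directions.
  // Simplification: the minor version is not checked at all.
  if (File.VersionMajor != Current.VersionMajor)
    return false;
  // The file must match the current target and language dialect.
  return File.TargetTriple == Current.TargetTriple &&
         File.LanguageDialect == Current.LanguageDialect;
}
```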

<p>A chained PCH file (that is, one that references another PCH) and a
module (which may import other modules) have additional metadata
containing the list of all AST files that this AST file depends
on. Each of those files will be loaded along with this AST file.</p>

<p>For chained precompiled headers, the language options, target
architecture and predefines buffer data are taken from the end of the
chain, since they have to match anyway.</p>

<h3 id="sourcemgr">Source Manager Block</h3>

<p>The source manager block contains the serialized representation of
Clang's <a
href="InternalsManual.html#SourceLocation">SourceManager</a> class,
which handles the mapping from source locations (as represented in
Clang's abstract syntax tree) into actual column/line positions within
a source file or macro instantiation. The AST file's
representation of the source manager also includes information about
all of the headers that were (transitively) included when building the
AST file.</p>

<p>The bulk of the source manager block is dedicated to information
about the various files, buffers, and macro instantiations to which
a source location can refer. Each of these is referenced by a numeric
"file ID", which is a unique number (allocated starting at 1) stored
in the source location. Clang serializes the information for each kind
of file ID, along with an index that maps file IDs to the position
within the AST file where the information about that file ID is
stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a
macro instantiation history inside the header itself.</p>

<p>The source manager block also contains information about all of the
headers that were included when building the AST file. This
includes information about the controlling macro for the header (e.g.,
when the preprocessor identified that the contents of the header
depend on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
along with a cached version of the results of the <code>stat()</code>
system calls performed when building the AST file. The
latter is particularly useful in reducing system time when searching
for include files.</p>

<h3 id="preprocessor">Preprocessor Block</h3>

<p>The preprocessor block contains the serialized representation of
the preprocessor. Specifically, it contains all of the macros that
have been defined by the end of the header used to build the
AST file, along with the token sequences that comprise each
macro. The macro definitions are only read from the AST file when the
name of the macro first occurs in the program. This lazy loading of
macro definitions is triggered by lookups into the <a
href="#idtable">identifier table</a>.</p>

<h3 id="types">Types Block</h3>

<p>The types block contains the serialized representation of all of
the types referenced in the translation unit. Each Clang type node
(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
corresponding record type in the AST file. When types are deserialized
from the AST file, the data within the record is used to
reconstruct the appropriate type node using the AST context.</p>

<p>Each type has a unique type ID, which is an integer that uniquely
identifies that type. Type ID 0 represents the NULL type, type IDs
less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
(<code>void</code>, <code>float</code>, etc.), while other
"user-defined" type IDs are assigned consecutively from
<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
The AST file has an associated mapping from the user-defined type IDs
to the location within the types block where the serialized
representation of that type resides, enabling lazy deserialization of
types. When a type is referenced from within the AST file, that
reference is encoded using the type ID shifted left by 3 bits. The
lower three bits are used to represent the <code>const</code>,
<code>volatile</code>, and <code>restrict</code> qualifiers, as in
Clang's <a
href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a>
class.</p>
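<p>A minimal C++ sketch of the type-reference encoding described above follows. Only the shift by 3 and the use of the low three bits for qualifiers come from the description. The specific bit values (const = 1, restrict = 2, volatile = 4) and all names are assumptions made for illustration.</p>

```cpp
#include <cassert>
#include <cstdint>

// Qualifier flags kept in the low three bits of a type reference.
// The specific bit values are an assumption made for this sketch.
enum QualifierBits : std::uint32_t {
  QualConst = 0x1,
  QualRestrict = 0x2,
  QualVolatile = 0x4,
  QualMask = 0x7
};

// Encodes a reference to TypeID carrying the qualifier bits in Quals.
std::uint32_t encodeTypeRef(std::uint32_t TypeID, std::uint32_t Quals) {
  assert((Quals & ~std::uint32_t(QualMask)) == 0 && "at most three qualifier bits");
  assert(TypeID < (1u << 29) && "type ID must survive the shift by 3");
  return (TypeID << 3) | Quals;
}

// Recovers the type ID and the qualifier bits from an encoded reference.
std::uint32_t typeIDOf(std::uint32_t Ref) { return Ref >> 3; }
std::uint32_t qualsOf(std::uint32_t Ref) { return Ref & QualMask; }
```

<p>For example, <code>encodeTypeRef(42, QualConst | QualVolatile)</code> yields <code>(42 &lt;&lt; 3) | 5</code>, i.e. 341. From that value, <code>typeIDOf</code> and <code>qualsOf</code> recover 42 and 5.</p>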

<h3 id="decls">Declarations Block</h3>

<p>The declarations block contains the serialized representation of
all of the declarations referenced in the translation unit. Each Clang
declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
etc.) has a corresponding record type in the AST file. When
declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding <code>Decl</code> node. As with types, each declaration
node has a numeric ID that is used to refer to that declaration within
the AST file. In addition, a lookup table provides a mapping from that
numeric ID to the offset within the AST file where that
declaration is described.</p>

<p>Declarations in Clang's abstract syntax trees are stored
hierarchically. At the top of the hierarchy is the translation unit
(<code>TranslationUnitDecl</code>), which contains all of the
declarations in the translation unit but is not actually written as a
specific declaration node. Its child declarations (such as
functions or struct types) may also contain other declarations inside
them, and so on. Within Clang, each declaration is stored within a <a
href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
context</a>, as represented by the <code>DeclContext</code> class.
Declaration contexts provide the mechanism to perform name lookup
within a given declaration (e.g., find the member named <code>x</code>
in a structure) and iterate over the declarations stored within a
context (e.g., iterate over all of the fields of a structure for
structure layout).</p>

<p>In Clang's AST file format, deserializing a declaration
that is a <code>DeclContext</code> is a separate operation from
deserializing all of the declarations stored within that declaration
context. Therefore, Clang will deserialize the translation unit
declaration without deserializing the declarations within that
translation unit. When required, the declarations stored within a
declaration context will be deserialized. There are two representations
of the declarations within a declaration context, which correspond to
the name-lookup and iteration behavior described above:</p>

<ul>
<li>When the front end performs name lookup to find a name
<code>x</code> within a given declaration context (for example,
during semantic analysis of the expression <code>p->x</code>,
where <code>p</code>'s type is defined in the precompiled header),
Clang refers to an on-disk hash table that maps from the names
within that declaration context to the declaration IDs that
represent each visible declaration with that name. The actual
declarations will then be deserialized to provide the results of
name lookup.</li>

<li>When the front end performs iteration over all of the
declarations within a declaration context, all of those declarations
are immediately deserialized. For large declaration contexts (e.g.,
the translation unit), this operation is expensive, but normal
compilation never needs to traverse such large contexts. It is,
however, common for the code generator and semantic analysis to
traverse the declaration contexts for structs, classes, unions, and
enumerations, although those contexts contain relatively few
declarations in the common case.</li>
</ul>
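<p>The difference in cost between the two access patterns can be illustrated with a small C++ model. Everything here is hypothetical: the on-disk hash table is modeled as a <code>std::unordered_map</code>, and "deserializing" a declaration only increments a counter.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical model of one serialized declaration context: a name-lookup
// table (standing in for the on-disk hash table) plus the IDs of every
// declaration the context contains.
struct SerializedDeclContext {
  std::unordered_map<std::string, std::vector<std::uint32_t>> LookupTable;
  std::vector<std::uint32_t> AllDecls;
};

// Stand-in for the AST reader: "loading" a declaration only counts it.
struct Deserializer {
  std::size_t NumDeserialized = 0;
  std::uint32_t load(std::uint32_t DeclID) {
    ++NumDeserialized; // a real reader would build a Decl node here
    return DeclID;
  }
};

// Name lookup: deserializes only the declarations visible under Name.
std::vector<std::uint32_t> lookupName(const SerializedDeclContext &DC,
                                      const std::string &Name,
                                      Deserializer &D) {
  std::vector<std::uint32_t> Result;
  auto It = DC.LookupTable.find(Name);
  if (It == DC.LookupTable.end())
    return Result; // name not declared in this context
  assert(!It->second.empty() && "a stored name maps to at least one decl");
  for (std::uint32_t ID : It->second)
    Result.push_back(D.load(ID));
  return Result;
}

// Iteration: deserializes every declaration in the context at once.
std::vector<std::uint32_t> iterateDecls(const SerializedDeclContext &DC,
                                        Deserializer &D) {
  std::vector<std::uint32_t> Result;
  for (std::uint32_t ID : DC.AllDecls)
    Result.push_back(D.load(ID));
  return Result;
}
```

<p>Looking up one name touches only the declarations recorded under that name, whereas iteration touches all of them. This is why iterating over a large context such as the translation unit is avoided.</p>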

<h3 id="stmt">Statements and Expressions</h3>

<p>Statements and expressions are stored in the AST file in
both the <a href="#types">types</a> and the <a
href="#decls">declarations</a> blocks, because every statement or
expression will be associated with either a type or declaration. The
actual statement and expression records are stored immediately
following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a
function will be stored directly following the declaration of the
function.</p>

<p>As with types and declarations, each statement and expression kind
in Clang's abstract syntax tree (<code>ForStmt</code>,
<code>CallExpr</code>, etc.) has a corresponding record type in the
AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression
within an expression is stored as a separate record (which keeps most
records to a fixed size). Within the AST file, the
subexpressions of an expression are stored, in reverse order, prior to the expression
that owns those expressions, using a form of <a
href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code>
(that is, <code>(3 - 4) + 5</code>) would be represented as follows:</p>

<table border="1">
<tr><td><code>IntegerLiteral(5)</code></td></tr>
<tr><td><code>IntegerLiteral(4)</code></td></tr>
<tr><td><code>IntegerLiteral(3)</code></td></tr>
<tr><td><code>BinaryOperator(-)</code></td></tr>
<tr><td><code>BinaryOperator(+)</code></td></tr>
<tr><td>STOP</td></tr>
</table>

<p>When reading this representation, Clang processes each expression
record it encounters, builds the appropriate abstract syntax tree node,
and then pushes that expression onto a stack. When a record contains <i>N</i>
subexpressions (<code>BinaryOperator</code> has two of them), those
expressions are popped from the top of the stack. The special STOP
code indicates that we have reached the end of a serialized expression
or statement; other expression or statement records may follow, but
they are part of a different expression.</p>
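<p>The stack discipline described above can be sketched in C++ for the <code>3 - 4 + 5</code> example. The record and node types are hypothetical stand-ins for Clang's statement records and <code>Expr</code> classes. The rule that the top of the stack becomes the left operand is an assumption chosen so that the reverse-ordered records in the table rebuild <code>(3 - 4) + 5</code>.</p>

```cpp
#include <cassert>
#include <memory>
#include <stack>
#include <utility>
#include <vector>

// Hypothetical record kinds mirroring the table above.
enum class RecordKind { IntegerLiteral, BinaryOperator, Stop };

struct Record {
  RecordKind Kind;
  long Value = 0; // literal value for IntegerLiteral
  char Op = 0;    // '+' or '-' for BinaryOperator
};

// A tiny expression tree standing in for Clang's Expr nodes.
struct Expr {
  long Value = 0;
  char Op = 0; // 0 means "integer literal"
  std::unique_ptr<Expr> LHS, RHS;
  long evaluate() const {
    if (!Op)
      return Value;
    long L = LHS->evaluate(), R = RHS->evaluate();
    return Op == '+' ? L + R : L - R;
  }
};

// Reads one serialized expression, stopping at the STOP record.
std::unique_ptr<Expr> readExpr(const std::vector<Record> &Records) {
  std::stack<std::unique_ptr<Expr>> Stack;
  for (const Record &R : Records) {
    if (R.Kind == RecordKind::Stop)
      break;
    auto E = std::make_unique<Expr>();
    if (R.Kind == RecordKind::IntegerLiteral) {
      E->Value = R.Value;
    } else {
      assert(Stack.size() >= 2 && "binary operator needs two operands");
      // Operands were written in reverse order, so the top is the LHS.
      E->Op = R.Op;
      E->LHS = std::move(Stack.top());
      Stack.pop();
      E->RHS = std::move(Stack.top());
      Stack.pop();
    }
    Stack.push(std::move(E));
  }
  assert(Stack.size() == 1 && "a complete expression leaves one node");
  return std::move(Stack.top());
}
```

<p>Reading the six records from the table yields a tree whose root is the <code>+</code> operator, whose left operand is <code>3 - 4</code>, and which evaluates to 4.</p>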

<h3 id="idtable">Identifier Table Block</h3>

<p>The identifier table block contains an on-disk hash table that maps
each identifier mentioned within the AST file to the
serialized representation of the identifier's information (e.g., the
<code>IdentifierInfo</code> structure). The serialized representation
contains:</p>

<ul>
<li>The actual identifier string.</li>
<li>Flags that describe whether this identifier is the name of a
built-in, a poisoned identifier, an extension token, or a
macro.</li>
<li>If the identifier names a macro, the offset of the macro
definition within the <a href="#preprocessor">preprocessor
block</a>.</li>
<li>If the identifier names one or more declarations visible from
translation unit scope, the <a href="#decls">declaration IDs</a> of these
declarations.</li>
</ul>

<p>When an AST file is loaded, the AST file reader
mechanism introduces itself into the identifier table as an external
lookup source. Thus, when the user program refers to an identifier
that has not yet been seen, Clang will perform a lookup into the
identifier table. If an identifier is found, its contents (macro
definitions, flags, top-level declarations, etc.) will be
deserialized, at which point the corresponding
<code>IdentifierInfo</code> structure will have the same contents it
would have after parsing the headers in the AST file.</p>
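<p>The external-lookup behavior can be sketched as an identifier table that calls back into a reader only on the first lookup of each name. The <code>IdentTable</code> and <code>IdentInfo</code> types and the callback signature are hypothetical. They are not Clang's <code>IdentifierTable</code>, <code>IdentifierInfo</code>, or <code>IdentifierInfoLookup</code> interfaces.</p>

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical, simplified identifier information.
struct IdentInfo {
  bool IsMacro = false;
  std::string MacroBody; // stand-in for a deserialized macro definition
};

// Identifier table that consults an external source (such as an AST
// reader) only the first time an unseen identifier is requested.
class IdentTable {
public:
  using ExternalLookup =
      std::function<std::optional<IdentInfo>(const std::string &)>;

  explicit IdentTable(ExternalLookup Ext) : External(std::move(Ext)) {}

  IdentInfo &get(const std::string &Name) {
    assert(!Name.empty() && "identifiers are non-empty");
    auto It = Table.find(Name);
    if (It != Table.end())
      return It->second; // already known: no external lookup
    ++ExternalLookups;
    IdentInfo Info;
    if (External) {
      if (std::optional<IdentInfo> Loaded = External(Name))
        Info = std::move(*Loaded); // found in the AST file
    }
    return Table.emplace(Name, std::move(Info)).first->second;
  }

  unsigned ExternalLookups = 0; // first-time lookups of unseen names

private:
  ExternalLookup External;
  std::unordered_map<std::string, IdentInfo> Table;
};
```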

<p>Within the AST file, the identifiers used to name declarations are
represented with an integral value. A separate table provides a mapping
from this integral value (the identifier ID) to the location within the
on-disk hash table where that identifier is stored. This mapping is used when
deserializing the name of a declaration, the identifier of a token, or
any other construct in the AST file that refers to a name.</p>

<h3 id="method-pool">Method Pool Block</h3>

<p>The method pool block is represented as an on-disk hash table that
serves two purposes: it provides a mapping from the names of
Objective-C selectors to the set of Objective-C instance and class
methods that have that particular selector (which is required for
semantic analysis in Objective-C) and also stores all of the selectors
used by entities within the AST file. The design of the
method pool is similar to that of the <a href="#idtable">identifier
table</a>: the first time a particular selector is formed during the
compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data
structure (<code>Sema::InstanceMethodPool</code> and
<code>Sema::FactoryMethodPool</code> for instance and class methods,
respectively).</p>

<p>As with identifiers, selectors are represented by numeric values
within the AST file. A separate index maps these numeric selector
values to the offset of the selector within the on-disk hash table,
and will be used when deserializing an Objective-C method declaration
(or other Objective-C construct) that refers to the selector.</p>

<h2 id="tendrils">AST Reader Integration Points</h2>

<p>The "lazy" deserialization behavior of AST files requires
their integration into several completely different submodules of
Clang. For example, lazily deserializing the declarations during name
lookup requires that the name-lookup routines be able to query the
AST file to find entities stored there.</p>

<p>For each Clang data structure that requires direct interaction with
the AST reader logic, there is an abstract class that provides
the interface between the two modules. The <code>ASTReader</code>
class, which handles the loading of an AST file, inherits
from all of these abstract classes to provide lazy deserialization of
Clang's data structures. <code>ASTReader</code> implements the
following abstract classes:</p>

<dl>
<dt><code>StatSysCallCache</code></dt>
<dd>This abstract interface is associated with the
<code>FileManager</code> class, and is used whenever the file
manager is going to perform a <code>stat()</code> system call.</dd>

<dt><code>ExternalSLocEntrySource</code></dt>
<dd>This abstract interface is associated with the
<code>SourceManager</code> class, and is used whenever the
<a href="#sourcemgr">source manager</a> needs to load the details
of a file, buffer, or macro instantiation.</dd>

<dt><code>IdentifierInfoLookup</code></dt>
<dd>This abstract interface is associated with the
<code>IdentifierTable</code> class, and is used whenever the
program source refers to an identifier that has not yet been seen.
In this case, the AST reader searches for
this identifier within its <a href="#idtable">identifier table</a>
to load any top-level declarations or macros associated with that
identifier.</dd>

<dt><code>ExternalASTSource</code></dt>
<dd>This abstract interface is associated with the
<code>ASTContext</code> class, and is used whenever the abstract
syntax tree nodes need to be loaded from the AST file. It
provides the ability to deserialize declarations and types
identified by their numeric values, read the bodies of functions
when required, and read the declarations stored within a
declaration context (either for iteration or for name lookup).</dd>

<dt><code>ExternalSemaSource</code></dt>
<dd>This abstract interface is associated with the <code>Sema</code>
class, and is used whenever semantic analysis needs to read
information from the <a href="#method-pool">global method
pool</a>.</dd>
</dl>
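<p>The design in which a single reader object implements several abstract interfaces can be sketched in C++ as follows. The interface names, their methods, and <code>ToyASTReader</code> are hypothetical simplifications. They are not the signatures of the Clang classes listed above.</p>

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-ins for two of the abstract interfaces listed above.
// Names and signatures are illustrative, not Clang's.
class IdentifierLookupIface {
public:
  virtual ~IdentifierLookupIface() = default;
  // Returns true if the external source knows the identifier Name.
  virtual bool lookupIdentifier(const std::string &Name) = 0;
};

class DeclSourceIface {
public:
  virtual ~DeclSourceIface() = default;
  // Returns the name of the declaration with the given numeric ID.
  virtual std::string getDeclName(std::uint32_t ID) = 0;
};

// A single reader object implements every interface, so each front-end
// component talks to "its" interface while all of them share one AST file.
class ToyASTReader : public IdentifierLookupIface, public DeclSourceIface {
public:
  explicit ToyASTReader(std::map<std::uint32_t, std::string> DeclNames)
      : Decls(std::move(DeclNames)) {}

  bool lookupIdentifier(const std::string &Name) override {
    // A real reader would probe its on-disk identifier hash table here.
    for (const auto &Entry : Decls)
      if (Entry.second == Name)
        return true;
    return false;
  }

  std::string getDeclName(std::uint32_t ID) override {
    auto It = Decls.find(ID);
    assert(It != Decls.end() && "unknown declaration ID");
    return It == Decls.end() ? std::string() : It->second;
  }

private:
  std::map<std::uint32_t, std::string> Decls; // ID -> name, models the file
};
```

<p>Each client holds only a reference to the interface it needs, for example an <code>IdentifierLookupIface&amp;</code>, and remains unaware that the same object also serves other parts of the compiler.</p>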
|
||||
|
||||
<h2 id="chained">Chained precompiled headers</h2>
|
||||
|
||||
<p>Chained precompiled headers were initially intended to improve the
|
||||
performance of IDE-centric operations such as syntax highlighting and
|
||||
code completion while a particular source file is being edited by the
|
||||
user. To minimize the amount of reparsing required after a change to
|
||||
the file, a form of precompiled header--called a precompiled
|
||||
<i>preamble</i>--is automatically generated by parsing all of the
|
||||
headers in the source file, up to and including the last
|
||||
#include. When only the source file changes (and none of the headers
|
||||
it depends on), reparsing of that source file can use the precompiled
|
||||
preamble and start parsing after the #includes, so parsing time is
|
||||
proportional to the size of the source file (rather than all of its
|
||||
includes). However, the compilation of that translation unit
|
||||
may already use a precompiled header: in this case, Clang will create
|
||||
the precompiled preamble as a chained precompiled header that refers
|
||||
to the original precompiled header. This drastically reduces the time
|
||||
needed to serialize the precompiled preamble for use in reparsing.</p>

<p>Chained precompiled headers get their name because each precompiled header
can depend on one other precompiled header, forming a chain of
dependencies. A translation unit will then include the precompiled
header that starts the chain (i.e., nothing depends on it). This
linearity of dependencies is important for the semantic model of
chained precompiled headers, because the most-recent precompiled
header can provide information that overrides the information provided
by the precompiled headers it depends on, just like a header file
<code>B.h</code> that includes another header <code>A.h</code> can
modify the state produced by parsing <code>A.h</code>, e.g., by
<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>

<p>There are several ways in which chained precompiled headers
generalize the AST file model:</p>

<dl>
<dt>Numbering of IDs</dt>
<dd>Many different kinds of entities--identifiers, declarations,
types, etc.--have ID numbers that start at 1 or some other
predefined constant and grow upward. Each precompiled header records
the maximum ID number it has assigned in each category. Then, when a
new precompiled header is generated that depends on (chains to)
another precompiled header, it will start counting at the next
available ID number. This way, one can determine, given an ID
number, which AST file actually contains the entity.</dd>

<dt>Name lookup</dt>
<dd>When writing a chained precompiled header, Clang attempts to
write only information that has changed from the precompiled header
on which it is based. This changes the lookup algorithm for the
various tables, such as the <a href="#idtable">identifier table</a>:
the search starts at the most-recent precompiled header. If no entry
is found, lookup then proceeds to the identifier table in the
precompiled header it depends on, and so on. Once a lookup
succeeds, that result is considered definitive, overriding any
results from earlier precompiled headers.</dd>

<dt>Update records</dt>
<dd>There are various ways in which a later precompiled header can
modify the entities described in an earlier precompiled header. For
example, later precompiled headers can add entries into the various
name-lookup tables for the translation unit or namespaces, or add
new categories to an Objective-C class. Each of these updates is
captured in an "update record" that is stored in the chained
precompiled header file and will be loaded along with the original
entity.</dd>
</dl>

<h2 id="modules">Modules</h2>

<p>Modules generalize the chained precompiled header model yet
further, from a linear chain of precompiled headers to an arbitrary
directed acyclic graph (DAG) of AST files. All of the same techniques
used to make chained precompiled headers work---ID numbering, name
lookup, update records---are shared with modules. However, the DAG
nature of modules introduces a number of additional complications to
the model:</p>

<dl>
<dt>Numbering of IDs</dt>
<dd>The simple, linear numbering scheme used in chained precompiled
headers falls apart with the module DAG, because different modules
may end up with different numbering schemes for entities they
imported from common shared modules. To account for this, each
module file provides information about which modules it depends on
and which ID numbers it assigned to the entities in those modules,
as well as which ID numbers it took for its own new entities. The
AST reader then maps these "local" ID numbers into a "global" ID
number space for the current translation unit, providing a 1-1
mapping between entities (in whatever AST file they inhabit) and
global ID numbers. If that translation unit is then serialized into
an AST file, this mapping will be stored for use when the AST file
is imported.</dd>

<dt>Declaration merging</dt>
<dd>It is possible for a given entity (from the language's
perspective) to be declared multiple times in different places. For
example, two different headers can have the declaration of
<tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If
each of those headers is included in a module, and some third party
imports both of those modules, there is a potentially serious
problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
find both declarations, but the AST nodes are unrelated. This would
result in a compilation error, due to an ambiguity in name
lookup. Therefore, the AST reader performs declaration merging
according to the appropriate language semantics, ensuring that the
two disjoint declarations are merged into a single redeclaration
chain (with a common canonical declaration), so that it is as if one
of the headers had been included before the other.</dd>

<dt>Name Visibility</dt>
<dd>Modules allow certain names that occur during module creation to
be "hidden", so that they are not part of the public interface of
the module and are not visible to its clients. The AST reader
maintains a "visible" bit on various AST nodes (declarations, macros,
etc.) to indicate whether that particular AST node is currently
visible; the various name lookup mechanisms in Clang inspect the
visible bit to determine whether that entity, which is still in the
AST (because other, visible AST nodes may depend on it), can
actually be found by name lookup. When a new (sub)module is
imported, it may make existing, non-visible, already-deserialized
AST nodes visible; it is the responsibility of the AST reader to
find and update these AST nodes when it is notified of the import.</dd>

</dl>

</div>

</body>
</html>
@ -0,0 +1,573 @@
========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the `User's Manual <UsersManual.html#precompiledheaders>`_.

Using Precompiled Headers with ``clang``
----------------------------------------

The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.

To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

   $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

   $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------

Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.

A precompiled header implementation improves performance when:

* Loading the PCH file is significantly faster than re-parsing the bundle of
  headers stored within the PCH file. Thus, a precompiled header design
  attempts to minimize the cost of reading the PCH file. Ideally, this cost
  should not vary with the size of the precompiled header file.

* The cost of generating the PCH file initially is not so large that it
  counters the per-source-file performance improvement due to eliminating the
  need to parse the bundled headers in the first place. This is particularly
  important on multi-core systems, because PCH file generation serializes the
  build when all compilations require the PCH file to be up-to-date.

Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and to load those AST
files. From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers. In particular, a translation unit can use only one precompiled header,
and it must be included at the beginning of the translation unit. The
extensions to the AST file format required for modules are discussed in the
section on :ref:`modules <pchinternals-modules>`.

Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.

Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored. The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity and the entities it depends on are deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
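
The following C++ fragment is a minimal sketch of this lazy scheme, not Clang's
implementation. An index maps each entity ID to the offset of its serialized
record, and a record is decoded only the first time its entity is requested.
The ``LazyEntityTable`` class, its plain-string "records", and the statistics
format (modeled loosely on :option:`-print-stats`) are all hypothetical.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <cstdio>
   #include <optional>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical stand-in for part of an AST file: a blob holding serialized
   // records back to back, plus an index from entity ID to record offset.
   class LazyEntityTable {
   public:
     LazyEntityTable(std::string Data, std::vector<std::size_t> RecordOffsets)
         : Blob(std::move(Data)), Offsets(std::move(RecordOffsets)),
           Cache(Offsets.size()) {}

     // Returns entity ID, decoding its record the first time it is requested.
     const std::string &get(std::size_t ID) {
       assert(ID < Offsets.size() && "entity ID out of range");
       if (!Cache[ID]) {
         // A record runs from its own offset to the next record's offset.
         std::size_t Begin = Offsets[ID];
         std::size_t End = ID + 1 < Offsets.size() ? Offsets[ID + 1] : Blob.size();
         Cache[ID] = Blob.substr(Begin, End - Begin); // the "deserialization"
         ++NumRead;
       }
       return *Cache[ID];
     }

     // Summarizes how much of the table was loaded, e.g.
     // "2/5 declarations read (40.000000%)".
     std::string stats(const char *What) const {
       char Buf[128];
       double Percent = Offsets.empty() ? 0.0 : 100.0 * NumRead / Offsets.size();
       std::snprintf(Buf, sizeof Buf, "%zu/%zu %s read (%f%%)", NumRead,
                     Offsets.size(), What, Percent);
       return Buf;
     }

     std::size_t numRead() const { return NumRead; }

   private:
     std::string Blob;
     std::vector<std::size_t> Offsets;
     std::vector<std::optional<std::string>> Cache;
     std::size_t NumRead = 0;
   };

Repeated requests are served from the cache, so in this sketch ``numRead()``
counts distinct entities. That plays the same role as the "*N*/*M* ... read"
figures reported by :option:`-print-stats`.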

When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

   *** PCH Statistics:
     933 stat cache hits
     4 stat cache misses
     895/39981 source location entries read (2.238563%)
     19/15315 types read (0.124061%)
     20/82685 declarations read (0.024188%)
     154/58070 identifiers read (0.265197%)
     0/7260 selectors read (0.000000%)
     0/30842 statements read (0.000000%)
     4/8400 macros read (0.047619%)
     1/4995 lexical declcontexts read (0.020020%)
     0/4413 visible declcontexts read (0.000000%)
     0/7230 method pool entries read (0.000000%)
     0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.

Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file. For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.

AST File Contents
-----------------

Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.

.. image:: PCHLayout.png

For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file. This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.

Metadata Block
^^^^^^^^^^^^^^

The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:

Language options
  Describes the particular language dialect used to compile the AST file,
  including major options (e.g., Objective-C support) and more minor options
  (e.g., support for "``//``" comments). The contents of this record correspond to
  the ``LangOptions`` class.

Target architecture
  The target triple that describes the architecture, platform, and ABI for
  which the AST file was generated, e.g., ``i386-apple-darwin9``.

AST version
  The major and minor version numbers of the AST file format. Changes in the
  minor version number should not affect backward compatibility, while changes
  in the major version number imply that a newer compiler cannot read an older
  precompiled header (and vice versa).

Original file name
  The full path of the header that was used to generate the AST file.

Predefines buffer
  Although not explicitly stored as part of the metadata, the predefines buffer
  is used in the validation of the AST file. The predefines buffer itself
  contains code generated by the compiler to initialize the preprocessor state
  according to the current target, platform, and command-line options. For
  example, the predefines buffer will contain "``#define __STDC__ 1``" when we
  are compiling C without Microsoft extensions. The predefines buffer itself
  is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
  verified along with the rest of the metadata.
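
Validation can be thought of as comparing the metadata recorded in the AST file
against the configuration of the compiler that wants to load it. The sketch
below is a hypothetical illustration only. ``ASTMetadata`` and
``validateASTFile`` are invented names, and the predefines buffer is compared
verbatim for simplicity. The rule that a compiler accepts an older minor
version but rejects a newer one is an assumption drawn from the description of
the AST version above.

.. code-block:: c++

   #include <cassert>
   #include <string>

   // Hypothetical, heavily simplified view of what the metadata block records.
   struct ASTMetadata {
     std::string Triple;        // e.g. "i386-apple-darwin9"
     unsigned VersionMajor = 0; // AST file format version
     unsigned VersionMinor = 0;
     std::string Predefines;    // stands in for the predefines buffer
   };

   // Returns an empty string if an AST file described by File may be loaded by a
   // compiler configured as Current, or else a short reason for rejecting it.
   // Assumption: an older minor version is readable, a newer one is not, and a
   // different major version is always rejected.
   std::string validateASTFile(const ASTMetadata &File,
                               const ASTMetadata &Current) {
     assert(!Current.Triple.empty() && "the compiler must know its target");
     if (File.VersionMajor != Current.VersionMajor)
       return "incompatible AST file major version";
     if (File.VersionMinor > Current.VersionMinor)
       return "AST file written by a newer compiler";
     if (File.Triple != Current.Triple)
       return "target triple mismatch";
     if (File.Predefines != Current.Predefines)
       return "predefines buffer mismatch";
     return "";
   }

With this sketch, a PCH built for ``i386-apple-darwin9`` is rejected with
"target triple mismatch" by a compiler targeting ``x86_64-apple-darwin9``.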

A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.

For chained precompiled headers, the language options, target architecture and
predefines buffer data are taken from the end of the chain, since they have to
match anyway.

.. _pchinternals-sourcemgr:

Source Manager Block
^^^^^^^^^^^^^^^^^^^^

The source manager block contains the serialized representation of Clang's
`SourceManager <InternalsManual.html#SourceLocation>`_ class, which handles the
mapping from source locations (as represented in Clang's abstract syntax tree)
into actual column/line positions within a source file or macro instantiation.
The AST file's representation of the source manager also includes information
about all of the headers that were (transitively) included when building the
AST file.

The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations to which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.

The source manager block also contains information about all of the headers
that were included when building the AST file. This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``) along with a cached version of the results of
the ``stat()`` system calls performed when building the AST file. The latter
is particularly useful in reducing system time when searching for include
files.

.. _pchinternals-preprocessor:

Preprocessor Block
^^^^^^^^^^^^^^^^^^

The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.

.. _pchinternals-types:

Types Block
^^^^^^^^^^^

The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.

Each type has a unique type ID, which is an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from the user-defined type IDs to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's
`QualType <http://clang.llvm.org/docs/InternalsManual.html#Type>`_ class.
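
As a worked example of this encoding, a reference to type ID 42 with the
``const`` and ``volatile`` qualifiers is ``(42 << 3) | 0b101 = 341``. This
assumes the qualifier bit values used by ``clang::Qualifiers`` (``const`` = 1,
``restrict`` = 2, ``volatile`` = 4). The helper names below are hypothetical,
and the sketch only illustrates the shift-and-mask arithmetic.

.. code-block:: c++

   #include <cassert>
   #include <cstdint>

   // Low-bit qualifier flags. Assumption: these values mirror the fast
   // qualifiers in clang::Qualifiers (Const = 1, Restrict = 2, Volatile = 4).
   constexpr std::uint32_t QualConst = 0x1;
   constexpr std::uint32_t QualRestrict = 0x2;
   constexpr std::uint32_t QualVolatile = 0x4;
   constexpr std::uint32_t QualMask = 0x7;
   constexpr unsigned QualBits = 3;

   // Packs a type ID and its qualifiers into one integer: (ID << 3) | quals.
   std::uint32_t encodeTypeRef(std::uint32_t TypeID, std::uint32_t Quals) {
     assert(TypeID < (std::uint32_t(1) << (32 - QualBits)) && "type ID too large");
     assert((Quals & ~QualMask) == 0 && "only const/volatile/restrict fit");
     return (TypeID << QualBits) | Quals;
   }

   // Recovers the two halves of an encoded reference.
   std::uint32_t typeIDOf(std::uint32_t Ref) { return Ref >> QualBits; }
   std::uint32_t qualifiersOf(std::uint32_t Ref) { return Ref & QualMask; }

Packing the qualifiers into the reference lets each qualified variant of a
type share a single serialized type record.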

.. _pchinternals-decls:

Declarations Block
^^^^^^^^^^^^^^^^^^

The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node. As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.

Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a `declaration
context <http://clang.llvm.org/docs/InternalsManual.html#DeclContext>`_, as
represented by the ``DeclContext`` class. Declaration contexts provide the
mechanism to perform name lookup within a given declaration (e.g., find the
member named ``x`` in a structure) and iterate over the declarations stored
within a context (e.g., iterate over all of the fields of a structure for
structure layout).

In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit. When required, the declarations
stored within a declaration context will be deserialized. There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:

* When the front end performs name lookup to find a name ``x`` within a given
  declaration context (for example, during semantic analysis of the expression
  ``p->x``, where ``p``'s type is defined in the precompiled header), Clang
  refers to an on-disk hash table that maps from the names within that
  declaration context to the declaration IDs that represent each visible
  declaration with that name. The actual declarations will then be
  deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
  declaration context, all of those declarations are immediately
  deserialized. For large declaration contexts (e.g., the translation unit),
  this operation is expensive; however, large declaration contexts are not
  traversed in normal compilation, since such a traversal is unnecessary.
  In contrast, the code generator and semantic analysis commonly traverse
  declaration contexts for structs, classes, unions, and enumerations,
  although those contexts contain relatively few declarations in the common
  case.
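
The difference in cost between the two access paths can be sketched as
follows. In this hypothetical model, ``SerializedDeclContext`` holds a
name-lookup table and the full list of declaration IDs. ``DeclLoader`` stands in
for the AST reader and counts how many distinct declarations have been
deserialized. None of these names come from Clang, and a ``std::map`` replaces
the on-disk hash table.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <map>
   #include <string>
   #include <vector>

   // Hypothetical model of one serialized declaration context.
   struct SerializedDeclContext {
     // Name -> IDs of the visible declarations with that name. (Clang uses an
     // on-disk hash table; std::map keeps the sketch short.)
     std::map<std::string, std::vector<unsigned>> Lookup;
     // Every declaration ID stored in the context, for iteration.
     std::vector<unsigned> AllDecls;
   };

   // Stand-in for the AST reader; "deserializing" a declaration marks it loaded.
   struct DeclLoader {
     explicit DeclLoader(std::size_t NumDecls) : IsLoaded(NumDecls, false) {}

     unsigned load(unsigned ID) {
       assert(ID < IsLoaded.size() && "unknown declaration ID");
       if (!IsLoaded[ID]) {
         IsLoaded[ID] = true;
         ++NumLoaded;
       }
       return ID;
     }

     std::vector<bool> IsLoaded;
     std::size_t NumLoaded = 0;
   };

   // Name lookup: only the declarations called Name are deserialized.
   std::vector<unsigned> lookupName(const SerializedDeclContext &DC,
                                    const std::string &Name, DeclLoader &Reader) {
     std::vector<unsigned> Result;
     auto It = DC.Lookup.find(Name);
     if (It != DC.Lookup.end())
       for (unsigned ID : It->second)
         Result.push_back(Reader.load(ID));
     return Result;
   }

   // Iteration: every declaration in the context is deserialized.
   std::vector<unsigned> iterateDecls(const SerializedDeclContext &DC,
                                      DeclLoader &Reader) {
     std::vector<unsigned> Result;
     for (unsigned ID : DC.AllDecls)
       Result.push_back(Reader.load(ID));
     return Result;
   }

Looking up one name touches only the declarations with that name, while
iteration touches all of them. This is why iterating over a large context such
as the translation unit is expensive.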

Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^

Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
a declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.

As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those expressions, using a
form of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_. For example, an
expression ``3 - 4 + 5`` would be represented as follows:

+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP``              |
+-----------------------+

When reading this representation, Clang reads each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression onto a stack. When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack. The special ``STOP`` code indicates that we have reached the
end of a serialized expression or statement; other expression or statement
records may follow, but they are part of a different expression.
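
The stack discipline described above can be sketched in a few lines of C++.
This is a toy model, not the ``ASTReader`` code. The ``RecordKind``,
``Record``, and ``Expr`` types are hypothetical and support only integer
literals and binary ``+``/``-``. Given the records for ``3 - 4 + 5`` in the
order listed, ``readExpr`` rebuilds ``((3 - 4) + 5)``.

.. code-block:: c++

   #include <cassert>
   #include <cstddef>
   #include <memory>
   #include <string>
   #include <utility>
   #include <vector>

   // Hypothetical record kinds; Clang's real statement and expression codes are
   // far more numerous.
   enum class RecordKind { IntegerLiteral, BinaryOperator, Stop };

   struct Record {
     RecordKind Kind;
     long Value = 0; // used by IntegerLiteral
     char Op = 0;    // '+' or '-', used by BinaryOperator
   };

   struct Expr {
     char Op = 0; // 0 marks an integer literal
     long Value = 0;
     std::unique_ptr<Expr> LHS, RHS;
   };

   // Rebuilds one expression from records laid out as the text describes:
   // subexpressions first (in reverse order), then their owner, then STOP.
   std::unique_ptr<Expr> readExpr(const std::vector<Record> &Records,
                                  std::size_t &Pos) {
     std::vector<std::unique_ptr<Expr>> Stack;
     for (; Pos < Records.size(); ++Pos) {
       const Record &R = Records[Pos];
       if (R.Kind == RecordKind::Stop) {
         ++Pos; // consume STOP; later records belong to another expression
         break;
       }
       auto E = std::make_unique<Expr>();
       if (R.Kind == RecordKind::IntegerLiteral) {
         E->Value = R.Value;
       } else {
         // Operands were written in reverse order, so the LHS is on top.
         assert(Stack.size() >= 2 && "binary operator needs two operands");
         E->Op = R.Op;
         E->LHS = std::move(Stack.back());
         Stack.pop_back();
         E->RHS = std::move(Stack.back());
         Stack.pop_back();
       }
       Stack.push_back(std::move(E));
     }
     assert(Stack.size() == 1 && "expected exactly one complete expression");
     return std::move(Stack.back());
   }

   long evaluate(const Expr &E) {
     if (!E.Op)
       return E.Value;
     long L = evaluate(*E.LHS), R = evaluate(*E.RHS);
     return E.Op == '+' ? L + R : L - R;
   }

   std::string print(const Expr &E) {
     if (!E.Op)
       return std::to_string(E.Value);
     return "(" + print(*E.LHS) + " " + E.Op + " " + print(*E.RHS) + ")";
   }

The operands of each operator are written in reverse order, so the left-hand
operand is the most recently pushed entry and is popped first. ``Pos`` is left
just past ``STOP``, where the next serialized expression would begin.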

.. _pchinternals-ident-table:

Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^

The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (i.e., the ``IdentifierInfo`` structure). The
serialized representation contains:

* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
  poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
  the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
  unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
  declarations.

When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source. Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table. If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.

Within the AST file, the identifiers used to name declarations are represented
with an integral value. A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored. This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.

.. _pchinternals-method-pool:

Method Pool Block
^^^^^^^^^^^^^^^^^

The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file. The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).

As with identifiers, selectors are represented by numeric values within the AST
file. A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when deserializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.

AST Reader Integration Points
-----------------------------

The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang. For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.

For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures. ``ASTReader`` implements the
following abstract classes:

``StatSysCallCache``
  This abstract interface is associated with the ``FileManager`` class, and is
  used whenever the file manager is going to perform a ``stat()`` system call.

``ExternalSLocEntrySource``
  This abstract interface is associated with the ``SourceManager`` class, and
  is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
  load the details of a file, buffer, or macro instantiation.

``IdentifierInfoLookup``
  This abstract interface is associated with the ``IdentifierTable`` class, and
  is used whenever the program source refers to an identifier that has not yet
  been seen. In this case, the AST reader searches for this identifier within
  its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
  declarations or macros associated with that identifier.

``ExternalASTSource``
  This abstract interface is associated with the ``ASTContext`` class, and is
  used whenever the abstract syntax tree nodes need to be loaded from the AST
  file. It provides the ability to deserialize declarations and types
  identified by their numeric values, read the bodies of functions when
  required, and read the declarations stored within a declaration context
  (either for iteration or for name lookup).

``ExternalSemaSource``
  This abstract interface is associated with the ``Sema`` class, and is used
  whenever semantic analysis needs to read information from the :ref:`global
  method pool <pchinternals-method-pool>`.
|
||||
|
||||
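The following minimal C++ sketch shows the pattern of one reader object serving several narrow interfaces, each used by a different client. ``ToyIdentifierLookup``, ``ToyExternalASTSource`` and ``ToyASTReader`` are hypothetical stand-ins invented for this illustration. They are not Clang's actual classes or method signatures. The sketch also models the "AST file" as in-memory maps and deserializes a declaration only the first time it is requested.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <optional>
  #include <string>
  #include <utility>

  // Hypothetical, heavily simplified stand-ins for two of the interfaces
  // listed above; Clang's real classes have different names and methods.
  struct ToyIdentifierLookup {   // plays the role of IdentifierInfoLookup
    virtual ~ToyIdentifierLookup() = default;
    virtual std::optional<std::string> lookup(const std::string &Name) = 0;
  };

  struct ToyExternalASTSource {  // plays the role of ExternalASTSource
    virtual ~ToyExternalASTSource() = default;
    virtual std::string getDeclByID(unsigned ID) = 0;
  };

  // One reader object implements every interface, so each client data
  // structure only depends on the narrow interface it needs.
  class ToyASTReader : public ToyIdentifierLookup, public ToyExternalASTSource {
  public:
    ToyASTReader(std::map<std::string, unsigned> IdentTable,
                 std::map<unsigned, std::string> OnDiskDecls)
        : IdentTable(std::move(IdentTable)),
          OnDiskDecls(std::move(OnDiskDecls)) {}

    // Called for an identifier that has not been seen yet.
    std::optional<std::string> lookup(const std::string &Name) override {
      auto It = IdentTable.find(Name);
      if (It == IdentTable.end())
        return std::nullopt;           // not stored in the "AST file"
      return getDeclByID(It->second);  // deserialize on first use
    }

    // Deserializes a declaration by its ID, at most once.
    std::string getDeclByID(unsigned ID) override {
      auto Cached = Loaded.find(ID);
      if (Cached != Loaded.end())
        return Cached->second;
      auto OnDisk = OnDiskDecls.find(ID);
      assert(OnDisk != OnDiskDecls.end() && "unknown declaration ID");
      ++NumDeserialized;               // stands in for reading the file
      Loaded.emplace(ID, OnDisk->second);
      return OnDisk->second;
    }

    unsigned NumDeserialized = 0;

  private:
    std::map<std::string, unsigned> IdentTable;   // identifier -> decl ID
    std::map<unsigned, std::string> OnDiskDecls;  // decl ID -> "serialized" decl
    std::map<unsigned, std::string> Loaded;       // already-deserialized decls
  };

A client that holds only a ``ToyIdentifierLookup &`` cannot tell whether an entity came from the parser or from the AST file. The reader alone decides when deserialization happens.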
.. _pchinternals-chained:

Chained precompiled headers
---------------------------

Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes). However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header. This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.

Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it). This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.

There are several ways in which chained precompiled headers generalize the AST
file model:

Numbering of IDs
  Many different kinds of entities (identifiers, declarations, types, etc.)
  have ID numbers that start at 1 or some other predefined constant and
  grow upward. Each precompiled header records the maximum ID number it has
  assigned in each category. Then, when a new precompiled header is generated
  that depends on (chains to) another precompiled header, it will start
  counting at the next available ID number. This way, one can determine, given
  an ID number, which AST file actually contains the entity.

Name lookup
  When writing a chained precompiled header, Clang attempts to write only
  information that has changed from the precompiled header on which it is
  based. This changes the lookup algorithm for the various tables, such as the
  :ref:`identifier table <pchinternals-ident-table>`: the search starts at the
  most recent precompiled header. If no entry is found, lookup then proceeds
  to the identifier table in the precompiled header it depends on, and so on.
  Once a lookup succeeds, that result is considered definitive, overriding any
  results from earlier precompiled headers.

Update records
  There are various ways in which a later precompiled header can modify the
  entities described in an earlier precompiled header. For example, later
  precompiled headers can add entries into the various name-lookup tables for
  the translation unit or namespaces, or add new categories to an Objective-C
  class. Each of these updates is captured in an "update record" that is
  stored in the chained precompiled header file and will be loaded along with
  the original entity.

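The ID-numbering and lookup rules for chained precompiled headers can be sketched as a toy model. ``ToyPCH``, ``appendToChain``, ``fileForID`` and ``lookupIdentifier`` are hypothetical names used only for this illustration. The model makes two simplifying assumptions: each file stores its ID range explicitly, and an identifier-table entry is a plain string. Clang's on-disk format is considerably more involved.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <optional>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical model of one precompiled header in a chain.
  struct ToyPCH {
    std::string FileName;
    unsigned FirstID;  // first ID assigned by this file
    unsigned NumIDs;   // number of IDs assigned by this file
    std::map<std::string, std::string> IdentTable;  // identifier -> entry
  };

  // Chain.front() is the oldest PCH, Chain.back() the most recent one.
  using ToyChain = std::vector<ToyPCH>;

  // A new PCH starts counting at the next ID after the one it chains to.
  void appendToChain(ToyChain &Chain, std::string FileName, unsigned NumIDs,
                     std::map<std::string, std::string> IdentTable) {
    assert(NumIDs > 0 && "each PCH is assumed to assign at least one ID");
    unsigned FirstID =
        Chain.empty() ? 1u : Chain.back().FirstID + Chain.back().NumIDs;
    Chain.push_back(
        {std::move(FileName), FirstID, NumIDs, std::move(IdentTable)});
  }

  // Given an ID, determine which file actually contains the entity.
  const ToyPCH *fileForID(const ToyChain &Chain, unsigned ID) {
    for (const ToyPCH &P : Chain)
      if (ID >= P.FirstID && ID < P.FirstID + P.NumIDs)
        return &P;
    return nullptr;
  }

  // Search the most recent PCH first; the first hit is definitive.
  std::optional<std::string> lookupIdentifier(const ToyChain &Chain,
                                              const std::string &Name) {
    for (auto It = Chain.rbegin(); It != Chain.rend(); ++It) {
      auto Found = It->IdentTable.find(Name);
      if (Found != It->IdentTable.end())
        return Found->second;
    }
    return std::nullopt;
  }

For example, suppose ``A.pch`` assigns 100 IDs. A ``B.pch`` chained to it then starts at ID 101, so ``fileForID`` maps ID 101 to ``B.pch``. If ``B.pch`` also records an ``#undef`` of a macro from ``A.h``, ``lookupIdentifier`` returns that entry without consulting ``A.pch``.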
.. _pchinternals-modules:

Modules
-------

Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work (ID numbering, name lookup, update records) are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:

Numbering of IDs
  The simple, linear numbering scheme used in chained precompiled headers falls
  apart with the module DAG, because different modules may end up with
  different numbering schemes for entities they imported from common shared
  modules. To account for this, each module file provides information about
  which modules it depends on and which ID numbers it assigned to the entities
  in those modules, as well as which ID numbers it took for its own new
  entities. The AST reader then maps these "local" ID numbers into a "global"
  ID number space for the current translation unit, providing a one-to-one
  mapping between entities (in whatever AST file they inhabit) and global ID
  numbers. If that translation unit is then serialized into an AST file, this
  mapping will be stored for use when the AST file is imported.

Declaration merging
  It is possible for a given entity (from the language's perspective) to be
  declared multiple times in different places. For example, two different
  headers could both declare ``printf`` or forward-declare ``struct stat``.
  If each of those headers is included in a module, and some third party
  imports both of those modules, there is a potentially serious problem: name
  lookup for ``printf`` or ``struct stat`` will find both declarations, but
  the AST nodes are unrelated. This would result in a compilation error, due
  to an ambiguity in name lookup. Therefore, the AST reader performs
  declaration merging according to the appropriate language semantics,
  ensuring that the two disjoint declarations are merged into a single
  redeclaration chain (with a common canonical declaration), so that it is as
  if one of the headers had been included before the other.

Name visibility
  Modules allow certain names that occur during module creation to be "hidden",
  so that they are not part of the public interface of the module and are not
  visible to its clients. The AST reader maintains a "visible" bit on various
  AST nodes (declarations, macros, etc.) to indicate whether that particular
  AST node is currently visible; the various name lookup mechanisms in Clang
  inspect the visible bit to determine whether that entity, which is still in
  the AST (because other, visible AST nodes may depend on it), can actually be
  found by name lookup. When a new (sub)module is imported, it may make
  existing, non-visible, already-deserialized AST nodes visible; it is the
  responsibility of the AST reader to find and update these AST nodes when it
  is notified of the import.

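The following minimal sketch illustrates the local-to-global ID remapping described under "Numbering of IDs". It uses a deliberately simplified encoding that is not Clang's on-disk format. Each hypothetical ``ToyModuleFile`` numbers its own entities first. For every further local ID, it then records which imported module owns the entity. The hypothetical ``ToyGlobalIDMap`` assumes that modules are loaded after the modules they import.

.. code-block:: c++

  #include <cassert>
  #include <map>
  #include <string>
  #include <utility>
  #include <vector>

  // Hypothetical module file: local IDs 1..NumOwnEntities name the module's
  // own entities; each later local ID names an entity of an imported module.
  struct ToyModuleFile {
    std::string Name;
    unsigned NumOwnEntities;
    // ImportedRefs[K] describes local ID NumOwnEntities + 1 + K as
    // (owning module, ID of the entity among that module's own entities).
    std::vector<std::pair<std::string, unsigned>> ImportedRefs;
  };

  // Gives every loaded module a contiguous range of global IDs for its own
  // entities and translates (module, local ID) pairs into global IDs.
  class ToyGlobalIDMap {
  public:
    // Assumes every imported module has already been loaded.
    void load(const ToyModuleFile &M) {
      FirstGlobalID[M.Name] = NextGlobalID;
      NextGlobalID += M.NumOwnEntities;
      Modules[M.Name] = M;
    }

    unsigned toGlobal(const std::string &Module, unsigned LocalID) const {
      const ToyModuleFile &M = Modules.at(Module);
      assert(LocalID >= 1 && "local IDs start at 1");
      if (LocalID <= M.NumOwnEntities)
        return FirstGlobalID.at(Module) + LocalID - 1;
      // A reference into an imported module: translate via the owner.
      const auto &Ref = M.ImportedRefs.at(LocalID - M.NumOwnEntities - 1);
      assert(Ref.second >= 1 &&
             Ref.second <= Modules.at(Ref.first).NumOwnEntities &&
             "references must name an entity the owner itself defines");
      return toGlobal(Ref.first, Ref.second);
    }

  private:
    unsigned NextGlobalID = 1;
    std::map<std::string, unsigned> FirstGlobalID;
    std::map<std::string, ToyModuleFile> Modules;
  };

Consider a shared module ``Core`` imported by both ``Left`` and ``Right``. The two importers can use different local IDs for the same ``Core`` entity, yet both resolve to a single global ID. This is the one-to-one mapping between entities and global IDs that the text describes.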
@ -1,126 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>ThreadSanitizer, a race detector</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
<style type="text/css">
td {
vertical-align: top;
}
</style>
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>ThreadSanitizer</h1>
<ul>
<li> <a href="#intro">Introduction</a>
<li> <a href="#howtobuild">How to Build</a>
<li> <a href="#platforms">Supported Platforms</a>
<li> <a href="#usage">Usage</a>
<li> <a href="#limitations">Limitations</a>
<li> <a href="#status">Current Status</a>
<li> <a href="#moreinfo">More Information</a>
</ul>

<h2 id="intro">Introduction</h2>
ThreadSanitizer is a tool that detects data races. <BR>
It consists of a compiler instrumentation module and a run-time library. <BR>
Typical slowdown introduced by ThreadSanitizer is <b>5x-15x</b> (TODO: these numbers are
approximate so far).

<h2 id="howtobuild">How to build</h2>
Follow the <a href="../get_started.html">clang build instructions</a>.
CMake build is supported.<BR>

<h2 id="platforms">Supported Platforms</h2>
ThreadSanitizer is supported on Linux x86_64 (tested on Ubuntu 10.04). <BR>
Support for MacOS 10.7 (64-bit only) is planned for late 2012. <BR>
Support for 32-bit platforms is problematic and not yet planned.



<h2 id="usage">Usage</h2>
Simply compile your program with <tt>-fsanitize=thread -fPIE</tt> and link it
with <tt>-fsanitize=thread -pie</tt>.<BR>
To get a reasonable performance add <tt>-O1</tt> or higher. <BR>
Use <tt>-g</tt> to get file names and line numbers in the warning messages. <BR>

Example:
<pre>
% cat projects/compiler-rt/lib/tsan/output_tests/tiny_race.c
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
Global = 43;
pthread_join(t, NULL);
return Global;
}
</pre>

<pre>
% clang -fsanitize=thread -g -O1 tiny_race.c -fPIE -pie
</pre>

If a bug is detected, the program will print an error message to stderr.
Currently, ThreadSanitizer symbolizes its output using an external
<tt>addr2line</tt>
process (this will be fixed in future).
<pre>
% TSAN_OPTIONS=strip_path_prefix=`pwd`/ # Don't print full paths.
% ./a.out 2> log
% cat log
WARNING: ThreadSanitizer: data race (pid=19219)
Write of size 4 at 0x7fcf47b21bc0 by thread 1:
#0 Thread1 tiny_race.c:4 (exe+0x00000000a360)
Previous write of size 4 at 0x7fcf47b21bc0 by main thread:
#0 main tiny_race.c:10 (exe+0x00000000a3b4)
Thread 1 (running) created at:
#0 pthread_create ??:0 (exe+0x00000000c790)
#1 main tiny_race.c:9 (exe+0x00000000a3a4)
</pre>


<h2 id="limitations">Limitations</h2>
<ul>
<li> ThreadSanitizer uses more real memory than a native run.
At the default settings the memory overhead is 9x plus 9Mb per each thread.
Settings with 5x and 3x overhead (but less accurate analysis) are also available.
<li> ThreadSanitizer maps (but does not reserve) a lot of virtual address space.
This means that tools like <tt>ulimit</tt> may not work as usually expected.
<li> Static linking is not supported.
<li> ThreadSanitizer requires <tt>-fPIE -pie</tt>
</ul>


<h2 id="status">Current Status</h2>
ThreadSanitizer is in alpha stage.
It is known to work on large C++ programs using pthreads, but we do not promise
anything (yet). <BR>
C++11 threading is not yet supported. <BR>
The test suite is integrated into CMake build and can be run with
<tt>make check-tsan</tt> command. <BR>

We are actively working on enhancing the tool -- stay tuned.
Any help, especially in the form of minimized standalone tests is more than welcome.

<h2 id="moreinfo">More Information</h2>
<a href="http://code.google.com/p/thread-sanitizer/">http://code.google.com/p/thread-sanitizer</a>.


</div>
</body>
</html>

@ -0,0 +1,95 @@
ThreadSanitizer
===============

Introduction
------------

ThreadSanitizer is a tool that detects data races. It consists of a compiler
instrumentation module and a run-time library. Typical slowdown introduced by
ThreadSanitizer is **5x-15x** (TODO: these numbers are approximate so far).

How to build
------------

Follow the `Clang build instructions <../get_started.html>`_. CMake build is
supported.

Supported Platforms
-------------------

ThreadSanitizer is supported on Linux x86_64 (tested on Ubuntu 10.04). Support
for Mac OS X 10.7 (64-bit only) is planned for late 2012. Support for 32-bit
platforms is problematic and not yet planned.

Usage
-----

Simply compile your program with ``-fsanitize=thread -fPIE`` and link it with
``-fsanitize=thread -pie``. To get reasonable performance, add ``-O1`` or
higher. Use ``-g`` to get file names and line numbers in the warning messages.

Example:

.. code-block:: console

  % cat projects/compiler-rt/lib/tsan/output_tests/tiny_race.c
  #include <pthread.h>
  int Global;
  void *Thread1(void *x) {
    Global = 42;
    return x;
  }
  int main() {
    pthread_t t;
    pthread_create(&t, NULL, Thread1, NULL);
    Global = 43;
    pthread_join(t, NULL);
    return Global;
  }
  % clang -fsanitize=thread -g -O1 tiny_race.c -fPIE -pie

If a bug is detected, the program will print an error message to stderr.
Currently, ThreadSanitizer symbolizes its output using an external
``addr2line`` process (this will be fixed in the future).

.. code-block:: console

  % export TSAN_OPTIONS=strip_path_prefix=`pwd`/  # Don't print full paths.
  % ./a.out 2> log
  % cat log
  WARNING: ThreadSanitizer: data race (pid=19219)
    Write of size 4 at 0x7fcf47b21bc0 by thread 1:
      #0 Thread1 tiny_race.c:4 (exe+0x00000000a360)
    Previous write of size 4 at 0x7fcf47b21bc0 by main thread:
      #0 main tiny_race.c:10 (exe+0x00000000a3b4)
    Thread 1 (running) created at:
      #0 pthread_create ??:0 (exe+0x00000000c790)
      #1 main tiny_race.c:9 (exe+0x00000000a3a4)

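Once a report like this one has been understood, the race can be removed by ordering the two writes. The following minimal sketch shows one such fix. It is our own variant, not a file from the compiler-rt test suite. Both writes to ``Global`` are made while holding a pthread mutex. The body of ``main()`` is moved into a hypothetical helper, ``RunWithoutRace()``, so that it can be called from any ``main()``. The sketch uses pthreads because C++11 threading is not yet supported. With the mutex in place, ThreadSanitizer would be expected to stay silent about these two writes.

.. code-block:: c++

  #include <cassert>
  #include <pthread.h>

  // Race-free variant of tiny_race.c: both writes to Global happen while
  // holding the same mutex, so they are ordered and no longer race.
  int Global;
  pthread_mutex_t GlobalMutex = PTHREAD_MUTEX_INITIALIZER;

  void *Thread1(void *x) {
    pthread_mutex_lock(&GlobalMutex);
    Global = 42;
    pthread_mutex_unlock(&GlobalMutex);
    return x;
  }

  // Hypothetical helper holding the body of the original main(); returns
  // the final value of Global (42 or 43, depending on which write is last).
  int RunWithoutRace() {
    pthread_t t;
    int rc = pthread_create(&t, nullptr, Thread1, nullptr);
    assert(rc == 0 && "pthread_create failed");
    (void)rc;
    pthread_mutex_lock(&GlobalMutex);
    Global = 43;
    pthread_mutex_unlock(&GlobalMutex);
    pthread_join(t, nullptr);
    // pthread_join synchronizes with the end of Thread1, so this read
    // does not need the lock.
    return Global;
  }

To try it, add a ``main()`` that calls ``RunWithoutRace()``. Then build it with ``clang++`` using the same ``-fsanitize=thread -g -O1 -fPIE -pie`` flags as above.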
Limitations
-----------

* ThreadSanitizer uses more real memory than a native run. At the default
  settings the memory overhead is 9x plus 9 MB per thread. Settings with 5x
  and 3x overhead (but less accurate analysis) are also available.
* ThreadSanitizer maps (but does not reserve) a lot of virtual address space.
  This means that tools like ``ulimit`` may not work as usually expected.
* Static linking is not supported.
* ThreadSanitizer requires ``-fPIE -pie``.

Current Status
--------------

ThreadSanitizer is in the alpha stage. It is known to work on large C++
programs using pthreads, but we do not promise anything (yet). C++11 threading
is not yet supported. The test suite is integrated into the CMake build and can
be run with the ``make check-tsan`` command.

We are actively working on enhancing the tool; stay tuned. Any help,
especially in the form of minimized standalone tests, is more than welcome.

More Information
----------------

`http://code.google.com/p/thread-sanitizer <http://code.google.com/p/thread-sanitizer/>`_.

@ -1,120 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Writing Clang Tools</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
</head>
<body>

<!--#include virtual="../menu.html.incl"-->

<div id="content">

<h1>Writing Clang Tools</h1>
<p>Clang provides infrastructure to write tools that need syntactic and semantic
information about a program. This document will give a short introduction of the
different ways to write clang tools, and their pros and cons.</p>

<!-- ======================================================================= -->
<h2 id="libclang"><a href="http://clang.llvm.org/doxygen/group__CINDEX.html">LibClang</a></h2>
<!-- ======================================================================= -->

<p>LibClang is a stable high level C interface to clang. When in doubt LibClang
is probably the interface you want to use. Consider the other interfaces only
when you have a good reason not to use LibClang.</p>
<p>Canonical examples of when to use LibClang:</p>
<ul>
<li>Xcode</li>
<li>Clang Python Bindings</li>
</ul>
<p>Use LibClang when you...</p>
<ul>
<li>want to interface with clang from other languages than C++</li>
<li>need a stable interface that takes care to be backwards compatible</li>
<li>want powerful high-level abstractions, like iterating through an AST
with a cursor, and don't want to learn all the nitty gritty details of Clang's
AST.</li>
</ul>
<p>Do not use LibClang when you...</p>
<ul>
<li>want full control over the Clang AST</li>
</ul>

<!-- ======================================================================= -->
<h2 id="clang-plugins"><a href="ClangPlugins.html">Clang Plugins</a></h2>
<!-- ======================================================================= -->

<p>Clang Plugins allow you to run additional actions on the AST as part of
a compilation. Plugins are dynamic libraries that are loaded at runtime by
the compiler, and they're easy to integrate into your build environment.</p>
<p>Canonical examples of when to use Clang Plugins:</p>
<ul>
<li>special lint-style warnings or errors for your project</li>
<li>creating additional build artifacts from a single compile step</li>
</ul>
<p>Use Clang Plugins when you...</p>
<ul>
<li>need your tool to rerun if any of the dependencies change</li>
<li>want your tool to make or break a build</li>
<li>need full control over the Clang AST</li>
</ul>
<p>Do not use Clang Plugins when you...</p>
<ul>
<li>want to run tools outside of your build environment</li>
<li>want full control on how Clang is set up, including mapping of in-memory
virtual files</li>
<li>need to run over a specific subset of files in your project which is not
necessarily related to any changes which would trigger rebuilds</li>
</ul>

<!-- ======================================================================= -->
<h2 id="libtooling"><a href="LibTooling.html">LibTooling</a></h2>
<!-- ======================================================================= -->

<p>LibTooling is a C++ interface aimed at writing standalone tools, as well as
integrating into services that run clang tools.</p>
<p>Canonical examples of when to use LibTooling:</p>
<ul>
<li>a simple syntax checker</li>
<li>refactoring tools</li>
</ul>
<p>Use LibTooling when you...</p>
<ul>
<li>want to run tools over a single file, or a specific subset of files,
independently of the build system</li>
<li>want full control over the Clang AST</li>
<li>want to share code with Clang Plugins</li>
</ul>
<p>Do not use LibTooling when you...</p>
<ul>
<li>want to run as part of the build triggered by dependency changes</li>
<li>want a stable interface so you don't need to change your code when the
AST API changes</li>
<li>want high level abstractions like cursors and code completion out of the
box</li>
<li>do not want to write your tools in C++</li>
</ul>

<!-- ======================================================================= -->
<h2 id="clang-tools"><a href="ClangTools.html">Clang Tools</a></h2>
<!-- ======================================================================= -->

<p>These are a collection of specific developer tools built on top of the
LibTooling infrastructure as part of the Clang project. They are targeted at
automating and improving core development activities of C/C++ developers.</p>
<p>Examples of tools we are building or planning as part of the Clang
project:</p>
<ul>
<li>Syntax checking (clang-check)</li>
<li>Automatic fixing of compile errors (clangc-fixit)</li>
<li>Automatic code formatting</li>
<li>Migration tools for new features in new language standards</li>
<li>Core refactoring tools</li>
</ul>

</div>
</body>
</html>

@ -0,0 +1,100 @@
===================
Writing Clang Tools
===================

Clang provides infrastructure to write tools that need syntactic and semantic
information about a program. This document will give a short introduction to
the different ways to write Clang tools, and their pros and cons.

LibClang
--------

`LibClang <http://clang.llvm.org/doxygen/group__CINDEX.html>`_ is a stable,
high-level C interface to Clang. When in doubt, LibClang is probably the
interface you want to use. Consider the other interfaces only when you have a
good reason not to use LibClang.

Canonical examples of when to use LibClang:

* Xcode
* Clang Python Bindings

Use LibClang when you:

* want to interface with Clang from languages other than C++
* need a stable interface that takes care to be backwards compatible
* want powerful high-level abstractions, like iterating through an AST with a
  cursor, and don't want to learn all the nitty-gritty details of Clang's AST.

Do not use LibClang when you:

* want full control over the Clang AST

Clang Plugins
-------------

`Clang Plugins <ClangPlugins.html>`_ allow you to run additional actions on the
AST as part of a compilation. Plugins are dynamic libraries that are loaded at
runtime by the compiler, and they're easy to integrate into your build
environment.

Canonical examples of when to use Clang Plugins:

* special lint-style warnings or errors for your project
* creating additional build artifacts from a single compile step

Use Clang Plugins when you:

* need your tool to rerun if any of the dependencies change
* want your tool to make or break a build
* need full control over the Clang AST

Do not use Clang Plugins when you:

* want to run tools outside of your build environment
* want full control over how Clang is set up, including mapping of in-memory
  virtual files
* need to run over a specific subset of files in your project which is not
  necessarily related to any changes which would trigger rebuilds

LibTooling
----------

`LibTooling <LibTooling.html>`_ is a C++ interface aimed at writing standalone
tools, as well as integrating into services that run Clang tools. Canonical
examples of when to use LibTooling:

* a simple syntax checker
* refactoring tools

Use LibTooling when you:

* want to run tools over a single file, or a specific subset of files,
  independently of the build system
* want full control over the Clang AST
* want to share code with Clang Plugins

Do not use LibTooling when you:

* want to run as part of the build triggered by dependency changes
* want a stable interface so you don't need to change your code when the AST API
  changes
* want high-level abstractions like cursors and code completion out of the box
* do not want to write your tools in C++

Clang Tools
-----------

`Clang tools <ClangTools.html>`_ are a collection of specific developer tools
built on top of the LibTooling infrastructure as part of the Clang project.
They are targeted at automating and improving core development activities of
C/C++ developers.

Examples of tools we are building or planning as part of the Clang project:

* Syntax checking (:program:`clang-check`)
* Automatic fixing of compile errors (:program:`clang-fixit`)
* Automatic code formatting
* Migration tools for new features in new language standards
* Core refactoring tools

@ -12,6 +12,12 @@ progress. This page will get filled out with docs soon...
.. toctree::
   :maxdepth: 2

   LanguageExtensions
   LibASTMatchers
   LibTooling
   PCHInternals
   ThreadSanitizer
   Tooling


Indices and tables