docs: Convert some docs to reST.

Converts:
    LanguageExtensions
    LibASTMatchers
    LibTooling
    PCHInternals
    ThreadSanitizer
    Tooling

Patch by Mykhailo Pustovit!
(with minor edits by Dmitri Gribenko and Sean Silva)

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@170048 91177308-0d34-0410-b5e6-96231b3b80d8
Sean Silva 2012-12-12 23:44:55 +00:00
Parent b34ae9be52
Commit 3872b46ba9
13 changed files with 2952 additions and 3328 deletions

File diff suppressed because it is too large.

1838
docs/LanguageExtensions.rst Normal file

File diff suppressed because it is too large.


@ -1,130 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Matching the Clang AST</title>
<link type="text/css" rel="stylesheet" href="../menu.css" />
<link type="text/css" rel="stylesheet" href="../content.css" />
</head>
<body>
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<h1>Matching the Clang AST</h1>
<p>This document explains how to use Clang's LibASTMatchers to match interesting
nodes of the AST and execute code that uses the matched nodes. Combined with
<a href="LibTooling.html">LibTooling</a>, LibASTMatchers helps to write
code-to-code transformation tools or query tools.</p>
<p>We assume basic knowledge about the Clang AST. See the
<a href="IntroductionToTheClangAST.html">Introduction to the Clang AST</a> if
you want to learn more about how the AST is structured.</p>
<!-- FIXME: create tutorial and link to the tutorial -->
<!-- ======================================================================= -->
<h2 id="intro">Introduction</h2>
<!-- ======================================================================= -->
<p>LibASTMatchers provides a domain specific language to create predicates on Clang's
AST. This DSL is written in and can be used from C++, allowing users to write
a single program to both match AST nodes and access the node's C++ interface
to extract attributes, source locations, or any other information provided on
the AST level.</p>
<p>AST matchers are predicates on nodes in the AST. Matchers are created
by calling creator functions that allow building up a tree of matchers, where
inner matchers are used to make the match more specific.</p>
<p>For example, to create a matcher that matches all class or union declarations
in the AST of a translation unit, you can call
<a href="LibASTMatchersReference.html#recordDecl0Anchor">recordDecl()</a>.
To narrow the match down, for example to find all class or union declarations with the name "Foo",
insert a <a href="LibASTMatchersReference.html#hasName0Anchor">hasName</a>
matcher: the call recordDecl(hasName("Foo")) returns a matcher that matches classes
or unions that are named "Foo", in any namespace. By default, matchers that accept
multiple inner matchers use an implicit <a href="LibASTMatchersReference.html#allOf0Anchor">allOf()</a>.
This allows further narrowing down the match, for example to match all classes
that are derived from "Bar": recordDecl(hasName("Foo"), isDerivedFrom("Bar")).</p>
<!-- ======================================================================= -->
<h2 id="writing">How to create a matcher</h2>
<!-- ======================================================================= -->
<p>With more than a thousand classes in the Clang AST, one can quickly get lost
when trying to figure out how to create a matcher for a specific pattern. This
section will teach you how to use a rigorous step-by-step pattern to build the
matcher you are interested in. Note that there will always be matchers missing
for some part of the AST. See the section about <a href="#writing">how to write
your own AST matchers</a> later in this document.</p>
<p>The precondition to using the matchers is to understand what the AST
for the code you want to match looks like. The <a href="IntroductionToTheClangAST.html">Introduction to the Clang AST</a>
teaches you how to dump a translation unit's AST into a human-readable format.</p>
<!-- FIXME: Introduce link to ASTMatchersTutorial.html -->
<!-- FIXME: Introduce link to ASTMatchersCookbook.html -->
<p>In general, the strategy to create the right matchers is:</p>
<ol>
<li>Find the outermost class in Clang's AST you want to match.</li>
<li>Look at the <a href="LibASTMatchersReference.html">AST Matcher Reference</a> for matchers that either match the
node you're interested in or narrow down attributes on the node.</li>
<li>Create your outer match expression. Verify that it works as expected.</li>
<li>Examine the matchers for what the next inner node you want to match is.</li>
<li>Repeat until the matcher is finished.</li>
</ol>
<!-- ======================================================================= -->
<h2 id="binding">Binding nodes in match expressions</h2>
<!-- ======================================================================= -->
<p>Matcher expressions allow you to specify which parts of the AST are interesting
for a certain task. Often you will want to then do something with the nodes
that were matched, like building source code transformations.</p>
<p>To that end, matchers that match specific AST nodes (so called node matchers)
are bindable; for example, recordDecl(hasName("MyClass")).bind("id") will bind
the matched recordDecl node to the string "id", to be later retrieved in the
<a href="http://clang.llvm.org/doxygen/classclang_1_1ast__matchers_1_1MatchFinder_1_1MatchCallback.html">match callback</a>.</p>
<!-- FIXME: Introduce link to ASTMatchersTutorial.html -->
<!-- FIXME: Introduce link to ASTMatchersCookbook.html -->
<!-- ======================================================================= -->
<h2 id="writing">Writing your own matchers</h2>
<!-- ======================================================================= -->
<p>There are multiple different ways to define a matcher, depending on its
type and flexibility.</p>
<ul>
<li><b>VariadicDynCastAllOfMatcher&lt;Base, Derived&gt;</b><p>Those match all nodes
of type <i>Base</i> if they can be dynamically cast to <i>Derived</i>. The
names of those matchers are nouns, which closely resemble <i>Derived</i>.
VariadicDynCastAllOfMatchers are the backbone of the matcher hierarchy. Most
often, your match expression will start with one of them, and you can
<a href="#binding">bind</a> the node they represent to ids for later processing.</p>
<p>VariadicDynCastAllOfMatchers are callable classes that model variadic
template functions in C++03. They take an arbitrary number of Matcher&lt;Derived>
and return a Matcher&lt;Base>.</p></li>
<li><b>AST_MATCHER_P(Type, Name, ParamType, Param)</b><p> Most matcher definitions
use the matcher creation macros. Those define both the matcher of type Matcher&lt;Type>
itself, and a matcher-creation function named <i>Name</i> that takes a parameter
of type <i>ParamType</i> and returns the corresponding matcher.</p>
<p>There are multiple matcher definition macros that deal with polymorphic return
values and different parameter counts. See <a href="http://clang.llvm.org/doxygen/ASTMatchersMacros_8h.html">ASTMatchersMacros.h</a>.
</p></li>
<li><b>Matcher creation functions</b><p>Matchers are generated by nesting
calls to matcher creation functions. Most of the time those functions are either
created by using VariadicDynCastAllOfMatcher or the matcher creation macros
(see above). The free-standing functions are an indication that this matcher
is just a combination of other matchers, as is for example the case with
<a href="LibASTMatchersReference.html#callee1Anchor">callee</a>.</p></li>
</ul>
</div>
</body>
</html>

134
docs/LibASTMatchers.rst Normal file

@ -0,0 +1,134 @@
======================
Matching the Clang AST
======================
This document explains how to use Clang's LibASTMatchers to match interesting
nodes of the AST and execute code that uses the matched nodes. Combined with
:doc:`LibTooling`, LibASTMatchers helps to write code-to-code transformation
tools or query tools.
We assume basic knowledge about the Clang AST. See the `Introduction to the
Clang AST <IntroductionToTheClangAST.html>`_ if you want to learn more about
how the AST is structured.
.. FIXME: create tutorial and link to the tutorial
Introduction
------------
LibASTMatchers provides a domain specific language to create predicates on
Clang's AST. This DSL is written in and can be used from C++, allowing users
to write a single program to both match AST nodes and access the node's C++
interface to extract attributes, source locations, or any other information
provided on the AST level.
AST matchers are predicates on nodes in the AST. Matchers are created by
calling creator functions that allow building up a tree of matchers, where
inner matchers are used to make the match more specific.
For example, to create a matcher that matches all class or union declarations
in the AST of a translation unit, you can call `recordDecl()
<LibASTMatchersReference.html#recordDecl0Anchor>`_. To narrow the match down,
for example to find all class or union declarations with the name "``Foo``",
insert a `hasName <LibASTMatchersReference.html#hasName0Anchor>`_ matcher: the
call ``recordDecl(hasName("Foo"))`` returns a matcher that matches classes or
unions that are named "``Foo``", in any namespace. By default, matchers that
accept multiple inner matchers use an implicit `allOf()
<LibASTMatchersReference.html#allOf0Anchor>`_. This allows further narrowing
down the match, for example to match all classes that are derived from
"``Bar``": ``recordDecl(hasName("Foo"), isDerivedFrom("Bar"))``.
How to create a matcher
-----------------------
With more than a thousand classes in the Clang AST, one can quickly get lost
when trying to figure out how to create a matcher for a specific pattern. This
section will teach you how to use a rigorous step-by-step pattern to build the
matcher you are interested in. Note that there will always be matchers missing
for some part of the AST. See the section about :ref:`how to write your own
AST matchers <astmatchers-writing>` later in this document.
.. FIXME: why is it linking back to the same section?!
The precondition to using the matchers is to understand what the AST for the
code you want to match looks like. The
`Introduction to the Clang AST <IntroductionToTheClangAST.html>`_ teaches you
how to dump a translation unit's AST into a human-readable format.
.. FIXME: Introduce link to ASTMatchersTutorial.html
.. FIXME: Introduce link to ASTMatchersCookbook.html
In general, the strategy to create the right matchers is:
#. Find the outermost class in Clang's AST you want to match.
#. Look at the `AST Matcher Reference <LibASTMatchersReference.html>`_ for
matchers that either match the node you're interested in or narrow down
attributes on the node.
#. Create your outer match expression. Verify that it works as expected.
#. Examine the matchers for what the next inner node you want to match is.
#. Repeat until the matcher is finished.
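
To make the strategy concrete, here is a minimal, self-contained sketch that
models matchers as plain predicates and builds one up from the outside in. It
is a toy and does not use Clang: the ``toy`` namespace, the ``Record`` struct
and the simplified ``recordDecl``, ``hasName`` and ``isDerivedFrom`` functions
are assumptions made for illustration. They only mimic the shape of the real
matchers, including the implicit ``allOf`` over inner matchers.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

namespace toy {

// Toy stand-in for an AST node. This is NOT a Clang class.
struct Record {
  std::string Name;
  std::vector<std::string> Bases; // names of all base classes
};

// In this model a matcher is simply a predicate on a node.
using Matcher = std::function<bool(const Record &)>;

// Narrowing matcher: true if the record has exactly this name.
inline Matcher hasName(const std::string &Name) {
  return [Name](const Record &R) { return R.Name == Name; };
}

// Narrowing matcher: true if Base appears among the record's bases.
inline Matcher isDerivedFrom(const std::string &Base) {
  return [Base](const Record &R) {
    for (const std::string &B : R.Bases)
      if (B == Base)
        return true;
    return false;
  };
}

// Outer matcher: accepts a record only if every inner matcher accepts it
// (an implicit "all of"). With no inner matchers, every record matches.
inline Matcher recordDecl(std::vector<Matcher> Inner = {}) {
  return [Inner](const Record &R) {
    for (const Matcher &M : Inner)
      if (!M(R))
        return false;
    return true;
  };
}

} // namespace toy
```

Starting from ``toy::recordDecl()``, which accepts every record, each inner
matcher that is added can only narrow the result. That is why verifying the
outer expression first, and then adding one inner matcher at a time, keeps
mistakes easy to locate.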
.. _astmatchers-bind:
Binding nodes in match expressions
----------------------------------
Matcher expressions allow you to specify which parts of the AST are interesting
for a certain task. Often you will want to then do something with the nodes
that were matched, like building source code transformations.
To that end, matchers that match specific AST nodes (so called node matchers)
are bindable; for example, ``recordDecl(hasName("MyClass")).bind("id")`` will
bind the matched ``recordDecl`` node to the string "``id``", to be later
retrieved in the `match callback
<http://clang.llvm.org/doxygen/classclang_1_1ast__matchers_1_1MatchFinder_1_1MatchCallback.html>`_.
.. FIXME: Introduce link to ASTMatchersTutorial.html
.. FIXME: Introduce link to ASTMatchersCookbook.html
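
The idea behind binding can be illustrated without Clang. The sketch below is
a toy model under stated assumptions: ``toy::Record``, ``toy::BoundNodes``,
``toy::BindableMatcher`` and ``toy::recordNamed`` are hypothetical names, not
part of LibASTMatchers. It shows only the principle that a successful match
stores the node under the given id, so that later code (the match callback in
the real library) can look it up by that id.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>

namespace toy {

// Toy stand-in for an AST node. This is NOT a Clang class.
struct Record {
  std::string Name;
};

// Maps bound ids to the nodes that matched.
using BoundNodes = std::map<std::string, const Record *>;

// A predicate on a node plus an optional id to bind the node to.
class BindableMatcher {
public:
  explicit BindableMatcher(std::function<bool(const Record &)> P)
      : Pred(std::move(P)) {}

  // Returns a copy of this matcher that records matched nodes under ID.
  BindableMatcher bind(const std::string &ID) const {
    BindableMatcher Copy = *this;
    Copy.ID = ID;
    return Copy;
  }

  // Returns true on a match. If an id was set, the node is stored in Nodes.
  bool matches(const Record &Node, BoundNodes &Nodes) const {
    if (!Pred(Node))
      return false;
    if (!ID.empty())
      Nodes[ID] = &Node;
    return true;
  }

private:
  std::function<bool(const Record &)> Pred;
  std::string ID;
};

// Creation function for a matcher that accepts records with the given name.
inline BindableMatcher recordNamed(const std::string &Name) {
  return BindableMatcher([Name](const Record &R) { return R.Name == Name; });
}

} // namespace toy
```

A failed match leaves the map untouched, so code that consumes the bound
nodes only ever sees nodes that satisfied the whole expression.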
Writing your own matchers
-------------------------
There are multiple different ways to define a matcher, depending on its type
and flexibility.
``VariadicDynCastAllOfMatcher<Base, Derived>``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Those match all nodes of type *Base* if they can be dynamically cast to
*Derived*. The names of those matchers are nouns, which closely resemble
*Derived*. ``VariadicDynCastAllOfMatchers`` are the backbone of the matcher
hierarchy. Most often, your match expression will start with one of them, and
you can :ref:`bind <astmatchers-bind>` the node they represent to ids for later
processing.
``VariadicDynCastAllOfMatchers`` are callable classes that model variadic
template functions in C++03. They take an arbitrary number of
``Matcher<Derived>`` and return a ``Matcher<Base>``.
``AST_MATCHER_P(Type, Name, ParamType, Param)``
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Most matcher definitions use the matcher creation macros. Those define both
the matcher of type ``Matcher<Type>`` itself, and a matcher-creation function
named *Name* that takes a parameter of type *ParamType* and returns the
corresponding matcher.
There are multiple matcher definition macros that deal with polymorphic return
values and different parameter counts. See `ASTMatchersMacros.h
<http://clang.llvm.org/doxygen/ASTMatchersMacros_8h.html>`_.
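
To show what "define both the matcher and a matcher-creation function" means,
here is a simplified, hypothetical macro written for this document. It is
loosely modeled on the shape of ``AST_MATCHER_P`` but is not the real macro:
``TOY_MATCHER_P``, ``toy::Record`` and ``hasAtLeastFields`` are invented
names, and the real macros generate classes that plug into the matcher
framework rather than a bare class with a ``matches`` method.

```cpp
#include <cassert>
#include <string>

namespace toy {

// Toy stand-in for an AST node. This is NOT a Clang class.
struct Record {
  std::string Name;
  unsigned NumFields;
};

} // namespace toy

// Hypothetical macro for illustration only. It expands to:
//   1. a class Name##Matcher that stores one parameter,
//   2. a creation function Name(Param) that returns such a matcher,
//   3. the header of Name##Matcher::matches, whose body the user supplies.
#define TOY_MATCHER_P(Type, Name, ParamType, Param)                            \
  class Name##Matcher {                                                        \
  public:                                                                      \
    explicit Name##Matcher(const ParamType &Param) : Param(Param) {}           \
    bool matches(const Type &Node) const;                                      \
                                                                               \
  private:                                                                     \
    const ParamType Param;                                                     \
  };                                                                           \
  inline Name##Matcher Name(const ParamType &Param) {                          \
    return Name##Matcher(Param);                                               \
  }                                                                            \
  inline bool Name##Matcher::matches(const Type &Node) const

namespace toy {

// Defines toy::hasAtLeastFieldsMatcher and toy::hasAtLeastFields(unsigned).
// Inside the body, Node is the node under test and N is the stored parameter.
TOY_MATCHER_P(Record, hasAtLeastFields, unsigned, N) {
  return Node.NumFields >= N;
}

} // namespace toy
```

The user writes only the body of the predicate; the class and the creation
function come from the macro, which keeps matcher definitions short and
uniform.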
.. _astmatchers-writing:
Matcher creation functions
^^^^^^^^^^^^^^^^^^^^^^^^^^
Matchers are generated by nesting calls to matcher creation functions. Most of
the time those functions are either created by using
``VariadicDynCastAllOfMatcher`` or the matcher creation macros (see above).
The free-standing functions are an indication that this matcher is just a
combination of other matchers, as is for example the case with `callee
<LibASTMatchersReference.html#callee1Anchor>`_.


@ -1,212 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>LibTooling</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
</head>
<body>
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<h1>LibTooling</h1>
<p>LibTooling is a library to support writing standalone tools based on
Clang. This document will provide a basic walkthrough of how to write
a tool using LibTooling.</p>
<p>For information on how to set up Clang Tooling for LLVM, see
<a href="HowToSetupToolingForLLVM.html">HowToSetupToolingForLLVM.html</a>.</p>
<!-- ======================================================================= -->
<h2 id="intro">Introduction</h2>
<!-- ======================================================================= -->
<p>Tools built with LibTooling, like Clang Plugins, run
<code>FrontendActions</code> over code.
<!-- See FIXME for a tutorial on how to write FrontendActions. -->
In this tutorial, we'll demonstrate the different ways of running clang's
<code>SyntaxOnlyAction</code>, which runs a quick syntax check, over a bunch of
code.</p>
<!-- ======================================================================= -->
<h2 id="runoncode">Parsing a code snippet in memory.</h2>
<!-- ======================================================================= -->
<p>If you ever wanted to run a <code>FrontendAction</code> over some sample
code, for example to unit test parts of the Clang AST,
<code>runToolOnCode</code> is what you looked for. Let me give you an example:
<pre>
#include "clang/Tooling/Tooling.h"
TEST(runToolOnCode, CanSyntaxCheckCode) {
// runToolOnCode returns whether the action was correctly run over the
// given code.
EXPECT_TRUE(runToolOnCode(new clang::SyntaxOnlyAction, "class X {};"));
}
</pre>
<!-- ======================================================================= -->
<h2 id="standalonetool">Writing a standalone tool.</h2>
<!-- ======================================================================= -->
<p>Once you unit tested your <code>FrontendAction</code> to the point where it
cannot possibly break, it's time to create a standalone tool. For a standalone
tool to run clang, it first needs to figure out what command line arguments to
use for a specified file. To that end we create a
<code>CompilationDatabase</code>. There are different ways to create a
compilation database, and we need to support all of them depending on
command-line options. There's the <code>CommonOptionsParser</code> class
that takes the responsibility to parse command-line parameters related to
compilation databases and inputs, so that all tools share the implementation.
</p>
<h3 id="parsingcommonoptions">Parsing common tools options.</h3>
<p><code>CompilationDatabase</code> can be read from a build directory or the
command line. Using <code>CommonOptionsParser</code> allows for explicit
specification of a compile command line, specification of build path using the
<code>-p</code> command-line option, and automatic location of the compilation
database using source files paths.
<pre>
#include "clang/Tooling/CommonOptionsParser.h"
using namespace clang::tooling;
int main(int argc, const char **argv) {
// CommonOptionsParser constructor will parse arguments and create a
// CompilationDatabase. In case of error it will terminate the program.
CommonOptionsParser OptionsParser(argc, argv);
// Use OptionsParser.GetCompilations() and OptionsParser.GetSourcePathList()
// to retrieve CompilationDatabase and the list of input file paths.
}
</pre>
</p>
<h3 id="tool">Creating and running a ClangTool.</h3>
<p>Once we have a <code>CompilationDatabase</code>, we can create a
<code>ClangTool</code> and run our <code>FrontendAction</code> over some code.
For example, to run the <code>SyntaxOnlyAction</code> over the files "a.cc" and
"b.cc" one would write:
<pre>
// A clang tool can run over a number of sources in the same process...
std::vector&lt;std::string> Sources;
Sources.push_back("a.cc");
Sources.push_back("b.cc");
// We hand the CompilationDatabase we created and the sources to run over into
// the tool constructor.
ClangTool Tool(OptionsParser.GetCompilations(), Sources);
// The ClangTool needs a new FrontendAction for each translation unit we run
// on. Thus, it takes a FrontendActionFactory as parameter. To create a
// FrontendActionFactory from a given FrontendAction type, we call
// newFrontendActionFactory&lt;clang::SyntaxOnlyAction>().
int result = Tool.run(newFrontendActionFactory&lt;clang::SyntaxOnlyAction>());
</pre>
</p>
<h3 id="main">Putting it together - the first tool.</h3>
<p>Now we combine the two previous steps into our first real tool. This example
tool is also checked into the clang tree at tools/clang-check/ClangCheck.cpp.
<pre>
// Declares clang::SyntaxOnlyAction.
#include "clang/Frontend/FrontendActions.h"
#include "clang/Tooling/CommonOptionsParser.h"
#include "clang/Tooling/Tooling.h"
// Declares llvm::cl::extrahelp.
#include "llvm/Support/CommandLine.h"
using namespace clang::tooling;
using namespace llvm;
// CommonOptionsParser declares HelpMessage with a description of the common
// command-line options related to the compilation database and input files.
// It's nice to have this help message in all tools.
static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);
// A help message for this specific tool can be added afterwards.
static cl::extrahelp MoreHelp("\nMore help text...");
int main(int argc, const char **argv) {
CommonOptionsParser OptionsParser(argc, argv);
ClangTool Tool(OptionsParser.GetCompilations(),
OptionsParser.GetSourcePathList());
return Tool.run(newFrontendActionFactory&lt;clang::SyntaxOnlyAction&gt;());
}
</pre>
</p>
<h3 id="running">Running the tool on some code.</h3>
<p>When you check out and build clang, clang-check is already built and
available to you in bin/clang-check inside your build directory.</p>
<p>You can run clang-check on a file in the llvm repository by specifying
all the needed parameters after a "--" separator:
<pre>
$ cd /path/to/source/llvm
$ export BD=/path/to/build/llvm
$ $BD/bin/clang-check tools/clang/tools/clang-check/ClangCheck.cpp -- \
clang++ -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS \
-Itools/clang/include -I$BD/include -Iinclude -Itools/clang/lib/Headers -c
</pre>
</p>
<p>As an alternative, you can also configure cmake to output a compile command
database into its build directory:
<pre>
# Alternatively to calling cmake, use ccmake, toggle to advanced mode and
# set the parameter CMAKE_EXPORT_COMPILE_COMMANDS from the UI.
$ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .
</pre>
</p>
<p>
This creates a file called compile_commands.json in the build directory. Now
you can run clang-check over files in the project by specifying the build path
as first argument and some source files as further positional arguments:
<pre>
$ cd /path/to/source/llvm
$ export BD=/path/to/build/llvm
$ $BD/bin/clang-check -p $BD tools/clang/tools/clang-check/ClangCheck.cpp
</pre>
</p>
<h3 id="builtin">Builtin includes.</h3>
<p>Clang tools need their builtin headers and search for them the same way clang
does. Thus, the default location to look for builtin headers is in a path
$(dirname /path/to/tool)/../lib/clang/3.2/include relative to the tool
binary. This works out-of-the-box for tools running from llvm's toplevel
binary directory after building clang-headers, or if the tool is running
from the binary directory of a clang install next to the clang binary.</p>
<p>Tips: if your tool fails to find stddef.h or similar headers, call
the tool with -v and look at the search paths it looks through.</p>
<h3 id="linking">Linking.</h3>
<p>Please note that this presents the linking requirements at the time of this
writing. For the most up-to-date information, look at one of the tools'
Makefiles (for example
<a href="http://llvm.org/viewvc/llvm-project/cfe/trunk/tools/clang-check/Makefile?view=markup">clang-check/Makefile</a>).
</p>
<p>To link a binary using the tooling infrastructure, link in the following
libraries:
<ul>
<li>Tooling</li>
<li>Frontend</li>
<li>Driver</li>
<li>Serialization</li>
<li>Parse</li>
<li>Sema</li>
<li>Analysis</li>
<li>Edit</li>
<li>AST</li>
<li>Lex</li>
<li>Basic</li>
</ul>
</p>
</div>
</body>
</html>

206
docs/LibTooling.rst Normal file

@ -0,0 +1,206 @@
==========
LibTooling
==========
LibTooling is a library to support writing standalone tools based on Clang.
This document will provide a basic walkthrough of how to write a tool using
LibTooling.
For information on how to set up Clang Tooling for LLVM, see
`HowToSetupToolingForLLVM.html <HowToSetupToolingForLLVM.html>`_.
Introduction
------------
Tools built with LibTooling, like Clang Plugins, run ``FrontendActions`` over
code.
.. See FIXME for a tutorial on how to write FrontendActions.
In this tutorial, we'll demonstrate the different ways of running Clang's
``SyntaxOnlyAction``, which runs a quick syntax check, over a bunch of code.
Parsing a code snippet in memory
--------------------------------
If you ever want to run a ``FrontendAction`` over some sample code, for
example to unit test parts of the Clang AST, ``runToolOnCode`` is what you
are looking for. Here is an example:
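.. code-block:: c++

  // Declares clang::SyntaxOnlyAction.
  #include "clang/Frontend/FrontendActions.h"
  #include "clang/Tooling/Tooling.h"
  // Declares the TEST and EXPECT_TRUE macros.
  #include "gtest/gtest.h"

  TEST(runToolOnCode, CanSyntaxCheckCode) {
    // runToolOnCode returns whether the action was correctly run over the
    // given code.
    EXPECT_TRUE(clang::tooling::runToolOnCode(new clang::SyntaxOnlyAction,
                                              "class X {};"));
  }
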
Writing a standalone tool
-------------------------
Once you have unit tested your ``FrontendAction`` to the point where it cannot
possibly break, it's time to create a standalone tool. For a standalone tool
to run Clang, it first needs to figure out what command-line arguments to use
for a specified file. To that end we create a ``CompilationDatabase``. There
are different ways to create a compilation database, and we need to support all
of them depending on command-line options. The ``CommonOptionsParser`` class
takes responsibility for parsing command-line parameters related to
compilation databases and inputs, so that all tools share the implementation.
Parsing common tools options
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
``CompilationDatabase`` can be read from a build directory or the command line.
Using ``CommonOptionsParser`` allows for explicit specification of a compile
command line, specification of build path using the ``-p`` command-line option,
and automatic location of the compilation database using source file paths.
.. code-block:: c++

  #include "clang/Tooling/CommonOptionsParser.h"

  using namespace clang::tooling;

  int main(int argc, const char **argv) {
    // CommonOptionsParser constructor will parse arguments and create a
    // CompilationDatabase. In case of error it will terminate the program.
    CommonOptionsParser OptionsParser(argc, argv);

    // Use OptionsParser.GetCompilations() and OptionsParser.GetSourcePathList()
    // to retrieve CompilationDatabase and the list of input file paths.
  }

Creating and running a ClangTool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Once we have a ``CompilationDatabase``, we can create a ``ClangTool`` and run
our ``FrontendAction`` over some code. For example, to run the
``SyntaxOnlyAction`` over the files "a.cc" and "b.cc" one would write:
.. code-block:: c++

  // A clang tool can run over a number of sources in the same process...
  std::vector<std::string> Sources;
  Sources.push_back("a.cc");
  Sources.push_back("b.cc");

  // We hand the CompilationDatabase we created and the sources to run over into
  // the tool constructor.
  ClangTool Tool(OptionsParser.GetCompilations(), Sources);

  // The ClangTool needs a new FrontendAction for each translation unit we run
  // on. Thus, it takes a FrontendActionFactory as parameter. To create a
  // FrontendActionFactory from a given FrontendAction type, we call
  // newFrontendActionFactory<clang::SyntaxOnlyAction>().
  int result = Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());

Putting it together --- the first tool
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Now we combine the two previous steps into our first real tool. This example
tool is also checked into the clang tree at
``tools/clang-check/ClangCheck.cpp``.
.. code-block:: c++

  // Declares clang::SyntaxOnlyAction.
  #include "clang/Frontend/FrontendActions.h"
  #include "clang/Tooling/CommonOptionsParser.h"
  #include "clang/Tooling/Tooling.h"
  // Declares llvm::cl::extrahelp.
  #include "llvm/Support/CommandLine.h"

  using namespace clang::tooling;
  using namespace llvm;

  // CommonOptionsParser declares HelpMessage with a description of the common
  // command-line options related to the compilation database and input files.
  // It's nice to have this help message in all tools.
  static cl::extrahelp CommonHelp(CommonOptionsParser::HelpMessage);

  // A help message for this specific tool can be added afterwards.
  static cl::extrahelp MoreHelp("\nMore help text...");

  int main(int argc, const char **argv) {
    CommonOptionsParser OptionsParser(argc, argv);
    ClangTool Tool(OptionsParser.GetCompilations(),
                   OptionsParser.GetSourcePathList());
    return Tool.run(newFrontendActionFactory<clang::SyntaxOnlyAction>());
  }

Running the tool on some code
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
When you check out and build clang, clang-check is already built and available
to you in bin/clang-check inside your build directory.
You can run clang-check on a file in the llvm repository by specifying all the
needed parameters after a "``--``" separator:
.. code-block:: bash

  $ cd /path/to/source/llvm
  $ export BD=/path/to/build/llvm
  $ $BD/bin/clang-check tools/clang/tools/clang-check/ClangCheck.cpp -- \
        clang++ -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS \
        -Itools/clang/include -I$BD/include -Iinclude \
        -Itools/clang/lib/Headers -c

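
The "``--``" separator divides the command line into two parts: what precedes
it belongs to the tool (here, the source file), and what follows it is taken
as the compile command. The sketch below shows one way such a split could be
done. It is an illustration only, not the code used by
``CommonOptionsParser``; ``SplitArgs`` and ``splitAtDoubleDash`` are
hypothetical names.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Result of splitting a command line at the first "--".
struct SplitArgs {
  std::vector<std::string> ToolArgs;    // arguments before the separator
  std::vector<std::string> CompileArgs; // arguments after the separator
  bool HasSeparator = false;            // whether "--" was present at all
};

// Splits Args at the first "--". The separator itself is dropped. Any later
// "--" is kept as an ordinary compile argument.
inline SplitArgs splitAtDoubleDash(const std::vector<std::string> &Args) {
  SplitArgs Result;
  for (const std::string &Arg : Args) {
    if (!Result.HasSeparator && Arg == "--") {
      Result.HasSeparator = true;
      continue;
    }
    if (Result.HasSeparator)
      Result.CompileArgs.push_back(Arg);
    else
      Result.ToolArgs.push_back(Arg);
  }
  return Result;
}
```

Recording whether the separator was present lets a caller distinguish "no
compile command given" from "an empty compile command", which matters when
deciding whether to look for a compilation database instead.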
As an alternative, you can also configure cmake to output a compile command
database into its build directory:
.. code-block:: bash

  # As an alternative to calling cmake, use ccmake, toggle to advanced mode and
  # set the parameter CMAKE_EXPORT_COMPILE_COMMANDS from the UI.
  $ cmake -DCMAKE_EXPORT_COMPILE_COMMANDS=ON .

This creates a file called ``compile_commands.json`` in the build directory.
Now you can run :program:`clang-check` over files in the project by specifying
the build path as first argument and some source files as further positional
arguments:
.. code-block:: bash

  $ cd /path/to/source/llvm
  $ export BD=/path/to/build/llvm
  $ $BD/bin/clang-check -p $BD tools/clang/tools/clang-check/ClangCheck.cpp

Builtin includes
^^^^^^^^^^^^^^^^
Clang tools need their builtin headers and search for them the same way Clang
does. Thus, the default location to look for builtin headers is in a path
``$(dirname /path/to/tool)/../lib/clang/3.2/include`` relative to the tool
binary. This works out-of-the-box for tools running from llvm's toplevel
binary directory after building clang-headers, or if the tool is running from
the binary directory of a clang install next to the clang binary.
Tip: if your tool fails to find ``stddef.h`` or similar headers, call the tool
with ``-v`` and check the header search paths it prints.
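
When debugging a missing builtin header, it can help to compute the expected
default location by hand. The sketch below does the string manipulation
described above. It is not Clang's implementation: ``dirName`` and
``defaultBuiltinIncludeDir`` are hypothetical helpers, the version is passed
in as a parameter rather than hard-coded, and the path is not normalized.

```cpp
#include <cassert>
#include <string>

// Returns the directory part of Path, similar to dirname(1) for simple
// paths: "." if Path contains no '/', and "/" if the only '/' is leading.
inline std::string dirName(const std::string &Path) {
  const std::string::size_type Slash = Path.find_last_of('/');
  if (Slash == std::string::npos)
    return ".";
  if (Slash == 0)
    return "/";
  return Path.substr(0, Slash);
}

// Builds $(dirname ToolPath)/../lib/clang/<ClangVersion>/include.
inline std::string defaultBuiltinIncludeDir(const std::string &ToolPath,
                                            const std::string &ClangVersion) {
  return dirName(ToolPath) + "/../lib/clang/" + ClangVersion + "/include";
}
```

If the directory computed this way does not exist next to your tool binary,
that is a likely reason for the tool not finding ``stddef.h``.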
Linking
^^^^^^^
Please note that this presents the linking requirements at the time of this
writing. For the most up-to-date information, look at one of the tools'
Makefiles (for example `clang-check/Makefile
<http://llvm.org/viewvc/llvm-project/cfe/trunk/tools/clang-check/Makefile?view=markup>`_).
To link a binary using the tooling infrastructure, link in the following
libraries:
* Tooling
* Frontend
* Driver
* Serialization
* Parse
* Sema
* Analysis
* Edit
* AST
* Lex
* Basic


@ -1,658 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Precompiled Header and Modules Internals</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
<style type="text/css">
td {
vertical-align: top;
}
</style>
</head>
<body>
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<h1>Precompiled Header and Modules Internals</h1>
<p>This document describes the design and implementation of Clang's
precompiled headers (PCH) and modules. If you are interested in the end-user
view, please see the <a
href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>
<p><b>Table of Contents</b></p>
<ul>
<li><a href="#usage">Using Precompiled Headers with
<tt>clang</tt></a></li>
<li><a href="#philosophy">Design Philosophy</a></li>
<li><a href="#contents">Serialized AST File Contents</a>
<ul>
<li><a href="#metadata">Metadata Block</a></li>
<li><a href="#sourcemgr">Source Manager Block</a></li>
<li><a href="#preprocessor">Preprocessor Block</a></li>
<li><a href="#types">Types Block</a></li>
<li><a href="#decls">Declarations Block</a></li>
<li><a href="#stmt">Statements and Expressions</a></li>
<li><a href="#idtable">Identifier Table Block</a></li>
<li><a href="#method-pool">Method Pool Block</a></li>
</ul>
</li>
<li><a href="#tendrils">AST Reader Integration Points</a></li>
<li><a href="#chained">Chained precompiled headers</a></li>
<li><a href="#modules">Modules</a></li>
</ul>
<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>
<p>The Clang compiler frontend, <tt>clang -cc1</tt>, supports two command line
options for generating and using PCH files.</p>
<p>To generate PCH files using <tt>clang -cc1</tt>, use the option
<b><tt>-emit-pch</tt></b>:
<pre> $ clang -cc1 test.h -emit-pch -o test.h.pch </pre>
<p>This option is transparently used by <tt>clang</tt> when generating
PCH files. The resulting PCH file contains the serialized form of the
compiler's internal representation after it has completed parsing and
semantic analysis. The PCH file can then be used as a prefix header
with the <b><tt>-include-pch</tt></b> option:</p>
<pre>
$ clang -cc1 -include-pch test.h.pch test.c -o test.s
</pre>
<h2 id="philosophy">Design Philosophy</h2>
<p>Precompiled headers are meant to improve overall compile times for
projects, so the design of precompiled headers is entirely driven by
performance concerns. The use case for precompiled headers is
relatively simple: when there is a common set of headers that is
included in nearly every source file in the project, we
<i>precompile</i> that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the
project, we load the PCH file first (as a prefix header), which acts
as a stand-in for that bundle of headers.</p>
<p>A precompiled header implementation improves performance when:</p>
<ul>
<li>Loading the PCH file is significantly faster than re-parsing the
bundle of headers stored within the PCH file. Thus, a precompiled
header design attempts to minimize the cost of reading the PCH
file. Ideally, this cost should not vary with the size of the
precompiled header file.</li>
<li>The cost of generating the PCH file initially is not so large
that it counters the per-source-file performance improvement due to
eliminating the need to parse the bundled headers in the first
place. This is particularly important on multi-core systems, because
PCH file generation serializes the build when all compilations
require the PCH file to be up-to-date.</li>
</ul>
<p>Modules, as implemented in Clang, use the same mechanisms as
precompiled headers to save a serialized AST file (one per module) and
use those AST modules. From an implementation standpoint, modules are
a generalization of precompiled headers, lifting a number of
restrictions placed on precompiled headers. In particular, a translation
unit can use only one precompiled header, which must be included at the
beginning of the translation unit; modules lift both restrictions. The extensions to the AST file
format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>
<p>Clang's AST files are designed with a compact on-disk
representation, which minimizes both creation time and the time
required to initially load the AST file. The AST file itself contains
a serialized representation of Clang's abstract syntax trees and
supporting data structures, stored using the same compressed bitstream
as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
file format</a>.</p>
<p>Clang's AST files are loaded "lazily" from disk. When an
AST file is initially loaded, Clang reads only a small amount of data
from the AST file to establish where certain important data structures
are stored. The amount of data read in this initial load is
independent of the size of the AST file, such that a larger AST file
does not lead to longer AST load times. The actual header data in the
AST file--macros, functions, variables, types, etc.--is loaded only
when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the
AST file. With this approach, the cost of using an AST file
for a translation unit is proportional to the amount of code actually
used from the AST file, rather than being proportional to the size of
the AST file itself.</p>
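<p>The lazy-loading idea above can be sketched as follows. This is a minimal
illustration under stated assumptions, not Clang's reader: the class
<code>LazyTable</code> and its members are hypothetical names, and a vector of
strings stands in for the on-disk records. Only the small offset index is
consulted up front; a record is "deserialized" the first time its ID is
requested.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical illustration only; this is not Clang's ASTReader.
class LazyTable {
public:
  // Records holds the serialized entities as they sit "on disk";
  // Offsets[ID] is the position of the record for entity ID.
  LazyTable(std::vector<std::string> Records, std::vector<std::size_t> Offsets)
      : Disk(std::move(Records)), Index(std::move(Offsets)),
        Cache(Index.size()), Loaded(Index.size(), false) {}

  // Deserialize entity ID on first use; later calls reuse the cached copy.
  const std::string &get(std::size_t ID) {
    assert(ID < Index.size() && "unknown entity ID");
    if (!Loaded[ID]) {
      Cache[ID] = Disk[Index[ID]]; // stands in for real deserialization
      Loaded[ID] = true;
      ++NumLoaded;
    }
    return Cache[ID];
  }

  std::size_t numLoaded() const { return NumLoaded; }
  std::size_t numEntities() const { return Index.size(); }

private:
  std::vector<std::string> Disk;
  std::vector<std::size_t> Index;
  std::vector<std::string> Cache;
  std::vector<bool> Loaded;
  std::size_t NumLoaded = 0;
};
```

<p>In this sketch the work done grows with the number of distinct IDs passed to
<code>get</code>, not with <code>numEntities()</code>, which mirrors the cost
model described above.</p>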
<p>When given the <code>-print-stats</code> option, Clang produces
statistics describing how much of the AST file was actually
loaded from disk. For a simple "Hello, World!" program that includes
the Apple <code>Cocoa.h</code> header (which is built as a precompiled
header), this option illustrates how little of the actual precompiled
header is required:</p>
<pre>
*** PCH Statistics:
933 stat cache hits
4 stat cache misses
895/39981 source location entries read (2.238563%)
19/15315 types read (0.124061%)
20/82685 declarations read (0.024188%)
154/58070 identifiers read (0.265197%)
0/7260 selectors read (0.000000%)
0/30842 statements read (0.000000%)
4/8400 macros read (0.047619%)
1/4995 lexical declcontexts read (0.020020%)
0/4413 visible declcontexts read (0.000000%)
0/7230 method pool entries read (0.000000%)
0 method pool misses
</pre>
<p>For this small program, only a tiny fraction of the source
locations, types, declarations, identifiers, and macros were actually
deserialized from the precompiled header. These statistics can be
useful to determine whether the AST file implementation can
be improved by making more of the implementation lazy.</p>
<p>Precompiled headers can be chained. When you create a PCH while
including an existing PCH, Clang can create the new PCH by referencing
the original file and only writing the new data to the new file. For
example, you could create a PCH out of all the headers that are very
commonly used throughout your project, and then create a PCH for every
single source file in the project that includes the code that is
specific to that file, so that recompiling the file itself is very fast,
without duplicating the data from the common headers for every
file. The mechanisms behind chained precompiled headers are discussed
in a <a href="#chained">later section</a>.</p>
<h2 id="contents">AST File Contents</h2>
<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">
<p>Clang's AST files are organized into several different
blocks, each of which contains the serialized representation of a part
of Clang's internal representation. Each of the blocks corresponds to
either a block or a record within <a
href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitstream
format</a>. The contents of each of these logical blocks are described
below.</p>
<p>For a given AST file, the <a
href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
utility can be used to examine the actual structure of the bitstream
for the AST file. This information can be used both to help
understand the structure of the AST file and to isolate
areas where AST files can still be optimized, e.g., through
the introduction of abbreviations.</p>
<h3 id="metadata">Metadata Block</h3>
<p>The metadata block contains several records that provide
information about how the AST file was built. This metadata
is primarily used to validate the use of an AST file. For
example, a precompiled header built for a 32-bit x86 target cannot be used
when compiling for a 64-bit x86 target. The metadata block contains
information about:</p>
<dl>
<dt>Language options</dt>
<dd>Describes the particular language dialect used to compile the
AST file, including major options (e.g., Objective-C support) and more
minor options (e.g., support for "//" comments). The contents of this
record correspond to the <code>LangOptions</code> class.</dd>
<dt>Target architecture</dt>
<dd>The target triple that describes the architecture, platform, and
ABI for which the AST file was generated, e.g.,
<code>i386-apple-darwin9</code>.</dd>
<dt>AST version</dt>
<dd>The major and minor version numbers of the AST file
format. Changes in the minor version number should not affect backward
compatibility, while changes in the major version number imply that a
newer compiler cannot read an older precompiled header (and
vice-versa).</dd>
<dt>Original file name</dt>
<dd>The full path of the header that was used to generate the
AST file.</dd>
<dt>Predefines buffer</dt>
<dd>Although not explicitly stored as part of the metadata, the
predefines buffer is used in the validation of the AST file.
The predefines buffer itself contains code generated by the compiler
to initialize the preprocessor state according to the current target,
platform, and command-line options. For example, the predefines buffer
will contain "<code>#define __STDC__ 1</code>" when we are compiling C
without Microsoft extensions. The predefines buffer itself is stored
within the <a href="#sourcemgr">source manager block</a>, but its
contents are verified along with the rest of the metadata.</dd>
</dl>
<p>A chained PCH file (that is, one that references another PCH) and a
module (which may import other modules) have additional metadata
containing the list of all AST files that this AST file depends
on. Each of those files will be loaded along with this AST file.</p>
<p>For chained precompiled headers, the language options, target
architecture and predefines buffer data is taken from the end of the
chain, since they have to match anyway.</p>
<h3 id="sourcemgr">Source Manager Block</h3>
<p>The source manager block contains the serialized representation of
Clang's <a
href="InternalsManual.html#SourceLocation">SourceManager</a> class,
which handles the mapping from source locations (as represented in
Clang's abstract syntax tree) into actual column/line positions within
a source file or macro instantiation. The AST file's
representation of the source manager also includes information about
all of the headers that were (transitively) included when building the
AST file.</p>
<p>The bulk of the source manager block is dedicated to information
about the various files, buffers, and macro instantiations into which
a source location can refer. Each of these is referenced by a numeric
"file ID", which is a unique number (allocated starting at 1) stored
in the source location. Clang serializes the information for each kind
of file ID, along with an index that maps file IDs to the position
within the AST file where the information about that file ID is
stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a
macro instantiation history inside the header itself.</p>
<p>The source manager block also contains information about all of the
headers that were included when building the AST file. This
includes information about the controlling macro for the header (e.g.,
when the preprocessor identified that the contents of the header
depend on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
along with a cached version of the results of the <code>stat()</code>
system calls performed when building the AST file. The
latter is particularly useful in reducing system time when searching
for include files.</p>
<h3 id="preprocessor">Preprocessor Block</h3>
<p>The preprocessor block contains the serialized representation of
the preprocessor. Specifically, it contains all of the macros that
have been defined by the end of the header used to build the
AST file, along with the token sequences that comprise each
macro. The macro definitions are only read from the AST file when the
name of the macro first occurs in the program. This lazy loading of
macro definitions is triggered by lookups into the <a
href="#idtable">identifier table</a>.</p>
<h3 id="types">Types Block</h3>
<p>The types block contains the serialized representation of all of
the types referenced in the translation unit. Each Clang type node
(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
corresponding record type in the AST file. When types are deserialized
from the AST file, the data within the record is used to
reconstruct the appropriate type node using the AST context.</p>
<p>Each type has a unique type ID, which is an integer that uniquely
identifies that type. Type ID 0 represents the NULL type, type IDs
less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
(<code>void</code>, <code>float</code>, etc.), while other
"user-defined" type IDs are assigned consecutively from
<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
The AST file has an associated mapping from the user-defined types
block to the location within the types block where the serialized
representation of that type resides, enabling lazy deserialization of
types. When a type is referenced from within the AST file, that
reference is encoded using the type ID shifted left by 3 bits. The
lower three bits are used to represent the <code>const</code>,
<code>volatile</code>, and <code>restrict</code> qualifiers, as in
Clang's <a
href="http://clang.llvm.org/docs/InternalsManual.html#Type">QualType</a>
class.</p>
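<p>The encoding of a type reference can be sketched as below. This is an
illustration, not the serialization code itself: the helper names are
hypothetical, and the particular bit assigned to each qualifier
(<code>const</code> = 1, <code>restrict</code> = 2, <code>volatile</code> = 4)
is an assumption made for the example.</p>

```cpp
#include <cassert>
#include <cstdint>

// Assumed bit assignment for the three qualifiers (illustrative only).
enum Qualifier : std::uint32_t { Const = 1u, Restrict = 2u, Volatile = 4u };

const std::uint32_t QualifierBits = 3;
const std::uint32_t QualifierMask = (1u << QualifierBits) - 1;

// Pack a type ID and its qualifiers: the ID is shifted left by 3 bits and
// the qualifiers occupy the low 3 bits.
inline std::uint32_t encodeTypeRef(std::uint32_t TypeID, std::uint32_t Quals) {
  assert(TypeID <= (UINT32_MAX >> QualifierBits) && "type ID does not fit");
  return (TypeID << QualifierBits) | (Quals & QualifierMask);
}

// Recover the type ID from an encoded reference.
inline std::uint32_t typeIDOf(std::uint32_t Ref) { return Ref >> QualifierBits; }

// Recover the qualifier bits from an encoded reference.
inline std::uint32_t qualifiersOf(std::uint32_t Ref) {
  return Ref & QualifierMask;
}
```

<p>Under these assumptions, a <code>const</code>-qualified reference to type ID
5 is encoded as <code>(5 &lt;&lt; 3) | 1</code>, and both parts can be
recovered with a shift and a mask.</p>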
<h3 id="decls">Declarations Block</h3>
<p>The declarations block contains the serialized representation of
all of the declarations referenced in the translation unit. Each Clang
declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
etc.) has a corresponding record type in the AST file. When
declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding <code>Decl</code> node. As with types, each declaration
node has a numeric ID that is used to refer to that declaration within
the AST file. In addition, a lookup table provides a mapping from that
numeric ID to the offset within the precompiled header where that
declaration is described.</p>
<p>Declarations in Clang's abstract syntax trees are stored
hierarchically. At the top of the hierarchy is the translation unit
(<code>TranslationUnitDecl</code>), which contains all of the
declarations in the translation unit but is not actually written as a
specific declaration node. Its child declarations (such as
functions or struct types) may also contain other declarations inside
them, and so on. Within Clang, each declaration is stored within a <a
href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
context</a>, as represented by the <code>DeclContext</code> class.
Declaration contexts provide the mechanism to perform name lookup
within a given declaration (e.g., find the member named <code>x</code>
in a structure) and iterate over the declarations stored within a
context (e.g., iterate over all of the fields of a structure for
structure layout).</p>
<p>In Clang's AST file format, deserializing a declaration
that is a <code>DeclContext</code> is a separate operation from
deserializing all of the declarations stored within that declaration
context. Therefore, Clang will deserialize the translation unit
declaration without deserializing the declarations within that
translation unit. When required, the declarations stored within a
declaration context will be deserialized. There are two representations
of the declarations within a declaration context, which correspond to
the name-lookup and iteration behavior described above:</p>
<ul>
<li>When the front end performs name lookup to find a name
<code>x</code> within a given declaration context (for example,
during semantic analysis of the expression <code>p-&gt;x</code>,
where <code>p</code>'s type is defined in the precompiled header),
Clang refers to an on-disk hash table that maps from the names
within that declaration context to the declaration IDs that
represent each visible declaration with that name. The actual
declarations will then be deserialized to provide the results of
name lookup.</li>
<li>When the front end performs iteration over all of the
declarations within a declaration context, all of those declarations
are immediately de-serialized. For large declaration contexts (e.g.,
the translation unit), this operation is expensive; however, large
declaration contexts are not traversed in normal compilation, since
such a traversal is unnecessary. However, it is common for the code
generator and semantic analysis to traverse declaration contexts for
structs, classes, unions, and enumerations, although those contexts
contain relatively few declarations in the common case.</li>
</ul>
<h3 id="stmt">Statements and Expressions</h3>
<p>Statements and expressions are stored in the AST file in
both the <a href="#types">types</a> and the <a
href="#decls">declarations</a> blocks, because every statement or
expression will be associated with either a type or declaration. The
actual statement and expression records are stored immediately
following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a
function will be stored directly following the declaration of the
function.</p>
<p>As with types and declarations, each statement and expression kind
in Clang's abstract syntax tree (<code>ForStmt</code>,
<code>CallExpr</code>, etc.) has a corresponding record type in the
AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression
within an expression is stored as a separate record (which keeps most
records to a fixed size). Within the AST file, the
subexpressions of an expression are stored, in reverse order, prior to the expression
that owns those subexpressions, using a form of <a
href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
Polish Notation</a>. For example, an expression <code>3 - 4 + 5</code>
would be represented as follows:</p>
<table border="1">
<tr><td><code>IntegerLiteral(5)</code></td></tr>
<tr><td><code>IntegerLiteral(4)</code></td></tr>
<tr><td><code>IntegerLiteral(3)</code></td></tr>
<tr><td><code>BinaryOperator(-)</code></td></tr>
<tr><td><code>BinaryOperator(+)</code></td></tr>
<tr><td>STOP</td></tr>
</table>
<p>When reading this representation, Clang evaluates each expression
record it encounters, builds the appropriate abstract syntax tree node,
and then pushes that expression on to a stack. When a record contains <i>N</i>
subexpressions--<code>BinaryOperator</code> has two of them--those
expressions are popped from the top of the stack. The special STOP
code indicates that we have reached the end of a serialized expression
or statement; other expression or statement records may follow, but
they are part of a different expression.</p>
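<p>The stack-based reading process can be sketched as follows. The
<code>Record</code> type and <code>readExpr</code> function are hypothetical,
and the sketch rebuilds a parenthesized string rather than real AST nodes. It
assumes that, because operands are written in reverse order, the left-hand
operand is the one on top of the stack.</p>

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical record: a kind plus the literal value or operator spelling.
struct Record {
  enum Kind { IntegerLiteral, BinaryOperator, Stop } K;
  std::string Text;
};

// Reads records up to the STOP code and returns the rebuilt expression,
// printed with explicit parentheses.
std::string readExpr(const std::vector<Record> &Records) {
  std::vector<std::string> Stack;
  for (const Record &R : Records) {
    if (R.K == Record::Stop)
      break;
    if (R.K == Record::IntegerLiteral) {
      Stack.push_back(R.Text);
      continue;
    }
    // BinaryOperator: pop its two subexpressions. Operands were written in
    // reverse order, so the left-hand side is popped first (assumption).
    assert(Stack.size() >= 2 && "malformed record stream");
    std::string LHS = Stack.back();
    Stack.pop_back();
    std::string RHS = Stack.back();
    Stack.pop_back();
    Stack.push_back("(" + LHS + " " + R.Text + " " + RHS + ")");
  }
  assert(Stack.size() == 1 && "expected exactly one expression");
  return Stack.back();
}
```

<p>Feeding this function the six records from the table above yields
<code>((3 - 4) + 5)</code>, which matches the usual left-to-right grouping of
<code>3 - 4 + 5</code>.</p>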
<h3 id="idtable">Identifier Table Block</h3>
<p>The identifier table block contains an on-disk hash table that maps
each identifier mentioned within the AST file to the
serialized representation of the identifier's information (e.g., the
<code>IdentifierInfo</code> structure). The serialized representation
contains:</p>
<ul>
<li>The actual identifier string.</li>
<li>Flags that describe whether this identifier is the name of a
built-in, a poisoned identifier, an extension token, or a
macro.</li>
<li>If the identifier names a macro, the offset of the macro
definition within the <a href="#preprocessor">preprocessor
block</a>.</li>
<li>If the identifier names one or more declarations visible from
translation unit scope, the <a href="#decls">declaration IDs</a> of these
declarations.</li>
</ul>
<p>When an AST file is loaded, the AST file reader
mechanism introduces itself into the identifier table as an external
lookup source. Thus, when the user program refers to an identifier
that has not yet been seen, Clang will perform a lookup into the
identifier table. If an identifier is found, its contents (macro
definitions, flags, top-level declarations, etc.) will be
deserialized, at which point the corresponding
<code>IdentifierInfo</code> structure will have the same contents it
would have after parsing the headers in the AST file.</p>
<p>Within the AST file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
hash table where that identifier is stored. This mapping is used when
deserializing the name of a declaration, the identifier of a token, or
any other construct in the AST file that refers to a name.</p>
<h3 id="method-pool">Method Pool Block</h3>
<p>The method pool block is represented as an on-disk hash table that
serves two purposes: it provides a mapping from the names of
Objective-C selectors to the set of Objective-C instance and class
methods that have that particular selector (which is required for
semantic analysis in Objective-C) and also stores all of the selectors
used by entities within the AST file. The design of the
method pool is similar to that of the <a href="#idtable">identifier
table</a>: the first time a particular selector is formed during the
compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data
structure (<code>Sema::InstanceMethodPool</code> and
<code>Sema::FactoryMethodPool</code> for instance and class methods,
respectively).</p>
<p>As with identifiers, selectors are represented by numeric values
within the AST file. A separate index maps these numeric selector
values to the offset of the selector within the on-disk hash table,
and will be used when de-serializing an Objective-C method declaration
(or other Objective-C construct) that refers to the selector.</p>
<h2 id="tendrils">AST Reader Integration Points</h2>
<p>The "lazy" deserialization behavior of AST files requires
their integration into several completely different submodules of
Clang. For example, lazily deserializing the declarations during name
lookup requires that the name-lookup routines be able to query the
AST file to find entities stored there.</p>
<p>For each Clang data structure that requires direct interaction with
the AST reader logic, there is an abstract class that provides
the interface between the two modules. The <code>ASTReader</code>
class, which handles the loading of an AST file, inherits
from all of these abstract classes to provide lazy deserialization of
Clang's data structures. <code>ASTReader</code> implements the
following abstract classes:</p>
<dl>
<dt><code>StatSysCallCache</code></dt>
<dd>This abstract interface is associated with the
<code>FileManager</code> class, and is used whenever the file
manager is going to perform a <code>stat()</code> system call.</dd>
<dt><code>ExternalSLocEntrySource</code></dt>
<dd>This abstract interface is associated with the
<code>SourceManager</code> class, and is used whenever the
<a href="#sourcemgr">source manager</a> needs to load the details
of a file, buffer, or macro instantiation.</dd>
<dt><code>IdentifierInfoLookup</code></dt>
<dd>This abstract interface is associated with the
<code>IdentifierTable</code> class, and is used whenever the
program source refers to an identifier that has not yet been seen.
In this case, the AST reader searches for
this identifier within its <a href="#idtable">identifier table</a>
to load any top-level declarations or macros associated with that
identifier.</dd>
<dt><code>ExternalASTSource</code></dt>
<dd>This abstract interface is associated with the
<code>ASTContext</code> class, and is used whenever the abstract
syntax tree nodes need to be loaded from the AST file. It
provides the ability to de-serialize declarations and types
identified by their numeric values, read the bodies of functions
when required, and read the declarations stored within a
declaration context (either for iteration or for name lookup).</dd>
<dt><code>ExternalSemaSource</code></dt>
<dd>This abstract interface is associated with the <code>Sema</code>
class, and is used whenever semantic analysis needs to read
information from the <a href="#method-pool">global method
pool</a>.</dd>
</dl>
<h2 id="chained">Chained precompiled headers</h2>
<p>Chained precompiled headers were initially intended to improve the
performance of IDE-centric operations such as syntax highlighting and
code completion while a particular source file is being edited by the
user. To minimize the amount of reparsing required after a change to
the file, a form of precompiled header--called a precompiled
<i>preamble</i>--is automatically generated by parsing all of the
headers in the source file, up to and including the last
#include. When only the source file changes (and none of the headers
it depends on), reparsing of that source file can use the precompiled
preamble and start parsing after the #includes, so parsing time is
proportional to the size of the source file (rather than all of its
includes). However, the compilation of that translation unit
may already use a precompiled header: in this case, Clang will create
the precompiled preamble as a chained precompiled header that refers
to the original precompiled header. This drastically reduces the time
needed to serialize the precompiled preamble for use in reparsing.</p>
<p>Chained precompiled headers get their name because each precompiled header
can depend on one other precompiled header, forming a chain of
dependencies. A translation unit will then include the precompiled
header that starts the chain (i.e., nothing depends on it). This
linearity of dependencies is important for the semantic model of
chained precompiled headers, because the most-recent precompiled
header can provide information that overrides the information provided
by the precompiled headers it depends on, just like a header file
<code>B.h</code> that includes another header <code>A.h</code> can
modify the state produced by parsing <code>A.h</code>, e.g., by
<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>
<p>There are several ways in which chained precompiled headers
generalize the AST file model:</p>
<dl>
<dt>Numbering of IDs</dt>
<dd>Many different kinds of entities--identifiers, declarations,
types, etc.--have ID numbers that start at 1 or some other
predefined constant and grow upward. Each precompiled header records
the maximum ID number it has assigned in each category. Then, when a
new precompiled header is generated that depends on (chains to)
another precompiled header, it will start counting at the next
available ID number. This way, one can determine, given an ID
number, which AST file actually contains the entity.</dd>
<dt>Name lookup</dt>
<dd>When writing a chained precompiled header, Clang attempts to
write only information that has changed from the precompiled header
on which it is based. This changes the lookup algorithm for the
various tables, such as the <a href="#idtable">identifier table</a>:
the search starts at the most-recent precompiled header. If no entry
is found, lookup then proceeds to the identifier table in the
precompiled header it depends on, and so on. Once a lookup
succeeds, that result is considered definitive, overriding any
results from earlier precompiled headers.</dd>
<dt>Update records</dt>
<dd>There are various ways in which a later precompiled header can
modify the entities described in an earlier precompiled header. For
example, later precompiled headers can add entries into the various
name-lookup tables for the translation unit or namespaces, or add
new categories to an Objective-C class. Each of these updates is
captured in an "update record" that is stored in the chained
precompiled header file and will be loaded along with the original
entity.</dd>
</dl>
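<p>The ID numbering and name lookup rules for a chain can be sketched together.
This is a simplified model with hypothetical names (<code>PCHFile</code>,
<code>ownerOf</code>, <code>lookup</code>); a single ID category and a plain
map stand in for the real tables.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one precompiled header in a chain.
struct PCHFile {
  std::string Name;
  unsigned FirstID; // first ID number assigned by this file
  unsigned LastID;  // maximum ID number assigned by this file
  std::map<std::string, std::string> Identifiers; // entries written here
};

// Chain is ordered oldest first; each file starts counting at the next
// available ID, so the ranges do not overlap and an ID names its file.
const PCHFile *ownerOf(const std::vector<PCHFile> &Chain, unsigned ID) {
  for (const PCHFile &F : Chain)
    if (ID >= F.FirstID && ID <= F.LastID)
      return &F;
  return nullptr;
}

// Search from the most recent file back toward the oldest one; the first
// hit is definitive and overrides entries in earlier files.
const std::string *lookup(const std::vector<PCHFile> &Chain,
                          const std::string &Name) {
  for (std::size_t I = Chain.size(); I-- > 0;) {
    auto It = Chain[I].Identifiers.find(Name);
    if (It != Chain[I].Identifiers.end())
      return &It->second;
  }
  return nullptr;
}
```

<p>With a two-file chain in which both files define the same macro name,
<code>lookup</code> returns the entry from the later file, while a name written
only in the earlier file is still found by falling through the chain.</p>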
<h2 id="modules">Modules</h2>
<p>Modules generalize the chained precompiled header model yet
further, from a linear chain of precompiled headers to an arbitrary
directed acyclic graph (DAG) of AST files. All of the same techniques
used to make chained precompiled headers work---ID numbering, name
lookup, update records---are shared with modules. However, the DAG
nature of modules introduces a number of additional complications to
the model:</p>
<dl>
<dt>Numbering of IDs</dt>
<dd>The simple, linear numbering scheme used in chained precompiled
headers falls apart with the module DAG, because different modules
may end up with different numbering schemes for entities they
imported from common shared modules. To account for this, each
module file provides information about which modules it depends on
and which ID numbers it assigned to the entities in those modules,
as well as which ID numbers it took for its own new entities. The
AST reader then maps these "local" ID numbers into a "global" ID
number space for the current translation unit, providing a 1-1
mapping between entities (in whatever AST file they inhabit) and
global ID numbers. If that translation unit is then serialized into
an AST file, this mapping will be stored for use when the AST file
is imported.</dd>
<dt>Declaration merging</dt>
<dd>It is possible for a given entity (from the language's
perspective) to be declared multiple times in different places. For
example, two different headers can have the declaration of
<tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If
each of those headers is included in a module, and some third party
imports both of those modules, there is a potentially serious
problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
find both declarations, but the AST nodes are unrelated. This would
result in a compilation error, due to an ambiguity in name
lookup. Therefore, the AST reader performs declaration merging
according to the appropriate language semantics, ensuring that the
two disjoint declarations are merged into a single redeclaration
chain (with a common canonical declaration), so that it is as if one
of the headers had been included before the other.</dd>
<dt>Name Visibility</dt>
<dd>Modules allow certain names that occur during module creation to
be "hidden", so that they are not part of the public interface of
the module and are not visible to its clients. The AST reader
maintains a "visible" bit on various AST nodes (declarations, macros,
etc.) to indicate whether that particular AST node is currently
visible; the various name lookup mechanisms in Clang inspect the
visible bit to determine whether that entity, which is still in the
AST (because other, visible AST nodes may depend on it), can
actually be found by name lookup. When a new (sub)module is
imported, it may make existing, non-visible, already-deserialized
AST nodes visible; it is the responsibility of the AST reader to
find and update these AST nodes when it is notified of the import.</dd>
</dl>
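<p>The local-to-global ID mapping can be sketched as follows. The names
<code>Range</code> and <code>IDRemapper</code> are hypothetical, and the sketch
assumes each module file describes its local numbering as a list of contiguous
ranges, each owned by one module; the real on-disk layout is not shown
here.</p>

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Local IDs [LocalBase, LocalBase + Count) in one module file refer to
// entities 0..Count-1 of the module named Owner (hypothetical layout).
struct Range {
  unsigned LocalBase;
  std::string Owner;
  unsigned Count;
};

class IDRemapper {
public:
  // Reserve a block of global IDs for a module's own entities. Loading the
  // same module again keeps the block chosen the first time.
  void addModule(const std::string &Name, unsigned Count) {
    if (GlobalBase.count(Name))
      return;
    GlobalBase[Name] = NextGlobal;
    NextGlobal += Count;
  }

  // Translate a local ID, as numbered by one module file, to the global ID.
  unsigned toGlobal(const std::vector<Range> &Ranges, unsigned LocalID) const {
    for (const Range &R : Ranges)
      if (LocalID >= R.LocalBase && LocalID < R.LocalBase + R.Count)
        return GlobalBase.at(R.Owner) + (LocalID - R.LocalBase);
    assert(false && "local ID not covered by any range");
    return 0;
  }

private:
  std::map<std::string, unsigned> GlobalBase; // first global ID per module
  unsigned NextGlobal = 1;
};
```

<p>In this model two modules may refer to the same shared entity under
different local IDs, yet both references translate to the same global ID,
which gives the 1-1 mapping between entities and global ID numbers described
above.</p>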
</div>
</body>
</html>

573
docs/PCHInternals.rst Normal file

@ -0,0 +1,573 @@
========================================
Precompiled Header and Modules Internals
========================================

.. contents::
   :local:

This document describes the design and implementation of Clang's precompiled
headers (PCH) and modules. If you are interested in the end-user view, please
see the `User's Manual <UsersManual.html#precompiledheaders>`_.
Using Precompiled Headers with ``clang``
----------------------------------------
The Clang compiler frontend, ``clang -cc1``, supports two command line options
for generating and using PCH files.
To generate PCH files using ``clang -cc1``, use the option :option:`-emit-pch`:

.. code-block:: bash

  $ clang -cc1 test.h -emit-pch -o test.h.pch

This option is transparently used by ``clang`` when generating PCH files. The
resulting PCH file contains the serialized form of the compiler's internal
representation after it has completed parsing and semantic analysis. The PCH
file can then be used as a prefix header with the :option:`-include-pch`
option:

.. code-block:: bash

  $ clang -cc1 -include-pch test.h.pch test.c -o test.s

Design Philosophy
-----------------
Precompiled headers are meant to improve overall compile times for projects, so
the design of precompiled headers is entirely driven by performance concerns.
The use case for precompiled headers is relatively simple: when there is a
common set of headers that is included in nearly every source file in the
project, we *precompile* that bundle of headers into a single precompiled
header (PCH file). Then, when compiling the source files in the project, we
load the PCH file first (as a prefix header), which acts as a stand-in for that
bundle of headers.
A precompiled header implementation improves performance when:
* Loading the PCH file is significantly faster than re-parsing the bundle of
headers stored within the PCH file. Thus, a precompiled header design
attempts to minimize the cost of reading the PCH file. Ideally, this cost
should not vary with the size of the precompiled header file.
* The cost of generating the PCH file initially is not so large that it
counters the per-source-file performance improvement due to eliminating the
need to parse the bundled headers in the first place. This is particularly
important on multi-core systems, because PCH file generation serializes the
build when all compilations require the PCH file to be up-to-date.
Modules, as implemented in Clang, use the same mechanisms as precompiled
headers to save a serialized AST file (one per module) and use those AST
modules. From an implementation standpoint, modules are a generalization of
precompiled headers, lifting a number of restrictions placed on precompiled
headers. In particular, there can only be one precompiled header and it must
be included at the beginning of the translation unit. The extensions to the
AST file format required for modules are discussed in the section on
:ref:`modules <pchinternals-modules>`.
Clang's AST files are designed with a compact on-disk representation, which
minimizes both creation time and the time required to initially load the AST
file. The AST file itself contains a serialized representation of Clang's
abstract syntax trees and supporting data structures, stored using the same
compressed bitstream as `LLVM's bitcode file format
<http://llvm.org/docs/BitCodeFormat.html>`_.
Clang's AST files are loaded "lazily" from disk. When an AST file is initially
loaded, Clang reads only a small amount of data from the AST file to establish
where certain important data structures are stored. The amount of data read in
this initial load is independent of the size of the AST file, such that a
larger AST file does not lead to longer AST load times. The actual header data
in the AST file --- macros, functions, variables, types, etc. --- is loaded
only when it is referenced from the user's code, at which point only that
entity (and those entities it depends on) is deserialized from the AST file.
With this approach, the cost of using an AST file for a translation unit is
proportional to the amount of code actually used from the AST file, rather than
being proportional to the size of the AST file itself.
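
The lazy-loading idea described above can be sketched as a table that reserves one slot per entity but runs the expensive deserialization step only for entities that are actually requested. This is a minimal illustration, not Clang's implementation: ``Entity``, ``LazyTable``, and the loader callback are hypothetical names introduced for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deserialized entity (not a Clang type).
struct Entity {
  std::string Name;
};

// Hypothetical lazy table: one slot per entity in the file, but the loader
// runs only for the entities that are actually requested.
class LazyTable {
public:
  using Loader = std::function<Entity(std::size_t)>;

  LazyTable(std::size_t NumEntities, Loader LoadFn)
      : Slots(NumEntities), Loaded(NumEntities, false),
        Load(std::move(LoadFn)) {}

  // Returns the entity at Index, deserializing it on first use only.
  const Entity &get(std::size_t Index) {
    assert(Index < Slots.size() && "entity index out of range");
    if (!Loaded[Index]) {
      Slots[Index] = Load(Index);
      Loaded[Index] = true;
      ++NumLoaded;
    }
    return Slots[Index];
  }

  std::size_t loadedCount() const { return NumLoaded; }
  std::size_t totalCount() const { return Slots.size(); }

private:
  std::vector<Entity> Slots;
  std::vector<bool> Loaded;
  Loader Load;
  std::size_t NumLoaded = 0;
};
```

With a table of 1000 entities, requesting one entity twice invokes the loader once, so the cost tracks the number of entities used rather than the size of the table.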
When given the :option:`-print-stats` option, Clang produces statistics
describing how much of the AST file was actually loaded from disk. For a
simple "Hello, World!" program that includes the Apple ``Cocoa.h`` header
(which is built as a precompiled header), this option illustrates how little of
the actual precompiled header is required:

.. code-block:: none

  *** PCH Statistics:
    933 stat cache hits
    4 stat cache misses
    895/39981 source location entries read (2.238563%)
    19/15315 types read (0.124061%)
    20/82685 declarations read (0.024188%)
    154/58070 identifiers read (0.265197%)
    0/7260 selectors read (0.000000%)
    0/30842 statements read (0.000000%)
    4/8400 macros read (0.047619%)
    1/4995 lexical declcontexts read (0.020020%)
    0/4413 visible declcontexts read (0.000000%)
    0/7230 method pool entries read (0.000000%)
    0 method pool misses

For this small program, only a tiny fraction of the source locations, types,
declarations, identifiers, and macros were actually deserialized from the
precompiled header. These statistics can be useful to determine whether the
AST file implementation can be improved by making more of the implementation
lazy.
Precompiled headers can be chained. When you create a PCH while including an
existing PCH, Clang can create the new PCH by referencing the original file and
only writing the new data to the new file. For example, you could create a PCH
out of all the headers that are very commonly used throughout your project, and
then create a PCH for every single source file in the project that includes the
code that is specific to that file, so that recompiling the file itself is very
fast, without duplicating the data from the common headers for every file. The
mechanisms behind chained precompiled headers are discussed in a :ref:`later
section <pchinternals-chained>`.
AST File Contents
-----------------
Clang's AST files are organized into several different blocks, each of which
contains the serialized representation of a part of Clang's internal
representation. Each of the blocks corresponds to either a block or a record
within `LLVM's bitstream format <http://llvm.org/docs/BitCodeFormat.html>`_.
The contents of each of these logical blocks are described below.
.. image:: PCHLayout.png
For a given AST file, the `llvm-bcanalyzer
<http://llvm.org/docs/CommandGuide/llvm-bcanalyzer.html>`_ utility can be used
to examine the actual structure of the bitstream for the AST file. This
information can be used both to help understand the structure of the AST file
and to isolate areas where AST files can still be optimized, e.g., through the
introduction of abbreviations.
Metadata Block
^^^^^^^^^^^^^^
The metadata block contains several records that provide information about how
the AST file was built. This metadata is primarily used to validate the use of
an AST file. For example, a precompiled header built for a 32-bit x86 target
cannot be used when compiling for a 64-bit x86 target. The metadata block
contains information about:
Language options
Describes the particular language dialect used to compile the AST file,
including major options (e.g., Objective-C support) and more minor options
(e.g., support for "``//``" comments). The contents of this record correspond to
the ``LangOptions`` class.
Target architecture
The target triple that describes the architecture, platform, and ABI for
which the AST file was generated, e.g., ``i386-apple-darwin9``.
AST version
The major and minor version numbers of the AST file format. Changes in the
minor version number should not affect backward compatibility, while changes
in the major version number imply that a newer compiler cannot read an older
precompiled header (and vice-versa).
Original file name
The full path of the header that was used to generate the AST file.
Predefines buffer
Although not explicitly stored as part of the metadata, the predefines buffer
is used in the validation of the AST file. The predefines buffer itself
contains code generated by the compiler to initialize the preprocessor state
according to the current target, platform, and command-line options. For
example, the predefines buffer will contain "``#define __STDC__ 1``" when we
are compiling C without Microsoft extensions. The predefines buffer itself
is stored within the :ref:`pchinternals-sourcemgr`, but its contents are
verified along with the rest of the metadata.
A chained PCH file (that is, one that references another PCH) and a module
(which may import other modules) have additional metadata containing the list
of all AST files that this AST file depends on. Each of those files will be
loaded along with this AST file.
For chained precompiled headers, the language options, target architecture and
predefines buffer data is taken from the end of the chain, since they have to
match anyway.
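
As a rough illustration of how this metadata might be used for validation, the sketch below compares the metadata recorded in an AST file with that of the current compilation. The ``ASTMetadata`` struct, the ``Validation`` enum, and ``validate`` are hypothetical names, the single ``ObjC`` flag stands in for the full set of language options, and ignoring the minor version is an assumption made for this sketch.

```cpp
#include <string>

// Hypothetical, simplified view of the metadata block (not Clang's layout).
struct ASTMetadata {
  std::string TargetTriple; // e.g. "i386-apple-darwin9"
  unsigned VersionMajor;
  unsigned VersionMinor;
  bool ObjC;                // stand-in for one major language option
};

enum class Validation { OK, TargetMismatch, VersionMismatch, LangOptMismatch };

// Decides whether an AST file described by File may be used by the
// compilation described by Current.
inline Validation validate(const ASTMetadata &File,
                           const ASTMetadata &Current) {
  if (File.TargetTriple != Current.TargetTriple)
    return Validation::TargetMismatch;
  // Assumption: only the major version has to match; minor version
  // differences are tolerated in this sketch.
  if (File.VersionMajor != Current.VersionMajor)
    return Validation::VersionMismatch;
  if (File.ObjC != Current.ObjC)
    return Validation::LangOptMismatch;
  return Validation::OK;
}
```

Under these assumptions, a file built for ``i386-apple-darwin9`` is rejected with ``TargetMismatch`` when the current compilation targets a different triple.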
.. _pchinternals-sourcemgr:
Source Manager Block
^^^^^^^^^^^^^^^^^^^^
The source manager block contains the serialized representation of Clang's
`SourceManager <InternalsManual.html#SourceLocation>`_ class, which handles the
mapping from source locations (as represented in Clang's abstract syntax tree)
into actual column/line positions within a source file or macro instantiation.
The AST file's representation of the source manager also includes information
about all of the headers that were (transitively) included when building the
AST file.
The bulk of the source manager block is dedicated to information about the
various files, buffers, and macro instantiations into which a source location
can refer. Each of these is referenced by a numeric "file ID", which is a
unique number (allocated starting at 1) stored in the source location. Clang
serializes the information for each kind of file ID, along with an index that
maps file IDs to the position within the AST file where the information about
that file ID is stored. The data associated with a file ID is loaded only when
required by the front end, e.g., to emit a diagnostic that includes a macro
instantiation history inside the header itself.
The source manager block also contains information about all of the headers
that were included when building the AST file. This includes information about
the controlling macro for the header (e.g., when the preprocessor identified
that the contents of the header depend on a macro like
``LLVM_CLANG_SOURCEMANAGER_H``) along with a cached version of the results of
the ``stat()`` system calls performed when building the AST file. The latter
is particularly useful in reducing system time when searching for include
files.
.. _pchinternals-preprocessor:
Preprocessor Block
^^^^^^^^^^^^^^^^^^
The preprocessor block contains the serialized representation of the
preprocessor. Specifically, it contains all of the macros that have been
defined by the end of the header used to build the AST file, along with the
token sequences that comprise each macro. The macro definitions are only read
from the AST file when the name of the macro first occurs in the program. This
lazy loading of macro definitions is triggered by lookups into the
:ref:`identifier table <pchinternals-ident-table>`.
.. _pchinternals-types:
Types Block
^^^^^^^^^^^
The types block contains the serialized representation of all of the types
referenced in the translation unit. Each Clang type node (``PointerType``,
``FunctionProtoType``, etc.) has a corresponding record type in the AST file.
When types are deserialized from the AST file, the data within the record is
used to reconstruct the appropriate type node using the AST context.
Each type has a unique type ID, which is an integer that uniquely identifies
that type. Type ID 0 represents the NULL type, type IDs less than
``NUM_PREDEF_TYPE_IDS`` represent predefined types (``void``, ``float``, etc.),
while other "user-defined" type IDs are assigned consecutively from
``NUM_PREDEF_TYPE_IDS`` upward as the types are encountered. The AST file has
an associated mapping from each user-defined type ID to the location within
the types block where the serialized representation of that type resides,
enabling lazy deserialization of types. When a type is referenced from within
the AST file, that reference is encoded using the type ID shifted left by 3
bits. The lower three bits are used to represent the ``const``, ``volatile``,
and ``restrict`` qualifiers, as in Clang's
`QualType <http://clang.llvm.org/docs/InternalsManual.html#Type>`_ class.
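
The reference encoding described above (type ID shifted left by 3 bits, qualifiers in the low three bits) can be sketched as follows. The function and type names are hypothetical, and the assignment of individual qualifiers to specific bits is an assumption made for illustration; the text only states that the three low bits hold ``const``, ``volatile``, and ``restrict``.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: which qualifier occupies which of the three low bits.
enum Qualifier : std::uint32_t {
  QualConst = 1u << 0,
  QualRestrict = 1u << 1,
  QualVolatile = 1u << 2,
  QualMask = 0x7
};

// Hypothetical decoded form of a type reference.
struct TypeRef {
  std::uint32_t TypeID;
  std::uint32_t Quals;
};

// Packs a type ID and its qualifier bits into one integer.
inline std::uint32_t encodeTypeRef(std::uint32_t TypeID, std::uint32_t Quals) {
  assert((Quals & ~QualMask) == 0 && "unknown qualifier bits");
  assert(TypeID <= (UINT32_MAX >> 3) && "type ID does not fit");
  return (TypeID << 3) | Quals;
}

// Splits an encoded reference back into type ID and qualifier bits.
inline TypeRef decodeTypeRef(std::uint32_t Encoded) {
  return TypeRef{Encoded >> 3, Encoded & QualMask};
}
```

Because the qualifiers travel with the reference rather than with the type record, a ``const``-qualified use and an unqualified use of a type can share one serialized type record.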
.. _pchinternals-decls:
Declarations Block
^^^^^^^^^^^^^^^^^^
The declarations block contains the serialized representation of all of the
declarations referenced in the translation unit. Each Clang declaration node
(``VarDecl``, ``FunctionDecl``, etc.) has a corresponding record type in the
AST file. When declarations are deserialized from the AST file, the data
within the record is used to build and populate a new instance of the
corresponding ``Decl`` node. As with types, each declaration node has a
numeric ID that is used to refer to that declaration within the AST file. In
addition, a lookup table provides a mapping from that numeric ID to the offset
within the precompiled header where that declaration is described.
Declarations in Clang's abstract syntax trees are stored hierarchically. At
the top of the hierarchy is the translation unit (``TranslationUnitDecl``),
which contains all of the declarations in the translation unit but is not
actually written as a specific declaration node. Its child declarations (such
as functions or struct types) may also contain other declarations inside them,
and so on. Within Clang, each declaration is stored within a `declaration
context <http://clang.llvm.org/docs/InternalsManual.html#DeclContext>`_, as
represented by the ``DeclContext`` class. Declaration contexts provide the
mechanism to perform name lookup within a given declaration (e.g., find the
member named ``x`` in a structure) and iterate over the declarations stored
within a context (e.g., iterate over all of the fields of a structure for
structure layout).
In Clang's AST file format, deserializing a declaration that is a
``DeclContext`` is a separate operation from deserializing all of the
declarations stored within that declaration context. Therefore, Clang will
deserialize the translation unit declaration without deserializing the
declarations within that translation unit. When required, the declarations
stored within a declaration context will be deserialized. There are two
representations of the declarations within a declaration context, which
correspond to the name-lookup and iteration behavior described above:
* When the front end performs name lookup to find a name ``x`` within a given
declaration context (for example, during semantic analysis of the expression
``p->x``, where ``p``'s type is defined in the precompiled header), Clang
refers to an on-disk hash table that maps from the names within that
declaration context to the declaration IDs that represent each visible
declaration with that name. The actual declarations will then be
deserialized to provide the results of name lookup.
* When the front end performs iteration over all of the declarations within a
declaration context, all of those declarations are immediately
de-serialized. For large declaration contexts (e.g., the translation unit),
this operation is expensive; however, large declaration contexts are not
traversed in normal compilation, since such a traversal is unnecessary.
However, it is common for the code generator and semantic analysis to
traverse declaration contexts for structs, classes, unions, and
enumerations, although those contexts contain relatively few declarations in
the common case.
Statements and Expressions
^^^^^^^^^^^^^^^^^^^^^^^^^^
Statements and expressions are stored in the AST file in both the :ref:`types
<pchinternals-types>` and the :ref:`declarations <pchinternals-decls>` blocks,
because every statement or expression will be associated with either a type or
declaration. The actual statement and expression records are stored
immediately following the declaration or type that owns the statement or
expression. For example, the statement representing the body of a function
will be stored directly following the declaration of the function.
As with types and declarations, each statement and expression kind in Clang's
abstract syntax tree (``ForStmt``, ``CallExpr``, etc.) has a corresponding
record type in the AST file, which contains the serialized representation of
that statement or expression. Each substatement or subexpression within an
expression is stored as a separate record (which keeps most records to a fixed
size). Within the AST file, the subexpressions of an expression are stored, in
reverse order, prior to the expression that owns those expressions, using a form
of `Reverse Polish Notation
<http://en.wikipedia.org/wiki/Reverse_Polish_notation>`_. For example, an
expression ``3 - 4 + 5`` would be represented as follows:
+-----------------------+
| ``IntegerLiteral(5)`` |
+-----------------------+
| ``IntegerLiteral(4)`` |
+-----------------------+
| ``IntegerLiteral(3)`` |
+-----------------------+
| ``BinaryOperator(-)`` |
+-----------------------+
| ``BinaryOperator(+)`` |
+-----------------------+
| ``STOP`` |
+-----------------------+
When reading this representation, Clang evaluates each expression record it
encounters, builds the appropriate abstract syntax tree node, and then pushes
that expression on to a stack. When a record contains *N* subexpressions ---
``BinaryOperator`` has two of them --- those expressions are popped from the
top of the stack. The special STOP code indicates that we have reached the end
of a serialized expression or statement; other expression or statement records
may follow, but they are part of a different expression.
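
The stack-based reading procedure can be sketched as below for the ``3 - 4 + 5`` example, with the two operators modeled as binary-operator records. This is a simplified model, not the AST reader's code: ``Record``, ``Expr``, ``readExpr``, ``evaluate``, and ``print`` are hypothetical names, and the sketch assumes that, because operands are written in reverse order, the first operand popped is the left-hand side.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record kinds for this sketch.
enum class RecordKind { IntegerLiteral, BinaryOperator, Stop };

struct Record {
  RecordKind Kind;
  int Value;   // used by IntegerLiteral
  char Opcode; // used by BinaryOperator: '+' or '-'
};

struct Expr {
  RecordKind Kind = RecordKind::IntegerLiteral;
  int Value = 0;
  char Opcode = 0;
  std::unique_ptr<Expr> LHS, RHS;
};

// Rebuilds one expression tree from records, stopping at the STOP record.
std::unique_ptr<Expr> readExpr(const std::vector<Record> &Records) {
  std::vector<std::unique_ptr<Expr>> Stack;
  for (const Record &R : Records) {
    if (R.Kind == RecordKind::Stop)
      break;
    auto E = std::make_unique<Expr>();
    E->Kind = R.Kind;
    if (R.Kind == RecordKind::IntegerLiteral) {
      E->Value = R.Value;
    } else {
      assert(Stack.size() >= 2 && "binary operator needs two operands");
      E->Opcode = R.Opcode;
      // Operands were written in reverse order, so the first pop is the LHS.
      E->LHS = std::move(Stack.back());
      Stack.pop_back();
      E->RHS = std::move(Stack.back());
      Stack.pop_back();
    }
    Stack.push_back(std::move(E));
  }
  assert(Stack.size() == 1 && "malformed record sequence");
  return std::move(Stack.back());
}

// Evaluates the tree; only '+' and '-' are modeled here.
int evaluate(const Expr &E) {
  if (E.Kind == RecordKind::IntegerLiteral)
    return E.Value;
  int L = evaluate(*E.LHS), R = evaluate(*E.RHS);
  return E.Opcode == '+' ? L + R : L - R;
}

// Prints the tree fully parenthesized, e.g. "((3 - 4) + 5)".
std::string print(const Expr &E) {
  if (E.Kind == RecordKind::IntegerLiteral)
    return std::to_string(E.Value);
  return "(" + print(*E.LHS) + " " + E.Opcode + " " + print(*E.RHS) + ")";
}
```

Feeding the records 5, 4, 3, ``-``, ``+``, ``STOP`` in that order yields a tree that prints as ``((3 - 4) + 5)``, matching the left-associative reading of the source expression.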
.. _pchinternals-ident-table:
Identifier Table Block
^^^^^^^^^^^^^^^^^^^^^^
The identifier table block contains an on-disk hash table that maps each
identifier mentioned within the AST file to the serialized representation of
the identifier's information (e.g., the ``IdentifierInfo`` structure). The
serialized representation contains:
* The actual identifier string.
* Flags that describe whether this identifier is the name of a built-in, a
poisoned identifier, an extension token, or a macro.
* If the identifier names a macro, the offset of the macro definition within
the :ref:`pchinternals-preprocessor`.
* If the identifier names one or more declarations visible from translation
unit scope, the :ref:`declaration IDs <pchinternals-decls>` of these
declarations.
When an AST file is loaded, the AST file reader mechanism introduces itself
into the identifier table as an external lookup source. Thus, when the user
program refers to an identifier that has not yet been seen, Clang will perform
a lookup into the identifier table. If an identifier is found, its contents
(macro definitions, flags, top-level declarations, etc.) will be deserialized,
at which point the corresponding ``IdentifierInfo`` structure will have the
same contents it would have after parsing the headers in the AST file.
Within the AST file, the identifiers used to name declarations are represented
with an integral value. A separate table provides a mapping from this integral
value (the identifier ID) to the location within the on-disk hash table where
that identifier is stored. This mapping is used when deserializing the name of
a declaration, the identifier of a token, or any other construct in the AST
file that refers to a name.
.. _pchinternals-method-pool:
Method Pool Block
^^^^^^^^^^^^^^^^^
The method pool block is represented as an on-disk hash table that serves two
purposes: it provides a mapping from the names of Objective-C selectors to the
set of Objective-C instance and class methods that have that particular
selector (which is required for semantic analysis in Objective-C) and also
stores all of the selectors used by entities within the AST file. The design
of the method pool is similar to that of the :ref:`identifier table
<pchinternals-ident-table>`: the first time a particular selector is formed
during the compilation of the program, Clang will search in the on-disk hash
table of selectors; if found, Clang will read the Objective-C methods
associated with that selector into the appropriate front-end data structure
(``Sema::InstanceMethodPool`` and ``Sema::FactoryMethodPool`` for instance and
class methods, respectively).
As with identifiers, selectors are represented by numeric values within the AST
file. A separate index maps these numeric selector values to the offset of the
selector within the on-disk hash table, and will be used when de-serializing an
Objective-C method declaration (or other Objective-C construct) that refers to
the selector.
AST Reader Integration Points
-----------------------------
The "lazy" deserialization behavior of AST files requires their integration
into several completely different submodules of Clang. For example, lazily
deserializing the declarations during name lookup requires that the name-lookup
routines be able to query the AST file to find entities stored there.
For each Clang data structure that requires direct interaction with the AST
reader logic, there is an abstract class that provides the interface between
the two modules. The ``ASTReader`` class, which handles the loading of an AST
file, inherits from all of these abstract classes to provide lazy
deserialization of Clang's data structures. ``ASTReader`` implements the
following abstract classes:
``StatSysCallCache``
This abstract interface is associated with the ``FileManager`` class, and is
used whenever the file manager is going to perform a ``stat()`` system call.
``ExternalSLocEntrySource``
This abstract interface is associated with the ``SourceManager`` class, and
is used whenever the :ref:`source manager <pchinternals-sourcemgr>` needs to
load the details of a file, buffer, or macro instantiation.
``IdentifierInfoLookup``
This abstract interface is associated with the ``IdentifierTable`` class, and
is used whenever the program source refers to an identifier that has not yet
been seen. In this case, the AST reader searches for this identifier within
its :ref:`identifier table <pchinternals-ident-table>` to load any top-level
declarations or macros associated with that identifier.
``ExternalASTSource``
This abstract interface is associated with the ``ASTContext`` class, and is
used whenever the abstract syntax tree nodes need to be loaded from the AST
file. It provides the ability to de-serialize declarations and types
identified by their numeric values, read the bodies of functions when
required, and read the declarations stored within a declaration context
(either for iteration or for name lookup).
``ExternalSemaSource``
This abstract interface is associated with the ``Sema`` class, and is used
whenever semantic analysis needs to read information from the :ref:`global
method pool <pchinternals-method-pool>`.
.. _pchinternals-chained:
Chained precompiled headers
---------------------------
Chained precompiled headers were initially intended to improve the performance
of IDE-centric operations such as syntax highlighting and code completion while
a particular source file is being edited by the user. To minimize the amount
of reparsing required after a change to the file, a form of precompiled header
--- called a precompiled *preamble* --- is automatically generated by parsing
all of the headers in the source file, up to and including the last
``#include``. When only the source file changes (and none of the headers it
depends on), reparsing of that source file can use the precompiled preamble and
start parsing after the ``#include``\ s, so parsing time is proportional to the
size of the source file (rather than all of its includes). However, the
compilation of that translation unit may already use a precompiled header: in
this case, Clang will create the precompiled preamble as a chained precompiled
header that refers to the original precompiled header. This drastically
reduces the time needed to serialize the precompiled preamble for use in
reparsing.
Chained precompiled headers get their name because each precompiled header can
depend on one other precompiled header, forming a chain of dependencies. A
translation unit will then include the precompiled header that starts the chain
(i.e., nothing depends on it). This linearity of dependencies is important for
the semantic model of chained precompiled headers, because the most-recent
precompiled header can provide information that overrides the information
provided by the precompiled headers it depends on, just like a header file
``B.h`` that includes another header ``A.h`` can modify the state produced by
parsing ``A.h``, e.g., by ``#undef``'ing a macro defined in ``A.h``.
There are several ways in which chained precompiled headers generalize the AST
file model:
Numbering of IDs
Many different kinds of entities --- identifiers, declarations, types, etc.
--- have ID numbers that start at 1 or some other predefined constant and
grow upward. Each precompiled header records the maximum ID number it has
assigned in each category. Then, when a new precompiled header is generated
that depends on (chains to) another precompiled header, it will start
counting at the next available ID number. This way, one can determine, given
an ID number, which AST file actually contains the entity.
Name lookup
When writing a chained precompiled header, Clang attempts to write only
information that has changed from the precompiled header on which it is
based. This changes the lookup algorithm for the various tables, such as the
:ref:`identifier table <pchinternals-ident-table>`: the search starts at the
most-recent precompiled header. If no entry is found, lookup then proceeds
to the identifier table in the precompiled header it depends on, and so on.
Once a lookup succeeds, that result is considered definitive, overriding any
results from earlier precompiled headers.
Update records
There are various ways in which a later precompiled header can modify the
entities described in an earlier precompiled header. For example, later
precompiled headers can add entries into the various name-lookup tables for
the translation unit or namespaces, or add new categories to an Objective-C
class. Each of these updates is captured in an "update record" that is
stored in the chained precompiled header file and will be loaded along with
the original entity.
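
Two of the points above, ID numbering and name lookup, can be illustrated with a small model of a PCH chain. This is a sketch under simplifying assumptions: ``PCHFile``, ``findOwner``, and ``lookupIdentifier`` are hypothetical names, each file is assumed to own one contiguous ID range, and identifier information is reduced to a string.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical model of one PCH file in a chain (not Clang's data structure).
struct PCHFile {
  std::string FileName;
  std::uint32_t FirstID; // first ID assigned by this file
  std::uint32_t LastID;  // last ID assigned by this file (inclusive)
  std::map<std::string, std::string> IdentifierTable; // name -> info
};

// Chain[0] is the oldest file; Chain.back() is the most recent one.
// Returns the file whose ID range contains ID, or nullptr if none does.
const PCHFile *findOwner(const std::vector<PCHFile> &Chain, std::uint32_t ID) {
  for (const PCHFile &F : Chain)
    if (ID >= F.FirstID && ID <= F.LastID)
      return &F;
  return nullptr;
}

// Searches from the most recent file back to the oldest; the first hit is
// definitive and hides entries in earlier files.
const std::string *lookupIdentifier(const std::vector<PCHFile> &Chain,
                                    const std::string &Name) {
  for (auto It = Chain.rbegin(); It != Chain.rend(); ++It) {
    auto Found = It->IdentifierTable.find(Name);
    if (Found != It->IdentifierTable.end())
      return &Found->second;
  }
  return nullptr;
}
```

If the older file owns IDs 1 to 100 and the newer one IDs 101 to 150, ``findOwner`` resolves ID 42 to the older file, while ``lookupIdentifier`` prefers the newer file's entry whenever both files define the same name.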
.. _pchinternals-modules:
Modules
-------
Modules generalize the chained precompiled header model yet further, from a
linear chain of precompiled headers to an arbitrary directed acyclic graph
(DAG) of AST files. All of the same techniques used to make chained
precompiled headers work --- ID numbering, name lookup, update records --- are
shared with modules. However, the DAG nature of modules introduces a number of
additional complications to the model:
Numbering of IDs
The simple, linear numbering scheme used in chained precompiled headers falls
apart with the module DAG, because different modules may end up with
different numbering schemes for entities they imported from common shared
modules. To account for this, each module file provides information about
which modules it depends on and which ID numbers it assigned to the entities
in those modules, as well as which ID numbers it took for its own new
entities. The AST reader then maps these "local" ID numbers into a "global"
ID number space for the current translation unit, providing a 1-1 mapping
between entities (in whatever AST file they inhabit) and global ID numbers.
If that translation unit is then serialized into an AST file, this mapping
will be stored for use when the AST file is imported.
Declaration merging
It is possible for a given entity (from the language's perspective) to be
declared multiple times in different places. For example, two different
headers can have the declaration of ``printf`` or could forward-declare
``struct stat``. If each of those headers is included in a module, and some
third party imports both of those modules, there is a potentially serious
problem: name lookup for ``printf`` or ``struct stat`` will find both
declarations, but the AST nodes are unrelated. This would result in a
compilation error, due to an ambiguity in name lookup. Therefore, the AST
reader performs declaration merging according to the appropriate language
semantics, ensuring that the two disjoint declarations are merged into a
single redeclaration chain (with a common canonical declaration), so that it
is as if one of the headers had been included before the other.
Name Visibility
Modules allow certain names that occur during module creation to be "hidden",
so that they are not part of the public interface of the module and are not
visible to its clients. The AST reader maintains a "visible" bit on various
AST nodes (declarations, macros, etc.) to indicate whether that particular
AST node is currently visible; the various name lookup mechanisms in Clang
inspect the visible bit to determine whether that entity, which is still in
the AST (because other, visible AST nodes may depend on it), can actually be
found by name lookup. When a new (sub)module is imported, it may make
existing, non-visible, already-deserialized AST nodes visible; it is the
responsibility of the AST reader to find and update these AST nodes when it
is notified of the import.
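
The local-to-global ID mapping described under "Numbering of IDs" can be sketched as follows. This is a simplified model rather than the AST reader's actual scheme: ``LocalRange``, ``ModuleFile``, and ``GlobalIDMapper`` are hypothetical names, each module is assumed to describe its imports as contiguous local ID ranges that cover all of the owner's entities in order, and dependencies are assumed to be loaded before the modules that import them.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// A run of local IDs in one module file that refers to entities owned by
// Owner (which may be the module itself). Hypothetical, for illustration.
struct LocalRange {
  std::uint32_t FirstLocalID;
  std::uint32_t Count;
  std::string Owner;
};

struct ModuleFile {
  std::string Name;
  std::uint32_t NumOwnEntities; // entities introduced by this module
  std::vector<LocalRange> Ranges;
};

class GlobalIDMapper {
public:
  // Assigns the module's own entities the next free block of global IDs.
  void load(const ModuleFile &M) {
    assert(Files.count(M.Name) == 0 && "module loaded twice");
    GlobalBase[M.Name] = NextGlobalID;
    NextGlobalID += M.NumOwnEntities;
    Files.emplace(M.Name, M);
  }

  // Maps a local ID used inside Module to the global ID of the entity.
  // Returns 0 (not a valid ID in this sketch) if no range contains it.
  std::uint32_t toGlobal(const std::string &Module,
                         std::uint32_t LocalID) const {
    const ModuleFile &M = Files.at(Module);
    for (const LocalRange &R : M.Ranges)
      if (LocalID >= R.FirstLocalID && LocalID - R.FirstLocalID < R.Count)
        return GlobalBase.at(R.Owner) + (LocalID - R.FirstLocalID);
    return 0;
  }

private:
  std::map<std::string, ModuleFile> Files;
  std::map<std::string, std::uint32_t> GlobalBase;
  std::uint32_t NextGlobalID = 1;
};
```

If modules ``A`` and ``B`` both import a shared module but number its entities differently (say, starting at local ID 1 in ``A`` and at local ID 4 in ``B``), local ID 3 in ``A`` and local ID 6 in ``B`` map to the same global ID, which is the 1-1 property the text describes.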


@ -1,126 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<!-- Material used from: HTML 4.01 specs: http://www.w3.org/TR/html401/ -->
<html>
<head>
<META http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
<title>ThreadSanitizer, a race detector</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
<style type="text/css">
td {
vertical-align: top;
}
</style>
</head>
<body>
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<h1>ThreadSanitizer</h1>
<ul>
<li> <a href="#intro">Introduction</a>
<li> <a href="#howtobuild">How to Build</a>
<li> <a href="#platforms">Supported Platforms</a>
<li> <a href="#usage">Usage</a>
<li> <a href="#limitations">Limitations</a>
<li> <a href="#status">Current Status</a>
<li> <a href="#moreinfo">More Information</a>
</ul>
<h2 id="intro">Introduction</h2>
ThreadSanitizer is a tool that detects data races. <BR>
It consists of a compiler instrumentation module and a run-time library. <BR>
Typical slowdown introduced by ThreadSanitizer is <b>5x-15x</b> (TODO: these numbers are
approximate so far).
<h2 id="howtobuild">How to build</h2>
Follow the <a href="../get_started.html">clang build instructions</a>.
CMake build is supported.<BR>
<h2 id="platforms">Supported Platforms</h2>
ThreadSanitizer is supported on Linux x86_64 (tested on Ubuntu 10.04). <BR>
Support for MacOS 10.7 (64-bit only) is planned for late 2012. <BR>
Support for 32-bit platforms is problematic and not yet planned.
<h2 id="usage">Usage</h2>
Simply compile your program with <tt>-fsanitize=thread -fPIE</tt> and link it
with <tt>-fsanitize=thread -pie</tt>.<BR>
To get a reasonable performance add <tt>-O1</tt> or higher. <BR>
Use <tt>-g</tt> to get file names and line numbers in the warning messages. <BR>
Example:
<pre>
% cat projects/compiler-rt/lib/tsan/output_tests/tiny_race.c
#include <pthread.h>
int Global;
void *Thread1(void *x) {
Global = 42;
return x;
}
int main() {
pthread_t t;
pthread_create(&t, NULL, Thread1, NULL);
Global = 43;
pthread_join(t, NULL);
return Global;
}
</pre>
<pre>
% clang -fsanitize=thread -g -O1 tiny_race.c -fPIE -pie
</pre>
If a bug is detected, the program will print an error message to stderr.
Currently, ThreadSanitizer symbolizes its output using an external
<tt>addr2line</tt>
process (this will be fixed in future).
<pre>
% TSAN_OPTIONS=strip_path_prefix=`pwd`/ # Don't print full paths.
% ./a.out 2> log
% cat log
WARNING: ThreadSanitizer: data race (pid=19219)
Write of size 4 at 0x7fcf47b21bc0 by thread 1:
#0 Thread1 tiny_race.c:4 (exe+0x00000000a360)
Previous write of size 4 at 0x7fcf47b21bc0 by main thread:
#0 main tiny_race.c:10 (exe+0x00000000a3b4)
Thread 1 (running) created at:
#0 pthread_create ??:0 (exe+0x00000000c790)
#1 main tiny_race.c:9 (exe+0x00000000a3a4)
</pre>
<h2 id="limitations">Limitations</h2>
<ul>
<li> ThreadSanitizer uses more real memory than a native run.
At the default settings the memory overhead is 9x plus 9Mb per each thread.
Settings with 5x and 3x overhead (but less accurate analysis) are also available.
<li> ThreadSanitizer maps (but does not reserve) a lot of virtual address space.
This means that tools like <tt>ulimit</tt> may not work as usually expected.
<li> Static linking is not supported.
<li> ThreadSanitizer requires <tt>-fPIE -pie</tt>
</ul>
<h2 id="status">Current Status</h2>
ThreadSanitizer is in alpha stage.
It is known to work on large C++ programs using pthreads, but we do not promise
anything (yet). <BR>
C++11 threading is not yet supported. <BR>
The test suite is integrated into CMake build and can be run with
<tt>make check-tsan</tt> command. <BR>
We are actively working on enhancing the tool -- stay tuned.
Any help, especially in the form of minimized standalone tests is more than welcome.
<h2 id="moreinfo">More Information</h2>
<a href="http://code.google.com/p/thread-sanitizer/">http://code.google.com/p/thread-sanitizer</a>.
</div>
</body>
</html>

95
docs/ThreadSanitizer.rst Normal file

@ -0,0 +1,95 @@
ThreadSanitizer
===============
Introduction
------------
ThreadSanitizer is a tool that detects data races. It consists of a compiler
instrumentation module and a run-time library. Typical slowdown introduced by
ThreadSanitizer is **5x-15x** (TODO: these numbers are approximate so far).
How to build
------------
Follow the `Clang build instructions <../get_started.html>`_. CMake build is
supported.
Supported Platforms
-------------------
ThreadSanitizer is supported on Linux x86_64 (tested on Ubuntu 10.04). Support
for MacOS 10.7 (64-bit only) is planned for late 2012. Support for 32-bit
platforms is problematic and not yet planned.
Usage
-----
Simply compile your program with ``-fsanitize=thread -fPIE`` and link it with
``-fsanitize=thread -pie``. To get reasonable performance, add ``-O1`` or
higher. Use ``-g`` to get file names and line numbers in the warning messages.
Example:

.. code-block:: c++

  % cat projects/compiler-rt/lib/tsan/output_tests/tiny_race.c
  #include <pthread.h>
  int Global;
  void *Thread1(void *x) {
    Global = 42;
    return x;
  }
  int main() {
    pthread_t t;
    pthread_create(&t, NULL, Thread1, NULL);
    Global = 43;
    pthread_join(t, NULL);
    return Global;
  }

  $ clang -fsanitize=thread -g -O1 tiny_race.c -fPIE -pie

If a bug is detected, the program will print an error message to stderr.
Currently, ThreadSanitizer symbolizes its output using an external
``addr2line`` process (this will be fixed in the future).
.. code-block:: bash

  % export TSAN_OPTIONS=strip_path_prefix=`pwd`/  # Don't print full paths.
  % ./a.out 2> log
  % cat log
  WARNING: ThreadSanitizer: data race (pid=19219)
    Write of size 4 at 0x7fcf47b21bc0 by thread 1:
      #0 Thread1 tiny_race.c:4 (exe+0x00000000a360)
    Previous write of size 4 at 0x7fcf47b21bc0 by main thread:
      #0 main tiny_race.c:10 (exe+0x00000000a3b4)
    Thread 1 (running) created at:
      #0 pthread_create ??:0 (exe+0x00000000c790)
      #1 main tiny_race.c:9 (exe+0x00000000a3a4)

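
One way the race reported for ``tiny_race.c`` could be removed is to guard both writes to ``Global`` with a mutex. The following is only a minimal sketch under that assumption, not part of the ThreadSanitizer test suite; ``GlobalLock`` and ``run_guarded`` are hypothetical names introduced for illustration.

```c
#include <pthread.h>
#include <stddef.h>

static int Global;
/* Hypothetical lock added for illustration; it is not in tiny_race.c. */
static pthread_mutex_t GlobalLock = PTHREAD_MUTEX_INITIALIZER;

static void *Thread1(void *x) {
  pthread_mutex_lock(&GlobalLock);
  Global = 42;
  pthread_mutex_unlock(&GlobalLock);
  return x;
}

/* Performs the same two writes as tiny_race.c, each one under the lock.
   Returns the final value of Global (42 or 43, depending on scheduling),
   or -1 if the thread could not be created. */
int run_guarded(void) {
  pthread_t t;
  if (pthread_create(&t, NULL, Thread1, NULL) != 0)
    return -1;
  pthread_mutex_lock(&GlobalLock);
  Global = 43;
  pthread_mutex_unlock(&GlobalLock);
  pthread_join(t, NULL); /* join orders the read below after Thread1 */
  return Global;
}
```

Because both writes now happen under the same lock, a build with ``-fsanitize=thread`` would be expected to stop reporting this particular race. The final value still depends on which thread takes the lock last.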
Limitations
-----------
* ThreadSanitizer uses more real memory than a native run. At the default
  settings the memory overhead is 9x plus 9 MB for each thread. Settings with
  5x and 3x overhead (but less accurate analysis) are also available.
* ThreadSanitizer maps (but does not reserve) a lot of virtual address space.
  This means that tools like ``ulimit`` may not work as expected.
* Static linking is not supported.
* ThreadSanitizer requires ``-fPIE -pie``.
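
As a rough worked example of the default memory overhead quoted in the list above (9x the native footprint plus 9 megabytes per thread), the helper below does the arithmetic. ``tsan_estimated_memory_mb`` is a hypothetical name, and the result is only the stated rule of thumb applied to the inputs, not a measurement.

```c
/* Hypothetical helper: applies the default-settings figure from the list
   above (9x the native footprint plus 9 MB for every thread).
   Both the input footprint and the result are in megabytes. */
unsigned long tsan_estimated_memory_mb(unsigned long native_mb,
                                       unsigned long threads) {
  return 9UL * native_mb + 9UL * threads;
}
```

For a program that uses 100 MB natively and runs 4 threads, this gives 9 * 100 + 9 * 4 = 936 MB.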
Current Status
--------------
ThreadSanitizer is in the alpha stage. It is known to work on large C++ programs
using pthreads, but we do not promise anything (yet). C++11 threading is not
yet supported. The test suite is integrated into the CMake build and can be run
with the ``make check-tsan`` command.
We are actively working on enhancing the tool, so stay tuned. Any help,
especially in the form of minimized standalone tests, is more than welcome.
More Information
----------------
`http://code.google.com/p/thread-sanitizer <http://code.google.com/p/thread-sanitizer/>`_.


@ -1,120 +0,0 @@
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
<html>
<head>
<title>Writing Clang Tools</title>
<link type="text/css" rel="stylesheet" href="../menu.css">
<link type="text/css" rel="stylesheet" href="../content.css">
</head>
<body>
<!--#include virtual="../menu.html.incl"-->
<div id="content">
<h1>Writing Clang Tools</h1>
<p>Clang provides infrastructure to write tools that need syntactic and semantic
information about a program. This document will give a short introduction of the
different ways to write clang tools, and their pros and cons.</p>
<!-- ======================================================================= -->
<h2 id="libclang"><a href="http://clang.llvm.org/doxygen/group__CINDEX.html">LibClang</a></h2>
<!-- ======================================================================= -->
<p>LibClang is a stable high level C interface to clang. When in doubt LibClang
is probably the interface you want to use. Consider the other interfaces only
when you have a good reason not to use LibClang.</p>
<p>Canonical examples of when to use LibClang:</p>
<ul>
<li>Xcode</li>
<li>Clang Python Bindings</li>
</ul>
<p>Use LibClang when you...</p>
<ul>
<li>want to interface with clang from other languages than C++</li>
<li>need a stable interface that takes care to be backwards compatible</li>
<li>want powerful high-level abstractions, like iterating through an AST
with a cursor, and don't want to learn all the nitty gritty details of Clang's
AST.</li>
</ul>
<p>Do not use LibClang when you...</p>
<ul>
<li>want full control over the Clang AST</li>
</ul>
<!-- ======================================================================= -->
<h2 id="clang-plugins"><a href="ClangPlugins.html">Clang Plugins</a></h2>
<!-- ======================================================================= -->
<p>Clang Plugins allow you to run additional actions on the AST as part of
a compilation. Plugins are dynamic libraries that are loaded at runtime by
the compiler, and they're easy to integrate into your build environment.</p>
<p>Canonical examples of when to use Clang Plugins:</p>
<ul>
<li>special lint-style warnings or errors for your project</li>
<li>creating additional build artifacts from a single compile step</li>
</ul>
<p>Use Clang Plugins when you...</p>
<ul>
<li>need your tool to rerun if any of the dependencies change</li>
<li>want your tool to make or break a build</li>
<li>need full control over the Clang AST</li>
</ul>
<p>Do not use Clang Plugins when you...</p>
<ul>
<li>want to run tools outside of your build environment</li>
<li>want full control on how Clang is set up, including mapping of in-memory
virtual files</li>
<li>need to run over a specific subset of files in your project which is not
necessarily related to any changes which would trigger rebuilds</li>
</ul>
<!-- ======================================================================= -->
<h2 id="libtooling"><a href="LibTooling.html">LibTooling</a></h2>
<!-- ======================================================================= -->
<p>LibTooling is a C++ interface aimed at writing standalone tools, as well as
integrating into services that run clang tools.</p>
<p>Canonical examples of when to use LibTooling:</p>
<ul>
<li>a simple syntax checker</li>
<li>refactoring tools</li>
</ul>
<p>Use LibTooling when you...</p>
<ul>
<li>want to run tools over a single file, or a specific subset of files,
independently of the build system</li>
<li>want full control over the Clang AST</li>
<li>want to share code with Clang Plugins</li>
</ul>
<p>Do not use LibTooling when you...</p>
<ul>
<li>want to run as part of the build triggered by dependency changes</li>
<li>want a stable interface so you don't need to change your code when the
AST API changes</li>
<li>want high level abstractions like cursors and code completion out of the
box</li>
<li>do not want to write your tools in C++</li>
</ul>
<!-- ======================================================================= -->
<h2 id="clang-tools"><a href="ClangTools.html">Clang Tools</a></h2>
<!-- ======================================================================= -->
<p>These are a collection of specific developer tools built on top of the
LibTooling infrastructure as part of the Clang project. They are targeted at
automating and improving core development activities of C/C++ developers.</p>
<p>Examples of tools we are building or planning as part of the Clang
project:</p>
<ul>
<li>Syntax checking (clang-check)</li>
<li>Automatic fixing of compile errors (clangc-fixit)</li>
<li>Automatic code formatting</li>
<li>Migration tools for new features in new language standards</li>
<li>Core refactoring tools</li>
</ul>
</div>
</body>
</html>

100
docs/Tooling.rst Normal file

@ -0,0 +1,100 @@
===================
Writing Clang Tools
===================
Clang provides infrastructure to write tools that need syntactic and semantic
information about a program. This document gives a short introduction to
the different ways to write clang tools, and their pros and cons.
LibClang
--------
`LibClang <http://clang.llvm.org/doxygen/group__CINDEX.html>`_ is a stable
high-level C interface to clang. When in doubt, LibClang is probably the
interface you want to use. Consider the other interfaces only when you have a
good reason not to use LibClang.
Canonical examples of when to use LibClang:
* Xcode
* Clang Python Bindings
Use LibClang when you...:
* want to interface with clang from languages other than C++
* need a stable interface that takes care to be backwards compatible
* want powerful high-level abstractions, like iterating through an AST with a
  cursor, and don't want to learn all the nitty-gritty details of Clang's AST
Do not use LibClang when you...:
* want full control over the Clang AST
Clang Plugins
-------------
`Clang Plugins <ClangPlugins.html>`_ allow you to run additional actions on the
AST as part of a compilation. Plugins are dynamic libraries that are loaded at
runtime by the compiler, and they're easy to integrate into your build
environment.
Canonical examples of when to use Clang Plugins:
* special lint-style warnings or errors for your project
* creating additional build artifacts from a single compile step
Use Clang Plugins when you...:
* need your tool to rerun if any of the dependencies change
* want your tool to make or break a build
* need full control over the Clang AST
Do not use Clang Plugins when you...:
* want to run tools outside of your build environment
* want full control over how Clang is set up, including mapping of in-memory
  virtual files
* need to run over a specific subset of files in your project which is not
necessarily related to any changes which would trigger rebuilds
LibTooling
----------
`LibTooling <LibTooling.html>`_ is a C++ interface aimed at writing standalone
tools, as well as integrating into services that run clang tools. Canonical
examples of when to use LibTooling:
* a simple syntax checker
* refactoring tools
Use LibTooling when you...:
* want to run tools over a single file, or a specific subset of files,
independently of the build system
* want full control over the Clang AST
* want to share code with Clang Plugins
Do not use LibTooling when you...:
* want to run as part of the build triggered by dependency changes
* want a stable interface so you don't need to change your code when the AST API
changes
* want high-level abstractions like cursors and code completion out of the box
* do not want to write your tools in C++
Clang Tools
-----------
`Clang tools <ClangTools.html>`_ are a collection of specific developer tools
built on top of the LibTooling infrastructure as part of the Clang project.
They are targeted at automating and improving core development activities of
C/C++ developers.
Examples of tools we are building or planning as part of the Clang project:
* Syntax checking (:program:`clang-check`)
* Automatic fixing of compile errors (:program:`clang-fixit`)
* Automatic code formatting
* Migration tools for new features in new language standards
* Core refactoring tools


@ -12,6 +12,12 @@ progress. This page will get filled out with docs soon...
.. toctree::
:maxdepth: 2
LanguageExtensions
LibASTMatchers
LibTooling
PCHInternals
ThreadSanitizer
Tooling
Indices and tables