mirror of https://github.com/microsoft/clang-1.git
Update the PCH internals documentation to cover chained precompiled
headers and modules in more detail. I'd still like to expand on some of the modules-related issues further, but this is a decent start.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@163989 91177308-0d34-0410-b5e6-96231b3b80d8
Parent
19ec962bab
Commit
72e4f25c06
|
@ -2,7 +2,7 @@
|
|||
"http://www.w3.org/TR/html4/strict.dtd">
|
||||
<html>
|
||||
<head>
|
||||
<title>Precompiled Headers (PCH)</title>
|
||||
<title>Precompiled Header and Modules Internals</title>
|
||||
<link type="text/css" rel="stylesheet" href="../menu.css">
|
||||
<link type="text/css" rel="stylesheet" href="../content.css">
|
||||
<style type="text/css">
|
||||
|
@ -18,10 +18,10 @@
|
|||
|
||||
<div id="content">
|
||||
|
||||
<h1>Precompiled Headers</h1>
|
||||
<h1>Precompiled Header and Modules Internals</h1>
|
||||
|
||||
<p>This document describes the design and implementation of Clang's
|
||||
precompiled headers (PCH). If you are interested in the end-user
|
||||
precompiled headers (PCH) and modules. If you are interested in the end-user
|
||||
view, please see the <a
|
||||
href="UsersManual.html#precompiledheaders">User's Manual</a>.</p>
|
||||
|
||||
|
@ -30,7 +30,7 @@
|
|||
<li><a href="#usage">Using Precompiled Headers with
|
||||
<tt>clang</tt></a></li>
|
||||
<li><a href="#philosophy">Design Philosophy</a></li>
|
||||
<li><a href="#contents">Precompiled Header Contents</a>
|
||||
<li><a href="#contents">Serialized AST File Contents</a>
|
||||
<ul>
|
||||
<li><a href="#metadata">Metadata Block</a></li>
|
||||
<li><a href="#sourcemgr">Source Manager Block</a></li>
|
||||
|
@ -42,8 +42,9 @@
|
|||
<li><a href="#method-pool">Method Pool Block</a></li>
|
||||
</ul>
|
||||
</li>
|
||||
<li><a href="#tendrils">Precompiled Header Integration
|
||||
Points</a></li>
|
||||
<li><a href="#tendrils">AST Reader Integration Points</a></li>
|
||||
<li><a href="#chained">Chained precompiled headers</a></li>
|
||||
<li><a href="#modules">Modules</a></li>
|
||||
</ul>
|
||||
|
||||
<h2 id="usage">Using Precompiled Headers with <tt>clang</tt></h2>
|
||||
|
@ -94,30 +95,39 @@ with the <b><tt>-include-pch</tt></b> option:</p>
|
|||
require the PCH file to be up-to-date.</li>
|
||||
</ul>
|
||||
|
||||
<p>Clang's precompiled headers are designed with a compact on-disk
|
||||
representation, which minimizes both PCH creation time and the time
|
||||
required to initially load the PCH file. The PCH file itself contains
|
||||
<p>Modules, as implemented in Clang, use the same mechanisms as
|
||||
precompiled headers to save a serialized AST file (one per module) and
|
||||
use those AST modules. From an implementation standpoint, modules are
|
||||
a generalization of precompiled headers, lifting a number of
|
||||
restrictions placed on precompiled headers. In particular, there can
|
||||
only be one precompiled header and it must be included at the
|
||||
beginning of the translation unit. The extensions to the AST file
|
||||
format required for modules are discussed in the section on <a href="#modules">modules</a>.</p>
|
||||
|
||||
<p>Clang's AST files are designed with a compact on-disk
|
||||
representation, which minimizes both creation time and the time
|
||||
required to initially load the AST file. The AST file itself contains
|
||||
a serialized representation of Clang's abstract syntax trees and
|
||||
supporting data structures, stored using the same compressed bitstream
|
||||
as <a href="http://llvm.org/docs/BitCodeFormat.html">LLVM's bitcode
|
||||
file format</a>.</p>
|
||||
|
||||
<p>Clang's precompiled headers are loaded "lazily" from disk. When a
|
||||
PCH file is initially loaded, Clang reads only a small amount of data
|
||||
from the PCH file to establish where certain important data structures
|
||||
<p>Clang's AST files are loaded "lazily" from disk. When an
|
||||
AST file is initially loaded, Clang reads only a small amount of data
|
||||
from the AST file to establish where certain important data structures
|
||||
are stored. The amount of data read in this initial load is
|
||||
independent of the size of the PCH file, such that a larger PCH file
|
||||
does not lead to longer PCH load times. The actual header data in the
|
||||
PCH file--macros, functions, variables, types, etc.--is loaded only
|
||||
independent of the size of the AST file, such that a larger AST file
|
||||
does not lead to longer AST load times. The actual header data in the
|
||||
AST file--macros, functions, variables, types, etc.--is loaded only
|
||||
when it is referenced from the user's code, at which point only that
|
||||
entity (and those entities it depends on) are deserialized from the
|
||||
PCH file. With this approach, the cost of using a precompiled header
|
||||
AST file. With this approach, the cost of using an AST file
|
||||
for a translation unit is proportional to the amount of code actually
|
||||
used from the header, rather than being proportional to the size of
|
||||
the header itself.</p>
|
||||
used from the AST file, rather than being proportional to the size of
|
||||
the AST file itself.</p>
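<p>The lazy-loading idea described above can be illustrated with a small
Python sketch. This is not Clang's implementation or its bitstream format:
<code>write_toy_file</code>, <code>LazyReader</code>, and the length-prefixed
record layout are hypothetical names and choices made only for illustration,
and the offset index is kept in memory for brevity.</p>

```python
import struct


def write_toy_file(records):
    """Pack byte-string records back to back; return (blob, offsets).

    offsets[i] is the byte position of record i, playing the role of the
    index that maps an entity ID to its location in the file.
    """
    blob = bytearray()
    offsets = []
    for payload in records:
        offsets.append(len(blob))
        blob += struct.pack("<I", len(payload)) + payload
    return bytes(blob), offsets


class LazyReader:
    """Hypothetical reader that deserializes a record on first use only."""

    def __init__(self, blob, offsets):
        self.blob = blob
        self.offsets = offsets  # the only data consulted up front
        self.loaded = {}        # entity ID -> deserialized value

    def get(self, entity_id):
        if entity_id not in self.loaded:
            start = self.offsets[entity_id]
            (length,) = struct.unpack_from("<I", self.blob, start)
            body = self.blob[start + 4:start + 4 + length]
            self.loaded[entity_id] = body.decode("utf-8")
        return self.loaded[entity_id]

    def stats(self):
        """Return (records deserialized, records in file)."""
        return len(self.loaded), len(self.offsets)


blob, offsets = write_toy_file([
    b"int printf(const char *, ...);",
    b"struct stat;",
    b"#define NULL 0",
])
reader = LazyReader(blob, offsets)
```

<p>In this sketch, calling <code>reader.get(1)</code> deserializes a single
record and leaves the other two untouched, so the work done tracks what is
referenced rather than the size of the file.</p>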
|
||||
|
||||
<p>When given the <code>-print-stats</code> option, Clang produces
|
||||
statistics describing how much of the precompiled header was actually
|
||||
statistics describing how much of the AST file was actually
|
||||
loaded from disk. For a simple "Hello, World!" program that includes
|
||||
the Apple <code>Cocoa.h</code> header (which is built as a precompiled
|
||||
header), this option illustrates how little of the actual precompiled
|
||||
|
@ -143,7 +153,7 @@ header is required:</p>
|
|||
<p>For this small program, only a tiny fraction of the source
|
||||
locations, types, declarations, identifiers, and macros were actually
|
||||
deserialized from the precompiled header. These statistics can be
|
||||
useful to determine whether the precompiled header implementation can
|
||||
useful to determine whether the AST file implementation can
|
||||
be improved by making more of the implementation lazy.</p>
|
||||
|
||||
<p>Precompiled headers can be chained. When you create a PCH while
|
||||
|
@ -153,13 +163,15 @@ example, you could create a PCH out of all the headers that are very
|
|||
commonly used throughout your project, and then create a PCH for every
|
||||
single source file in the project that includes the code that is
|
||||
specific to that file, so that recompiling the file itself is very fast,
|
||||
without duplicating the data from the common headers for every file.</p>
|
||||
without duplicating the data from the common headers for every
|
||||
file. The mechanisms behind chained precompiled headers are discussed
|
||||
in a <a href="#chained">later section</a>.</p>
|
||||
|
||||
<h2 id="contents">Precompiled Header Contents</h2>
|
||||
<h2 id="contents">AST File Contents</h2>
|
||||
|
||||
<img src="PCHLayout.png" style="float:right" alt="Precompiled header layout">
|
||||
|
||||
<p>Clang's precompiled headers are organized into several different
|
||||
<p>Clang's AST files are organized into several different
|
||||
blocks, each of which contains the serialized representation of a part
|
||||
of Clang's internal representation. Each of the blocks corresponds to
|
||||
either a block or a record within <a
|
||||
|
@ -167,19 +179,19 @@ either a block or a record within <a
|
|||
format</a>. The contents of each of these logical blocks are described
|
||||
below.</p>
|
||||
|
||||
<p>For a given precompiled header, the <a
|
||||
<p>For a given AST file, the <a
|
||||
href="http://llvm.org/cmds/llvm-bcanalyzer.html"><code>llvm-bcanalyzer</code></a>
|
||||
utility can be used to examine the actual structure of the bitstream
|
||||
for the precompiled header. This information can be used both to help
|
||||
understand the structure of the precompiled header and to isolate
|
||||
areas where precompiled headers can still be optimized, e.g., through
|
||||
for the AST file. This information can be used both to help
|
||||
understand the structure of the AST file and to isolate
|
||||
areas where AST files can still be optimized, e.g., through
|
||||
the introduction of abbreviations.</p>
|
||||
|
||||
<h3 id="metadata">Metadata Block</h3>
|
||||
|
||||
<p>The metadata block contains several records that provide
|
||||
information about how the precompiled header was built. This metadata
|
||||
is primarily used to validate the use of a precompiled header. For
|
||||
information about how the AST file was built. This metadata
|
||||
is primarily used to validate the use of an AST file. For
|
||||
example, a precompiled header built for a 32-bit x86 target cannot be used
|
||||
when compiling for a 64-bit x86 target. The metadata block contains
|
||||
information about:</p>
|
||||
|
@ -187,17 +199,17 @@ information about:</p>
|
|||
<dl>
|
||||
<dt>Language options</dt>
|
||||
<dd>Describes the particular language dialect used to compile the
|
||||
PCH file, including major options (e.g., Objective-C support) and more
|
||||
AST file, including major options (e.g., Objective-C support) and more
|
||||
minor options (e.g., support for "//" comments). The contents of this
|
||||
record correspond to the <code>LangOptions</code> class.</dd>
|
||||
|
||||
<dt>Target architecture</dt>
|
||||
<dd>The target triple that describes the architecture, platform, and
|
||||
ABI for which the PCH file was generated, e.g.,
|
||||
ABI for which the AST file was generated, e.g.,
|
||||
<code>i386-apple-darwin9</code>.</dd>
|
||||
|
||||
<dt>PCH version</dt>
|
||||
<dd>The major and minor version numbers of the precompiled header
|
||||
<dt>AST version</dt>
|
||||
<dd>The major and minor version numbers of the AST file
|
||||
format. Changes in the minor version number should not affect backward
|
||||
compatibility, while changes in the major version number imply that a
|
||||
newer compiler cannot read an older precompiled header (and
|
||||
|
@ -205,11 +217,11 @@ vice-versa).</dd>
|
|||
|
||||
<dt>Original file name</dt>
|
||||
<dd>The full path of the header that was used to generate the
|
||||
precompiled header.</dd>
|
||||
AST file.</dd>
|
||||
|
||||
<dt>Predefines buffer</dt>
|
||||
<dd>Although not explicitly stored as part of the metadata, the
|
||||
predefines buffer is used in the validation of the precompiled header.
|
||||
predefines buffer is used in the validation of the AST file.
|
||||
The predefines buffer itself contains code generated by the compiler
|
||||
to initialize the preprocessor state according to the current target,
|
||||
platform, and command-line options. For example, the predefines buffer
|
||||
|
@ -220,26 +232,14 @@ contents are verified along with the rest of the metadata.</dd>
|
|||
|
||||
</dl>
|
||||
|
||||
<p>A chained PCH file (that is, one that references another PCH) has
|
||||
a slightly different metadata block, which contains the following
|
||||
information:</p>
|
||||
<p>A chained PCH file (that is, one that references another PCH) and a
|
||||
module (which may import other modules) have additional metadata
|
||||
containing the list of all AST files that this AST file depends
|
||||
on. Each of those files will be loaded along with this AST file.</p>
|
||||
|
||||
<dl>
|
||||
<dt>Referenced file</dt>
|
||||
<dd>The name of the referenced PCH file. It is looked up like a file
|
||||
specified using -include-pch.</dd>
|
||||
|
||||
<dt>PCH version</dt>
|
||||
<dd>This is the same as in normal PCH files.</dd>
|
||||
|
||||
<dt>Original file name</dt>
|
||||
<dd>The full path of the header that was used to generate this
|
||||
precompiled header.</dd>
|
||||
|
||||
</dl>
|
||||
|
||||
<p>The language options, target architecture and predefines buffer data
|
||||
is taken from the end of the chain, since they have to match anyway.</p>
|
||||
<p>For chained precompiled headers, the language options, target
|
||||
architecture and predefines buffer data is taken from the end of the
|
||||
chain, since they have to match anyway.</p>
|
||||
|
||||
<h3 id="sourcemgr">Source Manager Block</h3>
|
||||
|
||||
|
@ -248,10 +248,10 @@ Clang's <a
|
|||
href="InternalsManual.html#SourceLocation">SourceManager</a> class,
|
||||
which handles the mapping from source locations (as represented in
|
||||
Clang's abstract syntax tree) into actual column/line positions within
|
||||
a source file or macro instantiation. The precompiled header's
|
||||
a source file or macro instantiation. The AST file's
|
||||
representation of the source manager also includes information about
|
||||
all of the headers that were (transitively) included when building the
|
||||
precompiled header.</p>
|
||||
AST file.</p>
|
||||
|
||||
<p>The bulk of the source manager block is dedicated to information
|
||||
about the various files, buffers, and macro instantiations into which
|
||||
|
@ -259,18 +259,18 @@ a source location can refer. Each of these is referenced by a numeric
|
|||
"file ID", which is a unique number (allocated starting at 1) stored
|
||||
in the source location. Clang serializes the information for each kind
|
||||
of file ID, along with an index that maps file IDs to the position
|
||||
within the PCH file where the information about that file ID is
|
||||
within the AST file where the information about that file ID is
|
||||
stored. The data associated with a file ID is loaded only when
|
||||
required by the front end, e.g., to emit a diagnostic that includes a
|
||||
macro instantiation history inside the header itself.</p>
|
||||
|
||||
<p>The source manager block also contains information about all of the
|
||||
headers that were included when building the precompiled header. This
|
||||
headers that were included when building the AST file. This
|
||||
includes information about the controlling macro for the header (e.g.,
|
||||
when the preprocessor identified that the contents of the header
|
||||
depended on a macro like <code>LLVM_CLANG_SOURCEMANAGER_H</code>)
|
||||
along with a cached version of the results of the <code>stat()</code>
|
||||
system calls performed when building the precompiled header. The
|
||||
system calls performed when building the AST file. The
|
||||
latter is particularly useful in reducing system time when searching
|
||||
for include files.</p>
|
||||
|
||||
|
@ -279,8 +279,8 @@ for include files.</p>
|
|||
<p>The preprocessor block contains the serialized representation of
|
||||
the preprocessor. Specifically, it contains all of the macros that
|
||||
have been defined by the end of the header used to build the
|
||||
precompiled header, along with the token sequences that comprise each
|
||||
macro. The macro definitions are only read from the PCH file when the
|
||||
AST file, along with the token sequences that comprise each
|
||||
macro. The macro definitions are only read from the AST file when the
|
||||
name of the macro first occurs in the program. This lazy loading of
|
||||
macro definitions is triggered by lookups into the <a
|
||||
href="#idtable">identifier table</a>.</p>
|
||||
|
@ -290,8 +290,8 @@ macro definitions is triggered by lookups into the <a
|
|||
<p>The types block contains the serialized representation of all of
|
||||
the types referenced in the translation unit. Each Clang type node
|
||||
(<code>PointerType</code>, <code>FunctionProtoType</code>, etc.) has a
|
||||
corresponding record type in the PCH file. When types are deserialized
|
||||
from the precompiled header, the data within the record is used to
|
||||
corresponding record type in the AST file. When types are deserialized
|
||||
from the AST file, the data within the record is used to
|
||||
reconstruct the appropriate type node using the AST context.</p>
|
||||
|
||||
<p>Each type has a unique type ID, which is an integer that uniquely
|
||||
|
@ -300,10 +300,10 @@ less than <code>NUM_PREDEF_TYPE_IDS</code> represent predefined types
|
|||
(<code>void</code>, <code>float</code>, etc.), while other
|
||||
"user-defined" type IDs are assigned consecutively from
|
||||
<code>NUM_PREDEF_TYPE_IDS</code> upward as the types are encountered.
|
||||
The PCH file has an associated mapping from the user-defined types
|
||||
The AST file has an associated mapping from the user-defined types
|
||||
block to the location within the types block where the serialized
|
||||
representation of that type resides, enabling lazy deserialization of
|
||||
types. When a type is referenced from within the PCH file, that
|
||||
types. When a type is referenced from within the AST file, that
|
||||
reference is encoded using the type ID shifted left by 3 bits. The
|
||||
lower three bits are used to represent the <code>const</code>,
|
||||
<code>volatile</code>, and <code>restrict</code> qualifiers, as in
|
||||
|
@ -316,19 +316,20 @@ class.</p>
|
|||
<p>The declarations block contains the serialized representation of
|
||||
all of the declarations referenced in the translation unit. Each Clang
|
||||
declaration node (<code>VarDecl</code>, <code>FunctionDecl</code>,
|
||||
etc.) has a corresponding record type in the PCH file. When
|
||||
declarations are deserialized from the precompiled header, the data
|
||||
etc.) has a corresponding record type in the AST file. When
|
||||
declarations are deserialized from the AST file, the data
|
||||
within the record is used to build and populate a new instance of the
|
||||
corresponding <code>Decl</code> node. As with types, each declaration
|
||||
node has a numeric ID that is used to refer to that declaration within
|
||||
the PCH file. In addition, a lookup table provides a mapping from that
|
||||
the AST file. In addition, a lookup table provides a mapping from that
|
||||
numeric ID to the offset within the AST file where that
|
||||
declaration is described.</p>
|
||||
|
||||
<p>Declarations in Clang's abstract syntax trees are stored
|
||||
hierarchically. At the top of the hierarchy is the translation unit
|
||||
(<code>TranslationUnitDecl</code>), which contains all of the
|
||||
declarations in the translation unit. These declarations (such as
|
||||
declarations in the translation unit but is not actually written as a
|
||||
specific declaration node. Its child declarations (such as
|
||||
functions or struct types) may also contain other declarations inside
|
||||
them, and so on. Within Clang, each declaration is stored within a <a
|
||||
href="http://clang.llvm.org/docs/InternalsManual.html#DeclContext">declaration
|
||||
|
@ -339,7 +340,7 @@ in a structure) and iterate over the declarations stored within a
|
|||
context (e.g., iterate over all of the fields of a structure for
|
||||
structure layout).</p>
|
||||
|
||||
<p>In Clang's precompiled header format, deserializing a declaration
|
||||
<p>In Clang's AST file format, deserializing a declaration
|
||||
that is a <code>DeclContext</code> is a separate operation from
|
||||
deserializing all of the declarations stored within that declaration
|
||||
context. Therefore, Clang will deserialize the translation unit
|
||||
|
@ -354,14 +355,11 @@ the name-lookup and iteration behavior described above:</p>
|
|||
<code>x</code> within a given declaration context (for example,
|
||||
during semantic analysis of the expression <code>p->x</code>,
|
||||
where <code>p</code>'s type is defined in the precompiled header),
|
||||
Clang deserializes a hash table mapping from the names within that
|
||||
declaration context to the declaration IDs that represent each
|
||||
visible declaration with that name. The entire hash table is
|
||||
deserialized at this point (into the <code>llvm::DenseMap</code>
|
||||
stored within each <code>DeclContext</code> object), but the actual
|
||||
declarations are not yet deserialized. In a second step, those
|
||||
declarations with the name <code>x</code> will be deserialized and
|
||||
will be used as the result of name lookup.</li>
|
||||
Clang refers to an on-disk hash table that maps from the names
|
||||
within that declaration context to the declaration IDs that
|
||||
represent each visible declaration with that name. The actual
|
||||
declarations will then be deserialized to provide the results of
|
||||
name lookup.</li>
|
||||
|
||||
<li>When the front end performs iteration over all of the
|
||||
declarations within a declaration context, all of those declarations
|
||||
|
@ -376,7 +374,7 @@ the name-lookup and iteration behavior described above:</p>
|
|||
|
||||
<h3 id="stmt">Statements and Expressions</h3>
|
||||
|
||||
<p>Statements and expressions are stored in the precompiled header in
|
||||
<p>Statements and expressions are stored in the AST file in
|
||||
both the <a href="#types">types</a> and the <a
|
||||
href="#decls">declarations</a> blocks, because every statement or
|
||||
expression will be associated with either a type or declaration. The
|
||||
|
@ -389,10 +387,10 @@ function.</p>
|
|||
<p>As with types and declarations, each statement and expression kind
|
||||
in Clang's abstract syntax tree (<code>ForStmt</code>,
|
||||
<code>CallExpr</code>, etc.) has a corresponding record type in the
|
||||
precompiled header, which contains the serialized representation of
|
||||
AST file, which contains the serialized representation of
|
||||
that statement or expression. Each substatement or subexpression
|
||||
within an expression is stored as a separate record (which keeps most
|
||||
records to a fixed size). Within the precompiled header, the
|
||||
records to a fixed size). Within the AST file, the
|
||||
subexpressions of an expression are stored, in reverse order, prior to the expression
|
||||
that owns those expressions, using a form of <a
|
||||
href="http://en.wikipedia.org/wiki/Reverse_Polish_notation">Reverse
|
||||
|
@ -420,7 +418,7 @@ they are part of a different expression.</p>
|
|||
<h3 id="idtable">Identifier Table Block</h3>
|
||||
|
||||
<p>The identifier table block contains an on-disk hash table that maps
|
||||
each identifier mentioned within the precompiled header to the
|
||||
each identifier mentioned within the AST file to the
|
||||
serialized representation of the identifier's information (e.g., the
|
||||
<code>IdentifierInfo</code> structure). The serialized representation
|
||||
contains:</p>
|
||||
|
@ -438,17 +436,20 @@ contains:</p>
|
|||
declarations.</li>
|
||||
</ul>
|
||||
|
||||
<p>When a precompiled header is loaded, the precompiled header
|
||||
<p>When an AST file is loaded, the AST file reader
|
||||
mechanism introduces itself into the identifier table as an external
|
||||
lookup source. Thus, when the user program refers to an identifier
|
||||
that has not yet been seen, Clang will perform a lookup into the
|
||||
identifier table. If an identifier is found, its contents (macro
|
||||
definitions, flags, top-level declarations, etc.) will be deserialized, at which point the corresponding <code>IdentifierInfo</code> structure will have the same contents it would have after parsing the headers in the precompiled header.</p>
|
||||
definitions, flags, top-level declarations, etc.) will be
|
||||
deserialized, at which point the corresponding
|
||||
<code>IdentifierInfo</code> structure will have the same contents it
|
||||
would have after parsing the headers in the AST file.</p>
|
||||
|
||||
<p>Within the PCH file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
|
||||
<p>Within the AST file, the identifiers used to name declarations are represented with an integral value. A separate table provides a mapping from this integral value (the identifier ID) to the location within the on-disk
|
||||
hash table where that identifier is stored. This mapping is used when
|
||||
deserializing the name of a declaration, the identifier of a token, or
|
||||
any other construct in the PCH file that refers to a name.</p>
|
||||
any other construct in the AST file that refers to a name.</p>
|
||||
|
||||
<h3 id="method-pool">Method Pool Block</h3>
|
||||
|
||||
|
@ -457,7 +458,7 @@ serves two purposes: it provides a mapping from the names of
|
|||
Objective-C selectors to the set of Objective-C instance and class
|
||||
methods that have that particular selector (which is required for
|
||||
semantic analysis in Objective-C) and also stores all of the selectors
|
||||
used by entities within the precompiled header. The design of the
|
||||
used by entities within the AST file. The design of the
|
||||
method pool is similar to that of the <a href="#idtable">identifier
|
||||
table</a>: the first time a particular selector is formed during the
|
||||
compilation of the program, Clang will search in the on-disk hash
|
||||
|
@ -468,25 +469,25 @@ structure (<code>Sema::InstanceMethodPool</code> and
|
|||
respectively).</p>
|
||||
|
||||
<p>As with identifiers, selectors are represented by numeric values
|
||||
within the PCH file. A separate index maps these numeric selector
|
||||
within the AST file. A separate index maps these numeric selector
|
||||
values to the offset of the selector within the on-disk hash table,
|
||||
and will be used when de-serializing an Objective-C method declaration
|
||||
(or other Objective-C construct) that refers to the selector.</p>
|
||||
|
||||
<h2 id="tendrils">Precompiled Header Integration Points</h2>
|
||||
<h2 id="tendrils">AST Reader Integration Points</h2>
|
||||
|
||||
<p>The "lazy" deserialization behavior of precompiled headers requires
|
||||
<p>The "lazy" deserialization behavior of AST files requires
|
||||
their integration into several completely different submodules of
|
||||
Clang. For example, lazily deserializing the declarations during name
|
||||
lookup requires that the name-lookup routines be able to query the
|
||||
precompiled header to find entities within the PCH file.</p>
|
||||
AST file to find entities stored there.</p>
|
||||
|
||||
<p>For each Clang data structure that requires direct interaction with
|
||||
the precompiled header logic, there is an abstract class that provides
|
||||
the interface between the two modules. The <code>PCHReader</code>
|
||||
class, which handles the loading of a precompiled header, inherits
|
||||
the AST reader logic, there is an abstract class that provides
|
||||
the interface between the two modules. The <code>ASTReader</code>
|
||||
class, which handles the loading of an AST file, inherits
|
||||
from all of these abstract classes to provide lazy deserialization of
|
||||
Clang's data structures. <code>PCHReader</code> implements the
|
||||
Clang's data structures. <code>ASTReader</code> implements the
|
||||
following abstract classes:</p>
|
||||
|
||||
<dl>
|
||||
|
@ -505,7 +506,7 @@ following abstract classes:</p>
|
|||
<dd>This abstract interface is associated with the
|
||||
<code>IdentifierTable</code> class, and is used whenever the
|
||||
program source refers to an identifier that has not yet been seen.
|
||||
In this case, the precompiled header implementation searches for
|
||||
In this case, the AST reader searches for
|
||||
this identifier within its <a href="#idtable">identifier table</a>
|
||||
to load any top-level declarations or macros associated with that
|
||||
identifier.</dd>
|
||||
|
@ -513,7 +514,7 @@ following abstract classes:</p>
|
|||
<dt><code>ExternalASTSource</code></dt>
|
||||
<dd>This abstract interface is associated with the
|
||||
<code>ASTContext</code> class, and is used whenever the abstract
|
||||
syntax tree nodes need to loaded from the precompiled header. It
|
||||
syntax tree nodes need to be loaded from the AST file. It
|
||||
provides the ability to de-serialize declarations and types
|
||||
identified by their numeric values, read the bodies of functions
|
||||
when required, and read the declarations stored within a
|
||||
|
@ -526,6 +527,131 @@ following abstract classes:</p>
|
|||
pool</a>.</dd>
|
||||
</dl>
|
||||
|
||||
<h2 id="chained">Chained precompiled headers</h2>
|
||||
|
||||
<p>Chained precompiled headers were initially intended to improve the
|
||||
performance of IDE-centric operations such as syntax highlighting and
|
||||
code completion while a particular source file is being edited by the
|
||||
user. To minimize the amount of reparsing required after a change to
|
||||
the file, a form of precompiled header--called a precompiled
|
||||
<i>preamble</i>--is automatically generated by parsing all of the
|
||||
headers in the source file, up to and including the last
|
||||
#include. When only the source file changes (and none of the headers
|
||||
it depends on), reparsing of that source file can use the precompiled
|
||||
preamble and start parsing after the #includes, so parsing time is
|
||||
proportional to the size of the source file (rather than all of its
|
||||
includes). However, the compilation of that translation unit
|
||||
may already use a precompiled header: in this case, Clang will create
|
||||
the precompiled preamble as a chained precompiled header that refers
|
||||
to the original precompiled header. This drastically reduces the time
|
||||
needed to serialize the precompiled preamble for use in reparsing.</p>
|
||||
|
||||
<p>Chained precompiled headers get their name because each precompiled header
|
||||
can depend on one other precompiled header, forming a chain of
|
||||
dependencies. A translation unit will then include the precompiled
|
||||
header that starts the chain (i.e., nothing depends on it). This
|
||||
linearity of dependencies is important for the semantic model of
|
||||
chained precompiled headers, because the most-recent precompiled
|
||||
header can provide information that overrides the information provided
|
||||
by the precompiled headers it depends on, just like a header file
|
||||
<code>B.h</code> that includes another header <code>A.h</code> can
|
||||
modify the state produced by parsing <code>A.h</code>, e.g., by
|
||||
<code>#undef</code>'ing a macro defined in <code>A.h</code>.</p>
|
||||
|
||||
<p>There are several ways in which chained precompiled headers
|
||||
generalize the AST file model:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Numbering of IDs</dt>
|
||||
<dd>Many different kinds of entities--identifiers, declarations,
|
||||
types, etc.--have ID numbers that start at 1 or some other
|
||||
predefined constant and grow upward. Each precompiled header records
|
||||
the maximum ID number it has assigned in each category. Then, when a
|
||||
new precompiled header is generated that depends on (chains to)
|
||||
another precompiled header, it will start counting at the next
|
||||
available ID number. This way, one can determine, given an ID
|
||||
number, which AST file actually contains the entity.</dd>
|
||||
|
||||
<dt>Name lookup</dt>
|
||||
<dd>When writing a chained precompiled header, Clang attempts to
|
||||
write only information that has changed from the precompiled header
|
||||
on which it is based. This changes the lookup algorithm for the
|
||||
various tables, such as the <a href="#idtable">identifier table</a>:
|
||||
the search starts at the most-recent precompiled header. If no entry
|
||||
is found, lookup then proceeds to the identifier table in the
|
||||
precompiled header it depends on, and so on. Once a lookup
|
||||
succeeds, that result is considered definitive, overriding any
|
||||
results from earlier precompiled headers.</dd>
|
||||
|
||||
<dt>Update records</dt>
|
||||
<dd>There are various ways in which a later precompiled header can
|
||||
modify the entities described in an earlier precompiled header. For
|
||||
example, later precompiled headers can add entries into the various
|
||||
name-lookup tables for the translation unit or namespaces, or add
|
||||
new categories to an Objective-C class. Each of these updates is
|
||||
captured in an "update record" that is stored in the chained
|
||||
precompiled header file and will be loaded along with the original
|
||||
entity.</dd>
|
||||
</dl>
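<p>The ID numbering and name lookup rules above can be sketched in a few
lines of Python. This is a simplified model, not Clang's data structures:
the class <code>ToyPCH</code> and its fields are hypothetical, every table
entry is given its own ID (including an overriding entry), and update
records are left out.</p>

```python
class ToyPCH:
    """One link in a chain; `base` is the PCH this one depends on, or None."""

    def __init__(self, entries, base=None):
        self.base = base
        # Start counting where the PCH we chain to stopped.
        self.first_id = 1 if base is None else base.max_id + 1
        self.max_id = self.first_id + len(entries) - 1
        # Only entries that are new or changed relative to `base`.
        self.table = dict(entries)

    def owner_of(self, entity_id):
        """Find which file in the chain contains the entity with this ID."""
        pch = self
        while pch is not None:
            if pch.first_id <= entity_id <= pch.max_id:
                return pch
            pch = pch.base
        raise KeyError(entity_id)

    def lookup(self, name):
        """Search from the most recent PCH; the first hit is definitive."""
        pch = self
        while pch is not None:
            if name in pch.table:
                return pch.table[name]
            pch = pch.base
        return None


# A common PCH, and a preamble PCH chained to it that redefines DEBUG.
common = ToyPCH({"DEBUG": "macro: 0", "size_t": "typedef"})
preamble = ToyPCH({"DEBUG": "macro: 1"}, base=common)
```

<p>Here <code>preamble.lookup("DEBUG")</code> returns the preamble's entry
without consulting <code>common</code>, while
<code>preamble.lookup("size_t")</code> falls through to the PCH it depends
on.</p>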
|
||||
|
||||
<h2 id="modules">Modules</h2>
|
||||
|
||||
<p>Modules generalize the chained precompiled header model yet
|
||||
further, from a linear chain of precompiled headers to an arbitrary
|
||||
directed acyclic graph (DAG) of AST files. All of the same techniques
|
||||
used to make chained precompiled headers work---ID numbering, name
|
||||
lookup, update records---are shared with modules. However, the DAG
|
||||
nature of modules introduces a number of additional complications to
|
||||
the model:</p>
|
||||
|
||||
<dl>
|
||||
<dt>Numbering of IDs</dt>
|
||||
<dd>The simple, linear numbering scheme used in chained precompiled
|
||||
headers falls apart with the module DAG, because different modules
|
||||
may end up with different numbering schemes for entities they
|
||||
imported from common shared modules. To account for this, each
|
||||
module file provides information about which modules it depends on
|
||||
and which ID numbers it assigned to the entities in those modules,
|
||||
as well as which ID numbers it took for its own new entities. The
|
||||
AST reader then maps these "local" ID numbers into a "global" ID
|
||||
number space for the current translation unit, providing a 1-1
|
||||
mapping between entities (in whatever AST file they inhabit) and
|
||||
global ID numbers. If that translation unit is then serialized into
|
||||
an AST file, this mapping will be stored for use when the AST file
|
||||
is imported.</dd>
|
||||
|
||||
<dt>Declaration merging</dt>
|
||||
<dd>It is possible for a given entity (from the language's
|
||||
perspective) to be declared multiple times in different places. For
|
||||
example, two different headers can have the declaration of
|
||||
<tt>printf</tt> or could forward-declare <tt>struct stat</tt>. If
|
||||
each of those headers is included in a module, and some third party
|
||||
imports both of those modules, there is a potentially serious
|
||||
problem: name lookup for <tt>printf</tt> or <tt>struct stat</tt> will
|
||||
find both declarations, but the AST nodes are unrelated. This would
|
||||
result in a compilation error, due to an ambiguity in name
|
||||
lookup. Therefore, the AST reader performs declaration merging
|
||||
according to the appropriate language semantics, ensuring that the
|
||||
two disjoint declarations are merged into a single redeclaration
|
||||
chain (with a common canonical declaration), so that it is as if one
|
||||
of the headers had been included before the other.</dd>
|
||||
|
||||
<dt>Name visibility</dt>
|
||||
<dd>Modules allow certain names that occur during module creation to
|
||||
be "hidden", so that they are not part of the public interface of
|
||||
the module and are not visible to its clients. The AST reader
|
||||
maintains a "visible" bit on various AST nodes (declarations, macros,
|
||||
etc.) to indicate whether that particular AST node is currently
|
||||
visible; the various name lookup mechanisms in Clang inspect the
|
||||
visible bit to determine whether that entity, which is still in the
|
||||
AST (because other, visible AST nodes may depend on it), can
|
||||
actually be found by name lookup. When a new (sub)module is
|
||||
imported, it may make existing, non-visible, already-deserialized
|
||||
AST nodes visible; it is the responsibility of the AST reader to
|
||||
find and update these AST nodes when it is notified of the import.</dd>
|
||||
|
||||
</dl>
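<p>The local-to-global ID mapping described under "Numbering of IDs" can be
sketched as follows. The sketch assumes that each module file numbers
entities in contiguous ranges, one range per module it depends on, followed
by its own entities. The names <code>GlobalIDSpace</code> and
<code>to_global</code> and the module names are hypothetical, and Clang's
actual remapping tables are more involved.</p>

```python
class GlobalIDSpace:
    """Gives each loaded module file a contiguous block of global IDs."""

    def __init__(self):
        self.next_global = 1
        self.base_of = {}  # module name -> first global ID of its entities

    def load(self, name, num_entities):
        # A module shared by several importers is loaded only once.
        if name not in self.base_of:
            self.base_of[name] = self.next_global
            self.next_global += num_entities
        return self.base_of[name]


def to_global(space, local_map, local_id):
    """Map a module file's local ID to the translation unit's global ID.

    local_map is a list of (first_local_id, module_name) pairs sorted by
    first_local_id; it records which module owns each local ID range.
    """
    for first_local, module in reversed(local_map):
        if local_id >= first_local:
            return space.base_of[module] + (local_id - first_local)
    raise KeyError(local_id)


# C is a shared module with 3 entities. A imports C; B imports D and C.
space = GlobalIDSpace()
space.load("C", 3)
space.load("A", 2)
space.load("D", 2)
space.load("B", 1)

map_a = [(1, "C"), (4, "A")]            # A: local 1-3 are C's, 4-5 its own
map_b = [(1, "D"), (3, "C"), (6, "B")]  # B: 1-2 are D's, 3-5 C's, 6 its own
```

<p>Module A refers to the second entity of C as local ID 2, while module B
refers to it as local ID 4. Both map to the same global ID, which is the
one-to-one property the AST reader relies on.</p>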
|
||||
|
||||
</div>
|
||||
|
||||
</body>
|
||||
|
|