From 797a2479dac447138a6276eb5e9fb6ad4f61fd92 Mon Sep 17 00:00:00 2001
From: Ted Kremenek
Date: Wed, 8 Apr 2009 05:07:30 +0000
Subject: [PATCH] Initial draft of PTH internals.

git-svn-id: https://llvm.org/svn/llvm-project/cfe/trunk@68594 91177308-0d34-0410-b5e6-96231b3b80d8
---
 docs/PTHInternals.html | 220 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 220 insertions(+)
 create mode 100644 docs/PTHInternals.html

diff --git a/docs/PTHInternals.html b/docs/PTHInternals.html
new file mode 100644
index 0000000000..7714fb91b2
--- /dev/null
+++ b/docs/PTHInternals.html
@@ -0,0 +1,220 @@

Pretokenized Headers (PTH)

Pretokenized Headers

Precompiled headers are a general approach employed by many compilers to reduce compilation time. The underlying motivation is that, within a codebase, the same (and often large) header files are frequently included by multiple source files. Consequently, compile times can often be greatly improved by caching some of the redundant work a compiler does to process those headers. Precompiled header files, which represent one of possibly many ways to implement this optimization, are on-disk caches that hold the information needed to avoid some (or all) of the work of processing a corresponding header file. While the details of precompiled headers vary between compilers, they have been shown to be highly effective at speeding up compilation on systems with very large system headers (e.g., Mac OS X).

Clang supports an implementation of precompiled headers known as pre-tokenized headers (PTH). Clang's pre-tokenized headers support most of the same interfaces as GCC's precompiled headers (as well as others) but are completely different in their implementation. This page first describes the interface for using PTH and then briefly elaborates on its design and implementation.

Using Pretokenized Headers (High-level Interface)

The high-level interface to generate a PTH file is the same as GCC's:

  $ gcc -x c-header test.h -o test.h.gch
  $ clang -x c-header test.h -o test.h.pth

A PTH file can then be used as a prefix header when the -include option is passed to clang:

  $ clang -include test.h test.c -o test

The clang driver will first check if a PTH file for test.h is available; if so, the contents of test.h (and the files it includes) will be processed from the PTH file. Otherwise, clang falls back to directly processing the content of test.h. This mirrors the behavior of GCC.

NOTE: clang does not automatically use PTH files for headers that are directly included within a source file. For example:

  $ clang -x c-header test.h -o test.h.pth
  $ cat test.c
  #include "test.h"
  $ clang test.c -o test

In this example, clang will not automatically use the PTH file for test.h since test.h was included directly in the source file and not specified on the command line using -include.
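
The driver behavior described above can be sketched in C. This is a minimal illustration under stated assumptions and is not clang's driver code. The names PrefixSource, file_exists and resolve_prefix_header are hypothetical. The sketch assumes that the PTH file for a header is found by appending .pth to the name passed to -include, following the test.h / test.h.pth naming used in the examples above. It also does not check whether that file is valid or up to date. Headers reached through a plain #include directive never pass through this function, which is why test.c above does not benefit from test.h.pth.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical result type: which file should feed an -include header. */
typedef enum { PREFIX_FROM_PTH, PREFIX_FROM_SOURCE } PrefixSource;

/* Returns nonzero if `path` can be opened for reading. */
static int file_exists(const char *path) {
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return 0;
    fclose(f);
    return 1;
}

/* Decide how one -include header is processed, writing the chosen file
 * name to `out`. Assumes the PTH file is named "<header>.pth" and does
 * not check whether that file is valid or up to date. */
static PrefixSource resolve_prefix_header(const char *header,
                                          char *out, size_t out_size) {
    size_t len = strlen(header);

    /* Candidate PTH file: the header name with ".pth" appended. */
    if (len + sizeof ".pth" <= out_size) {
        memcpy(out, header, len);
        memcpy(out + len, ".pth", sizeof ".pth"); /* copies the NUL too */
        if (file_exists(out))
            return PREFIX_FROM_PTH;
    }

    /* No PTH file found: fall back to processing the header directly. */
    snprintf(out, out_size, "%s", header);
    return PREFIX_FROM_SOURCE;
}
```

A caller would run resolve_prefix_header once per -include argument. It would then pass the resulting file name either to a PTH reader or to the ordinary lexer.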

Using Pretokenized Headers (Low-level Interface)

The low-level Clang driver, clang-cc, supports three command-line options for generating and using PTH files.

To generate PTH files using clang-cc, use the option -emit-pth:

  $ clang-cc test.h -emit-pth -o test.h.pth

This option is transparently used by clang when generating PTH files. Similarly, PTH files can be used as prefix headers using the -include-pth option:

  $ clang-cc -include-pth test.h.pth test.c -o test.s

Alternatively, Clang's PTH files can be used as a raw "token-cache" (or "content" cache) of the source included by the original header file. This means that the contents of the PTH file are searched for substitutes for any source files that clang-cc uses to process a source file. This is done by specifying the -token-cache option:

  $ cat test.h
  #include <stdio.h>
  $ clang-cc -emit-pth test.h -o test.h.pth
  $ cat test.c
  #include "test.h"
  $ clang-cc test.c -o test -token-cache test.h.pth

In this example the contents of stdio.h (and the files it includes) will be retrieved from test.h.pth, as the PTH file is being used in this case as a raw cache of the contents of test.h. This is a low-level interface used both to implement the high-level PTH interface and to provide alternative means of using PTH-style caching.
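
The C sketch below models a PTH file as a table that maps file names to tokens lexed ahead of time, which makes the "content cache" idea concrete. Everything here is an assumption for illustration only. The types CachedFile and TokenCache and the functions token_cache_lookup and tokens_for_file are hypothetical. Tokens are stored as plain strings rather than in PTH's actual on-disk format. The linear scan stands in for whatever indexing a real implementation uses.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory view of a PTH file used as a token cache:
 * each entry maps a source file name to tokens lexed ahead of time. */
typedef struct {
    const char *path;          /* file name as it is looked up        */
    const char *const *tokens; /* pre-lexed token spellings            */
    size_t num_tokens;
} CachedFile;

typedef struct {
    const CachedFile *files;
    size_t num_files;
} TokenCache;

/* Return the cached entry for `path`, or NULL on a miss. A linear scan
 * keeps the sketch short; a real cache would index the file names. */
static const CachedFile *token_cache_lookup(const TokenCache *cache,
                                            const char *path) {
    for (size_t i = 0; i < cache->num_files; ++i)
        if (strcmp(cache->files[i].path, path) == 0)
            return &cache->files[i];
    return NULL;
}

/* Called whenever the preprocessor needs a file. Returns 1 and fills in
 * the cached tokens on a hit; returns 0 on a miss, in which case the
 * caller lexes the file from disk as usual. */
static int tokens_for_file(const TokenCache *cache, const char *path,
                           const char *const **tokens_out,
                           size_t *count_out) {
    const CachedFile *hit = token_cache_lookup(cache, path);
    if (hit == NULL)
        return 0;
    *tokens_out = hit->tokens;
    *count_out = hit->num_tokens;
    return 1;
}
```

In the stdio.h example above, a hit would let clang-cc replay the header's tokens without reading or lexing stdio.h from disk. A miss falls back to normal lexing.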

PTH Design and Implementation

Unlike GCC's precompiled headers, which cache the full ASTs and preprocessor state of a header file, Clang's pretokenized header files mainly cache the raw lexer tokens that are needed to segment the stream of characters in a source file into keywords, identifiers, and operators. Consequently, PTH mainly serves to speed up the lexing and preprocessing of a source file directly, while parsing and type-checking must be completely redone every time a PTH file is used.
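
PTH caches the character-level scan sketched below. This toy C lexer is an illustration built on assumptions. It is not Clang's lexer and does not use PTH's token format. The TokKind and Token types, the four-entry keyword table, and the treatment of every other character as a one-character punctuator are all simplifications. The sketch shows that each token can be reduced to a kind plus a location in the source buffer. An array of such records is the kind of data a pretokenized header saves so that later compilations can skip this pass. The parser, however, still has to consume every token.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Simplified token kinds; a real C lexer distinguishes many more. */
typedef enum { TOK_KEYWORD, TOK_IDENTIFIER, TOK_NUMBER, TOK_PUNCT } TokKind;

/* A token records its kind and where its spelling lives in the source
 * buffer, so a cache can store tokens without re-scanning characters. */
typedef struct {
    TokKind kind;
    size_t offset;
    size_t length;
} Token;

/* Returns nonzero if the `len` characters at `s` spell a keyword. */
static int is_keyword(const char *s, size_t len) {
    static const char *const keywords[] = {"int", "char", "void", "return"};
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; ++i)
        if (strlen(keywords[i]) == len && strncmp(keywords[i], s, len) == 0)
            return 1;
    return 0;
}

/* Lex `src` once into `out` (at most `max` tokens) and return the count.
 * This character-by-character pass is the work a token cache avoids
 * repeating: later runs would read the saved Token array instead. */
static size_t lex_source(const char *src, Token *out, size_t max) {
    size_t n = 0, i = 0;
    while (src[i] != '\0' && n < max) {
        unsigned char c = (unsigned char)src[i];
        if (isspace(c)) {
            ++i;
            continue;
        }
        size_t start = i;
        TokKind kind;
        if (isalpha(c) || c == '_') {
            while (isalnum((unsigned char)src[i]) || src[i] == '_')
                ++i;
            kind = is_keyword(src + start, i - start) ? TOK_KEYWORD
                                                      : TOK_IDENTIFIER;
        } else if (isdigit(c)) {
            while (isdigit((unsigned char)src[i]))
                ++i;
            kind = TOK_NUMBER;
        } else {
            ++i; /* any other character is a one-character punctuator */
            kind = TOK_PUNCT;
        }
        out[n].kind = kind;
        out[n].offset = start;
        out[n].length = i - start;
        ++n;
    }
    return n;
}
```

For the input "int x = 42;" this yields five tokens: a keyword, an identifier, a punctuator, a number, and a punctuator.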

Basic Design Tradeoffs

In the long term there are plans to provide an alternate PCH implementation for Clang that also caches the work for parsing and type checking the contents of header files. The current implementation of PCH in Clang as pretokenized header files was motivated by the following factors:

Further, compared to GCC's PCH implementation (which is the dominant precompiled header implementation that Clang can be directly compared against), the PTH design in Clang yields several attractive features:

Despite these strengths, PTH's simple design suffers some algorithmic handicaps compared to other PCH strategies such as those used by GCC. While PTH can greatly speed up the processing of a header file, the amount of work required to process a header file is still roughly linear in the size of the header file. In contrast, the amount of work done by GCC to process a precompiled header is (theoretically) constant, because the ASTs for the header are literally memory-mapped into the compiler. This means that the only pieces of the header file the compiler needs to process during actual compilation are those referenced by the source file that includes the header. While GCC's particular implementation of PCH erodes some of these algorithmic strengths through its use of copy-on-write pages, the approach itself can fundamentally dominate at an algorithmic level, especially for header files of arbitrary size.

Consequently, as alluded to earlier, there are plans to potentially provide an alternative PCH implementation for Clang based on the lazy deserialization of ASTs. This approach would theoretically have the same constant-time algorithmic advantages just mentioned but would also retain some of the strengths of PTH, such as reduced memory pressure (ideal for multi-core builds).
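
The general idea of lazy deserialization can be sketched as follows. This is a hypothetical C model and is not a Clang design. LazyPCH, SerializedDecl, Decl, lazy_pch_open and get_decl are illustrative names. Declarations are addressed by a numeric ID, and "deserializing" a declaration here merely measures a stored string. Opening the file only records where the serialized data lives. A declaration is materialized the first time it is requested. As a result, the work done tracks how many declarations the source file actually references, not the size of the header.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical serialized declaration, as it might sit in a mapped file. */
typedef struct {
    const char *name;
    const char *payload; /* serialized form; just a string in this sketch */
} SerializedDecl;

/* Hypothetical in-memory declaration, built on first use. */
typedef struct {
    const char *name;
    size_t payload_length; /* stands in for a real AST node */
    int loaded;
} Decl;

typedef struct {
    const SerializedDecl *on_disk; /* e.g. a memory-mapped region       */
    Decl *slots;                   /* one zero-initialized slot per decl */
    size_t count;
    size_t num_deserialized;       /* how much work was actually done    */
} LazyPCH;

/* Opening records pointers only; no declaration is deserialized yet.
 * `slots` must be zero-initialized by the caller. */
static void lazy_pch_open(LazyPCH *pch, const SerializedDecl *on_disk,
                          Decl *slots, size_t count) {
    pch->on_disk = on_disk;
    pch->slots = slots;
    pch->count = count;
    pch->num_deserialized = 0;
}

/* Return declaration `id`, deserializing it only on first reference. */
static const Decl *get_decl(LazyPCH *pch, size_t id) {
    if (id >= pch->count)
        return NULL;
    Decl *d = &pch->slots[id];
    if (!d->loaded) {
        const SerializedDecl *s = &pch->on_disk[id];
        d->name = s->name;
        d->payload_length = strlen(s->payload); /* the "deserialization" */
        d->loaded = 1;
        pch->num_deserialized++;
    }
    return d;
}
```

Suppose there are three serialized declarations and a source file that requests only one of them, possibly several times. In that case num_deserialized ends at 1. A PTH-style token replay, by contrast, would still visit every token of the header.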

Internal PTH Optimizations

While the main optimization employed by PTH is to reduce lexing time of header files by caching pre-lexed tokens, PTH also employs several other optimizations to speed up the processing of header files:
