//===---------------------------------------------------------------------===//
// Random Notes
//===---------------------------------------------------------------------===//

C90/C99/C++ Comparisons:
http://david.tribble.com/text/cdiffs.htm

//===---------------------------------------------------------------------===//
Extensions:

 * "#define_target X Y"
   This preprocessor directive works exactly the same way as #define, but it
   notes that 'X' is a target-specific preprocessor macro.  When 'X' is used, a
   diagnostic is emitted indicating that the translation unit is non-portable.

   If a target-define is #undef'd before use, no diagnostic is emitted.  If 'X'
   was previously a normal #define macro, the macro is tainted.  If 'X' is
   subsequently #defined as a non-target-specific define, the taint bit is
   cleared.

 * "#define_other_target X"
   This preprocessor directive takes a single identifier argument.  It notes
   that this identifier is a target-specific #define for some target other than
   the current one.  Use of this identifier will result in a diagnostic.

   If 'X' is later #undef'd or #define'd, the taint bit is cleared.  If 'X' is
   already defined, X is marked as a target-specific define.

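The taint rules above can be sketched as a small state machine.  This is only
an illustration of the rules as written in these notes, not how the
preprocessor actually stores macros: MacroInfo and every function name below
are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-identifier state; not clang's real data structure. */
typedef struct {
    bool defined;  /* identifier currently has a macro definition */
    bool tainted;  /* identifier is target-specific */
} MacroInfo;

/* "#define X Y": a normal define clears any taint. */
static void define_normal(MacroInfo *m) {
    assert(m != NULL);
    m->defined = true;
    m->tainted = false;
}

/* "#define_target X Y": defines X and taints it, even if X was a normal
 * macro before. */
static void define_target(MacroInfo *m) {
    assert(m != NULL);
    m->defined = true;
    m->tainted = true;
}

/* "#define_other_target X": taints X without defining it.  If X is already
 * defined, it stays defined and becomes target-specific. */
static void define_other_target(MacroInfo *m) {
    assert(m != NULL);
    m->tainted = true;
}

/* "#undef X": removes the definition and the taint. */
static void undefine(MacroInfo *m) {
    assert(m != NULL);
    m->defined = false;
    m->tainted = false;
}

/* True when a use of the identifier should emit the non-portability
 * diagnostic. */
static bool use_needs_diagnostic(const MacroInfo *m) {
    assert(m != NULL);
    return m->tainted;
}
```

A target-define that is #undef'd before its first use never reaches
use_needs_diagnostic with the taint bit set, which matches the "no diagnostic
is emitted" rule.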
//===---------------------------------------------------------------------===//

When we go to reimplement <tgmath.h>, we should do it more intelligently than
the GCC-supplied header.  EDG has an interesting __generic builtin that provides
overloading for C:
http://www.edg.com/docs/edg_cpp.pdf

For example, they have:
 #define sin(x) __generic(x,,, sin, sinf, sinl, csin, csinf, csinl)(x)

It's unclear to me why you couldn't just have a builtin like:
  __builtin_overload(1, arg1,             impl1, impl2, impl3)
  __builtin_overload(2, arg1, arg2,       impl1, impl2, impl3)
  __builtin_overload(3, arg1, arg2, arg3, impl1, impl2, impl3)

Where the compiler would just pick the right "impl" based on the arguments
provided.  One nasty detail is that some arithmetic promotions must be done for
use by the tgmath.h stuff, but it would be nice to be able to handle vectors
etc as well without huge globs of macros.  With the above scheme, you could
use:

 #define sin(x) __builtin_overload(1, x, sin, sinf, sinl, csin, csinf, csinl)(x)

and not need to keep track of which argument to "__generic" corresponds to which
type, etc.

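For comparison, C11 later standardized a generic selection construct,
_Generic, that covers the real-valued part of this use case.  The sketch below
is not the __builtin_overload proposal above: tg_sin is a hypothetical name
(chosen to avoid clashing with <tgmath.h>), and the complex variants are left
out to keep it short.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical type-generic sine built on C11 _Generic.  The controlling
 * expression (x) is not evaluated; only its type selects the function.
 * Integer arguments fall through to "default" and are converted to double
 * by the call to sin, which mirrors the <tgmath.h> promotion rule. */
#define tg_sin(x) _Generic((x),      \
        float:       sinf,           \
        long double: sinl,           \
        default:     sin)(x)

/* Compile-time check: a float argument selects sinf, so the call expression
 * has type float.  sizeof does not evaluate its operand. */
static_assert(sizeof(tg_sin(1.0f)) == sizeof(float),
              "float argument should select sinf");
```

Each type is named next to its implementation, so there is no positional list
to keep in sync, which was the complaint about "__generic" above.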
//===---------------------------------------------------------------------===//

To time GCC preprocessing speed without output, use:
   "time gcc -MM file"
This is similar to -Eonly.

//===---------------------------------------------------------------------===//

C++ Template Instantiation benchmark:
   http://users.rcn.com/abrahams/instantiation_speed/index.html

//===---------------------------------------------------------------------===//

TODO: File Manager Speedup:

 We currently do a lot of stat'ing for files that don't exist, particularly
 when lots of -I paths exist (e.g. see the <iostream> example, check for
 failures in stat in FileManager::getFile).  It would be far better to make
 the following changes:
   1. FileEntry contains a sys::Path instead of a std::string for Name.
   2. sys::Path contains timestamp and size, lazily computed.  Eliminate from
      FileEntry.
   3. File UIDs are created on request, not when files are opened.
 These changes make it possible to efficiently have FileEntry objects for
 files that exist on the file system, but have not been used yet.

 Once this is done:
   1. DirectoryEntry gets a boolean value "has read entries".  When false, not
      all entries in the directory are in the file mgr; when true, they are.
   2. Instead of stat'ing the file in FileManager::getFile, check to see if
      the dir has been read.  If so, fail immediately; if not, read the dir,
      then retry.
   3. Reading the dir uses the getdirentries syscall, creating a FileEntry
      for all files found.

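The second list can be sketched roughly as follows.  This is a minimal
illustration, not FileManager code: DirCache and its functions are
hypothetical, a plain array of names stands in for FileEntry objects, and
POSIX opendir/readdir is swapped in for the getdirentries syscall to keep the
example portable.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for DirectoryEntry. */
typedef struct {
    const char *path;       /* directory to search, e.g. one -I path */
    bool has_read_entries;  /* true once every entry name is cached */
    char **names;           /* cached entry names */
    size_t count;
} DirCache;

/* Drop all cached names so the directory can be read again from scratch. */
static void dir_cache_clear(DirCache *dir) {
    for (size_t i = 0; i < dir->count; ++i)
        free(dir->names[i]);
    free(dir->names);
    dir->names = NULL;
    dir->count = 0;
    dir->has_read_entries = false;
}

/* Read the whole directory once.  Returns false if it could not be read
 * completely; the cache is then left marked as unread. */
static bool dir_cache_read(DirCache *dir) {
    DIR *d = opendir(dir->path);
    if (d == NULL)
        return false;
    dir_cache_clear(dir);
    size_t cap = 0;
    bool ok = false;
    for (;;) {
        errno = 0;
        struct dirent *e = readdir(d);
        if (e == NULL) {            /* end of directory, or a read error */
            ok = (errno == 0);
            break;
        }
        if (dir->count == cap) {    /* grow the name array geometrically */
            size_t new_cap = cap ? cap * 2 : 16;
            char **grown = realloc(dir->names, new_cap * sizeof *grown);
            if (grown == NULL)
                break;
            dir->names = grown;
            cap = new_cap;
        }
        size_t len = strlen(e->d_name) + 1;
        char *copy = malloc(len);
        if (copy == NULL)
            break;
        memcpy(copy, e->d_name, len);
        dir->names[dir->count++] = copy;
    }
    closedir(d);
    dir->has_read_entries = ok;
    return ok;
}

/* Lookup without stat: the first query reads the directory, and every later
 * query (hit or miss) is answered from the cache. */
static bool dir_cache_contains(DirCache *dir, const char *name) {
    assert(dir != NULL && name != NULL);
    if (!dir->has_read_entries && !dir_cache_read(dir))
        return false;  /* a real implementation would fall back to stat */
    for (size_t i = 0; i < dir->count; ++i)
        if (strcmp(dir->names[i], name) == 0)
            return true;
    return false;
}
```

With many -I paths, each directory is read at most once, so a miss costs a
scan of cached names rather than a failed stat.  A real version would want a
hash table instead of the linear scan.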
//===---------------------------------------------------------------------===//

TODO: Fast #Import:

 * Get frameworks that don't use #import to do so, e.g.
   DirectoryService, AudioToolbox, CoreFoundation, etc.  Why aren't they using
   #import?  Because they work in C mode?  C has #import.
 * Have the lexer return a token for #import instead of handling it itself.
   - Create a new preprocessor object with no external state (no -D/U options
     from the command line, etc).  Alternatively, keep track of exactly which
     external state is used by a #import: declare it somehow.
 * When reading a #import file, keep track of whether (and which)
   "configuration" macros we have seen.  Various cases:
   - Uses of target args (__POWERPC__, __i386): Header has to be parsed
     multiple times, per-target.  What about #ifndef checks?  How do we know?
   - "Configuration" preprocessor macros not defined: POWERPC, etc.  What about
     things like __STDC__ etc?  What is and what isn't allowed?
 * Special handling for "umbrella" headers, which just contain #import stmts:
   - Cocoa.h/AppKit.h - Contain pointers to digests instead of entire digests
     themselves?  Foundation.h isn't pure umbrella!
 * Frameworks digests:
   - Can put "digest" of a framework-worth of headers into the framework
     itself.  To open AppKit, just mmap
     /System/Library/Frameworks/AppKit.framework/"digest", which provides a
     symbol table in a well defined format.  Lazily unstream stuff that is
     needed.  Contains declarations, macros, and debug information.
   - System frameworks ship with digests.  How do we handle configuration
     information?  How do we handle stuff like:
       #if MAC_OS_X_VERSION_MAX_ALLOWED >= MAC_OS_X_VERSION_10_2
     which guards a bunch of decls?  Should there be a couple of default
     configs, then have the UI fall back to building/caching its own?
   - GUI automatically builds digests when UI is idle, both of system
     frameworks if they aren't available in the right config, and of app
     frameworks.
   - GUI builds dependence graph of frameworks/digests based on #imports.  If a
     digest is out of date, dependent digests are automatically invalidated.

 * New constraints on #import for objc-v3:
   - #imported file must not define non-inline function bodies.
     - Alternatively, they can, and these bodies get compiled/linked *once*
       per app into a dylib.  What about building user dylibs?
   - Restrictions on ObjC grammar: can't #import the body of a for stmt or fn.
     - Compiler must detect and reject these cases.
   - #defines defined within a #import have two behaviors:
     - By default, they escape the header.  These macros *cannot* be #undef'd
       by other code: this is enforced by the front-end.
     - Optionally, user can specify what macros escape (whitelist) or can use
       #undef.

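The dependence-graph bullet under "Frameworks digests" could work roughly like
the sketch below.  Nothing here reflects an existing implementation:
DigestGraph, the fixed-size matrix, and invalidate are all hypothetical, and
the digest format itself is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DIGESTS 8

/* Hypothetical dependence graph over framework digests. */
typedef struct {
    size_t count;
    /* imports[a][b] is true when digest a #imports (depends on) digest b. */
    bool imports[MAX_DIGESTS][MAX_DIGESTS];
    bool valid[MAX_DIGESTS];
} DigestGraph;

/* Mark `stale` invalid, then invalidate every digest that imports it,
 * directly or transitively.  Assumes the invariant that the dependents of an
 * already-invalid digest are themselves already invalid. */
static void invalidate(DigestGraph *g, size_t stale) {
    assert(g != NULL && stale < g->count);
    if (!g->valid[stale])
        return;  /* already handled; this also terminates on import cycles */
    g->valid[stale] = false;
    for (size_t d = 0; d < g->count; ++d)
        if (g->imports[d][stale])
            invalidate(g, d);
}
```

For example, if Cocoa imports AppKit and AppKit imports Foundation, then
invalidating Foundation also invalidates AppKit and Cocoa, while an unrelated
digest such as AudioToolbox stays valid.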
//===---------------------------------------------------------------------===//

TODO: New language feature: Configuration queries:
  - Instead of #ifdef __POWERPC__, use "if (strcmp(`cpu`, __POWERPC__))", or
    some other, better, syntax.
  - Use it to increase the number of "architecture-clean" #import'd files,
    allowing a single index to be used for all fat slices.

//===---------------------------------------------------------------------===//
// Specifying targets:  -triple and -arch
//===---------------------------------------------------------------------===//

Clang supports "-triple" and "-arch" options.  At most one -triple option may
be specified, while multiple -arch options can be specified.  Both are optional.

The "selection of target" behavior is defined as follows:

(1) If the user does not specify -triple:

    (a) If no -arch options are specified, the target triple used is the host
        triple (in llvm/Config/config.h).

    (b) If one or more -arch's are specified (and no -triple), then there is
        one triple for each -arch, where the specified arch is substituted
        for the arch in the host triple.  Example:

           host triple = i686-apple-darwin9
           command: clang  -arch ppc -arch ppc64 ...
           triples used: ppc-apple-darwin9  ppc64-apple-darwin9

(2) The user does specify a -triple (only one allowed):

    (a) If no -arch options are specified, the triple specified by -triple
        is used.  E.g., clang -triple i686-apple-darwin9

    (b) If one or more -arch options are specified, then the triple specified
        by -triple is used as the primary target, and the arch's specified
        by -arch are used to create secondary targets.  For example:

           clang -triple i686-apple-darwin9 -arch ppc -arch ppc64

        has the following targets:

           i686-apple-darwin9  (primary target)
           ppc-apple-darwin9   (secondary target)
           ppc64-apple-darwin9 (secondary target)

The secondary targets are used in the 'portability' model (see below).

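The selection rules can be written down as a short function.  This is a sketch
of the rules as stated, not the driver's code: select_targets and
substitute_arch are hypothetical names, and for case (2b) it assumes the
secondary triples are formed by substituting each arch into the -triple value
(the notes only show an example where that and the host triple agree).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TRIPLE_LEN 64

/* Write `base` with its arch component (the text before the first '-')
 * replaced by `arch`. */
static void substitute_arch(const char *base, const char *arch,
                            char out[TRIPLE_LEN]) {
    const char *rest = strchr(base, '-');
    snprintf(out, TRIPLE_LEN, "%s%s", arch, rest ? rest : "");
}

/* Fill `out` with the selected triples and return how many were written.
 * `triple` is the -triple argument, or NULL if none was given. */
static size_t select_targets(const char *host, const char *triple,
                             const char *const *archs, size_t n_archs,
                             char out[][TRIPLE_LEN], size_t max_out) {
    assert(host != NULL && out != NULL);
    const char *base = triple ? triple : host;
    size_t n = 0;

    /* (1a), (2a), (2b): the base triple itself is a target.  In (1b) the
     * host triple is only a template and is not a target on its own. */
    if ((triple != NULL || n_archs == 0) && n < max_out)
        snprintf(out[n++], TRIPLE_LEN, "%s", base);

    /* (1b), (2b): one triple per -arch, substituted into the base triple. */
    for (size_t i = 0; i < n_archs && n < max_out; ++i)
        substitute_arch(base, archs[i], out[n++]);
    return n;
}
```

When a -triple is given, out[0] is the primary target and the remaining
entries are the secondary targets.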
//===---------------------------------------------------------------------===//

The 'portability' model in clang is sufficient to catch translation units (or
their parts) that are not portable, but it doesn't help if the system headers
are non-portable and not fixed.  An alternative model that would be easy to use
is a 'tainting' scheme.  Consider:

int32_t
OSHostByteOrder(void) {
#if defined(__LITTLE_ENDIAN__)
    return OSLittleEndian;
#elif defined(__BIG_ENDIAN__)
    return OSBigEndian;
#else
    return OSUnknownByteOrder;
#endif
}

It would be trivial to mark 'OSHostByteOrder' as being non-portable (tainted)
instead of marking the entire translation unit.  Then, if OSHostByteOrder is
never called/used by the current translation unit, the t-u wouldn't be marked
non-portable.  However, there is no good way to handle stuff like:

extern int X, Y;

#ifndef __POWERPC__
#define X Y
#endif

int bar() { return X; }

When compiling for powerpc, the #define is skipped, so the compiler doesn't
know that bar uses a #define that is set on some other target.  In practice,
limited cases could be handled by scanning the skipped region of a #if, but the
fully general case cannot be implemented efficiently.  In this case, for
example, the #define in the protected region could be turned into either a
#define_target or #define_other_target as appropriate.  The harder case is code
like this (from OSByteOrder.h):

  #if (defined(__ppc__) || defined(__ppc64__))
  #include <libkern/ppc/OSByteOrder.h>
  #elif (defined(__i386__) || defined(__x86_64__))
  #include <libkern/i386/OSByteOrder.h>
  #else
  #include <libkern/machine/OSByteOrder.h>
  #endif

The realistic way to fix this is by having an initial #ifdef __llvm__ that
defines its contents in terms of the llvm bswap intrinsics.  Other things should
be handled on a case-by-case basis.

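A minimal sketch of that idea follows, assuming the compiler exposes the
__builtin_bswap32 builtin (both Clang and GCC do).  swap_u32 is a hypothetical
helper for illustration, not the real OSByteOrder.h interface.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical target-independent byte swap.  With an LLVM- or GCC-based
 * compiler the builtin is used and no per-architecture header is needed;
 * otherwise a plain shift-and-mask fallback is used. */
static inline uint32_t swap_u32(uint32_t v) {
#if defined(__llvm__) || defined(__GNUC__)
    return __builtin_bswap32(v);
#else
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
#endif
}

/* Small self-check that can be called from a test or at startup. */
static void swap_u32_selfcheck(void) {
    assert(swap_u32(0x11223344u) == 0x44332211u);
    assert(swap_u32(swap_u32(0xDEADBEEFu)) == 0xDEADBEEFu);
}
```

Because neither branch tests a target macro such as __ppc__ or __i386__, a
header written this way would not need to be tainted.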

We probably have to do something smarter like this in the future.  The C++
header <limits> contains a lot of code like this:

   static const int digits10 = __LDBL_DIG__;
   static const int min_exponent = __LDBL_MIN_EXP__;
   static const int min_exponent10 = __LDBL_MIN_10_EXP__;
   static const float_denorm_style has_denorm
     = bool(__LDBL_DENORM_MIN__) ? denorm_present : denorm_absent;

 ... since this isn't being used in an #ifdef, it should be easy enough to taint
 the decls for these static members.

/usr/include/sys/cdefs.h contains stuff like this:

   #if defined(__ppc__)
   #  if defined(__LDBL_MANT_DIG__) && defined(__DBL_MANT_DIG__) && \
          __LDBL_MANT_DIG__ > __DBL_MANT_DIG__
   #    if __ENVIRONMENT_MAC_OS_X_VERSION_MIN_REQUIRED__-0 < 1040
   #      define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBLStub")
   #    else
   #      define __DARWIN_LDBL_COMPAT(x) __asm("_" __STRING(x) "$LDBL128")
   #    endif
   #    define __DARWIN_LDBL_COMPAT2(x) __asm("_" __STRING(x) "$LDBL128")
   #    define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0
   #  else
   #    define __DARWIN_LDBL_COMPAT(x) /* nothing */
   #    define __DARWIN_LDBL_COMPAT2(x) /* nothing */
   #    define __DARWIN_LONG_DOUBLE_IS_DOUBLE 1
   #  endif
   #elif defined(__i386__) || defined(__ppc64__) || defined(__x86_64__)
   #  define __DARWIN_LDBL_COMPAT(x) /* nothing */
   #  define __DARWIN_LDBL_COMPAT2(x) /* nothing */
   #  define __DARWIN_LONG_DOUBLE_IS_DOUBLE 0
   #else
   #  error Unknown architecture
   #endif

An ideal way to solve this issue is to mark __DARWIN_LDBL_COMPAT /
__DARWIN_LDBL_COMPAT2 / __DARWIN_LONG_DOUBLE_IS_DOUBLE as being non-portable
because they depend on non-portable macros.  In practice though, this may end
up being a serious problem: every use of printf will mark the translation unit
non-portable if targeting ppc32 and something else.

//===---------------------------------------------------------------------===//