Merge pull request #354 from erozenfeld/ReaderDoc

Document describing llilc reader.
Eugene Rozenfeld 2015-03-25 14:47:51 -07:00
Parent 64641a5412 fd8d3f97cd
Commit 51adef89d8
1 changed file: 292 additions and 0 deletions


@ -0,0 +1,292 @@
LLILC Reader
============
Introduction
------------
The LLILC reader is part of the LLILC JIT and is responsible for translating
MSIL instructions into LLVM IR. The semantics of MSIL instructions are
described in [ECMA-335](http://www.ecma-international.org/publications/files/ECMA-ST/ECMA-335.pdf).
The reader operates in two passes: the first pass builds LLVM basic
blocks and the second pass translates the instructions. The resulting
LLVM IR is passed back to the JIT driver for further LLVM processing.

The reader is provided a pointer to an instance of the ICorJitInfo
interface. It's a rich interface implemented by the CoreCLR Execution
Engine. The reader calls methods of that interface to resolve MSIL
tokens, get information about types, fields, code locations and much
more.

Main Classes
------------
The two main classes comprising the reader are ReaderBase and GenIR.
GenIR derives from ReaderBase. ReaderBase encapsulates MSIL processing
and is IR-agnostic. It operates on opaque classes such as IRNode,
FlowGraphNode, FlowGraphEdgeList, EHRegion. GenIR implements
LLVM-specific functionality required by ReaderBase. This separation
allows us to decouple MSIL processing from IR creation and makes the
code more maintainable and easier to evolve. A legacy JIT was
implemented using the same ReaderBase and a different implementation of
GenIR.

Reader Driver
-------------
The main driver for the reader is ReaderBase::msilToIR. The driver has
the steps below:
- Execute [Pre-pass](#pre-pass) to allow the client to do initialization
- Get special debugger sequence points if necessary (not yet
implemented). This is not required for correct code execution.
- Create EH region tree and set EH info (more details in [Exception
Handling in the LLILC JIT](https://github.com/dotnet/llilc/blob/master/Documentation/llilc-jit-eh.md))
- Store the generics context on the stack and report its location to
the EE if necessary
- Build flow graph in [First pass](#first-pass)
- Execute [Second pass](#second-pass) that translates MSIL instructions
- Remove unused flow graph nodes
- Execute [Post-pass](#post-pass) to allow the client a chance to finish up
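
The split between an IR-agnostic driver and an IR-specific client can be illustrated with a small template-method sketch. This is only a model of the step ordering listed above, not LLILC code: the class names `ReaderBaseSketch` and `GenIRSketch`, the method names, and the logged strings are all hypothetical, and the unimplemented debugger sequence point step is left out.

```python
from abc import ABC, abstractmethod


class ReaderBaseSketch(ABC):
    """Hypothetical stand-in for ReaderBase: owns the step order, knows nothing about the IR."""

    def __init__(self):
        self.log = []  # records the order in which the steps run

    def msil_to_ir(self):
        # The driver fixes the sequence; the client only fills in the hooks.
        self.reader_pre_pass()
        self.log.append("create EH region tree")
        self.log.append("store generics context")
        self.log.append("first pass: build flow graph")
        self.log.append("second pass: translate MSIL")
        self.log.append("remove unused flow graph nodes")
        self.reader_post_pass()
        return self.log

    @abstractmethod
    def reader_pre_pass(self):
        """Client initialization hook."""

    @abstractmethod
    def reader_post_pass(self):
        """Client cleanup hook."""


class GenIRSketch(ReaderBaseSketch):
    """Hypothetical stand-in for GenIR: supplies the IR-specific hooks."""

    def reader_pre_pass(self):
        self.log.append("pre-pass: client initialization")

    def reader_post_pass(self):
        self.log.append("post-pass: client cleanup")


steps = GenIRSketch().msil_to_ir()
```

A different client could reuse the same driver by subclassing `ReaderBaseSketch` and providing its own hooks, which is the kind of reuse the ReaderBase/GenIR separation is meant to allow.
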

Pre-pass
--------
An instance of GenIR translates a single function. GenIR::readerPrePass
is responsible for initial setup. The steps it performs:
- Initialize target pointer size
- Create an llvm::Function object
- Create an entry block
- Generate alloca instructions for locals and parameters. We do that
to simplify IR creation. LLVM requires all register values to be in
SSA form but does not require memory objects to be in SSA form. We
rely on LLVM's mem2reg pass to promote these alloca instructions to
registers and construct SSA. This technique is described in section
7.3 of [this tutorial](http://llvm.org/docs/tutorial/LangImpl7.html).
- Check whether the function has features that the reader hasn't
implemented yet

First Pass
----------
The first pass is responsible for building LLVM basic blocks. The only instructions
that are inserted are terminating branches and switches. Blocks that end with
returns are empty after the first pass. That effectively creates the flow graph
of the function. fgBuildBasicBlocksFromBytes is the main driver. The two main
steps are described below.
### Initial Basic Block Build-up
The first step is reading byte codes and creating basic blocks based on
branches, switches, and returns. The targets of branches are temporary
blocks. The MSIL offsets corresponding to these temporary branch target
blocks are saved in NodeOffsetListArray.
### Branch Target Adjustment
Once the initial set of basic blocks is built, the algorithm finds the
real blocks corresponding to the branch target MSIL offsets and replaces
temporary target blocks with the real ones. This may involve splitting a
basic block if a target of a branch has an offset that's in the middle of
the block.
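
The two steps can be modeled on a toy instruction stream. The sketch below is a simplification under stated assumptions: instructions are hypothetical `(offset, kind, targets)` tuples rather than MSIL bytes, blocks are represented only by their start offsets, and `pending_targets` plays the role that NodeOffsetListArray plays in the reader. The function name `build_block_starts` is made up for this example.

```python
import bisect


def build_block_starts(instrs):
    """Return the sorted start offsets of the basic blocks.

    instrs: list of (offset, kind, targets) tuples sorted by offset, where
    kind is 'plain', 'branch', or 'ret' (a hypothetical encoding).
    """
    starts = [instrs[0][0]]
    pending_targets = []

    # Step 1: end a block after every branch or return and remember
    # the branch target offsets for later resolution.
    for i, (_offset, kind, targets) in enumerate(instrs):
        if kind in ("branch", "ret"):
            pending_targets.extend(targets)
            if i + 1 < len(instrs):
                starts.append(instrs[i + 1][0])

    # Step 2: resolve each target. If no block starts at that offset,
    # the target falls inside an existing block, so split that block.
    for target in pending_targets:
        pos = bisect.bisect_left(starts, target)
        if pos == len(starts) or starts[pos] != target:
            starts.insert(pos, target)
    return starts


instrs = [
    (0, "plain", []),
    (1, "plain", []),
    (2, "branch", [6]),  # conditional branch to offset 6, falls through to 4
    (4, "plain", []),
    (5, "plain", []),
    (6, "plain", []),
    (7, "ret", []),
]
block_starts = build_block_starts(instrs)
```

Here step 1 produces blocks starting at offsets 0 and 4, and step 2 splits the second block at offset 6 because the branch targets the middle of it.
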

Second Pass
-----------
In the second pass the reader walks the flow graph in depth-first
preorder (starting with the head block) and translates MSIL instructions
for each block. Walking the flow graph in this order allows the reader
to identify unused blocks that are then removed. No new control flow is
introduced in this pass except for exception checks (null checks, bounds
checks, etc.) and conditional helper calls. Extra care should be taken with
any changes to the control flow graph in the second pass to make sure they don't
interfere with the dominance computation we do for class initialization.
ReaderBase::readBytesForFlowGraphNode\_Helper is the method that
iterates over basic block bytes.
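
The traversal order and the way it exposes unused blocks can be shown with a short sketch. It assumes the flow graph is a plain dictionary from block name to successor list; the block names and the function `preorder_walk` are hypothetical and do not correspond to reader APIs.

```python
def preorder_walk(successors, head):
    """Visit blocks in depth-first preorder starting at the head block."""
    order = []
    seen = set()

    def visit(block):
        seen.add(block)
        order.append(block)  # the reader would translate this block's MSIL here
        for succ in successors.get(block, []):
            if succ not in seen:
                visit(succ)

    visit(head)
    return order


flow_graph = {
    "head": ["b1", "b2"],
    "b1": ["exit"],
    "b2": ["exit"],
    "dead": ["exit"],  # no path from the head block reaches this one
    "exit": [],
}
visited = preorder_walk(flow_graph, "head")
unused = set(flow_graph) - set(visited)
```

Any block the walk never reaches, such as `dead` in this example, is a candidate for removal after the pass.
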
### Type Translation
GenIR::getType is the main entry point for translating CorInfoTypes to
LLVM Types.
- The translation of primitive types is straightforward. All
CorInfoType primitive types have direct equivalents in the LLVM type
system.
- nativeint and nativeuint are represented as IntegerType with the
target pointer size.
- We use addressspace to distinguish managed pointers (interior or object)
from unmanaged ones. Addressspace 0 is used for unmanaged pointers and
addressspace 1 is used for managed pointers (interior or object).
GenIR::getUnmanagedPointerType and GenIR::getManagedPointerType should be
used for creating pointers.
- We intend to use LLVM types to recover information about GC pointers
in value types for GC info.
- We represent value classes as LLVM Structs and reference classes as
managed pointers to LLVM Structs.
- We maintain two maps for ensuring that each class corresponds to a
single LLVM Type: ClassTypeMap and ArrayTypeMap. ClassTypeMap is
indexed by CORINFO\_CLASS\_HANDLE and is used for non-array types.
ArrayTypeMap is indexed by `<element type, element handle, array
rank>` tuple. The reason for that is that two different
CORINFO\_CLASS\_HANDLEs can identify the same array: the actual
array handle and the handle for its MethodTable.
- We construct LLVM Structs with accurate information about fields
including vtable slot for objects and struct padding. This allows us
to use struct GEP instructions for accessing fields.
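
The role of the two maps can be sketched as a pair of caches. This is a model only: the class `TypeCacheSketch`, the string handles, and the `SimpleNamespace` objects standing in for LLVM types are assumptions made for illustration, not the actual GenIR data structures.

```python
from types import SimpleNamespace


class TypeCacheSketch:
    """Hypothetical model of ClassTypeMap and ArrayTypeMap."""

    def __init__(self):
        self.class_type_map = {}  # class handle -> type
        self.array_type_map = {}  # (element type, element handle, rank) -> type

    def get_class_type(self, class_handle):
        # Non-array types are keyed directly by their handle.
        if class_handle not in self.class_type_map:
            self.class_type_map[class_handle] = SimpleNamespace(name=f"struct.{class_handle}")
        return self.class_type_map[class_handle]

    def get_array_type(self, element_type, element_handle, rank):
        # Arrays are keyed by what they contain, not by the array handle,
        # so two handles for the same array resolve to a single type.
        key = (element_type, element_handle, rank)
        if key not in self.array_type_map:
            self.array_type_map[key] = SimpleNamespace(name=f"array.{element_handle}.{rank}")
        return self.array_type_map[key]


cache = TypeCacheSketch()
first = cache.get_array_type("CLASS", "StringHandle", 1)
second = cache.get_array_type("CLASS", "StringHandle", 1)
matrix = cache.get_array_type("CLASS", "StringHandle", 2)
```

Keying arrays by the tuple means a lookup made through either of the two handles lands on the same entry, as long as both describe the same element type, element handle, and rank.
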
### MSIL Instruction Translation
GenIR is responsible for translating MSIL instructions to LLVM
instructions. It uses IRBuilder APIs.
The instructions currently implemented:
- Constant loading (ldc variants)
- Indirect loading (ldind variants)
- Indirect storing (stind variants)
- Method argument loading (ldarg variants)
- Method argument address loading (ldarga variants)
- Method argument storing (starg variants)
- Local variable loading (ldloc variants)
- Local variable reference loading (ldloca variants)
- Local variable storing (stloc variants)
- Arithmetical instructions (add, sub, mul, div, rem, neg and their
variants)
- Overflow arithmetical instructions (add.ovf, sub.ovf, mul.ovf and
their variants)
- Bitwise instructions (and, or, xor, not)
- Shift instructions (shl, shr, shr.un)
- Conversion instructions (conv variants)
- Logical condition check instructions (ceq, cgt, clt and their
variants)
- Unconditional branching instructions (br variants)
- Conditional branching instructions (brfalse, brtrue and their
variants)
- Comparative branching instructions (beq, bne, bge, bgt, ble, blt and
their variants)
- The switch instruction
- The ret instruction
- Addressing fields (ldfld, ldsfld, ldflda, ldsflda, stfld, stsfld)
- Manipulating class and value type instances (ldnull, ldstr, newobj,
castclass, isinst, ldtoken, sizeof)
- Vector instructions (newarr, ldlen)
- Calls (call, calli, ldftn, ldvirtftn, calls that
require virtual stub dispatch)
- Method argument list (arglist)
- Stack manipulation (nop, dup, pop)

<a name="Not implemented"></a>The instructions not yet implemented:
- Overflow conversion instructions (conv.ovf variants)
- Local block allocation (localloc)
- Block operations (cpblk, initblk)
- Manipulating class and value type instances (ldobj, stobj, box,
unbox, mkrefany, refanytype, refanyval)
- Calls (jmp, tail calls, constrained virtual calls)
- Debugging breakpoint (break)
### Stack Maintenance
Each MSIL instruction may modify the operand stack associated with its basic
block. If the stack is not empty on exit from the basic block, additional
processing is needed to make sure the block's successors can use the
values on the predecessor's stack. GenIR::maintainOperandStack is
responsible for setting up the successors' operand stacks in this case. For
successors whose only predecessor is the current block, the algorithm
simply copies the predecessor's stack. For successors that have other
predecessors as well, the algorithm creates PHI instructions in the
successor blocks and pushes them on the successors' operand stacks.
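
The two cases can be modeled with plain dictionaries. In this sketch, values and PHI instructions are just strings, and the way incoming `(value, predecessor)` pairs are attached to each PHI is an assumption about how the merge would be wired up; the function name and all data structures are hypothetical rather than taken from GenIR.

```python
def maintain_operand_stack(block, exit_stack, successors, predecessors, entry_stacks, phis):
    """Propagate a non-empty exit stack of `block` to its successors."""
    if not exit_stack:
        return  # nothing to pass on
    for succ in successors[block]:
        if predecessors[succ] == [block]:
            # Single predecessor: the successor starts with a copy of our stack.
            entry_stacks[succ] = list(exit_stack)
            continue
        if succ not in entry_stacks:
            # Join point seen for the first time: one PHI per stack slot.
            entry_stacks[succ] = [f"phi.{succ}.{i}" for i in range(len(exit_stack))]
            for name in entry_stacks[succ]:
                phis[name] = []
        # Record this block's values as incoming values of the PHIs.
        for name, value in zip(entry_stacks[succ], exit_stack):
            phis[name].append((value, block))


successors = {"A": ["only", "join"], "B": ["join"]}
predecessors = {"only": ["A"], "join": ["A", "B"]}
entry_stacks, phis = {}, {}
maintain_operand_stack("A", ["x"], successors, predecessors, entry_stacks, phis)
maintain_operand_stack("B", ["y"], successors, predecessors, entry_stacks, phis)
```

After both calls, `only` starts with a copy of A's stack, while `join` starts with a single PHI that merges `x` from A and `y` from B.
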

Post-Pass
---------
The post-pass inserts the code necessary for keeping the generic context
alive and cleans up memory used by the reader.

Future Work
-----------
- [Implement remaining MSIL instructions.](#user-content-Not%20implemented)
- [Implement value type support (as parameters, arguments to a callee,
stores, or return values).](https://github.com/dotnet/llilc/issues/279)
- [ReaderBase has some code to enable inlining in the reader. We need
to decide whether we do inlining in the reader or in a subsequent
pass and update the reader accordingly.](https://github.com/dotnet/llilc/issues/239)
- [Generate debug locations for later debug emission.](https://github.com/dotnet/llilc/issues/318)
- Possibly enable a limited set of reader-time optimizations (like
[avoiding redundant class initialization](https://github.com/dotnet/llilc/issues/38),
[using pc-relative call forms](https://github.com/dotnet/llilc/issues/296),
[replacing readonly static field loads with constants](https://github.com/dotnet/llilc/issues/294),
[deferring lowering of certain constructs](https://github.com/dotnet/llilc/issues/292),
[hot/cold options for loading strings](https://github.com/dotnet/llilc/issues/286)).
- [Produce more precise aliasing annotations.](https://github.com/dotnet/llilc/issues/291)
- [Recognize vectorizable types and emit proper vector IR.](https://github.com/dotnet/llilc/issues/323)
- [Support Just My Code.](https://github.com/dotnet/llilc/issues/272)
- [Support synchronized methods.](https://github.com/dotnet/llilc/issues/271)
- [Handle methods with security checks.](https://github.com/dotnet/llilc/issues/301)
- [Support union types.](https://github.com/dotnet/llilc/issues/275)
- [Support volatile operations.](https://github.com/dotnet/llilc/issues/278)
- [Support intrinsics.](https://github.com/dotnet/llilc/issues/281)