update overview and readme

2017-01-16 22:52:14 -08:00 · 2017-01-16 22:52:14 -08:00 · 6f0dd9b851
--- a/Overview.md
+++ b/Overview.md
@@ -1,67 +1,33 @@
 # Overview

-At a high level, the parser accepts source code as an input, and
-produces a syntax tree as an output.
+The syntax tree produced by the parser has two key attributes:
+1. **All source information is held in full fidelity.** This means that the tree contains every piece of
+information found in the source text: every grammatical construct, every lexical token, and everything
+else in between, including whitespace and comments. The syntax tree also represents errors in source code
+when the program is incomplete or malformed, by recording skipped or missing tokens in the tree.
+2. **A syntax tree obtained from the parser is completely round-trippable back to the text it was parsed from.**
+From any syntax node, it is possible to get the text representation of the subtree rooted at that node.
+This means that syntax trees can be used as a way to construct and edit source text.
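
The round-trip property described above can be illustrated with a minimal sketch. `SketchToken` and `SketchNode` are hypothetical stand-ins written for this example, not the parser's real classes; the sketch only assumes that each token stores the whitespace and comments that precede it, and that a trailing zero-width token carries whatever follows the last real token.

```php
<?php
// Hypothetical, simplified stand-ins; these are NOT the parser's real classes.
final class SketchToken
{
    public string $leadingTrivia; // whitespace and comments preceding the token
    public string $text;          // the token's own text (may be empty)

    public function __construct(string $leadingTrivia, string $text)
    {
        $this->leadingTrivia = $leadingTrivia;
        $this->text = $text;
    }

    public function getFullText(): string
    {
        return $this->leadingTrivia . $this->text;
    }
}

final class SketchNode
{
    /** @var array<int, SketchNode|SketchToken> children in source order */
    public array $children;

    public function __construct(array $children)
    {
        $this->children = $children;
    }

    // Concatenating every child in order reproduces the text of this subtree.
    public function getFullText(): string
    {
        $text = '';
        foreach ($this->children as $child) {
            $text .= $child->getFullText();
        }
        return $text;
    }
}

$source = "/* greet */ echo  'hi' ;\n";

$statement = new SketchNode([
    new SketchToken('/* greet */ ', 'echo'),
    new SketchToken('  ', "'hi'"),
    new SketchToken(' ', ';'),
]);

// Assumed end-of-file token: empty text, but it owns the trailing newline.
$tree = new SketchNode([$statement, new SketchToken("\n", '')]);

var_dump($tree->getFullText() === $source);
```

Because no character of the input is dropped, the same concatenation works from any node, which is what makes it possible to edit source text by editing the tree.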

-If you're familiar with Roslyn and TypeScript, many of the concepts presented here will be familiar
-(albeit adapted, to account for the unique runtime characteristics of PHP.)
-
-## Syntax Tree
-A syntax tree is literally a tree data structure, where non-terminal structural 
-elements parent other elements. Each syntax tree is made up of Nodes (represented by circles), 
-Tokens (represented by squares), and trivia (not represented, below, but attached to each Token).
-
-![image](https://cloud.githubusercontent.com/assets/762848/19092929/e10e60aa-8a3d-11e6-8b90-51eabe5d1d8e.png)
-
-Syntax trees have two key attributes.
-
-1. The first attribute is that Syntax trees hold all the source information in full fidelity. 
-This means that the syntax tree contains every piece of information 
-found in the source text, every grammatical construct, every lexical 
-token, and everything else in between including whitespace, comments, 
-and preprocessor directives. For example, each literal mentioned in 
-the source is represented exactly as it was typed. The syntax trees 
-also represent errors in source code when the program is incomplete 
-or malformed, by representing skipped or missing tokens in the syntax tree.
-
-2. This enables the second attribute of syntax trees. A syntax tree obtained 
-from the parser is completely round-trippable back to the text it was parsed 
-from. From any syntax node, it is possible to get the text representation of 
-the sub-tree rooted at that node. This means that syntax trees can be used 
-as a way to construct and edit source text. By creating a tree you have by 
-implication created the equivalent text, and by editing a syntax tree, 
-making a new tree out of changes to an existing tree, you have effectively 
-edited the text.
-
-The syntax tree is composed of Nodes (represented by circles), 
-Tokens (represented by squares), and Trivia (not represented directly, but attached to 
-individual Tokens)
+## Key Concepts
+The **Syntax Tree** produced is literally a tree data structure, where non-terminal structural elements parent other
+elements. Each syntax tree is made up of **Nodes** (non-terminal elements) and
+ **Tokens** (terminal elements).

+Additionally, each Node and Token has associated **Positional Information**, **Errors**, and **Comment + Whitespace Trivia**.

+All trees guarantee a set of **Invariants** - properties of the tree that always hold true, no matter what the
+input. This set of invariants provides a consistent foundation 
+that makes it easier to ensure the tree is "structurally sound", and confidently reason about the tree 
+as we continue to build up our understanding. For instance, one such invariant is that the original text 
+(including whitespace and comments) should always be reproducible from a Node. See [Invariants](Invariants.md)
+for a complete list. 

+## Tree Elements
 ### Nodes
 Syntax nodes are one of the primary elements of syntax trees. These nodes represent 
 syntactic constructs such as declarations, statements, clauses, and expressions. 
-Each category of syntax nodes is represented by a separate class derived from SyntaxNode. 
-The set of node classes is not extensible.
-
-All syntax nodes are non-terminal nodes in the syntax tree, which means they always have 
-other nodes and tokens as children. As a child of another node, each node has a parent node
- that can be accessed through the Parent property. Because nodes and trees are immutable, 
- the parent of a node never changes. The root of the tree has a null parent.
-
-Each node has a ChildNodes method, which returns a list of child nodes in sequential order 
-based on its position in the source text. This list does not contain tokens. Each node also
-has a collection of Descendant methods - such as DescendantNodes, DescendantTokens, or 
-DescendantTrivia - that represent a list of all the nodes, tokens, or trivia that exist in 
-the sub-tree rooted by that node.
-
-In addition, each syntax node subclass exposes all the same children through 
-properties. For example, a BinaryExpressionSyntax node class has three additional properties 
-specific to binary operators: Left, OperatorToken, and Right.
-
-Some syntax nodes have optional children. For example, an IfStatementSyntax has an optional 
-ElseClauseSyntax. If the child is not present, the property returns null.
+Each category of syntax nodes is represented by a separate class derived from `Node`.

 ### Tokens
 Syntax tokens are the terminals of the language grammar, representing the smallest syntactic 
@@ -72,24 +38,23 @@ For efficiency purposes, unlike syntax nodes, there is only one structure for al
 kinds of tokens with a mix of properties that have meaning depending on the kind 
 of token that is being represented.

-### Trivia
-Syntax trivia represent the parts of the source text that are largely insignificant for 
-normal understanding of the code, such as whitespace, comments, and preprocessor directives.
-Because trivia are not part of the normal language syntax and can appear anywhere between 
+### Whitespace and Comment Trivia
+Because whitespace and comment trivia are not part of the normal language syntax and can appear anywhere between 
 any two tokens, they are not included in the syntax tree as a child of a node. Yet, because 
 they are important when implementing a feature like refactoring and to maintain full 
 fidelity with the source text, they do exist as part of the syntax tree.

-You can access trivia by inspecting a token's LeadingTrivia. 
-When source text is parsed, sequences of trivia are associated with tokens. 
+You can access trivia by inspecting a token's LeadingWhitespaceAndComments. When source text is parsed,
+sequences of trivia are associated with tokens. 

-### Kinds
-Each node, token, or trivia has a RawKind property (represented by a numeric literal), 
-that identifies the exact syntax element represented.
+### Positional Information
+Each node, token, or trivia knows its position within the source text and the number of 
+characters it consists of. A text position is represented as a 32-bit integer, which is 
+a zero-based byte index into the string. The width corresponds to a count of characters,
+represented as an integer. A width of zero refers to a location between two characters.

-The RawKind property allows for easy disambiguation of syntax node types that share the 
-same node class. For tokens and trivia, this property is the only way to distinguish 
-one type of element from another.
+For efficiency purposes, the position refers to the absolute position within the text, 
+and a helper function is available if you require Line/Column information.
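
As a rough illustration of how an absolute position maps to line and column information, here is a minimal sketch. `offsetToLineColumn` is a hypothetical function written for this example, not the parser's own helper; it assumes `\n` line endings and reports zero-based lines and byte-based columns.

```php
<?php
/**
 * Hypothetical helper (not the parser's API): converts an absolute,
 * zero-based byte offset into a zero-based [line, column] pair.
 */
function offsetToLineColumn(string $text, int $offset): array
{
    // Clamp the offset so out-of-range positions map to the start or end.
    $offset = max(0, min($offset, strlen($text)));
    $before = substr($text, 0, $offset);

    // The line number is the count of newlines before the offset.
    $line = substr_count($before, "\n");

    // The column is the distance in bytes from the last newline, if any.
    $lastNewline = strrpos($before, "\n");
    $column = $lastNewline === false ? $offset : $offset - $lastNewline - 1;

    return [$line, $column];
}

$text = "<?php\necho 1;\n";
[$line, $column] = offsetToLineColumn($text, 11); // offset of the literal `1`
echo "line $line, column $column\n";
```

Storing only the absolute offset keeps each element small; line and column are derived on demand, at the cost of scanning the text up to that offset.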

 ### Errors
 Even when the source text contains syntax errors, a full syntax tree that is round-trippable
@@ -101,23 +66,12 @@ insert a missing token into the syntax tree in the location that the token was e
 A missing token represents the actual token that was expected, but it has an empty span.

 Second, the parser may skip tokens until it finds one where it can continue parsing. 
-In this case, the skipped tokens that were skipped are attached as a trivia node with 
-the kind SkippedTokens.
+In this case, the tokens that were skipped are attached as skipped tokens in the tree.

 Note that the parser produces trees in a tolerant fashion, and will not produce errors for
 all incorrect constructs (e.g. including a non-constant expression as the default value of
 a method parameter). Instead, it attaches these errors on a post-parse walk of the tree.

-### Positional Information
-Each node, token, or trivia knows its position within the source text and the number of 
-characters it consists of. A text position is represented as a 32-bit integer, which is 
-a zero-based Unicode character index. A TextSpan object is the beginning position and a 
-count of characters, both represented as integers. If TextSpan has a zero length, it refers
-to a location between two characters.
-
-The position refers to the absolute position within the text, but a helper function is available
-if you require Line/Column information. 
-
 ## Next Steps
-Check out the [Documentation](GettingStarted.md) section for more information on how consume
+Check out the [Readme](Readme.md) for more information on how to consume
 the parser, or the [How It Works](HowItWorks.md) section if you want to dive deeper into the implementation.
--- a/README.md
+++ b/README.md
@@ -54,7 +54,9 @@ foreach ($astNode->getDescendantNodes() as $descendant) {
 }
 ```

-> Note: The API is still a work in progress, and will evolve according to user feedback.
+> Note: [the API](ApiDocumentation.md) is not yet finalized, so please file issues to let us know what functionality you want exposed,
+and we'll see what we can do! Please also file bugs for any unexpected behavior in the parse tree. We're still
+in our early stages, and any feedback you have is much appreciated :smiley:.

 ## Design Goals
 * Error tolerant design - in IDE scenarios, code is, by definition, incomplete. In the case that invalid code is entered, the
@@ -111,8 +113,6 @@ own machine to see for yourself.
 ## Learn more
 **:dart: [Design Goals](#design-goals)** - learn about the design goals of the project (features, performance metrics, and more).

-**:sunrise_over_mountains: [Syntax Overview](Overview.md)** - learn about the composition and key properties of the syntax tree.
-
 **:seedling: [Documentation](GettingStarted.md#getting-started)** - learn how to reference the parser from your project, and how to perform
 operations on the AST to answer questions about your code.