cadl/docs at 44faaadf014a05b190ee1289f7de07781e49bedb - cadl

tutorial.md	Implement basic parser error recovery (#453)	2021-04-20 09:04:45 -07:00

Nick Guerrera a6d9bc552c Implement basic parser error recovery (#453)

With this change, the parser no longer throws when it encounters an
error, but continues on to report subsequent errors to the user as
well.

For now, however, evalADLScript still throws if there are any parse
errors. More work is needed in the parser to represent which nodes
have the errors and so forth before we can meaningfully analyze a
syntax tree for source that had errors.

Our default response to a token that doesn't match our expectation is
to insert a matching token before the offending token. This is
effectively what is happening wherever we have parseExpected() without
checking the return value. Also when we expect an identifier and do
not find one, we insert an identifier with a unique yet unspeakable
name. In fact, anywhere we hit an invalid expression, we insert one of
these identifiers.
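
As an illustration of this insertion strategy, here is a minimal
sketch. It is not the actual parser code: the `SketchParser` class,
the token shape, and the `<missing identifier>` naming scheme are
assumptions made up for this example. Only the parseExpected() name
comes from the change itself.

```typescript
// Hypothetical token model, for illustration only.
type TokenKind = "identifier" | "colon" | "semicolon" | "closeBracket" | "eof";

interface Token {
  kind: TokenKind;
  text: string;
  pos: number;
}

class SketchParser {
  readonly errors: string[] = [];
  private index = 0;
  private missingCount = 0;

  // Assumption: `tokens` is non-empty and ends with an "eof" token.
  constructor(private readonly tokens: Token[]) {}

  private current(): Token {
    return this.tokens[this.index];
  }

  private advance(): void {
    // Never move past the trailing "eof" token.
    if (this.index < this.tokens.length - 1) {
      this.index++;
    }
  }

  // Consume `kind` if it is next. Otherwise report an error and leave the
  // offending token in place, which has the same effect as inserting the
  // expected token in front of it.
  parseExpected(kind: TokenKind): boolean {
    if (this.current().kind === kind) {
      this.advance();
      return true;
    }
    this.errors.push(`'${kind}' expected at position ${this.current().pos}`);
    return false;
  }

  // Return the identifier text, or a unique synthetic name. The name is
  // assumed to be unspeakable because source identifiers cannot contain
  // angle brackets or spaces.
  parseIdentifier(): string {
    const token = this.current();
    if (token.kind === "identifier") {
      this.advance();
      return token.text;
    }
    this.errors.push(`Identifier expected at position ${token.pos}`);
    return `<missing identifier>${this.missingCount++}`;
  }
}

// Input is just `]`: both calls fail, and neither consumes the bracket.
const sketchParser = new SketchParser([
  { kind: "closeBracket", text: "]", pos: 0 },
  { kind: "eof", text: "", pos: 1 },
]);
const propertyName = sketchParser.parseIdentifier(); // synthetic name
sketchParser.parseExpected("colon"); // return value ignored: colon "inserted"
```

Note that neither failed call advances the parser, which is exactly
why this strategy needs a separate guard against getting stuck.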

Statements are easier to correct than expressions, and our approach
there is different. Every statement in the language begins with a
reserved word, an at-sign, or a semicolon. If the leading token of a
statement is none of these, we report an invalid statement starting at
that token and ending immediately before the next token that is one of
these.
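
A rough sketch of that statement-level recovery follows. The token
kinds, the `skipInvalidStatement` name, and the returned shape are
hypothetical and chosen for this example; the real parser is
structured differently.

```typescript
// Hypothetical token model, for illustration only.
type StatementTokenKind =
  | "keyword"
  | "at"
  | "semicolon"
  | "identifier"
  | "punctuation"
  | "eof";

interface StatementToken {
  kind: StatementTokenKind;
  text: string;
  pos: number;
}

interface InvalidStatementError {
  start: number; // position of the first offending token
  end: number; // position of the next statement start (exclusive)
}

function isStatementStart(token: StatementToken): boolean {
  return (
    token.kind === "keyword" || token.kind === "at" || token.kind === "semicolon"
  );
}

// Assumption: `tokens` ends with an "eof" token and `tokens[index]` is the
// token that failed to start a statement.
function skipInvalidStatement(
  tokens: StatementToken[],
  index: number
): { next: number; error: InvalidStatementError } {
  const lastIndex = tokens.length - 1;
  const start = tokens[index].pos;
  // Always step past the offending token so the caller makes progress.
  let next = Math.min(index + 1, lastIndex);
  while (next < lastIndex && !isStatementStart(tokens[next])) {
    next++;
  }
  return { next, error: { start, end: tokens[next].pos } };
}

// Tokens for the source text `foo bar model M`.
const statementTokens: StatementToken[] = [
  { kind: "identifier", text: "foo", pos: 0 },
  { kind: "identifier", text: "bar", pos: 4 },
  { kind: "keyword", text: "model", pos: 8 },
  { kind: "identifier", text: "M", pos: 14 },
  { kind: "eof", text: "", pos: 15 },
];
const recovery = skipInvalidStatement(statementTokens, 0);
// recovery.next is 2 (the `model` keyword); the error spans positions 0 to 8.
```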

There are also case-by-case refinements to this insertion
strategy. For example, we replace errant semicolons with commas in
comma-only delimited lists rather than inserting semicolons in front
of the comma.
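
To make the semicolon-for-comma refinement concrete, here is a
deliberately simplified sketch. It works on plain strings rather than
real tokens, does not validate the items themselves, and the
`parseCommaList` name is invented for this example.

```typescript
interface CommaListResult {
  items: string[];
  errors: string[];
}

// Parses `item , item , item`. A `;` in delimiter position is reported and
// then consumed as if it were a `,` (replacement). Any other token in
// delimiter position is reported and left alone (a `,` is "inserted").
function parseCommaList(tokens: string[]): CommaListResult {
  const items: string[] = [];
  const errors: string[] = [];
  let i = 0;
  while (i < tokens.length) {
    items.push(tokens[i]);
    i++;
    if (i >= tokens.length) {
      break;
    }
    if (tokens[i] === ",") {
      i++;
    } else if (tokens[i] === ";") {
      errors.push(`',' expected at token ${i}`);
      i++; // replace the errant semicolon with a comma
    } else {
      errors.push(`',' expected at token ${i}`);
      // insert a comma: do not consume, the token becomes the next item
    }
  }
  return { items, errors };
}

const commaListResult = parseCommaList(["a", ";", "b", ",", "c"]);
// Three items are recovered and one error is reported for the `;`.
```

Replacing rather than inserting yields a single error here. Inserting
a comma in front of the semicolon would leave the semicolon to be
misread as the next item.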

In other cases, we take the approach of parsing a grammar that is a
superset of the language specification, augmented with known common
errors. For example, we parse decorators in many more places than
actually allowed (and signal an error still, of course). We do the
same for import statements inside namespaces.

Over time, I expect that we'll need more of these deliberate, one-off
corrections, but this change already performs relatively well and
provides the foundation for such improvements.

A major challenge with the approach of correction by inserting tokens
is that it can hang the parser such that it keeps inserting tokens
without making forward progress. To mitigate the risk of such bugs,
all list constructs that are susceptible to this are driven by the
same parseList() routine. This routine has an escape hatch in the loop
where it bails and assumes we've hit a bad representation for the end
of the list if any loop iteration fails to make progress.

A trivial example of a construct that would hit this without this
check is `model M { ]`. The parser proceeds as follows in that case:

1. Parse model keyword: OK.
2. Parse model name (M): OK.
3. Parse open brace: OK.
4. Expect property name, see close bracket: ERROR, insert synthetic
   identifier for property name.
5. Expect colon, see close bracket: ERROR, insert colon.
6. Expect property type, see close bracket: ERROR, insert synthetic
   identifier for property type.
7. Expect semicolon, see close bracket: ERROR, insert semicolon.
8. Observe that the position has not advanced after a full loop
   iteration of parsing properties: ERROR, replace with close brace,
   exit loop.

(Note that everywhere I say "insert" or "replace" here, there is no
literal array of tokens that we are mutating. We are just taking the
code paths we would take if those edits were made to the source. The
inserting and replacing is an implicit "behave as though", which is
really an implementation detail and not part of the logical
algorithm.)

Without the vital step 8, we could convince ourselves that we've
parsed a real property and try to move on to the next one, and do this
over and over again, creating infinitely many synthetically named
properties of synthetically named types!
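
The walkthrough above can be sketched in code as follows. This is a
simplified model, not the real parseList(): the `ListSketchParser`
class, its token kinds, and the choice to discard the synthetic item
when bailing out are all assumptions made for this example.

```typescript
type ListTokenKind =
  | "model" | "identifier" | "openBrace" | "closeBrace"
  | "closeBracket" | "colon" | "semicolon" | "eof";

interface ListToken { kind: ListTokenKind; text: string; }
interface Property { name: string; type: string; }

class ListSketchParser {
  pos = 0;
  readonly errors: string[] = [];
  private missingCount = 0;

  // Assumption: `tokens` ends with an "eof" token.
  constructor(private readonly tokens: ListToken[]) {}

  peek(): ListTokenKind { return this.tokens[this.pos].kind; }

  advance(): void {
    if (this.pos < this.tokens.length - 1) this.pos++;
  }

  expect(kind: ListTokenKind): boolean {
    if (this.peek() === kind) { this.advance(); return true; }
    this.errors.push(`'${kind}' expected`); // behave as though inserted
    return false;
  }

  identifier(): string {
    if (this.peek() === "identifier") {
      const text = this.tokens[this.pos].text;
      this.advance();
      return text;
    }
    this.errors.push("Identifier expected");
    return `<missing identifier>${this.missingCount++}`;
  }

  parseList<Item>(close: ListTokenKind, parseItem: () => Item): Item[] {
    const items: Item[] = [];
    while (this.peek() !== close && this.peek() !== "eof") {
      const before = this.pos;
      const item = parseItem();
      if (this.pos === before) {
        // Escape hatch (step 8): nothing was consumed, so treat the
        // offending token as a bad spelling of `close` and stop.
        this.errors.push(`'${close}' expected`);
        this.advance();
        return items; // assumption: the all-synthetic item is dropped
      }
      items.push(item);
    }
    this.expect(close);
    return items;
  }
}

function parseModel(p: ListSketchParser): { name: string; properties: Property[] } {
  p.expect("model");
  const name = p.identifier();
  p.expect("openBrace");
  const properties = p.parseList("closeBrace", () => {
    const propName = p.identifier();
    p.expect("colon");
    const type = p.identifier();
    p.expect("semicolon");
    return { name: propName, type };
  });
  return { name, properties };
}

// Tokens for `model M { ]`.
const brokenParser = new ListSketchParser([
  { kind: "model", text: "model" },
  { kind: "identifier", text: "M" },
  { kind: "openBrace", text: "{" },
  { kind: "closeBracket", text: "]" },
  { kind: "eof", text: "" },
]);
const brokenModel = parseModel(brokenParser);
// Terminates with no properties; this sketch records one error per
// failed expectation in steps 4 through 8.
```

Keeping the progress check inside the one shared list routine means
each individual item parser can stay naive about termination.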

This change also adjusts many of our error messages, borrowing from
the TypeScript compiler's terse tone for mundane errors.

It also fixes various issues with imprecise or sub-optimal squiggly
locations for various errors.

There are also some ADL team developer productivity improvements in
this change...

The per-test output in Mocha Test Explorer in VS Code now shows the
input source code, the resulting syntax tree, and all parse
diagnostics formatted nicely as the CLI compiler would. The syntax
tree JSON also has boilerplate default-empty things elided and the
start and end positions augmented with line and column number.

Negative parse test cases must now provide regex(es) to match against
the reported diagnostics.
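
For a sense of what such a check might look like, here is a small
sketch. The `unmatchedPatterns` helper and the diagnostic strings are
hypothetical and are not the actual test harness code.

```typescript
// Returns the expected patterns that no reported diagnostic matched.
// Patterns should not use the `g` flag, because RegExp.prototype.test is
// stateful for global regexes.
function unmatchedPatterns(diagnostics: string[], expected: RegExp[]): RegExp[] {
  return expected.filter(
    (pattern) => !diagnostics.some((diagnostic) => pattern.test(diagnostic))
  );
}

// Hypothetical diagnostics reported for some invalid source text.
const reportedDiagnostics = [
  "error: ';' expected.",
  "error: Identifier expected.",
];
const missingMatches = unmatchedPatterns(reportedDiagnostics, [
  /';' expected/,
  /Identifier expected/,
]);
// An empty result means every expected diagnostic was reported.
```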

Stack traces in Mocha Test Explorer will now be reported up to 50
frames rather than 10, making it easier to diagnose a stack overflow
in the parser's recursive descent.

A new compilerAssert function is added for asserting something that
should never happen in the compiler. It takes a condition, a message,
and an optional source node. If the condition is not met, it throws an
AssertionError with the message. If a source node is provided, the
message is augmented with "occurred while compiling (file) near line
(X) and column (Y)". This is used in only a couple of places right
now. I did not yet scrub the existing throws that could benefit from
this.
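
A minimal sketch of a function with that contract is shown below. The
`SourceLocation` shape, the local `AssertionError` stand-in class, and
the `main.adl` file name are assumptions for this example. The real
function takes a syntax node and its exact signature may differ.

```typescript
// Hypothetical stand-in for a syntax node that already knows its location.
interface SourceLocation {
  file: string;
  line: number;
  column: number;
}

// Local stand-in so the sketch is self-contained.
class AssertionError extends Error {}

function compilerAssert(
  condition: unknown,
  message: string,
  target?: SourceLocation
): asserts condition {
  if (condition) {
    return;
  }
  if (target) {
    message +=
      ` occurred while compiling ${target.file}` +
      ` near line ${target.line} and column ${target.column}`;
  }
  throw new AssertionError(message);
}

let caughtMessage = "";
try {
  compilerAssert(false, "Unexpected node kind", {
    file: "main.adl",
    line: 3,
    column: 7,
  });
} catch (e) {
  caughtMessage = (e as Error).message;
}
// caughtMessage now includes the file, line, and column of the node.
```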

If a DiagnosticError or AggregateError occurs in a test, the formatted
diagnostics or inner stack traces are included in the Mocha Test
Explorer per-test output. This is done because tests don't have the
CLI catch handler that has to take special steps for these special
errors.

Note that for all Mocha Test Explorer output improvements above, if
you prefer to run tests on the command line, you can also set
environment variable ADL_VERBOSE_TEST_OUTPUT=true and get all of the
output spewed to the console.

An issue with the typing of `messages` that allowed typos at use
sites is fixed. With the fix, `Message.X` is a `Message`, and if you
hover over X in the IDE, you will see the message code, severity, and
text.
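
One way to get this kind of typing in TypeScript is sketched below. It
is an assumption about the general technique, not a copy of the actual
fix: the `defineMessages` helper, the message names, and the codes are
all invented for this example.

```typescript
interface Message {
  code: number;
  severity: "error" | "warning";
  text: string;
}

// Identity helper: checks every entry against `Message` while preserving
// the exact keys and literal types of the table that is passed in.
function defineMessages<T extends Record<string, Message>>(table: T): T {
  return table;
}

const Message = defineMessages({
  SemicolonExpected: { code: 1, severity: "error", text: "';' expected." },
  IdentifierExpected: { code: 2, severity: "error", text: "Identifier expected." },
} as const);

function formatMessage(message: Message): string {
  return `${message.severity} ${message.code}: ${message.text}`;
}

const formattedMessage = formatMessage(Message.SemicolonExpected);
// A misspelled key such as Message.SemicolnExpected fails to compile,
// because the table has no index signature and only the declared keys.
```

Because `as const` keeps the literal types, an editor that shows
inferred types on hover can display the code, severity, and text of
each entry.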

Finally, there's also a minor correction in the tutorial to account
for parenless decorators having been removed from the language.
