Changing the AST

Making changes to the AST is a common task when working on new F# compiler features or when working on developer tooling.
This document describes the process of making changes to the AST.

The easiest way to modify the AST is to start with the type definitions in SyntaxTree.fsi and SyntaxTree.fs and then let the compiler guide you to the places where you need to make changes. Let's look at an example: we want to extend the AST to include the range of the / symbol in a SynRationalConst.Rational.

There are two solutions to choose from:

  • Add a new field to the Rational union case
  • Add a dedicated trivia type to the union case which contains the new range and maybe move the existing ranges to the trivia type as well

The pros of introducing a dedicated trivia type are:

  • Having the additional information in a separate structure allows it to grow more easily over time. Adding new information to an existing trivia type won't disrupt most FCS consumers.
  • It is clear that it is information that is not relevant for the compilation.

The cons are:

  • It can be a bit excessive to introduce for a single field.
  • The existing AST node might already contain fields that are historically more suited for trivia, but predate the SyntaxTrivia module.

In this example, we'll go with the first solution and add a new field named divRange to the Rational union case as it felt a bit excessive to introduce a new trivia type for a single field.
This is the kind of decision that needs to be made whenever the AST changes.

type SynRationalConst =

    // ...

    | Rational of
        numerator: int32 *
        numeratorRange: range *
        divRange: range *   // our new field
        denominator: int32 *
        denominatorRange: range *
        range: range
    
    // ...
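For comparison, the second option could look roughly like the sketch below. It is only an illustration: the trivia type name SynRationalConstRationalTrivia, its DivRange field, and the simplified range record are assumptions made so the snippet compiles on its own, and the union is reduced to the single case discussed here.

```fsharp
// Stand-in for the compiler's range type, so the sketch compiles on its own.
type range = { StartCol: int; EndCol: int }

// Hypothetical trivia type; this name is not taken from the compiler sources.
type SynRationalConstRationalTrivia =
    { DivRange: range }

// Illustrative version of the union, reduced to the Rational case.
type SynRationalConst =
    | Rational of
        numerator: int32 *
        numeratorRange: range *
        denominator: int32 *
        denominatorRange: range *
        range: range *
        trivia: SynRationalConstRationalTrivia

let mkRange s e = { StartCol = s; EndCol = e }

// 23/42: "23" spans columns 0-2, "/" spans 2-3, "42" spans 3-5 (end exclusive).
let example =
    Rational(23, mkRange 0 2, 42, mkRange 3 5, mkRange 0 5, { DivRange = mkRange 2 3 })
```

With this shape, a later addition would become a new field on the trivia record, so existing patterns over the Rational case would not need to change.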

After modifying SyntaxTree.fsi and SyntaxTree.fs, the compiler will report errors in pars.fsy. If it doesn't, the fsy file wasn't processed by the compilation. In that case, rebuilding FSharp.Compiler.Service.fsproj should help.
pars.fsy is the parser specification of F#, a list of rules that describe how to parse F# code. Don't be scared by the size of the file or the unfamiliar content. It's easier than it looks. The F# compiler uses a parser generator called fsyacc to generate the parser from the specification in pars.fsy. Let's look at the most relevant syntax parts of a .fsy file:

rationalConstant:
  | INT32 INFIX_STAR_DIV_MOD_OP INT32
    { if $2 <> "/" then reportParseErrorAt (rhs parseState 2) (FSComp.SR.parsUnexpectedOperatorForUnitOfMeasure())
      if fst $3 = 0 then reportParseErrorAt (rhs parseState 3) (FSComp.SR.parsIllegalDenominatorForMeasureExponent())
      if (snd $1) || (snd $3) then errorR(Error(FSComp.SR.lexOutsideThirtyTwoBitSigned(), lhs parseState))
      SynRationalConst.Rational(fst $1, rhs parseState 1, fst $3, rhs parseState 3, lhs parseState) }
  | // ...

The first line is the name of the rule, rationalConstant in this case. It's a so-called non-terminal symbol, in contrast to a terminal symbol like INT32 or INFIX_STAR_DIV_MOD_OP. The individual cases of the rule are separated by | and are called productions.

By now, you should be able to see the similarities between an fsyacc rule and the pattern matching you know from F#.
The code between the curly braces is the code that gets executed when the rule is matched and is real F# code. After compilation, it ends up in .\artifacts\obj\FSharp.Compiler.Service\Debug\netstandard2.0\pars.fs, generated by fsyacc.
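To make the analogy concrete, here is a rough F# sketch of the same production written as an ordinary pattern match over a token list. The Token type and the describe function are invented for this illustration and are not part of the compiler. Unlike the real rule, which reports errors and still builds the node, this toy version returns as soon as the first check fails.

```fsharp
// Toy tokens, loosely modelled on the terminals used in the rule above.
// The bool mirrors the flag the rule reads with `snd`:
// true means the literal is outside the 32-bit signed range.
type Token =
    | INT32 of (int32 * bool)
    | INFIX_STAR_DIV_MOD_OP of string

// Each match case plays the role of a production; the right-hand side of
// the arrow plays the role of the code between the curly braces.
let describe (tokens: Token list) =
    match tokens with
    | [ INT32 first; INFIX_STAR_DIV_MOD_OP op; INT32 third ] ->
        // `first`, `op` and `third` correspond to $1, $2 and $3.
        if op <> "/" then "error: unexpected operator"
        elif fst third = 0 then "error: illegal denominator"
        elif snd first || snd third then "error: outside 32-bit signed range"
        else sprintf "Rational(%d, %d)" (fst first) (fst third)
    | _ -> "no production matched"
```

For example, describe [ INT32(23, false); INFIX_STAR_DIV_MOD_OP "/"; INT32(42, false) ] evaluates to "Rational(23, 42)".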

The first three lines do error checking and report errors if the input is invalid. Then the code calls the Rational constructor of SynRationalConst and passes some values to it. Here we need to make changes to adjust the parser to our modified type definition.
The values or symbols that matched the rule are available as $1, $2, $3 etc. in the code. As you can see, $1 is a tuple consisting of the parsed number and a boolean flag. Judging by the third check in the rule above, the flag is true when the literal does not fit into a 32-bit signed integer. The code is executed in the context of the parser, so you can use the parseState variable, an instance of IParseState, to access the current state of the parser. There are helper functions defined in ParseHelpers.fs that make it easier to work with.
rhs parseState 1 returns the range of the first symbol that matched the rule, here INT32. So, it returns the range of 23 in 23/42.
rhs is short for right hand side.
Another helper is rhs2. For example, rhs2 parseState 2 3 returns the range that spans from the second to the third symbol that matched the rule. Given 23/42, it would return the range of /42.
lhs parseState returns the range of the whole rule, 23/42 in our example. When parser recovery is a concern for a rule, prefer rhs2 over lhs.
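The behaviour of these helpers can be mimicked with a small toy model. The sketch below is not the implementation in ParseHelpers.fs: ToyRange, toyRhs, toyRhs2 and toyLhs are made-up names, and an array of column spans stands in for IParseState.

```fsharp
// Toy model only: the real helpers operate on IParseState.
type ToyRange = { Start: int; End: int } // column span, end exclusive

// Ranges of the three symbols matched for the input "23/42".
let symbolRanges =
    [| { Start = 0; End = 2 }   // INT32 "23"
       { Start = 2; End = 3 }   // INFIX_STAR_DIV_MOD_OP "/"
       { Start = 3; End = 5 } |] // INT32 "42"

// Range of the n-th matched symbol (1-based, like $1, $2, ...).
let toyRhs (symbols: ToyRange[]) n = symbols.[n - 1]

// Range from the start of symbol n to the end of symbol m.
let toyRhs2 (symbols: ToyRange[]) n m =
    { Start = symbols.[n - 1].Start; End = symbols.[m - 1].End }

// Range of everything the rule matched.
let toyLhs (symbols: ToyRange[]) = toyRhs2 symbols 1 symbols.Length

// Text covered by a range, to make the results visible.
let textOf (input: string) (r: ToyRange) = input.Substring(r.Start, r.End - r.Start)

let input = "23/42"
printfn "%s" (textOf input (toyRhs symbolRanges 1))    // 23
printfn "%s" (textOf input (toyRhs symbolRanges 2))    // /
printfn "%s" (textOf input (toyRhs2 symbolRanges 2 3)) // /42
printfn "%s" (textOf input (toyLhs symbolRanges))      // 23/42
```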

Circling back to our original example of adding a new field to SynRationalConst, we need to add a new parameter to the call of the Rational constructor. We want to pass the range of the / symbol, so we need to add rhs parseState 2 as the third parameter to the constructor call:

SynRationalConst.Rational(fst $1, rhs parseState 1, rhs parseState 2, fst $3, rhs parseState 3, lhs parseState)

That's it. Adjusting the other constructor calls of Rational in pars.fsy should be enough to have a working parser again which returns the modified AST.
While fixing the remaining compiler errors outside of pars.fsy, it's a good idea to use named access to the fields of the SynRationalConst.Rational union case instead of positional access. This way, the compilation won't fail if additional fields are added to the union case in the future.
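The difference between the two access styles is sketched below. The range record is a simplified stand-in and the union is reduced to the Rational case so that the snippet compiles on its own. The helper functions are made up for the illustration.

```fsharp
// Stand-in for the compiler's range type.
type range = { StartCol: int; EndCol: int }

// The modified union case from this document, reduced to Rational only.
type SynRationalConst =
    | Rational of
        numerator: int32 *
        numeratorRange: range *
        divRange: range *
        denominator: int32 *
        denominatorRange: range *
        range: range

// Positional access: every field must be listed, so this pattern
// stops compiling as soon as another field is added to the case.
let denominatorPositional c =
    match c with
    | Rational(_, _, _, d, _, _) -> d

// Named access: only the fields of interest are mentioned,
// so additional fields do not affect this pattern.
let denominatorNamed c =
    match c with
    | Rational(denominator = d) -> d

// Several named fields are separated by semicolons.
let divAndDenominator c =
    match c with
    | Rational(divRange = mDiv; denominator = d) -> mDiv, d
```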
After a successful compilation, you can run the parser tests in SyntaxTreeTests.fs to verify that everything works as expected.
It's likely that you'll need to update the baseline files as described in SyntaxTreeTests.fs.