fsharp/docs/optimizations.md

4.8 KiB

title category categoryindex index
Optimizations Compiler Internals 200 400

Code Optimizations

Code optimizations are in Compiler/Optimize/*.

Some of the optimizations performed in Optimizer.fs are:

  • Propagation of known values (constants, x = y, lambdas, tuples/records/union-cases of known values)
  • Inlining of known lambda values
  • Eliminating unused bindings
  • Eliminating sequential code when there is no side-effect
  • Eliminating switches when we determine definite success or failure of pattern matching
  • Eliminating getting fields from an immutable record/tuple/union-case of known value
  • Expansion of tuple bindings "let v = (x1,...x3)" to avoid allocations if it's not used as a first class value
  • Splitting large functions into multiple methods, especially at match cases, to avoid massive methods that take a long time to JIT
  • Removing tailcalls when it is determined that no code in the transitive closure does a tailcall nor recurses

In DetupleArgs.fs, tuples at call sites are eliminated if possible. Concretely, functions that accept a tuple at all call sites are replaced by functions that accept each of the tuple's arguments individually. This may require inlining to activate.

Considering the following example:

let max3 t =
    let (x, y, z) = t
    max x (max y z)

max3 (1, 2, 3)

The max3 function gets rewritten to simply accept three arguments, and depending on how it gets called it will either get inlined at the call site or called with 3 arguments instead of a new tuple. In either case, the tuple allocation is eliminated.

However, sometimes this optimization is not applied unless a function is marked inline. Consider a more complicated case:

let rec runWithTuple t offset times =
    let offsetValues x y z offset =
        (x + offset, y + offset, z + offset)
    if times <= 0 then
        t
    else
        let (x, y, z) = t
        let r = offsetValues x y z offset
        runWithTuple r offset (times - 1)

The inner function offsetValues will allocate a new tuple when called. However, if offsetValues is marked as inline then it will no longer allocate a tuple.

Currently, these optimizations are not applied to struct tuples or explicit ValueTuples passed to a function. In most cases, this doesn't matter because the handling of ValueTuple is well-optimized and may be erased at runtime. However, in the previous runWithTuple function, the overhead of allocating a ValueTuple each call ends up being higher than the previous example with inline applied to offsetValues. This may be addressed in the future.

In InnerLambdasToTopLevelFuncs.fs, inner functions and lambdas are analyzed and, if possible, rewritten into separate methods that do not require an FSharpFunc allocation.

Consider the following implementation of sumBy on an F# list:

let sumBy f xs =
    let rec loop xs acc =
        match xs with
        | [] -> acc
        | x :: t -> loop t (f x + acc)
    loop xs 0

The inner loop function is emitted as a separate static method named loop@2 and incurs no overhead involved with allocating an FSharpFunc at runtime.

In LowerCalls.fs:

  • Performs eta-expansion on under-applied values applied to lambda expressions and does a beta-reduction to bind any known arguments

In LowerSequences.fs:

  • Analyzes a sequence expression and translates it into a state machine so that operating on sequences doesn't incur significant closure overhead

Potential future optimizations: Better Inlining

Consider the following example:

let inline f k = (fun x -> k (x + 1))
let inline g k = (fun x -> k (x + 2))

let res = (f << g) id 1 // 4

Intermediate values that inherit from FSharpFunc are allocated at the call set of res to support function composition, even if the functions are marked as inline. Currently, if this overhead needs removal, you need to rewrite the code to be more like this:

let f x = x + 1
let g x = x + 2

let res = id 1 |> g |> f // 4

The downside of course being that the id function can't propagate to composed functions, meaning the code is now different despite yielding the same result.

More generally, any time a first-order function is passed as an argument to a second-order function, the first-order function is not inlined even if everything is marked as inline. This results in a performance penalty.