razor-tooling/docs/CollectionBestPractices.md

25 KiB
Исходник Постоянная ссылка Ответственный История

Collection Best Practices in Razor

Imperative Collections

  • .NET provides many collection types with different characteristics for different purposes.
  • The collections from the System.Collections namespace should be avoided. Never use these unless in some legacy scenario.
  • The collections in System.Collections.Generic are considered the “work horse” collection types for .NET and are suitable for most purposes. They have years of hardening that make them highly efficient choices for most work.
  • Popular imperative collection types include the ones we all use on a regular basis List<T>, HashSet<T>, Dictionary<TKey, TValue>, Stack<T>.
  • System.Collections.Concurrent contains collections that are designed for use when thread-safety is needed. In general, these should only be used in particular situations.

[!WARNING] Beware of collection growth

The imperative collections generally have more internal storage than needed to allow more items to be added. (This is what is meant by "capacity” vs. “count”). When enough items are added, the internal storage will need to grow. This requires creating larger storage, releasing the previous storage for garbage collection, and copying the existing contents into it, which consumes CPU time. For a larger collection, this can potential happen many times, so its important to set the capacity up front to avoid unnecessary internal storage growth.

[!WARNING] Avoid exposing collection interfaces

Avoid exposing collections directly via interfaces, such as IReadOnlyList<T> and IReadOnlyDictionary<TKey, TValue>. The primary reason for this is that these interfaces can result in allocations when they are foreachd. In general, collections provide a struct enumerator that can be used to foreach that collection without allocating an IEnumertor<T> on the heap. However, when going through a collection interface, there isnt a struct enumerator, so an allocation is likely required to foreach. In fact, many collections, such as List<T>, are implemented to just return their struct enumerator when accessed via collection interfaces, resulting in an allocation when the struct enumerator is boxed.

  • If exposing a collection is necessary, consider whether it might be better to expose a more optimal read-only collection. Instead of IReadOnlyList<T>, consider ImmutableArray<T>.
  • There arent many other options when an API calls for exposing an IReadOnlyDictionary<TKey, TValue>. In these cases, consider whether it might be better to just avoid exposing the collection altogether and provide APIs that access it. Or, in some cases, it might be necessary to create entirely new collection types. (This is why Razor TagHelperDescriptors expose a MetadataCollection.)

[!WARNING] Be mindful of ToArray()

Calling ToArray() on a collection will create a new array and copy content from the collection into it. So, when the exact capacity is known up front, it is an anti-pattern to create a List<T> without that capacity, fill it with items and then call ToArray() at the end. This results in extra allocations that could be avoided by creating an array and filling it.

Immutable Collections

  • The .NET immutable collections are provided by the System.Collections.Immutable NuGet package, which provides implementations for .NET, .NET Framework, and .NET Standard 2.0.
  • The collections in the System.Collections.Immutable namespace have a very specific purpose. They are intended to be persistent data structures; that is, a data structure that always preserves the previous version of itself when it is modified. Such data structures are effectively immutable, but in hindsight, maybe it would have been better for this namespace to have been called, System.Collections.Persistent?
    • The term “persistent data structure” was introduced by the 1986 paper, “Making Data Structures Persistent” (PDF).
    • A highly influential book in the area of persistent data structures is “Purely Functional Data Structures” (1999) by Chris Okasaki (Amazon). Okasakis original dissertation is available from CMUs website (PDF).
  • Because of their persistency, nearly all of the immutable collections have very different implementations than their imperative counterparts. For example, List<T> is implemented using an array, while ImmutableList<T> is implemented using a binary tree.
  • Mutating methods on an immutable collection perform “non-destructive mutation”. Instead, of mutating the underlying object, a mutating method like Add produces a new instance of the immutable collection. This is similar to how the String.Replace(...) API is used.
  • Significant effort has been made to ensure that immutable collections are as efficient as they can be. However, the cost of persistence means that immutable collections are generally assumed to be slower than imperative counterparts.

[!CAUTION]

Because the immutable collections are often implemented using binary trees to achieve persistence, the asymptotic complexity of standard operations can be very surprising. For example, ImmutableDictionary<TKey, TValue> access is O(log n) rather than the usual O(1) that would be expected when accessing a hash table data structure, such as Dictionary<TKey, TValue>. A similar difference in performance characteristics exists across the various collection types. The following table shows the complexity of accessing a few popular collections types using their indexer.

Immutable collection type Complexity Imperative collection type Complexity
ImmutableDictionary<TKey, TValue> O(log n) Dictionary<TKey, TValue> O(1)
ImmutableHashSet<T> O(log n) HashSet<T> O(1)
ImmutableList<T> O(log n) List<T> O(1)
ImmutableSortedDictionary<TKey, TValue> O(log n) SortedDictionary<TKey, TValue> O(log n)

[!CAUTION] ToImmutableX() extension methods are not “freeze” methods!

The System.Immutable.Collections package provides several extension methods that produce an immutable collection from an existing collection or sequence. These methods arent optimized to reuse the internal storage of other collections in any way. Because of this, the following code is an anti-pattern. In this example, each element is added to a HashSet<int> and then the elements of that set are added to a new ImmutableHashSet<int>.

var array = new[] { "One", "Two", "Two", "One", "Three" };
var set = new HashSet<int>(array).ToImmutableHashSet();

Using Builders

  • When creating an immutable collection with a lot of mutation, use a builder. Builders are optimized to populate the internal storage of an immutable collection.
  • The following code achieves the expected result but inefficiently creates several intermediate ImmutableList<int> instances.
ImmutableList<int> CreateList()
{
    var list = ImmutableList<int>.Empty;
    for (var i = 0; i < 10; i++)
    {
      list = list.Add(i);
    }

    return list;
}
  • The version below populates an ImmutableList<int>.Builder and creates just a single ImmutableList<int> instance at the end.
ImmutableList<int> CreateList()
{
    var builder = ImmutableList.CreateBuilder<int>();

    for (var i = 0; i < 10; i++)
    {
        builder.Add(i);
    }

    return builder.ToImmutable();
}

ImmutableArray<T>

  • ImmutableArray<T> is very different than the other immutable collections. It is the only struct collection type, and is not optimized for persistence. (In hindsight, perhaps a more appropriate name would have been FrozenArray<T>?)
  • ImmutableArray<T> is a relatively simple struct that provides read-only access to an internal array.

[!WARNING] Be aware of copies!

In order to maintain its immutability semantics, ImmutableArray<T> always creates a copy of the array it is wrapping internally. If it didnt, external changes to the array would be reflected in the ImmutableArray<T>.

Because a new array copy is created for every ImmutableArray<T> it is important to be mindful of chaining methods that produce immutable arrays to avoid unnecessary intermediate array copies.

In addition, as of System.Immutable.Collections 8.0.0, there is a new ImmutableCollectionsMarshal class that can provide access to the internal array of an ImmutableArray<T> or to create an new ImmutableArray<T> that wraps an existing array without copying. These can be used in high performance scenarios, but should be employed carefully to avoid introducing subtle bugs.

  • Because ImmutableArray<T> is a struct that wraps a single field of a reference type, it is essentially free to copy at runtime. However, this also leaves a bit of a usability wart because, as a struct, an ImmutableArray<T> reference can never be null, but it can has its default, zeroed-out value where the internal array reference is null. For this reason, an IsDefault property is provided to check if an ImmutableArray<T> is actually wrapping an array.
  • ImmutableArray<T> can be used as a persistent data structure via non-destructive mutation, but mutating methods are generally implemented to copy the elements of the internal array. For example, Add will create a copy of the internal array storage with an additional element and return it as an ImmutableArray<T>.

[!NOTE] A Little History

ImmutableArray<T> was not part of System.Collections.Immutable when originally conceived. It was developed out of necessity by Roslyn to expose array data while avoiding the inherent problems of exposing an array. (At the time, .NET arrays didnt even implement IReadOnlyList<T>, which didnt ship until .NET Framework 4.5.) System.Collections.Immutable itself was inspired by the many persistent data structures used internally by Roslyn and was intended to be used within Visual Studio for asynchronous code. However, the NuGet package became so popular that it was ultimately pulled into the .NET runtime.

Using ImmutableArray<T>.Builder

  • The Builder type for ImmutableArray<T> provides a couple of features not provided by other immutable collection builders.
  • ToImmutable(): Like other builders, creates a new ImmutableArray<T> that wraps a copy of the filled portion of internal array buffer used by the builder.
  • MoveToImmutable(): Creates a new ImmutableArray<T> that wraps the internal array buffer used by the builder. Note that this requires that the builders capacity is the same as its count. In other words, the builders internal array buffer must be completely filled, or this will throw an InvalidOperationException. If the operation is successful, the internal buffer is set to an empty array.
  • DrainToImmutable(): This is sort of like a combination of ToImmutable() and MoveToImmutable(). This operation “drains” the builder by checking if the capacity equals the count. If true, it returns a new ImmutableArray<T> that wraps the internal array buffer. If false, it returns a new ImmutableArray<T> that wraps a copy of the filled portion of the internal array buffer. In either case, the internal buffer is set to an empty array.

[!CAUTION] Immutable collections as static data

Because of their performance characteristics, most of the immutable collections are not suitable for static collections. In fact, ImmutableArray<T> is really the only immutable collection that should be used for static data, since accessing it is essentially the same as accessing an array.

When creating a static lookup table it can be tempting to reach for an ImmutableHashSet<T> or an ImmutableDictionary<TKey, TValue>, but that temptation should be resisted! Lookup will always be slower than using he imperative counterpart because of the internal tree structures employed for immutable collections.

There are several tricks that can be used to encapsulate imperative collections as static data. For example, a nested static class could hide a HashSet<T> or Dictionary<TKey, TValue> behind static methods that access the collections. However, a better solution available today is to use a frozen collection.

Frozen Collections

  • The System.Collections.Frozen namespace became available starting with version 8.0.0 of the System.Collections.Immutable NuGet package.
  • Currently, there are two frozen collection types: FrozenSet<T> and FrozenDictionary<TKey, TValue>.
  • The frozen collections are not persistent; in fact, they cant be mutated at all! Instead, frozen collections are optimized for faster lookup operations — faster than their imperative counterparts.
  • Frozen collections provide faster lookup by performing up-front analysis and selecting an optimal implementation for the content. This means that they are much more expensive to create.
  • Because of their higher creation cost and improved lookup performance, frozen collections are best suited for static data.

Array Pools

  • When a temporary array is needed to perform work and the lifetime of the array is bounded, consider acquiring a pooled array. ArrayPool<T> can be used to acquire an array of some minimum length that can be returned to the pool when the work is done.

[!WARNING] Be mindful of the array size!

The size of an array acquired from an ArrayPool<T> is guaranteed to be at least as large as the minimum length that was requested. However, it is likely that a larger array will have been returned. So, care should be taken to avoid using the acquired arrays length, unless thats whats needed.

  • Razor provides a handful of helper extension methods that acquire pooled arrays and return them within the scope of a using statement:
var pool = ArrayPool<char>.Shared;

using (pool.GetPooledArray(minimumLength: 42, out var array)
{
    // When using array but be careful that array.Length >= minimumLength.
}

using (pool.GetPooledArraySpan(minimumLength: 42, out var span)
{
    // span is array.AsSpan(0, minimumLength) to help avoid subtle bugs.
}

Object Pools

using var _ = ListPool<int>.GetPooledObject(out var list);

// Use list here. It'll be returned to the pool at the end of the using
// statement's scope.
  • Pooled collections provide a couple of benefits.
    1. Pooled collections decrease pressure on the garbage collector by reusing collection instances.
    2. Pooled collections avoid growing a collections internal storage. For example, when the List<int> acquired from ListPool<int> in the code sample above is returned to the pool, it will be cleared. However, the capacity of its internal storage will only be trimmed if it is larger than 512. So, lists acquired from the pool are likely to already have a larger capacity than needed for most work.

[!WARNING] Don't allow pooled objects to escape their scope!

Consider the following code:

List<int> M()
{
    using var _ = ListPool<int>.GetPooledObject(out var list);

    // use list...

    return list;
}

The compiler won't complain if a pooled List<int> escapes its scope. In the code above, the List<int> will be returned to the pool at the end of the using statement's scope but is returned from the method. This results several problems:

  1. The list will be cleared when returned to the pool. So, the caller will find it to be empty.
  2. If the caller adds items to the list, other code acquiring a pooled list might receive the mutated list!
  3. Likewise, if the caller holds onto the list, other code acquiring a pooled list might receive the same list and mutate it!

In essence, a pooled object that escapes its scope can corrupt the pool in came from.

Its Magic! PooledArrayBuilder<T>

  • Razors PooledArrayBuilder<T> is heavily inspired by Roslyns TemporaryArray<T>.
  • The important feature of this type (and the reason weve started using it all over Razor) is that it stores the first 4 elements of the array being built inline as fields. After 4 elements have been added, it will acquire a pooled ImmutableArray<T>.Builder. This makes it extremely cheap to use for small arrays and reduces pressure on the object pools.
  • Because PooledArrayBuilder<T> is a struct, it must be passed by-reference. Otherwise, any elements added by a method its passed to wont be reflected back at the call-site.
  • To avoid writing buggy code that accidentally copies a PooledArrayBuilder<T>, it is marked with a [NonCopyable] attribute. A Roslyn analyzer tracks types decorated with that attribute and ensures that instances are never copied.
  • Because PooledArrayBuilder<T> may acquire a pooled ImmutableArray<T>.Builder, it is disposable and should generally be created within a using statement. However, that makes it a bit more awkward to pass by reference, so a special AsRef() extension method is provided.
  • In the following code example, an ImmutableArray<int>.Builder will never be acquired from the pool because the PooledArrayBuilder<int> only ever contains three elements.
ImmutableArray<string> BuildStrings()
{
    using var builder = new PooledArrayBuilder<string>();
    AddElements(ref builder.AsRef());

    return builder.DrainToImmutable();
}

void AddElements(ref PooledArrayBuilder<string> builder)
{
    builder.Add("One");
    builder.Add("Two");
    builder.Add("Three");
}

Using LINQ

  • LINQ (that is, LINQ to Objects) is a bit of a tricky subject. It has been used extensively throughout Razor for a long time. Its certainly not off limits but should be used with an understanding of the hidden costs:
    • Every lambda expression represents at least one allocation — the delegate that holds it.
    • A lambda that accesses variables or instance data from an outer scope will result in a closure being allocated each time the delegate is invoked.
    • Many LINQ methods allocate an iterator instance.
    • Because Razor tooling runs in Visual Studio, it runs on .NET Framework and doesnt benefit from many LINQ optimizations made in modern .NET.
    • Because LINQ methods target IEnumerable<T> instances, they can trigger additional allocations depending on how GetEnumerator() is implemented. For example, a simple call like Queue<T>.Any() might seem innocuous—it doesnt even have a lambda! However, the implementation of Enumerable.Any() on .NET Framework doesnt have any fast paths and simply calls GetEnumerator(). So, Any() boxes Queue<T>s struct enumerator, resulting an allocation every time its called. In a tight loop, that could be disastrous!
    • LINQ can obfuscate algorithmic complexity. It can be hard to see that introducing a LINQ expression has made an algorithm O(n^2).

Best Practices

  • Consider whether LINQ could have a negative performance impact for a particular scenario. Is this a hot path? Is it happening in a loop?
  • Always try to use static lambdas to ensure closures arent created and delegates are cached and reused.
  • What collection type is being targeted? Do we have specialized LINQ methods that could be used? Razor provides a few for ImmutableArray<T> and IReadOnlyList<T>.

Using Collection Expressions

  • C# 12 introduced collection expressions as a language-level abstraction to generate collection-based code. It is a goal of collection expressions to produce efficient code.
  • Collection expressions are generally very good. They are especially helpful for combining collections or even query expressions.
int[] Combine(List<int> list, HashSet<int> set)
{
    return [..list, ..set];
}

int[] Squares(List<int> list, HashSet<int> set)
{
    return [
        ..from x in list select x * x,
        ..from x in set select x * x
    ];
}

[!WARNING] Considerations when using collection expressions

  • Sometimes, a collection expression might create a new temporary collection instance, such as a List<T>. However, it will not acquire a temporary collection from Razors object pools (SharpLab).
  • There are pathological collection expressions to be avoided. For example, never use a collection expression to replace a call to ImmutableArray<T>.Builder.ToImmutable() (SharpLab). When using a collection expression in a new scenario or with an uncommon type, it's a good idea to try it out on https://sharplab.io first.
  • Empty collection expression generally produces very efficient code and can be used without concern (SharpLab).
  • It is expected that collection expressions will improve over time. At the time of writing, there are several open issues tracking collection expression enhancements.

Meta Tips

  • Always be aware of the memory layout, features, and performance characteristics of the data structure you are using.
  • If you have an implementation question for a .NET collection type, check out the source code using the .NET Source Browser for modern .NET, or the .NET Framework Reference Source. And of course, the .NET runtime repo is available at dotnet/runtime.
  • Several reflection-based tools exist for exploring .NET assemblies, such as ILSpy or dotPeek (from JetBrains).
  • Use https://sharplab.io to see how code will be compiled. This can be especially useful for collection expressions, which are usually very efficient do have pathological cases to avoid.