Add Collection Best Practices in Razor document

2024-09-09 15:39:38 -07:00 · 2024-09-09 15:39:38 -07:00 · 8ffc0fc7d3
--- a/docs/CollectionBestPractices.md
+++ b/docs/CollectionBestPractices.md
@ -0,0 +1,360 @@
+# Collection Best Practices in Razor
+
+- [Imperative Collections](#imperative-collections)
+- [Immutable Collections](#immutable-collections)
+  - [Using Builders](#using-builders)
+  - [`ImmutableArray<T>`](#immutablearrayt)
+    - [Using `ImmutableArray<T>.Builder`](#using-immutablearraytbuilder)
+  - [Frozen Collections](#frozen-collections)
+- [Array Pools](#array-pools)
+- [Object Pools](#object-pools)
+- [✨It’s Magic! `PooledArrayBuilder<T>`](#its-magic-pooledarraybuildert)
+- [Using LINQ](#using-linq)
+  - [Best Practices](#best-practices)
+- [Meta Tips](#meta-tips)
+
+# Imperative Collections
+- .NET provides many collection types with different characteristics for different purposes.
+- Collections from the System.Collections namespace should be avoided. Never use these unless in some legacy scenario.
+- The collections in System.Collections.Generic are considered the “work horse” collection types for .NET and are
+  suitable for most purposes. They have years of hardening that make them highly efficient choices for most work.
+- Popular imperative collection types include the ones we all use on a regular basis `List<T>`, `HashSet<T>`,
+  `Dictionary<TKey, TValue>`, `Stack<T>`.
+- System.Collections.Concurrent contains collections that are designed for use when thread-safety is needed.
+  In general, these should only be used in particular situations.
+
+> [!WARNING]
+> **Beware of collection growth**
+>
+> The imperative collections generally have more internal storage than needed to allow more items to be added. (This is
+> what is meant by "capacity” vs. “count”). When enough items are added, the internal storage will need to grow. This
+> requires creating larger storage, releasing the previous storage for garbage collection, and copying the existing
+> contents into it, which consumes CPU time. For a larger collection, this can potential happen many times, so it’s
+> important to set the capacity up front to avoid unnecessary internal storage growth.
+
+> [!WARNING]
+> **Avoid exposing collection interfaces**
+>
+> Avoid exposing collections directly via interfaces, such as `IReadOnlyList<T>` and
+> `IReadOnlyDictionary<TKey, TValue>`. The primary reason for this is that these interfaces can result in allocations
+> when they are foreach’d. In general, collections provide a struct enumerator that can be used to foreach that
+> collection without allocating an `IEnumertor<T>` on the heap. However, when going through a collection interface,
+> there isn’t a struct enumerator, so an allocation is likely required to foreach. In fact, many collections, such as
+> `List<T>`, are implemented to just return their struct enumerator when accessed via collection interfaces, resulting
+> in an allocation when the struct enumerator is boxed.
+> - If exposing a collection is necessary, consider whether it might be better to expose a more optimal read-only
+>   collection. Instead of `IReadOnlyList<T>`, consider [`ImmutableArray<T>`](#immutablearrayt).
+> - There aren’t many other options when an API calls for exposing an `IReadOnlyDictionary<TKey, TValue>`. In these
+>   cases, consider whether it might be better to just avoid exposing the collection altogether and provide APIs that
+>   access it. Or, in some cases, it might be necessary to create entirely new collection types. (This is why Razor
+>   `TagHelperDescriptors` expose a `MetadataCollection`.)
+
+> [!WARNING]
+> **Be mindful of ToArray()**
+>
+> Calling `ToArray()` on a collection will create a new array and copy content from the collection into it. So, when
+> the exact capacity is known up front, it is an anti-pattern to create a `List<T>` withwout that capacity, fill it
+> with items and then call `ToArray()` at the end. This results in extra allocations that could be avoided by creating
+> an array and filling it.
+
+# Immutable Collections
+- The .NET immutable collections are provided by the System.Collections.Immutable NuGet package, which provides
+  implementations for .NET, .NET Framework, and .NET Standard 2.0.
+- The collections in the System.Collections.Immutable namespace have a very specific purpose.
+- They are intended to be *persistent* data structures; that is, a data structure that always preserves the previous
+  version of itself when it is modified. Such data structures are effectively immutable, but it might have been better
+  for the namespace to have been called, System.Collections.Persistent.
+  - The term “persistent data structure” was introduced by the 1986 paper,
+    “Making Data Structures Persistent” ([PDF](https://www.cs.cmu.edu/~sleator/papers/making-data-structures-persistent.pdf)).
+  - A highly influential book in the area of persistent data structures is “Purely Functional Data Structures” (1999)
+    by Chris Okasaki ([Amazon](https://www.amazon.com/Purely-Functional-Data-Structures-Okasaki/dp/0521663504)).
+    Okasaki’s original dissertation is available from CMU’s website ([PDF](https://www.cs.cmu.edu/~rwh/students/okasaki.pdf)).
+- Because of their persistency, nearly all of the immutable collections have very different implementations than their
+  imperative counterparts. For example, `List<T>` is implemented using an array, while `ImmutableList<T>` is implemented
+  using a binary tree.
+- Mutating methods on an immutable collection perform “non-destructive mutation”. Instead, of mutating the underlying
+  object, a mutating method like `Add` produces a new instance of the immutable collection. This is similar to how the
+  `String.Replace(...)` API is used.
+- The difference in implementation affects the asymptotic complexity of many standard operations. For example, indexing
+  into a `List<T>` is O(1) but indexing into an `ImmutableList<T>` is O(log n).
+- Significant effort has been made to ensure that immutable collections are as efficient as they can be, while
+  maintaining their persistence characteristics. However, they are generally assumed to be slower than imperative
+  collections.
+
+> [!CAUTION]
+> **ToImmutableX() extension methods are not “freeze” methods!**
+>
+> The System.Immutable.Collections package provides several extension methods that produce an immutable collection from
+> an existing collection or sequence. These methods aren’t optimized to reuse the internal storage of other collections
+> in any way. Because of this, the following code is an anti-pattern. In this example, each element is added to a
+> `HashSet<int>` and then the elements of that set are added to a new `ImmutableHashSet<int>`.
+>
+> ```C#
+> var array = new[] { "One", "Two", "Two", "One", "Three" };
+> var set = new HashSet<int>(array).ToImmutableHashSet();
+> ```
+
+## Using Builders
+- When creating an immutable collection with a lot of mutation, use a builder. Builders are optimized to populate the
+  internal storage of an immutable collection.
+- The following code achieves the expected result but inefficiently creates several intermediate `ImmutableList<int>`
+  instances.
+
+```C#
+ImmutableList<int> CreateList()
+{
+    var list = ImmutableList<int>.Empty;
+    for (var i = 0; i < 10; i++)
+    {
+      list = list.Add(i);
+    }
+
+    return list;
+}
+```
+
+- The version below populates an `ImmutableList<int>.Builder` and creates just a single `ImmutableList<int>` instance
+  at the end.
+
+```C#
+ImmutableList<int> CreateList()
+{
+    var builder = ImmutableList.CreateBuilder<int>();
+
+    for (var i = 0; i < 10; i++)
+    {
+        builder.Add(i);
+    }
+
+    return builder.ToImmutable();
+}
+```
+
+## `ImmutableArray<T>`
+- `ImmutableArray<T>` is very different than the other immutable collections. It is the only struct collection type,
+  and is not optimized for persistence. (In hindsight, perhaps a more appropriate name would have been
+  `FrozenArray<T>`?)
+- `ImmutableArray<T>` is a relatively simple struct that provides read-only access to an internal array.
+
+> [!WARNING]
+> **Be aware of copies!**
+>
+> In order to maintain its immutability semantics, `ImmutableArray<T>` *always* creates a copy of the array it is
+> wrapping internally. If it didn’t, external changes to the array would be reflected in the `ImmutableArray<T>`.
+>
+> Because a new array copy is created for every `ImmutableArray<T>` it is important to be mindful of chaining methods
+> that produce immutable arrays to avoid unnecessary intermediate array copies.
+>
+> In addition, as of System.Immutable.Collections 8.0.0, there is a new `ImmutableCollectionsMarshal` class that can
+> provide access to the internal array of an `ImmutableArray<T>` or to create an new `ImmutableArray<T>` that wraps an
+> existing array without copying. These can be used in high performance scenarios, but should be employed carefully to
+> avoid introducing subtle bugs.
+
+- Because `ImmutableArray<T>` is a struct that wraps a single field of a reference type, it is essentially free to copy
+  at runtime. However, this also leaves a bit of a usability wart because, as a struct, an `ImmutableArray<T>` reference
+  can never be null, but it can has its default, zeroed-out value where the internal array reference is null. For this
+  reason, an `IsDefault` property is provided to check if an `ImmutableArray<T>` is actually wrapping an array.
+- `ImmutableArray<T>` *can* be used as a persistent data structure via non-destructive mutation, but mutating methods
+  are generally implemented to copy the elements of the internal array. For example, `Add` will create a copy of the
+  internal array storage with an additional element and return it as an `ImmutableArray<T>`.
+
+> [!NOTE]
+> **A Little History**
+>
+> `ImmutableArray<T>` was not part of System.Collections.Immutable when originally conceived. It was developed out of
+> necessity by Roslyn to expose array data while avoiding the inherent problems of exposing an array. (At the time,
+> .NET arrays didn’t even implement `IReadOnlyList<T>`, which didn’t ship until .NET Framework 4.5.)
+> System.Collections.Immutable itself was inspired by the many persistent data structures used internally by Roslyn and
+> was intended to be used within Visual Studio for asynchronous code. However, the NuGet package became so popular that
+> it was ultimately pulled into the .NET runtime.
+
+### Using `ImmutableArray<T>.Builder`
+- The Builder type for `ImmutableArray<T>` provides a couple of features not provided by other immutable collection
+  builders.
+- `ToImmutable()`: Like other builders, creates a new `ImmutableArray<T>` that wraps a copy of the filled portion of
+  internal array buffer used by the builder.
+- `MoveToImmutable()`: Creates a new `ImmutableArray<T>` that wraps the internal array buffer used by the builder. Note
+  that this requires that the builder’s capacity is the same as its count. In other words, the builder’s internal array
+  buffer must be completely filled, or this will throw an `InvalidOperationException`. If the operation is successful,
+  the internal buffer is set to an empty array.
+- `DrainToImmutable()`: This is sort of like a combination of `ToImmutable()` and `MoveToImmutable()`. This operation
+  “drains” the builder by checking if the capacity equals the count. If true, it returns a new `ImmutableArray<T>` that
+  wraps the internal array buffer. If false, it returns a new `ImmutableArray<T>` that wraps a copy of the filled
+  portion of the internal array buffer. In either case, the internal buffer is set to an empty array.
+
+> [!CAUTION]
+> **Immutable collections as static data**
+>
+> Because of their performance characteristics, most of the immutable collections are *not* suitable for static
+> collections. In fact, `ImmutableArray<T>` is really the only immutable collection that should be used for static data,
+> since accessing it is essentially the same as accessing an array.
+>
+> When creating a static lookup table it can be tempting to reach for an `ImmutableHashSet<T>` or an
+> `ImmutableDictionary<TKey, TValue>`, but that temptation should be resisted! Lookup will always be slower than using
+> he imperative counterpart because of the internal tree structures employed for immutable collections.
+>
+> There are several tricks that can be used to encapsulate imperative collections as static data. For example, a nested
+> static class could hide a `HashSet<T>` or `Dictionary<TKey, TValue>` behind static methods that access the
+> collections. However, a better solution available today is to use a [frozen collection](#Frozen-Collections).
+
+## Frozen Collections
+- The System.Collections.Frozen namespace became available starting with version 8.0.0 of the
+  System.Collections.Immutable NuGet package.
+- Currently, there are two frozen collection types: `FrozenSet<T>` and `FrozenDictionary<TKey, TValue>`.
+- The frozen collections are not persistent; in fact, they can’t be mutated at all! Instead, frozen collections are
+  optimized for faster lookup operations — faster than their imperative counterparts.
+- Frozen collections provide faster lookup by performing up-front analysis and selecting an optimal implementation for
+  the content. This means that they are much more expensive to create.
+- Because of their higher creation cost and improved lookup performance, frozen collections are best suited for
+  static data.
+
+# Array Pools
+- When a temporary array is needed to perform work and the lifetime of the array is bounded, consider acquiring a
+  pooled array. `ArrayPool<T>` can be used to acquire an array of some minimum length that can be returned to the pool
+  when the work is done.
+
+> [!WARNING]
+> **Be mindful of the array size!**
+>
+> The size of an array acquired from an `ArrayPool<T>` is guaranteed to be at least as large as the minimum length that
+> was requested. However, it is likely that a larger array will have been returned. So, care should be taken to avoid
+> using the acquired array’s length, unless that’s what’s needed.
+
+- Razor provides a handful of helper extension methods that acquire pooled arrays and return them within the scope of a
+using statement:
+
+```C#
+var pool = ArrayPool<char>.Shared;
+
+using (pool.GetPooledArray(minimumLength: 42, out var array)
+{
+    // When using array but be careful that array.Length >= minimumLength.
+}
+
+using (pool.GetPooledArraySpan(minimumLength: 42, out var span)
+{
+    // span is array.AsSpan(0, minimumLength) to help avoid subtle bugs.
+}
+```
+
+# Object Pools
+- Razor provides object pooling facilities based on
+  [Microsoft.Extensions.ObjectPool](https://www.nuget.org/packages/Microsoft.Extensions.ObjectPool/) (which was
+  originally based on Roslyn’s `ObjectPool<T>`) along with several premade pools for many collection types in the
+  [Microsoft.AspNetCore.Razor.PooledObjects](https://github.com/dotnet/razor/tree/5c0677ad275e64300b897de0f6e8856ebe13f07b/src/Shared/Microsoft.AspNetCore.Razor.Utilities.Shared/PooledObjects)
+  namespace. These can be used to acquire temporary collections to use for work and return when finished.
+
+```C#
+using var _ = ListPool<int>.GetPooledObject(out var list);
+
+// Use list here. It'll be returned to the pool at the end of the using
+// statement's scope.
+```
+
+- Pooled collections provide a couple of benefits.
+  1. Pooled collections decrease pressure on the garbage collector by reusing collection instances.
+  2. Pooled collections avoid growing a collection’s internal storage. For example, when the `List<int>` acquired from
+     `ListPool<int>` in the code sample above is returned to the pool, it will be cleared. However, the capacity of its
+     internal storage will only be trimmed if it is larger than 512. So, lists acquired from the pool are likely to
+     already have a larger capacity than needed for most work.
+
+# ✨It’s Magic! `PooledArrayBuilder<T>`
+
+- Razor’s [`PooledArrayBuilder<T>`](https://github.com/dotnet/razor/blob/5c0677ad275e64300b897de0f6e8856ebe13f07b/src/Shared/Microsoft.AspNetCore.Razor.Utilities.Shared/PooledObjects/PooledArrayBuilder%601.cs)
+  is heavily inspired by Roslyn’s [`TemporaryArray<T>`](https://github.com/dotnet/roslyn/blob/d176f9b5a7220cd95a6d5811ba1c49ac392a2fdc/src/Compilers/Core/Portable/Collections/TemporaryArray%601.cs).
+- The important feature of this type (and the reason we’ve started using it all over Razor) is that it stores the first
+  4 elements of the array being built inline as fields. After 4 elements have been added, it will acquire a pooled
+  `ImmutableArray<T>.Builder`. This makes it extremely cheap to use for small arrays and reduces pressure on the object
+  pools.
+- Because `PooledArrayBuilder<T>` is a struct, it must be passed by-reference. Otherwise, any elements added by a method
+  it’s passed to won’t be reflected back at the call-site.
+- To avoid writing buggy code that accidentally copies a `PooledArrayBuilder<T>`, it is marked with a `[NonCopyable]`
+  attribute. A Roslyn analyzer tracks types decorated with that attribute and ensures that instances are never copied.
+- Because `PooledArrayBuilder<T>` _may_ acquire a pooled `ImmutableArray<T>.Builder`, it is disposable and should
+  generally be created within a using statement. However, that makes it a bit more awkward to pass by reference, so a
+  special `AsRef()` extension method is provided.
+- In the following code example, an `ImmutableArray<int>.Builder` will never be acquired from the pool because the
+  `PooledArrayBuilder<int>` only ever contains three elements.
+
+```C#
+ImmutableArray<string> BuildStrings()
+{
+    using var builder = new PooledArrayBuilder<string>();
+    AddElements(ref builder.AsRef());
+
+    return builder.DrainToImmutable();
+}
+
+void AddElements(ref PooledArrayBuilder<string> builder)
+{
+    builder.Add("One");
+    builder.Add("Two");
+    builder.Add("Three");
+}
+```
+
+# Using LINQ
+- LINQ (that is, LINQ to Objects) is a bit of a tricky subject. It has been used extensively throughout Razor for a long
+  time. It’s certainly not off limits but should be used with an understanding of the hidden costs:
+  - Every lambda expression represents at least one allocation — the delegate that holds it.
+  - A lambda that accesses variables or instance data from an outer scope will result in a closure being allocated each
+    time the delegate is invoked.
+  - Many LINQ methods allocate an iterator instance.
+  - Because Razor tooling runs in Visual Studio, it runs on .NET Framework and doesn’t benefit from many LINQ
+    optimizations made in modern .NET.
+  - Because LINQ methods target `IEnumerable<T>` instances, they can trigger additional allocations depending on how
+    `GetEnumerator()` is implemented. For example, a simple call like `Queue<T>.Any()` might seem innocuous—it doesn’t
+    even have a lambda! However, the implementation of
+    [`Enumerable.Any()`](https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs,1288) on .NET
+    Framework doesn’t have any fast paths and simply calls `GetEnumerator()`. So, `Any()` boxes `Queue<T>`’s struct
+    enumerator, resulting an allocation every time it’s called. In a tight loop, that could be disastrous!
+  - LINQ can obfuscate algorithmic complexity. It can be hard to see that introducing a LINQ expression has made an
+    algorithm O(n^2).
+
+## Best Practices
+- Consider whether LINQ could have a negative performance impact for a particular scenario. Is this a hot path? Is it
+  happening in a loop?
+- Always try to use static lambdas to ensure closures aren’t created and delegates are cached and reused.
+- What collection type is being targeted? Do we have specialized LINQ methods that could be used? Razor provides a few
+  for `ImmutableArray<T>` and `IReadOnlyList<T>`.
+
+# Using Collection Expressions
+- C# 12 introduced collection expressions as a language-level abstraction to generate collection-based code. It is a
+  goal of collection expressions to produce efficient code.
+- Collection expressions are generally very good. They are especially helpful for combining collections or even query
+  expressions.
+
+```C#
+int[] Combine(List<int> list, HashSet<int> set)
+{
+    return [..list, ..set];
+}
+
+int[] Squares(List<int> list, HashSet<int> set)
+{
+    return [
+        ..from x in list select x * x,
+        ..from x in set select x * x
+    ];
+}
+```
+
+> [!WARNING]
+>
+> Below are a few issues to consider when using a collection expression:
+> - Sometimes, a collection expression might create a new temporary collection instance, such as a `List<T>`. However,
+>   it will not acquire a temporary collection from Razor’s object pools ([SharpLab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0ATEBqAPgAQAYACfARhQG4BYAKHwGZSAmYgYWIG87jfSmAlgDsALgG0AusQCyACnIMAPMJEA+YgGcYARwCuMIWBgBKLjz4X8AdmJiAdHc079hmBJq0LAXzpegA=)).
+> - There are pathological collection expressions to be avoided. For example, never use a collection expression to
+>   replace a call to `ImmutableArray<T>.Builder.ToImmutable()` ([SharpLab](https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0ATEBqAPgAQAYACfARgDoBhCAG1pjABcBLCAOwGcKBJAWz4BXJgENgDANwBYAFD4AzKQBMxKsQDes4ttKL+Q0eJgBBKFBEBPADwt2TAHzEAsgApbTANoBdYiLOWASg0tHVCANz9iYEEWWgwYKGIAXmJ9YTEGU3MLalgRJhgAIRi4hJs7excA6RlQ0OjY+KgKYwwMACURdgBzGBc/bOqQuuJhuvwAdmIPCgookqavGtCAX1kVoA=)).
+
+# Meta Tips
+
+- Always be aware of the memory layout, features, and performance characteristics of the data structure you are using.
+- If you have an implementation question for a .NET collection type, check out the source code using the
+  [.NET Source Browser](https://source.dot.net/)for modern .NET, or the
+  [.NET Framework Reference Source](https://referencesource.microsoft.com/). And of course, the .NET runtime repo is
+  available at [dotnet/runtime](https://github.com/dotnet/runtime).
+- Several reflection-based tools exist for exploring .NET assemblies, such as
+  [ILSpy](https://github.com/icsharpcode/ILSpy) or dotPeek (from JetBrains).
+- Use https://sharplab.io to see how code will be compiled. This can be especially useful for collection expressions,
+  which are usually very efficient do have pathological cases to avoid.