Add collection literals working group notes

This commit is contained in:
jnm2 2023-05-26 23:49:58 -04:00
Родитель 9e68cac625
Коммит 979330b632
1 изменённых файлов: 117 добавлений и 0 удалений

Просмотреть файл

@ -0,0 +1,117 @@
# Collection literals working group meeting for May 26, 2023
## Agenda
* Codegen patterns and APIs for efficient construction
* Open questions on dictionary literals
* Cut line for C# 12
## Discussion
### Capacity of constructed collections
If an exact length is able to be calculated for the constructed collection, the capacity could be set ahead of time to avoid resizing the backing storage of the collection. But if the constructed collection escapes outside the method body, the next .Add would trigger a resize to double the backing storage size.
People will be able to write better code by hand than the compiler can generate if they have outside knowledge of the eventual state of the collection during its life past the construction of the collection literal. This would be a caveat to a statement about the collection literals feature generating optimal code. This is fine; the compiler can strike a good balance, and people can drop down to manual construction of the collection if they need more control over the initial capacity.
When an exact count is known for a collection literal but the compiler can't prove that it is not potentially resized later, or when only a minimum count is known for a collection literal, the compiler could for example set the initial capacity to the best-fitting power of two. This would skip some early resizes while also amortizing resizes that could happen if the collection is modified after the collection literal is created.
### Unknown-length construction
When a struct-typed collection is spread, codegen could call `AddRange(IEnumerable<T>)` on the collection being constructed. This would cause the collection being spread to be boxed. To avoid boxing, codegen could loop over the spread collection and call `Add(T)` repeatedly on the collection being constructed.
This would work fine when the constructed collection has a writeable Capacity property because codegen could set the capacity before calling `Add(T)` in a loop. However, if there's no writeable Capacity property, we're better off calling `AddRange` and boxing, because the `AddRange` implementation is the only way to intelligently avoid intermediate resizes. (In this situation, the thing being spread has a known length but the collection literal as a whole has an unknown length due to an earlier spread.)
#### Conclusion (unknown-length construction)
Stick with building using Add and AddRange. This is good enough for 95% of customers. Allow the compiler to special-case codegen for widely-used collections to improve performance.
The compiler should prefer calling `AddRange(ReadOnlySpan<T>)` if such a method exists and the thing being spread is a span or exposes a span.
### Known-length construction
The ideal API for constructing collections with contiguous backing storage is to pass the capacity and receive a span to the backing storage. To enable creation methods to be in the same class while having the same name, the collection could be passed back using an `out` parameter.
The type parameter to use when calling the creation method could be taken from the generic type argument of the collection's `IEnumerable<>` implementation. This strategy would not work for collections which implement `IEnumerable<>` more than once.
```cs
[CollectionLiteralBuilder(
typeof(CollectionsMarshal),
nameof(CollectionsMarshal.CreateUnsafe))]
public class MyCollection<T> : IEnumerable<T> { }
public static class CollectionsMarshal
{
public static void CreateUnsafe<T>(
int capacity,
out MyCollection<T> collection,
out Span<T> storage);
public static void CreateUnsafe<T>(
int capacity,
out List<T> collection,
out Span<T> storage);
}
```
The reverse API pattern (passing a span to the collection rather than receiving one) would enable the construction of collections that have noncontiguous backing storage and do not have mutating Add/AddRange methods, such as ImmutableDictionary.
This would allow the use of existing Create methods in the BCL:
```cs
[CollectionLiteralBuilder(
typeof(MyCollection),
nameof(MyCollection.Create))]
public class MyCollection<T> : IEnumerable<T> { }
public static class MyCollection
{
public static MyCollection<T> Create<T>(ReadOnlySpan<T> elements);
}
```
#### Conclusion (known-length construction)
Use an attribute to point to a creation method similar to the code samples above. If this attribute is present, prefer it over generating Add/AddRange calls.
This will need to go through API review. If we don't ship this general facility in C# 12, special-case access to the backing spans of List and ImmutableArray anyway for the sake of performance.
### Dictionary sigils
In a previous full LDM, some folks didn't like the idea that `[..dict1, ..dict2]` would create a dictionary (unless the target type is a dictionary type). A merged dictionary could change the count and enumerated order of the key-value pairs. Similar concerns apply to `[kvp1, kvp2, kvp3]`.
The working group doesn't expect these syntaxes to be commonly desired for producing dictionaries. We expect at least one `k: v` element to be present in literals where the user wants a dictionary and the target type is not already a dictionary type. Merging two dictionaries to create a new dictionary, and doing nothing further, is not a scenario we think comes up that often.
So, is there a place for a sigil to override this, where syntax similar to `[:, ..dict1, ..dict2]` or `[:, kvp1, kvp2]` causes a dictionary to be created instead of a list of key-value pairs?
An existing way to override this would be `[..dict1, ..dict2].ToDictionary()`. This is something which users may do anyway to override the natural type in other situations, such as `var x = [a, b].ToImmutableArray();`.
There's an open question about whether the natural type is influenced by the collection types of spread collections. Depending on the resolution, `[..dict1, ..dict2]` may create a dictionary while `[..listOfKvps1, ..listOfKvps2]` does not. A dictionary sigil would then be addressing a rarer set of scenarios.
#### Conclusion (dictionary sigils)
The working group doesn't find the sigil use cases compelling, and it can be added later.
### Cut line for C# 12
The working group feels good about shipping these features in C# 12:
* Target-typing to core collections and collections which support classic collection initializers
* An opt-in API construction pattern to enable 1⃣ target-typing to other collections and 2⃣ improving performance over classic collection initializers
* Spreads
* Dictionary literals
We'd like this in C# 12, but we'd rather let this slip to C# 13 than anything in the previous list:
* Natural type for non-empty literals
* We think performance concerns about `List<T>` have a great resolution after input from the runtime on the "optimize away `List<T>`" strategy.
* Design discussion is still needed. For example, does the collection type of a spread collection affect the natural type of the collection literal?
Not interested in pursuing for C# 12:
* Natural type for empty literals
* This requires forward-looking type inference.
## Next steps
We will present takeaways of recent working group meetings in a full language design meeting and get feedback on those and on some open questions.