14 KiB
C# Design Notes for Aug 27, 2014
Agenda
The meeting focused on rounding out the feature set around structs.
- Allowing parameterless constructors in structs <allow, but some unresolved details>
- Definite assignment for imported structs <revert to Dev12 behavior>
Parameterless constructors in structs
Unlike classes, struct types cannot declare a parameterless constructor in C# and VB today. The reason is that the syntax new S()
in C# has historically been reserved for producing a zero-initialized instance of the struct. VB.Net has always had an alternative syntax for that (Nothing
) and C# 2.0 also added one: default(T)
. So the new S()
syntax is no longer necessary for this purpose.
It is possible to define parameterless constructors for structs in IL, but neither C#, VB or F# allow you to. All three languages have mostly sane semantics when consuming one, though, mostly having new S()
call the constructor instead of zero-initializing the struct (except in some corner cases visited below).
Not being able to define parameterless constructors in structs has always been a bit of a pain, and now that we’re adding initializers to structs it becomes outright annoying.
Conclusion
We want to add the ability to declare explicit public parameterless constructors in structs, and we also want to think about reducing the number of occurrences of new S()
that produce a default value. In the following we explore details and additional proposals.
Accessibility
C#, VB and F# will all call an accessible parameterless constructor if they find one. If there is one, but it is not accessible, C# and VB will backfill default(T)
instead. (F# will complain.)
It is problematic to have successful but different behavior of new S()
depending on where you are in the code. To minimize this issue, we should make it so that explicit parameterless constructors have to be public. That way, if you want to replace the “default behavior” you do it everywhere.
Conclusion
Parameterless constructors will be required to be public.
Compatibility
Non-generic instantiation of structs with (public) parameterless constructors does the right thing in all three languages today. With generics it gets a little more subtle. All structs satisfy the new()
constraint. When new T()
is called on a type parameter T
, the compiler should generate a call to Activator.CreateInstance
– and in VB and F# it does. However, C# tries to be smart, discovers at runtime that T
is a struct (if it doesn’t already know from the struct
constraint), and emits default(T)
instead!
public T M<T>() where T: new() { return new T(); }
Clearly we should remove this “optimization” and always call Activator.CreateInstance
in C# as well. This is a bit of a breaking change, in two ways. Imagine the above method is in a library:
- The more obvious – but also more esoteric – break is if people today call the library with a struct type (written directly in IL) which has a parameterless constructor, yet they depend on the library not calling that parameterless constructor. That seems extremely unlikely, and we can safely ignore this aspect.
- The more subtle issue is if such a library is not recompiled as we start populating the world with structs with parameterless constructors. The library is going to be wrongly not calling those constructors until someone recompiles it. But if it’s a third party library and they’ve gone out of business, no-one ever will.
We believe even the second kind of break is relatively rare. The new()
constraint isn’t used much. But it would be nice to validate.
Conclusion
Change the codegen for generic new T()
in C# but try to validate that the pattern is rare in known code.
Default arguments
For no good reason C# allows new
expressions for value types to occur as default arguments to optional parameters:
void M(S s = new S()){ … }
This is one place where we cannot (and do not) call a parameterless constructor even when there is one. This syntax is plain bad. It suggests one meaning but delivers another.
We should do what we can (custom diagnostic?) to move developers over to use default(S)
with existing types. More importantly we should not allow this syntax at all when S
has a parameterless constructor. This would be a slight breaking change for the vanishingly rare IL-authored structs that do today, but so be it.
Conclusion
Forbid new S()
in default arguments when S
has a parameterless constructor, and consider a custom diagnostic when it doesn’t. People should use default(S)
instead.
Helpful diagnostic
In general, with this change we are trying to introduce more of a distinction between default values and constructed values for structs. Today it is very blurred by the use of new S()
for both meanings.
Arguably the use of new S()
to get the default value is fine as long as S
does not have any explicit constructors. It can be viewed a bit like making use of the default constructor in classes, which gets generated for you if you do not have any explicit constructors.
The confusion is when a struct type “intends” to be constructed, by advertising one or more constructors. Provided that none of those is parameterless, new S()
still creates an unconstructed default value. This may or may not be the intention of the calling code. Oftentimes it would represent a bug where they meant to construct it (and run initializers and so forth), but the lack of complaint from the compiler caused them to think everything was all right.
Occasionally a developer really does want to create an uninitialized value even of a struct that has constructors. In those cases, though, their intent would be much clearer if they used the default(S)
syntax instead.
It therefore seems that everyone would be well served by a custom diagnostic that would help “clear up the confusion” as it were, by
- Flagging occurrences of
new S()
whereS
has constructors but not a parameterless one - Offering a fix to change to
default(T)
, as well as fixes to call the constructors
This would help identify subtle bugs where they exist, and make the developer’s intent clearer when the behavior is intentional.
The issue of course is how disruptive such a diagnostic would be to existing code. Would it be so annoying that they would just turn it off? Also, is the above assumption correct, that the occurrence of any constructor means that the library author intended for a constructor to always run?
Conclusion
We are cautiously interested in such a diagnostic, but concerned that it would be too disruptive. We should evaluate its impact on current code.
Chaining to the default constructor when there’s a parameterless one
A struct constructor must ensure that the struct is definitely assigned. It can do so by chaining to another constructor or by assigning to all fields.
For structs with auto-properties there is an annoying fact that you cannot assign to the underlying field because its name is hidden, and you cannot assign to the setter, because you are not allowed to invoke a function member until the whole struct is definitely assigned. Catch 22!
People usually deal with this today by chaining to the default constructor – which will zero-initialize the entire struct. If there is a user-defined parameterless constructor, however, that will not work. (Especially not if that is the constructor you are trying to implement!)
There is a workaround. Instead of writing
S(int x): this() { this.X = x; }
You can make use of the fact that in a struct, this
is an l-value:
S(int x) { this = default(S); this.X = x; }
It’s not pretty, though. In fact it’s rather magical. We may want to consider adding an alternative syntax for zero-initializing from a constructor; e.g.:
S(int x): default() { this.X = x; }
However, it is also worth noting that auto-properties themselves are evolving. You can now directly initialize their underlying field with an initializer on the auto-property. And for getter-only auto-properties, assignment in the constructor will also directly assign the underlying field. So maybe problem just isn’t there any longer. You can just zero-initialize the auto-properties directly:
public int X { get; set; } = 0;
Now the definite assignment analysis will be happy when you get to running a constructor body.
Conclusion
Do nothing about this right now, but keep an eye on the issue.
Generated parameterless constructors
The current rule is that initializers are only allowed if there are constructors that can run them. This seems reasonable, but look at the following code:
struct S
{
string label = "<unknown>";
bool pending = true;
public S(){}
…
}
Do we really want to force people to write that trivial constructor? Had this been a class, they would just have relied on the compiler-generated default constructor.
It is probably desirable to at least do what classes do and generate a default constructor when there are no other constructors. Of course we wouldn’t generate one when there are no initializers either: that would be an unnecessary (and probably slightly breaking) change over what we do today, as the generated constructor would do exactly the same as the default new S()
behavior anyway.
A question though is if we should generate a parameterless constructor to run initializers even if there are also parameterful ones. After all, don’t we want to ensure that initializers get run in as many cases as possible?
This seems somewhat attractive, though it does mean that a struct with initializers doesn’t get to choose not to have a generated parameterless constructor that runs the initializers.
Also, in the case that there’s a primary constructor it becomes uncertain what it would mean for a parameterless constructor to run the initializers: after all they may refer to primary constructor parameters that aren’t available to the parameterless constructor:
struct Name(string first, string last)
{
string first = first;
string last = last;
}
How is a generated parameterless constructor supposed to run those initializers? To make this work, we would probably have to make the parameterless constructor chain to the primary constructor (all other constructors must chain to the primary constructor), passing default values as arguments.
Alternatively we could require that all structs with primary constructors also provide a parameterless constructor. But that kind of defeats the purpose of primary constructors in the first place: doing the same with less code.
In all we seem to have the following options:
- Don’t generate anything. If you have initializers, you must also provide at least one constructor. The only change from today’s design is that one of those constructors can be parameterless.
- Only generate a parameterless constructor if there are no other constructors. This most closely mirrors class behavior, but it may be confusing that adding an explicit constructor “silently” changes the meaning of
new S()
back to zero-initialization. (The above diagnostic would help with that, though). - Generate a parameterless constructor only when there is not a primary constructor and a. Still fall back to zero-initialization for new S() in this case b. Require a parameterless constructor to be explicitly specified This seems to introduce an arbitrary distinction between primary constructors and other constructors that prevents easy refactoring back and forth between them.
- Generate a parameterless constructor even when there is a primary constructor a. using default values and/or b. some syntax to provide the arguments as part of the primary constructor This seems overly magical, and again treats primary constructors more differently than was the intent with their design.
Conclusion
This is a hard one, and we didn’t reach agreement. We probably want to do at least option 2, since part of our goal is for structs to become more like classes. But we need to think more about the tradeoffs between that and the more drastic (but also more helpful?) approaches.
Definite assignments for imported structs
Unlike classes, private fields in structs do need to be observed in various ways on the consumer side – they cannot be considered entirely an implementation detail.
In particular, in order to know if a struct is definitely assigned we need to know if its fields have all been initialized. For inaccessible fields, there is no sure way to do that piecemeal, so if such inaccessible fields exist, the struct-consuming code must insist that the struct value as a whole has been constructed or otherwise initialized.
So the key is to know if the struct has inaccessible fields. The native compiler had a long-running bug that would cause it to check imported structs for inaccessible fields only where those fields were of value type! So if the struct had only a private field of a reference type, the compiler would fail to ensure that it was definitely assigned.
In Roslyn we started out implementing the specification, which was of course stricter and turned out to break some code (that was buggy and should probably have been fixed). Instead we then went to the opposite extreme and just stopped ensuring definite assignment of these structs altogether. This lead to its own set of problems, primarily in the form of a new set of bugs that went undetected because of the laxer rules.
Ideally we would go back to implementing the spec. This would break old code, but have the best experience for new code. If we had a “quirks” mode approach, we could allow e.g. the lang-version flag to be more lax on older versions. Part of migrating a code base to the new version of the language would involve fixing this kind of issue.
Conclusion
Unfortunately we do not have the notion of a quirks mode. Like a number of issues before, this one alone does not warrant introducing one – after all, it is a new kind of upgrade tax on customers. We should compile a list of things we would do if we had a quirks mode, and evaluate if the combined value would be enough to justify it.
Definite assignment for structs should be on that list. In the meantime however, the best we can do is to revert to the behavior of the native compiler, so that’s what we’ll do.