- Start Date: 2014-08-28
- RFC PR: [rust-lang/rfcs#218](https://github.com/rust-lang/rfcs/pull/218/files)
- Rust Issue: [rust-lang/rust#24266](https://github.com/rust-lang/rust/issues/24266)

# Summary

When a struct type `S` has no fields (a so-called "empty struct"), allow it to be defined via either `struct S;` or `struct S {}`. When defined via `struct S;`, allow instances of it to be constructed and pattern-matched via either `S` or `S {}`. When defined via `struct S {}`, require instances to be constructed and pattern-matched solely via `S {}`.

# Motivation

Today, when writing code, one must treat an empty struct as a special case, distinct from structs that include fields. That is, one must write code like this:

```rust
struct S2 { x1: int, x2: int }
struct S0; // kind of different from the above.

let s2 = S2 { x1: 1, x2: 2 };
let s0 = S0; // kind of different from the above.

match (s2, s0) {
    (S2 { x1: y1, x2: y2 },
     S0) // you can see my pattern here
    => { println!("Hello from S2({}, {}) and S0", y1, y2); }
}
```

While this yields code that is relatively free of extraneous curly-braces, this special case handling of empty structs presents problems for two cases of interest: automatic code generators (including, but not limited to, Rust macros) and conditionalized code (i.e. code with `cfg` attributes; see the [CFG problem] appendix).

The heart of the code-generator argument is: Why burden all to-be-written code-generators and macros with special-case handling of the empty struct case (in terms of whether or not to include the surrounding braces), especially since that special case is likely to be forgotten (yielding a latent bug in the code generator)?

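
The code-generator argument can be sketched with a small macro. This is only an illustration, written in present-day Rust syntax (so `i32` rather than the `int` used elsewhere in this RFC) and assuming a compiler that accepts the braced forms proposed here. The macro name `record`, its `name: type = init` field syntax, and the generated `make` constructor are all hypothetical. The point is that the macro always emits braces and needs no separate rule for the zero-field case.

```rust
// Hypothetical macro: declares a struct plus a constructor for any number
// of fields, including zero, always emitting the surrounding braces.
macro_rules! record {
    ($name:ident { $($field:ident : $ty:ty = $init:expr),* $(,)? }) => {
        #[derive(Debug, PartialEq)]
        struct $name { $($field: $ty),* }

        impl $name {
            // Builds an instance from the initializers given to the macro.
            fn make() -> $name {
                $name { $($field: $init),* }
            }
        }
    };
}

// Two fields: expands to `struct Point { x: i32, y: i32 }`.
record!(Point { x: i32 = 1, y: i32 = 2 });

// Zero fields: expands to `struct Unit { }` and `Unit { }`,
// with no special case in the macro.
record!(Unit {});

fn main() {
    println!("{:?} {:?}", Point::make(), Unit::make());
}
```

Without braced empty structs, a macro like this would need a second rule that emits `struct $name;` and `$name` when the field list is empty.
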
The special case handling of empty structs is also a problem for programmers who actively add and remove fields from structs during development; such changes cause a struct to switch between being empty and non-empty, and the associated revisions (removing and adding curly braces) are aggravating, both in the effort of revising the code and in the extra noise introduced into commit histories.

This RFC proposes an approach similar to the one we used circa February 2013, when both `S0` and `S0 { }` were accepted syntaxes for an empty struct. The parsing ambiguity that motivated removing support for `S0 { }` is no longer present (see the [Ancient History] appendix). Supporting empty braces in the syntax for empty structs is easy to do in the language now.

# Detailed design

There are two kinds of empty structs: Braced empty structs and flexible empty structs. Flexible empty structs are a slight generalization of the structs that we have today.

Flexible empty structs are defined via the syntax `struct S;` (as today).

Braced empty structs are defined via the syntax `struct S { }` ("new").

Both braced and flexible empty structs can be constructed via the expression syntax `S { }` ("new"). Flexible empty structs, as today, can also be constructed via the expression syntax `S`.

Both braced and flexible empty structs can be pattern-matched via the pattern syntax `S { }` ("new"). Flexible empty structs, as today, can also be pattern-matched via the pattern syntax `S`.

Braced empty struct definitions solely affect the type namespace, just like normal non-empty structs. Flexible empty structs affect both the type and value namespaces.

As a matter of style, using braceless syntax is preferred for constructing and pattern-matching flexible empty structs. For example, pretty-printer tools are encouraged to emit braceless forms if they know that the corresponding struct is a flexible empty struct.
(Note that pretty printers that handle incomplete fragments may not have such information available.)

There is no ambiguity introduced by this change, because we have already introduced a restriction to the Rust grammar to force the use of parentheses to disambiguate struct literals in such contexts. (See [Rust RFC 25]).

The expectation is that when migrating code from a flexible empty struct to a non-empty struct, it can start by first migrating to a braced empty struct (and then have a tool indicate all of the locations where braces need to be added); after that step has been completed, one can then take the next step of adding the actual field.

# Drawbacks

Some people like "There is only one way to do it." But, there is precedent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity; see the [Precedent for flexible syntax in Rust] appendix. Also, see the [Always Require Braces] alternative below.

I have attempted to summarize the previous discussion from [RFC PR 147] in the [Recent History] appendix; some of the points there include drawbacks to this approach and to the [Always Require Braces] alternative.

# Alternatives

## Always Require Braces

Alternative 1: "Always Require Braces". Specifically, require empty curly braces on empty structs. People who like the current syntax of curly-brace free structs can encode them this way: `enum S0 { S0 }`. This would address all of the same issues outlined above. (Also, the author (pnkfelix) would be happy to take this tack.)

The main reason not to take this tack is that some people may like writing empty structs without braces, but do not want to switch to the unary enum version described in the previous paragraph. See "I wouldn't want to force noisier syntax ..." in the [Recent History] appendix.

## Status quo

Alternative 2: Status quo. Macros and code-generators in general will need to handle empty structs as a special case. We may continue hitting bugs like [CFG parse bug].
Some users will be annoyed but most will probably cope.

## Synonymous in all contexts

Alternative 3: An earlier version of this RFC proposed having `struct S;` be entirely synonymous with `struct S { }`, and the expression `S { }` be synonymous with `S`.

This was deemed problematic, since it would mean that `S { }` would put an entry into both the type and value namespaces, while `S { x: int }` would only put an entry into the type namespace. Thus the current draft of the RFC proposes the "flexible" versus "braced" distinction for empty structs.

## Never synonymous

Alternative 4: Treat `struct S;` as requiring `S` at the expression and pattern sites, and `struct S { }` as requiring `S { }` at the expression and pattern sites.

This in some ways follows a principle of least surprise, but it is also really hard to justify having both syntaxes available for empty structs with no flexibility about how they are used. (Note again that one would have the option of choosing between `enum S { S }`, `struct S;`, or `struct S { }`, each with their own idiosyncrasies about whether you have to write `S` or `S { }`.) I would rather adopt "Always Require Braces" than "Never Synonymous".

## Empty Tuple Structs

One might say "why are you including support for curly braces, but not parentheses?" Or in other words, "what about empty tuple structs?"

The code-generation argument could be applied to tuple-structs as well, to claim that we should allow the syntax `S0()`. I am less inclined to add a special case for that; I think tuple-structs are less frequently used (especially with many fields); they are largely for ad-hoc data such as newtype wrappers, not for code generators.

Note that we should not attempt to generalize this RFC as proposed to include tuple structs, i.e. so that given `struct S0 {}`, the expressions `S0`, `S0 {}`, and `S0()` would be synonymous.
The reason is that given a tuple struct `struct T2(int, int)`, the identifier `T2` is *already* bound to a constructor function:

```rust
fn main() {
    #[deriving(Show)]
    struct T2(int, int);

    // `S` is a type parameter; it must be `Show` so the results can be printed.
    fn foo<S: std::fmt::Show>(f: |int, int| -> S) {
        println!("Hello from {} and {}", f(2,3), f(4,5));
    }
    foo(T2);
}
```

So if we were to attempt to generalize the leniency of this RFC to tuple structs, we would be in the unfortunate situation, given `struct T0();`, of trying to treat `T0` simultaneously as an instance of the struct and as a constructor function. So, the handling of empty structs proposed by this RFC does not generalize to tuple structs.

(Note that if we adopt alternative 1, [Always Require Braces], then the issue of how tuple structs are handled is totally orthogonal -- we could add support for `struct T0()` as a distinct type from `struct S0 {}`, if we so wished, or leave it aside.)

# Unresolved questions

None

# Appendices

## The CFG problem

A program like this works today:

```rust
fn main() {
    #[deriving(Show)]
    struct Svaries {
        x: int,
        y: int,

        #[cfg(zed)]
        z: int,
    }

    let s = match () {
        #[cfg(zed)]
        _ => Svaries { x: 3, y: 4, z: 5 },
        #[cfg(not(zed))]
        _ => Svaries { x: 3, y: 4 },
    };
    println!("Hello from {}", s)
}
```

Observe what happens when one modifies the above just a bit:

```rust
    struct Svaries {
        #[cfg(eks)]
        x: int,
        #[cfg(why)]
        y: int,

        #[cfg(zed)]
        z: int,
    }
```

Now, certain `cfg` settings yield an empty struct, even though it is surrounded by braces. Today this leads to a [CFG parse bug] when one attempts to actually construct such a struct.

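
To make the problematic configuration concrete, here is a sketch of what the all-fields-disabled case looks like once braced empty structs are accepted, as this RFC proposes. It uses present-day Rust syntax (`i32`, `derive`) rather than the syntax of the examples above, and the function name `make_empty` is hypothetical. It only covers a build in which none of `eks`, `why`, or `zed` is set; other configurations would need their own constructor.

```rust
// The cfg names `eks`, `why` and `zed` come from the example above.
// In a default build none of them is set, so every field is stripped
// and the definition below is effectively `struct Svaries {}`.
#[derive(Debug, PartialEq)]
struct Svaries {
    #[cfg(eks)]
    x: i32,
    #[cfg(why)]
    y: i32,
    #[cfg(zed)]
    z: i32,
}

// Hypothetical helper: valid only when no field survives cfg-stripping.
// The struct was declared with braces, so it is built with braces.
fn make_empty() -> Svaries {
    Svaries {}
}

fn main() {
    let s = make_empty();
    // Pattern-matching uses the same braced form.
    let Svaries {} = s;
    println!("size in bytes: {}", std::mem::size_of::<Svaries>());
}
```

A recent compiler may warn that the cfg names are unexpected; that does not affect the point of the sketch.
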
If we want to support situations like this properly, we will probably need to further extend the `cfg` attribute so that it can be placed before individual fields in a struct constructor, like this:

```rust
// You cannot do this today,
// but maybe in the future (after a different RFC)
let s = Svaries {
    #[cfg(eks)] x: 3,
    #[cfg(why)] y: 4,
    #[cfg(zed)] z: 5,
};
```

Supporting such a syntax consistently in the future should start today with allowing empty braces as legal code. (Strictly speaking, it is not *necessary* that we add support for empty braces at the parsing level to support this feature at the semantic level. But supporting empty-braces in the syntax still seems like the most consistent path to me.)

## Ancient History

A parsing ambiguity was the original motivation for disallowing the syntax `S {}` in favor of `S` for constructing an instance of an empty struct. The ambiguity and various options for dealing with it were well documented on the [rust-dev thread]. Both syntaxes were simultaneously supported at the time.

In particular, at the time that mailing list thread was created, the code `match x {} ...` would be parsed as `match (x {}) ...`, not as `(match x {}) ...` (see [Rust PR 5137]); likewise, `if x {}` would be parsed as an if-expression whose test component is the struct literal `x {}`. Thus, at the time of [Rust PR 5137], if the input to a `match` or `if` was an identifier expression, one had to put parentheses around the identifier to force it to be interpreted as input to the `match`/`if`, and not as a struct constructor.

Of the options for resolving this discussed on the mailing list thread, the one selected (removing `S {}` construction expressions) was chosen as the most expedient option.

At that time, the option of "Place a parser restriction on those contexts where `{` terminates the expression and say that struct literals cannot appear there unless they are in parentheses."
was explicitly not chosen, in favor of continuing to use the disambiguation rule in use at the time, namely that the presence of a label (e.g. `S { a_label: ... }`) was *the* way to distinguish a struct constructor from an identifier followed by a control block, and thus, "there must be one label."

Naturally, if the construction syntax were to be disallowed, it made sense to also remove the `struct S {}` declaration syntax.

Things have changed since the time of that mailing list thread; namely, we have now adopted the aforementioned parser restriction [Rust RFC 25]. (The text of RFC 25 does not explicitly address `match`, but we have effectively expanded it to include a curly-brace delimited block of match-arms in the definition of "block".) Today, one uses parentheses around struct literals in some contexts (such as `for e in (S {x: 3}) { ... }` or `match (S {x: 3}) { ... }`).

Note that there was never an ambiguity for uses of `struct S0 { }` in item position. The issue was solely about expression position prior to the adoption of [Rust RFC 25].

## Precedent for flexible syntax in Rust

There is precedent in Rust for violating "one way to do it" in favor of syntactic convenience or regularity.

For example, one can often include an optional trailing comma, for example in: `let x : &[int] = [3, 2, 1, ];`.

One can also include redundant curly braces or parentheses, for example in:

```rust
println!("hi: {}", { if { x.len() > 2 } { ("whoa") } else { ("there") } });
```

One can even mix the two together when delimiting match arms:

```rust
    let z: int = match x {
        [3, 2] => { 3 }
        [3, 2, 1] => 2,
        _ => { 1 },
    };
```

We do have lints for some style violations (though none catch the cases above), but lints are different from fundamental language restrictions.

## Recent history

There was a previous [RFC PR][RFC PR 147] that was effectively the same in spirit as this one. It was closed because it was not sufficiently well fleshed out for further consideration by the core team.
However, to save people the effort of reviewing the comments on that PR (and hopefully stave off potential bikeshedding on this PR), I here summarize the various viewpoints put forward on the comment thread there, and note for each one, whether that viewpoint would be addressed by this RFC (accept both syntaxes), by [Always Require Braces], or by [Status Quo].

Note that this list of comments is *just* meant to summarize the list of views; it does not attempt to reflect the number of commenters who agreed or disagreed with a particular point. (But since the RFC process is not a democracy, the number of commenters should not matter anyway.)

* "+1" ==> Favors: This RFC (or potentially [Always Require Braces]; I think the content of [RFC PR 147] shifted over time, so it is hard to interpret the "+1" comments now).
* "I find `let s = S0;` jarring, think its an enum initially." ==> Favors: Always Require Braces
* "Frequently start out with an empty struct and add fields as I need them." ==> Favors: This RFC or Always Require Braces
* "Foo{} suggests is constructing something that it's not; all uses of the value `Foo` are indistinguishable from each other" ==> Favors: Status Quo
* "I find it strange anyone would prefer `let x = Foo{};` over `let x = Foo;`" ==> Favors: Status Quo; strongly opposes Always Require Braces.
* "I agree that 'instantiation-should-follow-declation', that is, structs declared `;, (), {}` should only be instantiated [via] `;, (), { }` respectively" ==> Opposes the leniency of this RFC, in that it allows expressions to include or omit `{}` on an empty struct, regardless of declaration form, and vice-versa.
* "The code generation argument is reasonable, but I wouldn't want to force noisier syntax on all 'normal' code just to make macros work better."
  ==> Favors: This RFC

[Always Require Braces]: #always-require-braces
[Status Quo]: #status-quo
[Ancient History]: #ancient-history
[Recent History]: #recent-history
[CFG problem]: #the-cfg-problem
[Empty Tuple Structs]: #empty-tuple-structs
[Precedent for flexible syntax in Rust]: #precedent-for-flexible-syntax-in-rust

[rust-dev thread]: https://mail.mozilla.org/pipermail/rust-dev/2013-February/003282.html
[Rust Issue 5167]: https://github.com/rust-lang/rust/issues/5167
[Rust RFC 25]: https://github.com/rust-lang/rfcs/blob/master/complete/0025-struct-grammar.md
[CFG parse bug]: https://github.com/rust-lang/rust/issues/16819
[Rust PR 5137]: https://github.com/rust-lang/rust/pull/5137
[RFC PR 147]: https://github.com/rust-lang/rfcs/pull/147