- Start Date: 2014-08-28
- RFC PR: [rust-lang/rfcs#218](https://github.com/rust-lang/rfcs/pull/218/files)
- Rust Issue: [rust-lang/rust#24266](https://github.com/rust-lang/rust/issues/24266)

# Summary
When a struct type `S` has no fields (a so-called "empty struct"),
allow it to be defined via either `struct S;` or `struct S {}`.
When defined via `struct S;`, allow instances of it to be constructed
and pattern-matched via either `S` or `S {}`.
When defined via `struct S {}`, require instances to be constructed
and pattern-matched solely via `S {}`.
# Motivation
Today, when writing code, one must treat an empty struct as a
special case, distinct from structs that include fields.
That is, one must write code like this:
```rust
struct S2 { x1: int, x2: int }
struct S0; // kind of different from the above.

let s2 = S2 { x1: 1, x2: 2 };
let s0 = S0; // kind of different from the above.

match (s2, s0) {
    (S2 { x1: y1, x2: y2 },
     S0) // you can see my pattern here
     => { println!("Hello from S2({}, {}) and S0", y1, y2); }
}
```
While this yields code that is relatively free of extraneous
curly braces, this special-case handling of empty structs presents
problems for two cases of interest: automatic code generators
(including, but not limited to, Rust macros) and conditionalized code
(i.e. code with `cfg` attributes; see the [CFG problem] appendix).

The heart of the code-generator argument is this: why force every
future code generator and macro to special-case empty structs (in
terms of whether or not to include the surrounding braces),
especially since that special case is likely to be forgotten,
yielding a latent bug in the code generator?

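
To make the code-generator argument concrete, here is a minimal sketch
of a hypothetical `macro_rules!` generator. `make_struct!` is an
illustrative name, not an existing macro, and the sketch assumes a
modern Rust toolchain that accepts this RFC's syntax. The macro always
emits braces. For an empty field list, it therefore expands to
`struct Marker {}` and `Marker {}`, which compile without any special
case only because empty braces are legal.

```rust
// Hypothetical code generator: defines a struct with the given fields
// and a `new` constructor taking one argument per field. There is no
// special case for zero fields: the expansion is then `struct Name {}`
// and `Name {}`, which are legal because empty braces are accepted.
macro_rules! make_struct {
    ($name:ident { $($field:ident : $ty:ty),* }) => {
        #[derive(Debug, PartialEq)]
        struct $name { $($field: $ty),* }

        impl $name {
            fn new($($field: $ty),*) -> $name {
                // Field-init shorthand; becomes `$name {}` when empty.
                $name { $($field),* }
            }
        }
    };
}

// Two fields: an ordinary struct.
make_struct!(Point { x: i32, y: i32 });
// Zero fields: the same macro arm yields a braced empty struct.
make_struct!(Marker {});

fn main() {
    let p = Point::new(1, 2);
    let m = Marker::new();
    println!("{:?} {:?}", p, m);
}
```

Without empty-brace support, the macro would need a second arm that
emits `struct Marker;` and `Marker`, and that arm is exactly the
special case the motivation says is easy to forget.
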

The special-case handling of empty structs is also a problem for
programmers who actively add and remove fields from structs during
development. Such changes make a struct switch between being empty
and non-empty, and the associated revisions of removing and adding
curly braces are aggravating, both in the effort of revising the code
and in the extra noise introduced into commit histories.

This RFC proposes an approach similar to the one we used circa February
2013, when both `S0` and `S0 { }` were accepted syntaxes for an empty
struct. The parsing ambiguity that motivated removing support for
`S0 { }` is no longer present (see the [Ancient History] appendix).
Supporting empty braces in the syntax for empty structs is easy to do
in the language now.

# Detailed design
There are two kinds of empty structs: braced empty structs and
flexible empty structs. Flexible empty structs are a slight
generalization of the structs that we have today.

Flexible empty structs are defined via the syntax `struct S;` (as today).
Braced empty structs are defined via the syntax `struct S { }` ("new").

Both braced and flexible empty structs can be constructed via the
expression syntax `S { }` ("new"). Flexible empty structs, as today,
can also be constructed via the expression syntax `S`.

Both braced and flexible empty structs can be pattern-matched via the
pattern syntax `S { }` ("new"). Flexible empty structs, as today,
can also be pattern-matched via the pattern syntax `S`.

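
The construction and pattern rules above can be sketched as follows.
The sketch is written against a modern Rust toolchain, which accepts
this RFC's syntax. `Flexible`, `Braced`, and the helper functions are
hypothetical names chosen for illustration.

```rust
#[derive(Debug, PartialEq)]
struct Flexible; // flexible empty struct

#[derive(Debug, PartialEq)]
struct Braced {} // braced empty struct

// A flexible empty struct can be matched with either pattern form.
fn match_flexible_braceless(f: Flexible) -> bool {
    matches!(f, Flexible)
}

fn match_flexible_braced(f: Flexible) -> bool {
    matches!(f, Flexible {})
}

// A braced empty struct can only be matched as `Braced {}`;
// the braceless pattern `Braced` would not resolve.
fn match_braced(b: Braced) -> bool {
    matches!(b, Braced {})
}

fn main() {
    // Both spellings construct the same flexible empty struct...
    let a = Flexible;
    let b = Flexible {};
    assert_eq!(a, b);

    // ...but only the braced spelling works for a braced empty struct.
    // `let c = Braced;` would be a compile error.
    let c = Braced {};

    // prints "true true true"
    println!(
        "{} {} {}",
        match_flexible_braceless(a),
        match_flexible_braced(b),
        match_braced(c)
    );
}
```
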
Braced empty struct definitions solely affect the type namespace,
just like normal non-empty structs.
Flexible empty structs affect both the type and value namespaces.
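
A braced empty struct lives only in the type namespace. As a result,
the same identifier can still be bound in the value namespace, which
is impossible for a flexible empty struct. The following is a sketch
of that distinction, assuming a modern Rust toolchain. All names are
hypothetical, and reusing a struct's name for a function is shown only
to expose the namespaces, not as recommended style.

```rust
// `Braced` occupies only the type namespace, so an unrelated function
// with the same name can live in the value namespace.
struct Braced {}

#[allow(non_snake_case)]
fn Braced() -> u32 {
    42
}

// `Flexible` occupies both namespaces: the name `Flexible` is also a
// value of type `Flexible`, so declaring `fn Flexible()` here would be
// a duplicate-definition error.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Flexible;

fn main() {
    let _s: Braced = Braced {}; // type namespace: struct expression
    let n: u32 = Braced();      // value namespace: function call
    let v = [Flexible; 3];      // `Flexible` used as an ordinary value
    println!("{} {:?}", n, v);
}
```
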

As a matter of style, the braceless syntax is preferred for
constructing and pattern-matching flexible empty structs. For
example, pretty-printer tools are encouraged to emit braceless forms
if they know that the corresponding struct is a flexible empty struct.
(Note that pretty printers that handle incomplete fragments may not
have such information available.)

This change introduces no ambiguity, because we have already added a
restriction to the Rust grammar that forces the use of parentheses
around struct literals in contexts where a `{` would otherwise be read
as the start of a block, such as the condition of an `if` (see
[Rust RFC 25]).

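
Here is a small sketch of that restriction in today's Rust. `Empty`
and `check` are hypothetical names. Without the parentheses, the
parser would read `Empty` as the condition or scrutinee and `{}` as
the body that follows it.

```rust
#[derive(Debug, PartialEq)]
struct Empty {}

// Counts how many of the two checks below succeed.
fn check(x: &Empty) -> u32 {
    let mut hits = 0;

    // Writing `if *x == Empty {} { ... }` would parse as `if *x == Empty`
    // followed by the block `{}`, so the literal is parenthesized.
    if *x == (Empty {}) {
        hits += 1;
    }

    // The same rule applies to the scrutinee of a `match`.
    match (Empty {}) {
        Empty {} => hits += 1,
    }

    hits
}

fn main() {
    println!("{}", check(&Empty {}));
}
```
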

The expectation is that when migrating code from a flexible empty
struct to a non-empty struct, one can start by first migrating to a
braced empty struct. A tool (for example, the compiler, which will
reject every remaining braceless use of `S` once `S` is no longer in
the value namespace) then indicates all of the locations where braces
need to be added. After that step has been completed, one can take
the next step of adding the actual fields.

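
The migration path just described can be sketched as three snapshots
of the same hypothetical `Config` type. Each snapshot is placed in its
own module purely so that all three compile side by side; the module
and function names are illustrative assumptions.

```rust
// Step 1: a flexible empty struct, used via the braceless form.
mod step1 {
    pub struct Config;
    pub fn make() -> Config { Config }
}

// Step 2: switch to a braced empty struct. `Config` is no longer a
// value, so the compiler rejects every braceless use (like the body
// of `make` in step 1) until it is rewritten as `Config {}`.
mod step2 {
    pub struct Config {}
    pub fn make() -> Config { Config {} }
}

// Step 3: add the actual field. Every use site already has braces,
// so the only change is filling in the new field.
mod step3 {
    pub struct Config { pub verbose: bool }
    pub fn make() -> Config { Config { verbose: false } }
}

fn main() {
    let _a = step1::make();
    let _b = step2::make();
    let c = step3::make();
    println!("verbose = {}", c.verbose);
}
```
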
# Drawbacks

Some people like the principle "There is only one way to do it."
However, there is precedent in Rust for violating "one way to do it"
in favor of syntactic convenience or regularity; see
the [Precedent for flexible syntax in Rust] appendix.
Also, see the [Always Require Braces] alternative below.

I have attempted to summarize the previous discussion from
[RFC PR 147] in the [Recent History] appendix; some of the points
there include drawbacks to this approach and to the
[Always Require Braces] alternative.

# Alternatives
## Always Require Braces

Alternative 1: "Always Require Braces". Specifically, require empty
curly braces on empty structs. People who like the current syntax of
curly-brace-free structs can encode them this way: `enum S0 { S0 }`.

This would address all of the same issues outlined above. (Also, the
author (pnkfelix) would be happy to take this tack.)

The main reason not to take this tack is that some people may like
writing empty structs without braces, but do not want to switch to the
unary enum encoding described above.
See "I wouldn't want to force noisier syntax ..."
in the [Recent History] appendix.

## Status quo
Alternative 2: Status quo. Macros and code-generators in general will
need to handle empty structs as a special case. We may continue
hitting bugs like [CFG parse bug]. Some users will be annoyed but
most will probably cope.
## Synonymous in all contexts

Alternative 3: An earlier version of this RFC proposed having
`struct S;` be entirely synonymous with `struct S { }`, and the
expression `S { }` be synonymous with `S`.

This was deemed problematic, since it would mean that the definition
`struct S { }` would put an entry into both the type and value
namespaces, while `struct S { x: int }` would only put an entry into
the type namespace.

Thus the current draft of the RFC proposes the "flexible" versus
"braced" distinction for empty structs.

## Never synonymous

Alternative 4: Treat `struct S;` as requiring `S` at the expression
and pattern sites, and `struct S { }` as requiring `S { }` at the
expression and pattern sites.

This in some ways follows a principle of least surprise, but it is
also really hard to justify having both syntaxes available for empty
structs with no flexibility about how they are used. (Note again that
one would have the option of choosing between
`enum S { S }`, `struct S;`, or `struct S { }`, each with its own
idiosyncrasies about whether you have to write `S` or `S { }`.)

I would rather adopt "Always Require Braces" than "Never Synonymous".

## Empty Tuple Structs

One might say "why are you including support for curly braces, but not
parentheses?" Or in other words, "what about empty tuple structs?"

The code-generation argument could be applied to tuple structs as
well, to claim that we should allow the syntax `S0()`. I am less
inclined to add a special case for that. I think tuple structs are
less frequently used (especially with many fields); they are largely
for ad-hoc data such as newtype wrappers, not for code generators.

Note that we should not attempt to generalize this RFC as proposed to
include tuple structs, i.e. so that given `struct T0 {}`, the
expressions `T0`, `T0 {}`, and `T0()` would be synonymous. The reason
is that given a tuple struct `struct T2(int, int)`, the identifier
`T2` is *already* bound to a constructor function:
```rust
fn main() {
    #[deriving(Show)]
    struct T2(int, int);

    fn foo<S:std::fmt::Show>(f: |int, int| -> S) {
        println!("Hello from {} and {}", f(2,3), f(4,5));
    }

    foo(T2);
}
```

So if we were to attempt to generalize the leniency of this RFC to
tuple structs, we would be in the unfortunate situation, given
`struct T0();`, of trying to treat `T0` simultaneously as an instance
of the struct and as a constructor function. So, the handling of empty
structs proposed by this RFC does not generalize to tuple structs.

(Note that if we adopt alternative 1, [Always Require Braces], then
the issue of how tuple structs are handled is totally orthogonal: we
could add support for `struct T0()` as a distinct type from
`struct S0 {}`, if we so wished, or leave it aside.)

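
Modern Rust accepts empty tuple structs such as `struct T0();`. This
makes the dual role easy to see: the bare identifier `T0` denotes the
constructor function, so it cannot also denote an instance. The helper
`build` below is a hypothetical name, and the sketch assumes a modern
stable toolchain.

```rust
#[derive(Debug, PartialEq)]
struct T0();

// Calls any zero-argument constructor and returns what it builds.
fn build<S, F: Fn() -> S>(ctor: F) -> S {
    ctor()
}

fn main() {
    // Here `T0` names the constructor function, of type `fn() -> T0`...
    let ctor: fn() -> T0 = T0;
    // ...and `T0()` calls it to produce an instance.
    let a = ctor();
    let b = T0();
    assert_eq!(a, b);

    // Passing the bare name passes the function, not a value of type T0.
    let c = build(T0);
    println!("{:?} {:?}", a, c);
}
```
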
# Unresolved questions
None
# Appendices
## The CFG problem
A program like this works today:
```rust
fn main() {
    #[deriving(Show)]
    struct Svaries {
        x: int,
        y: int,

        #[cfg(zed)]
        z: int,
    }

    let s = match () {
        #[cfg(zed)]      _ => Svaries { x: 3, y: 4, z: 5 },
        #[cfg(not(zed))] _ => Svaries { x: 3, y: 4 },
    };

    println!("Hello from {}", s)
}
```
Observe what happens when one modifies the above just a bit:
```rust
struct Svaries {
    #[cfg(eks)]
    x: int,
    #[cfg(why)]
    y: int,

    #[cfg(zed)]
    z: int,
}
```
Now, certain `cfg` settings yield an empty struct, even though it
is surrounded by braces. Today this leads to a [CFG parse bug]
when one attempts to actually construct such a struct.

If we want to support situations like this properly, we will probably
need to further extend the `cfg` attribute so that it can be placed
before individual fields in a struct constructor, like this:
```rust
// You cannot do this today,
// but maybe in the future (after a different RFC)
let s = Svaries {
    #[cfg(eks)] x: 3,
    #[cfg(why)] y: 4,
    #[cfg(zed)] z: 5,
};
```
Supporting such a syntax consistently in the future should start today
with allowing empty braces as legal code: when every field is
configured away, the constructor above is effectively `Svaries {}`.
(Strictly speaking, it is not *necessary* that we add support for
empty braces at the parsing level to support this feature at the
semantic level. But supporting empty braces in the syntax still seems
like the most consistent path to me.)

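
Modern Rust accepts `cfg` attributes on individual fields of a struct
expression. When every field is configured away, the literal reduces
to empty braces. The following sketch reuses this appendix's `Svaries`
example, with `i32` in place of the old `int`. It assumes that none of
the `eks`, `why`, or `zed` cfg options is set, and recent toolchains
may warn that these cfg names are unexpected.

```rust
// Every field is conditional. With none of `eks`, `why`, or `zed`
// enabled, this definition is effectively `struct Svaries {}`.
#[derive(Debug)]
struct Svaries {
    #[cfg(eks)]
    x: i32,
    #[cfg(why)]
    y: i32,
    #[cfg(zed)]
    z: i32,
}

fn make() -> Svaries {
    // `cfg` on individual field initializers. With no cfgs enabled this
    // is effectively `Svaries {}`, which relies on empty braces being
    // legal in a struct expression.
    Svaries {
        #[cfg(eks)] x: 3,
        #[cfg(why)] y: 4,
        #[cfg(zed)] z: 5,
    }
}

fn main() {
    println!("Hello from {:?}", make());
}
```
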
## Ancient History
A parsing ambiguity was the original motivation for disallowing the
syntax `S {}` in favor of `S` for constructing an instance of
an empty struct. The ambiguity and various options for dealing with it
were well documented on the [rust-dev thread].
Both syntaxes were simultaneously supported at the time.

In particular, at the time that mailing list thread was created, the
code `match x {} ...` would be parsed as `match (x {}) ...`, not
as `(match x {}) ...` (see [Rust PR 5137]); likewise, `if x {}` would
be parsed as an if-expression whose test component is the struct
literal `x {}`. Thus, at the time of [Rust PR 5137], if the input to
a `match` or `if` was an identifier expression, one had to put
parentheses around the identifier to force it to be interpreted as
input to the `match`/`if`, and not as a struct constructor.

Of the options for resolving this discussed on the mailing list
thread, the one selected (removing `S {}` construction expressions)
was chosen as the most expedient option.

At that time, the option of "Place a parser restriction on those
contexts where `{` terminates the expression and say that struct
literals cannot appear there unless they are in parentheses." was
explicitly not chosen. Instead, the disambiguation rule then in use
was kept, namely that the presence of a label
(e.g. `S { a_label: ... }`) was *the* way to distinguish a
struct constructor from an identifier followed by a control block, and
thus, "there must be one label."

Naturally, if the construction syntax were to be disallowed, it made
sense to also remove the `struct S {}` declaration syntax.

Things have changed since the time of that mailing list thread;
namely, we have now adopted the aforementioned parser restriction
[Rust RFC 25]. (The text of RFC 25 does not explicitly address
`match`, but we have effectively expanded it to include a curly-brace
delimited block of match arms in the definition of "block".) Today,
one uses parentheses around struct literals in some contexts (such as
`for e in (S {x: 3}) { ... }` or `match (S {x: 3}) { ... }`).

Note that there was never an ambiguity for uses of `struct S0 { }` in item
position. The issue was solely about expression position prior to the
adoption of [Rust RFC 25].

## Precedent for flexible syntax in Rust
There is precedent in Rust for violating "one way to do it" in favor
of syntactic convenience or regularity.

For example, one can often include an optional trailing comma, as
in: `let x : &[int] = [3, 2, 1, ];`.

One can also include redundant curly braces or parentheses, for
example in:
```rust
println!("hi: {}", { if { x.len() > 2 } { ("whoa") } else { ("there") } });
```
One can even mix the two together when delimiting match arms:
```rust
let z: int = match x {
    [3, 2] => { 3 }
    [3, 2, 1] => 2,
    _ => { 1 },
};
```
We do have lints for some style violations (though none catch the
cases above), but lints are different from fundamental language
restrictions.
## Recent history
There was a previous [RFC PR][RFC PR 147] that was effectively the
same in spirit as this one. It was closed because it was not
sufficiently well fleshed out for further consideration by the core
team. However, to save people the effort of reviewing the comments on
that PR (and hopefully stave off potential bikeshedding on this PR), I
here summarize the various viewpoints put forward on the comment
thread there. For each one, I note whether that viewpoint would be
addressed by this RFC (accept both syntaxes), by [Always Require Braces],
or by [Status Quo].

Note that this list of comments is *just* meant to summarize the list
of views; it does not attempt to reflect the number of commenters who
agreed or disagreed with a particular point. (But since the RFC process
is not a democracy, the number of commenters should not matter anyway.)

* "+1" ==> Favors: This RFC (or potentially [Always Require Braces]; I think the content of [RFC PR 147] shifted over time, so it is hard to interpret the "+1" comments now).
* "I find `let s = S0;` jarring, think it's an enum initially." ==> Favors: Always Require Braces
* "Frequently start out with an empty struct and add fields as I need them." ==> Favors: This RFC or Always Require Braces
* "Foo{} suggests is constructing something that it's not; all uses of the value `Foo` are indistinguishable from each other" ==> Favors: Status Quo
* "I find it strange anyone would prefer `let x = Foo{};` over `let x = Foo;`" ==> Favors: Status Quo; strongly opposes Always Require Braces.
* "I agree that 'instantiation-should-follow-declaration', that is, structs declared `;, (), {}` should only be instantiated [via] `;, (), { }` respectively" ==> Opposes the leniency of this RFC, in that it allows expressions and patterns to either include or omit `{}` on a flexible empty struct.
* "The code generation argument is reasonable, but I wouldn't want to force noisier syntax on all 'normal' code just to make macros work better." ==> Favors: This RFC

[Always Require Braces]: #always-require-braces
[Status Quo]: #status-quo
[Ancient History]: #ancient-history
[Recent History]: #recent-history
[CFG problem]: #the-cfg-problem
[Empty Tuple Structs]: #empty-tuple-structs
[Precedent for flexible syntax in Rust]: #precedent-for-flexible-syntax-in-rust
[rust-dev thread]: https://mail.mozilla.org/pipermail/rust-dev/2013-February/003282.html
[Rust Issue 5167]: https://github.com/rust-lang/rust/issues/5167
[Rust RFC 25]: https://github.com/rust-lang/rfcs/blob/master/complete/0025-struct-grammar.md
[CFG parse bug]: https://github.com/rust-lang/rust/issues/16819
[Rust PR 5137]: https://github.com/rust-lang/rust/pull/5137
[RFC PR 147]: https://github.com/rust-lang/rfcs/pull/147