union-initialization-and-drop

This commit is contained in:
Ralf Jung 2018-08-03 15:05:09 +02:00
parent 2d5218a3e2
commit 04c66db03b
2 changed files with 342 additions and 0 deletions

View File

@ -0,0 +1,340 @@
- Feature Name: union_initialization_and_drop
- Start Date: 2018-08-03
- RFC PR: (leave this empty)
- Rust Issue: (leave this empty)
# Summary
[summary]: #summary
Unions do not allow fields of types that require drop glue, but they may still
`impl Drop` themselves. We specify when one may "move" out of a union field and
when the union's `drop` is called. To avoid implicit calls of drop, we also
restrict the use of `DerefMut` when unions are involved.
# Motivation
[motivation]: #motivation
Currently, it is unstable to have a non-`Copy` field in the union. The main
reason for this is that having fields which need drop glue raises some hard
questions about whether to call that drop glue when assigning a union field.
Not much progress has been made on stabilizing the unstable union features.
This RFC proposes a route forwards that side-steps those hard questions: Do not
allow fields with drop glue.
# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation
## Union Definition
When defining a union, it is a hard error to use a field type that requires drop glue.
Examples:
```rust
// Accepted
union Example1<T> {
// `ManuallyDrop<T>` never has drop glue, even if `T` does.
f1: ManuallyDrop<T>,
// `Cell<i32>` is a fully known type, and does not have drop glue.
f2: Cell<i32>,
}
union Example2<T: Copy> {
// `Copy` types never have drop glue.
f1: T,
}
trait Trait3 { type Assoc: Copy; }
union Example3<T: Trait3> {
// `T::Assoc` is `Copy` and hence cannot have drop glue.
f1: T::Assoc,
}
// Rejected
union Example4<T> {
// `T` might have drop glue.
f1: T,
}
trait Trait5 { type Assoc; }
union Example5<T: Trait5> {
// `T::Assoc` might have drop glue.
f1: T::Assoc,
}
```
Ruling out possibly dropping types may seem restrictive, but thanks to
`ManuallyDrop` it in fact is not: If the compiler rejects a union definition,
you can always wrap field types in `ManuallyDrop` to obtain a working
definition. This means you have to manually take care of when to drop the data,
but that is already something to be concerned with when working on unions.
As a consequence, it is quite obvious that writing to a union field will never
implicitly call `drop`. Such a write is hence always a safe operation. This
removes a whole class of pit falls related to `drop` being called in tricky
unsafe code when you might not expect that to happen. (However, see below for
some pit falls that remain.)
## Union initialization and `Drop`
In two cases, the compiler cares about whether a (field of a) variable is
initialized: When deciding whether a move from the field/variable is allowed
(for cases where the type is not `Copy`), and when deciding whether or not the
variable has to be dropped when it goes out of scope.
A union just does very simple initialization tracking: There is a single boolean
state for the entire union and all of its fields. Nested inner fields are
tracked just like they are for structs; however, when the union becomes
(un)initialized, then all nested inner fields of all union fields are
(un)initialized at once. For example:
```rust
// This code creates bad references and transmute to `Vec` in incorrect ways.
// This is just to demonstrate what the compiler would accept in terms of
// tracking initialization.
struct S(i32, i32); // not `Copy`, no drop glue
union U { f1: ManuallyDrop<Vec<i32>>, f2: S, f3: i32 }
let mut u: U;
// Now `u` is not initialized: `&u`, `&u.f2` and `&u.f2.0` are all rejected.
// We can write into uninitialized inner fields:
u.f2.1 = 42;
let _ = &u.f2.1; // This field is initialized now.
// But this does not change the initialization state of the union itself,
// or any other (inner) field.
// We can initialize by assigning an entire field:
u.f1 = ManuallyDrop::new(Vec::new());
// Now *all fields* of `u` are initialized:
let _ = &u.f2;
// Equivalently, we can assign the entire union:
u = U { f2: S(42, 23) };
// Now `u` is still initialized.
// Copying does not change anything:
let _ = u.f3;
// Now `u` is still initialized.
// We can move out of an initialized union:
let v = u.f1;
// Now `u.f1` and `u.f2` are no longer initialized (they got "moved out of").
// `let _ = u.f2;` would hence get rejected, as would `&u.f1` and `foo(u)`.
u.f1 = v;
// Now `u` and all of its fields are initialized again ("moving back in").
```
If the union implements `Drop`, the same restrictions as for structs apply: It
is not possible to initialize a field before initializing the entire variable,
and it is not possible to move out of a field. For example:
```rust
// This code creates bad references and transmute to `Vec` in incorrect ways.
// This is just to demonstrate what the compiler would accept in terms of
// tracking initialization.
struct S(i32); // not `Copy`, no drop glue
union U { f1: ManuallyDrop<Vec<i32>>, f2: S, f3: u32 }
impl Drop for U {
fn drop(&mut self) {
println!("Goodbye!");
}
}
let mut u: U;
// `u.f1 = ...;` gets rejected: Cannot initialize a union with `Drop` by assigning a field.
u = U { f2: S(42) };
// Now `u` is initialized.
// `let v = u.f1;` gets rejected: Cannot move out of union that implements `Drop`.
let v_ref = &mut u.f1; // creating a reference is allowed
let _ = u.f3; // copying out is allowed
```
When a union implementing `Drop` goes out of scope, its destructor gets called if and only if the union is currently considered initialized:
(Continuing the example from above.)
```rust
{
let u = U { f2: S(42) };
// drop gets called
}
{
let u = U { f1: ManuallyDrop::new(Vec::new()) };
foo(u.f2);
// drop does NOT get called
}
```
## Potential pitfalls around `DerefMut`
There is still a potential pitfall left around assigning to union fields: If the
assignment implicitly happens through a `DerefMut`, it may still call drop glue.
For example:
```rust
#![feature(untagged_unions)]
use std::mem::ManuallyDrop;
union U<T> { x:(), f: ManuallyDrop<T> }
fn main() {
let mut u : U<(Vec<i32>,)> = U { x: () };
unsafe { u.f.0 = Vec::new() }; // uninitialized `Vec` being droped
}
```
This requires `unsafe` because it desugars to `ManuallyDrop::deref_mut(&mut u.f).0`,
and while writing to a union field is safe, taking a reference is not.
For this reason, `DerefMut` auto-deref is not applied when working on a union or
its fields. However, note that manually dereferencing is still possible, so
`*(u.f).0 = Vec::new()` is still a way to drop an uninitialized field! But this
can never happen when no `*` is involved, and hopefully dereferencing an element
of a union is a clear enough signal that the union better be initialized
properly for this to make sense.
# Reference-level explanation
[reference-level-explanation]: #reference-level-explanation
## Union definition
When defining a union, it is a hard error to use a field type that requires drop glue.
This is checked as follows:
* Proceed recursively down the given type, insofar as the type involved is known
at compile-time. For example, `u32`, `&mut T` and `ManuallyDrop<T>` are known
to not have drop glue no matter the choice of `T`.
* When hitting a type variable where no progress can be made, check that `T:
Copy` as a proxy for `T` not requiring drop glue.
Note: Currently, union fields with drop glue are allowed on nightly with an
unstable feature. This RFC proposes to remove support entirely; code using
nightly might have to be changed.
## Writing to union fields
Writing to union fields is currently unsafe when the field has drop glue. This
check is no longer needed, because union fields will never have drop glue.
Moreover, writing to a nested field (e.g., `u.f1.x = 0;`) is currently unsafe as
well, this should also become a safe operation *as long as the path consists
only of field projections, not deref's*. Note that this is sound only because
`ManuallyDrop`'s only field is private (so, in fact, this is *not* sound inside
the module that defines `ManuallyDrop`).
## Union initialization tracking
A "fragment" is a place of the form `local_var.field.field.field`, without any
implicit derefs. A fragment can be either *initialized* or *uninitialized*.
This state is approximated statically: The type system will only allow accesses
to initialized fragments. Drop elaboration needs to know the precise state of a
fragment, for which purpose it adds run-time drop flags as needed.
If a fragment has some uninitialized nested fragments then it is still
uninitialized and accesses to this fragment as a whole are prevented. This
applies even if it also has a nested initialized fragment (in which case we speak
of a *partially initialized* fragment). If a fragment has only initialized
nested fragments then it is initialized as a whole and can be accessed.
A fragment becomes initialized when it is assigned to, or created using an
initializer, or it is a union field and a sibling becomes initialized, or all
its nested fragments become initialized. A fragment becomes uninitialized when
it doesn't implement `Copy` and is moved out from, or it is a union field
(possibly `Copy`) and its sibling becomes uninitialized, or some of its nested
fragments becomes uninitialized.
In other words, union fields behave a lot like struct fields except that if one
field changes initialization state, the others follow suit. In particular, if
one union field becomes partially initialized (because one of its nested
fragments got uninitialized), all its siblings become *entirely* uninitialized.
If a fragment is of a type which has an `impl Drop`, then its nested fragments
cannot be separately (un)initialized. Only the entire fragment can be
initialized by assignment, and the entire fragment can be uninitialized by
moving out.
NOTE: To my knowledge, the following already mostly matches the current
implementation. The only exception is that "fragment becomes initialized when
all its nested fragments become initialized" rule is not currently implemented
for neither structs nor unions, so the compiler accepts less code than it
should.
(This is based on a
[previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).)
## Potential pitfalls around `DerefMut`
When adding auto-derefs on the left-hand side of an assignment, as we traverse
the path, once we hit a `union`, we stop adding further auto-derefs. So with
`s: Struct` and `u: Union`, when encountering `s.u.f.x`, auto-deref *does*
happen on `s`, but not on `s.u` or any of the later components.
Notice that this relies crucially on the only field of `ManuallyDrop` being
private! If we could project directly through that field, no `DerefMut` would
be needed to reproduce the problematic example from the "guide" section.
# Drawbacks
[drawbacks]: #drawbacks
This makes working with unions involving types that may have drop glue slightly
more verbose than today: One has to write `ManuallyDrop` more often than one may
want to.
The restriction placed on `DerefMut` is not fully backwards compatible: A type
could implement `Copy + DerefMut` and actually rely on the deref coercion inside
a union. That seems very unlikely, but should be tested with a crater run.
The initialization tracking rules are somewhat surprising, and one might want to
prefer the compiler to just not track anything when it comes to unions. After
all, the compiler fundamentally cannot know what part of the union is properly
initialized. Unfortunately, not having any initialization tracking is not an
option when non-`Copy` fields are involved: We have to decide when moving out of
a union field is allowed.
# Rationale and alternatives
[rationale-and-alternatives]: #rationale-and-alternatives
Ruling out fields with drop glue does not, in fact, reduce the expressiveness of
unions because one can use `ManuallyDrop<T>` to obtain a drop-glue-free version
of `T`. If anything, having the `ManuallyDrop` in the union definition should
help to drive home the point that no automatic dropping is happening, ever.
(Before this RFC, automatic dropping is happening when assigning to a union
field but not when the union goes out of scope. That seems to be the result of
necessity, not of a coherent design.)
An alternative approach to proceed with unions is
[this previously proposed RFC by @petrochenkov](https://github.com/petrochenkov/rfcs/blob/e5266bd105f592f7408b8592c5c3deaccba7f1ec/text/1444-union.md#initialization-state).
That RFC replaces RFC 1444 and goes into a lot more points than this much more
limited proposal. In particular, it allows fields with drop glue. However, it
can be pretty hard for the programmer to predict when drop glue will be
automatically invoked on assignment or not, because the initialization tracking
(which this RFC adapts from @petrochenkov's proposal) can sometimes be a little
surprising when looking at individual fields: Whether `u.f2 = ...;` drops
depends on whether `u.f1` has been previously initialized. @petrochenkov hence
proposes a lint to warn people that unions with drop-glue fields are not always
very well-behaved. This RFC, on the other hand, side-steps the entire question
by not allowing fields with drop glue. Initialization tracking thus has no
effect on the code executed during an assignment of a union field. For unions
that `impl Drop`, it still has an effect on what happens when the union goes out
of scope, but in that case initialization is so restricted that I cannot think
of any surprises. Together with the `DerefMut` restriction, that should make it
very unlikely to accidentally call `drop` when it was not intended.
We could simplify thus further by not to allowing `impl drop for Union`. It is
still possible to add a wrapper struct around the union which has drop glue, so
this does not restrict expressiveness. However, this seems unnecessarily
cumbersome.
# Prior art
[prior-art]: #prior-art
I do not know of any language combining initialization tracking and destructors
with unions: C++ does not have destructors for unions, and it does not track
whether fields of a data structures are initialized to (dis)allow references.
# Unresolved questions
[unresolved-questions]: #unresolved-questions
Should we even try to avoid the `DerefMut`-related pitfall? And if yes, should
we maybe try harder, e.g. lint against using `*` below a union type when
describing a place? That would make people write `let v = &mut u.f; *v =
Vec::new();`. It is not clear that this helps in terms of pointing out that an
automatic drop may be happening.

View File

@ -10,6 +10,8 @@ Provide native support for C-compatible unions, defined via a new "contextual
keyword" `union`, without breaking any existing code that uses `union` as an
identifier.
**Note:** This RFC has been partially superseded by `unions-and-drop`.
# Motivation
[motivation]: #motivation