Almost a year ago I developed the moveit Rust library, which provides primitives for expressing something like C++’s T&& and move constructors while retaining Rust’s so-called “destructive move property”: moving a value transfers ownership, rather than doing a funny copy.
In an earlier blogpost I described the theory behind this library and some of the motivation, which I feel fairly confident about, especially in how constructors (and their use of pinning) are defined.
However, there is a problem.
A Not-so-Quick Recap
The old post is somewhat outdated, since moveit uses different names for a lot of things that are geared to fit in with the rest of Rust.
The core abstraction of moveit is the constructor, a type that implements the New trait:
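As a minimal sketch of what such a trait could look like, based only on the description in this section (a pinned `this` output parameter and an associated `Output` type): the exact signatures and safety contracts here are assumptions, and `FromValue` and `build_i32` are hypothetical names used purely for illustration.

```rust
use std::mem::MaybeUninit;
use std::pin::Pin;

/// A constructor: a recipe for building a `Self::Output` in place.
///
/// # Safety
/// `new` must leave `this` fully initialized when it returns.
pub unsafe trait New: Sized {
    /// The type being constructed.
    type Output;

    /// # Safety
    /// `this` must point to fresh storage that is not yet initialized.
    unsafe fn new(self, this: Pin<&mut MaybeUninit<Self::Output>>);
}

/// Hypothetical constructor that writes a ready-made value into place.
pub struct FromValue<T>(pub T);

unsafe impl<T> New for FromValue<T> {
    type Output = T;

    unsafe fn new(self, this: Pin<&mut MaybeUninit<T>>) {
        // We only write through the pinned reference; nothing is moved out.
        let slot = unsafe { this.get_unchecked_mut() };
        slot.write(self.0);
    }
}

/// Runs the hypothetical constructor against fresh stack storage.
pub fn build_i32() -> i32 {
    let mut storage = MaybeUninit::<i32>::uninit();
    unsafe {
        FromValue(42).new(Pin::new_unchecked(&mut storage));
        storage.assume_init()
    }
}
```

The point of the shape is that the constructor never returns the value; it only ever sees the address the value will live at.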
A New type is not what is being constructed; rather, it represents a method of construction, resembling a specialized Fn trait. The constructed type is given by the associated type Output.
Types that can be constructed are constructed in place, unlike most Rust types. This is a property shared by constructors in C++, allowing values to record their own address at the moment of creation. Explaining why this is useful is a bit long-winded, but let’s assume this is a thing we want to be able to do. Crucially, we need the output of a constructor to be pinned, which is why the this output parameter is pinned.
Calling a constructor requires creating the output location in advance so that we can make it available to it in time:
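The call sequence might look roughly like the sketch below. It assumes a `New` trait of the shape described above; `Noisy`, `MakeNoisy`, `DROPS`, and `construct_and_read` are hypothetical names. The drop counter is there to make the shortcoming of this naive version visible: the constructed value's destructor never runs.

```rust
use std::mem::MaybeUninit;
use std::pin::Pin;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Counts how many times `Noisy`'s destructor has run.
pub static DROPS: AtomicUsize = AtomicUsize::new(0);

pub struct Noisy {
    pub value: u32,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        DROPS.fetch_add(1, Ordering::SeqCst);
    }
}

/// Assumed shape of the constructor trait.
pub unsafe trait New: Sized {
    type Output;
    unsafe fn new(self, this: Pin<&mut MaybeUninit<Self::Output>>);
}

/// Hypothetical constructor for `Noisy`.
pub struct MakeNoisy(pub u32);

unsafe impl New for MakeNoisy {
    type Output = Noisy;
    unsafe fn new(self, this: Pin<&mut MaybeUninit<Noisy>>) {
        let slot = unsafe { this.get_unchecked_mut() };
        slot.write(Noisy { value: self.0 });
    }
}

pub fn construct_and_read() -> (u32, usize) {
    let value = {
        // 1. Create the output location in advance.
        let mut storage = MaybeUninit::<Noisy>::uninit();
        // 2. Run the constructor against the pinned, uninitialized storage.
        unsafe { MakeNoisy(7).new(Pin::new_unchecked(&mut storage)) };
        // 3. Hand out a pinned reference to the now-initialized value.
        let result: Pin<&mut Noisy> =
            unsafe { Pin::new_unchecked(storage.assume_init_mut()) };
        let value = result.value;
        value
        // `storage` dies here, but `MaybeUninit` never drops its contents.
    };
    (value, DROPS.load(Ordering::SeqCst))
}
```

Calling `construct_and_read()` yields the constructed value together with a drop count of zero, which is exactly the gap discussed next.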
However, this is not quite right. Pin<P>’s docs are quite clear that, once we create a Pin<&mut T>, we must call T’s destructor before its memory is re-used. Reuse is unavoidable for stack data, and storage will not run the destructor for us (it’s a MaybeUninit<T>, after all), so we must somehow run the destructor separately.
An “Easy” Solution
One trick we could use is to replace storage with some kind of wrapper over a MaybeUninit<T> that calls the destructor for us:
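A minimal sketch of such a wrapper follows. The name `EventuallyInit` matches the type referred to later in this post, but its API here (`uninit`, `write`) is an assumption, as are the `Noisy` helper and the `drops_after_scope` demo.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::rc::Rc;

/// Uninitialized storage that unconditionally drops its contents.
pub struct EventuallyInit<T>(MaybeUninit<T>);

impl<T> EventuallyInit<T> {
    /// # Safety
    /// The caller must initialize the storage (for example via `write`)
    /// before the wrapper is dropped.
    pub unsafe fn uninit() -> Self {
        Self(MaybeUninit::uninit())
    }

    pub fn write(&mut self, value: T) -> &mut T {
        self.0.write(value)
    }
}

impl<T> Drop for EventuallyInit<T> {
    fn drop(&mut self) {
        // Always runs the destructor, whether or not the caller
        // still considers the value "live".
        unsafe { self.0.assume_init_drop() }
    }
}

/// Hypothetical type that records its own destruction.
pub struct Noisy(Rc<Cell<usize>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

pub fn drops_after_scope() -> usize {
    let drops = Rc::new(Cell::new(0));
    {
        let mut storage = unsafe { EventuallyInit::<Noisy>::uninit() };
        storage.write(Noisy(Rc::clone(&drops)));
    } // the wrapper's destructor runs `Noisy`'s destructor here
    drops.get()
}
```

Because the destructor is unconditional, there is no way to express "this value has been moved elsewhere" without leaving a moved-from husk behind.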
This works, but isn’t ideal, because now we can’t write down something like a C++ move constructor without running into the classic C++ problem: all objects must be destroyed unconditionally, so now you can have moved-from state. Giving up Rust’s moves-transfer-ownership (i.e. affine) property is bad, but it turns out to be avoidable!
There are also some scary details around panics here that I won’t get into.
&T, &mut T, … &move T?
moveit instead provides a MoveRef<'frame, T> type that tries to capture the notion of what an “owning reference” could mean in Rust. An &move or &own type has been discussed many times, but implementing it in the full generality it would deserve as a language feature runs into some interesting problems due to how Box<T>, the heap-allocated equivalent, currently behaves.
We can think of MoveRef<'frame, T> as wrapping the longest-lived &mut T reference pointing to a particular location in memory. The longest-lived part is crucial, since it means that MoveRef is entitled to run its pointee’s destructor:
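A minimal sketch of an owning reference along these lines is shown below. It is not moveit's actual implementation: the field layout, `new_unchecked`, and the exact `into_inner` body are assumptions, and `Noisy` and `drop_through_move_ref` are hypothetical helpers.

```rust
use std::cell::Cell;
use std::mem::{self, MaybeUninit};
use std::ptr;
use std::rc::Rc;

/// An owning reference: dropping it destroys the pointee in place.
pub struct MoveRef<'frame, T> {
    ptr: &'frame mut T,
}

impl<'frame, T> MoveRef<'frame, T> {
    /// # Safety
    /// `ptr` must be the longest-lived reference to its pointee, and
    /// nothing else may read or drop the pointee afterwards.
    pub unsafe fn new_unchecked(ptr: &'frame mut T) -> Self {
        Self { ptr }
    }

    /// Moves the pointee out, suppressing the in-place destructor.
    pub fn into_inner(self) -> T {
        let value = unsafe { ptr::read(&*self.ptr) };
        mem::forget(self);
        value
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        let pointee: *mut T = &mut *self.ptr;
        // Sound only because no other reference can outlive `self`.
        unsafe { ptr::drop_in_place(pointee) }
    }
}

/// Hypothetical type that records its own destruction.
pub struct Noisy(Rc<Cell<usize>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

/// Returns the drop count before and after the `MoveRef` is dropped.
pub fn drop_through_move_ref() -> (usize, usize) {
    let drops = Rc::new(Cell::new(0));
    let mut storage = MaybeUninit::<Noisy>::uninit();
    storage.write(Noisy(Rc::clone(&drops)));
    let owner = unsafe { MoveRef::new_unchecked(storage.assume_init_mut()) };
    let before = drops.get();
    drop(owner);
    (before, drops.get())
}
```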
No reference to the pointee can ever outlive the MoveRef itself, by definition, so this is safe. The owner of a value is that which is entitled to destroy it, and therefore a MoveRef literally owns its pointee. Of course, this means we can move out of it (which was the whole point of the original blogpost).
Because of this, we are further entitled to arbitrarily pin a MoveRef with no consequences: pinning it would consume the unpinned MoveRef (for obvious reasons, MoveRefs cannot be reborrowed) so no unpinned reference may outlive the pinning operation.
This gives us a very natural solution to the problem above: result should not be a Pin<&mut T>, but rather a Pin<MoveRef<'_, T>>:
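The sketch below walks through that sequence by hand, under the assumption that a `MoveRef` is a thin owning wrapper over `&mut T` with an `into_pin` conversion. The field layout and the `Noisy` and `pinned_owner` helpers are hypothetical, and a plain `write` stands in for running a real constructor.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ops::Deref;
use std::pin::Pin;
use std::ptr;
use std::rc::Rc;

/// Assumed shape of an owning reference.
pub struct MoveRef<'frame, T> {
    ptr: &'frame mut T,
}

impl<'frame, T> MoveRef<'frame, T> {
    /// # Safety
    /// `ptr` must be the longest-lived reference to its pointee.
    pub unsafe fn new_unchecked(ptr: &'frame mut T) -> Self {
        Self { ptr }
    }

    /// Pinning consumes the unpinned reference, so none can outlive it.
    pub fn into_pin(this: Self) -> Pin<Self> {
        unsafe { Pin::new_unchecked(this) }
    }
}

impl<T> Deref for MoveRef<'_, T> {
    type Target = T;
    fn deref(&self) -> &T {
        &*self.ptr
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        let pointee: *mut T = &mut *self.ptr;
        unsafe { ptr::drop_in_place(pointee) }
    }
}

/// Hypothetical type that records its own destruction.
pub struct Noisy {
    pub value: u32,
    drops: Rc<Cell<usize>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

pub fn pinned_owner() -> (u32, usize) {
    let drops = Rc::new(Cell::new(0));
    let value;
    {
        let mut storage = MaybeUninit::<Noisy>::uninit();
        // Stand-in for running a constructor in place.
        storage.write(Noisy { value: 7, drops: Rc::clone(&drops) });
        let owner = unsafe { MoveRef::new_unchecked(storage.assume_init_mut()) };
        let result: Pin<MoveRef<'_, Noisy>> = MoveRef::into_pin(owner);
        value = result.value;
    } // `result` is dropped before `storage`, so the destructor runs first
    (value, drops.get())
}
```

Here the destructor runs exactly once before the stack slot is released, which is what the pinning contract asks for, as long as nobody forgets `result`.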
This messy sequence of steps is nicely wrapped up in a macro provided by the library that ensures safe initialization and eventual destruction:
There is also some reasonably complex machinery that allows us to do something like an owning Deref, which I’ll come back to in a bit.
However, there is a small wrinkle that I did not realize when I first designed MoveRef: what happens if I mem::forget a MoveRef?
Undefined Behavior, Obviously
Quashing destruction isn’t new to Rust: we can mem::forget just about anything, leaking all kinds of resources. And that’s ok! Destructors alone cannot be used in type design to avert unsafe catastrophe, a well-understood limitation of the language that we have experience designing libraries around, such as Vec::drain().
MoveRef’s design creates a contradiction:
- MoveRef is an owning smart pointer, and therefore can be safely pinned, much like Box::into_pin() enables. Constructors, in particular, are designed to generate pinned MoveRefs!
- Forgetting a MoveRef will cause the pointee destructor to be suppressed, but its storage will still be freed and eventually re-used, a violation of the Pin drop guarantee.
This would appear to mean that a design like MoveRef is not viable at all, and that this sort of “stack box” strategy is always unsound.
What About Box?
What about it? Even though we can trivially create a Pin<Box<i32>> via Box::pin(), this is a red herring. When we mem::forget a Box, we forget about its storage too. Because its storage has been leaked unrecoverably, we are still, technically, within the bounds of the Pin contract. Only barely, but we’re inside the circle.
Interestingly, the Rust language has to deal with a similar problem; perhaps it suggests a way out?
Drop Flags and Dynamic Ownership Transfer
Carefully crafted Rust code emits some very interesting assembly. I’ve annotated the key portion of the output with a play-by-play below.
The upshot is that maybe_drop conditions the destructor of x on a flag, which is allocated next to it on the stack. Rust flips this flag when the value is moved into another function, and only runs the destructor when the flag is left alone. In this case, LLVM folded the flag into the bool argument, so this isn’t actually a meaningful perf hit.
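To make the mechanism concrete without the assembly, here is a sketch of a function of this shape next to a hand-written version of roughly what the compiler does with the flag. The signatures, `Noisy`, `consume`, `maybe_drop_by_hand`, and `count_drops` are all assumptions for illustration; real drop flags are a compiler detail and are not spelled like this.

```rust
use std::cell::Cell;
use std::mem::ManuallyDrop;
use std::rc::Rc;

/// Hypothetical type that records its own destruction.
pub struct Noisy(Rc<Cell<usize>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

fn consume(x: Noisy) {
    drop(x);
}

/// Ownership of `x` is transferred only on one branch, so the compiler
/// must track at run time whether `x` still needs dropping.
pub fn maybe_drop(flag: bool, counter: Rc<Cell<usize>>) {
    let x = Noisy(counter);
    if flag {
        consume(x);
    }
    // Implicitly: if `x` was not moved away, drop it here.
}

/// The same function with the drop flag written out by hand.
pub fn maybe_drop_by_hand(flag: bool, counter: Rc<Cell<usize>>) {
    let mut x = ManuallyDrop::new(Noisy(counter));
    let mut x_is_live = true; // the "drop flag"
    if flag {
        x_is_live = false; // flipped when ownership moves away
        consume(unsafe { ManuallyDrop::take(&mut x) });
    }
    if x_is_live {
        unsafe { ManuallyDrop::drop(&mut x) }
    }
}

/// Runs `f` and reports how many times the value was destroyed.
pub fn count_drops(f: fn(bool, Rc<Cell<usize>>), flag: bool) -> usize {
    let counter = Rc::new(Cell::new(0));
    f(flag, Rc::clone(&counter));
    counter.get()
}
```

Either way the value is destroyed exactly once, regardless of which branch is taken.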
These “drop flags” are key to Rust’s ownership model. Since ownership may be transferred dynamically due to reasonably complex control flow, it needs to leave breadcrumbs for itself to figure out whether the value wound up getting moved away or not. This is unique to Rust: in C++, every object is always destroyed, so no such faffing about is necessary.
Similarly, moveit can close this soundness hole by leaving itself breadcrumbs to determine if safe code is trying to undermine its guarantees.
In other words: in Rust, it is not sufficient to manage a pointer to manage a memory location; it is necessary to manage an explicit or implicit drop flag as well.
A Flagged MoveRef
We can extend MoveRef to track an explicit drop flag:
Wrapping it in a Cell is convenient and doesn’t cost us anything, since a MoveRef can never be made Send or Sync anyways. Inside of its destructor, we can flip the flag, much like Rust flips a drop flag when transferring ownership to another function:
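A minimal sketch of a flagged owning reference, assuming the flag is a borrowed `Cell<bool>` that starts out `false` and is set to `true` by the destructor. The field names, `new_unchecked` signature, and the `flag_after` helper are hypothetical.

```rust
use std::cell::Cell;
use std::mem::{self, MaybeUninit};
use std::ptr;

/// An owning reference that reports its own destruction through a flag.
pub struct MoveRef<'frame, T> {
    ptr: &'frame mut T,
    drop_flag: &'frame Cell<bool>,
}

impl<'frame, T> MoveRef<'frame, T> {
    /// # Safety
    /// `ptr` must be the longest-lived reference to its pointee, and
    /// `drop_flag` must be the flag guarding that same storage.
    pub unsafe fn new_unchecked(
        ptr: &'frame mut T,
        drop_flag: &'frame Cell<bool>,
    ) -> Self {
        Self { ptr, drop_flag }
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        // Record that the pointee has been dealt with, then destroy it.
        self.drop_flag.set(true);
        let pointee: *mut T = &mut *self.ptr;
        unsafe { ptr::drop_in_place(pointee) }
    }
}

/// Reports the flag's state after the `MoveRef` is dropped or forgotten.
pub fn flag_after(forget: bool) -> bool {
    let mut storage = MaybeUninit::<String>::uninit();
    storage.write(String::from("hello"));
    let drop_flag = Cell::new(false);
    let owner =
        unsafe { MoveRef::new_unchecked(storage.assume_init_mut(), &drop_flag) };
    if forget {
        mem::forget(owner); // the flag is never flipped (and the String leaks)
    } else {
        drop(owner);
    }
    drop_flag.get()
}
```

A forgotten `MoveRef` leaves the flag untouched, which is the breadcrumb the rest of the design relies on.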
But, how should we use it? The easiest way is to change the definition of moveit!() to construct a flag trap:
The trap is a deterrent against forgetting a MoveRef: because the MoveRef’s destructor flips the flag, the trap’s destructor will notice if this doesn’t happen, and take action accordingly.
Note: in moveit, this is actually implemented by having the Slot<T> type carry a reference to the trap, created in the slot!() macro. However, this is not a crucial detail for the design.
An Earth-Shattering Kaboom
The trap is another RAII type that basically looks like this:
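A minimal sketch of such a trap, together with the frame layout a macro might set up, is given below. The `Trap` and `MoveRef` definitions and the `well_behaved_frame` helper are assumptions for illustration; only the non-crashing path is exercised, since the failing path deliberately aborts the process.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::process;
use std::ptr;

/// Aborts the program if the guarded drop flag was never flipped.
pub struct Trap<'frame> {
    drop_flag: &'frame Cell<bool>,
}

impl Drop for Trap<'_> {
    fn drop(&mut self) {
        if !self.drop_flag.get() {
            // Returning (or unwinding) would let the storage be freed
            // without its pinned contents having been destroyed.
            process::abort();
        }
    }
}

/// Assumed shape of a flagged owning reference.
pub struct MoveRef<'frame, T> {
    ptr: &'frame mut T,
    drop_flag: &'frame Cell<bool>,
}

impl<'frame, T> MoveRef<'frame, T> {
    /// # Safety
    /// `ptr` must be the longest-lived reference to its pointee.
    pub unsafe fn new_unchecked(
        ptr: &'frame mut T,
        drop_flag: &'frame Cell<bool>,
    ) -> Self {
        Self { ptr, drop_flag }
    }
}

impl<T> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        self.drop_flag.set(true);
        let pointee: *mut T = &mut *self.ptr;
        unsafe { ptr::drop_in_place(pointee) }
    }
}

pub fn well_behaved_frame() -> bool {
    // Declaration order matters: locals are dropped in reverse order, so
    // the trap's destructor runs before the storage goes away.
    let mut storage = MaybeUninit::<String>::uninit();
    storage.write(String::from("hello"));
    let drop_flag = Cell::new(false);
    let _trap = Trap { drop_flag: &drop_flag };
    let owner =
        unsafe { MoveRef::new_unchecked(storage.assume_init_mut(), &drop_flag) };
    drop(owner); // flips the flag, so the trap stays quiet
    drop_flag.get()
}
```

Replacing `drop(owner)` with `std::mem::forget(owner)` in this sketch would abort the process when `_trap` goes out of scope.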
The trap is simple: if the contained drop flag is not flipped, it crashes the program. Because moveit!() allocates it on the stack where users cannot mem::forget it, its destructor is guaranteed to run before storage’s destructor runs (although Rust does not guarantee destructors run, it does guarantee their order).
If a MoveRef is forgotten, it won’t have a chance to flip the flag, which the trap will detect. Once the trap’s destructor notices this, it cannot return, either normally or by panic, since this would cause storage to be freed. Crashing the program is the only acceptable response.[1]
Some of MoveRef’s functions need to be adapted to this new behavior: for example, MoveRef::into_inner() still needs to flip the flag, since moving out of the MoveRef is equivalent to running the destructor for the purposes of drop flags.
A Safer DerefMove
In order for MoveRef to be a proper “new” reference type, and not just a funny smart pointer, we also need a Deref equivalent:
This is the original design for DerefMove, which had a two-phase operation: first deinit() was used to create a destructor-suppressed version of the smart pointer that would only run the destructor for the storage (e.g., for Box, only the call to free()). Then, deref_move() would extract the “inner pointee” out of it as a MoveRef. This had the effect of splitting the smart pointer’s destructor, much like we did above on the stack.
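A sketch of the two-phase design as described, reconstructed from the prose rather than copied from the library: the trait signatures, the unflagged `MoveRef`, the choice of `Box<MaybeUninit<T>>` as the destructor-suppressed form, and the `two_phase_box` helper are all assumptions.

```rust
use std::cell::Cell;
use std::mem::MaybeUninit;
use std::ops::DerefMut;
use std::ptr;
use std::rc::Rc;

/// Assumed shape of an (unflagged) owning reference.
pub struct MoveRef<'frame, T: ?Sized> {
    ptr: &'frame mut T,
}

impl<'frame, T: ?Sized> MoveRef<'frame, T> {
    /// # Safety
    /// `ptr` must be the longest-lived reference to its pointee.
    pub unsafe fn new_unchecked(ptr: &'frame mut T) -> Self {
        Self { ptr }
    }
}

impl<T: ?Sized> Drop for MoveRef<'_, T> {
    fn drop(&mut self) {
        let pointee: *mut T = &mut *self.ptr;
        unsafe { ptr::drop_in_place(pointee) }
    }
}

/// Two-phase owning dereference, as described in the prose.
pub unsafe trait DerefMove: DerefMut + Sized {
    /// The pointer with its pointee's destructor suppressed.
    type Uninit: Sized;

    /// Phase one: keep only the responsibility for the storage.
    fn deinit(self) -> Self::Uninit;

    /// Phase two: take ownership of the pointee.
    ///
    /// # Safety
    /// Must be called at most once on a value produced by `deinit`.
    unsafe fn deref_move(this: &mut Self::Uninit) -> MoveRef<'_, Self::Target>;
}

unsafe impl<T> DerefMove for Box<T> {
    type Uninit = Box<MaybeUninit<T>>;

    fn deinit(self) -> Self::Uninit {
        // Same allocation, but dropping it now only frees the memory.
        unsafe { Box::from_raw(Box::into_raw(self).cast::<MaybeUninit<T>>()) }
    }

    unsafe fn deref_move(this: &mut Self::Uninit) -> MoveRef<'_, T> {
        unsafe { MoveRef::new_unchecked(this.assume_init_mut()) }
    }
}

/// Hypothetical type that records its own destruction.
pub struct Noisy(Rc<Cell<usize>>);

impl Drop for Noisy {
    fn drop(&mut self) {
        self.0.set(self.0.get() + 1);
    }
}

pub fn two_phase_box() -> usize {
    let drops = Rc::new(Cell::new(0));
    let boxed = Box::new(Noisy(Rc::clone(&drops)));
    let mut storage = boxed.deinit();
    let pointee = unsafe { <Box<Noisy> as DerefMove>::deref_move(&mut storage) };
    drop(pointee); // runs `Noisy`'s destructor
    drop(storage); // frees the heap allocation only
    drops.get()
}
```

Skipping the `deref_move` call in `two_phase_box` would silently leak the pointee's destructor, which is the hazard described next.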
This has a number of usability problems. Not only does it need to be called through a macro, but deinit() isn’t actually safe: failing to call deref_move() is just as bad as calling mem::forget on the result. Further, it’s not clear where to plumb the drop flags through.
After many attempts to graft drop flags onto this design, I replaced it with a completely new interface:
Uninit has been given the clearer name of Storage: a type that owns just the storage of the moved-from pointer. The two functions were merged into a single, safe function that performs everything in one step, emitting the storage as an out-parameter.
The new DroppingSlot<T> is like a Slot<T>, but closer to a safe version of the EventuallyInit<T> type from earlier: its contents are not necessarily initialized, but if they are, it destroys them, and it only does so when its drop flag is set.
Box is the most illuminating example of this trait:
MoveRef’s own implementation illustrates the need for the explicit lifetime bound:
Since this is fundamentally a lifetime narrowing, this can only compile if we insist that 'a: 'frame, which is implied by Self: 'frame. Earlier iterations of this design enforced it via a MoveRef<'frame, Self> receiver, which turned out to be unnecessary.
Conclusions
As of writing, I’m still in the process of self-reviewing this change, but at this point I feel reasonably confident that it’s correct; this article is, in part, written to convince myself that I’ve done this correctly.
The new design will also enable me to finally complete my implementation of a constructor- and pinning-friendly vector type; this issue came up in part because the vector type needs to manipulate drop flags in a complex way. For this reason, the implementation of drop flags actually uses a counter, not a single boolean.
I doubt this is the last issue I’ll need to chase down in moveit, but for now, we’re ever-closer to true owning references in Rust.
[1] Arguably, running the skipped destructor is also a valid remediation strategy. However, this is incompatible with what the user requested: they asked for the destructor to be suppressed, not for it to be run at a later date. This would be somewhat surprising behavior, which we could warn about for the benefit of unsafe code, but it is ultimately the incorrect choice for non-stack storage, such as a MoveRef referring to the heap.