unsafe in Rust usually involves manual management of memory. Although, ideally, we’d like to exclusively use references for this, sometimes the constraints they apply are too strong. This post is a guide on those constraints and how to weaken them for correctness.
“Unmanaged” languages, like C++ and Rust, provide pointer types for manipulating memory. These types serve different purposes and provide different guarantees. These guarantees are useful for the optimizer but get in the way of correctness of low-level code. This is especially true in Rust, where these constraints are very tight.
NB: This post only surveys data pointers. Function pointers are their own beast, but generally are less fussy, since they all have static lifetime1.
First, let’s survey C++. We have three pointer types: the traditional C pointer
T*, C++ references
T&, and rvalue references
T&&. These generally have pretty weak guarantees.
Pointers provide virtually no guarantees at all: they can be null, point to uninitialized memory, or point to nothing at all! C++ only requires that they be aligned2. They are little more than an address (until they are dereferenced, of course).
References, on the other hand, are intended to be the “primary” pointer type. A
T& cannot be null, is well-aligned, and is intended to only refer to live memory (although it’s not something C++ can really guarantee for lack of a borrow-checker). References are short-lived.
C++ uses non-nullness to its advantage. For example, Clang will absolutely delete code of the form
Because references cannot be null, and dereferencing the null pointer is always UB, the compiler may make this fairly strong assumption.
Rvalue references, T&&, are not meaningfully different from normal references, beyond their role in overload resolution.
Choosing a C++ (primitive) pointer type is well-studied and not the primary purpose of this blog. Rather, we’re interested in how these map to Rust, which has significantly more complicated pointers.
Like C++, Rust has two broad pointer types:
*const T and
*mut T, the raw pointers, and
&T and &mut T, the references.
Rust pointers have even fewer constraints than C++ pointers; they need not even be aligned3! The
mut specifier is basically irrelevant, but is useful as a programmer book-keeping tool. Rust also does not enforce the dreaded strict-aliasing rule4 on its pointers.
On the other hand, Rust references are among the most constrained objects in any language that I know of. A shared reference &'a T, lasting for the lifetime 'a, is guaranteed to satisfy all of the following:
- Non-null, and well-aligned (like in C++).
- Points to a valid, initialized T for the duration of 'a.
- T is never ever mutated for the duration of the reference: the compiler may fold separate reads into one at will. Stronger still, no &mut T is reachable from any thread while the reference is reachable.
Stronger still are
&'a mut T references, sometimes called unique references, because in addition to being well-aligned and pointing to a valid
T at all times, no other reachable reference ever aliases it in any thread; this is equivalent to a C
T* restrict pointer.
Unlike C++, which has two almost-identical pointer types, Rust’s two pointer types provide either no guarantees or all of them. The following
unsafe operations are all UB:
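To make the contrast concrete, here is a hedged sketch (the function name is hypothetical, and the commented-out lines are my own examples, not an exhaustive list). It does things with raw pointers that are perfectly fine, and notes the reference-producing equivalents that the Rust reference currently classifies as UB:

```rust
use std::ptr;

/// Raw pointers may be null or misaligned, as long as we never perform an
/// aligned dereference through them.
fn raw_pointers_are_lax() -> (bool, u32) {
    // A null raw pointer is a perfectly good value.
    let null: *const u32 = ptr::null();

    // A pointer into the middle of a byte buffer; nothing guarantees that
    // it is 4-aligned, and for a raw pointer that is fine.
    let bytes = [0u8, 1, 0, 0, 0, 0, 0, 0];
    let unaligned = bytes[1..].as_ptr() as *const u32;
    // SAFETY: `unaligned` points at 4 readable, initialized bytes in `bytes`.
    let value = unsafe { unaligned.read_unaligned() };

    // By contrast, each of these is UB as soon as the reference exists:
    //   let r: &u32 = unsafe { &*null };       // null reference
    //   let r: &u32 = unsafe { &*unaligned };  // misaligned reference
    //                                          // (if the address is not 4-aligned)
    // as is any `&T` whose pointee has already been freed (dangling).
    (null.is_null(), value)
}
```

The read goes through read_unaligned() precisely because the pointer carries no alignment promise of its own.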
Rust also provides the slice types
&[T]5 (of which you get mutable/immutable reference and pointer varieties) and dynamic trait object types
&dyn Tr (again, all four basic pointer types are available).
&[T] is a
usize6 length plus a pointer to that many
Ts. The pointer type of the slice specifies the guarantees on the pointed-to buffer.
*mut [T], for example, has no meaningful guarantees, but still contains the length7. Note that the length is part of the pointer value, not the pointee.
&dyn Tr is a trait object. For our purposes, it consists of a pointer to some data plus a pointer to a static vtable.
*mut dyn Tr is technically a valid type8. Overall, trait objects aren’t really relevant to this post; they are rarely used this way in
Suppose we’re building some kind of data structure; in Rust, data structures will need some sprinkling of
unsafe, since they will need to shovel around memory directly. Typically this is done using raw pointers, but it is preferable to use the least weakened pointer type to allow the compiler to perform whatever optimizations it can.
There are a number of orthogonal guarantees on
&mut T we might want to relax:
- Validity and initialized-ness of the pointee.
- Allocated-ness of the pointee (implied by initialized-ness).
- Global uniqueness of an &mut T.
The last three of these properties are irrelevant for a zero-sized type. For example, we can generate infinite
&mut () with no consequences:
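A minimal sketch of this trick, assuming NonNull::dangling() as the source of the pointer (the helper names are hypothetical):

```rust
use std::ptr::NonNull;

/// Conjures a fresh `&'static mut ()` out of thin air.
fn unit_ref() -> &'static mut () {
    // `NonNull::dangling()` yields a non-null, well-aligned pointer that
    // does not point into any allocation in particular.
    let ptr: *mut () = NonNull::<()>::dangling().as_ptr();
    // SAFETY: `()` is zero-sized, so there are no bytes that would need to
    // be allocated, initialized, or unaliased.
    unsafe { &mut *ptr }
}

/// The same trick for a zero-length array. The pointer must still satisfy
/// the alignment of `u32`, which `dangling()` takes care of.
fn empty_u32_array() -> &'static [u32; 0] {
    let ptr: *const [u32; 0] = NonNull::<[u32; 0]>::dangling().as_ptr();
    // SAFETY: zero-sized pointee; the pointer is non-null and aligned.
    unsafe { &*ptr }
}
```

Calling unit_ref() twice hands out two live &mut () at the same address, which is fine only because there is nothing there to alias.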
We materialize a non-null, well-aligned pointer and reborrow it into a static reference; because there is no data to point to, none of the usual worries about the pointee itself apply. However, the pointer itself must still be non-null and well-aligned;
0x1 is not a valid address for an &[u32; 0], but a 4-aligned address such as 0x4 is.
This also applies to empty slices; in fact, the compiler will happily promote the expression
&mut [] to an arbitrary lifetime:
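For instance, a function like the following compiles as written (the function name and the choice of u32 are mine, purely for illustration):

```rust
/// Returns an empty mutable slice with whatever lifetime the caller asks for.
fn empty<'a>() -> &'a mut [u32] {
    // `&mut []` borrows a zero-length array. Since there is no data behind
    // it, the compiler promotes the borrow instead of tying it to a local.
    &mut []
}
```

Even a 'static lifetime is accepted here, with no unsafe in sight.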
The most well-known manner of weakening is
Option<&T>. Rust guarantees that this is ABI-compatible with a C pointer
const T*, with
Option::<&T>::None being a null pointer on the C side. This “null pointer optimization” applies to any type recursively containing at least one &T.
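A small sketch of what this buys us at an FFI boundary. The helper names are hypothetical; the size equality it checks is the guarantee documented for Option<&T>:

```rust
use std::mem::size_of;
use std::ptr;

/// Converts an optional reference into the pointer a C API would expect.
fn to_c_ptr<T>(r: Option<&T>) -> *const T {
    match r {
        Some(r) => r as *const T,
        None => ptr::null(),
    }
}

/// Goes the other way.
///
/// # Safety
/// `p` must be null, or point to a valid `T` that outlives `'a`.
unsafe fn from_c_ptr<'a, T>(p: *const T) -> Option<&'a T> {
    // SAFETY: forwarded to the caller.
    unsafe { p.as_ref() }
}

/// `Option<&T>` needs no extra space for its discriminant.
fn option_ref_is_pointer_sized() -> bool {
    size_of::<Option<&u64>>() == size_of::<*const u64>()
}
```

Note that the direction from C into Rust stays unsafe: only the caller can promise that a non-null pointer really is a valid reference.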
The same effect can be achieved for a pointer type using the
NonNull<T> standard library type:
Option<NonNull<T>> has the same size and layout as
*mut T. This is most beneficial for types which would otherwise contain a raw pointer:
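As a sketch, consider a hypothetical owning Handle<T> (my own name, not a standard type) that stores a NonNull<T> where it might otherwise have stored a *mut T. I've marked it #[repr(transparent)] so that it falls squarely within the layout guarantee the standard library documents for Option:

```rust
use std::mem::size_of;
use std::ptr::NonNull;

/// A hypothetical owning handle that would otherwise store a `*mut T`.
#[repr(transparent)]
struct Handle<T> {
    ptr: NonNull<T>,
}

impl<T> Handle<T> {
    fn new(value: T) -> Self {
        // Leak a `Box` and remember its (never null) address.
        Handle { ptr: NonNull::from(Box::leak(Box::new(value))) }
    }

    fn get(&self) -> &T {
        // SAFETY: `ptr` came from a live `Box` that only we can free.
        unsafe { self.ptr.as_ref() }
    }
}

impl<T> Drop for Handle<T> {
    fn drop(&mut self) {
        // SAFETY: `ptr` came from `Box::leak` and is freed exactly once.
        unsafe { drop(Box::from_raw(self.ptr.as_ptr())) };
    }
}

/// `None` reuses the null address, so the `Option` costs nothing.
fn option_handle_is_pointer_sized() -> bool {
    size_of::<Option<Handle<u32>>>() == size_of::<*mut u32>()
}
```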
No matter what, a
&T cannot point to uninitialized memory, since the compiler is free to assume it may read such references at any time with no consequences.
The following classic C pattern is verboten:
Rust doesn’t provide any particularly easy ways to allocate memory without initializing it, either, so this usually isn’t a problem. The
MaybeUninit<T> type can be used for safely allocating memory without initializing it, via MaybeUninit::uninit().
This type acts as a sort of “optimization barrier” that prevents the compiler from assuming the pointee is initialized.
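Here is a minimal sketch of the out-parameter pattern done the sanctioned way. The function names and the [u32; 4] payload are assumptions for illustration; the commented-out lines show the sort of thing we are not allowed to write:

```rust
use std::mem::MaybeUninit;

/// A hypothetical C-style initializer that fills in an out-parameter.
fn init_config(out: &mut MaybeUninit<[u32; 4]>) {
    out.write([1, 2, 3, 4]);
}

fn make_config() -> [u32; 4] {
    // The verboten version would look something like:
    //   let mut cfg: [u32; 4] = unsafe { std::mem::uninitialized() };
    //   init_config_in_place(&mut cfg); // `&mut` to uninitialized memory!
    let mut cfg = MaybeUninit::<[u32; 4]>::uninit();
    init_config(&mut cfg);
    // SAFETY: `init_config` wrote every element.
    unsafe { cfg.assume_init() }
}
```

The only unsafe step left is the final assertion that initialization actually happened.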
&MaybeUninit<T> is a pointer to potentially uninitialized but definitely allocated memory. It has the same layout as
&T, and Rust provides functions like
assume_init_ref() for asserting that a
&MaybeUninit<T> is definitely initialized. This assertion is similar in consequence to dereferencing a raw pointer.
&MaybeUninit<T> and &mut MaybeUninit<T> should almost be viewed as pointer types in their own right, since they can be converted to/from &T and &mut T under certain circumstances.
Because T is almost a “subtype” of MaybeUninit<T>, we are entitled10 to “forget” that the referent of a &T is initialized by converting it to a &MaybeUninit<T>. This makes sense because &T is covariant11 in T. However, this is not true of
&mut T, since it’s not covariant:
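A hedged sketch of both directions for shared references, using a pointer cast rather than transmute; the function names are hypothetical. The comment at the bottom spells out why the mutable analogue must not be offered as a safe function:

```rust
use std::mem::MaybeUninit;

/// "Forgets" that the referent is initialized.
fn forget_init<T>(r: &T) -> &MaybeUninit<T> {
    // SAFETY: `MaybeUninit<T>` is guaranteed to have the same size and
    // alignment as `T`, and every valid `T` is a valid `MaybeUninit<T>`.
    unsafe { &*(r as *const T).cast::<MaybeUninit<T>>() }
}

/// Asserts that the referent is initialized after all.
///
/// # Safety
/// The referent must really hold an initialized `T`.
unsafe fn remember_init<T>(r: &MaybeUninit<T>) -> &T {
    // SAFETY: forwarded to the caller.
    unsafe { r.assume_init_ref() }
}

// The mutable analogue must not be a safe function:
//
//   fn forget_init_mut<T>(r: &mut T) -> &mut MaybeUninit<T>
//
// A caller could store `MaybeUninit::uninit()` through the result, leaving
// the original `&mut T` pointing at uninitialized memory.
```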
These types are useful for talking to C++ without giving up too many guarantees.
Option<&MaybeUninit<T>> is an almost perfect model of a
const T*, under the assumption that most pointers in C++ are valid most of the time.
MaybeUninit<T> also finds use in working with raw blocks of memory, such as in a
Vec-style growable slice:
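What follows is a minimal sketch of such a type, under the assumption that the storage is borrowed from the caller rather than heap-allocated. SliceVec and its methods are hypothetical names, not a real library API:

```rust
use std::mem::MaybeUninit;
use std::slice;

/// A hypothetical fixed-capacity vector that borrows its storage.
struct SliceVec<'a, T> {
    buf: &'a mut [MaybeUninit<T>],
    /// Invariant: `buf[..len]` is initialized.
    len: usize,
}

impl<'a, T> SliceVec<'a, T> {
    fn new(buf: &'a mut [MaybeUninit<T>]) -> Self {
        SliceVec { buf, len: 0 }
    }

    /// Appends `value`, handing it back if the buffer is full.
    fn push(&mut self, value: T) -> Result<(), T> {
        match self.buf.get_mut(self.len) {
            Some(slot) => {
                slot.write(value);
                self.len += 1;
                Ok(())
            }
            None => Err(value),
        }
    }

    fn as_slice(&self) -> &[T] {
        // SAFETY: the first `len` slots are initialized, and
        // `MaybeUninit<T>` has the same layout as `T`.
        unsafe { slice::from_raw_parts(self.buf.as_ptr().cast::<T>(), self.len) }
    }
}

impl<T> Drop for SliceVec<'_, T> {
    fn drop(&mut self) {
        for slot in &mut self.buf[..self.len] {
            // SAFETY: slots below `len` are initialized and dropped once.
            unsafe { slot.assume_init_drop() };
        }
    }
}
```

The buffer stays a perfectly ordinary reference throughout; only the claim that a given slot is initialized ever requires unsafe.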
&mut T can never alias any other pointer, but is also the mechanism by which we perform mutation. It can’t even alias with pointers that Rust can’t see; Rust assumes no one else can touch this memory. Thus,
&mut T is not an appropriate analogue for a C++ T&.
Like with uninitialized memory, Rust provides a “barrier” wrapper type, UnsafeCell<T>.
UnsafeCell<T> is the “interior mutability” primitive, which permits us to mutate through an
&UnsafeCell<T> so long as concurrent reads and writes do not occur. We may even convert it to a
&mut T when we’re sure we’re holding the only reference.
UnsafeCell<T> forms the basis of the Cell<T>, RefCell<T>, and Mutex<T> types, each of which performs a sort of “dynamic borrow-checking”:
- Cell<T> only permits direct loads and stores.
- RefCell<T> maintains a counter of references into it, which it uses to dynamically determine if a mutable reference would be unique.
- Mutex<T> is like RefCell<T>, but uses concurrency primitives to maintain uniqueness.
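The three flavors of dynamic checking can be poked at from safe code. This is only an illustrative sketch (the function name is made up), run on a single thread:

```rust
use std::cell::{Cell, RefCell};
use std::sync::Mutex;

/// Returns (final `Cell` value, `RefCell` conflict seen, `Mutex` contention seen).
fn dynamic_checks() -> (u32, bool, bool) {
    // `Cell`: whole-value loads and stores through a shared reference.
    let c = Cell::new(1u32);
    let alias = &c;
    alias.set(c.get() + 1);

    // `RefCell`: a mutable borrow is refused while a shared borrow is live.
    let r = RefCell::new(vec![1]);
    let shared = r.borrow();
    let conflict = r.try_borrow_mut().is_err();
    drop(shared);

    // `Mutex`: a second lock attempt fails while the guard is held.
    let m = Mutex::new(0);
    let guard = m.lock().unwrap();
    let contended = m.try_lock().is_err();
    drop(guard);

    (c.get(), conflict, contended)
}
```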
Because of this, Rust must treat
&UnsafeCell<T> as always aliasing, but because we can mutate through it, it is a much closer analogue to a C++
T&. However, because
&T assumes the pointee is never mutated, it cannot coexist with a
&UnsafeCell<T> to the same memory, if mutation is performed through it. The following is explicitly UB:
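The sketch below shows the well-defined way to mutate through aliased &UnsafeCell<T>s, with the problematic pattern left in a comment. The function names are hypothetical, and everything is assumed to run on one thread:

```rust
use std::cell::UnsafeCell;

/// Adds one to the value behind a shared, aliasable pointer.
fn bump(cell: &UnsafeCell<u32>) {
    // SAFETY: no `&u32` or `&mut u32` into the cell is live, and
    // `UnsafeCell` is `!Sync`, so safe code cannot race on it.
    unsafe { *cell.get() += 1 };
}

fn read(cell: &UnsafeCell<u32>) -> u32 {
    // SAFETY: as above; this is a plain load through the raw pointer.
    unsafe { *cell.get() }
}

fn bump_through_aliases() -> u32 {
    let mut cell = UnsafeCell::new(0);
    let (a, b) = (&cell, &cell);
    bump(a);
    bump(b);

    // What would be UB is freezing the contents behind a `&u32` and then
    // mutating underneath it:
    //   let frozen: &u32 = unsafe { &*cell.get() };
    //   bump(&cell);
    //   println!("{frozen}");

    // With exclusive access, `get_mut()` hands out a real `&mut u32`.
    *cell.get_mut() += 1;
    read(&cell)
}
```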
The Cell<T> type is useful for non-aliasing references to plain-old-data types, which tend to be
Copy. It allows us to perform mutation without having to utter
unsafe. For example, the correct type for a shared mutable buffer in Rust is
&[Cell<u8>], which can be freely
memcpy’d, without worrying about aliasing12.
This is most useful for sharing memory with another language, like C++, which cannot respect Rust’s aliasing rules.
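As a sketch of what &[Cell<u8>] permits, the hypothetical add_into() below is happy to receive the very same slice as both source and destination, which the borrow checker would reject for a &mut [u8] and &[u8] pair:

```rust
use std::cell::Cell;

/// Adds each byte of `src` into the corresponding byte of `dst`.
/// The two slices may overlap, or even be the same slice.
fn add_into(dst: &[Cell<u8>], src: &[Cell<u8>]) {
    for (d, s) in dst.iter().zip(src) {
        d.set(d.get().wrapping_add(s.get()));
    }
}

fn double_in_place(buf: &mut [u8]) {
    // Both conversions are safe standard library functions.
    let cells: &[Cell<u8>] = Cell::from_mut(buf).as_slice_of_cells();
    add_into(cells, cells);
}
```

Not a single unsafe block was needed.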
- Non-nullness can be disabled with Option<&T>.
- Initialized-ness can be disabled with &MaybeUninit<T>.
- Uniqueness can be disabled with &UnsafeCell<T>.
There is no way to disable alignment and validity restrictions: references must always be aligned and have a valid lifetime attached. If these are unachievable, raw pointers are your only option.
We can combine these various “weakenings” to produce aligned, lifetime-bound references to data with different properties. For example:
- &UnsafeCell<MaybeUninit<T>> is as close as we can get to a C++ T&.
- Option<&UnsafeCell<T>> is like a raw pointer, but to initialized memory.
- Option<&mut MaybeUninit<T>> is like a raw pointer, but with alignment, aliasing, and lifetime requirements.
- UnsafeCell<&[T]> permits us to mutate the pointer to the buffer and its length, but not the values it points to themselves.
- UnsafeCell<&[UnsafeCell<T>]> lets us mutate both the buffer and its actual pointer/length.
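To give a feel for two of these combinations in use, here is a hedged sketch. The Slot alias and the function names are my own, and the code assumes single-threaded use:

```rust
use std::cell::UnsafeCell;
use std::mem::MaybeUninit;

/// Hypothetical alias: aligned and lifetime-bound, but aliasable and
/// possibly uninitialized.
type Slot<'a, T> = &'a UnsafeCell<MaybeUninit<T>>;

fn store(slot: Slot<'_, u32>, value: u32) {
    // SAFETY: no reference into the cell is live, and writing through the
    // raw pointer never reads the (possibly uninitialized) old contents.
    unsafe { slot.get().cast::<u32>().write(value) };
}

/// # Safety
/// A `store` must have happened before the first `load`.
unsafe fn load(slot: Slot<'_, u32>) -> u32 {
    // SAFETY: forwarded to the caller.
    unsafe { slot.get().cast::<u32>().read() }
}

/// An optional, unique out-parameter, much like a nullable `T*` in C.
fn fill(out: Option<&mut MaybeUninit<u32>>) -> bool {
    match out {
        Some(slot) => {
            slot.write(7);
            true
        }
        None => false,
    }
}
```

In both cases the only operation that needs an unsafe contract from the caller is the one that asserts initialization.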
Interestingly, there is no equivalent to a C++ raw pointer: there is no way to create a guaranteed-aligned pointer without a designated lifetime13.
Rust and C++ have many other pointer types, such as smart pointers. However, in both languages, these are built in terms of the basic pointer types above. Hopefully this article is a useful reference for anyone writing
unsafe abstractions who wishes to avoid using raw pointers when possible. ◼
Except in Go, which synthesizes vtables on the fly. Story for another day. ↩
It is, apparently, a little-known fact that constructing unaligned pointers, but then never dereferencing them, is still UB in C++. C++ could, for example, store information in the lower bits of such a pointer. The in-memory representation of a pointer is actually unspecified! ↩
This is useful when paired with the Rust
<*const T>::read_unaligned() function, which can be compiled down to a normal load on architectures that do not have alignment restrictions, like x86_64 and aarch64. ↩
Another story for another time. ↩
usize is Rust’s machine word type, compare
It is also not a type I have encountered enough to have much knowledge on. For example, I don’t actually know if the vtable half of a
*mut dyn Tr must always be valid or not; I suspect the answer is “no”, but I couldn’t find a citation for this. ↩
Note that you cannot continue to use a reference to freed, zero-sized memory. This subtle distinction is called out in https://doc.rust-lang.org/std/ptr/index.html#safety. ↩
transmute must be used to perform this operation, but I see no reason why this would permit us to perform an illegal mutation without uttering unsafe a second time. In particular, MaybeUninit::assume_init_read(), which could be used to perform illegal copies, is an unsafe function. ↩
A covariant type
Cov<T> is one where, if T is a subtype of U, Cov<T> is a subtype of
Cov<U>. This isn’t particularly noticeable in Rust, where the only subtyping relationships are
'b, but is nonetheless important for advanced type design. ↩
Cell<T> does not provide synchronization; you still need locks to share it between threads. ↩
I have previously proposed a sort of
'! “lifetime” that is intended to be the lifetime of dangling references (a bit of an oxymoron). This would allow us to express this concept, but I need to flesh out the concept more. ↩