2021-06-01 • 4898 words • 54 minutes
• #dark-arts • #linkers • #toolchains

Everything You Never Wanted To Know About Linker Script

Low level software usually has lots of .cc or .rs files. Even lower-level software, like your cryptography library, probably has .S containing assembly, my least favorite language for code review.

The lowest level software out there, firmware, kernels, and drivers, have one third file type to feed into the toolchain: an .ld file, a “linker script”. The linker script, provided to Clang as -Wl,-T,foo.ld¹, is like a template for the final executable. It tells the linker how to organize code from the input objects. This permits extremely precise control over the toolchain’s output.

Very few people know how to write linker script; it’s a bit of an obscure skill. Unfortunately, I’m one of them, so I get called to do it on occasion. Hopefully, this post is a good enough summary of the linker script language that you, too, can build your own binary!

Everything in this post can be found in excruciating detail in GNU ld’s documentation; lld accepts basically the same syntax. There’s no spec, just what your linker happens to accept. I will, however, do my best to provide a more friendly introduction.

No prior knowledge of how toolchains work is necessary! Where possible, I’ve tried to provide historical context on the names of everything. Toolchains are, unfortunately, bound by half a century of tradition. Better to at least know why they’re called that.

Wait, an .S file?

On Windows, assembly files use the sensible .asm extension. POSIX we use the .s extension, or .S when we’d like Clang to run the C preprocessor on them (virtually all hand-written assembly is of the second kind).

I don’t actually have a historical citation² for .s, other than that it came from the Unix tradition of obnoxiously terse names. If we are to believe that .o stands for “object”, and .a stands for “archive”, then .s must stand for “source”, up until the B compiler replaced them with .b files! See http://man.cat-v.org/unix-1st/1/b.

A final bit of trivia: .C files are obviously different from .c files… they’re C++ files! (Seriously, try it.)

Note: This post is specifically about POSIX. I know basically nothing about MSVC and link.exe other than that they exist. The most I’ve done is helped people debug trivial __declspec issues.

I will also only be covering things specific to linking an executable; linking other outputs, like shared libraries, is beyond this post.

Seriously, What’s a linker?

A linker is but a small part of a toolchain, the low-level programmer’s toolbox: everything you need to go from source code to execution.

The crown jewel of any toolchain is the compiler. The LLVM toolchain, for example, includes Clang, a C/C++³ compiler. The compiler takes source code, such as .cc, and lowers it down to a .s file, an assembly file which textually describes machine code for a specific architecture (you can also write them yourself).

Another toolchain program, the assembler, assembles each .s into a .o file, an object file⁴. An assembly file is merely a textual representation of an object file; assemblers are not particularly interesting programs.

A third program, the linker, links all of your object files into a final executable or binary, traditionally given the name a.out⁵.

This three (or two, if you do compile/assemble in one step) phase process is sometimes called the C compilation model. All modern software build infrastructure is built around this model⁶.

Even More Stages!

Clang, being based on LLVM, actually exposes one stage in between the .cc file and the .s file. You can ask it to skip doing codegen and emit a .ll file filled with LLVM IR, an intermediate between human-writable source code and assembly. The magic words to get this file are clang -S -emit-llvm. (The Rust equivalent is rustc --emit=llvm-ir.)

The LLVM toolchain provides llc, the LLVM compiler, which performs the .ll -> .s step (optionally assembling it, too). lli is an interpreter for the IR. Studying IR is mostly useful for understanding optimization behavior; topic for another day.

The compiler, assembler, and linker are the central components of a toolchain. Other languages, like Rust, usually provide their own toolchain, or just a compiler, reusing the existing C/C++ toolchain. The assembler and linker are language agnostic.

The toolchain also provides various debugging tools, including an interactive debugger, and tools for manipulating object files, such as nm, objdump, objcopy, and ar.

These days, most of this stuff is bundled into a single program, the compiler frontend, which knows how to compiler, assemble, and link, in one invocation. You can ask Clang to spit out .o files with clang -c, and .s files with clang -S.

Trs Nms

The UNIX crowd at Bell Labs was very excited about short, terse names. This tradition survives in Go’s somewhat questionable practice of single-letter variables.

Most toolchain program names are cute contractions. cc is “C compiler”; compilers for almost all other languages follow this convention, like rustc, javac, protoc, and scalac; Clang is just clang, but is perfectly ok being called as cc.

as is “assembler”; ld is “loader” (you’ll learn why sooner). ar is “archiver”, nm is “names”. Other names tend to be a bit more sensible.

Final Link

Some fifty years ago at Bell Labs, someone really wanted to write a program with more than one .s file. To solve this, a program that could “link” symbol references across object files was written: the first linker.

You can take several .o files and use ar (an archaic tar, basically) to create a library, which always have names like libfoo.a (the lib is mandatory). A static library is just a collection of objects, which can be provided on an as-needed basis to the linker.

The “final link” incorporates several .o files and .a files to produce an executable. It does roughly the following:

Parse all the objects and static libraries and put their symbols into a database. Symbols are named addresses of functions and global variables.
Search for all unresolved symbol references in the .o files and match it up with a symbol from the database, recursively doing this for any code in a .a referenced during this process. This forms a sort of dependency graph between sections. This step is called symbol resolution.
Throw out any code that isn’t referenced by the input files by tracing the dependency graph from the entry-point symbol (e.g., _start on Linux). This step is called garbage collection.
Execute the linker script to figure out how to stitch the final binary together. This includes discovering the offsets at which everything will go.
Resolve relocations, “holes” in the binary that require knowing the final runtime address of the section. Relocations are instructions placed in the object file for the linker to execute.
Write out the completed binary.

This process is extremely memory-intensive; it is possible for colossal binaries, especially ones with tons of debug information, to “fail to link” because the linker exhausts the system’s memory.

We only care about step 4; whole books can be written about the previous steps. Thankfully, Ian Lance Taylor, mad linker scientist and author of gold, has written several excellent words on this topic: https://lwn.net/Articles/276782/.

Object Files and Sections

Linkers, fundamentally, consume object files and produce object files; the output is executable, meaning that all relocations have been resolved and an entry-point address (where the OS/bootloader will jump to to start the binary).

It’s useful to be able to peek into object files. The objdump utility is best for this. objdump -x my_object.o will show all headers, telling you what exactly is in it.

At a high level, an object file describes how a program should be loaded into memory. The object is divided into sections, which are named blocks of data. Sections may have file-like permissions, such as allocatable, loadable, readonly, and executable. objdump -h can be used to show the list of sections. Some selected lines of output from objdump on my machine (I’m on a 64-bit machine, but I’ve trimmed leading zeros to make it all fit):

$ objdump -h "$(which clang)"
/usr/bin/clang:     file format elf64-x86-64

Sections:
Idx Name    Size      VMA       LMA       File off  Algn
 11 .init   00000017  00691ab8  00691ab8  00291ab8  2**2
            CONTENTS, ALLOC, LOAD, READONLY, CODE
 12 .plt    00006bb0  00691ad0  00691ad0  00291ad0  2**4
            CONTENTS, ALLOC, LOAD, READONLY, CODE
 13 .text   0165e861  00698680  00698680  00298680  2**4
            CONTENTS, ALLOC, LOAD, READONLY, CODE
 14 .fini   00000009  01cf6ee4  01cf6ee4  018f6ee4  2**2
            CONTENTS, ALLOC, LOAD, READONLY, CODE
 15 .rodata 0018ec68  01cf6ef0  01cf6ef0  018f6ef0  2**4
            CONTENTS, ALLOC, LOAD, READONLY, DATA
 24 .data   000024e8  021cd5d0  021cd5d0  01dcc5d0  2**4
            CONTENTS, ALLOC, LOAD, DATA
 26 .bss    00009d21  021cfac0  021cfac0  01dceab8  2**4
            ALLOCTerminal

Allocateable (ALLOC) sections must be allocated space by the operating system; if the section is loadable (LOAD), then the operating system must further fill that space with the contents of the section. This process is called loading and is performed by a loader program⁷. The loader is sometimes called the “dynamic linker”, and is often the same program as the “program linker”; this is why the linker is called ld.

Loading can also be done beforehand using the binary output format. This is useful for tiny microcontrollers that are too primitive to perform any loading. objcopy is useful for this and many other tasks that involve transforming object files.

Some common (POSIX) sections include:

.text, where your code lives⁸. It’s usually a loadable, readonly, executable section.
.data contains the initial values of global variables. It’s loadable.
.rodata contains constants. It’s loadable and readonly.
.bss is an empty allocatable section⁹. C specifies that uninitialized globals default to zero; this is a convenient way for avoiding storing a huge block of zeros in the executable!
Debug sections that are not loaded or allocated; these are usually removed for release builds.

After the linker decides which sections from the .o and .a inputs to keep (based on which symbols it decided it needed), it looks to the linker script how to arrange them in the output.

Let’s write our first linker script!

SECTIONS {
  /* Define an output section ".text". */
  .text : {
    /* Pull in all symbols in input sections named .text */
    *(.text)
    /* Do the same for sections starting with .text.,
       such as .text.foo */
    *(.text.*)
  }

  /* Do the same for ".bss", ".rodata", and ".data". */
  .bss : { *(.bss); *(.bss.*) }
  .data : { *(.data); *(.data.*) }
  .rodata : { *(.rodata); *(.rodata.*) }
}Linker Script

This tells the linker to create a .text section in the output, which contains all sections named .text from all inputs, plus all sections with names like .text.foo. The content of the section is laid out in order: the contents of all .text sections will come before any .text.* sections; I don’t think the linker makes any promises about the ordering between different objects¹⁰.

As I mentioned before, parsers for linker script are fussy¹¹: the space in .text : is significant.

Note that the two .text sections are different, and can have different names! The linker generally doesn’t care what a section is named; just its attributes. We could name it code if we wanted to; even the leading period is mere convention. Some object file formats don’t support arbitrary sections; all the sane ones (ELF, COFF, Mach-O) don’t care, but they don’t all spell it the same way; in Mach-O, you call it __text.

Before continuing, I recommend looking at the appendix so that you have a clear path towards being able to run and test your linker scripts!

Input Section Syntax

None of this syntax is used in practice but it’s useful to contextualize the syntax for pulling in a section. The full form of the syntax is
> archive:object(section1 section2 ...)
Plaintext
Naturally, all of this is optional, so you can write foo.o or libbar.a:(.text) or :baz.o(.text .data), where the last one means “not part of a library”. There’s even an EXCLUDE_FILE syntax for filtering by source object, and a INPUT_SECTION_FLAGS syntax for filtering by the presence of format-specific flags.

Do not use any of this. Just write *(.text) and don’t think about it too hard. The * is just a glob for all objects.

Each section has an alignment, which is just the maximum of the alignments of all input sections pulled into it. This is important for ensuring that code and globals are aligned the way the architecture expects them to be. The alignment of a section can be set explicitly with

SECTIONS {
  .super_aligned : ALIGN(16) {
    /* ... */
  }
}Linker Script

You can also instruct the linker to toss out sections using the special /DISCARD/ output section, which overrides any decisions made at garbage-collection time. I’ve only ever used this to discard debug information that GCC was really excited about keeping around.

On the other hand, you can use KEEP(*(.text.*)) to ensure no .text sections are discarded by garbage-collection. Unfortunately, this doesn’t let you pull in sections from static libraries that weren’t referenced in the input objects.

LMA and VMA

Every section has three addresses associated with it. The simplest is the file offset: how far from the start of the file to find the section.

The virtual memory address, or VMA, is where the program expects to find the section at runtime. This is the address that is used by pointers and the program counter.

The load memory address, or LMA, is where the loader (be it a runtime loader or objcpy) must place the code. This is almost always the same as the VMA. Later on, in Using Symbols and LMAs, I’ll explain a place where this is actually useful.

When declaring a new section, the VMA and LMA are both set to the value¹² of the location counter, which has the extremely descriptive name .¹³. This counter is automatically incremented as data is copied from the input

We can explicitly specify the VMA of a section by putting an expression before the colon, and the LMA by putting an expression in the AT(lma) specifier after the colon:

SECTIONS {
  .text 0x10008000: AT(0x40008000) {
    /* ... */
  }
}Linker Script

This will modify the location counter; you could also write it as

SECTIONS {
  . = 0x10008000;
  .text : AT(0x40008000) {
    /* ... */
  }
}Linker Script

Within SECTIONS, the location counter can be set at any point, even while in the middle of declaring a section (though the linker will probably complain if you do something rude like move it backwards).

The location counter is incremented automatically as sections are added, so it’s rarely necessary to fuss with it directly.

Memory Regions and Section Allocation

By default, the linker will simply allocate sections starting at address 0. The MEMORY statement can be used to define memory regions for more finely controlling how VMAs and LMAs are allocated without writing them down explicitly.

A classic example of a MEMORY block separates the address space into ROM and RAM:

MEMORY {
  rom (rx)   : ORIGIN = 0x8000,     LENGTH = 16K
  ram (rw!x) : ORIGIN = 0x10000000, LENGTH = 256M
}Linker Script

A region is a block of memory with a name and some attributes. The name is irrelevant beyond the scope of the linker script. The attributes in parens are used to specify what sections could conceivably go in that region. A section is compatible if it has any of the attributes before the !, and none which come after the !. (This filter mini-language isn’t very expressive.)

The attributes are the ones we mentioned earlier: rwxal are readonly, read/write, executable, allocated, and loadable¹⁴.

When allocating a section a VMA, the linker will try to pick the best memory region that matches the filter using a heuristic. I don’t really trust the heuristic, but you can instead write > region to put something into a specific region. Thus,

SECTION {
  .data {
    /* ... */
  } > ram AT> rom
}Linker Script

AT> is the “obvious” of AT() and >, and sets which region to allocate the LMA from.

The origin and length of a region can be obtained with the ORIGIN(region) and LENGTH(region) functions.

Other Stuff to Put In Sections

Output sections can hold more than just input sections. Arbitrary data can be placed into sections using the BYTE, SHORT, LONG and QUAD for placing literal 8, 16, 32, and 64-bit unsigned integers into the section:

SECTIONS {
  .screams_internally : { LONG(0xaaaaaaaa) }
}Linker Script

Numeric literals in linker script may, conveniently, be given the suffixes K or M to specify a kilobyte or megabyte quantity. E.g., 4K is sugar for 4096.

Fill

You can fill the unused portions of a section by using the FILL command, which sets the “fill pattern” from that point onward. For example, we can create four kilobytes of 0xaa using FILL and the location counter:

SECTIONS {
  .scream_page : {
    FILL(0xaa)
    . += 4K;
  }
}Linker Script

The “fill pattern” is used to fill any unspecified space, such as alignment padding or jumping around with the location counter. We can use multiple FILLs to vary the fill pattern, such as if we wanted half the page to be 0x0a and half 0xa0:

SECTIONS {
  .scream_page : {
    FILL(0x0a)
    . += 2K;
    FILL(0xa0)
    . += 2K;
  }
}Linker Script

When using one fill pattern for the whole section, you can just write = fill; at the end of the section. For example,

SECTIONS {
  .scream_page : {
    . += 4K;
  } = 0xaa;
}Linker Script

Linker Symbols

Although the linker needs to resolve all symbols using the input .o and .a files, you can also declare symbols directly in linker script; this is the absolute latest that symbols can be provided. For example:

SECTIONS {
  my_cool_symbol = 5;
}Linker Script

This will define a new symbol with value 5. If we then wrote extern char my_cool_symbol;, we can access the value placed by the linker. However, note that the value of a symbol is an address! If you did

extern char my_cool_symbol;

uintptr_t get() {
  return my_cool_symbol;
}C

the processor would be very confused about why you just dereferenced a pointer with address 5. The correct way to extract a linker symbol’s value is to write

extern char my_cool_symbol;

uintptr_t get() {
  return (uintptr_t)&my_cool_symbol;
}C

It seems a bit silly to take the address of the global and use that as some kind of magic value, but that’s just how it works. The exact same mechanism works in Rust, too:

fn get() -> usize {
  extern "C" {
    #[link_name = "my_cool_symbol"]
    static SYM: u8;
  }

  addr_of!(SYM) as usize
}Rust

The most common use of this mechanism is percolating information not known until link time. For example, a common idiom is

SECTIONS {
  .text : {
    __text_start = .;
    /* stuff */
    __text_end = .;
  }
}Linker Script

This allows initialization code to find the section’s address and length; in this case, the pointer values are actually meaningful!

Wunderbars

It’s common practice to lead linker symbols with two underscores, because C declares a surprisingly large class of symbols reserved for the implementation, so normal user code won’t call them. These include names like __text_start, which start with two underscores, and names starting with an underscore and an uppercase letter, like _Atomic.

However, libc and STL headers will totally use the double underscore symbols to make them resistant to tampering by users (which they are entitled to), so beware!

Symbol assignments can even go inside of a section, to capture the location counter’s value between input sections:

SECTIONS {
  .text : {
    *(.text)
    text_middle = .;
    *(.text.*)
  }
}Linker Script

Symbol names are not limited to C identifiers, and may contain dashes, periods, dollar signs, and other symbols. They may even be quoted, like "this symbol has spaces", which C will never be able to access as an extern.

There is a mini-language of expressions that symbols can be assigned to. This includes:

Numeric literals like 42, 0xaa, and 4K.
The location counter, ..
Other symbols.
The usual set of C operators, such as arithmetic and bit operations. Xor is curiously missing.
A handful of builtin functions, described below.

There are some fairly complicated rules around how symbols may be given relative addresses to the start of a section, which are only relevant when dealing with position-independent code: https://sourceware.org/binutils/docs/ld/Expression-Section.html

Functions belong to one of two board categories: getters for properties of sections, memory regions, and other linker structures; and arithmetic. Useful functions include:

ADDR, LOADADDR, SIZEOF, and ALIGNOF, which produce the VMA, LMA, size, and alignment of a previously defined section.
ORIGIN and LENGTH, which produce the start address and length of a memory region.
MAX, MIN are obvious; LOG2CEIL computes the base-2 log, rounded up.
ALIGN(expr, align) rounds expr to the next multiple of align. ALIGN(align) is roughly equivalent to ALIGN(., align) with some subtleties around PIC. . = ALIGN(align); will align the location counter to align.

Some other builtins can be found at https://sourceware.org/binutils/docs/ld/Builtin-Functions.html.

A symbol definition can be wrapped in the PROVIDEO() function to make it “weak”, analogous to the “weak symbol” feature found in Clang. This means that the linker will not use the definition if any input object defines it.

Using Symbols and LMAs

As mentioned before, it is extremely rare for the LMA and VMA to be different. The most common situation where this occurs is when you’re running on a system, like a microcontroller, where memory is partitioned into two pieces: ROM and RAM. The ROM has the executable burned into it, and RAM starts out full of random garbage.

Most of the contents of the linked executable are read-only, so their VMA can be in ROM. However, the .data and .bss sections need to lie in RAM, because they’re writable. For .bss this is easy, because it doesn’t have loadable content. For .data, though, we need to separate the VMA and LMA: the VMA must go in RAM, and the LMA in ROM.

This distinction is important for the code that initializes the RAM: while for .bss all it has to do is zero it, for .data, it has to copy from ROM to RAM! The LMA lets us distinguish the copy source and the copy destination.

This has the important property that it tells the loader (usually objcopy in this case) to use the ROM addresses for actually loading the section to, but to link the code as if it were at a RAM address (which is needed for things like PC-relative loads to work correctly).

Here’s how we’d do it in linker script:

MEMORY {
  rom : /* ... */
  ram : /* ... */
}

SECTIONS {
  /* .text and .rodata just go straight into the ROM. We don't need
     to mutate them ever. */
  .text : { *(.text) } > rom
  .rodata : { *(.rodata) } > rom

  /* .bss doesn't have any "loadable" content, so it goes straight
     into RAM. We could include `AT> rom`, but because the sections
     have no content, it doesn't matter. */
  .bss : { *(.bss) } > ram

  /* As described above, we need to get a RAM VMA but a ROM LMA;
     the > and AT> operators achieve this. */
  .data : { *(.data) } > ram AT> rom
}

/* The initialization code will need some symbols to know how to
   zero the .bss and copy the initial .data values. We can use the
   functions from the previous section for this! */

bss_start = ADDR(.bss);
bss_end = bss_start + SIZEOF(.bss);

data_start = ADDR(.data);
data_end = data_start + SIZEOF(.data);

rom_data_start = LOADADDR(.data);Linker Script

Although we would normally write the initialization code in assembly (since it’s undefined behavior to execute C before initializing the .bss and .data sections), I’ve written it in C for illustrative purposes:

#include <string.h>

extern char bss_start[];
extern char bss_end[];
extern char data_start[];
extern char data_end[];
extern char rom_data_start[];

void init_sections(void) {
  // Zero the .bss.
  memset(bss_start, 0, bss_end - bss_start);

  // Copy the .data values from ROM to RAM.
  memcpy(data_start, rom_data_start, data_end - data_start);
}C

Misc Linker Script Features

Linker script includes a bunch of other commands that don’t fit into a specific category:

ENTRY() sets the program entry-point, either as a symbol or a raw address. The -e flag can be used to override it. The ld docs assert that there are fallbacks if an entry-point can’t be found, but in my experience you can sometimes get errors here. ENTRY(_start) would use the _start symbol, for example¹⁵.
INCLUDE "path/to/file.ld" is #include but for linker script.
INPUT(foo.o) will add foo.o as a linker input, as if it was passed at the commandline. GROUP is similar, but with the semantics of --start-group.
OUTPUT() overrides the usual a.out default output name.
ASSERT() provides static assertions.
EXTERN(sym) causes the linker to behave as if an undefined reference to sym existed in an input object.

(Other commands are documented, but I’ve never needed them in practice.)

Real Linker Scripts

It may be useful to look at some real-life linker scripts.

If you wanna see what Clang, Rust, and the like all ultimately use, run ld --verbose. This will print the default linker script for your machine; this is a really intense script that uses basically every feature available in linker script (and, since it’s GNU, is very poorly formatted).

The Linux kernel also has linker scripts, which are differently intense, because they use the C preprocessor. For example, the one for amd64: https://github.com/torvalds/linux/blob/master/arch/x86/kernel/vmlinux.lds.S.

Tock OS, a secure operating system written in Rust, has some pretty solid linker scripts, with lots of comments: https://github.com/tock/tock/blob/master/boards/kernel_layout.ld. I recommend taking a look to see what a “real” but not too wild linker script looks like. There’s a fair bit of toolchain-specific stuff in there, too, that should give you an idea of what to expect.

Happy linking!

Appendix: A Linker Playground

tl;dr: If you don’t wanna try out any examples, skip this section.

I want you to be able to try out the examples above, but there’s no Godbolt for linker scripts (yet!). Unlike normal code, you can’t just run linker script through a compiler, you’re gonna need some objects to link, too! Let’s set up a very small C project for testing your linker scripts.

Note: I’m assuming you’re on Linux, with x86_64, and using Clang. If you’re on a Mac (even M1), you can probably make ld64 do the right thing, but this is outside of what I’m an expert on.

If you’re on Windows, use WSL. I have no idea how MSCV does linker scripts at all.

First, we want a very simple static library:

int lib_call(const char* str) {
  // Discard `str`, we just want to take any argument.
  (void)str;

  // This will go in `.bss`.
  static int count;
  return count++;
}godboltextern.c

Compile extern.c into a static library like so:

clang -c extern.c
ar rc libextern.a extern.oShell

We can check out that we got something reasonable by using nm. The nm program shows you all the symbols a library or object defines.

$ nm libextern.a
extern.o:
0000000000000000 T lib_call
0000000000000000 b lib_call.countTerminal

This shows us the address, section type, and name of each symbol; man nm tells us that T means .text and b means .bss. Capital letters mean that the symbol is exported, so the linker can use it to resolve a symbol reference or a relocation. In C/C++, symbols declared static or in an unnamed namespace are “hidden”, and can’t be referenced outside of the object. This is sometimes called internal vs external linkage.

Next, we need a C program that uses the library:

extern int lib_call(const char* str);

// We're gonna use a custom entrypoint. This code will never run anyways, we
// just care about the linker output.
void run(void) {
  // This will go in `.data`, because it's initialized to non-zero.
  static int data = 5;

  // The string-constant will go into `.rodata`.
  data = lib_call("Hello from .rodata!");
}godboltrun.c

Compile it with clang -c run.c. We can inspect the symbol table with nm as before:

$ nm run.o
                 U lib_call
0000000000000000 T run
0000000000000000 d run.dataTerminal

As you might guess, d is just .data. However, U is interesting: it’s an undefined symbol, meaning the linker will need to perform a symbol resolution! In fact, if we ask Clang to link this for us (it just shells out to a linker like ld):

$ clang run.o
/usr/bin/ld: /somewhere/crt1.o: in function `_start':
(.text+0x20): undefined reference to `main'
/usr/bin/ld: run.o: in function `run':
run.c:(.text+0xf): undefined reference to `lib_call'Terminal

The linker also complains that there’s no main() function, and that some object we didn’t provide called crt1.o wants it. This is the startup code for the C runtime; we can skip linking it with -nostartfiles. This will result in the linker picking an entry point for us.

We can resolve the missing symbol by linking against our library. -lfoo says to search for the library libfoo.a; -L. says to include the current directory for searching for libraries.

clang run.o -L. -lextern -nostartfilesShell

This gives us our binary, a.out, which we can now objdump:

$ objdump -d -Mintel a.out

a.out:     file format elf64-x86-64


Disassembly of section .text:

0000000000401000 <run>:
  401000:  55                      push   rbp
  401001:  48 89 e5                mov    rbp,rsp
  401004:  48 bf 00 20 40 00 00    movabs rdi,0x402000
  40100b:  00 00 00 
  40100e:  e8 0d 00 00 00          call   401020 <lib_call>
  401013:  89 04 25 00 40 40 00    mov    DWORD PTR ds:0x404000,eax
  40101a:  5d                      pop    rbp
  40101b:  c3                      ret    
  40101c:  0f 1f 40 00             nop    DWORD PTR [rax+0x0]

0000000000401020 <lib_call>:
  401020:  55                      push   rbp
  401021:  48 89 e5                mov    rbp,rsp
  401024:  48 89 7d f8             mov    QWORD PTR [rbp-0x8],rdi
  401028:  8b 04 25 04 40 40 00    mov    eax,DWORD PTR ds:0x404004
  40102f:  89 c1                   mov    ecx,eax
  401031:  83 c1 01                add    ecx,0x1
  401034:  89 0c 25 04 40 40 00    mov    DWORD PTR ds:0x404004,ecx
  40103b:  5d                      pop    rbp
  40103c:  c3                      ret    Terminal

Let’s write up the simplest possible linker script for all this:

ENTRY(run)
SECTIONS {
  .text : { *(.text); *(.text.*) }
  .bss : { *(.bss); *(.bss.*) }
  .data : { *(.data); *(.data.*) }
  .rodata : { *(.rodata); *(.rodata.*) }
}link.ld

Let’s link! We’ll also want to make sure that the system libc doesn’t get in the way, using -nostdlib¹⁶.

clang run.o -L. -lextern -nostartfiles -nostdlib -Wl,-T,link.ldShell

At this point, you can use objdump to inspect a.out at your leisure! You’ll notice there are a few other sections, like .eh_frame. Clang adds these by default, but you can throw them out using /DISCARD/.

It’s worth it to run the examples in the post through the linker using this “playground”. You can actually control the sections Clang puts symbols into using the __attribute__((section("blah"))) compiler extension. The Rust equivalent is #[link_section = "blah"].

Blame GCC for this. -Wl feeds arguments through to the linker, and -T is ld’s linker script input flag. Thankfully, rustc is far more sensible here: -Clink-args=-Wl,-T,foo.ld (when GCC/Clang is your linker frontend). ↩
Correction, 2022-09-11. I have really been bothered by not knowing if this is actually true, and have periodically asked around about it. I asked Russ Cox, who was actually at Bell Labs back in the day, and he asked Ken Thompson, who confirms: it’s genuinely .s for source, because it was the only source they had back then.

I am glad I got this from the horse’s mouth. :) ↩
And many other things, like Objective-C. ↩
Completely and utterly unrelated to the objects of object-oriented programming. Best I can tell, the etymology is lost to time. ↩
a.out is also an object file format, like ELF, but toolchains live and die by tradition, so that’s the name given to the linker’s output by default. ↩
Rust does not compile each .rs file into an object, and its “crates” are much larger than the average C++ translation unit. However, the Rust compiler will nonetheless produce many object files for a single crate, precisely for the benefit of this compilation model. ↩
Operating systems are loaded by a bootloader. Bootloaders are themselves loaded by other bootloaders, such as the BIOS. At the bottom of the turtles is the mask ROM, which is a tiny bootloader permanently burned into the device. ↩
No idea on the etymology. This isn’t ASCII text! ↩
Back in the 50s, this stood for “block started by symbol”. ↩
Yes, yes, you can write SORT_BY_NAME(*)(.text) but that’s not really something you ever wind up needing.

See https://sourceware.org/binutils/docs/ld/Input-Section-Wildcards.html for more information on this. ↩
You only get /* */ comment syntax because that’s the lowest common denominator. ↩
Well, . actually gets increased to the alignment of the section first. If you insist on an unaligned section, the syntax is, obviously,
```
SECTIONS {
  .unaligned .: {
    /* ... */
  }
}
```
Linker Script
(That was sarcasm. It must be stressed that this is not a friendly language.) ↩
This symbol is also available in assembly files. jmp . is an overly-cute idiom for an infinity busy loop. It is even more terse in ARM and RISC-V, where it’s written b . and j ., respectively.

Personally, I prefer the obtuse clarity of loop_forever: j loop_forever. ↩
These are the same characters used to declare a section in assembly. If I wanted to place my code in a section named .crt0 but wanted it to be placed into a readonly, executable memory block, use the the assembler directive .section .crt0, rxal ↩
Note that the entry point is almost never a function called main(). In the default configuration of most toolchains, an object called crt0.o is provided as part of the libc, which provides a _start() function that itself calls main(). CRT stands for “C runtime”; thus, crt0.o initializes the C runtime.

This file contains the moral equivalent of the following C code, which varies according to target:
```
extern int main(int argc, char** argv);
noreturn void _start() {
  init_libc();    // Initializes global libc state.
  run_ctors();    // Runs all library constructors.
  int ret = main(get_argc(), get_argv());
  run_dtors();    // Runs all library destructors.
  cleanup_libc(); // Deinitializes the libc.
  
  exit(ret); // Asks the OS to gracefully destroy the process.
}
```
C
This behavior can be disabled with -nostartfiles in Clang. The OSDev wiki has some on this topic: https://wiki.osdev.org/Creating_a_C_Library#Program_Initialization. ↩
If you include libc, you will get bizarre errors involving something called “gcc_s”. libgcc (and libgcc_s) is GCC’s compiler runtime library. Where libc exposes high-level operations on the C runtime and utilities for manipulating common objects, libgcc provides even lower-level support, including:
- Polyfills for arithmetic operations not available on the target. For example, dividing two 64-bit integers on most 32-bit targets will emit a reference to the a symbol like __udivmoddi4 (they all have utterly incomprehensible names like this one).
- Soft-float implementations, i.e., IEEE floats implemented in software for targets without an FPU.
- Bits of unwinding (e.g. exceptions and panics) support (the rest is in libunwind).
- Miscellaneous runtime support code, such as the code that calls C++ static initializers.
Clang’s version, libcompiler-rt, is ABI-compatible with libgcc and provides various support for profiling, sanitizers, and many, many other things the compiler needs available for compiling code. ↩

2021-05-24 • 2069 words • 22 minutes
• #c++ • #rust • #pointers

The Taxonomy of Pointers

Writing unsafe in Rust usually involves manual management of memory. Although, ideally, we’d like to exclusively use references for this, sometimes the constraints they apply are too strong. This post is a guide on those constraints and how to weaken them for correctness.

“Unmanaged” languages, like C++ and Rust, provide pointer types for manipulating memory. These types serve different purposes and provide different guarantees. These guarantees are useful for the optimizer but get in the way of correctness of low-level code. This is especially true in Rust, where these constraints are very tight.

NB: This post only surveys data pointers. Function pointers are their own beast, but generally are less fussy, since they all have static lifetime¹.

Basic C++ Pointers

First, let’s survey C++. We have three pointer types: the traditional C pointer T*, C++ references T&, and rvalue references T&&. These generally have pretty weak guarantees.

Pointers provide virtually no guarantees at all: they can be null, point to uninitialized memory, or point to nothing at all! C++ Only requires that they be aligned². They are little more than an address (until they are dereferenced, of course).

References, on the other hand, are intended to be the “primary” pointer type. A T& cannot be null, is well-aligned, and is intended to only refer to live memory (although it’s not something C++ can really guarantee for lack of a borrow-checker). References are short-lived.

C++ uses non-nullness to its advantage. For example, Clang will absolutely delete code of the form

auto& x = Foo();
if (&x == nullptr) {
  DoSomething();
}C++

Because references cannot be null, and dereferencing the null pointer is always UB, the compiler may make this fairly strong assumption.

Rvalue references, T&&, are not meaningfully different from normal references, beyond their role in overload resolution.

Choosing a C++ (primitive) pointer type is well-studied and not the primary purpose of this blog. Rather, we’re interested in how these map to Rust, which has significantly more complicated pointers.

Basic Rust Pointers

Like C++, Rust has two broad pointer types: *const T and *mut T, the raw pointers, and &T and &mut T, the references.

Rust pointer have even fewer constraints than C++ pointers; they need not even be aligned³! The const/mut specifier is basically irrelevant, but is useful as programmer book-keeping tool. Rust also does not enforce the dreaded strict-aliasing rule⁴ on its pointers.

On the other hand, Rust references are among the most constrained objects in any language that I know of. A shared reference &'a T, lasting for the lifetime 'a, satisfies:

Non-null, and well-aligned (like in C++).
Points to a valid, initialized T for the duration of 'a.
T is never ever mutated for the duration of the reference: the compiler may fold separate reads into one at will. Stronger still, no &mut T is reachable from any thread while the reference is reachable.

Stronger still are &'a mut T references, sometimes called unique references, because in addition to being well-aligned and pointing to a valid T at all times, no other reachable reference ever aliases it in any thread; this is equivalent to a C T* restrict pointer.

Unlike C++, which has two almost-identical pointer types, Rust’s two pointer types provide either no guarantees or all of them. The following unsafe operations are all UB:

let null = unsafe { &*ptr::null() };

// A reference to u8 need not be sufficiently aligned
// for a reference to u32.
let unaligned = unsafe { &*(&0u8 as *const u8 as *const u32) };

// More on this type later...
let uninit = unsafe { &*MaybeUninit::uninit().as_ptr() };

let x = 0;
unsafe {
  // Not UB in C++ with const_cast!
  let p = &x;
  (p as *const i32 as *mut i32).write(42);
}

// Two mutable references live at the same time pointing to
// the same memory. This would also be fine in C++!
let mut y = 0;
let p1 = unsafe { &*(&mut y as *mut i32) };
let p2 = unsafe { &*(&mut y as *mut i32) };Rust

Wide Pointers

Rust also provides the slice types &[T]⁵ (of which you get mutable/immutable reference and pointer varieties) and dynamic trait object types &dyn Tr (again, all four basic pointer types are available).

&[T] is a usize⁶ length plus a pointer to that many Ts. The pointer type of the slice specifies the guarantees on the pointed-to buffer. *mut [T], for example, has no meaningful guarantees, but still contains the length⁷. Note that the length is part of the pointer value, not the pointee.

&dyn Tr is a trait object. For our purposes, it consists of a pointer to some data plus a pointer to a static vtable. *mut dyn Tr is technically a valid type⁸. Overall, trait objects aren’t really relevant to this post; they are rarely used this way in unsafe settings.

Weakening the Guarantees

Suppose we’re building some kind of data structure; in Rust, data structures will need some sprinkling of unsafe, since they will need to shovel around memory directly. Typically this is done using raw pointers, but it is preferable to use the least weakened pointer type to allow the compiler to perform whatever optimizations it can.

There are a number of orthogonal guarantees on &T and &mut T we might want to relax:

Non-nullness.
Well-aligned-ness.
Validity and initialized-ness of the pointee.
Allocated-ness of the pointee (implied by initialized-ness).
Global uniqueness of an &mut T.

Pointer to ZST

The last three of these properties are irrelevant for a zero-sized type. For example, we can generate infinite &mut () with no consequences:

fn unique_unit() -> &'static mut () {
  unsafe { &mut *(0x1 as *mut ()) } 
}Rust

We materialize a non-null, well-aligned pointer and reborrow it into a static reference; because there is no data to point to, none of the usual worries about the pointee itself apply. However, the pointer itself must still be non-null and well-aligned; 0x1 is not a valid address for an &[u32; 0], but 0x4 is⁹.

This also applies to empty slices; in fact, the compiler will happily promote the expression &mut [] to an arbitrary lifetime:

fn unique_empty<T>() -> &'static mut [T] {
  &mut []
}Rust

Null References

The most well-known manner of weakening is Option<&T>. Rust guarantees that this is ABI-compatible with a C pointer const T*, with Option::<&T>::None being a null pointer on the C side. This “null pointer optimization” applies to any type recursively containing at least one T&.

extern "C" {
  fn DoSomething(ptr: Option<&mut u32>);
}

fn do_something() {
  DoSomething(None);  // C will see a `NULL` as the argument.
}Rust

The same effect can be achieved for a pointer type using the NonNull<T> standard library type: Option<NonNull<T>> is identical to *mut T. This is most beneficial for types which would otherwise contain a raw pointer:

struct Vec<T> {
  ptr: NonNull<T>,
  len: usize,
  cap: usize,
}

assert_eq!(size_of::<Vec<u8>>(), size_of::<Option<Vec<u8>>>())Rust

Uninitialized Pointee

No matter what, a &T cannot point to uninitialized memory, since the compiler is free to assume it may read such references at any time with no consequences.

The following classic C pattern is verboten:

Foo foo;
initialize_foo(&foo);C

Rust doesn’t provide any particularly easy ways to allocate memory without initializing it, too, so this usually isn’t a problem. The MaybeUninit<T> type can be used for safely allocating memory without initializing it, via MaybeUninit::uninit().

This type acts as a sort of “optimization barrier” that prevents the compiler from assuming the pointee is initialized. &MaybeUninit<T> is a pointer to potentially uninitialized but definitely allocated memory. It has the same layout as &T, and Rust provides functions like assume_init_ref() for asserting that a &MaybeUninit<T> is definitely initialized. This assertion is similar in consequence to dereferencing a raw pointer.

&MaybeUninit<T> and &mut MaybeUninit<T> should almost be viewed as pointer types in their own right, since they can be converted to/from &T and &mut T under certain circumstances.

Because T is almost a “subtype” of MaybeUninit<T>, we are entitled¹⁰ to “forget” that the referent of a &T is initialized converting it to a &MaybeUninit<T>. This makes sense because &T is covariant¹¹ in &T. However, this is not true of &mut T, since it’s not covariant:

let mut x = 0;
let uninit: &mut MaybeUninit<i32> = unsafe { transmute(&mut x) };
*uninit = MaybeUninit::uninit();  // Oops, `x` is now uninit!Rust

These types are useful for talking to C++ without giving up too many guarantees. Option<&MaybeUninit<T>> is an almost perfect model of a const T*, under the assumption that most pointers in C++ are valid most of the time.

MaybeUninit<T> also finds use in working with raw blocks of memory, such as in a Vec-style growable slice:

struct SliceVec<'a, T> {
  // Backing memory. The first `len` elements of it are
  // known to be initialized, but no more than that.
  data: &'a mut [MaybeUninit<T>],
  len: usize,
}

impl SliceVec<'a, T> {
  fn push(&mut self, x: T) {
    assert!(self.len < data.len());

    self.data[self.len] = MaybeUninit::new(x);
    self.len += 1;
  }
}Rust

Aliased Pointee

&mut T can never alias any other pointer, but is also the mechanism by which we perform mutation. It can’t even alias with pointers that Rust can’t see; Rust assumes no one else can touch this memory. Thus, &mut T is not an appropriate analogue for T&.

Like with uninitialized memory, Rust provides a “barrier” wrapper type, UnsafeCell<T>. UnsafeCell<T> is the “interior mutability” primitive, which permits us to mutate through an &UnsafeCell<T> so long as concurrent reads and writes do not occur. We may even convert it to a &mut T when we’re sure we’re holding the only reference.

UnsafeCell<T> forms the basis of the Cell<T>, RefCell<T>, and Mutex<T> types, each of which performs a sort of “dynamic borrow-checking”:

Cell<T> only permits direct loads and stores.
RefCell<T> maintains a counter of references into it, which it uses to dynamically determine if a mutable reference would be unique.
Mutex<T>, which is like RefCell<T> but using concurrency primitives to maintain uniqueness.

Because of this, Rust must treat &UnsafeCell<T> as always aliasing, but because we can mutate through it, it is a much closer analogue to a C++ T&. However, because &T assumes the pointee is never mutated, it cannot coexist with a &UnsafeCell<T> to the same memory, if mutation is performed through it. The following is explicitly UB:

let mut x = 0;
let p = &x;

// This is ok; creating the reference to UnsafeCell does not
// immediately trigger UB.
let q = unsafe { transmute::<_, &UnsafeCell<i32>>(&x) };

// But writing to it does!
q.get().write(42);Rust

The Cell<T> type is useful for non-aliasing references to plain-old-data types, which tend to be Copy. It allows us to perform mutation without having to utter unsafe. For example, the correct type for a shared mutable buffer in Rust is &[Cell<u8>], which can be freely memcpy‘d, without worrying about aliasing¹².

This is most useful for sharing memory with another language, like C++, which cannot respect Rust’s aliasing rules.

Combined Barriers

To recap:

Non-nullness can be disabled with Option<&T>.
Initialized-ness can be disabled with &MaybeUninit<T>.
Uniqueness can be disabled with &UnsafeCell<T>.

There is no way to disable alignment and validity restrictions: references must always be aligned and have a valid lifetime attached. If these are unachievable, raw pointers are your only option.

We can combine these various “weakenings” to produce aligned, lifetime-bound references to data with different properties. For example:

&UnsafeCell<MaybeUninit<T>> is as close as we can get to a C++ T&.
Option<&UnsafeCell<T>> is a like a raw pointer, but to initialized memory.
Option<&mut MaybeUninit<T>> is like a raw pointer, but with alignment, aliasing, and lifetime requirements.
UnsafeCell<&[T]> permits us to mutate the pointer to the buffer and its length, but not the values it points to themselves.
UnsafeCell<&[UnsafeCell<T>]> lets us mutate both the buffer and its actual pointer/length.

Interestingly, there is no equivalent to a C++ raw pointer: there is no way to create a guaranteed-aligned pointer without a designated lifetime¹³.

Other Pointers

Rust and C++ have many other pointer types, such as smart pointers. However, in both languages, both are built in terms of these basic pointer types. Hopefully this article is a useful reference for anyone writing unsafe abstraction that wishes to avoid using raw pointers when possible.

Except in Go, which synthesizes vtables on the fly. Story for another day. ↩
It is, apparently, a little-known fact that constructing unaligned pointers, but then never dereferencing them, is still UB in C++. C++ could, for example, store information in the lower bits of such a pointer. The in-memory representation of a pointer is actually unspecified! ↩
This is useful when paired with the Rust <*const T>::read_unaligned() function, which can be compiled down to a normal load on architectures that do not have alignment restrictions, like x86_64 and aarch64. ↩
Another story for another time. ↩
Comparable to the C++20 std::span<T> type. ↩
usize is Rust’s machine word type, compare std::uintptr_t. ↩
The length of a *mut [T] can be accessed via the unstable <*mut [T]>::len() method. ↩
It is also not a type I have encountered enough to have much knowledge on. For example, I don’t actually know if the vtable half of a *mut dyn Tr must always be valid or not; I suspect the answer is “no”, but I couldn’t find a citation for this. ↩
Note that you cannot continue to use a reference to freed, zero-sized memory. This subtle distinction is called out in https://doc.rust-lang.org/std/ptr/index.html#safety. ↩
Currently, a transmute must be used to perform this operation, but I see no reason way this would permit us to perform an illegal mutation without uttering unsafe a second time. In particular, MaybeUninit::assume_init_read(), which could be used to perform illegal copies, is an unsafe function. ↩
A covariant type Cov<T> is once where, if T is a subtype of U, then Cov<T> is a subtype of Cov. This isn’t particularly noticeable in Rust, where the only subtyping relationships are &'a T subtypes &'b T when 'a outlives 'b, but is nonetheless important for advanced type design. ↩
Cell<T> does not provide synchronization; you still need locks to share it between threads. ↩
I have previously proposed a sort of 'unsafe or '! “lifetime” that is intended to be the lifetime of dangling references (a bit of an oxymoron). This would allow us to express this concept, but I need to flesh out the concept more. ↩

2021-04-26 • 3922 words • 43 minutes
• #rust • #c++ • #ffi

Move Constructors in Rust:
Is it possible?

I’ve been told I need to write this idea down – I figure this one’s a good enough excuse to start one of them programming blogs.

TL;DR You can move-constructors the Rust! It requires a few macros but isn’t much more outlandish than the async pinning state of the art. A prototype of this idea is implemented in my moveit crate.

The Interop Problem

Rust is the best contender for a C++ replacement; this is not even a question at this point¹. It’s a high-level language that provides users with appropriate controls over memory, while also being memory safe. Rust accomplishes by codifying C++ norms and customs around ownership and lifetimes into its type system.

Rust has an ok² FFI story for C:

void into_rust();

void into_c() {
  into_rust();
}C

extern "C" {
  fn into_c();
}

#[no_mangle]
extern "C" fn into_rust() {
  unsafe { into_c() }
}Rust

Calling into either of these functions from the Rust or C side will recurse infinitely across the FFI boundary. The extern "C" {} item on the Rust side declares C symbols, much like a function prototype in C would; the extern "C" fn is a Rust function with the C calling convention, and the #[no_mangle] annotation ensures that recurse_into_rust is the name that the linker sees for this function. The link works out, we run our program, and the stack overflows. All is well.

But this is C. We want to rewrite all of the world’s C++ in Rust, but unfortunately that’s going to take about a decade, so in the meantime new Rust must be able to call existing C++, and vise-versa. C++ has a much crazier ABI, and while Rust gives us the minimum of passing control to C, libraries like cxx need to provide a bridge on top of this for Rust and C++ to talk to each other.

Unfortunately, the C++ and Rust object models are, a priori, incompatible. In Rust, every object may be “moved” via memcpy, whereas in C++ this only holds for types satisfying std::is_trivially_moveable³. Some types require calling a move constructor, or may not be moveable at all!

Even more alarming, C++ types are permited to take the address of the location where they are being constructed: the this pointer is always accessible, allowing easy creation of self-referential types:

class Cyclic {
 public:
  Cyclic() {}
 
  // Ensure that copy and move construction respect the self-pointer
  // invariant:
  Cyclic(const Cyclic&) {
    new (this) Cyclic;
  }
  // snip: Analogous for other rule-of-five constructors.

 private:
  Cyclic* ptr_ = this;
};C++

The solution cxx and other FFI strategies take is to box up complex C++ objects across the FFI boundary; a std::unique_ptr<Cyclic> (perhaps reinterpreted as a Box on the Rust side) can be passed around without needing to call move constructors. The heap allocation is a performance regression that scares off potential Rust users, so it’s not a viable solution.

We can do better.

Notation and Terminology

“Move” is a very, very overloaded concept across Rust and C++, and many people have different names for what this means. So that we don’t get confused, we’ll establish some terminology to use throughout the rest of the article.

A destructive move is a Rust-style move, which has the following properties:

It does not create a new object; from the programmer’s perspective, the object has simply changed address.
The move is implemented by a call to memcpy; no user code is run.
The moved-from value becomes inaccessible and its destructor does not run.

A destructive move, in effect, is completely invisible to the user⁴, and the Rust compiler can emit as many or as few of them as it likes. We will refer to this as a “destructive move”, a “Rust move”, or a “blind, memcpy move”.

A copying move is a C++-style move, which has the following properties:

It creates a new, distinct object at a new memory location.
The move is implemented by calling a user-provided function that initializes the new object.
The moved-from value is still accessible but in an “unspecified but valid state”. Its destructor is run once the current scope ends.

A copying move is just a weird copy operation that mutates the copied-from object. C++ compilers may elide calls to the move constructor in certain situations, but calling it usually requires the programmer to explicitly ask for it. From a Rust perspective, this is as if Clone::clone() took &mut self as an argument. We will refer to this as a “copying move”, a “nondestructive move”, a “C++ move”, or, metonymically, as a “move constructor”.

Pinned Pointers

As part of introducing support for stackless coroutines⁵ (aka async/await), Rust had to provide some kind of supported for immobile types through pinned pointers.

The Pin type is a wraper around a pointer type, such as Pin<&mut i32> or Pin<Box<ComplexObject>>. Pin provides the following guarantee to unsafe code:

Given p: Pin for P: Deref, and P::Target: !Unpin, the pointee object *p will always be found at that address, and no other object will use that address until *p’s destructor is called.

In a way, Pin is a witness to a sort of critical section: once constructed, that memory is pinned until the destructor runs. The Pin documentation goes into deep detail about when and why this matters, and how unsafe code can take advantage of this guarantee to provide a safe interface.

The key benefit is that unsafe code can create self-references behind the pinned pointer, without worrying about them breaking when a destructive move occurs. C++ deals with this kind of type by allowing move/copy constructors to observe the new object’s address and fix up any self references as necessary.

Our progress so far: C++ types can be immoveable from Rust’s perspective. They need to be pinned in some memory location: either on the heap as a Pin<Box<T>>, or on the stack (somehow; keep reading). Our program is now to reconcile C++ move constructors with this standard library object that explicitly prevents moves. Easy, right;

Constructors

C++ constructors are a peculiar thing. Unlike Rust’s Foo::new()-style factories, or even constructors in dynamic languages like Java, C++ constructors are unique in that they construct a value in a specific location. This concept is best illustraced by the placement-new operation:

void MakeString(std::string* out) {
  new (out) std::string("mwahahaha");
}C++

Placement-new is one of those exotic C++ operations you only ever run into deep inside fancy library code. Unlike new, which triggers a trip into the allocator, placement-new simply calls the constructor of your type with this set to the argument in parentheses. This is the “most raw” way you can call a constructor: given a memory location and arguments, construct a new value.

In Rust, a method call foo.bar() is really syntax sugar for Foo::bar(foo). This is not the case in C++; a member function has an altogether different type, but some simple template metaprogramming can flatten it back into a regular old free function:

class Foo {
  int Bar(int x);
};

inline int FreeBar(Foo& foo, int x) {
  return foo.Bar();
}

Foo foo;
FreeBar(foo, 5);C++

Placement-new lets us do the analogous thing for a constructor:

class Foo {
  Foo(int x);
}

inline void FreeFoo(Foo& foo, int x) {
  new (&foo) Foo(x);
}

Foo* foo = AllocateSomehow();
FreeFoo(*foo, 5);C++

We can lift this “flattening” of a specific constructor into Rust, using the existing vocabulary for pushing fixed-address memory around:

unsafe trait Ctor {
  type Output;
  unsafe fn ctor(self, dest: Pin<&mut MaybeUninit<Self::Output>>);
}Rust

A Ctor is a constructing closure. A Ctor contains the necessary information for constructing a value of type Output which will live at the location *dest. The Ctor::ctor() function performs in-place construction, making *dest become initialized.

A Ctor is not the constructor itself; rather, it is more like a Future or an Iterator which contain the necessary captured values to perform the operation. A Rust type that is constructed using a Ctor would have functions like this:

impl MyType {
  fn new() -> impl Ctor<Output = Self>;
}Rust

The unsafe markers serve distinct purposes:

It is an unsafe trait, because *dest must be initialized when ctor() returns.
It has an unsafe fn, because, in order to respect the Pin drop guarantees, *dest must either be freshly allocated or have had its destructor run just prior.

Since we are constructing into Pinned memory, the Ctor implementation can use the address of *dest as part of the construction procedure and assume that that pointer will not suddenly dangle because of a move. This recovers our C++ behavior of “this-stability”.

Unfortunately, Ctor is covered in unsafe, and doesn’t even allocate storage for us. Luckily, it’s not too hard to build our own safe std::make_unique:

fn make_box<C: Ctor>(c: C) -> Pin<Box<Ctor::Output>> {
  unsafe {
    type T = Ctor::Output;
    // First, obtain uninitialized memory on the heap.
    let uninit = std::alloc::alloc(Layout::new<T>());
    
    // Then, pin this memory as a MaybeUninit. This memory
    // isn't going anywhere, and MaybeUninit's entire purpose
    // in life is being magicked into existence like this,
    // so this is safe.
    let pinned = Pin::new_unchecked(
      &mut *uninit.cast::<MaybeUninit<T>>()
    );
    // Now, perform placement-`new`, potentially FFI'ing into
    // C++.
    c.ctor(pinned);

    // Because Ctor guarantees it, `uninit` now points to a
    // valid `T`. We can safely stick this in a `Box`. However,
    // the `Box` must be pinned, since we pinned `uninit`
    // earlier.
    Pin::new_unchecked(Box::from_raw(uninit.cast::<T>()))
  }
}Rust

Thus, std::make_unique<MyType>() in C++ becomes make_box(MyType::new()) in Rust. Ctor::ctor gives us a bridging point to call the C++ constructor from Rust, in a context where its expectations are respected. For example, we might write the following binding code:

class Foo {
  Foo(int x);
}

// Give the constructor an explicit C ABI, using
// placement-`new` to perform the "rawest" construction
// possible.
extern "C" FooCtor(Foo* thiz, int x) {
  new (thiz) Foo(x);
}foo.cc

struct Foo { ... }
impl Foo {
  fn new(x: i32) -> impl Ctor<Output = Self> {
    unsafe {
      // Declare the placement-new bridge.
      extern "C" {
        fn FooCtor(this: *mut Foo, x: i32);
      }

      // Make a new `Ctor` wrapping a "real" closure.
      ctor::from_placement_fn(move |dest| {
        // Call back into C++.
        FooCtor(dest.as_mut_ptr(), x)
      })
    }
  }
}foo_bindings.rs

use foo_bindings::Foo;

// Lo, behold! A C++ type on the Rust heap!
let foo = make_box(Foo::new(5));foo_user.rs

But… we’re still on the heap, so we seem to have made no progress. We could have just called std::make_unique on the C++ side and shunted it over to Rust. In particular, this is what cxx resorts to for complex types.

Interlude I: Pinning on the Stack

Creating pinned pointers directly requires a sprinkling of unsafe. Box::pin() allows us to safely create a Pin<Box<T>>, since we know it will never move, much like the make_box() example above. However, it’s not possible to create a Pin<&mut T> to not-necessarilly-Unpin data as easilly:

let mut data = 42;
let ptr = &mut data;
let pinned = unsafe {
  // Reborrow `ptr` to create a pointer with a shorter lifetime.
  Pin::new_unchecked(&mut *ptr)
};

// Once `pinned` goes out of scope, we can move out of `*ptr`!
let moved = *ptr;Rust

The unsafe block is necessary because of exactly this situation: &mut T does not own its pointee, and a given mutable reference might not be the “oldest” mutable reference there is. The following is a safe usage of this constructor:

let mut data = 42;
// Intentionally shadow `data` so that no longer-lived reference than
// the pinned one can be created.
let data = unsafe {
  Pin::new_unchecked(&mut *data)
};Rust

This is such a common pattern in futures code that many futures libraries provide a macro for performing this kind of pinning on behalf of the user, such as tokio::pin!().

With this in hand, we can actually call a constructor on a stack-pinned value:

let val = MaybeUninit::uninit();
pin!(val);
unsafe { Foo::new(args).ctor(val); }
let val = unsafe {
  val.map_unchecked_mut(|x| &mut *x.as_mut_ptr())
};Rust

Unfortunately, we still need to utter a little bit more unsafe, but because of Ctor’s guarantees, this is all perfectly safe; the compiler just can’t guarantee it on its own. The natural thing to do is to wrap it up in a macro much like pin!, which we’ll call emplace!:

emplace!(let val = Foo::new(args));Rust

This is truly analogous to C++ stack initialization, such as Foo val(args);, although the type of val is Pin<&mut Foo>, whereas in C++ it would merely bee Foo&. This isn’t much of an obstacle, and just means that Foo’s API on the Rust side needs to use Pin<&mut Self> for its methods.

The Return Value Optimization

Now we go to build our Foo-returning function and are immediately hit with a roadblock:

fn make_foo() -> Pin<&'wat mut Foo> {
  emplace!(let val = Foo::new(args));
  foo
}Rust

What is the lifetime 'wat? This is just returning a pointer to the current stack frame, which is no good. In C++ (ignoring fussy defails about move semantics), NRVO would kick in and val would be constructed “in the return slot”:

Foo MakeFoo() {
  Foo val(args);
  return val;
}C++

Return value optimization (and the related named return value optimization) allow C++ to elide copies when constructing return values. Instead of constructing val on MakeFoo’s stack and then copying it into the ABI’s return location (be that a register like rax or somewhere in the caller’s stack frame), the value is constructed directly in that location, skipping the copy. Rust itself performs some limited RVO, though its style of move semantics makes this a bit less visible.

Rust does not give us a good way of accessing the return slot directly, for good reason: it need not have an address! Rust returns all types that look roughly like a single integer in a register (on modern ABIs), and registers don’t have addresses. C++ ABIs typically solve this by making types which are “sufficiently complicated” (usually when they are not trivially moveable) get passed on the stack unconditionally⁶.

Since we can’t get at the return slot, we’ll make our own! We just need to pass the pinned MaybeUninit<T> memory that we would pass into Ctor::ctor as a “fake return slot”:

fn make_foo(return_slot: Pin<&mut MaybeUninit<Foo>>) -> Pin<&mut Foo> {
  unsafe {
    Foo::new(args).ctor(return_slot);
    val.map_unchecked_mut(|x| &mut *x.as_mut_ptr());
  }
}Rust

This is such a common operation that it makes sense to replace Pin<&mut MaybeUninit<T>> with a specific type, Slot<'a, T>:

struct Slot<'a, T>(Pin<&'a mut MaybeUninit<T>>);
impl<'a, T> Slot<'a, T> {
  fn emplace<C: Ctor>(c: C) -> Pin<&'a mut T> {
    unsafe {
      c.ctor(self.0);
      val.map_unchecked_mut(|x| &mut *x.as_mut_ptr());
    }
  }
}

fn make_foo(return_slot: Slot<Foo>) -> Pin<&mut Foo> {
  return_slot.emplace(Foo::new(args))
}Rust

We can provide another macro, slot!(), which reserves pinned space on the stack much like emplace!() does, but without the construction step. Calling make_foo only requires minimal ceremony and no user-level unsafe.

slot!(foo);
let foo = make_foo(foo);Rust

The slot!() macro is almost identical to tokio::pin!(), except that it doesn’t initialize the stack space with an existing value.

Towards Move Constructors: Copy Constructors

Move constructors involve rvalue references, which Rust has no meaningful equivalent for, so we’ll attack the easier version: copy constructors.

A copy constructor is C++’s Clone equivalent, but, like all constructors, is allowed to inspect the address of *this. Its sole argument is a const T&, which has a direct Rust analogue: a &T. Let’s write up a trait that captures this operation:

unsafe trait CopyCtor {
  unsafe fn copy_ctor(src: &Self, dest: Pin<&mut MaybeUninit<Self>>);
}Rust

Unlike Ctor, we would implement CopyCtor on the type with the copy constructor, bridging it to C++ as before. We can then define a helper that builds a Ctor for us:

fn copy<T: CopyCtor>(val: &T) -> impl Ctor<Output = T> {
  unsafe {
    ctor::from_placement_fn(move |dest| {
      T::copy_ctor(val, dest)
    })
  }
}

emplace!(let y = copy(x));     // Calls the copy constructor.
let boxed = make_box(copy(y)); // Copy onto the heap.Rust

We can could (modulo orphan rules) even implement CopyCtor for Rust types that implement Clone by cloneing into the destination.

It should be straightforward to make a version for move construction… but, what’s a T&& in Rust?

Interlude II: Unique Ownership

Box<T> is interesting, because unlike &T, it is possible to move out of a Box<T>, since the compiler treats it somewhat magically. There has long been a desire to introduce a DerefMove trait captures this behavior, but the difficulty is the signature: if deref returns &T, and deref_mut returns &mut T, should deref_move return T? Or something… more exotic? You might not want to dump the value onto the stack; you want *x = *y to not trigger an expensive intermediate copy, when *y: [u8; BIG].

Usually, the “something more exotic” is a &move T or &own T reference that “owns” the pointee, similar to how a T&& in C++ is taken to mean that the caller wishes to perform ownership transfer.

Exotic language features aside, we’d like to be able to implement something like DerefMove for move constructors, since this is the natural analogue of T&&. To move out of storage, we need a smart pointer to provide us with three things:

It must actually be a smart pointer (duh).
It must be possible to destroy the storage without running the destructor of the pointee (in Rust, unlike in C++, destructors do not run on moved-from objects).
It must be the unique owner of the pointee. Formally, if, when p goes out of scope, no thread can access *p, then p is the unique owner.

Box<T> trivially satisfies all three of these: it’s a smart pointer, we can destroy the storage using std::alloc::dealloc, and it satisfies the unique ownership property.

&mut T fails both tests: we don’t know how to destory the storage (this is one of the difficulties with a theoretical &move T) and it is not the unique owner: some &mut T might outlive it.

Interestingly, Arc<T> only fails the unique ownership test, and it can pass it dynamically, if we observe the strong and weak counts to both be 1. This is also true for Rc<T>.

Most importantly, however, is that if Pin, then it is sufficient that P satisfy these conditions. After all, a Pin<Box> uniquely owns its contents, even if they can’t be moved.

It’s useful to introduce some traits that record these requirements:

unsafe trait OuterDrop {
  unsafe fn outer_drop(this: *mut Self);
}

unsafe trait DerefMove: DerefMut + OuterDrop {}Rust

OuterDrop is simply the “outer storage destruction” operation. Naturally, it is only safe to perform this operation when the pointee’s own destructor has been separately dropped (there are some subtleties around leaking memory here, but in general it’s not a good idea to destroy storage without destroying the pointee, too).

DerefMove⁷ is the third requirement, which the compiler cannot check (there’s a lot of these, huh?). Any type which implements DerefMove can be moved out of by carefully dismantling the pointer:

fn move_out_of<P>(mut p: P) -> P::Target
where
  P: DerefMove,
  P::Target: Sized + Unpin,
{
  unsafe {
    // Copy the pointee out of `p` (all Rust moves are
    // trivial copies). We need `Unpin` for this to be safe.
    let val = (&mut *p as *mut P::Target).read();
    
    // Destroy `p`'s storage without running the pointee's
    // destructor.
    let ptr = &mut p as *mut P;
    // Make sure to suppress the actual "complete" destructor of
    // `p`.
    std::mem::forget(p);
    // Actually destroy the storage.
    P::outer_drop(ptr);
    
    // Return the moved pointee, which will be trivially NRVO'ed.
    val
  }
}Rust

Much like pinning, we need to lift this capability to the stack somehow. &mut T won’t cut it here.

Owning the Stack

We can already speak of uninitialized but uniquely-owned stack memory with Slot, but Slot::emplace() returns a (pinned) &mut T, which cannot be DerefMove. This operation actually loses the uniqueness information of Slot, so instead we make emplace() return a StackBox.

A StackBox<'a, T> is like a Box<T> that’s bound to a stack frame, using a Slot<'a, T> as underlying storage. Although it’s just a &mut T on the inside, it augments it with the uniqueness invariant above. In particular, StackBox::drop() is entitled to call the destructor of its pointee in-place.

To the surprise of no one who has read this far, StackBox: DerefMove. The implementation for StackBox::outer_drop() is a no-op, since the calling convention takes care of destroying stack frames.

It makes sense that, since Slot::emplace() returns a Pin<StackBox<T>>, so should emplace!().

(There’s a crate called stackbox that provides similar StackBox/Slot types, although it is implemented slightly differently and does not provide the pinning guarantees we need.)

Move Constructors

This is it. The moment we’ve all be waiting for. Behold, the definition of a move constructor in Rust:

unsafe trait MoveCtor {
  unsafe fn move_ctor(
    src: &mut Self,
    dest: Pin<&mut MaybeUninit<Self>>
  );
}Rust

Wait, that’s it?

There’s no such thing as &move Self, so, much like drop(), we have to use a plain ol’ &mut instead. Like Drop, and like CopyCtor, this function is not called directly by users; instead, we provide an adaptor that takes in a MoveCtor and spits out a Ctor.

fn mov<P>(mut ptr: P) -> impl Ctor<Output = P::Target>
where
  P: DerefMove,
  P::Target: MoveCtor,
{
  unsafe {
    from_placement_fn(move |dest| {
      MoveCtor::move_ctor(&mut *ptr, dest);

      // Destroy `p`'s storage without running the pointee's
      // destructor.
      let inner = &mut ptr as *mut P;
      mem::forget(ptr);
      P::outer_drop(inner);
    })
  }
}Rust

Notice that we no longer require that P::Target: Unpin, since the ptr::read() call from move_out_of() is now gone. Instead, we need to make a specific requirement of MoveCtor that I will explain shortly. However, we can now freely call the move constructor just like any other Ctor:

emplace!(let y = mov(x));  // Calls the move constructor.
let boxed = make_box(mov(y));  // Move onto the heap.Rust

The Langauge-Lawyering Part

(If you don’t care for language-lawyering, you can skip this part.)

Ok. We need to justify the loss of the P::Target: Unpin bound on mov(), which seems almost like a contradiction: Pin guarantees its pointee won’t be moved, but isn’t the whole point of MoveCtor to perform moves?

At the begining of this article, I called out the difference between destructive Rust move and copying C++ moves. The reason that the above isn’t a contradiction is that the occurences of “move” in that sentence refer to these different senses of “move”.

The specific thing that Pin is protecting unsafe code from is whatever state is behind the pointer being blindly memcpy moved to another location, leaving any self-references in the new location dangling. However, by invoking a C++-style move constructor, the data never “moves” in the Rust sense; it is merely copied in a way that carefully preserves any address-dependent state.

We need to ensure two things:

Implementors of MoveCtor for their own type must ensure that their type does not rely on any pinning guarantees that the move constructor cannot appropriately “fix up”.
No generic code can hold onto a reference to moved-from state, because that way they could witness whatever messed-up post-destruction state the move constructor leaves it in.

The first of these is passed onto the implementor as an unsafe impl requirement. Designing an !Unpin type by hand is difficult, and auto-generated C++ bindings using this model would hopefully inherit move-correctness from the C++ code itself.

The second is more subtle. In the C++ model, the moved-from value is mutated to mark it as “moved from”, which usually just inhibits the destructor. C++ believes all destructors are run for all objects. For example, std::unique_ptr sets the moved-from value to nullptr, so that the destructor can be run at the end of scope and do nothing. Compare with the Rust model, where the compiler inhibits the destructor automatically through the use of drop flags.

In order to support move-constructing both Rust and C++ typed through a uniform interface, move_ctor is a fused destructor/copy operation. In the Rust case, no “destructor” is run, but in the C++ case we are required to run a destructor. Although this changes the semantic ordering of destruction compared to the equivalent C++ program, in practice, no one depends on moved-from objects actually being destroyed (that I know of).

After move_ctor is called, src must be treated as if it had just been destroyed. This means that the storage for src must be disposed of immediately, without running any destructors for the pointed-to value. Thus, no one must be able to witness the messed-up pinned state, which is why mov() requires P: DerefMove.

Thus, no code currently observing Pin invariants in unsafe code will notice anything untoward going on. No destructive moves happen, and no moved-from state is able to hang around.

I’m pretty confident this argument is correct, but I’d appreciate some confirmation. In particular, someone involved in the UCG WG or the Async WG will have to point out if there are any holes.

The Upshot

In the end, we don’t just have a move constructors story, but a story for all kinds of construction, C++-style. Not only that, but we have almost natural syntax:

emplace! {
  let x = Foo::new();
  let y = ctor::mov(x);
  let z = ctor::copy(y);
}

// The make_box() example above can be added to `Box` through
// an extension trait.
let foo = Box::emplace(Foo::new());Rust

As far as I can tell, having some kind of “magic” around stack emplacement is unavoidable; this is a place where the language is unlikely to give us enough flexibility any time soon, though this concept of constructors is the first step towards such a thing.

We can call into C++ from Rust without any heap allocations at all (though maybe wasting an instruction or two shunting pointers across registers for our not-RVO):

/// Generated Rust type for bridging to C++, like you might get from `cxx`.
struct Foo { ... }
impl Foo {
  pub fn new(x: i32) -> impl Ctor { ... }
  pub fn set_x(self: Pin<&mut Self>, x: i32) { ... }
}

fn make_foo(out: Slot<Foo>) -> Pin<StackBox<Foo>> {
  let mut foo = out.emplace(Foo::new(42));
  foo.as_mut().set_x(5);
  foo
}Rust

For when dealing with slots explicitly is too much work, types can just be ctor::moved into a Box with Box::emplace.

I’ve implemented everything discussed in this post in a crate, moveit. Contributions and corrections are welcome.

A thanks to Manish Goregaokar, Alyssa Haroldson, and Adrian Taylor for feedback on early versions of this design.

Future Work

This is only the beginning: much work needs to be done in type design to have a good story for bridging move-only types from C++ to Rust, preferably automatically. Ctors are merely the theoretical foundation for building a more ergonomic FFI; usage patterns will likely determine where to go from here.

Open questions such as “how to containers” remain. Much like C++03’s std::auto_ptr, we have no hope of putting a StackBox<T> into a Vec<T>, and we’ll need to design a Vec variant that knows to call move constructors when resizing and copy constructors when cloning. There’s also no support for custom move/copy assignment beyond the trivial new (this) auto(that) pattern, and it’s unclear whether that’s useful. Do we want to port a constructor-friendly HashMap (Rust’s swisstable implementation)? Do we want to come up with macros that make dealing with Slot out-params less cumbersome?

Personally, I’m excited. This feels like a real breakthrough in one of the biggest questions for true Rust/C++ interop, and I’d like to see what people wind up building on top of it.

This isn’t exactly a universal opinion glances at Swift but it is if you write kernel code like me. ↩
You can’t just have rustc consume a .h and spit out bindings, like e.g. Go can, but it’s better than the disaster that is JNI. ↩
Some WG21 folks have tried to introduce a weaker type-trait, std::is_trivially_relocatable, which is a weakening of trivally moveable that permits a Rust-style destructive move. The libc++ implementation of most STL types, like std::unique_ptr, admit this trait. ↩
A lot of unsafe Rust code assumes this is the only kind of move. For example, mem::swap() is implemented using memcpy. This is unlike the situation in C++, where types will often provide custom std::swap() implementations that preserve type invariants. ↩
Because Future objects collapse their stack state into themselves when yielding, they may have pointers into themselves (as a stack typically does). Thus, Futures need to be guaranteed to never move once they begin executing, since Rust has no move constructors and no way to fix up the self-pointers. ↩
Among other things, this means that std::unique_ptrs are passed on the stack, not in a register, which is very wasteful! Rust’s Box does not have this issue. ↩
Rust has attempted to add something like DerefMove many times. What’s described in this post is nowhere near as powerful as a “real” DerefMove would be, since such a thing would also allow moving into a memory location. ↩

< Prev • Next >