A Gentle Introduction to LLVM IR

The other day, I saw this tweet. In it, Andrew Gallant argues that reaching for LLVM IR, instead of assembly, is a useful tool for someone working on performance. Unfortunately, learning material on LLVM is usually aimed at compiler engineers, not generalist working programmers.

Now, I’m a compiler engineer, so my answer is of course you should know your optimizer’s IR. But I do think there’s a legitimate reason to be able to read it, in the same way that being able to read assembly to understand what your processor is doing is a powerful tool. I wrote an introduction to assembly over a year ago (still have to finish the followups… 💀), which I recommend reading first.

Learning LLVM IR is similar, but it helps you understand what your compiler is doing to create highly optimized code. LLVM IR is very popular, and as such well-documented and reasonably well-specified, to the point that we can just treat it as a slightly weird programming language.

In this article, I want to dig into what LLVM IR is and how to read it.

What’s LLVM IR?

“LLVM” is an umbrella name for a number of software components that can be used to build compilers. If you write performance-critical code, you’ve probably heard of it.

Its flagship product is Clang, a high-end C/C++/Objective-C compiler. Clang follows the orthodox compiler architecture: a frontend that parses source code into an AST and lowers it into an intermediate representation, an “IR”; an optimizer (or “middle-end”) that transforms IR into better IR, and a backend that converts IR into machine code for a particular platform.

                               optimizer (opt)
                                    ___
                                   |   v
          .c file  -->  AST  -->  LLVM IR  -->  assembly
                    ^         ^             ^
                 parser    lowering    backend (llc)

         \____________________/  \_____________________/
             Clang Frontend                LLVM

LLVM often also refers to just the optimizer and backend parts of Clang; these can be thought of as a compiler for the “LLVM language” or “LLVM assembly”. Clang, and other language frontends like Rust’s, essentially compile to LLVM IR, which LLVM then compiles to machine code.

LLVM IR is well documented and… somewhat stable, which makes it a very good compilation target, since language implementers can re-use the thousands of engineer hours poured into LLVM already. The source of truth for “what is LLVM IR?” is the LangRef.

LLVM IR also has a binary format (sometimes called “bitcode”), although we will be working exclusively with its text format (which uses the .ll extension).

LLVM-targeting compilers will have debugging flags to make them emit IR instead of their final output. For Clang, this is e.g. clang++ -S -emit-llvm foo.cc, while for Rust this is rustc --emit=llvm-ir foo.rs. Godbolt will also respect these options and correctly display LLVM IR output.

Back to Basic Blocks

LLVM IR can be quite intimidating to read, since it contains much more ancillary information than an assembly dump. Consider this function:

pub fn square(x: i32) -> i32 {
  x * x
}

If you pop this function into Godbolt and lower it to LLVM IR, most of the resulting code is just metadata, but it’s still really intimidating!

Starting from compiler output will have a steep difficulty curve, because we have to face the full complexity of LLVM IR. For Rust, this will likely mean encountering exception-handling, which is how panics are implemented, and function attributes that forward Rust’s guarantees (e.g. non-null pointers) to LLVM.

Instead, we’ll start by introducing the basic syntax of LLVM IR, and then we’ll tackle reading compiler output.

A Trivial Function

The meat of LLVM IR is function definitions, introduced with a define. There is also declare, which has exactly the same purpose as a function without a body in C: it brings an external symbol into scope.

For example, the following function takes no arguments and returns immediately:

define void @do_nothing() {
  ret void
}
LLVM IR

The return type of the function (void) immediately follows the define keyword; the name of the function starts with an @, which introduces us to the concept of sigils: every user-defined symbol starts with a sigil, indicating what kind of symbol it is. @ is used for globals and functions: things you can take the address of (when used as a value, they are always ptr-typed).

The body of a function resembles assembly: a list of labels and instructions. Unlike ordinary assembly, however, there are significant restrictions on the structure of these instructions.

In this case, there is only one instruction: a void-typed return. Unlike most assembly languages, LLVM IR is strongly typed, and requires explicit type annotations almost everywhere.

Here is another trivial function.

define void @do_not_call() {
  unreachable
}
LLVM IR

This function will trigger undefined behavior upon being called: the unreachable instruction represents a codepath that the compiler can assume is never executed; this is unlike e.g. the ud2 instruction in x86, which is guaranteed to issue a fault.

This is an important distinction between LLVM IR and an assembly language: some operations are explicitly left undefined to leave room for potential optimizations. For example, LLVM can reason that, because @do_not_call immediately triggers undefined behavior, all calls to @do_not_call are also unreachable (and propagate unreachability from there).

Purely Scalar Code

Let’s start with basic functions that only operate on integers. Consider the following function, which squares a 32-bit integer:

define i32 @square(i32 %x) {
  %1 = mul i32 %x, %x
  ret i32 %1
}
LLVM IR

Now our function takes arguments and has multiple instructions.

The argument is specified as i32 %x. Names with a % sigil are sort of like local variables, but with some restrictions that make them more optimization-friendly; as we’ll see later, they’re not really “variable” at all. LLVM sometimes calls them registers; in a sense, LLVM IR is assembly for an abstract machine with an infinite number of registers. I’ll be calling %-prefixed names “registers” throughout this article.

i32 is a primitive integer type. All integer types in LLVM are of the form iN, for any N (even non-multiples of eight). There are no signed or unsigned types; instead, instructions that care about signedness will specify which semantic they use.

The first instruction is a mul i32, which multiplies the two i32 operands together and returns a value; we assign this to the new register %1 [1]. The next instruction returns this value.

The other arithmetic operations have the names you expect: add, sub, and, or, xor, shl (shift left). There are two division and remainder instructions, signed (sdiv, srem) and unsigned (udiv, urem). There are two shift-right instructions, again signed (ashr) and unsigned (lshr).
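For example, here are a few of these in action (a sketch, with invented register names):

%sum = add i32 %a, %b
%quot = sdiv i32 %a, %b   ; signed division
%rem = urem i32 %a, %b    ; unsigned remainder
%shift = ashr i32 %a, 2   ; sign-preserving shift right
LLVM IR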

Exercise for the reader: why are /, %, and >> the only operations with signed and unsigned versions?

We can also convert from one integer type to another using trunc, zext, and sext, which truncate, zero-extend, and sign-extend, respectively (sext and zext are another signed/unsigned pair). For example, if we wanted the square function to never overflow, we could write

define i64 @square(i32 %x) {
  %1 = sext i32 %x to i64
  %2 = mul i64 %1, %1
  ret i64 %2
}
LLVM IR

Here, we cast %x to i64 by sign-extension (since we’ve decided we’re squaring signed integers) and then square the result. trunc and zext both have the same syntax as sext.

“I’ll Be Back”

Of course, interesting functions have control flow. Suppose we want a safe division function: division by zero is UB, so we need to handle it explicitly. Perhaps something like this:

fn safe_div(n: u64, d: u64) -> u64 {
  if d == 0 { return u64::MAX; }
  n / d
}
Rust

We could try doing this using select, LLVM’s “ternary” operation.

define i64 @safe_div(i64 %n, i64 %d) {
  %1 = icmp eq i64 %d, 0
  %2 = udiv i64 %n, %d
  %3 = select i1 %1, i64 -1, i64 %2
  ret i64 %3
}
LLVM IR

However, this has a problem: division by zero is UB [2], and select is not short-circuiting: its semantics are closer to those of cmov in x86.

To compile this correctly, we need to use the br instruction, which represents a general branch operation [3]. In C terms, br i1 %cond, label %a, label %b is equivalent to if (cond) goto a; else goto b;.

This is how we might write that:

define i64 @safe_div(i64 %n, i64 %d) {
  %1 = icmp eq i64 %d, 0
  br i1 %1, label %iszero, label %nonzero

iszero:
  ret i64 -1

nonzero:
  %2 = udiv i64 %n, %d
  ret i64 %2
}
LLVM IR

Now our function has labels, which are used by the br instruction as jump targets.

In the first block, we do the d == 0 check, implemented by an icmp eq instruction. This returns an i1 (the type LLVM uses for booleans). We then pass the result into a br instruction, which jumps to the first label if the condition is true (i.e., d == 0), and to the second label otherwise.

The second block is the early-return; it returns the “sentinel” value; the third block is self-explanatory.

Each of these blocks is a “basic block”: a sequence of non-control flow operations, plus an instruction that moves control flow away from the block. These blocks form the control flow graph (CFG) of the function.

There are a few other “block terminator” instructions. The one-argument form of br takes a single label, and is a simple unconditional goto. There’s also switch, which is similar to a C switch:

switch i32 %value, label %default [
  i32 0, label %if_zero
  i32 1, label %if_one
  ; etc
]
LLVM IR

The type of the switch must be an integer type. Although you could represent this operation with a chain of brs, a separate switch instruction makes it easier for LLVM to generate jump tables.

unreachable, which we saw before, is a special terminator that does not trigger control flow per se, but which can terminate a block because reaching it is undefined behavior; it is equivalent to e.g. std::unreachable() in C++.

LLVM Deleted My Code!

The unreachable instruction provides a good example of why LLVM uses a basic block CFG: a naive dead code elimination (DCE) optimization pass can be implemented as follows:

  1. Fill a set with every block that ends in unreachable.
  2. For every block, if its terminator references a block in the unreachable set, delete that label from the terminator. For example, if we have br i1 %c, label %a, label %b, and the unreachable set contains %a, we can replace this with a br label %b.
  3. If every outgoing edge from a block is deleted in (2), replace the terminator with unreachable.
  4. Delete all blocks in the unreachable set.
  5. Repeat from (1) as many times as desired.

Intuitively, unreachables bubble upwards in the CFG, dissolving parts of it as they go. Other passes can generate unreachables to represent UB: the interplay between this and DCE produces the “the compiler deleted my code” outcome of UB.

The actual DCE pass is much more complicated, since function calls make it harder to decide if a block is “pure” and thus transparently deletable.
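To make this concrete, here is a minimal before-and-after sketch of the naive pass above (block names invented):

; Before: %fail ends in unreachable.
  br i1 %c, label %ok, label %fail
fail:
  unreachable

; After: the branch drops the dead label, and
; %fail itself is deleted.
  br label %ok
LLVM IR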

But what if we want to implement something more complicated, like a / b + 1? This expression needs the result of the division as an intermediate value, so we can’t select the result with two return statements as before.

Working around this is not so straightforward: if we try to assign the same register in different blocks, the IR verifier will complain. This brings us to the concept of static single assignment.

Phony! Phony!

LLVM IR is a static single assignment form (SSA) IR. LLVM was actually started at the turn of the century to create a modern SSA optimizer as an academic project. These days, SSA is extremely fashionable for optimizing imperative code.

SSA form means that every register is assigned by at most one instruction per function. Different executions of the same block in the same function may produce different values for particular registers, but we cannot mutate already-assigned registers.

In other words:

  1. Every register is guaranteed to be initialized by a single expression.
  2. Every register depends only on the values of registers assigned before its definition.

This has many useful properties for writing optimizations: for example, within a basic block, every use of a particular register %x always refers to the same value, which makes optimizations like global value numbering and constant-folding much simpler to write, since the state of a register throughout a block doesn’t need to be tracked separately.
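For example, a constant-folding pass can rewrite every use of a register the moment it discovers it is constant, since nothing can reassign it later (a hypothetical fold):

%a = add i32 4, 5    ; Every use of %a, everywhere, is 9.
%b = mul i32 %a, 2   ; So this is always 18, and folds away too.
LLVM IR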

In SSA, we reinterpret mutation as many versions of a single variable. Thus, we might lower x += y as

%x.1 = add i32 %x.0, %y.0
LLVM IR

Here, we’ve used a var.n convention to indicate which version of a variable a specific register represents (LLVM does not enforce any naming conventions).

However, when loops enter the mix, it’s not clear how to manage versions. The number of registers in a function is static, but the number of loop iterations is dynamic.

Concretely, how do we implement this function?

fn pow(x: u32, y: u32) -> u32 {
  let mut r = 1;
  for i in 0..y {
    r *= x;
  }
  r
}
Rust

We could try something like this:

define i32 @pow(i32 %x, i32 %y) {
  br label %loop

loop:
  %r = mul i32 %r, %x  ; ERROR: Recursive definition.
  %i = add i32 %i, 1   ; ERROR: Recursive definition.
  %done = icmp eq i32 %i, %y
  br i1 %done, label %exit, label %loop

exit:
  ret i32 %r
}
LLVM IR

But there’s a problem! What are the original definitions of %r and %i? The IR verifier will complain that these registers depend directly on themselves, which violates SSA form. What’s the “right” way to implement this function?

One option is to ask LLVM! We’ll implement the function poorly, and let the optimizer clean it up for us.

First, let’s write the function using memory operations, like loads and stores, to implement mutation. We can use the alloca instruction to create statically-sized stack slots; these instructions return a ptr.

Clang Makes a Mess, LLVM Cleans It Up

Incidentally, this is how Clang and Rust both generate LLVM IR: stack variables are turned into allocas and manipulated through loads and stores; temporaries are mostly turned into %regs, but the compiler will sometimes emit extra allocas to avoid thinking too hard about needing to create phi instructions.

This is pretty convenient, because it avoids needing to think very hard about SSA form outside of LLVM, and LLVM can trivially eliminate unnecessary allocas. The code below for @pow is very similar to what Rust would send to LLVM (although because we used an iterator, there’s a lot of extra junk Rust emits that LLVM has to work to eliminate).

define i32 @pow(i32 %x, i32 %y) {
  ; Create slots for r and the index, and initialize them.
  ; This is equivalent to something like
  ;   int i = 0, r = 1;
  ; in C.
  %r = alloca i32
  %i = alloca i32
  store i32 1, ptr %r
  store i32 0, ptr %i
  br label %loop_start

loop_start:
  ; Load the index and check if it equals y.
  %i.check = load i32, ptr %i
  %done = icmp eq i32 %i.check, %y
  br i1 %done, label %exit, label %loop

loop:
  ; r *= x
  %r.old = load i32, ptr %r
  %r.new = mul i32 %r.old, %x
  store i32 %r.new, ptr %r

  ; i += 1
  %i.old = load i32, ptr %i
  %i.new = add i32 %i.old, 1
  store i32 %i.new, ptr %i

  br label %loop_start

exit:
  %r.ret = load i32, ptr %r
  ret i32 %r.ret
}
LLVM IR

Next, we can pass this into the LLVM optimizer. The command opt, which is part of the LLVM distribution, runs specific optimizer passes on the IR. In our case, we want opt -p mem2reg, which runs a single “memory to register” pass. We can also just run opt -O2 or similar to get similar [4] optimizations to the ones clang -O2 runs.

This is the result.

; After running through `opt -p mem2reg`
define i32 @pow(i32 %x, i32 %y) {
start:
  br label %loop_start

loop_start:
  %i.0 = phi i32 [0, %start], [%i.new, %loop]
  %r.0 = phi i32 [1, %start], [%r.new, %loop]
  %done = icmp eq i32 %i.0, %y
  br i1 %done, label %exit, label %loop

loop:
  %r.new = mul i32 %r.0, %x
  %i.new = add i32 %i.0, 1
  br label %loop_start

exit:
  ret i32 %r.0
}
LLVM IR

The allocas are gone, but now we’re faced with a new instruction: phi. “φ node” is jargon from the original SSA paper; the name is said to be a pun on “phoney”. These instructions select a value from a list, based on which basic block we jumped into the current block from.

For example, phi i32 [0, %start], [%i.new, %loop] says “this value should be 0 if we came from the %start block, and %i.new if we came from %loop”.

Unlike all other instructions, phi can refer to values that are not defined in all blocks that dominate the current block. This lets us have a dynamic number of versions of a variable! Here’s what that looks like in a dynamic execution context.

A block %a is said to dominate a block %b if each of %b’s predecessors is either %a or a block dominated by %a. In other words, every path from the first block to %b passes through %a. In general, instructions can only refer to values defined by previous instructions in the current block, or in blocks that dominate it.
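For example, in this invented fragment, the entry block %a dominates both %b and %c, but %b does not dominate %c, because control can reach %c directly from %a:

define i32 @dom(i32 %n, i1 %cond) {
a:
  br i1 %cond, label %b, label %c

b:
  %x = add i32 %n, 1    ; OK: %a dominates %b, so %n is usable here.
  br label %c

c:
  ; An ordinary use of %x here would be invalid, but a phi may
  ; refer to it in the arm for the %b predecessor.
  %y = phi i32 [%n, %a], [%x, %b]
  ret i32 %y
}
LLVM IR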

  1. %start directly jumps into %loop_start. The first block of a function cannot be a jump target, because its predecessors implicitly include the function’s callsites, so it cannot have phi nodes.

  2. In %loop_start, since we’ve entered from %start, %i.0 and %r.0 are selected to be the first versions of the (platonic) i and r variables, i.e., their initial values; we jump to %loop.

  3. Then, %loop is dominated by %loop_start so we can use %i.0 and %r.0 there directly; these are the *= and += operations. Then we jump back to %loop_start.

  4. Back in %loop_start, the phis now select %i.new and %r.new, so now %i.0 and %r.0 are the second versions of i and r. By induction, the nth execution of %loop_start has the nth versions of i and r.

  5. When we finally get sent to %exit, we can use %r.0 (since %loop_start dominates %exit), which will be the %yth version of r; this is our return value.

This is a good place to stop and think about what we’ve done so far. SSA, domination, and phis can be hard to wrap your head around, and are not absolutely necessary for reading most IR. However, it is absolutely worth trying to understand, because it captures essential facts about how compilers like to reason about code [5].

With phi and br, we can build arbitrarily complicated control flow within a function [6].

Types and Aggregates

Now that we have basic scalar functions, let’s review LLVM’s type system.

We’ve seen i32 and its friends; these are arbitrary-bit-width integers. i1 is special because it is used as the boolean type. LLVM optimizations have been known to generate integer types with non-power-of-two sizes.

LLVM also has float and double, and some exotic float types like bfloat; these use their own arithmetic instructions with different options. I’ll pass on them in this explainer; see fadd and friends in the LangRef for more.

We’ve also seen void, which is only used as a return value, and ptr, which is an untyped [7] pointer.

We’ve also seen the label pseudo-type, which represents a block label. It does not appear directly at runtime and has limited uses; the token and metadata types are similar.

Arrays are spelled [n x T]; the number must be an integer and the type must have a definite size. E.g., [1024 x i8]. Zero-sized arrays are supported.

Structs are spelled {T1, T2, ...}. E.g., {i64, ptr} is a Rust slice. Struct fields do not have names; they are indexed instead. The form <{...}> is a packed struct, which removes inter-field padding. E.g., #[repr(packed)] compiles down to this.

Vectors are like arrays but spelled <n x T>. These are used to represent types used in SIMD operations. For example, adding two <4 x i32>s would lower to a single vector add instruction on x86. I will not touch on SIMD stuff beyond this, although at higher optimization levels LLVM will merge scalar operations into vector operations, so you may come across them.
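As a quick summary, here are some aggregate types side by side (my own illustrative examples):

[1024 x i8]             ; an array of 1024 bytes
{i32, {[5 x i8], i64}}  ; a struct containing a nested struct
<{i16, i32}>            ; a packed struct: no padding between fields
<4 x float>             ; a SIMD vector of four floats
LLVM IR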

Type aliases can be created at file scope with the syntax

%Slice = type {i64, ptr}
LLVM IR

This means that %T can be either a type or a register/label inside of a function, depending on syntactic position.

Operations on Aggregates

The insertvalue and extractvalue instructions can be used with struct or array types to statically access a field. For example,

%MyStruct = type {i32, {[5 x i8], i64}}

; In Rust-like syntax, this is `let v = s.1.0[4];`
%v = extractvalue %MyStruct %s, 1, 0, 4
LLVM IR

insertvalue is the reverse: it produces a copy of the aggregate with a specific field changed. It does not mutate in-place, because SSA forbids that.

; In Rust-like syntax, this is
;   let s2 = { let mut s = s; s.1.1 = 42; s };
%s2 = insertvalue %MyStruct %s, i64 42, 1, 1
LLVM IR

There are similar operations, insertelement and extractelement, that work on vectors, but they have slightly different syntax and semantics.

Finally, there’s getelementptr, the “pointer arithmetic instruction”, often abbreviated to GEP. A GEP can be used to calculate an offset pointer into a struct. For example,

define ptr @get_inner_in_array(ptr %p, i64 %idx) {
  %q = getelementptr %MyStruct, ptr %p, i64 %idx, i32 1, i32 1
  ret ptr %q
}
LLVM IR

This function takes in a pointer, ostensibly pointing to an array of %MyStructs, and an index. This returns a pointer to the i64 field of the %idxth element of %p.

A few important differences between GEP and extractvalue:

  1. It takes an untyped pointer instead of a value of a particular struct/array type.
  2. There is an extra parameter that specifies an index; from the perspective of GEP, every pointer is a pointer to an array of unspecified bound. When operating on a pointer that does not (at runtime) point to an array, an index operand of 0 is still required, as the sketch after this list shows. (Alternatively, you can view a pointer to T as being a pointer to a one-element array.)
  3. The index parameters need explicit types.
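Here is a minimal sketch of point (2): even when %p points to a single struct rather than an array, the leading index is required.

%Pair = type {i32, i32}

; Computes a pointer to the second field; the leading i64 0
; selects "element zero", i.e. the struct %p itself points to.
%q = getelementptr %Pair, ptr %p, i64 0, i32 1
LLVM IR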

LLVM provides a helpful [8] FAQ on the GEP instruction: https://llvm.org/docs/GetElementPtr.html.

Other Operations

Some other operations are very relevant for reading IR, but don’t fit into any specific category. As always, the LangRef provides a full description of what all of these instructions do.

Function Calls

call, which calls any ptr as a function. For example:

; Arguments are passed in parentheses.
%r = call i32 @my_func(i32 %x)
LLVM IR

Note that this could have been a %reg instead of a @global, which indicates a function pointer call.
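For example, a call through a function pointer might look like this (a sketch; the slot name is invented):

%f = load ptr, ptr %callback_slot
%r = call i32 %f(i32 %x)
LLVM IR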

Sometimes you will see invoke, which is used to implement “call a function inside of a C++ try {} block”. This is rare in Rust, but can occur in some C++ code.

Function calls are often noisy areas of IR, because they will be very heavily annotated.

Synchronization

The load and store instructions we’ve already seen can be annotated as atomic, which is used to implement e.g. AtomicU32::load in Rust; this requires that an atomic ordering be specified, too. E.g.,

%v = load atomic i32, ptr %p acquire, align 4
LLVM IR

The fence operation is a general memory fence operation corresponding to e.g. Rust’s std::sync::atomic::fence function.

cmpxchg provides the CAS (compare-and-swap) primitive. It returns a {T, i1} containing the old value and whether the CAS succeeded. cmpxchg weak implements the spuriously-failing “weak CAS” primitive.

Finally, atomicrmw performs a read-modify-write (e.g., *p = op(*p, val)) atomically. This is used to implement things like AtomicU32::fetch_add and friends.
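As a sketch, here is what those two might look like, with arbitrarily-chosen orderings:

; CAS: if *%p == %old, store %new; yields the old value and a
; success flag.
%pair = cmpxchg ptr %p, i32 %old, i32 %new acq_rel monotonic
%was = extractvalue { i32, i1 } %pair, 0
%ok = extractvalue { i32, i1 } %pair, 1

; Atomically perform *%p += 1, returning the old value.
%prev = atomicrmw add ptr %p, i32 1 seq_cst
LLVM IR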

All of these operations, except for fence, can also be marked as volatile. In LLVM IR, much like in Rust but unlike in C/C++, individual loads and stores are volatile (i.e., have compiler-invisible side-effects). volatile can be combined with atomic operations (e.g. load atomic volatile), although most languages don’t provide access to these (except older C++ versions).

Reinterpret Shenanigans

bitcast is what mem::transmute and reinterpret_cast in Rust and C++, respectively, ultimately compile into. It can convert any non-aggregate type (integers, vectors) to any other type of the same bit width. For example, it can be used to get at the bits of a floating-point value:

%bits = bitcast double %fp to i64
LLVM IR

bitcast also used to be how pointer types were cast (e.g. i32* to i8*); now that pointers are all untyped (ptr), this use is no longer present.

However, bitcast cannot cast between pointer and integer data. For this we must use the inttoptr and ptrtoint [9] instructions. These have the same syntax, but interact with the sketchy semantics of pointer-to-integer conversion and pointer provenance. This part of LLVM’s semantics is a bit of an ongoing trashfire; see Ralf Jung’s post for an introduction to this problem.
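Syntactically, the two mirror the other casts (a sketch):

%addr = ptrtoint ptr %p to i64
%q = inttoptr i64 %addr to ptr
LLVM IR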

Intrinsics

There is also a vast collection of LLVM intrinsics, which are specified in the LangRef. For example, if we need a particular built-in memcpy, we can bring it into scope with a declare:

; ptr %dst, ptr %src, i64 %len, i1 %volatile
declare void @llvm.memcpy.p0.p0.i64(ptr, ptr, i64, i1)
LLVM IR

All of the LLVM intrinsics are functions that start with llvm.; diving into all of them is far beyond what we can do here.

I’m also leaving out discussion of floating point, SIMD, and exception handling, each of which would require their own articles!

Undefined Behavior

LLVM exists to generate optimized code, and optimizations require that we declare certain machine states “impossible”, so that the compiler can detect when it is able to simplify what the programmer wrote. This is “undefined behavior”.

For example, we’ve already encountered unreachable, which LLVM assumes cannot be executed. Division by zero and accessing memory out of bounds are also undefined.

Most LLVM UB factors through the concept of “poison values”. A poison value can be thought of as taking on every value at once: whichever is convenient for the current optimization pass, with no respect to what any other pass chose. This also means that if optimizations don’t detect a use of poison, it is ok from LLVM’s perspective to give you a garbage value; this is most visible at -O0, which performs minimal optimization.

Using a poison value as a pointer in a load, store, or call must be UB, because LLVM can choose it to be a null pointer. It also can’t be the denominator of a udiv or similar, because LLVM can choose it to be zero, which is UB. Passing poison into a br or a switch is also defined to be UB.

LLVM can perform dataflow analysis to determine which operations a poison value used in a UB way came from, and thus assume those operations cannot produce poison. Because all operations (other than select and phi) with a poison input produce poison, this backwards reasoning allows LLVM to propagate UB forward; this is where so-called “time traveling UB” comes from.

Many operations generate poison. For example, in C, signed overflow is UB, so addition lowers to an add nsw (nsw stands for no signed wrap). Instead of wrapping on overflow, the instruction produces poison. There is also an unsigned version of the annotation, nuw.
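For example (a sketch):

%a = add i32 %x, %y       ; Wraps around on overflow.
%b = add nsw i32 %x, %y   ; Poison on signed overflow.
%c = add nuw i32 %x, %y   ; Poison on unsigned overflow.
LLVM IR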

Many other operations have “less defined” versions, which are either generated by optimizations, or inserted directly by the compiler that invokes LLVM when the language rules allow it (see C above). More examples include:

  • udiv and friends have an exact annotation, which requires that the division have a zero remainder, else poison.
  • getelementptr has an inbounds annotation, which produces poison if the access is actually out of bounds. This changes it from a pure arithmetic operation to one more closely matching C’s pointer arithmetic restrictions. GEP without inbounds corresponds to Rust’s <*mut T>::wrapping_offset() function.
  • Floating point operations marked with nnan and ninf will produce poison instead of a NaN or an infinite value, respectively (or when a NaN or infinity is an argument).

Creating poison is not UB; only using it is. This is weaker than the way UB works in most languages; in C, overflow is instantly UB, but in LLVM, overflow that is never “witnessed” is simply ignored. This makes for a simpler operational semantics when reasoning about the validity of optimizations: otherwise, UB must be viewed as a side-effect, because the compiler will generate code that puts the program into a broken state (for example, division by zero will cause a fault on many architectures), which means UB-causing operations cannot always be reordered soundly. Replacing “causes UB” with “produces poison” ensures the vast majority of operations are pure and freely reorderable.

Reading Some Codegen

Let’s go back to our original Rust example!

pub fn square(x: i32) -> i32 {
  x * x
}

This is the output, with metadata redacted and some things moved around for readability.

source_filename = "example.b6eb2c7a6b40b4d2-cgu.0"
target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; example::square
define i32 @_ZN7example6square17hb32bcde4463f37c3E(i32 %x) unnamed_addr #0 {
start:
  %0 = call { i32, i1 } @llvm.smul.with.overflow.i32(i32 %x, i32 %x)
  %_2.0 = extractvalue { i32, i1 } %0, 0
  %_2.1 = extractvalue { i32, i1 } %0, 1
  %1 = call i1 @llvm.expect.i1(i1 %_2.1, i1 false)
  br i1 %1, label %panic, label %bb1

bb1:
  ret i32 %_2.0

panic:
  ; core::panicking::panic
  call void @_ZN4core9panicking5panic17ha338a74a5d65bf6fE(
    ptr align 1 @str.0,
    i64 33,
    ptr align 8 @alloc_1368addac7d22933d93af2809439e507
  )
  unreachable
}

declare { i32, i1 } @llvm.smul.with.overflow.i32(i32, i32) #1
declare i1 @llvm.expect.i1(i1, i1) #2

; core::panicking::panic
declare void @_ZN4core9panicking5panic17ha338a74a5d65bf6fE(ptr align 1, i64, ptr align 8) unnamed_addr #3

@alloc_9be5c135c0f7c91e35e471f025924b11 = private unnamed_addr constant
  <{ [15 x i8] }>
  <{ [15 x i8] c"/app/example.rs" }>, align 1

@alloc_1368addac7d22933d93af2809439e507 = private unnamed_addr constant
  <{ ptr, [16 x i8] }> <{
    ptr @alloc_9be5c135c0f7c91e35e471f025924b11,
    [16 x i8] c"\0F\00\00\00\00\00\00\00\02\00\00\00\03\00\00\00"
  }>, align 8

@str.0 = internal constant [33 x i8] c"attempt to multiply with overflow"

attributes #0 = { nonlazybind uwtable }
attributes #1 = { nocallback nofree nosync nounwind speculatable willreturn memory(none) }
attributes #2 = { nocallback nofree nosync nounwind willreturn memory(none) }
attributes #3 = { cold noinline noreturn nonlazybind uwtable }
LLVM IR

The main function is @_ZN7example6square17hb32bcde4463f37c3E, which is the mangled name of example::square. Because this code was compiled in debug mode, overflow panics, so we need to generate code for that. The first operation is a call to the LLVM intrinsic for “multiply and tell us if it overflowed”. This returns the equivalent of a (i32, bool); we extract both values out of it with extractvalue. We then pass the bool through @llvm.expect, which is used to tell the optimizer to treat the panicking branch as “cold”. The success branch goes to a return, which returns the product; otherwise, we jump to a block that calls core::panicking::panic() to panic the current thread. That function never returns, so we can terminate the block with an unreachable.

The rest of the file consists of:

  • declares for the llvm intrinsics we used.
  • A declare for core::panicking::panic. Any external function we call needs to be declared. This also gives us a place to hang attributes for the function off of.
  • Global constants for a core::panic::Location and a panic message.
  • Attributes for the functions above.

This is a good place to mention attributes: LLVM has all kinds of attributes that can be placed on functions (and function calls) to record optimization-relevant information. For example, @llvm.expect.i1 is annotated as willreturn, which means this function will eventually return; this means that, for example, any UB that comes after the function is guaranteed to occur after finite time, so LLVM can conclude that the code is unreachable despite the call to @llvm.expect.i1. The full set of attributes is vast, but the LangRef documents all of them!

Conclusion

LLVM IR is huge, bigger than any individual ISA, because it is intended to capture every interesting operation. It also has a rich annotation language, so passes can record information for future passes to make use of. Its operational semantics attempt to leave enough space for optimizations to occur, while ensuring that multiple sound optimizations in sequence are not unsound (this last part is a work in progress).

Being able to read assembly reveals what will happen, exactly, when code is executed, but reading IR, before and after optimization, shows how the compiler is thinking about your code. Using opt to run individual optimization passes can also help further this understanding (in fact, “bisecting on passes” is a powerful debugging technique in compiler development).

I got into compilers by reading LLVM IR. Hopefully this article inspires you to learn more, too!

  1. Registers within a function may have a numeric name. They must be defined in order: you must define %0 (either as a register or a label), then %1, then %2, etc. These are often used to represent “temporary results”.

    If a function does not specify names for its parameters, they will be given the names %0, %1, etc implicitly, which affects what the first explicit numeric register name you can use is. Similarly, if the function does not start with a label, it will implicitly be given the next numeric name.

    This can result in significant confusion, because if we have define void @foo(i32, i32) { ... }, the arguments will be %0 and %1, but if we tried to write %2 = add i32 %0, %1, we would get an extremely confusing parser error, because %2 is already taken as the name of the first block. 

  2. For some reason, the optimizer can’t figure out that the select is redundant? Alive2 (an SMT-solver correctness checker for optimizations) seems to agree this is a valid optimization.

    So I’ve filed a bug. :D 

  3. If you read my assembly article, you’ll recall that there are many branch instructions. On RV, we have beq, bne, bgt, and bge. Later on in the compilation process, after the optimizer runs, LLVM will perform instruction selection (isel) to choose the best machine instruction(s) to implement a particular LLVM instruction (or sequence), which is highly context-dependent: for example, we want to fuse an icmp eq followed by a br on the result into a beq.

    Isel is far outside my wheelhouse, and doing it efficiently and profitably is an active area of academic research. 

  4. Not exactly the same: language frontends like Clang and Rust will perform their own optimizations. For example, I have an open bug for LLVM being unable to convert && into & in some cases; this was never noticed, because Clang performs this optimization while lowering from C/C++ to LLVM, but Rust does not do the equivalent optimization. 

  5. A more intuitive model is used in more modern IRs, like MLIR. In MLIR, you cannot use variables defined in other blocks; instead, each block takes a set of arguments, just like a function call. This is equivalent to phi instructions, except that now instead of selecting which value we want in the target, each predecessor specifies what it wants to send to the target.

    If we instead treat each block as having “arguments”, we can rewrite it in the following fantasy syntax where register names are scoped to their block.

    ;; Not actual LLVM IR! ;;
    
    define i32 @pow(i32 %x, i32 %y) {
      br %loop_start(i32 0, i32 1)
    
    loop_start(i32 %i, i32 %r)
      %done = icmp eq i32 %i, %y
      br i1 %done, %exit(i32 %r), %loop(i32 %i, i32 %r)
    
    loop(i32 %i, i32 %r)
      %r.new = mul i32 %r, %x
      %i.new = add i32 %i, 1
      br %loop_start(i32 %i.new, i32 %r.new)
    
    exit(i32 %r)
      ret i32 %r
    }
    LLVM IR

  6. What does the CFG look like? LLVM contains “optimization” passes that print the CFG as a .dot file, which can be rendered with the dot command. For @safe_div, we get something like the following.

    This is useful for understanding complex functions. Consider this Rust hex-parsing function.

    // hex.rs
    #[no_mangle]
    fn parse_hex(data: &str) -> Option<u64> {
      let mut val = 0u64;
      for c in data.bytes() {
        let digit = match c {
          b'0'..=b'9' => c,
          b'a'..=b'f' => c - b'a' + 10,
          b'A'..=b'F' => c - b'A' + 10,
          _ => return None,
        };
    
        val = val.checked_mul(16)?;
        val = val.checked_add(digit as u64)?;
      }
      Some(val)
    }
    Rust

    Then, we can generate our CFG with some shell commands.

    $ rustc -O --crate-type lib --emit llvm-ir hex.rs
    $ opt -p dot-cfg -o /dev/null hex.ll
    Writing '.parse_hex.dot'...
    $ dot -Tsvg .parse_hex.dot -o parse_hex.svg
    Shell

    The result is this mess.

    Without optimizations, we get a bigger mess (most optimization passes are various CFG cleanups).

    Exercise: try to trace through what each basic block is doing. You will want to open the SVGs in a separate tab to do that. I recommend following the optimized version, since it is much less noisy.

    Comparing optimized vs. unoptimized is a good way to see how much the compiler does to simplify the stuff the language frontend gives it. At -O0? All allocas. At -O2? No allocas! 

  7. Once upon a time we had typed pointers, like i32*. These turned out to generate more problems than they solved, requiring frequent casts in IR in exchange for mediocre type safety. See https://llvm.org/docs/OpaquePointers.html for a more complete history. 

  8. Sarcasm. 

  9. I quietly judge LLVM for having instructions named inttoptr when int2ptr just reads so much nicer. 

Single Abstract Method Traits

Rust and C++ both have very similar operational semantics for their “anonymous function” expressions (they call them “closures” and “lambdas” respectively; I will use these interchangeably). Here’s what those expressions look like.

auto square = [](int x) { return x * x; };
C++
let square = |x: i32| x * x;
Rust

The type of square in both versions is an anonymous type that holds the captures for that closure. In C++, this type provides an operator() member that can be used to call it, whereas in Rust, it implements FnOnce (and possibly FnMut and Fn, depending on the captures), which represent a “callable” object.

For the purposes of this article, I am going to regard “function item values” as being identical to closures that explicitly specify their inputs and outputs for all intents and purposes. This is not completely accurate, because when I write let x = drop;, the resulting object is generic, but whenever I say “a closure” in Rust, I am also including these closure-like types too.

There is one thing C++ closures can express which Rust closures can’t: you can’t create a “generic” closure in Rust. In particular, in C++ we can write this code.

template <typename Fn>
size_t CallMany(Fn fn) {
  return fn(std::vector{5}) + fn(std::string("foo"));
}

CallMany([](auto& val) { return val.size(); });
C++

The auto keyword in a closure in C++ does not work like in Rust. In Rust, if we try to write “equivalent” code, let x = |val| val.len();, on its own, we get this error:

error[E0282]: type annotations needed
 --> <source>:4:12
  |
4 |   let x = |val| val.len();
  |            ^^^  --- type must be known at this point
  |
help: consider giving this closure parameter an explicit type
  |
4 |   let x = |val: /* Type */| val.len();
  |               ++++++++++++
Rust

This is because in Rust, a closure argument without a type annotation means “please deduce what this should be”, so it participates in Rust’s type inference, whereas in C++ an auto argument means “make this a template parameter of operator()”.

How would we implement CallMany in Rust, anyways? We could try, but we quickly hit a problem:

trait Len {
  fn len(&self) -> usize;
}

fn call_many(f: impl Fn(???) -> usize) -> usize {
  f(vec![5]) + f("foo")
}
Rust

What should we put in place of ???? It can’t be a type parameter of call_many, since that has a concrete value in the body of the function. We want to say that Fn can accept any argument that implements Len. There isn’t even syntax to describe this, but you could imagine adding a version of for<...> that works on types, and write something like this.

trait Len {
  fn len(&self) -> usize;
}

fn call_many(f: impl for<T: Len> Fn(&T) -> usize) -> usize {
  f(vec![5]) + f("foo")
}
Rust

The imaginary syntax for<T: Len> Fn(&T) -> usize means “implements Fn for all types T that implement Len”. This is a pretty intense thing to ask rustc to prove. It is not unachievable, but it would be hard to implement.

For the purposes of this article, I am going to consider for<T> a plausible, if unlikely, language feature. I will neither assume it will ever happen, nor that we should give up on ever having it. This “middle of uncertainty” is important to ensure that we do not make adding this feature impossible in the discussion that follows.

A Workaround

Let’s examine the Fn trait, greatly simplified.

pub trait Fn<Args> {
  type Output;
  fn call(&self, args: Args) -> Self::Output;
}
Rust

Fn::call is analogous to operator() in C++. When we say that we want a “generic closure”, we mean that we want to instead have a trait that looks a bit more like this:

pub trait Fn {
  type Output<Args>;
  fn call<Args>(&self, args: Args) -> Self::Output<Args>;
}
Rust

Notice how Args has moved from being a trait parameter to being a function parameter, and Output now depends on it. This is a slightly different formulation from what we described above, because we are no longer demanding an infinitude of trait implementations, but now the implementation of one trait with a generic method.

For our specific example, we want something like this.

trait Len {
  fn len(&self) -> usize;
}

trait Callback {
  fn run(&self, val: impl Len) -> usize;
}

fn call_many(f: impl Callback) -> usize {
  f.run(vec![5]) + f.run("foo")
}
Rust

This compiles and expresses what we want precisely: we want to call f on arbitrary impl Len types.

But how do we call call_many? That starts to get pretty ugly.

struct CbImpl;
impl Callback for CbImpl {
  fn run(&self, val: impl Len) -> usize {
    val.len()
  }
}

call_many(CbImpl);
Rust

This has the potential to get really, really ugly. I used this pattern for a non-allocating visitor I wrote recently, and it wasn’t pretty. I had to write a macro to cut down on the boilerplate.

macro_rules! resume_by {
  ($parser:expr, $cb:expr) => {{
    struct Cb<'a, 's> {
      parser: &'a Parser<'s>,
      start: Option<u32>,
    }

    impl<'s> Resume<'s> for Cb<'_, 's> {
      fn resume(
        &mut self,
        visitor: &mut impl Visitor<'s>,
      ) -> Result<(), Error> {
        self.parser.do_with_rewind(
          &mut self.start,
          || ($cb)(self.parser, &mut *visitor),
        )
      }
    }

    Cb { parser: $parser, start: None }
  }};
}
Rust

This macro is, unsurprisingly, quite janky. It also can’t really do captures, because the $cb argument that contains the actual code is buried inside of a nested impl.

You might think “well Miguel, why don’t you hoist $cb into the Cb struct?” The problem is now that I need to write impl<'s, F: FnMut(&Parser<'s>, ???)> so that I can actually call the callback in the body of Resume::resume, but that brings us back to our trait bound problem from the start!

This is a general problem with this type of solution: there is no macro you can write that will capture an arbitrary closure to implement a trait by calling that closure, if the method being implemented is generic, because if you could, I wouldn’t have to bother with the macro.

Let’s Talk About Java

Java gets a bad rap but the core language does have some interesting features in it. A very handy one is an anonymous class.

Let’s suppose I want to pass a callback into something. In Java 6, which I grew up on, you did it like this:

public interface Callback {
  int run(int arg);
}

public int runMyThing(Callback cb) {
  return cb.run(42);
}

runMyThing(new Callback() {
  public int run(int arg) { return arg * arg; }
});
Java

The new Interface() {...} syntax mints a new class on the spot that implements Interface. You provide a standard class body between the braces, after the name of the type. You can also do this with a class type too.

Now, this is a bit tedious: I need to re-type the signature of the one method. This is fine if I need to implement a bunch of methods, but it’s a little annoying in the one-method case.

In Java 8 we got lambdas (syntax: x -> expr). Java made the interesting choice of not adding a Function type to be “the type of lambdas”. For a long time I thought this was a weird cop-out but I have since come to regard it as a masterclass in language design.

Instead, Java’s lambdas are a sort of syntax sugar over this anonymous class syntax [1]. You assign a lambda to an interface type with a single abstract method, and the body of the lambda is used to implement that one method.

Interfaces compatible with lambdas are called single abstract method (SAM) interfaces.

So, without needing to touch the existing library, I can turn the new syntax into this:

runMyThing(x -> x * x);
Java

chef’s kiss

Mind, Java does provide a mess of “standard function interfaces” in the java.util.function package, and quite a bit of the standard library uses them, but they don’t need to express the totality of functions you might want to capture as objects.

These “SAM closures” give closures a powerful “BYO interface” aspect. Lambdas in Java are not “function objects”; they are extremely lightweight anonymous classes implementing the pertinent interface.

I think this can let us cut the Gordian knot of generic closures in Rust.

SAM in Rust

In what remains I will propose how we can extend the traits that closures implement to be any SAM trait, in addition to the traits they implement ipso facto.

What’s a SAM trait in Rust? It’s any trait T with precisely ONE method that does not have a default implementation, which must satisfy the following constraints:

  1. It must have a self parameter with type Self, &Self, or &mut Self.
  2. It does not mention Self in any part of its argument types, its return type, or its where clauses, except for the aforementioned self parameter.
  3. Has no associated consts and no GATs.
  4. All of its supertraits are Copy, Send, or Sync.

These restrictions are chosen so that we have a shot at actually implementing the entire trait.
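For instance, under these rules (both traits here are my own illustrative examples):

// SAM: exactly one non-defaulted method, an `&self` receiver,
// and no other mention of `Self`.
trait Visit {
  fn visit(&self, depth: usize) -> bool;
  fn name(&self) -> &'static str { "anon" }  // Defaulted, so fine.
}

// Not SAM: `Self` appears in the return type.
trait MakeAnother {
  fn make(&self) -> Self;
}
Rust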

In addition to the Fn traits, ordinary closures automatically implement Clone, Copy, Send, and Sync as appropriate.

None of these traits are SAM, so we can safely allow them to be automatically derived for SAM closures too, under the same rules as for ordinary closures.

To request a SAM closure, I will use the tentative syntax of impl Trait |args| expr. This syntax is unambiguously an expression rather than an impl item, because a | cannot appear in a path-in-type, and impl $path must be followed by {, for, or where. The precise syntax is unimportant.

Applied to the call_many example above, we get this.

fn call_many(f: impl Callback) -> usize {
  f.run(vec![5]) + f.run("foo")
}

call_many(impl Callback |x| x.len());
Rust

The compiler rewrites this into something like this.

fn call_many(f: impl Callback) -> usize {
  f.run(vec![5]) + f.run("foo")
}

struct CallbackImpl;
impl Callback for CallbackImpl {
  fn run(&self, x: impl Len) -> usize {
    x.len()
  }
}

call_many(CallbackImpl);
Rust

This rewrite can happen relatively early, before we need to infer a type for x. We also need to verify that this closure’s captures are compatible with an &self receiver. The same rules that determine when a closure implements Fn, FnMut, and FnOnce would decide which of the three receiver types the closure is compatible with.

Note that SAM closures WOULD NOT implement any Fn traits.

More Complicated Examples

We are required to name the trait we want but its type parameters can be left up in the air. For example:

pub trait Tr<T> {
  type Out;
  fn run(&self, x: T, y: impl Display) -> Option<Self::Out>;
}

// We can infer `T = i32` and `Tr::Out = String`.
let tr = impl Tr<_> |x: i32, y| Some(format!("{x}, {y}"));
Rust

In general, unspecified parameters and associated types result in inference variables, which are resolved in the same way as the parameters of the Fn closures are.

In fact, we can emulate ordinary closures using SAM closures.

let k = 100;
let x = Some(42);
let y = x.map(impl FnOnce(_) -> _ move |x| x * k);
Rust

Note that because Fn and FnMut have non-trivial supertraits, we can’t make them out of SAM closures.

One application is to completely obsolete std::iter::from_fn.

fn fibonacci() -> impl Iterator<Item = u64> + Copy {
  let mut state = [1, 1];
  impl Iterator move || {
    let ret = state[0];

    state[0] = state[1];
    state[1] += ret;

    Some(ret)
  }
}
Rust

Or, if you need a quick helper implementation of Debug

impl fmt::Debug for Thing {
  fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
    f.debug_list()
      .entry(&impl Debug |f| {
        write!(f, "something something {}", self.next_thingy())
      })
      .finish()
  }
  }
}
Rust

There are probably additional restrictions we will want to place on the SAM trait, but it’s not immediately clear what the breadth of those are. For example, we probably shouldn’t try to make this work:

trait UniversalFactory {
  fn make<T>() -> T;
}

let f = impl UniversalFactory || {
  // How do I name T so that I can pass it to size_of?
};
Rust

There are definitely clever tricks you can play to make this work, but the benefit seems slim.

Future Work

There are two avenues for how we could extend this concept. The first is straightforward and desirable; the second is probably unimplementable.

Anonymous Trait Impls

Backing up from the Java equivalent of lambdas, it seems not unreasonable to have a full-fledged expression version of impl that can make captures.

Syntactically, I will use impl Trait for { ... }. This is currently unambiguous, although I think that making it so that { cannot start a type is probably a non-starter.

Let’s pick something mildly complicated… like Iterator with an overriden method. Then we might write something like this.

let mut my_list = &foo[...];
let mut my_iterator = impl Iterator for {
  type Item = i32;

  fn next(&mut self) -> Option<i32> {
    let head = *my_list.get(0)?;
    my_list = &my_list[1..];
    Some(head)
  }

  fn count(self) -> usize {
    my_list.len()
  }
};
Rust

The contents of the braces after for are an item list, except that variables from the outside are available, having the semantics of captures; they are, in effect, accesses of self without the self. prefix.

Hammering out precisely how this would interact with the self types of the functions in the body seems… complicated. Pretty doable, just fussy. There are also awkward questions about what Self is here and to what degree you’re allowed to interact with it.

Trait Inference

Suppose that we could instead “just” write impl |x| x * x and have the compiler figure out what trait we want (to say nothing of making this the default behavior and dropping the leading impl keyword).

This means that I could just write

fn call_many(f: impl Callback) -> usize {
  f.run(vec![5]) + f.run("foo")
}

call_many(impl |x| x.len());
Rust

We get into trouble fast.

trait T1 {
  fn foo(&self);
}

trait T2 {
  fn foo(&self);
}

impl<T: T1> T2 for T {
  fn foo(&self) {
    println!("not actually gonna call T1::foo() lmao");
  }
}

let x = || println!("hello");
T2::foo(&x);  // What should this print?
Rust

If the type of x implements T2 directly, we print "hello", but if we decide it implements T1 instead, it doesn’t, because we get the blanket impl. If it decides it should implement both… we get a coherence violation.

Currently, rustc does not have to produce impls “on demand”; the trait solver has a finite set of impls to look at. What we are asking the trait solver to do is, for certain types, attempt to reify impls based on usage. I.e., I have my opaque closure type T, and the compiler decided it needed to prove a T: Foo bound, so now it gets to perform type checking to validate whether it has an impl.

This seems unimplementable with how the solver currently works. It is not insurmountable! But it would be very hard.

It is possible that there are relaxations of this that are not insane to implement, e.g. when the impl || expression is used to initialize an argument to a function that happens to be generic, we can steal the bounds off of that type variable and hope it’s SAM. But realistically, this direction is more trouble than it’s worth.

Conclusion

Generic lambdas are extremely powerful in C++, and allow for very slick API designs; I often miss them in Rust. Although it feels like there is an insurmountable obstruction, I hope that the SAM interface approach offers a simpler, and possibly more pragmatic, approach to making them work in Rust.

  1. Except for the part where they are extremely not. Where new T() {} mints a brand new class and accompanying .class file, Java lambdas use this complicated machinery from Java 7 to generate method handles on the fly, via the invokedynamic JVM instruction. This, I’m told, makes them much easier to optimize. 

Better Trait Resolution in Rust

Traits are the core of polymorphism in Rust. Let’s review:

trait Stringer {
  fn string(&self) -> String;
}

impl Stringer for i32 {
  fn string(&self) -> String {
    format!("{self}")
  }
}

// Prints `42`.
println!("{}", 42.string());

Notice that we call the trait method Stringer::string directly on the value in question. This means that traits (at least, those currently in scope) inject their methods into the namespace of anything that implements them.

Now, this isn’t immediately a problem, because Rust’s namespace lookup rules are such that methods inherent to a type are searched for first:

trait Stringer {
  fn string(&self) -> String;
}

struct Woofer;
impl Stringer for Woofer {
  fn string(&self) -> String {
    format!("woof")
  }
}

impl Woofer {
  fn string(&self) -> String {
    format!("bark")
  }
}

// Prints `bark`.
println!("{}", Woofer.string());

This means that traits cannot easily break downstream code by adding new methods, but there are a few possible hazards:

  • If the owner of a type adds a method with the same name as a trait method, it will override direct (i.e., foo.string()) calls to that trait method, even if the type owner is unaware of the trait method.

  • If traits A and B are in scope, and String implements both, and we call str.foo() (which resolves to A::foo()), and later B adds a new method B::foo(), the callsite for String will break. A and B’s owners do not need to be aware of each other for this to happen.

Of course, Rust has a disambiguation mechanism. Given any trait implementation Foo: Bar, we can reference its items by writing <Foo as Bar>::baz. However, this syntax is very unwieldy (it doesn’t work with method chaining), so it doesn’t get used. As a result, small evolution hazards can build up in a large codebase.

Those who know me know that I often talk about a syntax that I call foo.Trait::method(), or “qualified method call syntax”. In this post, I want to discuss this syntax in more detail, and some related ideas, and how they factor into type and trait design.

Paths-as-Methods

This idea isn’t new; others have proposed it, and it forms the core of Carbon’s version of trait method calls (you can read more about Carbon’s name lookup story here).

Let’s recreate the original example in Carbon (bear in mind that I am not an expert on this language, and the semantics are still up in the air).

interface Stringer {
  fn String[self: Self]() -> String;
}

external impl i32 as Stringer {
  fn String[self: Self]() -> String { ... }
}

fn Run() {
  Carbon.Print(42.(Stringer.String)())
}
Carbon

Notice 42.(Stringer.String)(): Carbon requires that we qualify the method call, because 42 has the concrete type i32. If this were in a generic context and we had a type variable bounded by Stringer, we could just write x.String(); no ambiguity.

In Carbon, all qualification uses ., so they have to add parens. Because Rust uses :: for qualifying paths, we don’t have this syntactic ambiguity, so we can augment the syntax to allow more path expressions after the ..

The current grammar is

MethodCallExpression :
   Expression . PathExprSegment (CallParams?)

PathExprSegment :
   PathIdentSegment (:: GenericArgs)?
Plaintext

That is, exactly one identifier and an optional turbofish. I would like to see this extended to allow any QualifiedPathInExpression after the . and before the parens. This would allow, for example:

  • expr.Trait::method()
  • expr.Self::method()
  • expr.<Foo as Trait>::method()
  • expr.::path::to::Trait::<Args>::method::<MoreArgs>()

These would all desugar to the equivalent UFCS calls, taking into account that method call syntax can trigger autoref.

  • Trait::method(expr)
  • Self::method(expr)
  • <Foo as Trait>::method(expr)
  • ::path::to::Trait::<Args>::method::<MoreArgs>(expr)
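For example, with the Stringer trait from the start of this post, a disambiguated call could stay part of a chain (hypothetical syntax, of course):

// Today, UFCS breaks the method chain:
let s = <i32 as Stringer>::string(&42).to_uppercase();

// With qualified method call syntax:
let s = 42.Stringer::string().to_uppercase();
Rust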

The method would still need to be valid to have been called via . syntax; I’m not proposing we add the following, even though it is unambiguous.

fn square(x: i32) -> i32 {
  x * x
}

// Would be equivalent to `square(42)`.
42.self::square()
Rust

Trait method callers can now use qualified method call syntax where they might want to use UFCS without issues around wordiness.

Impl Modality

Of course, this isn’t the only idea from Carbon’s interfaces worth stealing; Carbon also has a notion of “external” and “internal” impls; I will call these “impl modes”.

An external impl is like the one we showed above, whose methods can only be found by qualified lookup: foo.(Bar.Baz)(). An internal impl is one which is “part” of a type.

interface Stringer {
  fn String[self: Self]() -> String;
}

class Woofer {
  impl as Stringer {
    fn String[self: Self]() -> String { ... }
  }
}

fn Run() {
  let w: Woofer = ...;
  w.String()  // Unqualified!
}
Carbon

This also implies that we don’t need to import Stringer to call w.String().

There are definitely traits in Rust which fit into these modes.

Clone and Iterator almost always want to be internal. An iterator exists to implement Iterator, and cloning is a fundamental operation. Because both of these traits are in the prelude, it’s not a problem, but it is a problem for traits provided by a non-std crate, like rand::Rng. The lack of a way to do this leads to the proliferation of prelude modules and namespace pollution. (I think that preludes are bad library design.)

On the other hand, something like Debug wants to be external very badly. It almost never makes sense to call foo.fmt(), since that gets called for you by println! and friends; not to mention that all of the std::fmt traits have a method fmt(), making such a call likely to need disambiguation with UFCS. Borrow is similar; it exists to be a bound for things like Cow more than to provide the .borrow() method.

There’s also a third mode, which I will call “extension impls”. These want to inject methods into a type, either to extend it, like itertools, or as part of some framework, like tap. This use of traits is somewhat controversial, but I can sympathize with wanting to have this.

If we have paths-as-methods, we can use this classification to move towards something more like the Carbon model of method lookup, without impacting existing uses.

My strawman is to add a #[mode] attribute to place on trait impls, which allows a caller to select the behavior:

  • #[mode(extension)] is today’s behavior. The impl’s trait must be in scope so that unqualified calls like foo.method() resolve to it.
  • #[mode(internal)] makes it so that foo.method() can resolve to a method from this impl without its trait being in scope [1]. It can only be applied to impls that are such that you could write a corresponding inherent impl, so things like #[mode(internal)] impl<T> Trait for T { .. } are forbidden.
  • #[mode(external)] makes it so that foo.method() never resolves to a method from this impl. It must be called as Trait::method(foo) or foo.Trait::method().
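As a sketch of how this strawman might look at a use site, using the Woofer example from earlier:

#[mode(internal)]
impl Stringer for Woofer {
  fn string(&self) -> String {
    format!("woof")
  }
}

// Callers may now write `Woofer.string()` without importing
// Stringer; with #[mode(external)] instead, they would have to
// write `Woofer.Stringer::string()` or use UFCS.
Rust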

Every impl would be #[mode(extension)] if not annotated, and it would be easy to migrate to external-by-default across an edition. Similarly, we could change whether a std impl is external vs. extension based on the edition of the caller, and provide a cargo fix rewrite to convert from foo.method() to foo.Trait::method().

It may also make sense for traits to be able to specify the default modality of their impls, but I haven’t thought very carefully about this.

Note that moving in the external -> extension -> internal direction is not a breaking change, but moving the other way is.

What about Delegation?

A related feature that I will touch on lightly is delegation; that is, being able to write something like the following:

// crate a
struct Foo { ... }
impl Foo {
  fn boing() -> i32 { ... }
}

// crate b
trait Bar {
  fn boing() -> i32;
}

impl Bar for Foo {
  use Foo::boing;
}
Rust

The use in the impl indicates that we want to re-use Foo::boing to implement <Foo as Bar>::boing. This saves us having to write out a function signature, and results in less work for the compiler because that’s one less function we risk asking LLVM to codegen for us (at scale, this is a Big Deal).

You could imagine using delegation instead of #[mode]:

struct Woofer;
impl Stringer for Woofer {
  fn string(&self) -> String {
    format!("woof")
  }
}

impl Woofer {
  use Stringer::*;
}
Rust

The reason I haven’t gone down this road is because delegation is a very large feature, and doesn’t give us a clean way to express #[mode(external)], which is a big part of what I want. A delegation-compatible way to express this proposal is to not add #[mode(internal)], and add use Trait::method; and use Trait::*; (and no other variations) inside of inherent impl blocks.

Conclusion

I don’t have the personal bandwidth to write RFCs for any of this stuff, but it’s something I talk about a lot as a potential evolution hazard for Rust. I hope that putting these ideas to paper can help make name resolution in Rust more robust.

  1. This needs to carry a bunch of other restrictions, because it’s equivalent to adding inherent methods to the implee. For example, none of the methods can have the same name as a method in any other inherent or internal impl block, and internal impl methods should take lookup priority over extension impl methods during name lookup.