Let’s talk about C++ templates.
C++ is famous for relegating important functionality often built into the language to its standard library1. C++11 added a number of very useful class templates intended to make generic programming easier. By far the most complicated is std::tuple<>
, which is literally just a tuple.
It turns out that implementing std::tuple<>
is complicated. Very, very complicated.
Naively, we think that we can just splat a variadic pack into a struct:
If you click through to Godbolt, you’ll see it doesn’t: this feature doesn’t exist in C++2 (normally, you’d do std::tuple<Types...>
, but we need to write down std::tuple
somehow). The usual approach is to use some kind of recursive template, which can tend to generate a lot of code.
However, C++ does actually have tuples built into the language, as a C++11 feature… lambdas! As an extra challenge, we’re going to try to minimize the number of templates that the compiler needs to instantiate; std::tuple
is famously bad about this and can lead to very poor build performance.
For our tuple library type, we need to solve the following problems:
- How do we implement
std::tuple()
andstd::tuple(args...)
? - How do we implement
std::apply
? - How do we implement
std::tuple_element
? - How do we implement
std::get
?
The Power of [](){}
Alright, let’s back up. In C++11, we got lambdas, which are expressions that expand to anonymous functions. In C++, lambdas are closures, meaning that they capture (“close over”) their environment.
This is a lambda in action:
The [x]
syntax is the captures. To represent a lambda, C++ creates an anonymous, one-time-use class. It has the captures as members (whether they be references or values) and provides the necessary operator()
. In other words, this is approximately the desugaring:
Note the const
s in _Lambda
. By default, captured values are stored inline but marked const
, and the operator()
member is also const
. We can remove that specifier in both location with the mutable
keyword:
Lambdas can capture anything from their scope. In addition to values, they will capture any types visible from that location. This means that, if constructed in a function template, the generated class will effectively capture that template’s arguments. Thus:
This will create a new anonymous class capturing an arbitrary number of arguments, depending on the parameters passed to CaptureMany()
. This will form the core of our tuple type.
Now, let’s stick it into a class.
Lambda-Typed Data Members
We don’t want to leak the lambda into the template parameters of our tuple class, so we need it to be strictly in terms of the class’s template parameters. This is straightforward with decltype
.
Regardless of what our C++ compiler calls the type, we are able to use it as a field. However, a problem arises when we try to write down the main “in-place” constructor, which consists of the usual forwarding-reference and std::forward
boilerplate3:
The initialization for lambda_ doesn’t work, because the return type of TupleLambda
is wrong! The compiler is required to synthesize a new type for every specialization of TupleLambda
, and so TupleLambda<Types...>()
and TupleLambda<Args...>
return different types!
A new
Kind of Initialization
This requires a major workaround. We’d still like to use our lambda, but we need to give it a type that allows us to construct it before calling the constructors of Types...
. We can’t use Types...
, so we’ll do a switcheroo.
The following is boilerplate for a type that can hold a T
in it but which can be constructed before we construct the T
.
There’s a lot going on here. Let’s break it down.
alignof(T)
ensures that even though the only member is achar
array, this
-
The constructor does nothing; the
T
within is only constructed whenInit()
is called withT
’s constructor arguments. -
Init()
forwards its arguments just like our non-functional constructor forTuple
. This time, the arguments get sent intoT
’s constructor via placement-new
. Placement-new
is special syntax that allows us to call a constructor directly on existing memory. It’s spelled like this:new (dest) T(args);
. -
operator*
/operator->
turnStorageFor
into a smart pointer overT
, which will be useful later. The signatures of these functions aren’t important; it’s library boilerplate.
We can use this type like this:
How does this help us?
Constructors Inside-Out
StorageFor<T>
will be the types that our lambda captures, making it possible to give it a consistent type without knowing which arguments we’ll use to initialize the contents.
But now we’re in another bind: how do we call the constructors? Even with placement-new, we can’t reach into the lambda’s data, and the layout of a lambda is compiler-specific. However, that’s from the outside. What if we accessed the lambda from the inside?
We modify the lambda to itself be generic and take a pack of forwarding references as arguments, which we can then pass into Init()
:
That’s a serious mouthful. Let’s break it down.
-
[args...] (auto&&... init_args) {
declares a generic lambda. This means that there’s an imaginarytemplate <typename... Args>
on theoperator()
of the generated class. Because the argument type isArgs&&
, andArgs
is a template parameter ofoperator()
,init_args
is a pack of forwarding references. This is a C++14 feature. -
Init(std::forward<decltype(init_args)>(init_args))
is a forwarded constructor argument. Nothing new here. -
The outer
(<expr>, ...)
that the placement-new
is wrapped in is a pack fold, which uses an operator to fold a pack of values into one. For example,(foo + ...)
computes the sum of all elements in a pack. In our case, we’re folding with the comma operator,
. All this does is discard the elements of the pack (which are allvoid
, regardless). This is a C++17 feature4
Taken together, this causes the constructor of each type in Types...
to be run on the respective StorageFor<T>
captures by the lambda when TupleLambda()
was originally called. The double-nesting of a function-within-a-function can be a bit confusing: TupleLambda()
is not what calls T
’s constructor!
Actually, this won’t compile because Init()
is not const
, but the lambda’s operator()
is. This is easily fixed by adding the mutable
keyword:
We also need to mark the lambda_
parameter as mutable
so that const
functions can all it. We’ll just need to be careful we don’t actually mutate through it. This is necessary because we cannot (at least until C++23) write to the captures of a lambda and still be able to call it in const
contexts:
Now, our constructor looks like this:
More Constructors!
We have std::tuple(args)
but we still need std::tuple
. But, we’ve already used up our one chance to touch the captures of the lambda… we can’t write down a lambda that has both a variadic operator()
(many generic arguments) and a niladic operator()
(no arguments).
But we can make it take a lambda itself! In this case, all that our “storage lambda” does now is call a callback with a pack of references. Calling lambda_()
effectively “unpacks” it:
The decltype(auto)
bit simply ensures that if callback
returns a reference, then so does lambda_
. By default, lambdas return auto
, which will never deduce a reference (you’d need to write auto&
, which conversely cannot deduce a value). Instead of using “auto
deduction”, we can use the special decltype(auto)
type to request “decltype
deduction”, which can deduce both references and non-references. This comes in handy later.
Now we can refactor the two constructors to call lambda_
with different lambda arguments. Our original constructor will pass in the original body of lambda_
, which calls Init() with args
. The new constructor will simply call Init()
with no args.
We need to implement the destructor too, since StorageFor<T>
will not destroy the T
we’re squirreling away inside, but this is still really easy:
Copy and move are similar, but require interleaving two calls of lambda_
:
Copy/move assignment are basically identical; I’ll leave those as an exercise!
This gives us our complete set of constructors. We’ll throw in deduction guides5 to avoid needing to implement make_tuple
:
This works up until we try to write Tuple tup2 = tup;
Overload resolution will incorrectly route to the variadic constructor rather than the copy constructor, so a little bit of SFINAE is needed to grease the compiler’s wheels.
Keeping in the spirit of avoiding extra instantiation logic, we’ll use placement-new
inside of a decltype
as an ersatz std::enable_if
:
This verifies that we can actually construct a Types
from a Args
(for each member of the pack). Because this is occurring in an unevaluated context, we can safely placement-new
on nullptr
. All new
expressions produce a pointer value, and a comma-fold produces the last value in the fold, so the overall decltype()
is T*
, where T
is the last element of the pack.
This decltype()
is the type of a non-type template parameter, which we can default to nullptr
, so the user never notices it.
Ok. We have all of our constructors. The code so far is at this footnote: 6.
Onwards to std::apply
.
Unpacking, Again
std::apply(f, tup)
is a relatively straight-forward function: call f
by splatting tup
’s elements int f
as a pack. Because of how we’ve implemented lambda_
, this is actually super simple:
(We’re possibly returning a reference, so note the decltype(auto)
s.)
lambda_
is basically a funny std::apply
already, just with the wrong arguments. The *places
fixes this up. With some repetition, we can write down const
- and &&
-qualified overloads. We can even introduce a free function just like the one in the standard library:
The other unpacking operation, std::get
, is trickier. This is usually where things get really hairy, because we need to get the i
th type out of the lambda. There are many approaches for doing this, most of which involve recursive templates. I’ll present two approaches that don’t use recursive templates directly, but which can still be a bit slow, built-time-wise.
This is the function we need to implement:
Cheating with std::make_index_sequence
std::make_index_sequence
is a funny type-level function that produces a pack of integers from 0
to i
, given just i
. This is usually fast, since most compilers will have intrinsics for doing it without needing to instantiate i
templates. For example, in Clang, this is __make_integer_seq
, which is used by libc++.
Thus, we can turn the problem of implementing get
with a single i
to implementing get
with a pack:
We can then use this pack to cook up just the right lambda to grab just the capture we want out of lambda_
. Specifically, we want a lambda that picks out its i
th argument. Basically we want to write something with arguments like (auto..., auto, auto...)
, but somehow use the less_than_i
pack to control the size of the first argument pack.
We can whip up a class template for this:
Sink<n>
is a type that is implicitly convertible from anything, and has a dummy parameter we can key an expansion off-of. Hence GetImpl()
looks like this:
We can then provide the type of the i
th element as a member type alias, using decltype
:
(The template
keyword isn’t doing anything interesting; it’s just for syntactic disambiguation.)
We can, as usual, repeat implementations for const
/&&
qualifiers.
Cheating Harder with __type_pack_element
If we’re ok being Clang-specific, Clang just gives us a magic type function that selects out of a pack. This means we can implement TupleType
in terms of it:
Then, we can use void*
to swindle the type system, since we don’t need to go to any effort to learn the i
th type now:
(We’re returning a reference, so again note the decltype(auto)
.)
With that we have all of the functions we set out to implement. For kicks, we can add the relevant std
specializations to enable structured bindings on our type (along with our get
member function):
Now we can see everything in action:
The full code can be found at this footnote: 7.
The Damage
So, the end result is most of an implementation of std::tuple<>
. Let’s see how well it builds. We’re going to compile the following code for n
from 0 to 150 and measure how long it takes.
And here’s the results on Clang 11 (what I had on-hand) on my Zen 2 machine:
We seem to beat libstdc++ by a factor of around 2, but libc++ appears to have us beat. This is because libc++ makes even more aggressive use of Clang’s intrinsics than we did, allowing them to do significantly better. Interestingly, using the builtin makes us perform worse. I’m actually not sure why this is.
But ultimately, this wasn’t really about beating libc++: it’s about having fun with C++ templates.
-
Arguably, because WG21, the body that standardizes C++, is bad at language evolution, but that’s not why we’re here. ↩
-
The Circle compiler totally laughs in our faces, though, because it has this exact syntax. https://github.com/seanbaxter/circle/tree/master/tuple#circle-tuple ↩
-
Basically every in-place constructor in C++ looks like this. It takes a variadic pack as a template parameter, and then takes
&&
if that as its arguments.Args&&
here is a forwarding reference, which means it isT&
orT&&
depending on the callsite. This overrides the usual template deduction rules, and is important for making sure that e.g.std::move
propagates correctly.We cannot write
Types&&
instead, because that would not be a forwarding reference.T&&
refers to a forwarding reference argument only on a function template whereT
is a parameter of that function and not an enclosing entity. ↩ -
If C++17 is too much to ask, polyfilling isn’t too hard. Instead of
(<expr>, ...);
, we can write(void)(int[]){(<expr>, 0)...};
, even if<expr>
is a void expression.(<expr>, 0)
is still a comma operator call, which discards the result of<expr>
as before. The pack expands into an array of integers (aint[]
), which we then discard with(void)
. This still has the behavior of evaluating<expr>
once for each element of the pack. ↩ -
A deduction guide is a special piece of syntax introduced in C++17 intended to aid deducing the types of constructor calls. When we write
std::tuple(a, b, c)
, the template arguments ofstd::tuple
are deduced. However, the constructor call may not give sufficient information to properly deduce them, because we may be calling a constructor template.The syntax looks like this:
This tells the compiler that when it encounters a call to a constructor of
MyTypes
that deduces the given types as its arguments, it should deduce the type after the->
for the template arguments ofMyType
, which can be arbitrary template argument expressions. ↩ -
-