Protobuf Tip #6: The Subtle Dangers of Enum Aliases

I’ve been very fortunate to dodge a nickname throughout my entire career. I’ve never had one. – Jimmie Johnson

TL;DR: Enum values can have aliases. This feature is poorly designed and shouldn’t be used. The ENUM_NO_ALLOW_ALIAS Buf lint rule prevents you from using them by default.

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

Confusion and Breakage

Protobuf permits multiple enum values to have the same number. Such enum values are said to be aliases of each other. Protobuf used to allow this by default, but now you have to set a special option, allow_alias, for the compiler to not reject it.

This can be used to effectively rename values without breaking existing code:

package myapi.v1;

enum MyEnum {
  option allow_alias = true;
  MY_ENUM_UNSPECIFIED = 0;
  MY_ENUM_BAD = 1 [deprecated = true];
  MY_ENUM_MORE_SPECIFIC = 1;
}

This works perfectly fine, and is fully wire-compatible! And unlike renaming a field (see TotW #1), it won’t result in source code breakages.

But if you use reflection or JSON, or a runtime like Java that doesn’t cleanly allow enums with multiple names, you’ll be in for a nasty surprise.

For example, if you request an enum value from an enum using reflection, such as with protoreflect.EnumValueDescriptors.ByNumber(), the value you’ll get is the one that appears first in the file, lexically. In fact, both myapipb.MyEnum_MY_ENUM_BAD.String() and myapipb.MyEnum_MY_ENUM_MORE_SPECIFIC.String() return the same value, leading to potential confusion, as the old “bad” value will be used in printed output like logs.

You might think, “oh, I’ll switch the order of the aliases”. But that would be an actual wire format break. Not for the binary format, but for JSON. That’s because JSON preferentially stringifies enum values by using their declared name (if the value is in range). So, reordering the values means that what once serialized as {"my_field": "MY_ENUM_BAD"} now serializes as {"my_field": "MY_ENUM_MORE_SPECIFIC"}.
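
Concretely, the reordered enum would look like this (same type as before; only the declaration order changes):

package myapi.v1;

enum MyEnum {
  option allow_alias = true;
  MY_ENUM_UNSPECIFIED = 0;
  // Declared first now, so JSON picks this name when serializing 1.
  MY_ENUM_MORE_SPECIFIC = 1;
  MY_ENUM_BAD = 1 [deprecated = true];
}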

If an old binary that hasn’t had the new enum value added sees this JSON document, it won’t parse correctly, and you’ll be in for a bad time.

You can argue that this is a language bug, and it kind of is. Protobuf should include an equivalent of json_name for enum values, or mandate that JSON should serialize enum values with multiple names as a number, rather than an arbitrarily chosen enum name. The feature is intended to allow renaming of enum values, but unfortunately Protobuf hobbled it enough that it’s pretty dangerous.

What To Do

Instead, if you really need to rename an enum value for usability or compliance reasons (ideally, not just aesthetics), you’re better off making a new enum type in a new version of your API. As long as the enum value numbers are the same, it’ll be binary-compatible, and it somewhat reduces the risk of the JSON confusion described above.
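
For instance, continuing the example above (a sketch, assuming you’re ready to mint a myapi.v2 package):

package myapi.v2;

enum MyEnum {
  MY_ENUM_UNSPECIFIED = 0;
  // Same number as v1's MY_ENUM_BAD, so the binary encoding is unchanged.
  // There is no alias, so reflection and JSON see exactly one name.
  MY_ENUM_MORE_SPECIFIC = 1;
}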

Buf provides a lint rule against this feature, ENUM_NO_ALLOW_ALIAS, and Protobuf requires that you specify a magic option to enable this behavior, so in practice you don’t need to worry about this. But remember, the consequences of enum aliases go much further than JSON—they affect anything that uses reflection. So even if you don’t use JSON, you can still get burned.

Protobuf Tip #5: Avoid import public/weak

My dad had a guitar but it was acoustic, so I smashed a mirror and glued broken glass to it to make it look more metal. It looked ridiculous! –Max Cavalera

TL;DR: Avoid import public and import weak. The Buf lint rules IMPORT_NO_PUBLIC and IMPORT_NO_WEAK enforce this for you by default.

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

Protobuf imports allow you to specify two special modes: import public and import weak. The Buf CLI lints against these by default, but you might be tempted to try using them anyway, especially because some GCP APIs use import public. What are these modes, and why do they exist?

Import Visibility

Protobuf imports are by file path, a fact that is very strongly baked into the language and its reflection model.

import "my/other/api.proto";

Importing a file dumps all of its symbols into the current file. For the purposes of name resolution, it’s as if all of the declarations in that file have been pasted into the current file. However, this isn’t transitive. If:

  • a.proto imports b.proto
  • and b.proto imports c.proto
  • and c.proto defines foo.Bar
  • then, a.proto must import c.proto to refer to foo.Bar, even though b.proto imports it (see the sketch below).
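
In code, that looks like the following sketch (hypothetical file contents):

// a.proto
import "b.proto";
import "c.proto"; // Still required to name foo.Bar directly.

message Baz {
  foo.Bar bar = 1;
}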

This is similar to how importing a package as . works in Go. When you write import . "strings", it dumps all of the declarations from the strings package into the current file, but not those of any packages that "strings" imports.

Now, what’s nice about Go is that packages can be broken up into files in a way that is transparent to users; users of a package import the package, not the files of that package. Unfortunately, Protobuf is not like that, so the file structure of a package leaks to its callers.

import public was intended as a mechanism for allowing API writers to break up files that were getting out of control. You can define a new file new.proto for some of the definitions in big.proto, move them to the new file, and then add import public "new.proto"; to big.proto. Existing imports of big.proto won’t be broken, hooray!
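
In a sketch, using the file names above:

// big.proto
// Everything that moved to new.proto is still visible to importers of
// big.proto, so existing import "big.proto"; statements keep working.
import public "new.proto";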

Except this feature was designed for C++. In C++, each .proto file maps to a generated .pb.h header, which you #include in your application code. In C++, #include behaves like import public, so marking an import as public only changes name resolution in Protobuf—the C++ backend doesn’t have to do anything to maintain source compatibility when an import is changed to public.

But other backends, like Go, do not work this way: import in Go doesn’t pull in symbols transitively, so Go would need to explicitly add aliases for all of the symbols that come in through a public import. That is, if you had:

// foo.proto
package myapi.v1;
message Foo { ... }

// bar.proto
package myotherapi.v1;
import public "foo.proto";

Then the Go backend has to generate a type Foo = foopb.Foo in bar.pb.go to emulate this behavior (in fact, I was surprised to learn Go Protobuf implements this at all). Go happens to implement public imports correctly, but not all backends are as careful, because this feature is obscure.

The import public in spanner.proto (one of the GCP APIs mentioned above) isn’t even used for breaking up an existing file; instead, it’s used to keep a huge file from growing bigger while sparing callers an additional import. This is a bad use of a bad feature!

Using import public to effectively “hide” imports makes it harder to understand what a .proto file is pulling in. If Protobuf imports were at the package/symbol level, like Go or Java, this feature would not need to exist. Unfortunately, Protobuf is closely tailored for C++, and this is one of the consequences.

Instead of using import public to break up a file, simply plan to break up the file in the next version of the API.
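
For example (a sketch with hypothetical packages; v1 stays untouched for existing callers, and v2 callers import exactly the files they need):

// myapi/v1/big.proto
package myapi.v1;
message Foo { ... }
message Bar { ... }

// myapi/v2/foo.proto
package myapi.v2;
message Foo { ... }

// myapi/v2/bar.proto
package myapi.v2;
message Bar { ... }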

The IMPORT_NO_PUBLIC Buf lint rule flags uses of this feature by default. It’s tempting, but the footguns aren’t worth it.

Weak Imports

Public imports have a good, if flawed, reason to exist. Their implementation details are the main thing that kneecaps them.

Weak imports, however, simply should not exist. They were added to the language to make it easier for some of Google’s enormous binaries to avoid running out of linker memory, by making it so that message types could be dropped if they weren’t accessed. This means that weak imports are “optional”—if the corresponding descriptors are missing at runtime, the C++ runtime can handle it gracefully.

This leads to all kinds of implementation complexity and subtle behavior differences across runtimes. Most runtimes implement (or implemented, in the case of those that removed support) import weak in a buggy or inconsistent way. It’s unlikely the feature will ever be truly removed, even though Google has tried.

Don’t use import weak. It should be treated as completely non-functional. The IMPORT_NO_WEAK Buf lint rule takes care of this for you.

Protobuf Tip #4: Accepting Mistakes We Can't Fix

Bad humor is an evasion of reality; good humor is an acceptance of it. –Malcolm Muggeridge

TL;DR: Protobuf’s distributed nature introduces evolution risks that make it hard to fix some types of mistakes. Sometimes the best thing to do is to just let it be.

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

A Different Mindset

Often, you’ll design and implement a feature for the software you work on, and despite your best efforts to test it, something terrible happens in production. We have a playbook for this, though: fix the bug in your program and ship or deploy the new, fixed version to your users. It might mean working late for big emergencies, but turnaround for most organizations is a day to a week.

Most bugs aren’t emergencies, though. Sometimes a function has a confusing name, or an integer type is just a bit too small for real-world data, or an API conflates “zero” and “null”. You fix the API, refactor all of its usages in one commit, merge, and the fix rolls out gradually.

Unless, of course, it’s a bug in a communication API, like a serialization format: your Protobuf types, or your JSON schema, or the not-too-pretty code that parses fields out of a dict built from a YAML file. Here, you can’t just atomically fix the world. Fixing bugs in your APIs (from here on, “APIs” means “Protobuf definitions”) requires a different mindset than fixing bugs in ordinary code.

What Are the Risks?

Protobuf’s wire format is designed so that you can safely add new fields to a type, or values to an enum, without needing to perform an atomic upgrade. But other changes, like renaming fields or changing their type, are very dangerous.
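
For example (a hypothetical message; the comments mark which edits are safe):

message User {
  string name = 1;
  // Safe: a brand-new field with a fresh number; old readers skip it.
  string email = 2;
  // Dangerous: renaming name, or changing its type, can silently break
  // peers that still run the old schema.
}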

This is because Protobuf types exist on a temporal axis: different versions of the same type exist simultaneously among programs in the field that are actively talking to each other. This means that writers from the future (that is, new serialization code) must be careful to not confuse the many readers from the past (old versions of the deserialization code). Conversely, future readers must tolerate anything past writers produce.

In a modern distributed deployment, the number of versions that exist at once can be quite large. This is true even in self-hosted clusters, but becomes much more fraught whenever user-upgradable software is involved. This can include mobile applications that talk to your servers, or appliance software managed by a third-party administrator, or even just browser-service communication.

The most important principle: you can’t easily control when old versions of a type or service are no longer relevant. As soon as a type escapes the scope of a single team, upgrading it becomes a departmental effort.

Learning to Love the Bomb

There are many places where Protobuf could have made schema evolution easier, but didn’t. For example, changing int32 foo = 1; to sfixed32 foo = 1; is a breakage, even though at the wire format level, it is possible for a parser to distinguish and accept both forms of foo correctly. There are too many other examples to list, but it’s important to understand that the language is not always working in our favor.
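
To see why a parser could have coped (a sketch; the wire types come from Protobuf’s encoding rules):

message M {
  int32 foo = 1;        // Encodes as field 1 with wire type VARINT.
  // sfixed32 foo = 1;  // Would encode as field 1 with wire type I32
  //                    // (four fixed bytes), so tags tell the two apart.
}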

For example, if we notice an int32 value is too small and should have been 64-bit, we can’t upgrade it without readers from the past potentially truncating it. But we really have to upgrade it! What are our options?

  1. Issue a new version of the message and all of its dependencies (see the sketch after this list). This is the main reason why sticking a version number in the package name, as enforced by Buf’s PACKAGE_VERSION_SUFFIX lint rule, is so important.
  2. Do the upgrade anyway and hope nothing breaks. This can work for certain kinds of upgrades, if the underlying format is compatible, but it can have disastrous consequences if you don’t know what you’re doing, especially if it’s a type that’s not completely internal to a team’s project. Buf breaking change detection helps you avoid changes with potential for breakage.
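
For example, widening a hypothetical quantity field in a new package version (a sketch):

// myapi/v1/order.proto
package myapi.v1;
message Order {
  int32 quantity = 1; // Too small, but frozen: old readers would truncate.
}

// myapi/v2/order.proto
package myapi.v2;
message Order {
  int64 quantity = 1; // Widened in v2; v1 readers never see this type.
}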

Of course, there is a third option: accept that some things aren’t worth fixing. When the cost of a fix is this high, it often just isn’t worth paying, especially when the language is working against us.

This means that even in Buf’s own APIs, we sometimes do things in a way that isn’t quite ideal, or is inconsistent with our own best practices. Sometimes, the ecosystem changes in a way that changes best practice, but we can’t upgrade to it without breaking our users. In the same way, you shouldn’t rush to use new, better language features if they would cause protocol breaks: sometimes, the right thing is to do nothing, because not breaking your users is more important.