Protobuf Tip #5: Avoid import public/weak

My dad had a guitar but it was acoustic, so I smashed a mirror and glued broken glass to it to make it look more metal. It looked ridiculous! –Max Cavalera

TL;DR: Avoid import public and import weak. The Buf lint rules IMPORT_NO_PUBLIC and IMPORT_NO_WEAK enforce this for you by default.

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

Protobuf imports allow you to specify two special modes: import public and import weak. The Buf CLI lints against these by default, but you might be tempted to try using them anyway, especially because some GCP APIs use import public. What are these modes, and why do they exist?

Import Visibility

Protobuf imports are by file path, a fact that is very strongly baked into the language and its reflection model.

import "my/other/api.proto";
Protobuf

Importing a file dumps all of its symbols into the current file. For the purposes of name resolution, it’s as if all of the declarations in that file have been pasted into the current file. However, this isn’t transitive. If:

  • a.proto imports b.proto
  • and b.proto imports c.proto
  • and c.proto defines foo.Bar
  • then, a.proto must import c.proto to refer to foo.Bar, even though b.proto imports it, as sketched below.
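
Here’s a minimal sketch of that rule in action, using the hypothetical files from the list:

// c.proto
package foo;
message Bar { ... }

// b.proto
import "c.proto";

// a.proto
import "b.proto";
import "c.proto"; // required: the import via b.proto is not transitive

message M {
  foo.Bar bar = 1; // resolves because a.proto imports c.proto directly
}
Protobuf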

This is similar to how importing a package as . works in Go. When you write import . "strings", it dumps all of the declarations from the strings package into the current file, but not those of any files that "strings" imports.

Now, what’s nice about Go is that packages can be broken up into files in a way that is transparent to users; users of a package import the package, not the files of that package. Unfortunately, Protobuf is not like that, so the file structure of a package leaks to its callers.

import public was intended as a mechanism for allowing API writers to break up files that were getting out of control. You can define a new file new.proto for some of the definitions in big.proto, move them to the new file, and then add import public "new.proto"; to big.proto. Existing imports of big.proto won’t be broken, hooray!
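
As a sketch, using the hypothetical big.proto and new.proto from above:

// new.proto
package myapi;
message Bar { ... } // moved out of big.proto

// big.proto
package myapi;
import public "new.proto"; // existing importers of big.proto still see Bar
message Foo { ... }
Protobuf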

Except this feature was designed for C++. In C++, each .proto file maps to a generated .pb.h header, which you #include in your application code. A C++ #include behaves like import public, so marking an import as public only changes name resolution in Protobuf; the C++ backend doesn’t have to do anything to maintain source compatibility when an import is changed to public.

But other backends, like Go, do not work this way: import in Go doesn’t pull in symbols transitively, so Go would need to explicitly add aliases for all of the symbols that come in through a public import. That is, if you had:

// foo.proto
package myapi.v1;
message Foo { ... }

// bar.proto
package myotherapi.v1;
import public "foo.proto";
Protobuf

Then the Go backend has to generate a type Foo = foopb.Foo in bar.pb.go to emulate this behavior (in fact, I was surprised to learn Go Protobuf implements this at all). Go happens to implement public imports correctly, but not all backends are as careful, because this feature is obscure.

The import public in spanner.proto isn’t even used to break up an existing file; instead, it’s used to avoid making a huge file even bigger while sparing callers an additional import. This is a bad use of a bad feature!
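
Paraphrasing the pattern (a sketch, not the verbatim source):

// google/spanner/v1/spanner.proto (paraphrased)
package google.spanner.v1;

// Callers that import spanner.proto also see the symbols from this
// file, without having to add an import of their own.
import public "google/spanner/v1/commit_response.proto";
Protobuf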

Using import public to effectively “hide” imports makes it harder to understand what a .proto file is pulling in. If Protobuf imports were at the package/symbol level, like Go or Java, this feature would not need to exist. Unfortunately, Protobuf is closely tailored for C++, and this is one of the consequences.

Instead of using import public to break up a file, simply plan to break up the file in the next version of the API.
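
A sketch of that plan, with hypothetical versioned packages:

// myapi/v1/big.proto stays intact; existing importers are untouched.
package myapi.v1;
message Foo { ... }
message Bar { ... }

// In v2, the split ships from the start, and no import public is needed.

// myapi/v2/foo.proto
package myapi.v2;
message Foo { ... }

// myapi/v2/bar.proto
package myapi.v2;
message Bar { ... }
Protobuf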

The IMPORT_NO_PUBLIC Buf lint rule, which is on by default, enforces that no one uses this feature. It’s tempting, but the footguns aren’t worth it.

Weak Imports

Public imports have a good, if flawed, reason to exist. Their implementation details are the main thing that kneecaps them.

Weak imports, however, simply should not exist. They were added to the language to make it easier for some of Google’s enormous binaries to avoid running out of linker memory, by making it so that message types could be dropped if they weren’t accessed. This means that weak imports are “optional”—if the corresponding descriptors are missing at runtime, the C++ runtime can handle it gracefully.

This leads to all kinds of implementation complexity and subtle behavior differences across runtimes. Most runtimes implement (or implemented, in the case of those that removed support) import weak in a buggy or inconsistent way. It’s unlikely the feature will ever be truly removed, even though Google has tried.

Don’t use import weak. It should be treated as completely non-functional. The IMPORT_NO_WEAK Buf lint rule takes care of this for you.

Protobuf Tip #4: Accepting Mistakes We Can't Fix

Bad humor is an evasion of reality; good humor is an acceptance of it. –Malcolm Muggeridge

TL;DR: Protobuf’s distributed nature introduces evolution risks that make it hard to fix some types of mistakes. Sometimes the best thing to do is to just let it be.

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

A Different Mindset

Often, you’ll design and implement a feature for the software you work on, and despite your best efforts to test it, something terrible happens in production. We have a playbook for this, though: fix the bug in your program and ship or deploy the new, fixed version to your users. It might mean working late for big emergencies, but turnaround for most organizations is a day to a week.

Most bugs aren’t emergencies, though. Sometimes a function has a confusing name, or an integer type is just a bit too small for real-world data, or an API conflates “zero” and “null”. You fix the API, refactor all usages in one commit, merge, and the fix rolls out gradually.

Unless, of course, it’s a bug in a communication API, like a serialization format: your Protobuf types, or your JSON schema, or the not-too-pretty code that parses fields out of a dict built from a YAML file. Here, you can’t just atomically fix the world. Fixing bugs in your APIs (from here on, “APIs” means “Protobuf definitions”) requires a different mindset than fixing bugs in ordinary code.

What Are the Risks?

Protobuf’s wire format is designed so that you can safely add new fields to a type, or values to an enum, without needing to perform an atomic upgrade. But other changes, like renaming fields or changing their type, are very dangerous.
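
For example, a minimal sketch with a hypothetical message and enum:

message User {
  string name = 1;

  // Safe: a brand-new field. Old readers simply skip it on the wire.
  string email = 2;

  // Dangerous: changing an existing field's type, e.g. turning
  // `string name = 1;` into `bytes name = 1;`, can break old readers
  // and old generated code, even where the encodings happen to overlap.
}

enum Status {
  STATUS_UNSPECIFIED = 0;
  STATUS_ACTIVE = 1;
  // Safe: a new value. Old readers treat it as an unknown enum value.
  STATUS_SUSPENDED = 2;
}
Protobuf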

Such changes are dangerous because Protobuf types exist on a temporal axis: different versions of the same type exist simultaneously among programs in the field that are actively talking to each other. This means that writers from the future (that is, new serialization code) must be careful not to confuse the many readers from the past (old versions of the deserialization code). Conversely, future readers must tolerate anything past writers produce.

In a modern distributed deployment, the number of versions that exist at once can be quite large. This is true even in self-hosted clusters, but becomes much more fraught whenever user-upgradable software is involved. This can include mobile applications that talk to your servers, or appliance software managed by a third-party administrator, or even just browser-service communication.

The most important principle: you can’t easily control when old versions of a type or service are no longer relevant. As soon as a type escapes out of the scope of even a single team, upgrading types becomes a departmental effort.

Learning to Love the Bomb

There are many places where Protobuf could have made schema evolution easier, but didn’t. For example, changing int32 foo = 1; to sfixed32 foo = 1; is a breakage, even though at the wire format level, it is possible for a parser to distinguish and accept both forms of foo correctly. There are too many other examples to list, but it’s important to understand that the language is not always working in our favor.
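
Here’s a sketch of why that particular change breaks, using a hypothetical message:

message Dimensions {
  // Before: int32 encodes as a varint (wire type 0).
  int32 width = 1;

  // After: sfixed32 encodes as four fixed bytes (wire type 5). Same
  // field number, different wire type, so an old reader sees an
  // unexpected wire type for field 1 instead of the varint it wants.
  // sfixed32 width = 1;
}
Protobuf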

For example, if we notice an int32 value is too small and should have been 64-bit, we can’t upgrade it without readers from the past potentially truncating it. But we really have to upgrade it! What are our options?

  1. Issue a new version of the message and all of its dependencies (see the sketch after this list). This is the main reason why sticking a version number in the package name, as enforced by Buf’s PACKAGE_VERSION_SUFFIX lint rule, is so important.
  2. Do the upgrade anyway and hope nothing breaks. This can work for certain kinds of upgrades, if the underlying format is compatible, but it can have disastrous consequences if you don’t know what you’re doing, especially if it’s a type that’s not completely internal to a team’s project. Buf breaking change detection helps you avoid changes with potential for breakage.
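
A sketch of option 1, with hypothetical names:

// myapi/v1/order.proto: the mistake is frozen here.
package myapi.v1;
message Order {
  int32 quantity = 1; // too small, but v1 readers expect int32
}

// myapi/v2/order.proto: the fix ships in the next version.
package myapi.v2;
message Order {
  int64 quantity = 1; // widened without breaking v1 traffic
}
Protobuf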

Of course, there is a third option: accept that some things aren’t worth fixing. When the cost of a fix is this high, and the language is working against us, doing nothing is often the cheapest choice.

This means that even in Buf’s own APIs, we sometimes do things in a way that isn’t quite ideal, or is inconsistent with our own best practices. Sometimes, the ecosystem changes in a way that changes best practice, but we can’t upgrade to it without breaking our users. In the same way, you shouldn’t rush to use new, better language features if they would cause protocol breaks: sometimes, the right thing is to do nothing, because not breaking your users is more important.

Protobuf Tip #3: Enum Names Need Prefixes

Smart people learn from their mistakes. But the real sharp ones learn from the mistakes of others. –Brandon Mull

TL;DR: enums inherit some unfortunate behaviors from C++. Use the Buf lint rules ENUM_VALUE_PREFIX and ENUM_ZERO_VALUE_SUFFIX to avoid this problem (they’re part of the DEFAULT category).

I’m editing a series of best practice pieces on Protobuf, a language that I work on which has lots of evil corner-cases. These are shorter than what I typically post here, but I think they fit with what you, dear reader, come to this blog for. These tips are also posted on the buf.build blog.

C++-Style Enums

Protobuf’s enums define data types that represent a small set of valid values. For example, google.rpc.Code lists status codes used by various RPC frameworks, such as gRPC. Under the hood, every enum is just an int32 on the wire, although codegen backends will generate custom types and constants for the enum to make it easier to use.

Unfortunately, enums were originally designed to match C++ enums exactly, and they inadvertently replicate many of those behaviors.

If you look at the source for google.rpc.Code, and compare it to, say, google.protobuf.FieldDescriptorProto.Type, you will notice a subtle difference:

package google.rpc;
enum Code {
  OK = 0;
  CANCELLED = 1;
  UNKNOWN = 2;
  // ...
}

package google.protobuf;
message FieldDescriptorProto {
  enum Type {
    // 0 is reserved for errors.
    TYPE_DOUBLE = 1;
    TYPE_FLOAT = 2;
    TYPE_INT64 = 3;
    // ...
  }
}
Protobuf

FieldDescriptorProto.Type has values starting with TYPE_, but Code’s values don’t have a CODE_ prefix. This is because the fully-qualified name (FQN) of an enum value doesn’t include the name of the enum. That is, TYPE_DOUBLE actually refers to google.protobuf.FieldDescriptorProto.TYPE_DOUBLE. Thus, OK is not google.rpc.Code.OK, but google.rpc.OK.

This is because it matches the behavior of unscoped C++ enums. C++ is the “reference” implementation, so the language often bends for the sake of the C++ backend.

When generating code, protoc’s C++ backend emits the above as follows:

namespace google::rpc {
enum Code {
  OK = 0,
  CANCELLED = 1,
  UNKNOWN = 2,
  // ...
};
}

namespace google::protobuf {
class FieldDescriptorProto final {
 public:
  enum Type {
    TYPE_DOUBLE = 1,
    TYPE_FLOAT = 2,
    // ...
  };
};
}
C++

And in C++, enums don’t scope their enumerators: you write google::rpc::OK, not google::rpc::Code::OK.

If you know C++, you might be thinking, “why didn’t they use enum class?!” Enums were added in proto2, which was developed around 2007-2008, but Google didn’t start using C++11, which introduced enum class, until much, much later.

Now, if you’re a Go or Java programmer, you’re probably wondering why you even care about C++. Both Go and Java do scope enum values to the enum type (although Go does it in a somewhat grody way: rpcpb.Code_OK).

Unfortunately, this affects name collision detection in Protobuf. You can’t write the following code:

package myapi.v1;

enum Stoplight {
  UNSPECIFIED = 0;
  RED = 1;
  YELLOW = 2;
  GREEN = 3;
}

enum Speed {
  UNSPECIFIED = 0;
  SLOW = 1;
  FAST = 2;
}
Protobuf

Because the enum name is not part of the FQN for an enum value, both UNSPECIFIEDs here have the FQN myapi.v1.UNSPECIFIED, so Protobuf complains about duplicate symbols.

Thus, the convention we see in FieldDescriptorProto.Type:

package myapi.v1;

enum Stoplight {
  STOPLIGHT_UNSPECIFIED = 0;
  STOPLIGHT_RED = 1;
  STOPLIGHT_YELLOW = 2;
  STOPLIGHT_GREEN = 3;
}

enum Speed {
  SPEED_UNSPECIFIED = 0;
  SPEED_SLOW = 1;
  SPEED_FAST = 2;
}
Protobuf

Buf provides a lint rule to enforce this convention: ENUM_VALUE_PREFIX. Even if you expect an enum’s name to be unique, top-level enums bleed their values’ names into the containing package, so the problem spreads across every file in the package!
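
For example, the collision happens even when the two enums live in different files of the same package (a minimal sketch):

// stoplight.proto
package myapi.v1;
enum Stoplight {
  UNSPECIFIED = 0; // FQN: myapi.v1.UNSPECIFIED
}

// speed.proto
package myapi.v1;
enum Speed {
  UNSPECIFIED = 0; // error: myapi.v1.UNSPECIFIED is already defined
}
Protobuf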

Zero Values

proto3 relies heavily on the concept of “zero values”: all non-message fields that are neither repeated nor optional are implicitly zero if they are not present. Thus, proto3 requires that the first value of every enum be zero.

By convention, this zero value shouldn’t be a meaningful value of the enum, but rather a value indicating that nothing was specified at all. ENUM_ZERO_VALUE_SUFFIX enforces this, with a default of _UNSPECIFIED. Of course, there are situations where that suffix doesn’t fit, and _ZERO or _UNKNOWN might be a better choice.

It may be tempting to have a specific “good default” value for the zero value. Beware though, because that choice is forever. Picking a generic “unknown” as the default reduces the chance you’ll burn yourself.
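
For instance, a hedged sketch with a hypothetical enum:

enum Format {
  // Risky: if 0 were FORMAT_JSON, every unset field would silently
  // mean "JSON", and that meaning could never be reclaimed.
  FORMAT_UNSPECIFIED = 0;
  FORMAT_JSON = 1;
  FORMAT_BINARY = 2;
}
Protobuf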

Why Don’t All of Google’s Protobuf Files Do This?

Name prefixes and zero values also teach us an important lesson: because Protobuf names are forever, it’s really hard to fix style mistakes, especially as we collectively get better at using Protobuf.

google.rpc.Code is intended to be source-compatible with very old existing C++ code, so it throws caution to the wind. FieldDescriptorProto.Type doesn’t have a zero value because proto2, which doesn’t have zero-value footguns in its wire format, doesn’t need one. The lesson isn’t just to use Buf’s linter to avoid the known pitfalls; it’s also to remember that even APIs designed by the authors of the language contain unfixable mistakes, so unlike in other programming languages, imitating “existing practice” isn’t always the best strategy.