Mastering Protocol Buffers: Schema-first API Design for Multi-language Projects

Programming tutorial - IT technology blog

Last year, our team was maintaining three microservices — one in Go, one in Python, and a TypeScript frontend — all talking to each other via REST JSON APIs. Every time we changed a field name, someone’s service would silently break at 2 AM. That’s exactly when I started taking Protocol Buffers seriously.

Protobuf is Google’s binary serialization format, but more importantly it is a schema-first contract between services. You define the data structure once in a .proto file, then generate typed client/server code for every language that has a protoc plugin (Go, Python, TypeScript, Java, and many others). No more “what did you name that field again?” moments.

Quick Start: Your First .proto File in 5 Minutes

Get the toolchain installed first. On most systems this takes under a minute.

Install protoc

# macOS
brew install protobuf

# Ubuntu/Debian
sudo apt install -y protobuf-compiler

# Verify
protoc --version  # e.g. libprotoc 25.x; distro packages (apt) are often older

Then install language-specific plugins. I’ll walk through Go and Python — the two I use most day-to-day.

# Go plugins
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest

# Python
pip install grpcio-tools

Write Your First Schema

Create user.proto:

syntax = "proto3";

package user.v1;

option go_package = "github.com/yourorg/protos/user/v1;userv1";

message User {
  string id = 1;
  string email = 2;
  string display_name = 3;
  int64 created_at = 4;
}

message GetUserRequest {
  string id = 1;
}

message GetUserResponse {
  User user = 1;
}

service UserService {
  rpc GetUser(GetUserRequest) returns (GetUserResponse);
  rpc CreateUser(User) returns (User);
}

Generate Code for Both Languages

# Create the output directories first (protoc does not create them)
mkdir -p gen/go gen/python

# Generate Go code
protoc \
  --go_out=./gen/go \
  --go_opt=paths=source_relative \
  --go-grpc_out=./gen/go \
  --go-grpc_opt=paths=source_relative \
  user.proto

# Generate Python code
python -m grpc_tools.protoc \
  -I. \
  --python_out=./gen/python \
  --grpc_python_out=./gen/python \
  user.proto

Done. You now have type-safe data classes and gRPC stubs in both languages from one source of truth. That single .proto file is your API contract.

Deep Dive: How Protobuf Actually Works

Field Numbers Are Sacred

This is the most important concept when working with Protobuf. Each field has a unique integer (1, 2, 3…) — that number is what gets serialized on the wire, not the field name. Once assigned, never change or reuse a number.

message Order {
  string order_id = 1;
  repeated string item_ids = 2;
  // BAD: Never reassign field number 2 after shipping
  // string customer_id = 2;  // conflicts with item_ids!
  string customer_id = 3;     // Always use a new number
}
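To make this concrete, here is a minimal sketch in plain Python (standard library only, not the real protobuf runtime) of how a single string field is laid out on the wire. The helper names encode_varint and encode_string_field are my own, for illustration; the layout itself (a tag built from the field number and wire type, then a length, then the bytes) follows the documented Protobuf encoding.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


LENGTH_DELIMITED = 2  # wire type used for string, bytes, and nested messages


def encode_string_field(field_number: int, text: str) -> bytes:
    """Encode one string field: tag, byte length, UTF-8 payload."""
    payload = text.encode("utf-8")
    tag = (field_number << 3) | LENGTH_DELIMITED
    return encode_varint(tag) + encode_varint(len(payload)) + payload


# Order.order_id (field number 1) set to "A1".
encoded = encode_string_field(1, "A1")
print(encoded.hex())  # 0a024131
```

The name order_id appears nowhere in the output. Only the number 1 does, which is why renaming a field leaves the binary format intact while renumbering it does not.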

Removing Fields Safely

Deleting a field without marking it as reserved is a silent trap. Old clients will keep sending field 2, and nothing stops a later edit from reusing that number for something else entirely, at which point old data gets decoded as the new field.

message Order {
  string order_id = 1;
  reserved 2;           // field number reserved forever
  reserved "item_ids";  // field name reserved forever
  string customer_id = 3;
}
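The failure mode is easier to see with bytes in hand. The sketch below is a toy parser in plain Python, not the protobuf runtime, and it makes a loud simplifying assumption: every tag and length fits in one byte. The bad_schema and good_schema dictionaries are hypothetical stand-ins for two possible versions of Order.

```python
def parse_string_fields(payload: bytes) -> dict[int, str]:
    """Parse a message made only of short string fields.

    Simplifying assumption: field numbers below 16 and values under 128
    bytes, so every tag and every length fits in a single byte.
    """
    fields: dict[int, str] = {}
    pos = 0
    while pos < len(payload):
        tag, length = payload[pos], payload[pos + 1]
        assert tag < 0x80 and length < 0x80, "multi-byte varints not handled"
        assert tag & 0x07 == 2, "only length-delimited fields handled"
        start = pos + 2
        fields[tag >> 3] = payload[start:start + length].decode("utf-8")
        pos = start + length
    return fields


def read(payload: bytes, schema: dict[int, str]) -> dict[str, str]:
    """Keep fields the schema knows by number; skip everything else."""
    parsed = parse_string_fields(payload)
    return {schema[n]: v for n, v in parsed.items() if n in schema}


# Bytes written by an OLD client: order_id (1) = "A1", one item_ids (2) = "X99".
old_payload = bytes.fromhex("0a024131" "1203583939")

bad_schema = {1: "order_id", 2: "customer_id"}   # number 2 reused
good_schema = {1: "order_id", 3: "customer_id"}  # number 2 reserved, 3 is new

print(read(old_payload, bad_schema))   # the item id is read as customer_id
print(read(old_payload, good_schema))  # field 2 is skipped as unknown
```

With the reused number, the old item ID silently becomes a customer ID and nothing raises an error. With the reserved number, the reader simply does not recognize field 2.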

In my experience, discipline around field numbers and reserved is what separates teams that ship smooth upgrades from teams that debug production breakage at midnight.

Quick Type Reference

  • string — UTF-8 text
  • int32 / int64 — signed integers (use int64 for Unix timestamps)
  • bool — true/false
  • bytes — raw binary data
  • float / double — floating point
  • repeated FieldType — equivalent to a list/array
  • map<KeyType, ValueType> — key-value pairs
  • optional FieldType — distinguishes unset from zero value

Advanced Usage: Versioning Across a Real Project

Version in the Package Path from Day One

The single best structural decision I’ve made: embed the version directly in the package path.

protos/
├── user/
│   ├── v1/
│   │   └── user.proto      # package user.v1
│   └── v2/
│       └── user.proto      # package user.v2 — breaking changes land here
├── order/
│   └── v1/
│       └── order.proto
└── buf.yaml

When you need a breaking change (removing a field, changing a type), create v2 and keep v1 alive until all consumers have migrated. This is much cleaner than REST-style /api/v2/users because user.v1.User and user.v2.User are distinct generated types, so the type system enforces the boundary.
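This layout only helps if the package statement and the directory stay in agreement. As a rough sketch, the check can be written in a few lines of Python. The function names and the "protos" root are my assumptions for this example; in practice buf lint covers this with its PACKAGE_DIRECTORY_MATCH rule.

```python
import re
from pathlib import PurePosixPath
from typing import Optional

PACKAGE_RE = re.compile(r"^\s*package\s+([A-Za-z_][\w.]*)\s*;", re.MULTILINE)


def expected_package(proto_path: str, root: str = "protos") -> str:
    """Derive the package name from the file's directory relative to root."""
    relative = PurePosixPath(proto_path).relative_to(root)
    return ".".join(relative.parent.parts)


def declared_package(proto_source: str) -> Optional[str]:
    """Return the package named in the file's package statement, if any."""
    match = PACKAGE_RE.search(proto_source)
    return match.group(1) if match else None


source = 'syntax = "proto3";\n\npackage user.v2;\n'
path = "protos/user/v2/user.proto"

print(expected_package(path))                              # user.v2
print(declared_package(source) == expected_package(path))  # True
```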

Replace protoc with buf

Raw protoc commands get messy fast, especially in CI. buf puts linting, breaking change detection, and code generation behind one CLI, driven by a couple of small YAML config files.

# Install buf
brew install bufbuild/buf/buf

# Initialize in your protos directory
# (buf 1.32+ renames this command to `buf config init`; `buf mod init` is deprecated there)
buf mod init

Create buf.gen.yaml at your project root:

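version: v1
plugins:
  - plugin: go
    out: gen/go
    opt: paths=source_relative
  - plugin: go-grpc
    out: gen/go
    opt: paths=source_relative
  - plugin: python
    out: gen/python
  # grpcio-tools does not put a protoc-gen-grpc-python binary on PATH,
  # so this entry uses the remote plugin hosted on the Buf Schema Registry.
  - plugin: buf.build/grpc/python
    out: gen/python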

Now code generation is a single command:

buf generate

Catch Breaking Changes in CI

This is where buf earns its place in your pipeline. Add this check to your CI workflow:

# Detect breaking changes against the main branch
buf breaking --against '.git#branch=main'

# Or against a published Buf Schema Registry module
buf breaking --against buf.build/yourorg/protos

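If a developer changes a field number, changes a field type, or deletes a field, the build fails. How strict the check is depends on the configured rule set: buf's default FILE rules flag every field deletion, while the looser WIRE_JSON and WIRE rules allow a deletion when the number (and, for WIRE_JSON, the name) has been reserved. Either way, breaking change detection becomes automated rather than relying on manual review.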
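To show the kind of comparison such a check performs, here is a toy version in Python. It is not buf's implementation. MessageSchema and find_breaking_changes are hypothetical names, and the model covers only field numbers, names, types, and reserved numbers.

```python
from dataclasses import dataclass, field


@dataclass
class MessageSchema:
    """Toy schema model: field number -> (name, type), plus reserved numbers."""
    fields: dict[int, tuple[str, str]]
    reserved: set[int] = field(default_factory=set)


def find_breaking_changes(old: MessageSchema, new: MessageSchema) -> list[str]:
    """List changes that would break consumers built against the old schema."""
    problems = []
    for number, (old_name, old_type) in sorted(old.fields.items()):
        if number not in new.fields:
            if number not in new.reserved:
                problems.append(f"field {number} ({old_name}) removed without reserved")
            continue
        new_name, new_type = new.fields[number]
        if new_type != old_type:
            problems.append(f"field {number} changed type {old_type} -> {new_type}")
        if new_name != old_name:
            problems.append(f"field {number} renamed {old_name} -> {new_name}")
    return problems


old = MessageSchema({1: ("order_id", "string"), 2: ("item_ids", "repeated string")})
reused = MessageSchema({1: ("order_id", "string"), 2: ("customer_id", "string")})
safe = MessageSchema(
    {1: ("order_id", "string"), 3: ("customer_id", "string")}, reserved={2}
)

for problem in find_breaking_changes(old, reused):
    print(problem)
print(find_breaking_changes(old, safe))  # []
```

The reused schema reports both a type change and a rename on field 2. The safe schema passes because the number is reserved and the new field takes number 3.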

Optional Fields for Partial Updates

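proto3 dropped the required keyword and, by default, does not track presence for scalar fields, so an unset string looks identical to an empty one. When you need to tell “not provided” apart from “set to empty”, mark the field optional (supported without extra flags since protoc 3.15):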

message UpdateUserRequest {
  string id = 1;
  optional string display_name = 2;  // unset = skip (nil pointer in Go), "" = clear the field
  optional string email = 3;
}
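On the server side, the handler then has to honor that distinction. The sketch below models it with plain dataclasses, so it runs without generated code. The User and UpdateUserRequest classes here are stand-ins, with None playing the role of an unset optional field. With the real generated Python message you would test request.HasField("display_name") instead.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class User:
    id: str
    display_name: str
    email: str


@dataclass(frozen=True)
class UpdateUserRequest:
    """Stand-in for the generated message; None models an unset optional field."""
    id: str
    display_name: Optional[str] = None
    email: Optional[str] = None


def apply_update(user: User, request: UpdateUserRequest) -> User:
    """Copy only the fields the caller explicitly set."""
    changes = {}
    if request.display_name is not None:
        changes["display_name"] = request.display_name
    if request.email is not None:
        changes["email"] = request.email
    return replace(user, **changes)


alice = User(id="abc-123", display_name="Alice", email="alice@example.com")

# display_name is set to "" (clear it); email is left unset (keep it).
updated = apply_update(alice, UpdateUserRequest(id="abc-123", display_name=""))
print(updated)
```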

Practical Tips From the Trenches

Commit Your Generated Code

Teams debate this endlessly. My take: commit it. Code reviews become clearer (reviewers see exactly what changed), non-proto developers don’t need the build toolchain, and deploys stay reproducible.

Add a CI check to verify the generated code stays in sync:

buf generate
# Fails if generated code is stale, including new files that were never committed
git diff --exit-code gen/ && test -z "$(git status --porcelain gen/)"

Quick Sanity Check in Python

# Run from the project root so gen/python resolves as a namespace package
from gen.python import user_pb2

u = user_pb2.User()
u.id = "abc-123"
u.email = "alice@example.com"
u.display_name = "Alice"

data = u.SerializeToString()  # returns bytes, despite the name
print(f"Serialized: {len(data)} bytes")  # Protobuf is compact

u2 = user_pb2.User()
u2.ParseFromString(data)
print(u2.email)  # alice@example.com
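For a rough sense of what "compact" means here, the sketch below hand-encodes the same three string fields and compares the result with minified JSON. It uses only the standard library. The string_field helper is hypothetical and assumes small field numbers and short values, and the email address is a placeholder.

```python
import json


def string_field(number: int, text: str) -> bytes:
    """Encode a short string field (single-byte tag and length only)."""
    raw = text.encode("utf-8")
    assert number < 16 and len(raw) < 128, "sketch handles small fields only"
    return bytes([(number << 3) | 2, len(raw)]) + raw


user = {"id": "abc-123", "email": "alice@example.com", "display_name": "Alice"}

# Field numbers 1-3 match the User message; created_at is left at its
# default of 0, which proto3 does not put on the wire.
proto_bytes = (
    string_field(1, user["id"])
    + string_field(2, user["email"])
    + string_field(3, user["display_name"])
)
json_bytes = json.dumps(user, separators=(",", ":")).encode("utf-8")

print(len(proto_bytes), len(json_bytes))  # 35 67
```

Most of the saving comes from replacing quoted field names with one-byte tags, so the gap widens for messages with many small fields and narrows for messages dominated by long strings.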

When Not to Use Protobuf

Protobuf shines for internal service-to-service communication and high-throughput data pipelines. For public REST APIs consumed by external developers, JSON is still more approachable — the binary format makes debugging harder without extra tooling. The sweet spot is internal gRPC between your own services, where type-safety and performance gains matter most.

Debugging Binary Payloads

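The binary wire format is not human-readable, which trips people up the first time. Keep JSON-format logging on the side for debugging, and use grpc_cli or grpcurl to call endpoints by hand and read the responses as JSON. Both tools need the schema: either enable server reflection on the service or pass the .proto file on the command line.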

# Install grpcurl
brew install grpcurl

# Call a gRPC endpoint like curl (relies on server reflection;
# without it, add: -import-path . -proto user.proto)
grpcurl -plaintext \
  -d '{"id": "abc-123"}' \
  localhost:50051 \
  user.v1.UserService/GetUser
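When all you have is a captured payload and no schema, you can still split it into numbered fields. The following is a minimal sketch of such a decoder in plain Python, with function names of my own choosing. It does not handle the deprecated group wire types, and protoc --decode_raw does the same job more thoroughly.

```python
def read_varint(data: bytes, pos: int) -> tuple[int, int]:
    """Read a base-128 varint at pos; return (value, next position)."""
    result = shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def decode_raw(data: bytes) -> list[tuple[int, str, object]]:
    """Split a payload into (field number, wire type name, raw value)."""
    fields = []
    pos = 0
    while pos < len(data):
        tag, pos = read_varint(data, pos)
        number, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:    # varint: int32, int64, bool, enum
            value, pos = read_varint(data, pos)
            fields.append((number, "varint", value))
        elif wire_type == 1:  # fixed 64-bit: fixed64, sfixed64, double
            fields.append((number, "64-bit", data[pos:pos + 8]))
            pos += 8
        elif wire_type == 2:  # length-delimited: string, bytes, messages
            length, pos = read_varint(data, pos)
            fields.append((number, "bytes", data[pos:pos + length]))
            pos += length
        elif wire_type == 5:  # fixed 32-bit: fixed32, sfixed32, float
            fields.append((number, "32-bit", data[pos:pos + 4]))
            pos += 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
    return fields


# A User with id = "abc-123" (field 1) and created_at = 300 (field 4).
payload = bytes.fromhex("0a076162632d313233" "20ac02")
for entry in decode_raw(payload):
    print(entry)
```

The output carries field numbers and raw values only. Mapping number 4 back to created_at still requires the .proto file, which is one more reason to keep the schema under version control.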

Start with one service, get comfortable with the schema-first workflow, then expand. The investment pays off the first time your team ships a breaking schema change that gets caught in CI before it ever touches production — and that will happen sooner than you expect.
