This document describes how the PBJ compiler Gradle plugin transforms .proto schema files into Java source code.
The PBJ compiler is a Gradle plugin (com.hedera.pbj.pbj-compiler) that parses Protocol Buffer 3 schema files using an ANTLR4 grammar and generates Java source code. For each protobuf message, it produces up to five Java files: a model record, a schema class, a protobuf codec, a JSON codec, and a unit test. For enums and services, it generates a single file each.
The pipeline has three phases:
- Global analysis — scan all proto files (sources + classpath) to build lookup tables for packages, types, and imports
- Parse — lex and parse each source proto file into an ANTLR parse tree
- Generate — walk each top-level definition and emit Java source code via generators
The plugin (PbjCompilerPlugin implements Plugin<Project>) performs these setup steps during the configuration phase:
- Registers a `PbjExtension` exposing the `pbj { }` DSL block with two options:
  - `javaPackageSuffix`: optional suffix appended to derived package names (e.g., `".pbj"`)
  - `generateTestClasses`: boolean (default `true`) controlling test generation
- Registers a `PbjProtobufExtractTransform` artifact transform that extracts `.proto` files from JAR dependencies on the compile classpath
- For each source set, creates a virtual `PbjSourceDirectorySet` pointing to `src/<sourceSet>/proto` and registers a `PbjCompilerTask`
The main source set generates model/codec/schema into build/generated/source/pbj-proto/main/java and tests into build/generated/source/pbj-proto/test/java. The generated source directories are wired as inputs to the Java compile task, so generation happens automatically before compilation.
The task (extends SourceTask) defines:
- `@InputFiles`: proto source files + extracted classpath protos
- `@OutputDirectory`: main and test output directories
- `@TaskAction perform()`: clears output dirs, then delegates to `PbjCompiler.compileFilesIn()`
The grammar file Protobuf3.g4 (package com.hedera.hashgraph.protoparser.grammar) defines the full proto3 syntax. Notable additions beyond the standard spec:
- `DOC_COMMENT` tokens preserve `/** ... */` documentation comments through to generated Javadoc
- `OPTION_LINE_COMMENT` tokens capture PBJ-specific option comments: `// <<<pbj.java_package = "...">>>`
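To make the option comment format concrete, here is a minimal sketch that pulls the option value out of such a line with a regular expression. This is an assumption for illustration only: the compiler captures these comments as `OPTION_LINE_COMMENT` tokens in the grammar, and the helper class `PbjOptionComment` below is hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of the PBJ code base. */
final class PbjOptionComment {
    // Matches lines shaped like: // <<<pbj.some_option = "value">>>
    private static final Pattern OPTION = Pattern.compile(
            "//\\s*<<<\\s*(pbj\\.[A-Za-z_]+)\\s*=\\s*\"([^\"]*)\"\\s*>>>");

    /** Returns the quoted value if the line carries the named option. */
    static Optional<String> valueOf(String line, String optionName) {
        Matcher m = OPTION.matcher(line);
        if (m.find() && m.group(1).equals(optionName)) {
            return Optional.of(m.group(2));
        }
        return Optional.empty();
    }
}
```

For example, `PbjOptionComment.valueOf(line, "pbj.java_package")` yields the package string for a matching comment and an empty `Optional` for any other line.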
Phase 1 — LookupHelper construction:
Before any code generation, PbjCompiler builds a LookupHelper by parsing every proto file (both source files and classpath dependencies). This pre-scan builds several lookup maps:
| Map | Key | Value |
|---|---|---|
| `pbjPackageMap` | Fully qualified proto name | Java package for PBJ model classes |
| `pbjCompleteClassMap` | Fully qualified proto name | Complete Java class name (including outer class for nested types) |
| `protocPackageMap` | Fully qualified proto name | Java package for protoc-generated classes |
| `enumNames` | n/a | Set of all fully qualified enum names |
| `comparableFieldsByMsg` | Message name | List of comparable field names |
The LookupHelper resolves Java packages using a priority chain (PBJ comment option → per-definition options → standard java_package + suffix → proto package + suffix). For the full resolution rules, see protobuf-and-schemas.md.
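The priority chain can be read as "first available source wins". The following is a minimal sketch under that assumption; the class `PackageResolver` and its parameter names are hypothetical, and the actual `LookupHelper` logic may differ in detail.

```java
import java.util.Objects;
import java.util.stream.Stream;

/** Hypothetical sketch of the priority chain; names are illustrative only. */
final class PackageResolver {
    /**
     * Picks the first non-null candidate in priority order. The suffix is
     * applied only to the java_package option and the proto package.
     */
    static String resolve(String pbjCommentOption, String perDefinitionOption,
                          String javaPackageOption, String protoPackage, String suffix) {
        String sfx = suffix == null ? "" : suffix;
        return Stream.of(
                        pbjCommentOption,
                        perDefinitionOption,
                        javaPackageOption == null ? null : javaPackageOption + sfx,
                        protoPackage == null ? null : protoPackage + sfx)
                .filter(Objects::nonNull)
                .findFirst()
                .orElseThrow(() -> new IllegalStateException("no package information available"));
    }
}
```

With a suffix of `".pbj"`, a file that only declares a proto package `e.f` would resolve to `e.f.pbj`, while a PBJ comment option is used verbatim.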
Phase 2 — Per-file generation:
For each source proto file, a ContextualLookupHelper wraps the global LookupHelper with the current file context. The file is lexed and parsed:
FileInputStream → Protobuf3Lexer → CommonTokenStream → Protobuf3Parser → ProtoContext
Then each topLevelDef is dispatched:
- `messageDef` → create `FileSetWriter` (5 `JavaFileWriter` instances), run all `Generator` implementations, write files
- `enumDef` → `EnumGenerator.generateEnum()` with a single `JavaFileWriter`
- `serviceDef` → `ServiceGenerator.generateService()` with a single `JavaFileWriter`
Rather than building a full AST, the compiler uses lightweight field records extracted directly from ANTLR parse tree contexts. The Field interface defines the contract; three record implementations cover all protobuf field kinds:
Represents a regular field or a sub-field within a oneof. Constructed directly from Protobuf3Parser.FieldContext. Stores:
- `type`: a `FieldType` enum value (see below)
- `fieldNumber`, `name`, `repeated`, `deprecated`
- `messageType` / `completeClassName`: for message and enum references
- `parent`: the `OneOfField` this belongs to, if any
- Package references for model, codec, and test imports
Key methods: parseCode() (Java code to parse this field from protobuf input), javaFieldType(), schemaFieldsDef(), parserFieldsSetMethodCase().
Represents a protobuf oneof block. Contains a list of child Field objects (the variants). Generates an inner enum type (e.g., DataOneOfType with values like ACCOUNT_ID, UNSET) for runtime type discrimination.
Represents a map<K, V> field. Internally decomposed into synthetic keyField and valueField SingleField instances. On the wire, maps are repeated length-delimited entries sorted by key for deterministic encoding.
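As an illustration of the wire layout described above, the sketch below encodes a `map<int32, int32>` field as repeated length-delimited entries (key as field 1, value as field 2, per the standard protobuf map encoding) with keys in ascending order. The class `SortedMapEncoding` is hypothetical and hand-written; it is not the generated codec.

```java
import java.io.ByteArrayOutputStream;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: encodes a map<int32, int32> field with keys in sorted order. */
final class SortedMapEncoding {
    private static final int WIRE_VARINT = 0;
    private static final int WIRE_LENGTH_DELIMITED = 2;

    static byte[] encode(int fieldNumber, Map<Integer, Integer> map) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // TreeMap iterates in ascending key order, so equal maps always yield equal bytes.
        for (Map.Entry<Integer, Integer> e : new TreeMap<>(map).entrySet()) {
            ByteArrayOutputStream entry = new ByteArrayOutputStream();
            writeVarint(entry, (1 << 3) | WIRE_VARINT); // key is field 1 of the entry
            writeVarint(entry, e.getKey());
            writeVarint(entry, (2 << 3) | WIRE_VARINT); // value is field 2 of the entry
            writeVarint(entry, e.getValue());
            // Each entry is written as one length-delimited occurrence of the map field.
            writeVarint(out, (fieldNumber << 3) | WIRE_LENGTH_DELIMITED);
            writeVarint(out, entry.size());
            out.write(entry.toByteArray(), 0, entry.size());
        }
        return out.toByteArray();
    }

    /** Writes a base-128 varint; negative int32 inputs are sign-extended to 64 bits. */
    private static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }
}
```

Because the iteration order is fixed by the keys, two maps with the same contents produce identical bytes regardless of insertion order.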
Maps every protobuf type to its Java representation and wire format:
| FieldType | Java type | Boxed type | Wire type |
|---|---|---|---|
| INT32, UINT32, SINT32 | `int` | `Integer` | VARINT (0) |
| INT64, UINT64, SINT64 | `long` | `Long` | VARINT (0) |
| FLOAT, FIXED32, SFIXED32 | `float`/`int` | `Float`/`Integer` | FIXED32 (5) |
| DOUBLE, FIXED64, SFIXED64 | `double`/`long` | `Double`/`Long` | FIXED64 (1) |
| BOOL | `boolean` | `Boolean` | VARINT (0) |
| STRING | `String` | `String` | LENGTH_DELIMITED (2) |
| BYTES | `Bytes` | `Bytes` | LENGTH_DELIMITED (2) |
| MESSAGE | `Object` | `Object` | LENGTH_DELIMITED (2) |
| ENUM | `int` | `Integer` | VARINT (0) |
| MAP | `Map` | `Map` | LENGTH_DELIMITED (2) |
| ONE_OF | `OneOf` | `OneOf` | n/a |
For repeated fields, FieldType.javaType(true) returns the boxed List<> variant (e.g., List<Integer>).
All message generators implement the Generator interface and are registered in Generator.GENERATORS — a map from generator class to the JavaFileWriter accessor on FileSetWriter:
Map.of(
ModelGenerator.class, FileSetWriter::modelWriter,
SchemaGenerator.class, FileSetWriter::schemaWriter,
CodecGenerator.class, FileSetWriter::codecWriter,
JsonCodecGenerator.class, FileSetWriter::jsonCodecWriter,
TestGenerator.class, FileSetWriter::testWriter
);

Each generator is instantiated via reflection and called with the `MessageDefContext`, a `JavaFileWriter`, and the `ContextualLookupHelper`. Generators build Java code as strings and append to the writer.
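The registry-plus-reflection pattern can be sketched in isolation as follows. All types here (`SketchGenerator`, `SketchWriters`, `SketchRegistry`) are hypothetical stand-ins: the real generators receive a parse tree context, a `JavaFileWriter`, and a lookup helper rather than a message name and a `StringBuilder`.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical stand-in for the Generator interface. */
interface SketchGenerator {
    void generate(String messageName, StringBuilder writer);
}

final class SketchModelGenerator implements SketchGenerator {
    public void generate(String messageName, StringBuilder writer) {
        writer.append("record ").append(messageName).append("() {}");
    }
}

final class SketchSchemaGenerator implements SketchGenerator {
    public void generate(String messageName, StringBuilder writer) {
        writer.append("final class ").append(messageName).append("Schema {}");
    }
}

/** Hypothetical stand-in for FileSetWriter, reduced to two writers. */
record SketchWriters(StringBuilder model, StringBuilder schema) {}

final class SketchRegistry {
    // Maps each generator class to the accessor that selects its writer.
    static final Map<Class<? extends SketchGenerator>, Function<SketchWriters, StringBuilder>> GENERATORS =
            Map.of(
                    SketchModelGenerator.class, SketchWriters::model,
                    SketchSchemaGenerator.class, SketchWriters::schema);

    static void runAll(String messageName, SketchWriters writers) {
        for (var entry : GENERATORS.entrySet()) {
            try {
                // Instantiate the generator reflectively through its no-arg constructor.
                SketchGenerator generator = entry.getKey().getDeclaredConstructor().newInstance();
                generator.generate(messageName, entry.getValue().apply(writers));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot instantiate " + entry.getKey(), e);
            }
        }
    }
}
```

One consequence of this design is that adding a generator means adding a class and one map entry; the dispatch loop itself does not change.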
Output: <MessageName>.java in the base package
Generates a Java record for each protobuf message containing:
- Record fields for each proto field, plus two precomputed fields: `$hashCode` and `$protobufEncodedSize`
- Multiple constructor overloads (with/without `unknownFields`, with enum types or raw `Object` storage)
- Getter methods: `foo()` returns the value (null if absent), `fooOrElse(default)` returns a default for absent fields
- `hashCode()` / `equals()`: fields with default values are excluded so adding new default-valued fields doesn't break existing hash maps
- `toString()`, `compareTo()` (when fields are marked `pbj.comparable`)
- Builder inner class with fluent API (`newBuilder()`, `toBuilder()`)
- OneOf inner enums and typed accessor methods
- Static `PROTOBUF` and `JSON` codec constants
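To give a feel for two of these features, the `fooOrElse` accessor and a hash code that ignores default-valued fields, here is a hand-written approximation. It is not actual PBJ output: the types `SketchAccountId` and `SketchTransfer` are hypothetical, the builder, codecs, and unknown-field handling are omitted, and the hash is computed on each call rather than precomputed.

```java
/** Hypothetical nested message type used only for this illustration. */
record SketchAccountId(long num) {}

/** Hand-written approximation of the accessor and hashing pattern; not actual PBJ output. */
record SketchTransfer(SketchAccountId payer, long amount) {

    /** Returns the payer, or the supplied fallback when the field is absent (null). */
    SketchAccountId payerOrElse(SketchAccountId defaultValue) {
        return payer == null ? defaultValue : payer;
    }

    /** Fields holding their default value are skipped, so they do not influence the hash. */
    @Override
    public int hashCode() {
        int result = 1;
        if (payer != null) result = 31 * result + payer.hashCode();
        if (amount != 0L) result = 31 * result + Long.hashCode(amount);
        return result;
    }
}
```

Under this scheme, a new field added to the message later leaves the hash of existing instances unchanged as long as that field stays at its default.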
Output: <MessageName>Schema.java in the .schema sub-package
Generates static FieldDefinition constants for each field (field number, type, repeated/optional flags) and a getField(int fieldNumber) method for O(1) lookup.
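A schema class of this shape might look roughly like the sketch below. The `SketchFieldDef` record is a simplified, hypothetical stand-in for `FieldDefinition`, and implementing the lookup as a `switch` on the field number is an assumption about how the constant-time lookup is achieved.

```java
/** Simplified, hypothetical stand-in for FieldDefinition. */
record SketchFieldDef(String name, int number, boolean repeated) {}

/** Hand-written illustration of a schema class for a message with two fields. */
final class ExampleSchema {
    static final SketchFieldDef ID = new SketchFieldDef("id", 1, false);
    static final SketchFieldDef TAGS = new SketchFieldDef("tags", 2, true);

    /** Returns the definition for a field number, or null if the number is unknown. */
    static SketchFieldDef getField(int fieldNumber) {
        return switch (fieldNumber) {
            case 1 -> ID;
            case 2 -> TAGS;
            default -> null;
        };
    }
}
```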
Output: <MessageName>ProtoCodec.java in the .codec sub-package
Implements the Codec<T> interface for protobuf binary serialization. The generator delegates to specialized sub-generators:
| Sub-generator | Method generated | Purpose |
|---|---|---|
| `CodecParseMethodGenerator` | `parse(ReadableSequentialData, ...)` | Deserialize from protobuf binary |
| `CodecWriteMethodGenerator` | `write(T, WritableSequentialData)` | Serialize to protobuf binary |
| `CodecWriteByteArrayMethodGenerator` | `write(T) → byte[]` | Serialize to byte array |
| `CodecMeasureDataMethodGenerator` | `measure(T)` | Compute serialized size |
| `CodecMeasureRecordMethodGenerator` | `measureRecord(T)` | Record-based size measurement |
| `CodecFastEqualsMethodGenerator` | `fastEquals(T, T)` | Optimized equality check |
| `CodecDefaultInstanceMethodGenerator` | `getDefaultInstance()` | Singleton default instance |
| `LazyGetProtobufSizeMethodGenerator` | `getProtobufSize()` | Lazy size computation for model |
The parse method uses a switch over protobuf tags ((fieldNumber << 3) | wireType) to dispatch to field-specific parsing logic. Maps are sorted by key on write for deterministic encoding.
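The tag arithmetic and the switch-based dispatch can be shown with a small stand-alone sketch. The field numbers and the class `TagDispatch` are made up for illustration; a generated parser would read the field value in each case instead of returning a description.

```java
/** Illustrative sketch of tag construction and dispatch; field numbers are made up. */
final class TagDispatch {
    static final int WIRE_VARINT = 0;
    static final int WIRE_LENGTH_DELIMITED = 2;

    /** Combines a field number and a wire type into a protobuf tag. */
    static int tag(int fieldNumber, int wireType) {
        return (fieldNumber << 3) | wireType;
    }

    /** Dispatches on the whole tag, so a wrong wire type falls through to the default case. */
    static String describe(int tag) {
        switch (tag) {
            case (1 << 3) | WIRE_VARINT:
                return "field 1 (varint)";
            case (2 << 3) | WIRE_LENGTH_DELIMITED:
                return "field 2 (length-delimited)";
            default:
                // The low 3 bits hold the wire type, the remaining bits the field number.
                return "unknown field " + (tag >>> 3) + ", wire type " + (tag & 0x7);
        }
    }
}
```

Since the case labels are compile-time constants, the compiler can turn the switch into a jump table or lookup rather than a chain of comparisons.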
Output: <MessageName>JsonCodec.java in the .codec sub-package
Implements Codec<T> for JSON serialization/deserialization. Structured similarly to CodecGenerator with:
- `JsonCodecParseMethodGenerator`: JSON deserialization
- `JsonCodecWriteMethodGenerator`: JSON serialization
Output: <MessageName>Test.java in the .tests sub-package (test source set)
Generates JUnit 5 parameterized tests covering:
- Round-trip serialization (model → bytes → model) for both protobuf and JSON codecs
- Equality and hash code verification
- Unknown fields handling
- Compatibility with Google protoc-generated classes
Output: <EnumName>.java in the base package
Generates a Java enum with:
- A constant for each proto enum value
- `fromProtobufOrdinal(int)`: maps wire value to enum constant
- `toProtobufOrdinal()`: maps enum constant to wire value
- `@Deprecated` annotations where specified in the proto schema
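A generated enum of this kind could look roughly like the following hand-written sketch. The enum name and its constants are hypothetical, and throwing on an unknown ordinal is an assumption; the generated code may handle unrecognized values differently.

```java
/** Hand-written illustration of the enum shape; names and values are hypothetical. */
enum SketchStatus {
    UNKNOWN(0),
    ACTIVE(1),
    @Deprecated
    SUSPENDED(2);

    private final int protoOrdinal;

    SketchStatus(int protoOrdinal) {
        this.protoOrdinal = protoOrdinal;
    }

    /** Maps the enum constant to its wire value. */
    int toProtobufOrdinal() {
        return protoOrdinal;
    }

    /** Maps a wire value to the enum constant (assumed here to reject unknown values). */
    static SketchStatus fromProtobufOrdinal(int ordinal) {
        return switch (ordinal) {
            case 0 -> UNKNOWN;
            case 1 -> ACTIVE;
            case 2 -> SUSPENDED;
            default -> throw new IllegalArgumentException("Unknown protobuf ordinal " + ordinal);
        };
    }
}
```

Storing the wire value explicitly, instead of relying on `ordinal()`, keeps the mapping correct even when proto enum numbers are sparse or reordered.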
Output: <ServiceName>ServiceInterface.java in the base package
Generates a Java interface extending ServiceInterface with:
- `SERVICE_NAME` and `FULL_NAME` constants
- A `Method` inner enum listing all RPC methods
- Default method implementations for each RPC (throwing `UnsupportedOperationException`)
- An `open()` routing method that dispatches by method enum to the correct handler
- Support for all four gRPC call types: unary, client-streaming, server-streaming, and bidirectional
- An inner `Client` class implementing the interface via `GrpcClient`
A single .java file accumulator. Generators call addImport() to register imports and append() to build the class body as a string. When writeFile() is called, it assembles the final file:
// SPDX-License-Identifier: Apache-2.0
package <package>;
import <sorted imports>;
<accumulated class body>
A record holding five JavaFileWriter instances (model, schema, codec, jsonCodec, test) for a single message. Created by FileSetWriter.create() which resolves output paths and packages for each file type. After all generators run, writeAllFiles() writes them all to disk.
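The accumulate-then-assemble behavior of a single writer can be sketched as below. `SketchFileWriter` is a simplified, hypothetical stand-in: it returns the assembled text instead of writing to disk, and the exact blank-line layout is an assumption.

```java
import java.util.Set;
import java.util.TreeSet;

/** Simplified stand-in for a single-file writer; the real class also resolves paths and does I/O. */
final class SketchFileWriter {
    private final String packageName;
    private final Set<String> imports = new TreeSet<>(); // sorted and de-duplicated
    private final StringBuilder body = new StringBuilder();

    SketchFileWriter(String packageName) {
        this.packageName = packageName;
    }

    void addImport(String fullyQualifiedName) {
        imports.add(fullyQualifiedName);
    }

    void append(String code) {
        body.append(code);
    }

    /** Assembles header, package, sorted imports, and the accumulated body. */
    String assemble() {
        StringBuilder sb = new StringBuilder();
        sb.append("// SPDX-License-Identifier: Apache-2.0\n");
        sb.append("package ").append(packageName).append(";\n\n");
        for (String i : imports) {
            sb.append("import ").append(i).append(";\n");
        }
        sb.append('\n').append(body);
        return sb.toString();
    }
}
```

Collecting imports separately from the body is what lets generators register an import at the point where they emit code that needs it, without caring about file order.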
For a message with base package com.example.proto:
com/example/proto/
├── MessageName.java (model)
├── schema/
│ └── MessageNameSchema.java (schema)
├── codec/
│ ├── MessageNameProtoCodec.java (protobuf codec)
│ └── MessageNameJsonCodec.java (JSON codec)
└── tests/ (test source set)
└── MessageNameTest.java (unit tests)
File naming is controlled by constants in FileAndPackageNamesConfig:
| File type | Class suffix | Sub-package |
|---|---|---|
| Model | (none) | (base) |
| Schema | `Schema` | `schema` |
| Protobuf Codec | `ProtoCodec` | `codec` |
| JSON Codec | `JsonCodec` | `codec` |
| Test | `Test` | `tests` |
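The table translates directly into a naming rule. The sketch below encodes it as an enum; `SketchFileType` is hypothetical and is not the actual `FileAndPackageNamesConfig` API, which exposes these values as constants.

```java
/** Hypothetical encoding of the naming table; not the FileAndPackageNamesConfig API. */
enum SketchFileType {
    MODEL("", ""),
    SCHEMA("Schema", "schema"),
    PROTO_CODEC("ProtoCodec", "codec"),
    JSON_CODEC("JsonCodec", "codec"),
    TEST("Test", "tests");

    private final String classSuffix;
    private final String subPackage;

    SketchFileType(String classSuffix, String subPackage) {
        this.classSuffix = classSuffix;
        this.subPackage = subPackage;
    }

    /** Builds the fully qualified class name for a message in the given base package. */
    String qualifiedName(String basePackage, String messageName) {
        String pkg = subPackage.isEmpty() ? basePackage : basePackage + "." + subPackage;
        return pkg + "." + messageName + classSuffix;
    }
}
```

For a message `MessageName` in `com.example.proto`, `SCHEMA.qualifiedName(...)` gives `com.example.proto.schema.MessageNameSchema`, matching the directory layout shown earlier.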
The generators use direct string construction (StringBuilder via JavaFileWriter.append()) rather than templates or AST manipulation. Java source code is built by concatenating string literals, formatted blocks (often using text blocks with .indent()), and field-specific code fragments produced by Field method calls like parseCode(), schemaFieldsDef(), and parserFieldsSetMethodCase().
This approach is simple and keeps all generation logic visible in the generator classes, but means the generators must manually manage indentation, imports, and syntax correctness.
Nested messages (messages defined inside other messages) are detected via Generator.isInner(), which walks up the ANTLR parse tree looking for a parent MessageDefContext. Inner messages are generated as static inner classes within the outer message's model file. The JavaFileWriter abstraction allows inner type generators to append their output to the same writer as the outer type.
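The parent walk can be illustrated without ANTLR using a minimal tree. The node classes below are hypothetical stand-ins for parse tree contexts (`SketchMessageNode` plays the role of `MessageDefContext`); the real check operates on ANTLR's context objects.

```java
/** Minimal stand-in for a parse tree context; hypothetical, for illustration only. */
class SketchTreeNode {
    final SketchTreeNode parent;

    SketchTreeNode(SketchTreeNode parent) {
        this.parent = parent;
    }
}

/** Stand-in for a message definition context. */
final class SketchMessageNode extends SketchTreeNode {
    SketchMessageNode(SketchTreeNode parent) {
        super(parent);
    }
}

final class InnerCheck {
    /** Returns true if any ancestor of the node is a message definition. */
    static boolean isInner(SketchTreeNode node) {
        for (SketchTreeNode p = node.parent; p != null; p = p.parent) {
            if (p instanceof SketchMessageNode) {
                return true;
            }
        }
        return false;
    }
}
```

The walk starts at the parent rather than at the node itself, so a top-level message is not mistaken for an inner one.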
The PbjProtobufExtractTransform Gradle artifact transform extracts .proto files from JAR dependencies. This allows proto files in one module to import proto definitions from another module's published JAR. The extracted protos are passed to PbjCompiler as classpath files — they are parsed for type resolution in the LookupHelper but no code is generated for them (code generation only runs for source files).