Scope: generation
Executive Summary
Add first-class Avro schema generation to Microsmith, including an intuitive Kotlin DSL, deterministic .avsc emission, Avro-specific validation, and fixture/documentation coverage that makes Avro a supported schema-generation target alongside the existing schema formats.
Problem Statement
Microsmith already supports schema-driven generation workflows, but it does not currently support Avro. That blocks users working with Kafka, schema-registry, analytics, data-platform, and event-contract ecosystems from using Microsmith as their primary schema authoring surface. Without native Avro support, they have to maintain Avro schemas manually or introduce a separate transformation step, which weakens Microsmith's value as a multi-format schema-generation tool.
Objectives
- Add first-class Avro schema generation as a supported Microsmith target.
- Provide an ergonomic, authoring-first Kotlin DSL for Avro under the existing Microsmith scripting model.
- Keep the DSL intuitive and Kotlin-idiomatic rather than mirroring raw Avro JSON structure.
- Emit valid, deterministic .avsc schema files suitable for downstream tooling and code review.
- Validate Avro-specific constraints before writing invalid output.
- Document the Avro authoring and generation contract end-to-end.
Non-Objectives
- Avro RPC/protocol generation (.avpr) in the first release.
- Avro IDL (.avdl) parsing or emission in the first release.
- Language-specific source-code generation from Avro schemas in this issue.
- Schema-registry publication, remote compatibility checks, or serializer integrations in this issue.
- Designing the DSL as a thin Kotlin wrapper around Avro JSON.
Functional Requirements
- Add a new Avro schema target under the Microsmith schema DSL.
- Support the Avro named types required for a practical first release:
- records
- enums
- fixed types
- Support the field/container/reference constructs required for a practical first release:
- primitive types
- named-type references
- arrays
- maps
- unions
- nullable fields via union modeling
- Support practical Avro metadata and compatibility surfaces:
- namespace
- doc
- aliases
- default values
- field order where applicable
- logical types where supported by the chosen internal model
- Emit .avsc files as the canonical output for the first implementation.
- Default generated Avro output should land in a repository-level avro/ directory unless explicitly configured otherwise.
- File naming must be deterministic and based on Avro named-type identity.
- Explicit user-configured output paths must continue to override defaults.
- Generation must preserve stable ordering for fields, enum symbols, union branches where policy permits, and file emission order.
- The generator must reject or clearly diagnose unsupported or invalid Avro constructs instead of silently emitting broken schemas.
- The generator must support cross-type references and namespace-qualified resolution within a generation run.
- The generator must support multiple namespaces within a single Avro authoring surface.
- The generator must support multi-file output for repositories that model multiple Avro named types.
- The generator must define and document how shared named types are emitted and referenced across files.
DSL Requirements
- Add a dedicated Avro surface under schemas { avro { ... } }.
- The DSL must be authoring-first and ergonomic, not a thin wrapper around Avro JSON.
- The DSL should follow the same ergonomic standard as the existing Microsmith DSLs, preferring intuitive helpers over low-level constructor-style APIs.
- The first release should optimize for readability and explicit intent over one-to-one fidelity with Avro JSON layout.
Namespace Authoring
- Namespace blocks should use an idiomatic Kotlin DSL form as the primary syntax, for example:
"com.example.common" { ... }
"com.example.events" { ... }
- A named namespace("...") { ... } helper may exist as a secondary API if implementation convenience requires it, but the primary documented DSL should prefer the string-invoke form.
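The string-invoke form can be supported with a member extension operator. The following is only a minimal sketch: AvroScope, NamespaceScope, and the avro(...) entry point are hypothetical names chosen for illustration, not existing Microsmith APIs.

```kotlin
// Hypothetical sketch; none of these names are taken from Microsmith itself.
class NamespaceScope(val name: String) {
    val declaredTypes = mutableListOf<String>()

    // Records the fully-qualified name of a declared record.
    fun record(name: String) {
        declaredTypes += "${this.name}.$name"
    }
}

class AvroScope {
    // LinkedHashMap keeps namespaces in declaration order.
    val namespaces = linkedMapOf<String, NamespaceScope>()

    // Primary form: "com.example.common" { ... }
    operator fun String.invoke(block: NamespaceScope.() -> Unit) {
        namespaces.getOrPut(this) { NamespaceScope(this) }.apply(block)
    }

    // Secondary helper form, delegating to the same implementation.
    fun namespace(name: String, block: NamespaceScope.() -> Unit) = name.invoke(block)
}

fun avro(block: AvroScope.() -> Unit): AvroScope = AvroScope().apply(block)

val model = avro {
    "com.example.common" { record("Address") }
    namespace("com.example.orders") { record("OrderPlaced") }
}
```

Because both forms share one code path, the helper cannot drift from the documented primary syntax.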
Named Types And Aliases
- Named declarations should support variadic aliases directly in the declaration signature where practical, for example:
record("Address", "PostalAddress") { ... }
enum("CountryCode", "IsoCountryCode", "LegacyCountryCode") { ... }
fixed("Decimal128", "MoneyBytes") { ... }
- Block-level aliases(...) helpers may still exist where they improve composability, especially for field aliases, but common named-type aliases should not require a separate nested call.
- Named types should be declared top-level within avro { ... } or namespace blocks in the first release rather than allowing arbitrarily nested named-type declarations inside records.
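One way to get aliases into the declaration signature is a vararg parameter placed between the name and the trailing block. This is a sketch under assumptions: NamedType and TypeScope are hypothetical, and the block parameter is simplified to a plain lambda.

```kotlin
// Hypothetical model of a declared named type.
data class NamedType(val kind: String, val name: String, val aliases: List<String>)

class TypeScope {
    val types = mutableListOf<NamedType>()

    // The first argument is the name; any further string arguments are aliases.
    // The trailing lambda may follow the vararg because it is passed outside the parentheses.
    fun record(name: String, vararg aliases: String, block: () -> Unit = {}) =
        declare("record", name, aliases, block)

    fun fixed(name: String, vararg aliases: String, block: () -> Unit = {}) =
        declare("fixed", name, aliases, block)

    private fun declare(kind: String, name: String, aliases: Array<out String>, block: () -> Unit) {
        types += NamedType(kind, name, aliases.toList())
        block()
    }
}

fun declareTypes(block: TypeScope.() -> Unit): List<NamedType> = TypeScope().apply(block).types
```

A call such as record("Address", "PostalAddress") { ... } then yields a record named Address with a single alias, while record("Money") declares none.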
Fields, Types, And Containers
- The DSL should support ergonomic, type-first field helpers where practical, for example:
string("name")
int("quantity") { default(1) }
string("line2") { nullable(); default(null) }
ref("country", "CountryCode")
array("tags", string)
map("attributes", string)
- For Avro arrays and maps, the preferred first-release form is the concise field-first shape:
array("tags", string)
map("attributes", string)
- The issue should not require extra nested container-type blocks for normal array/map declarations unless a later design review finds a strong need for that complexity.
- The DSL should prefer modifier-style nullability such as nullable() inside a field block rather than exploding the API into nullableString(...), nullableInt(...), and similar variants.
- nullable() must be documented as syntactic sugar for a deterministic Avro union shape, not as a separate schema concept.
- Where a dedicated field helper does not make sense, the DSL may still accept an explicit schema expression, but that should not be the primary ergonomic path.
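To illustrate nullable() as sugar for a union, the sketch below lowers a field to its JSON shape. FieldBuilder is hypothetical, and the ordering policy is an assumption, not a decision made by this issue: null comes first unless a non-null default is present, so that the default always matches the first branch.

```kotlin
// Hypothetical sketch; string values are not JSON-escaped here for brevity.
class FieldBuilder(private val name: String, private val baseType: String) {
    private var isNullable = false
    private var hasDefault = false
    private var defaultValue: Any? = null

    fun nullable() { isNullable = true }
    fun default(value: Any?) { hasDefault = true; defaultValue = value }

    fun toJson(): String {
        if (hasDefault && defaultValue == null) {
            require(isNullable) { "Field '$name': default(null) requires nullable()" }
        }
        // Assumed policy: the branch matching the default is emitted first.
        val type = when {
            !isNullable -> "\"$baseType\""
            hasDefault && defaultValue != null -> "[\"$baseType\", \"null\"]"
            else -> "[\"null\", \"$baseType\"]"
        }
        val defaultPart = if (hasDefault) ", \"default\": " + literal(defaultValue) else ""
        return "{\"name\": \"$name\", \"type\": $type$defaultPart}"
    }

    private fun literal(value: Any?): String = when (value) {
        null -> "null"
        is String -> "\"$value\""
        else -> value.toString()
    }
}

fun string(name: String, block: FieldBuilder.() -> Unit = {}): String =
    FieldBuilder(name, "string").apply(block).toJson()
```

Whatever policy is finally chosen, deriving the union from nullable() in one place keeps the emitted shape deterministic and easy to document.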
References, Unions, And Defaults
- The DSL must support both namespace-local references and fully-qualified cross-namespace references.
- The DSL must support references to previously declared named types.
- The issue must explicitly define whether forward references within the same Avro block are supported; if they are not, they must fail with clear diagnostics rather than behaving ambiguously.
- The DSL must define a deterministic reference-resolution contract. The preferred first-release rule is:
- unqualified references resolve within the current namespace
- fully-qualified references resolve globally
- ambiguous or unresolved references fail validation
- The DSL must make nullable fields and unions explicit enough to avoid ambiguous or invalid Avro generation.
- The DSL must define and document the emitted branch ordering policy for unions, especially nullable unions.
- The DSL must define and document how defaults are validated against the final emitted Avro schema shape, especially for union-backed fields.
- The DSL must reject nested unions and duplicate effective union branches unless there is a strong, documented reason to support them.
- The DSL must allow defaults to be authored in a typed, readable way where practical, rather than forcing end-users to hand-construct JSON literals.
- The DSL should provide an ergonomic built-in empty marker for empty array/map defaults rather than requiring Kotlin implementation details such as emptyList<Any>() or emptyMap<String, String>() in normal authoring.
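The preferred resolution rule above can be sketched as a small resolver. ReferenceResolver is a hypothetical name. The sketch assumes resolution runs after every declaration in the run has been collected, which would make forward references resolvable; the issue still has to decide that point explicitly.

```kotlin
// Hypothetical sketch of the preferred first-release resolution rule.
class ReferenceResolver(declaredFullNames: Collection<String>) {
    private val fullNames = declaredFullNames.toSet()

    fun resolve(ref: String, currentNamespace: String): String {
        // A dot marks a fully-qualified reference; otherwise qualify with the current namespace.
        val candidate = if ('.' in ref) ref else "$currentNamespace.$ref"
        require(candidate in fullNames) {
            "Unresolved Avro reference '$ref' (looked for '$candidate')"
        }
        return candidate
    }
}
```

Since unqualified names never search other namespaces, a reference such as ref("unitPrice", "Money") inside com.example.orders fails instead of silently binding to com.example.common.Money.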
Enum Ergonomics
- Enum bodies should support concise, idiomatic value declaration styles, with both of the following considered valid first-class APIs:
value("GB")
+"GB"
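Both declaration styles can share one implementation by having the unary-plus operator delegate to value(...). A minimal sketch, with EnumScope and enumSymbols as hypothetical names:

```kotlin
// Hypothetical sketch of an enum body scope.
class EnumScope {
    // LinkedHashSet preserves declaration order and detects duplicates.
    private val collected = linkedSetOf<String>()
    val symbols: List<String> get() = collected.toList()

    fun value(symbol: String) {
        require(collected.add(symbol)) { "Duplicate enum symbol '$symbol'" }
    }

    // Enables the +"GB" style inside an enum block.
    operator fun String.unaryPlus() = value(this)
}

fun enumSymbols(block: EnumScope.() -> Unit): List<String> = EnumScope().apply(block).symbols
```

Mixing the two styles in one block preserves declaration order, which matters because enum symbol order is part of the emitted schema.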
Primitive And Logical Types
- The DSL should prefer symbol-style built-in schema values for primitive and fixed logical shapes, for example:
string
boolean
int
long
bytes
uuid
date
timeMillis and/or timeMicros
timestampMillis and/or timestampMicros
- Parameterized shapes may still use builder-style helpers where a singleton symbol would be insufficient, for example decimal(...).
- The DSL should still provide a generic logical-type escape hatch for uncommon or future logical types, but that escape hatch should not be the primary ergonomic path.
- The DSL should prefer symbol-style built-in values over *Type() factory calls for fixed primitive/logical shapes.
- Reified generic sugar may be provided where it improves readability, for example logical<Uuid>("eventId"), but those generic markers should resolve to Microsmith-defined Avro schema markers rather than host-language platform types.
- doc() should remain supported because Avro doc is emitted schema metadata, not just author-side commentary in the .microsmith.kts file.
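Symbol-style built-ins can be modeled as plain values rather than factory calls. In the sketch below, the type names (AvroType, Primitive, Logical) and the toJson function are hypothetical; the pairing of each logical type with its underlying primitive follows the Avro specification.

```kotlin
// Hypothetical schema markers; not host-language platform types.
sealed class AvroType
data class Primitive(val name: String) : AvroType()
data class Logical(val logicalType: String, val base: Primitive) : AvroType()

// Symbol-style primitives.
val string = Primitive("string")
val int = Primitive("int")
val long = Primitive("long")

// Symbol-style logical types, each bound to its underlying primitive.
val uuid = Logical("uuid", string)
val date = Logical("date", int)
val timestampMillis = Logical("timestamp-millis", long)

// Generic escape hatch for uncommon or future logical types.
fun logicalType(name: String, base: Primitive) = Logical(name, base)

fun toJson(type: AvroType): String = when (type) {
    is Primitive -> "\"${type.name}\""
    is Logical -> "{\"type\": \"${type.base.name}\", \"logicalType\": \"${type.logicalType}\"}"
}
```

Binding each symbol to its primitive at definition time means compatibility only needs to be validated for types built through the escape hatch.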
Documentation Examples
- The DSL contract must be documented with at least one non-trivial example.
- README and fixtures must include at least one non-trivial Avro example covering the core first-release contract, including multiple namespaces and cross-namespace references.
Example DSL
microsmith {
    schemas {
        avro {
            "com.example.common" {
                enum("CountryCode", "IsoCountryCode") {
                    doc("ISO 3166-1 alpha-2 country code")
                    +"GB"
                    +"US"
                    +"DE"
                    value("FR")
                }
                fixed("Decimal128", "MoneyBytes") {
                    doc("128-bit fixed-width decimal backing store")
                    size(16)
                    decimal(precision = 19, scale = 4)
                }
                record("Address", "PostalAddress") {
                    doc("Reusable postal address")
                    string("line1")
                    string("line2") {
                        nullable()
                        default(null)
                        order(FieldOrder.IGNORE)
                    }
                    string("city")
                    string("region") {
                        nullable()
                        default(null)
                    }
                    string("postalCode") {
                        aliases("postcode")
                    }
                    ref("country", "CountryCode")
                }
                record("Money") {
                    doc("Currency amount expressed as fixed-point decimal")
                    string("currency")
                    ref("amount", "Decimal128")
                }
            }
            "com.example.identity" {
                record("RegisteredCustomer") {
                    doc("Known customer with a persistent account")
                    logical("customerId", uuid)
                    string("email")
                    string("loyaltyTier") {
                        nullable()
                        default(null)
                    }
                }
                record("GuestCustomer", "AnonymousCustomer") {
                    doc("Checkout identity for a guest user")
                    string("email")
                    boolean("marketingOptIn") {
                        default(false)
                    }
                }
            }
            "com.example.orders" {
                enum("OrderStatus") {
                    doc("Lifecycle state for an order")
                    +"PENDING"
                    +"CONFIRMED"
                    +"CANCELLED"
                    +"FULFILLED"
                }
                record("LineItem", "PurchaseLine", "OrderLine") {
                    doc("A single purchasable item on an order")
                    string("sku")
                    int("quantity") {
                        default(1)
                        order(FieldOrder.ASCENDING)
                    }
                    ref("unitPrice", "com.example.common.Money")
                    array("tags", string) {
                        default(empty)
                    }
                    map("attributes", string) {
                        default(empty)
                    }
                }
                record("OrderPlaced", "OrderCreated", "OrderSubmitted") {
                    doc("Canonical order-created event for downstream consumers")
                    logical("eventId", uuid)
                    string("orderId") {
                        aliases("externalOrderId")
                    }
                    ref("status", "OrderStatus") {
                        default("PENDING")
                    }
                    union(
                        "customer",
                        ref("com.example.identity.RegisteredCustomer"),
                        ref("com.example.identity.GuestCustomer"),
                    )
                    array("lineItems", ref("LineItem")) {
                        default(empty)
                    }
                    ref("billingAddress", "com.example.common.Address")
                    ref("shippingAddress", "com.example.common.Address") {
                        nullable()
                        default(null)
                    }
                    ref("total", "com.example.common.Money")
                    logical("requestedShipDate", date) {
                        nullable()
                        default(null)
                    }
                    logical("placedAt", timestampMillis) {
                        order(FieldOrder.DESCENDING)
                    }
                    logical("warehouseCutoffLocal", logicalType("local-timestamp-micros", long)) {
                        nullable()
                        default(null)
                    }
                    map("metadata", string) {
                        default(empty)
                        order(FieldOrder.IGNORE)
                    }
                }
            }
        }
    }
}
Generator Semantics And Output Contract
- The implementation must choose and document the canonical Avro output shape for named types.
- The preferred first-release contract is one .avsc file per top-level named type under avro/.
- If shared named types are emitted separately, generated references must remain valid and unambiguous by namespace and name.
- Output must be deterministic across repeated runs when the DSL input is unchanged.
- Output must be review-friendly, including stable JSON field ordering and formatting.
- Output directories and filenames must not depend on nondeterministic iteration order.
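A deterministic file plan can be derived by sorting the fully-qualified names before mapping them to paths. The sketch below assumes one possible naming scheme, avro/<fullname>.avsc; the issue leaves the exact scheme open, and outputPlan is a hypothetical helper.

```kotlin
// Hypothetical sketch: one file per named type, in sorted order, so neither
// filenames nor emission order depend on hash-map iteration order.
fun outputPlan(fullNames: Collection<String>, outputDir: String = "avro"): List<String> =
    fullNames.toSortedSet().map { "$outputDir/${it}.avsc" }
```

For example, the set of com.example.orders.LineItem and com.example.common.Money always yields the Money file first, regardless of declaration or iteration order.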
Validation Requirements
- Validate Avro name and namespace rules.
- Validate enum symbol uniqueness.
- Validate fixed size constraints.
- Validate union legality according to supported policy.
- Validate default values against declared field schema.
- Validate duplicate type names within the same effective namespace.
- Validate reference resolution for named types.
- Validate logical-type compatibility with the underlying primitive type.
- Validate namespace-local and fully-qualified cross-namespace references.
- Emit actionable diagnostics that point back to the user-authored script contract.
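Several of these checks reduce to simple predicates collected into a list of diagnostics. The sketch below covers name, namespace, enum-symbol, and union-branch checks only; the function names are hypothetical, and the name pattern follows the Avro specification's rule for names (a letter or underscore, followed by letters, digits, or underscores).

```kotlin
// Avro name rule: [A-Za-z_] followed by [A-Za-z0-9_]*.
private val AVRO_NAME = Regex("[A-Za-z_][A-Za-z0-9_]*")

fun isValidName(name: String): Boolean = AVRO_NAME.matches(name)

// A namespace is a dot-separated sequence of names; the empty namespace is allowed.
fun isValidNamespace(ns: String): Boolean = ns.isEmpty() || ns.split('.').all(::isValidName)

fun duplicates(items: List<String>): List<String> =
    items.groupingBy { it }.eachCount().filterValues { it > 1 }.keys.toList()

// Hypothetical aggregate check returning every problem found, not just the first.
fun diagnose(
    namespace: String,
    typeName: String,
    enumSymbols: List<String> = emptyList(),
    unionBranches: List<String> = emptyList(),
): List<String> = buildList {
    if (!isValidNamespace(namespace)) add("Invalid Avro namespace '$namespace'")
    if (!isValidName(typeName)) add("Invalid Avro name '$typeName'")
    duplicates(enumSymbols).forEach { add("Duplicate enum symbol '$it'") }
    duplicates(unionBranches).forEach { add("Duplicate union branch '$it'") }
}
```

Returning all diagnostics at once lets a user fix a script in one pass; a real implementation would also attach the script location of each offending declaration.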
Non-Functional Requirements
- Emitted schemas must be deterministic and reproducible.
- Generation must be fast enough for normal repository authoring loops.
- The implementation must be maintainable and align with current Microsmith schema-module patterns.
- Formatting of generated Avro JSON must be stable and human-reviewable.
Security Considerations
- Output path handling must not permit writes outside the intended repository root.
- Validation diagnostics must avoid leaking unrelated filesystem or environment state.
- The generator must fail safely on invalid inputs rather than emitting misleading artifacts.
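The path-containment requirement can be sketched with java.nio.file normalization. resolveOutput is a hypothetical helper. It assumes purely lexical checking, so symlinks inside the repository are not followed or verified here.

```kotlin
import java.nio.file.Path
import java.nio.file.Paths

// Hypothetical sketch: resolve a configured output path and reject anything
// that normalizes to a location outside the repository root.
fun resolveOutput(repoRoot: Path, configured: String): Path {
    val root = repoRoot.toAbsolutePath().normalize()
    val target = root.resolve(configured).normalize()
    // The message echoes only the user-supplied value, not absolute filesystem paths.
    require(target.startsWith(root)) {
        "Avro output path '$configured' escapes the repository root"
    }
    return target
}
```

An absolute configured path is returned unchanged by resolve, so it is rejected by the same check unless it already lies under the root.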
Operational Readiness
- Add README coverage for Avro support, file types, output contract, and DSL examples.
- Add representative fixtures for Avro generation.
- Add troubleshooting guidance for the most common Avro authoring mistakes.
- Ensure users understand that first-release scope is .avsc generation, not .avdl or .avpr.
Backward Compatibility And Migration
- This is a new capability and should not change existing generation behavior for other schema targets.
- Any new default output directory for Avro must remain isolated to the Avro target.
- Future Avro protocol or IDL support must be additive and not break .avsc generation contracts.
Observability And Metrics
- Fixture pass rate for Avro generation scenarios.
- Validation failure coverage for common invalid-schema cases.
- Determinism checks across repeated generation runs.
- Documentation issue rate for Avro onboarding after release.
Risks And Mitigations
- Avro union/default rules are easy to get wrong: mitigate with strict validation and explicit DSL modeling.
- Namespace/reference semantics may create subtle bugs: mitigate with dedicated cross-reference fixtures.
- Output-shape churn could frustrate adopters: define the file contract early and document it clearly.
- Scope creep into Avro protocol/IDL/codegen could delay delivery: keep first release limited to .avsc generation.
- DSL ergonomics can drift toward JSON-shaped boilerplate: mitigate by explicitly preferring intuitive helper-based authoring over low-level field constructor APIs.
Acceptance Criteria
- Microsmith supports schemas { avro { ... } } as a documented schema-generation target.
- Users can declare records, enums, fixed types, arrays, maps, unions, nullability, defaults, docs, aliases, namespaces, cross-namespace references, and logical types through an ergonomic DSL.
- The Avro DSL supports intuitive helper-style field declarations where practical rather than forcing raw Avro-JSON-shaped authoring.
- The Avro DSL supports string-invoke namespace blocks as the primary documented namespace syntax.
- The Avro DSL supports variadic aliases on named declarations.
- The Avro DSL supports modifier-style nullability such as nullable() inside field blocks.
- Enum values can be declared ergonomically via value("...") and +"...".
- The Avro DSL prefers symbol-style built-in types over *Type() helpers for fixed primitive/logical shapes.
- The Avro DSL uses ref(...) as the primary named-type reference form.
- The Avro DSL uses array("name", type) and map("name", type) as the preferred first-release collection-field syntax.
- The Avro DSL supports an empty marker for collection defaults instead of requiring raw Kotlin collection literals in normal authoring.
- The Avro DSL exposes typed helpers for common logical types and a generic fallback for uncommon logical types.
- The Avro DSL has a documented contract for reference resolution, union ordering, nullability, and default validation.
- Microsmith emits valid, deterministic .avsc output for representative multi-type and multi-namespace fixtures.
- Default Avro output lands in ./avro unless explicitly configured otherwise.
- Invalid Avro constructs are rejected with actionable diagnostics.
- README and fixtures include at least one non-trivial Avro example that demonstrates multiple namespaces, cross-namespace references, ergonomic field/type helpers, and the empty default marker for collection fields where relevant.
- Automated tests cover successful generation, invalid-schema diagnostics, determinism, and reference resolution.
Test Strategy
- Unit tests for Avro model validation and JSON emission.
- Unit tests for typed logical-type helpers and the generic logical-type fallback.
- Unit tests for DSL helper ergonomics and semantic normalization, including nullability, union handling, and empty-default normalization for arrays/maps.
- Integration tests for representative Avro fixtures with cross-type references.
- Integration tests for representative Avro fixtures with multiple namespaces and fully-qualified references.
- Golden-file tests for deterministic .avsc output.
- Negative-path tests for invalid defaults, invalid unions, duplicate names, bad namespaces, and unresolved references.
- Parser-validation tests against a real Avro implementation to confirm emitted schemas are consumable.
- Documentation-snippet verification for the published Avro DSL example.
Dependencies
Definition Of Done
- Avro is implemented as a documented, validated, fixture-covered Microsmith generation target with a first-class ergonomic DSL and deterministic .avsc output contract.