From 3636bd0d0540a98d0454d847a28042010ce667d6 Mon Sep 17 00:00:00 2001 From: Egil Hansen Date: Sat, 23 May 2026 18:42:40 +0000 Subject: [PATCH 01/16] docs(oes): add Egil.Orleans.Messaging API design document MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Comprehensive API design document covering all 8 design sections: - §0: Scope, packaging (one NuGet, split-ready), name settled as Egil.Orleans.Messaging - §1: IStateManager atomic writes with in-flight recovery - §2: Outbox storage placement (co-located on grain state) - §3: Outbox sealed class shape, epoch semantics, fingerprint equality, configurable max depth - §3a: VersionedState for ImmutableArray equality trap - §4: MessageTracker sealed class, dual-dictionary dedup - §5: StreamManager fluent builder facade - §6: Opinionated functional-grain pattern (guidance, not enforced) - §7: OutboxProcessor grain-scoped timer+reminder dispatch with callback-based postmen, per-attempt error tracking - §8: Naming (flat namespace), serialization ([Alias] with fully qualified prefix, sequential [Id]), telemetry conventions All decisions settled via structured grilling sessions. Directory still named Egil.Orleans.CQRS — rename deferred to project scaffold. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- Egil.Orleans.CQRS/api-design.md | 1230 +++++++++++++++++++++++++++++++ 1 file changed, 1230 insertions(+) create mode 100644 Egil.Orleans.CQRS/api-design.md diff --git a/Egil.Orleans.CQRS/api-design.md b/Egil.Orleans.CQRS/api-design.md new file mode 100644 index 0000000..7781837 --- /dev/null +++ b/Egil.Orleans.CQRS/api-design.md @@ -0,0 +1,1230 @@ +# Egil.Orleans.Messaging — API Design + +Working design doc for the abstractions in the `Egil.Orleans.Messaging` +library. + +Each section captures the decided shape, the rationale, and any explicit +non-goals. Decisions are settled top-down via grilling sessions; revisit +only when a new constraint surfaces. 
+ +Status legend: **Settled** = won't change without a new force; **Open** = +still under design; **Deferred** = explicitly postponed past the spike. + +--- + +## 0. Scope & packaging + +**Status:** Settled. + +### What this library is + +A set of composable building blocks for Orleans grains that need: + +1. **Atomic, recoverable state writes** — grain's observable `State` is + never out of sync with what is durably persisted, even on ambiguous + write failures. +2. **Outbox pattern** — durable, co-located message buffer that commits + atomically with business-state changes. +3. **Outbox processing (postman)** — timer + reminder driven dispatch + with retry, telemetry, and failure callbacks. +4. **Receiver-side dedup** — `MessageTracker` that tracks high-water + positions from both outbox senders and Orleans streams. +5. **Stream subscription management** — fluent subscribe/resume/error + facade over Orleans implicit subscriptions. + +### What this library is NOT + +- Not CQRS in the read/write-model-separation sense. No read-model + projections, no query stores, no event sourcing. +- Not a replacement for `IGrainStorage`. It wraps `IPersistentState`, + not the provider layer. + +### Packaging + +One NuGet package for now. Internal boundaries (state management vs +messaging vs streaming) are kept clean so a future split is mechanical. +Split only when dependency weight becomes a concrete problem. + +### Name + +**`Egil.Orleans.Messaging`** — the outbox, postman, dedup, and stream +manager are all messaging infrastructure. `IStateManager` exists to make +the messaging atomic. Messaging is the reason the library exists; safe +state is the enabler. + +--- + +## 1. `IStateManager` — atomic state writes with in-flight recovery + +**Status:** Settled. 
+ +### Goal + +Replace direct grain use of `IPersistentState<T>` with a thin wrapper +that guarantees the grain's observable `State` is never out of sync with +what is durably persisted, even when `WriteStateAsync` fails ambiguously +(timeout, network drop, server 5xx, ETag conflict). + +Grain code injects `IStateManager<T>`; the raw +`IPersistentState<T>` stays internal to the wrapper. This is +non-negotiable — exposing both is the loophole that lets grain authors +read stale `storage.State` after a failed write. + +### Interface + +```csharp +public enum WritePolicy { Concurrent, Force } + +public interface IStateManager<T> where T : class, IEquatable<T> +{ + T State { get; } + Task ReadAsync(); + Task WriteAsync(T newState, WritePolicy policy = WritePolicy.Concurrent); + Task ClearAsync(); +} +``` + +`WritePolicy.Force` is a rare last-resort escape hatch — nulls +`storage.Etag` before writing so the underlying provider skips its +optimistic-concurrency check. Version stamping and the +read-back-on-failure recovery path still run identically. See §3a for why this lives +at the `IStateManager<T>` layer rather than as a settable `Version` +property on `VersionedState`. + +Constraints on `T`: + +- `class` — atomic reference swap of `State` from `[AlwaysInterleave]` + handlers is a single pointer load, no torn reads. +- `IEquatable<T>` — recovery path compares server-side state to + attempted write to decide swallow-vs-rethrow. Records implement this + for free. + +Users pick one of two paths for `T`: + +- **Path A — plain record + structural equality** + (`MyState : IEquatable<MyState>`). User owns `Equals`. Works trivially + for record-of-primitives shapes. Breaks silently if state holds + `ImmutableArray<>` (reference-equality trap) — user must override + `Equals` themselves in that case. +- **Path B — inherit `VersionedState<TSelf>`** + (`MyState : VersionedState<MyState>`). Framework owns equality via a + per-write `Guid Version`. Immune to any collection-equality issues in + the state graph. 
Recommended default for any non-trivial state. + +See §3a for the `VersionedState` base class. + +### Activation: state is auto-populated + +`IStateManager` behaves like `IPersistentState` from the grain's +point of view: by the time `OnActivateAsync` runs, `State` already +reflects what is durably stored. Grain code does **not** call `ReadAsync` +during activation. + +This is achieved by composition rather than reimplementation: the +default `StateManager` wraps an `IPersistentState` facet that +Orleans hydrates during the `SetupState` grain-lifecycle stage. The +manager exposes `State` as a proxy onto that underlying storage, so it +sees the hydrated value the moment Orleans's lifecycle hook fires — +before `OnActivateAsync` is invoked. + +`ReadAsync` is retained for the rare case where the grain wants to +force a re-read of authoritative state mid-activation (e.g. after a +known external mutation). It is not required by the activation flow. + +### `WriteAsync` semantics + +A single default `StateManager` handles both shapes. It branches on +the non-generic `VersionedState` marker (see §3a) at runtime: if `T` +derives from it, the manager stamps a fresh `Guid Version` before every +write and uses that version for the recovery-path equality check; +otherwise it falls back to `T.Equals(...)`. 
+ +```csharp +internal sealed class StateManager<T>(IPersistentState<T> storage) : IStateManager<T> + where T : class, IEquatable<T> +{ + public T State => storage.State; + + public Task ReadAsync() => storage.ReadStateAsync(); + + public async Task WriteAsync(T newState, WritePolicy policy = WritePolicy.Concurrent) + { + var previous = storage.State; + + if (newState is VersionedState versioned) + { + versioned.Version = Guid.CreateVersion7(); + } + + if (policy == WritePolicy.Force) + { + storage.Etag = null; + } + + storage.State = newState; + + try + { + await storage.WriteStateAsync(); + } + catch (Exception ex) + { + var reread = false; + try + { + await storage.ReadStateAsync(); + reread = true; + } + catch + { + // Double failure: the read error is dropped so the write error surfaces. + } + + if (!reread) + { + storage.State = previous; + throw; // rethrows the original write exception, not the read failure + } + + if (ex is not InconsistentStateException && IsEquivalent(storage.State, newState)) + { + return; // write actually persisted; lost-response, swallow + } + + throw; + } + } + + private static bool IsEquivalent(T persisted, T attempted) => + persisted is VersionedState pv && attempted is VersionedState av + ? pv.Version.Equals(av.Version) + : persisted.Equals(attempted); +} +``` + +Behaviour matrix: + +| Failure | After `WriteAsync` returns/throws | +| ---------------------------------- | ------------------------------------------------ | +| Success | `State == newState`, returns | +| Timeout, write actually persisted | `State == newState`, returns (silent recovery) | +| Timeout, write did not persist | `State == server's value`, throws original ex | +| 5xx / transient | Same as timeout — re-read decides | +| `InconsistentStateException` | `State == server's value`, **always rethrows** | +| Re-read also fails (double failure)| `storage.State` reverted, throws original ex | + +### Double failure behaviour + +When both `WriteStateAsync` and the recovery `ReadStateAsync` fail, the +manager reverts `storage.State` to `previous` and rethrows the original write exception. After this +the grain holds correct data but a **stale ETag**. 
The next write +attempt may hit `InconsistentStateException` if the first write actually +persisted. + +**Library does not auto-recover from double failure.** The grain must +call `ReadAsync()` before its next write to refresh the ETag if it +suspects this state. This is a documented contract — the library +surfaces the failure, the grain decides the policy (retry, deactivate, +alert). + +### Notes + +- **No internal `Deactivate` call.** Grain code decides deactivation + policy. +- **Always rethrow on `InconsistentStateException`**, even if equality + matches. A coincidental match would silently swallow a real concurrent + write; the contract "if we threw conflict, your command read stale + data" must hold. +- **`newState`'s reference is mutated for versioned state.** `Version` + is `set` (not `init`), so `versioned.Version = ...` updates the + caller's reference. Documented contract. +- **No `ReferenceEquals` pre-write short-circuit.** Functional + `with { ... }` produces a fresh reference even when no fields changed. + +### Provider-specific implementations + +`IStateManager` wraps an existing `IPersistentState`, not a +replacement for grain storage providers. No new `IGrainStorage` +implementation is introduced. + +Each storage provider we care to optimise for may ship its own +`IStateManager` that: + +- Inspects provider-specific exception types to classify failures as + *DefinitelyDidNotPersist* (skip re-read, revert + rethrow) vs + *UnknownOutcome* (re-read). +- Optionally exploits provider features (conditional writes, blob + versions) to avoid the re-read. + +### Why a wrapper, not extension methods + +The wrapper was debated — recovery logic is stateless, could be an +extension method on `IPersistentState`. But the wrapper does a fourth +thing extensions cannot: **it hides `IPersistentState.State`**. + +`IStateManager.State` exposes only the **committed** snapshot — the +value after the last successful `WriteAsync`. 
During an in-flight write, +`IPersistentState.State` already holds the uncommitted value. Read +methods marked `[AlwaysInterleave]` that access `storage.State` directly +could observe uncommitted state — and if the write fails, they returned +data that never persisted. + +The wrapper is a **concurrency safety boundary**, not just convenience. +Extension methods can't provide this fence because the grain still holds +`IPersistentState` and any method can access `.State` directly. + +### Grain wiring + +```csharp +// User injects IPersistentState as normal, wraps in OnActivateAsync. +[PersistentState("state")] IPersistentState storage; +IStateManager stateManager; + +public override Task OnActivateAsync(CancellationToken ct) +{ + stateManager = storage.AsStateManager(); + // ... +} +``` + +`AsStateManager()` is an extension method on `IPersistentState`. +Internally: + +```csharp +public static IStateManager AsStateManager(this IPersistentState storage) + where T : class, IEquatable +{ + // Future: check DI for IStateManagerFactory to resolve + // provider-specific implementations. + return new StateManager(storage); +} +``` + +No DI registration needed for v1. When provider-specific overrides ship +later, the extension checks DI for a registered `IStateManagerFactory`, +falls back to default `StateManager`. No breaking change. + +--- + +## 2. `Outbox` storage placement + +**Status:** Settled. + +The `Outbox` collection lives **inside the grain's main state +record**. Atomicity is the whole point — a state change and the messages +announcing it commit in one ETag-protected write or neither commits. +Splitting into a separate `IPersistentState` blob would re-introduce +the "told one party, not the other" failure mode. + +### Write amplification + +Co-locating the outbox means every command's `WriteStateAsync` +re-serialises both halves. Accepted. 
+ +Provider-level mitigation is allowed: a storage provider may detect +unchanged sub-graphs and skip writing unchanged bytes. That stays a +provider concern. + +### Non-goal: outbox-only writes + +The grain does not expose "enqueue without committing other state." +Every outbox change rides the same `WriteAsync(newState)` as the +business-state change that produced it. + +--- + +## 3. `Outbox` — message collection inside grain state + +**Status:** Settled. + +### Goal + +`Outbox` is the per-grain durable buffer of messages that have been +*announced* (committed alongside a state change) but not yet *delivered* +(handed off to a postman successfully). It lives as a property on the +grain's state record for atomic writes. + +The collection behaves like `ImmutableArray>` +— read-only iteration, indexer, value semantics, mutators return new +instances — with three additions: + +- A baked-in `Sender` (grain id). +- A monotonic `LatestSequenceNumber` that persists independently of the + message array contents. +- An `Add(T payload)` mutator that owns sequence assignment. + +### Shape + +```csharp +[GenerateSerializer] +public sealed record OutboxSequenceToken( + [property: Id(0)] long SequenceNumber, + [property: Id(1)] GrainId Sender, + [property: Id(2)] DateTimeOffset Timestamp, + [property: Id(3)] DateTimeOffset Epoch); + +[GenerateSerializer] +public sealed record OutboxMessageEnvelope( + [property: Id(0)] OutboxSequenceToken Token, + [property: Id(1)] T Message); + +[GenerateSerializer] +public sealed class Outbox : IReadOnlyList>, IEquatable> +{ + [Id(0)] private readonly GrainId _sender; + [Id(1)] private readonly long _latestSequenceNumber; + [Id(2)] private readonly ImmutableArray> _items; + [Id(3)] private readonly DateTimeOffset? _epoch; + + // Non-persisted; no [Id]. Mutable, registered post-construction. 
+ [NonSerialized] + [JsonIgnore] + private TimeProvider _time = TimeProvider.System; + + internal Outbox( + GrainId sender, + long latestSequenceNumber, + ImmutableArray> items, + DateTimeOffset? epoch) + { + _sender = sender; + _latestSequenceNumber = latestSequenceNumber; + _items = items; + _epoch = epoch; + } + + public static Outbox Empty(GrainId sender) => + new(sender, latestSequenceNumber: 0, items: [], epoch: null); + + public void RegisterTimeProvider(TimeProvider time) => _time = time; + + public GrainId Sender => _sender; + public long LatestSequenceNumber => _latestSequenceNumber; + public DateTimeOffset? Epoch => _epoch; + public int Count => _items.Length; + public bool IsEmpty => _items.IsDefaultOrEmpty; + public OutboxMessageEnvelope this[int index] => _items[index]; + + public IEnumerator> GetEnumerator() + => ((IEnumerable>)_items).GetEnumerator(); + IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); + + public Outbox Add(T message) { /* increments seq, stamps epoch on first call */ } + public Outbox Remove(OutboxSequenceToken token) { /* removes by seq number */ } + public Outbox RemoveRange(IEnumerable tokens) { /* batch remove */ } + public Outbox Clear() { /* removes all items, preserves LatestSequenceNumber + Epoch */ } + + // O(1) fingerprint equality — see below. + public bool Equals(Outbox? other) { /* fingerprint check */ } + public override bool Equals(object? obj) => obj is Outbox o && Equals(o); + public override int GetHashCode() + => HashCode.Combine(_sender, _latestSequenceNumber, _items.Length, _epoch); +} +``` + +### Epoch semantics + +- `Empty(sender)` → `epoch = null`, `LatestSequenceNumber = 0`. +- First `Add()` → stamps `epoch = now`. Persisted with state. +- Subsequent `Add()` → same epoch, incrementing sequence number. +- `Clear()` → removes items, **preserves** `LatestSequenceNumber` and + `Epoch`. This is the normal "postman drained successfully" path. 
+- `Empty(sender)` again → **resets both** epoch (to null) and sequence + number (to 0). This is the nuclear option — the next `Add` starts a + fresh epoch. Receivers see `token.Epoch > stored.Epoch` and accept. + +**`Clear()` is the normal path.** Grains should almost never call +`Empty(sender)` on an active outbox. `Empty` is for construction-time +initialisation and deliberate ops-level sequence-space resets. +Document and warn. + +### Outbox depth telemetry and configurable maximum + +The outbox can grow unbounded if postman targets are down. Mitigation: + +- **Telemetry:** gauge for outbox depth per grain type, emitted on every + write. Operators see growth before it becomes a crisis. +- **Configurable maximum:** users can set a max outbox size when + configuring the outbox processor. When exceeded, oldest messages are + automatically dropped (FIFO). This is opt-in; default is no cap. +- **Documentation:** storage providers have entity size limits (e.g. + Azure Table = 1MB). Document the risk of unbounded growth. + +### O(1) fingerprint equality + +Two `Outbox` values are equal when +`(Sender, LatestSequenceNumber, Epoch, Count, _items[0].Seq, _items[^1].Seq)` +matches. Constant-time regardless of `_items.Length`. + +Invariant holds by construction: `Outbox` is a sealed class with +private readonly backing fields. The only mutation paths are the four +public methods (`Add` / `Remove` / `RemoveRange` / `Clear`), each of +which changes at least one fingerprint field. + +### Why these choices + +- **Sealed class, not record.** No `with`, no synthesized copy + constructor. Mutation through four methods only — fingerprint + invariant holds by construction. Reference type because + `RegisterTimeProvider` is a void mutator. +- **`Add(T payload)` not `Add(envelope)`.** Outbox owns sequence + assignment. Callers cannot fabricate sequence numbers. +- **`RegisterTimeProvider` not per-call argument.** Matches + `MessageTracker`. 
Successor instances carry the provider forward. +- **Lazy `Epoch`.** A grain that never sends doesn't burn a fresh epoch + on storage. +- **`Remove(token)` not `Remove(envelope)`.** Token is the identity. +- **`LatestSequenceNumber` is a separate field.** After flush, items are + empty; high-water mark persists independently. + +### Retry diagnostics + +Envelope stays immutable. `SendAttempts` and `LastException` do NOT +appear on `OutboxMessageEnvelope<T>`. The postman tracks attempts +in-memory keyed on `OutboxSequenceToken`. On re-activation, counts +restart from zero. Acceptable for burst retry policies. + +--- + +## 3a. `VersionedState<TSelf>` — structural equality across `ImmutableArray<T>` + +**Status:** Settled. + +### Problem + +`ImmutableArray<T>.Equals` compares the underlying `T[]` reference, not +contents. Record-auto-generated `Equals` on state records holding +`ImmutableArray<>` returns false for identical-content different-backing +arrays. This breaks `IStateManager<T>.WriteAsync`'s recovery path. + +### Resolution + +```csharp +[GenerateSerializer] +public abstract record VersionedState +{ + [Id(0)] public Guid Version { get; internal set; } = Guid.CreateVersion7(); +} + +[GenerateSerializer] +public abstract record VersionedState<TSelf> : VersionedState + where TSelf : VersionedState<TSelf> +{ + public virtual bool Equals(VersionedState<TSelf>? other) => + other is not null && Version == other.Version; + + public override int GetHashCode() => Version.GetHashCode(); +} +``` + +User state: + +```csharp +[GenerateSerializer] +public sealed record MyState : VersionedState<MyState> +{ + [Id(1)] public Outbox Outbox { get; init; } = Outbox.Empty(sender); + [Id(2)] public ImmutableArray Items { get; init; } = []; +} +``` + +### Why this shape + +- **`Guid Version` with `internal set`.** Library code can write + `Version`; user code cannot. Hard compile-time fence. +- **`set`, not `init`.** Library mutates the instance the caller passed to + `WriteAsync` to stamp the new version. 
+- **`Guid.CreateVersion7()`.** Sortable UUID v7 (.NET 9+). Cheap, + collision-free at this rate. +- **Initialiser on property.** Fresh records get non-empty version + before any write. +- **Two-layer base.** Non-generic `VersionedState` gives `StateManager` + a single type to test against. Generic layer carries typed equality. + +### Why `Version` is internal-set, not user-settable + +`IPersistentState.Etag` is storage concurrency. `Version` is +library-internal recovery decoration. Conflating them erases a layer. +Force-write escape hatch lives on `IStateManager` via +`WritePolicy.Force`. + +--- + +## 4. `MessageTracker` + +**Status:** Settled. + +Receiver-side, persisted dedup state. Tracks high-water position from +each upstream source. Two source kinds: + +- **Orleans streams** — keyed by `StreamId`; position is a + `StreamCursor` wrapping `(StreamId, StreamSequenceToken)`. +- **Outbox messages** — keyed by sender `GrainId`; position is an + `OutboxSequenceToken`. + +### Shape + +**Sealed class** (not record — consistent with `Outbox`, avoids +`_time` field participating in record-synthesized equality). + +```csharp +[GenerateSerializer] +public sealed class MessageTracker +{ + [Id(0)] private ImmutableDictionary _stream; + [Id(1)] private ImmutableDictionary _outbox; + + // Non-persisted; no [Id]. Mutable. + [NonSerialized] + [JsonIgnore] + private TimeProvider _time = TimeProvider.System; + + public void RegisterTimeProvider(TimeProvider time) => _time = time; + + public bool ProcessMessage(StreamCursor cursor, out MessageTracker next); + public bool ProcessMessage(OutboxSequenceToken token, out MessageTracker next); + + public StreamCursor? LatestStream(StreamId stream); + public OutboxSequenceToken? 
LatestOutbox(GrainId sender); + + public MessageTracker Evict(DateTimeOffset olderThan); + public MessageTracker EvictStreams(DateTimeOffset olderThan); + public MessageTracker EvictOutboxes(DateTimeOffset olderThan); + public MessageTracker Evict(StreamId stream, DateTimeOffset olderThan); + public MessageTracker Evict(GrainId sender, DateTimeOffset olderThan); + + [GenerateSerializer] + private readonly record struct StreamEntry( + [property: Id(0)] StreamCursor LastPosition, + [property: Id(1)] DateTimeOffset Received); + + [GenerateSerializer] + private readonly record struct OutboxEntry( + [property: Id(0)] DateTimeOffset Epoch, + [property: Id(1)] long LastSequenceNumber, + [property: Id(2)] DateTimeOffset Received); +} +``` + +### Identity model (no OriginId) + +- Outbox identity: `OutboxSequenceToken.Sender` in the envelope. + Payloads stay clean. +- Stream identity: `StreamId` from runtime. Sufficient under the + one-provider-per-namespace convention (code-review enforced, not + runtime enforced). This is Orleans's own constraint — the library + doesn't make it worse. 
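
The outbox half of this identity model, combined with the epoch rule from §3 (a token with a newer epoch is accepted), reduces to a small per-sender decision. The following is a minimal sketch under simplifying assumptions, not the library's `MessageTracker`: a plain `string` stands in for `GrainId`, a mutable `Dictionary` replaces the immutable dictionary and the `out MessageTracker next` return, the `Received` timestamp is left out, and `TryAccept` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-ins: string for GrainId, a tuple for the stored entry.
var epoch1 = new DateTimeOffset(2026, 1, 1, 0, 0, 0, TimeSpan.Zero);
var epoch2 = epoch1.AddDays(1);
var seen = new Dictionary<string, (DateTimeOffset Epoch, long LastSeq)>();

Console.WriteLine(TryAccept(seen, "sender-a", epoch1, 1)); // True: no prior entry
Console.WriteLine(TryAccept(seen, "sender-a", epoch1, 2)); // True: same epoch, higher sequence
Console.WriteLine(TryAccept(seen, "sender-a", epoch1, 2)); // False: duplicate
Console.WriteLine(TryAccept(seen, "sender-a", epoch2, 1)); // True: newer epoch (sender reset)
Console.WriteLine(TryAccept(seen, "sender-a", epoch1, 3)); // False: stale epoch

// Returns true when the token is new, and records it as the sender's high-water mark.
static bool TryAccept(
    Dictionary<string, (DateTimeOffset Epoch, long LastSeq)> seen,
    string sender,
    DateTimeOffset epoch,
    long seq)
{
    if (seen.TryGetValue(sender, out var stored))
    {
        // An older epoch is always rejected, whatever the sequence number.
        if (epoch < stored.Epoch) return false;

        // Within the same epoch only a strictly higher sequence number is new.
        if (epoch == stored.Epoch && seq <= stored.LastSeq) return false;
    }

    // Insert (first message), advance (same epoch), or replace (newer epoch).
    seen[sender] = (epoch, seq);
    return true;
}
```

Because the epoch is compared before the sequence number, a sender that restarts its sequence at 1 under a newer epoch is accepted, while a late message from the old epoch is dropped even if its sequence number is higher.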
+ +### `ProcessMessage(StreamCursor)` semantics + +| Prior entry | Decision | Effect | +| -------------------------------- | -------- | ---------------------------------------------- | +| None | Accept | Insert `(LastPosition = cursor, Received = now)` | +| `cursor > stored.LastPosition` | Accept | Update position + received | +| `cursor <= stored.LastPosition` | Reject | No change | + +### `ProcessMessage(OutboxSequenceToken)` semantics + +| Prior entry | Comparison | Decision | Effect | +| ----------- | -------------------------------------------- | -------- | ------------------------ | +| None | — | Accept | Insert | +| Exists | `token.Epoch > stored.Epoch` | Accept | Replace (sender reset) | +| Exists | Same epoch, `token.Seq > stored.LastSeq` | Accept | Update seq + received | +| Exists | Same epoch, `token.Seq <= stored.LastSeq` | Reject | Duplicate | +| Exists | `token.Epoch < stored.Epoch` | Reject | Stale epoch | + +### `Evict` — uniform cleanup + +Five overloads, one rule: remove entries where `entry.Received < olderThan`. +No separate `Forget` API — `Evict(id, DateTimeOffset.MaxValue)` is the +documented idiom for unconditional clear. + +### `RegisterTimeProvider` — void by design + +Mutable field, non-persisted, excluded from equality. After +deserialization, grain must re-register. Falls back to +`TimeProvider.System` if skipped — correct for production, breaks +fake-clock tests. + +--- + +## 5. `StreamManager` — fluent subscribe/resume/error facade + +**Status:** Settled. + +Grain-level facade around Orleans implicit subscriptions. Four +responsibilities: + +1. **Subscribe** to stream namespaces during `OnActivateAsync`. +2. **Resume** from last accepted `StreamCursor` per namespace. +3. **Dispatch** with projected `StreamCursor`. +4. **Per-subscription error handling** via `.OnError(...)`. + +### Why a facade, not a base class + +Extension/composition over inheritance. Grain inherits from `Grain`, +uses `StreamManager` as a field. 
+ +### Shape + +```csharp +public sealed class StreamManager +{ + public static StreamManagerBuilder ForGrain( + Grain owner, + MessageTracker trackerSnapshot, + IServiceProvider services); +} + +public sealed class StreamManagerBuilder +{ + public StreamSubscriptionBuilder Subscribe( + string streamNamespace, + bool resumeUsingLatestSequenceToken = true); + + public StreamManager Build(); +} + +public sealed class StreamSubscriptionBuilder +{ + public StreamSubscriptionBuilder OnNext( + Func handler); + + public StreamSubscriptionBuilder OnError( + Func handler); + + public StreamManagerBuilder Done(); +} +``` + +### Typical wiring + +```csharp +public override Task OnActivateAsync(CancellationToken ct) +{ + // State is already hydrated at this point (see §1), so no ReadAsync call is needed. + var state = stateManager.State; + state.Tracker.RegisterTimeProvider(timeProvider); + + streamManager = StreamManager.ForGrain(this, state.Tracker, services) + .Subscribe("electricity-prices") + .OnNext(HandlePriceTickAsync) + .OnError(LogStreamErrorAsync) + .Done() + .Subscribe("tariff-events") + .OnNext(HandleTariffChangedAsync) + .Done() + .Build(); + + return Task.CompletedTask; +} +``` + +### Resume semantics + +- `Build()` reads `trackerSnapshot.LatestStream(streamId)` once per + subscription. +- If cursor exists → hydrate `StreamSequenceToken`, pass to + `SubscribeAsync`. +- If null → subscribe without token (provider default, typically: + start from current). +- Tracker read once at `Build()` time, never again. Handler is the + only path that mutates the tracker. + +### Per-subscription `OnError` + +Signature: `Func`. Default when +omitted: log + emit counter, do NOT rethrow. + +### One-provider-per-namespace + +Accepted as a code-review convention. Orleans's own constraint — the +library doesn't make it worse. + +### `StreamCursor` projection constraint + +`StreamCursor` carries an opaque `StreamSequenceToken`. 
Library ships an +STJ converter for a closed set of known subtypes discriminated on +`$kind`: + +| Subtype | Source | +| ------------------------------- | ------------------------------- | +| `EventSequenceToken` | Orleans SimpleMessageStream | +| `EventHubSequenceToken` | Orleans EH provider v1 | +| `EventHubSequenceTokenV2` | Orleans EH provider v2 | +| `EnrichedEventHubSequenceToken` | Library-shipped v2 subclass | + +Unknown subtype throws at serialization time — silently dropping the +cursor would corrupt dedup. + +### Custom EH adapter for enriched tokens + +Optional pattern. To opt into `EnrichedEventHubSequenceToken` (carries +`EnqueuedTime`, see §5 projection constraint table), the silo wires a +custom `IEventHubDataAdapter` via `UseDataAdapter`: + +```csharp +internal sealed class EnrichedEventHubAdapter(Serializer serializer) + : EventHubDataAdapter(serializer) +{ + public override StreamSequenceToken GetSequenceToken( + EventHubMessage eventHubMessage, int eventIndex) + { + var seqNum = eventHubMessage.SequenceNumber; + return new EnrichedEventHubSequenceToken( + eventHubMessage.Offset, + seqNum, + eventIndex, + eventHubMessage.EnqueuedTimeUtc); + } +} +``` + +`StreamManager` is unaware of the adapter — the enrichment surfaces +through `StreamCursor.TryGetEnqueuedTime(...)` in the user's `OnNext`. + +### OpenTelemetry trace correlation + +Orleans streams lose `Activity.Current` across the queue boundary. To +correlate consumer-side spans with producer-side spans without creating +multi-hour distributed traces, `StreamManager` should use +`ActivityLink`s, not parent chaining: + +- Producer side (in a custom `ToQueueMessage` override on the adapter): + stash `Activity.Current?.Id` into `EventData.Properties["traceparent"]` + before the event hits EH. +- Consumer side (in `StreamManager`'s OnNext wrapper): read the + property, parse into `ActivityContext`, start the OnNext span with + `ActivityKind.Consumer` and `links: [new ActivityLink(parsedContext)]`. 
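
The consumer-side step can be sketched with the `System.Diagnostics` API alone. This is a minimal sketch rather than the library's implementation: the `ActivitySource` name, the span names, and the `StartConsumerSpan` helper are hypothetical, and the `traceparent` string is handed over directly instead of being read from `EventData.Properties`. It assumes the default W3C id format of .NET 5 and later.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical source name, for illustration only.
var source = new ActivitySource("Example.Streams");

// Without a listener that samples the source, StartActivity returns null.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Example.Streams",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllData,
};
ActivitySource.AddActivityListener(listener);

// Producer side: capture the W3C id that would be stashed under "traceparent".
string? traceparent;
ActivityTraceId producerTraceId;
using (var producer = source.StartActivity("publish", ActivityKind.Producer))
{
    traceparent = producer?.Id;
    producerTraceId = producer?.TraceId ?? default;
}

// Consumer side: a separate span that links to the producer instead of parenting on it.
using var consumer = StartConsumerSpan(source, "OnNext", traceparent);
var linkedTraceId = consumer?.Links.Single().Context.TraceId;
Console.WriteLine($"link points at producer trace: {linkedTraceId == producerTraceId}");
Console.WriteLine($"consumer has its own trace: {consumer?.TraceId != producerTraceId}");

static Activity? StartConsumerSpan(ActivitySource source, string name, string? traceparent)
{
    var links = new List<ActivityLink>();

    // Parse the stashed id; an unparsable or missing value simply yields no link.
    if (traceparent is not null &&
        ActivityContext.TryParse(traceparent, null, out var producerContext))
    {
        links.Add(new ActivityLink(producerContext));
    }

    // parentContext is left at default so the producer is not used as parent.
    return source.StartActivity(
        name, ActivityKind.Consumer, parentContext: default, tags: null, links: links);
}
```

One caveat on the design choice: with a default `parentContext`, `StartActivity` is expected to fall back to `Activity.Current` as the parent, so a wrapper that must always begin a fresh trace would likely need to clear `Activity.Current` before starting the span.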
+ +This produces separate traces per delivery, each with a link back to +the producer span. OTel backends render the cross-trace arrow without +collapsing weeks of traffic into one trace. + +The adapter override is the user's responsibility (it lives in their +custom `IEventHubDataAdapter`); the consumer-side parse+link is a +library concern owned by `StreamManager`. + +### Telemetry + +Track at minimum (see `OutboxProcessor` for the meter pattern): + +- Counter: messages delivered per `(streamNamespace, accepted|rejected)`. +- Counter: subscriptions established / torn down / errored. +- Histogram: handler latency per `streamNamespace`. +- Histogram (when `EnrichedEventHubSequenceToken` is available): + end-to-end lag = `now - cursor.TryGetEnqueuedTime()`. Surfaces + consumer lag against the broker, parallel to outbox sender-to-receiver + timing. + +--- + +## 6. Opinionated grain pattern — functional commands, interleaved reads + +**Status:** Settled (guidance, not enforced by library API). + +### The pattern + +Grains using this library should follow a functional-command model: + +1. **State is immutable.** Grain state is a `record` (or sealed record) + composed of immutable types (`ImmutableArray`, `Outbox`, + `MessageTracker`, value objects). No mutable collections, no mutable + fields. + +2. **Commands run sequentially.** Methods that mutate state ("commands") + produce a new state value from `(currentState, commandPayload)` and + call `IStateManager.WriteAsync(newState)`. They run under + Orleans' default non-reentrant turn-based concurrency — one at a + time, no interleaving. + +3. **Reads interleave freely.** Methods that only read committed state + can be marked `[AlwaysInterleave]`. They see the last + `WriteAsync`-committed snapshot via `IStateManager.State` — the + committed-state fence (§1) ensures they never observe in-flight + uncommitted values. Multiple reads execute in parallel. + +4. 
**No external I/O in command handlers.** Commands should not make + HTTP calls, query databases, or invoke other grains. All input needed for + the decision must arrive in the command payload. External data is + fetched by the caller *before* invoking the grain. If a command needs + to trigger downstream work, enqueue it via `Outbox.Add(...)` and + let the `OutboxProcessor` dispatch it after the write. + +### Why this works + +- **Deterministic commands.** Same state + same payload → same result. + Easy to test, easy to reason about, no ambient dependencies. +- **Safe concurrency.** Reads never block writes. Writes are serialised + by the runtime. No custom locks. +- **Recovery-friendly.** The `IStateManager.WriteAsync` recovery path + relies on `VersionedState.Equals`, which works because state has + value semantics (a record, itself a reference type, whose equality is defined by `Version`). +- **Outbox replaces side effects.** Instead of "write state + call + service" (two failure points), it's "write state with outbox item" + (one atomic write) + "processor retries delivery" (idempotent). + +### What the library does NOT enforce + +- No compile-time prevention of injecting `HttpClient` or calling + external services in command methods. This is a documentation and + code-review concern. +- No base class. The pattern emerges from the types: `VersionedState` + is a record (immutable), `IStateManager.WriteAsync` takes a new + value (not mutation), `Outbox.Add` returns a new instance. +- Future: Roslyn analyzers could warn on external I/O inside methods + that call `WriteAsync`. Not in scope for v1. + +--- + +## 7. `OutboxProcessor` — timer + reminder driven dispatch + +**Status:** Settled. + +Grain-scoped component that owns the timer, reminder, and postman +dispatch lifecycle for draining `Outbox`. Modelled after the +[spike](https://gist.github.com/egil/2f3318d1bd22045268e11a5d988ba938) +in the `Clever.PricingEngine` codebase. 
+ +### Architecture + +- **Grain-scoped, not silo-scoped.** Each grain with an outbox gets its + own `OutboxProcessor`. No external scan, no registry, no second store. +- **GrainTimer** for in-process fast retry while activated. +- **Durable Reminder** for cross-activation recovery. Reactivates the + grain if it deactivates with pending items. Timer arms on activation. +- **Non-reentrant assumption.** Orleans default turn-based concurrency + serialises calls; the processor adds no internal locks. Not safe on + `[Reentrant]` grains — document and enforce. + +### Postman dispatch + +**Callback-based.** The grain registers one or more postmen via +`AddPostman(...)`, each handling a subtype of `TOutbox`. Matching +is first-registered-wins against the item's runtime type — order from +most specific to least specific (like a `switch`). + +- Per-item exceptions are caught and surfaced through `OnPostErrorAsync` + with attempt count (in-memory, resets on reactivation) — the grain + decides: leave item in state to retry, or remove to dead-letter after + N attempts. +- Each item dispatches to exactly **one** postman (first-registered-wins). + Items whose runtime type matches no postman → reported as failed with + `NoPostmanRegisteredException`. +- `PostAsync` only throws `TimeoutException` (per-run timeout), + `OperationCanceledException` (caller token), or callback exceptions. + +### Grain integration pattern + +```csharp +// 1. Marker interface — DIM handles ReceiveReminder. +public interface IOutboxGrain : IRemindable +{ + Task IRemindable.ReceiveReminder(string reminderName, TickStatus status) + { + var grainBase = (IGrainBase)this; + var component = grainBase.GrainContext.GetComponent(); + return component is null + ? Task.CompletedTask + : component.ReceiveReminderAsync(reminderName, status).AsTask(); + } +} + +// 2. C# 14 extension for InitializeOutboxProcessor. 
+extension(TGrain grain) where TGrain : IOutboxGrain, IGrainBase +{ + public OutboxProcessor InitializeOutboxProcessor( + OutboxProcessorOptions options) where TOutbox : notnull + { + var services = grain.GrainContext.ActivationServices; + var processor = new OutboxProcessor( + grain, options, + services.GetRequiredService() + .CreateLogger($"OutboxProcessor<{typeof(TOutbox).Name}>"), + services.GetService() ?? TimeProvider.System, + services.GetRequiredService()); + processor.AttachToGrain(); + return processor; + } +} +``` + +### `OutboxProcessorOptions` + +```csharp +public sealed class OutboxProcessorOptions where TOutbox : notnull +{ + /// Snapshot of pending items. Called once per post run. + public required Func> GetPending { get; init; } + + /// Items successfully posted. Grain must remove from backing collection. + public required Func, CancellationToken, ValueTask> + OnPostCompletedAsync { get; init; } + + /// Failed items with exception and attempt count (in-memory, resets on + /// reactivation). Grain decides: leave to retry, or remove to + /// dead-letter after N attempts. If null, failed items retry silently. + public Func, + CancellationToken, ValueTask>? OnPostErrorAsync { get; init; } + + /// Max time per post run. Set below grain's response timeout. + public TimeSpan ProcessingTimeout { get; init; } = TimeSpan.FromSeconds(20); + + /// Timer + reminder period. Orleans reminders fire at most once/minute. + public TimeSpan RetryDelay { get; init; } = TimeSpan.FromMinutes(2); +} +``` + +### `OutboxProcessor` + +```csharp +public sealed partial class OutboxProcessor : IOutboxComponent + where TOutbox : notnull +{ + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox; + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox; + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox; + + /// Posts pending items. Safe to call from grain's task scheduler. 
+ /// Arms timer/reminder if items remain, unregisters if empty. + public ValueTask PostAsync(CancellationToken cancellationToken = default); + + /// Called by IOutboxGrain DIM. No-ops for unknown reminder names. + public ValueTask ReceiveReminderAsync(string reminderName, TickStatus status); +} + +internal interface IOutboxComponent +{ + ValueTask ReceiveReminderAsync(string reminderName, TickStatus status); +} +``` + +### Grain author experience + +Two obligations (both compiler-enforced): + +1. Implement `IOutboxGrain`. +2. Call `InitializeOutboxProcessor(...)` in `OnActivateAsync`. + +No `ReceiveReminder` override needed (DIM handles it). No manual +timer/reminder lifecycle. No telemetry wiring. + +Escape hatch for grains with their own reminders: + +```csharp +public async Task ReceiveReminder(string name, TickStatus status) +{ + if (name == MyOwnReminder) { await DoMyReminderWork(); return; } + await outbox.ReceiveReminderAsync(name, status); +} +``` + +### Why these choices + +- **Grain-scoped, not silo-scoped.** Grain already knows its own outbox + state. No external scan needed. Reminder ensures cross-activation + recovery. Timer handles fast retry. Pattern Orleans itself uses. +- **Callback-based, not DI service.** Grain controls dispatch logic, + can pass its own state to the postman. DI service adds indirection + without clear benefit. +- **First-registered-wins postman matching.** Simple dispatch model. + Order most-specific first. Unmatched items → `NoPostmanRegisteredException` + via `OnPostErrorAsync`. +- **`PostAsync` swallows per-item errors.** Grain observes failures via + `OnPostErrorAsync` with attempt count. Processor never drops items + silently unless the grain explicitly removes them. +- **One postman per item.** First-registered-wins, not broadcast. + Simpler error semantics, no partial-success ambiguity. +- **DIM on `IOutboxGrain`.** Zero ceremony. Grain author never writes + `ReceiveReminder` unless they have their own reminders. 
+- **C# 14 extension members.** Generic constraint on `TGrain : + IOutboxGrain, IGrainBase` means the extension won't compile on types + that don't implement the marker. Type-safe opt-in. + +--- + +## 8. Naming, serialization & telemetry conventions + +**Status:** Settled. + +### Namespace + +Flat: all public types in `Egil.Orleans.Messaging`. One `using` +statement. ~15-20 public types — not crowded enough to split. + +### Orleans `[Alias]` on serializable types + +All public serializable types get `[Alias]` for version-tolerant +serialization. Aliases are **globally scoped** — must be unique across +the entire application. + +**Pattern:** `egil.orleans.messaging.TypeName`. Generic types include +backtick + arity. + +Examples: + +``` +[Alias("egil.orleans.messaging.Outbox`1")] +[Alias("egil.orleans.messaging.OutboxMessageEnvelope`1")] +[Alias("egil.orleans.messaging.MessageTracker")] +[Alias("egil.orleans.messaging.OutboxSequenceToken")] +[Alias("egil.orleans.messaging.StreamCursor")] +[Alias("egil.orleans.messaging.VersionedState")] +``` + +### `[Id]` numbering + +Sequential per type, scoped per inheritance level (matches Orleans +convention). New fields get the next number. Never reuse removed IDs. + +```csharp +[GenerateSerializer] +public abstract record VersionedState +{ + [Id(0)] public Guid Version { get; internal set; } +} + +[GenerateSerializer] +public abstract record VersionedState : VersionedState +{ + // Inherits [Id(0)] from parent. Child IDs start fresh at [Id(0)]. +} +``` + +### System.Text.Json serialization + +All serializable types carry `[JsonConverter]` attributes referencing +library-shipped converters. STJ discovers them automatically — users +need no registration, no `JsonSerializerOptions` configuration. + +This ensures correct round-tripping through storage providers that use +STJ (e.g., Orleans's Cosmos, blob, or custom providers configured with +`System.Text.Json`). 
+ +**Newtonsoft.Json is not supported.** Users whose storage providers use +Newtonsoft must either configure their provider for STJ or write their +own converters. Documented as a known limitation. + +| Type | Converter approach | +|------|-------------------| +| `Outbox` | `[JsonConverter(typeof(OutboxJsonConverterFactory))]` — factory creates closed `JsonConverter>` | +| `OutboxMessageEnvelope` | `[JsonConverter(typeof(OutboxMessageEnvelopeJsonConverterFactory))]` | +| `MessageTracker` | `[JsonConverter(typeof(MessageTrackerJsonConverter))]` | +| `OutboxSequenceToken` | `[JsonConverter(typeof(OutboxSequenceTokenJsonConverter))]` | +| `StreamCursor` | `[JsonConverter(typeof(StreamCursorJsonConverter))]` | +| `VersionedState` | No custom converter — `[JsonInclude]` on `Version` property makes `internal set` visible to STJ | + +**Why custom converters (not `[JsonInclude]` on private fields):** + +`Outbox` and `MessageTracker` are sealed classes with private +backing fields. Exposing them via `[JsonInclude]` would leak internals +and weaken the fingerprint invariant. Custom converters keep +encapsulation intact and control the exact wire format. + +**`VersionedState` exception:** `Version` is a single `Guid` property +with `internal set`. `[JsonInclude]` is sufficient — no encapsulation +risk, and a full custom converter for an abstract base class is +unnecessary complexity. + +**Generic converters:** STJ requires `JsonConverterFactory` for open +generic types. The factory's `CreateConverter` method creates the closed +`JsonConverter>` for the specific `T`. + +### Non-serialized fields + +Service-reference fields like `TimeProvider _time` get both +`[NonSerialized]` and `[JsonIgnore]` — belt-and-suspenders: + +- **No `[Id]`** → Orleans `[GenerateSerializer]` skips them. +- **`[NonSerialized]`** → .NET runtime serializers skip them. +- **`[JsonIgnore]`** → STJ skips them even if a storage provider + bypasses our custom converter and falls back to reflection. 
+- **Custom converters** also skip them explicitly. + +All four layers prevent accidental serialization of non-restorable +service references. `RegisterTimeProvider()` re-injects after +deserialization. + +### Telemetry + +**Meter name:** `egil.orleans.messaging` (matches package name). + +**Outbox-specific metrics only** — do not duplicate Orleans-provided +metrics for state read/write, activation lifecycle, messaging layer. + +| Instrument | Type | Description | +|---------------------------|-----------|--------------------------------------| +| `outbox.post.duration` | Histogram | Post run duration (ms) | +| `outbox.post.item.duration` | Histogram | Per-item postman dispatch duration | +| `outbox.post.items` | Counter | Items successfully dispatched | +| `outbox.post.errors` | Counter | Items that failed dispatch | +| `outbox.depth` | Gauge | Pending items per grain type | + +**Tags** (matching spike pattern): +- `grain.type` — owning grain type name +- `event.type` — outbox item type name +- `success` — `true`/`false` on per-item histograms + +**ActivitySource:** `egil.orleans.messaging` for distributed traces. + +### Public interface surface + +- `IOutboxGrain` — marker + DIM for `ReceiveReminder`. Kept: DIM saves + real boilerplate for the 80% case (grains with no other reminders). + Generic constraint on `InitializeOutboxProcessor` ensures type-safe + opt-in. +- `IOutboxComponent` — internal. Not part of public API. + +--- + +## 9. Test strategy + +**Status:** Settled. + +### Test runner + +All tests run through `Egil.Orleans.Testing`'s `InProcessTestCluster` — +real Orleans runtime, fast boot, no mocking of Orleans internals. Even +pure-logic types (`Outbox`, `MessageTracker`) are tested through +grain interactions to validate real serialization, persistence, and +concurrency behavior. 
+ +### Coverage targets + +Matching `Egil.Orleans.Testing` convention: + +- **100% branch coverage** on core types: `Outbox`, + `MessageTracker`, `StateManager`, `VersionedState`, + `OutboxProcessor`. +- **95% branch coverage** on supporting types: `OutboxSequenceToken`, + `StreamCursor`, `StreamManager`, `OutboxMessageEnvelope`, + `WritePolicy`. + +### Test grain shape + +Purpose-built test grains in the test assembly, each targeting a +specific behavior: + +- **Write recovery grain** — exercises `StateManager.WriteAsync` + failure + recovery path, double failure, `WritePolicy.Force`. +- **Outbox drain grain** — exercises `Outbox` Add/Remove/Clear, + epoch reset, `OutboxProcessor` timer/reminder lifecycle, postman + dispatch + error callback with attempt count. +- **Dedup grain** — exercises `MessageTracker.ProcessMessage` for both + stream cursors and outbox tokens, epoch-aware acceptance, eviction. +- **Interleaved-read grain** — exercises `[AlwaysInterleave]` reads + seeing only committed state while a write is in-flight. +- **Stuck postman grain** — exercises `ProcessingTimeout` behavior, + `OnPostErrorAsync` with timeout exception. +- **Multi-reminder grain** — exercises `IOutboxGrain` DIM with grain + that also has its own reminders (DIM shadowing). + +### Serialization round-trip tests + +Dedicated tests for every type with `[GenerateSerializer]`: +- **Orleans serialization:** Serialize → bytes → deserialize → assert equal. +- **System.Text.Json:** Serialize → JSON string → deserialize → assert + equal. Validates `[JsonConverter]` attributes and converter correctness. +- Catches: missing `[Id]`, wrong `[Alias]`, `ImmutableArray` edge + cases, version-tolerance regressions, STJ converter bugs. + +Types covered: `Outbox`, `OutboxMessageEnvelope`, +`MessageTracker`, `OutboxSequenceToken`, `StreamCursor`, +`VersionedState` subtypes. + +### No mocks + +No mocking `IPersistentState` or Orleans internals. 
The cluster +provides real storage (in-memory), real timers, real reminders. Test +grains exercise the library through the same code path production grains +use. From f0f35c882df6253eb074612a72cb28e02fc0c851 Mon Sep 17 00:00:00 2001 From: Egil Hansen Date: Sat, 23 May 2026 19:15:59 +0000 Subject: [PATCH 02/16] build(oes): scaffold Egil.Orleans.Messaging project structure MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Rename directory from Egil.Orleans.CQRS to Egil.Orleans.Messaging. Add solution file, Directory.Packages.props, version.json, global.json, src csproj (library), and test csproj. Package description: toolbox of composable building blocks — not a framework. Pick what you need, leave the rest. [skip notes] Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- .../Directory.Packages.props | 30 ++++++++++ .../Egil.Orleans.Messaging.slnx | 8 +++ .../api-design.md | 0 Egil.Orleans.Messaging/global.json | 9 +++ .../Egil.Orleans.Messaging.csproj | 55 +++++++++++++++++++ .../Egil.Orleans.Messaging.Tests.csproj | 34 ++++++++++++ Egil.Orleans.Messaging/version.json | 30 ++++++++++ 7 files changed, 166 insertions(+) create mode 100644 Egil.Orleans.Messaging/Directory.Packages.props create mode 100644 Egil.Orleans.Messaging/Egil.Orleans.Messaging.slnx rename {Egil.Orleans.CQRS => Egil.Orleans.Messaging}/api-design.md (100%) create mode 100644 Egil.Orleans.Messaging/global.json create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj create mode 100644 Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj create mode 100644 Egil.Orleans.Messaging/version.json diff --git a/Egil.Orleans.Messaging/Directory.Packages.props b/Egil.Orleans.Messaging/Directory.Packages.props new file mode 100644 index 0000000..ce414d3 --- /dev/null +++ b/Egil.Orleans.Messaging/Directory.Packages.props @@ -0,0 +1,30 @@ + + + true + false + 
$(NoWarn);NU1507 + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/Egil.Orleans.Messaging/Egil.Orleans.Messaging.slnx b/Egil.Orleans.Messaging/Egil.Orleans.Messaging.slnx new file mode 100644 index 0000000..779ae7b --- /dev/null +++ b/Egil.Orleans.Messaging/Egil.Orleans.Messaging.slnx @@ -0,0 +1,8 @@ + + + + + + + + diff --git a/Egil.Orleans.CQRS/api-design.md b/Egil.Orleans.Messaging/api-design.md similarity index 100% rename from Egil.Orleans.CQRS/api-design.md rename to Egil.Orleans.Messaging/api-design.md diff --git a/Egil.Orleans.Messaging/global.json b/Egil.Orleans.Messaging/global.json new file mode 100644 index 0000000..60fdc5d --- /dev/null +++ b/Egil.Orleans.Messaging/global.json @@ -0,0 +1,9 @@ +{ + "sdk": { + "rollForward": "latestMajor", + "allowPrerelease": false + }, + "test": { + "runner": "Microsoft.Testing.Platform" + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj new file mode 100644 index 0000000..21a83ea --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj @@ -0,0 +1,55 @@ + + + + net10.0 + enable + enable + true + embedded + + + + Egil.Orleans.Messaging + Orleans Messaging Toolbox + Egil Hansen + Egil Hansen + + A toolbox of composable building blocks for Orleans grains that need + atomic state writes, transactional outbox, receiver-side dedup, and + stream subscription management. Pick what you need, leave the rest. 
+ + README.md + orleans,messaging,outbox,dedup,cqrs,grain,egil + Egil Hansen + https://github.com/egil/framework + https://github.com/egil/framework + git + LICENSE + true + true + true + + + + + all + runtime; build; native; contentfiles; analyzers; buildtransitive + + + + + + + + <_Parameter1>Egil.Orleans.Messaging.Tests + + + + + + True + \ + + + + diff --git a/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj new file mode 100644 index 0000000..a26f041 --- /dev/null +++ b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj @@ -0,0 +1,34 @@ + + + + enable + enable + true + Exe + Egil.Orleans.Messaging.Tests + net10.0 + true + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/Egil.Orleans.Messaging/version.json b/Egil.Orleans.Messaging/version.json new file mode 100644 index 0000000..5dc4331 --- /dev/null +++ b/Egil.Orleans.Messaging/version.json @@ -0,0 +1,30 @@ +{ + "$schema": "https://raw.githubusercontent.com/dotnet/Nerdbank.GitVersioning/main/src/NerdBank.GitVersioning/version.schema.json", + "version": "0.1-alpha", + "publicReleaseRefSpec": [ + "^refs/heads/main$", + "^refs/heads/release/egil-orleans-messaging/v\\d+(?:\\.\\d+)?$" + ], + "cloudBuild": { + "setAllVariables": true, + "buildNumber": { + "enabled": true + } + }, + "release": { + "tagName": "egil-orleans-messaging/v{version}", + "branchName": "release/egil-orleans-messaging/v{version}" + }, + "pathFilters": [ + "./", + "./Directory.Packages.props", + "../.github/workflows/egil-orleans-messaging-ci.yml", + "../.editorconfig", + "../.gitattributes", + "../.gitignore", + "../Directory.Build.props", + "../LICENSE", + "../version.json", + "../xunit.runner.json" + ] +} From 80031aff072424142952517819752f2891625d36 Mon Sep 17 00:00:00 2001 From: Egil Hansen Date: Sat, 23 May 2026 19:35:21 +0000 Subject: [PATCH 03/16] 
feat(om): stub types with XML docs --- .../Directory.Packages.props | 3 + Egil.Orleans.Messaging/api-design.md | 302 ++++++++-------- .../Egil.Orleans.Messaging.csproj | 2 + .../EnrichedEventHubAdapter.cs | 213 +++++++++++ .../EnrichedEventHubAdapterExtensions.cs | 63 ++++ .../EnrichedEventHubSequenceToken.cs | 127 +++++++ .../Egil.Orleans.Messaging/IOutboxGrain.cs | 51 +++ .../Egil.Orleans.Messaging/IStateManager.cs | 138 +++++++ .../Egil.Orleans.Messaging/MessageTracker.cs | 230 ++++++++++++ .../MessageTrackerJsonConverter.cs | 38 ++ .../NoPostmanRegisteredException.cs | 45 +++ .../src/Egil.Orleans.Messaging/Outbox.cs | 339 ++++++++++++++++++ .../OutboxJsonConverterFactory.cs | 35 ++ .../OutboxMessageEnvelope.cs | 45 +++ ...tboxMessageEnvelopeJsonConverterFactory.cs | 30 ++ .../Egil.Orleans.Messaging/OutboxProcessor.cs | 170 +++++++++ .../OutboxProcessorExtensions.cs | 66 ++++ .../OutboxProcessorOptions.cs | 98 +++++ .../OutboxSequenceToken.cs | 59 +++ .../OutboxSequenceTokenJsonConverter.cs | 32 ++ .../Egil.Orleans.Messaging/StateManager.cs | 124 +++++++ .../StateManagerExtensions.cs | 58 +++ .../Egil.Orleans.Messaging/StreamCursor.cs | 125 +++++++ .../StreamCursorJsonConverter.cs | 40 +++ .../Egil.Orleans.Messaging/StreamManager.cs | 161 +++++++++ .../StreamManagerExtensions.cs | 49 +++ .../Egil.Orleans.Messaging/VersionedState.cs | 64 ++++ .../src/Egil.Orleans.Messaging/WritePolicy.cs | 24 ++ .../Egil.Orleans.Messaging.Tests.csproj | 1 + .../OutboxTests.cs | 210 +++++++++++ .../xunit.runner.json | 5 + 31 files changed, 2801 insertions(+), 146 deletions(-) create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapter.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapterExtensions.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubSequenceToken.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IOutboxGrain.cs create mode 100644 
Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IStateManager.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTracker.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTrackerJsonConverter.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/NoPostmanRegisteredException.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Outbox.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxJsonConverterFactory.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelope.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelopeJsonConverterFactory.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessor.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorExtensions.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorOptions.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceToken.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceTokenJsonConverter.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManager.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManagerExtensions.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursor.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursorJsonConverter.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManager.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManagerExtensions.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/VersionedState.cs create mode 100644 Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/WritePolicy.cs create mode 100644 
Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/OutboxTests.cs create mode 100644 Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/xunit.runner.json diff --git a/Egil.Orleans.Messaging/Directory.Packages.props b/Egil.Orleans.Messaging/Directory.Packages.props index ce414d3..ea7f22b 100644 --- a/Egil.Orleans.Messaging/Directory.Packages.props +++ b/Egil.Orleans.Messaging/Directory.Packages.props @@ -16,15 +16,18 @@ + + + diff --git a/Egil.Orleans.Messaging/api-design.md b/Egil.Orleans.Messaging/api-design.md index 7781837..d380c42 100644 --- a/Egil.Orleans.Messaging/api-design.md +++ b/Egil.Orleans.Messaging/api-design.md @@ -106,12 +106,13 @@ Users pick one of two paths for `T`: for record-of-primitives shapes. Breaks silently if state holds `ImmutableArray<>` (reference-equality trap) — user must override `Equals` themselves in that case. -- **Path B — inherit `VersionedState`** - (`MyState : VersionedState`). Framework owns equality via a - per-write `Guid Version`. Immune to any collection-equality issues in - the state graph. Recommended default for any non-trivial state. +- **Path B — inherit `VersionedState`** + (`MyState : VersionedState`). Library stamps a per-write `Guid Version`. + Recovery path pattern-matches on `VersionedState` and compares `Version` + directly — immune to collection-equality issues in the state graph. + Recommended default for any non-trivial state. -See §3a for the `VersionedState` base class. +See §3a for `VersionedState` and why the generic layer was removed. ### Activation: state is auto-populated @@ -361,15 +362,15 @@ public sealed record OutboxMessageEnvelope( [GenerateSerializer] public sealed class Outbox : IReadOnlyList>, IEquatable> { - [Id(0)] private readonly GrainId _sender; - [Id(1)] private readonly long _latestSequenceNumber; - [Id(2)] private readonly ImmutableArray> _items; - [Id(3)] private readonly DateTimeOffset? 
_epoch; + [Id(0)] private readonly GrainId sender; + [Id(1)] private readonly long latestSequenceNumber; + [Id(2)] private readonly ImmutableArray> items; + [Id(3)] private readonly DateTimeOffset? epoch; // Non-persisted; no [Id]. Mutable, registered post-construction. [NonSerialized] [JsonIgnore] - private TimeProvider _time = TimeProvider.System; + private TimeProvider time = TimeProvider.System; internal Outbox( GrainId sender, @@ -377,54 +378,53 @@ public sealed class Outbox : IReadOnlyList>, IEquata ImmutableArray> items, DateTimeOffset? epoch) { - _sender = sender; - _latestSequenceNumber = latestSequenceNumber; - _items = items; - _epoch = epoch; + this.sender = sender; + this.latestSequenceNumber = latestSequenceNumber; + this.items = items; + this.epoch = epoch; } - public static Outbox Empty(GrainId sender) => + public static Outbox Create(GrainId sender) => new(sender, latestSequenceNumber: 0, items: [], epoch: null); - public void RegisterTimeProvider(TimeProvider time) => _time = time; + public void RegisterTimeProvider(TimeProvider time) => this.time = time; - public GrainId Sender => _sender; - public long LatestSequenceNumber => _latestSequenceNumber; - public DateTimeOffset? Epoch => _epoch; - public int Count => _items.Length; - public bool IsEmpty => _items.IsDefaultOrEmpty; - public OutboxMessageEnvelope this[int index] => _items[index]; + public GrainId Sender => sender; + public long LatestSequenceNumber => latestSequenceNumber; + public DateTimeOffset? 
Epoch => epoch; + public int Count => items.Length; + public bool IsEmpty => items.IsDefaultOrEmpty; + public OutboxMessageEnvelope this[int index] => items[index]; public IEnumerator> GetEnumerator() - => ((IEnumerable>)_items).GetEnumerator(); + => ((IEnumerable>)items).GetEnumerator(); IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); public Outbox Add(T message) { /* increments seq, stamps epoch on first call */ } - public Outbox Remove(OutboxSequenceToken token) { /* removes by seq number */ } - public Outbox RemoveRange(IEnumerable tokens) { /* batch remove */ } + public Outbox Remove(OutboxSequenceToken token) { /* removes FIFO head if token matches */ } + public Outbox RemoveRange(IEnumerable tokens) { /* removes matching FIFO prefix */ } public Outbox Clear() { /* removes all items, preserves LatestSequenceNumber + Epoch */ } - // O(1) fingerprint equality — see below. - public bool Equals(Outbox? other) { /* fingerprint check */ } + // O(1) sequence equality — see below. + public bool Equals(Outbox? other) { /* metadata + first/last sequence check */ } public override bool Equals(object? obj) => obj is Outbox o && Equals(o); - public override int GetHashCode() - => HashCode.Combine(_sender, _latestSequenceNumber, _items.Length, _epoch); + public override int GetHashCode() { /* metadata + first/last sequence hash */ } } ``` ### Epoch semantics -- `Empty(sender)` → `epoch = null`, `LatestSequenceNumber = 0`. +- `Create(sender)` → `epoch = null`, `LatestSequenceNumber = 0`. - First `Add()` → stamps `epoch = now`. Persisted with state. - Subsequent `Add()` → same epoch, incrementing sequence number. - `Clear()` → removes items, **preserves** `LatestSequenceNumber` and `Epoch`. This is the normal "postman drained successfully" path. -- `Empty(sender)` again → **resets both** epoch (to null) and sequence +- `Create(sender)` again → **resets both** epoch (to null) and sequence number (to 0). 
This is the nuclear option — the next `Add` starts a fresh epoch. Receivers see `token.Epoch > stored.Epoch` and accept. **`Clear()` is the normal path.** Grains should almost never call -`Empty(sender)` on an active outbox. `Empty` is for construction-time +`Create(sender)` on an active outbox. `Create` is for construction-time initialisation and deliberate ops-level sequence-space resets. Document and warn. @@ -440,22 +440,28 @@ The outbox can grow unbounded if postman targets are down. Mitigation: - **Documentation:** storage providers have entity size limits (e.g. Azure Table = 1MB). Document the risk of unbounded growth. -### O(1) fingerprint equality +### O(1) sequence equality Two `Outbox` values are equal when -`(Sender, LatestSequenceNumber, Epoch, Count, _items[0].Seq, _items[^1].Seq)` -matches. Constant-time regardless of `_items.Length`. +`(Sender, LatestSequenceNumber, Epoch, Count, first sequence number, last sequence number)` +matches. Constant-time regardless of `items.Length`. -Invariant holds by construction: `Outbox` is a sealed class with -private readonly backing fields. The only mutation paths are the four -public methods (`Add` / `Remove` / `RemoveRange` / `Clear`), each of -which changes at least one fingerprint field. +This relies on the outbox invariant that sequence numbers are assigned +only by `Add` and pending items are removed only in FIFO order. Under +that invariant, matching first and last sequence numbers with matching +count identifies the same contiguous pending sequence window. Equality +therefore stays independent of outbox depth and does not need a separate +persisted fingerprint field. + +If two outboxes have the same sender, epoch, high-water mark, count, and +sequence-window endpoints but different payloads, the outbox was used in +a way that broke encapsulation/invariants. Equality does not attempt to +detect that invalid state. 
### Why these choices - **Sealed class, not record.** No `with`, no synthesized copy - constructor. Mutation through four methods only — fingerprint - invariant holds by construction. Reference type because + constructor. Mutation through four methods only. Reference type because `RegisterTimeProvider` is a void mutator. - **`Add(T payload)` not `Add(envelope)`.** Outbox owns sequence assignment. Callers cannot fabricate sequence numbers. @@ -463,7 +469,9 @@ which changes at least one fingerprint field. `MessageTracker`. Successor instances carry the provider forward. - **Lazy `Epoch`.** A grain that never sends doesn't burn a fresh epoch on storage. -- **`Remove(token)` not `Remove(envelope)`.** Token is the identity. +- **`Remove(token)` not `Remove(envelope)`.** Token is the identity, but + removal is constrained to the FIFO head to preserve the contiguous + pending-sequence invariant. - **`LatestSequenceNumber` is a separate field.** After flush, items are empty; high-water mark persists independently. @@ -476,9 +484,9 @@ restart from zero. Acceptable for burst retry policies. --- -## 3a. `VersionedState` — structural equality across `ImmutableArray` +## 3a. `VersionedState` — version-based equality for recovery -**Status:** Settled. +**Status:** Settled. Generic `VersionedState` removed — see below. ### Problem @@ -495,41 +503,41 @@ public abstract record VersionedState { [Id(0)] public Guid Version { get; internal set; } = Guid.CreateVersion7(); } - -[GenerateSerializer] -public abstract record VersionedState : VersionedState - where TSelf : VersionedState -{ - public virtual bool Equals(VersionedState? 
other) => - other is not null && Version == other.Version; - - public override int GetHashCode() => Version.GetHashCode(); -} ``` User state: ```csharp [GenerateSerializer] -public sealed record MyState : VersionedState +public sealed record MyState : VersionedState { - [Id(1)] public Outbox Outbox { get; init; } = Outbox.Empty(sender); + [Id(1)] public Outbox Outbox { get; init; } = Outbox.Create(sender); [Id(2)] public ImmutableArray Items { get; init; } = []; } ``` -### Why this shape - -- **`Guid Version` with `internal set`.** Library code can write - `Version`; user code cannot. Hard compile-time fence. -- **`set`, not `init`.** Library mutates caller's reference during - `WriteAsync` to stamp new version. -- **`Guid.CreateVersion7()`.** Sortable UUID v7 (.NET 9+). Cheap, - collision-free at this rate. -- **Initialiser on property.** Fresh records get non-empty version - before any write. -- **Two-layer base.** Non-generic `VersionedState` gives `StateManager` - a single type to test against. Generic layer carries typed equality. +### Why the generic `VersionedState` was removed + +The original design had a two-layer hierarchy where the generic layer +overrode `Equals` to compare only `Version`, bypassing `ImmutableArray` +reference-equality. However: + +1. **The Equals override was broken.** When the user writes + `sealed record MyState : VersionedState`, the compiler + generates `MyState.Equals(MyState?)` that calls `base.Equals(other)` + (the version-only check) **AND** adds property-level checks for all + of `MyState`'s declared properties. So `ImmutableArray` comparisons + still happen in the generated code. +2. **The recovery path doesn't need it.** `IStateManager.WriteAsync` + does `if (newState is VersionedState v)` and compares `v.Version` + directly via pattern matching — it never relies on `T.Equals()` for + VersionedState-derived types. +3. 
**`IEquatable` on `IStateManager` is satisfied automatically** + by the record-generated equality for non-VersionedState types. + +The non-generic `VersionedState` provides everything the library needs: +the `Version` property, a single type for pattern matching at runtime, +and the `[JsonInclude]` + `internal set` fence. ### Why `Version` is internal-set, not user-settable @@ -555,21 +563,21 @@ each upstream source. Two source kinds: ### Shape **Sealed class** (not record — consistent with `Outbox`, avoids -`_time` field participating in record-synthesized equality). +`time` field participating in record-synthesized equality). ```csharp [GenerateSerializer] public sealed class MessageTracker { - [Id(0)] private ImmutableDictionary _stream; - [Id(1)] private ImmutableDictionary _outbox; + [Id(0)] private ImmutableDictionary streams; + [Id(1)] private ImmutableDictionary outbox; // Non-persisted; no [Id]. Mutable. [NonSerialized] [JsonIgnore] - private TimeProvider _time = TimeProvider.System; + private TimeProvider time = TimeProvider.System; - public void RegisterTimeProvider(TimeProvider time) => _time = time; + public void RegisterTimeProvider(TimeProvider time) => this.time = time; public bool ProcessMessage(StreamCursor cursor, out MessageTracker next); public bool ProcessMessage(OutboxSequenceToken token, out MessageTracker next); @@ -648,7 +656,7 @@ responsibilities: 1. **Subscribe** to stream namespaces during `OnActivateAsync`. 2. **Resume** from last accepted `StreamCursor` per namespace. 3. **Dispatch** with projected `StreamCursor`. -4. **Per-subscription error handling** via `.OnError(...)`. +4. **Per-subscription error handling** via the optional `onError` callback. ### Why a facade, not a base class @@ -660,31 +668,27 @@ uses `StreamManager` as a field. 
```csharp public sealed class StreamManager { - public static StreamManagerBuilder ForGrain( - Grain owner, - MessageTracker trackerSnapshot, - IServiceProvider services); -} - -public sealed class StreamManagerBuilder -{ - public StreamSubscriptionBuilder Subscribe( + public StreamManager Subscribe( string streamNamespace, - bool resumeUsingLatestSequenceToken = true); + Func onNextAsync, + Action? onError = default, + bool passLatestSequenceTokenOnResume = true); - public StreamManager Build(); + public StreamManager Subscribe( + string streamNamespace, + Func onNextAsync, + Action? onError = default, + bool passLatestSequenceTokenOnResume = true); } -public sealed class StreamSubscriptionBuilder +public static class StreamManagerExtensions { - public StreamSubscriptionBuilder OnNext( - Func handler); - - public StreamSubscriptionBuilder OnError( - Func handler); - - public StreamManagerBuilder Done(); + public static StreamManager InitializeStreamManager( + this TGrain grain, + MessageTracker trackerSnapshot) + where TGrain : IGrainBase; } + ``` ### Typical wiring @@ -695,33 +699,30 @@ public override async Task OnActivateAsync(CancellationToken ct) var state = await stateManager.ReadAsync(); state.Tracker.RegisterTimeProvider(timeProvider); - streamManager = StreamManager.ForGrain(this, state.Tracker, services) - .Subscribe("electricity-prices") - .OnNext(HandlePriceTickAsync) - .OnError(LogStreamErrorAsync) - .Done() - .Subscribe("tariff-events") - .OnNext(HandleTariffChangedAsync) - .Done() - .Build(); + streamManager = this.InitializeStreamManager(state.Tracker) + .Subscribe("electricity-prices", HandlePriceTickAsync, LogStreamError) + .Subscribe("tariff-events", HandleTariffChangedAsync); } ``` +The `TEvent` generic argument is usually inferred from the handler method +group. Users only specify it for inline lambdas or ambiguous method groups. + ### Resume semantics -- `Build()` reads `trackerSnapshot.LatestStream(streamId)` once per - subscription. 
+- Each subscription reads `trackerSnapshot.LatestStream(streamId)` once + when `Subscribe(...)` activates it. - If cursor exists → hydrate `StreamSequenceToken`, pass to `SubscribeAsync`. - If null → subscribe without token (provider default, typically: start from current). -- Tracker read once at `Build()` time, never again. Handler is the +- Tracker read once at subscription time, never again. Handler is the only path that mutates the tracker. ### Per-subscription `OnError` -Signature: `Func`. Default when -omitted: log + emit counter, do NOT rethrow. +Signature: `Action`, where the string is the stream +namespace. Default when omitted: log + emit counter, do NOT rethrow. ### One-provider-per-namespace @@ -744,31 +745,39 @@ STJ converter for a closed set of known subtypes discriminated on Unknown subtype throws at serialization time — silently dropping the cursor would corrupt dedup. -### Custom EH adapter for enriched tokens +### Event Hub adapter for enriched tokens + +The library ships `EnrichedEventHubAdapter`, a public unsealed +`EventHubDataAdapter` subclass. It opts Event Hub streams into +`EnrichedEventHubSequenceToken`, which carries: + +- `EnqueuedTime` — broker-side enqueue time for lag measurement. +- `StreamProviderName` — provider identity for dedup and multi-provider + edge cases. +- `TraceParent` — W3C traceparent captured from the producer-side + `Activity.Current?.Id`. -Optional pattern. 
To opt into `EnrichedEventHubSequenceToken` (carries -`EnqueuedTime`, see §5 projection constraint table), the silo wires a -custom `IEventHubDataAdapter` via `UseDataAdapter`: +Users who want the built-in behavior register it with +`UseEnrichedDataAdapter()` on `IEventHubStreamConfigurator`: ```csharp -internal sealed class EnrichedEventHubAdapter(Serializer serializer) - : EventHubDataAdapter(serializer) +siloBuilder.AddEventHubStreams("orders", b => { - public override StreamSequenceToken GetSequenceToken( - EventHubMessage eventHubMessage, int eventIndex) - { - var seqNum = eventHubMessage.SequenceNumber; - return new EnrichedEventHubSequenceToken( - eventHubMessage.Offset, - seqNum, - eventIndex, - eventHubMessage.EnqueuedTimeUtc); - } -} + b.UseEnrichedDataAdapter(); + // ... other Event Hub config +}); ``` -`StreamManager` is unaware of the adapter — the enrichment surfaces -through `StreamCursor.TryGetEnqueuedTime(...)` in the user's `OnNext`. +Users who need custom adapter behavior can subclass +`EnrichedEventHubAdapter` and register their subclass via Orleans' +`UseDataAdapter` directly. The library intentionally provides no +generic registration helper for custom subclasses because those adapters +usually need extra services/options. + +`StreamManager` is unaware of Event Hubs specifically — enrichment +surfaces through `StreamCursor.TryGetEnqueuedTime(...)`, +`StreamCursor.TryGetStreamProviderName(...)`, and +`StreamCursor.TryGetTraceParent(...)`. 
### OpenTelemetry trace correlation @@ -777,20 +786,24 @@ correlate consumer-side spans with producer-side spans without creating multi-hour distributed traces, `StreamManager` should use `ActivityLink`s, not parent chaining: -- Producer side (in a custom `ToQueueMessage` override on the adapter): - stash `Activity.Current?.Id` into `EventData.Properties["traceparent"]` +- Producer side (`EnrichedEventHubAdapter.ToQueueMessage`): stash + `Activity.Current?.Id` into `EventData.Properties["traceparent"]` before the event hits EH. +- Adapter ingest side (`EnrichedEventHubAdapter.GetStreamPosition`): + extract `EventData.Properties["traceparent"]` into + `EnrichedEventHubSequenceToken.TraceParent`. - Consumer side (in `StreamManager`'s OnNext wrapper): read the - property, parse into `ActivityContext`, start the OnNext span with + token's traceparent, parse into `ActivityContext`, start the OnNext span with `ActivityKind.Consumer` and `links: [new ActivityLink(parsedContext)]`. This produces separate traces per delivery, each with a link back to the producer span. OTel backends render the cross-trace arrow without collapsing weeks of traffic into one trace. -The adapter override is the user's responsibility (it lives in their -custom `IEventHubDataAdapter`); the consumer-side parse+link is a -library concern owned by `StreamManager`. +The built-in adapter owns producer-side propagation for users who call +`UseEnrichedDataAdapter()`. Custom adapters should preserve the same +`traceparent` property behavior if they want `StreamManager` to create +links. ### Telemetry @@ -845,8 +858,8 @@ Grains using this library should follow a functional-command model: - **Safe concurrency.** Reads never block writes. Writes are serialised by the runtime. No custom locks. - **Recovery-friendly.** `IStateManager.WriteAsync` recovery path - relies on `VersionedState.Equals` — works because state is a - value type (record with `Version` equality). 
+ pattern-matches `VersionedState` and compares `Version` directly — + works because the library stamps a v7 UUID on every write. - **Outbox replaces side effects.** Instead of "write state + call service" (two failure points), it's "write state with outbox item" (one atomic write) + "processor retries delivery" (idempotent). @@ -856,7 +869,7 @@ Grains using this library should follow a functional-command model: - No compile-time prevention of injecting `HttpClient` or calling external services in command methods. This is a documentation and code-review concern. -- No base class. The pattern emerges from the types: `VersionedState<TSelf>` +- No base class. The pattern emerges from the types: `VersionedState` is a record (immutable), `IStateManager<T>.WriteAsync` takes a new value (not mutation), `Outbox.Add` returns a new instance. - Future: Roslyn analyzers could warn on external I/O inside methods @@ -1073,14 +1086,11 @@ public abstract record VersionedState { [Id(0)] public Guid Version { get; internal set; } } - -[GenerateSerializer] -public abstract record VersionedState<TSelf> : VersionedState -{ - // Inherits [Id(0)] from parent. Child IDs start fresh at [Id(0)]. -} ``` +> **Note:** The generic `VersionedState<TSelf>` layer was removed. +> See §3a for rationale. Child record IDs start fresh at `[Id(0)]`. + ### System.Text.Json serialization All serializable types carry `[JsonConverter]` attributes referencing @@ -1091,9 +1101,9 @@ This ensures correct round-tripping through storage providers that use STJ (e.g., Orleans's Cosmos, blob, or custom providers configured with `System.Text.Json`). -**Newtonsoft.Json is not supported.** Users whose storage providers use -Newtonsoft must either configure their provider for STJ or write their -own converters. Documented as a known limitation. +**Newtonsoft.Json is not supported out of the box.** Users whose storage +providers use Newtonsoft can write and register their own converters. +Documented as a known limitation.
| Type | Converter approach | |------|-------------------| @@ -1122,7 +1132,7 @@ generic types. The factory's `CreateConverter` method creates the closed ### Non-serialized fields -Service-reference fields like `TimeProvider _time` get both +Service-reference fields like `TimeProvider time` get both `[NonSerialized]` and `[JsonIgnore]` — belt-and-suspenders: - **No `[Id]`** → Orleans `[GenerateSerializer]` skips them. @@ -1184,7 +1194,7 @@ concurrency behavior. Matching `Egil.Orleans.Testing` convention: - **100% branch coverage** on core types: `Outbox`, - `MessageTracker`, `StateManager`, `VersionedState`, + `MessageTracker`, `StateManager`, `VersionedState`, `OutboxProcessor`. - **95% branch coverage** on supporting types: `OutboxSequenceToken`, `StreamCursor`, `StreamManager`, `OutboxMessageEnvelope`, diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj index 21a83ea..6cffc4c 100644 --- a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Egil.Orleans.Messaging.csproj @@ -35,8 +35,10 @@ all runtime; build; native; contentfiles; analyzers; buildtransitive + + diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapter.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapter.cs new file mode 100644 index 0000000..584fc17 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapter.cs @@ -0,0 +1,213 @@ +using System.Diagnostics; +using Orleans.Providers.Streams.Common; +using Orleans.Serialization; +using Orleans.Streaming.EventHubs; +using Orleans.Streams; + +namespace Egil.Orleans.Messaging; + +/// +/// An subclass that produces +/// instances carrying the +/// broker-side enqueue time, stream provider name, and W3C traceparent. 
+/// +/// +/// +/// Registration: Do not instantiate directly. Use +/// +/// on the IEventHubStreamConfigurator during silo setup: +/// +/// siloBuilder.AddEventHubStreams("my-provider", b => +/// { +/// b.UseEnrichedDataAdapter(); +/// // ... other config +/// }); +/// +/// +/// +/// Overrides: This adapter overrides three methods: +/// +/// and +/// +/// — to return instead of +/// . +/// — to stamp +/// Activity.Current?.Id into the outgoing +/// EventData.Properties["traceparent"] before the event hits +/// Event Hub, enabling cross-queue trace correlation via +/// s on the consumer side. +/// +/// All other adapter behavior (batch container, partition key) is +/// inherited unchanged from . +/// +/// +/// OTel trace correlation: Orleans streams lose +/// across the queue boundary. This adapter +/// uses the W3C traceparent propagation pattern: +/// +/// Producer side (): +/// stashes Activity.Current?.Id into +/// EventData.Properties["traceparent"]. +/// Consumer side (): +/// extracts the traceparent property and stores it in +/// . +/// StreamManager: reads +/// and creates an +/// — correlating consumer spans to producer +/// spans without creating multi-hour parent-child traces. +/// +/// +/// +/// Subclassing: This class is not sealed — users who need additional +/// adapter customization (custom partition keys, custom batch containers, +/// etc.) can inherit from instead of +/// from directly. This preserves the +/// enriched token behavior while allowing further overrides: +/// +/// public class MyAdapter : EnrichedEventHubAdapter +/// { +/// public MyAdapter(string streamProviderName, Serializer serializer) +/// : base(streamProviderName, serializer) { } +/// +/// public override string GetPartitionKey(StreamId streamId) +/// => streamId.GetNamespace(); +/// } +/// +/// When subclassing, register via UseDataAdapter directly on the +/// IEventHubStreamConfigurator with your custom adapter type. 
+/// +/// +public class EnrichedEventHubAdapter : EventHubDataAdapter +{ + /// + /// The Event Hub application property key used to propagate the W3C + /// traceparent header across the queue boundary. + /// + protected const string TraceParentPropertyKey = "traceparent"; + + /// + /// The name of the Orleans stream provider. Accessible to subclasses for + /// custom token construction or diagnostics. + /// + protected string StreamProviderName { get; } + + /// + /// Creates a new adapter that enriches tokens with the given + /// . + /// + /// + /// The name of the Orleans stream provider, passed through from the + /// UseDataAdapter factory delegate. Stored in every + /// produced by this adapter. + /// + /// + /// The Orleans used by the base + /// for batch container + /// serialization/deserialization. + /// + public EnrichedEventHubAdapter(string streamProviderName, Serializer serializer) + : base(serializer) + { + StreamProviderName = streamProviderName; + } + + /// + /// Overrides the base to stamp Activity.Current?.Id into the + /// outgoing EventData.Properties["traceparent"] before the event + /// is published to Event Hub. + /// + /// + /// + /// This is the producer side of the OTel trace correlation pattern. + /// The current (W3C format) is captured at + /// publish time and stored as an Event Hub application property. On the + /// consumer side, extracts it into the + /// property. + /// + /// + /// If no is active at publish time, no property + /// is set and the consumer-side token will have a null + /// . + /// + /// + /// The type of stream events. + /// The target stream identity. + /// The events to publish. + /// The sequence token (must be null for Event Hubs). + /// The Orleans request context dictionary. + /// + /// An with the + /// traceparent property stamped if an + /// is active. 
+ /// + public override Azure.Messaging.EventHubs.EventData ToQueueMessage( + StreamId streamId, + IEnumerable events, + StreamSequenceToken token, + Dictionary requestContext) + { + throw new NotImplementedException(); + } + + /// + /// Overrides the base to produce an + /// carrying the cached message's enqueue time and the stream provider name. + /// + /// + /// + /// Called by the Event Hub cache when delivering messages from the in-memory + /// cache. The is available on every + /// cached message — no additional broker round-trip is needed. + /// + /// + /// Note: The + /// is not available on tokens reconstructed from cache — the + /// struct does not carry Event Hub application + /// properties. Trace correlation uses the token produced by + /// at initial ingest. + /// + /// + /// The cached Event Hub message. + /// + /// An with the enqueue time and + /// provider name populated. + /// is null for cache-reconstructed tokens. + /// + public override StreamSequenceToken GetSequenceToken(ref CachedMessage cachedMessage) + { + throw new NotImplementedException(); + } + + /// + /// Overrides the base to produce an + /// when the adapter first sees a raw EventData from the Event Hub + /// partition receiver. + /// + /// + /// + /// Called once per incoming EventData before it enters the cache. + /// The EventData.EnqueuedTime property carries the broker-stamped + /// UTC time; the is captured at adapter + /// construction time. + /// + /// + /// Trace correlation: If the incoming EventData has a + /// "traceparent" application property (stamped by + /// on the producer side), it is extracted + /// and stored in . + /// + /// + /// The Event Hub partition identifier. + /// The raw Event Hub message. + /// + /// A whose SequenceToken is an + /// with enqueue time, + /// provider name, and traceparent (if present). 
+ /// + public override StreamPosition GetStreamPosition( + string partition, + Azure.Messaging.EventHubs.EventData queueMessage) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapterExtensions.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapterExtensions.cs new file mode 100644 index 0000000..767f966 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubAdapterExtensions.cs @@ -0,0 +1,63 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Extension methods for registering the -producing +/// data adapter on an Event Hub stream configurator. +/// +/// +/// +/// Usage: +/// +/// siloBuilder.AddEventHubStreams("my-provider", b => +/// { +/// b.ConfigureEventHub(ob => ob.Configure(options => +/// { +/// options.ConfigureTableStorageCheckpointing(...); +/// })); +/// b.UseEnrichedDataAdapter(); +/// }); +/// +/// +/// +/// What it does: Registers the library's internal +/// EnrichedEventHubAdapter via UseDataAdapter. The adapter +/// overrides GetStreamPosition and GetSequenceToken to return +/// instances carrying the +/// broker-side enqueue time and the stream provider name. No user-written +/// adapter code is needed. +/// +/// +/// Downstream access: Once registered, the enrichment is transparently +/// available on every stream token. Use +/// for lag measurement and +/// for provider-aware dedup and diagnostics. +/// +/// +public static class EnrichedEventHubAdapterExtensions +{ + /// + /// Registers the library's -producing + /// data adapter on the given Event Hub stream configurator. + /// + /// + /// + /// Replaces the default EventHubDataAdapter. Only one data adapter + /// can be active per stream provider — calling this after a prior + /// UseDataAdapter replaces the previous registration. 
+ /// + /// + /// The adapter is resolved per-provider: the Serializer is obtained + /// from the silo's , and the stream provider + /// name is captured from the configurator's factory delegate. + /// + /// + /// + /// The Event Hub stream configurator, obtained from + /// ISiloBuilder.AddEventHubStreams. + /// + public static void UseEnrichedDataAdapter( + this IEventHubStreamConfigurator configurator) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubSequenceToken.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubSequenceToken.cs new file mode 100644 index 0000000..aed2805 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/EnrichedEventHubSequenceToken.cs @@ -0,0 +1,127 @@ +using Orleans.Streaming.EventHubs; + +namespace Egil.Orleans.Messaging; + +/// +/// An subclass that carries the +/// broker-side and the , +/// enabling end-to-end lag measurement and provider-aware dedup in +/// and . +/// +/// +/// +/// Registration: Call +/// +/// on the Event Hub stream configurator during silo setup. This registers +/// the library's internal EnrichedEventHubAdapter that overrides +/// GetStreamPosition and GetSequenceToken to produce +/// instances automatically. +/// +/// +/// Transparent to StreamManager: The is +/// unaware of the adapter. The enrichment surfaces through +/// and +/// in the user's +/// OnNext handler. +/// +/// +/// Dedup and edge cases: The embedded +/// in the token enables to distinguish messages +/// arriving from different stream providers on the same stream namespace, +/// supporting multi-provider topologies and provider-specific eviction. +/// +/// +/// OTel trace correlation: The property +/// carries the W3C traceparent header captured at publish time by +/// . On the consumer side, +/// reads it via +/// and creates an +/// — correlating consumer +/// spans to producer spans without multi-hour parent-child traces. 
+/// +/// +/// Serialization: Inherits Orleans serialization from +/// . The library's STJ converter for +/// recognizes this subtype via a $kind +/// discriminator and round-trips , +/// , and fields. +/// +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.EnrichedEventHubSequenceToken")] +public class EnrichedEventHubSequenceToken : EventHubSequenceTokenV2 +{ + /// + /// The wall-clock time the event was enqueued at the Event Hub broker. + /// Used by to compute + /// end-to-end lag: now - EnqueuedTime. + /// + [Id(0)] + public DateTimeOffset EnqueuedTime { get; } + + /// + /// The name of the Orleans stream provider that delivered this event. + /// Enables provider-aware dedup and diagnostics in . + /// + [Id(1)] + public string StreamProviderName { get; } + + /// + /// The W3C traceparent header value from the producer-side + /// at publish time, or null + /// if no activity was active when the event was published. + /// + /// + /// Stamped by in its + /// ToQueueMessage override (producer side) and extracted in + /// GetStreamPosition (consumer side). + /// uses this to create + /// s — correlating consumer + /// spans to producer spans without creating multi-hour parent-child traces. + /// + [Id(2)] + public string? TraceParent { get; } + + /// + /// Creates a new enriched token with the broker-side enqueue time and + /// stream provider name. + /// + /// The Event Hub partition offset (string). + /// The Event Hub sequence number. + /// + /// The index of the event within a batch at the same sequence number. + /// + /// + /// The (UTC) when the event was enqueued at the + /// Event Hub broker. + /// + /// + /// The name of the Orleans stream provider. Passed through from the + /// EnrichedEventHubAdapter at construction time. + /// + /// + /// The W3C traceparent value captured at publish time, or + /// null if no was active. 
+ /// + public EnrichedEventHubSequenceToken( + string offset, + long sequenceNumber, + int eventIndex, + DateTime enqueuedTime, + string streamProviderName, + string? traceParent = null) + : base(offset, sequenceNumber, eventIndex) + { + EnqueuedTime = new DateTimeOffset(enqueuedTime, TimeSpan.Zero); + StreamProviderName = streamProviderName; + TraceParent = traceParent; + } + + /// + /// Parameterless constructor for Orleans serializer use only. + /// + public EnrichedEventHubSequenceToken() + { + StreamProviderName = string.Empty; + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IOutboxGrain.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IOutboxGrain.cs new file mode 100644 index 0000000..3842c79 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IOutboxGrain.cs @@ -0,0 +1,51 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Marker interface for grains that use . +/// Provides a default interface method (DIM) for +/// that routes reminder callbacks to the processor automatically. +/// +/// +/// +/// Zero ceremony: Implementing this interface is one of two obligations +/// for grains using the outbox pattern (the other is calling +/// InitializeOutboxProcessor in OnActivateAsync). No manual +/// ReceiveReminder override is needed — the DIM discovers the +/// registered on the grain context and +/// forwards the callback. +/// +/// +/// Escape hatch: Grains with their own reminders can override +/// explicitly, handle their own +/// reminder names, and forward unknown names to the processor: +/// +/// public async Task ReceiveReminder(string name, TickStatus status) +/// { +/// if (name == MyOwnReminder) { await DoMyWork(); return; } +/// await outboxProcessor.ReceiveReminderAsync(name, status); +/// } +/// +/// +/// +/// Compiler enforcement: The InitializeOutboxProcessor +/// extension method constrains TGrain : IOutboxGrain, IGrainBase, +/// so grains that forget to implement this interface get a compile error. 
+/// +/// +public interface IOutboxGrain : IRemindable +{ + /// + /// Default implementation that discovers the + /// on the grain context and forwards the reminder callback. Returns + /// if no processor is attached (safe + /// no-op for reminders that fire before activation completes). + /// + Task IRemindable.ReceiveReminder(string reminderName, TickStatus status) + { + var grainBase = (IGrainBase)this; + var component = grainBase.GrainContext.GetComponent(); + return component is null + ? Task.CompletedTask + : component.ReceiveReminderAsync(reminderName, status).AsTask(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IStateManager.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IStateManager.cs new file mode 100644 index 0000000..0aa1d3a --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/IStateManager.cs @@ -0,0 +1,138 @@ +namespace Egil.Orleans.Messaging; + +/// +/// A thin wrapper around that guarantees +/// the grain's observable is never out of sync with what is +/// durably persisted, even when fails ambiguously +/// (timeout, network drop, server 5xx, ETag conflict). +/// +/// +/// +/// Committed-state fence: exposes only the last +/// successfully written value. During an in-flight write, the underlying +/// .State already holds the uncommitted +/// value. Methods marked [AlwaysInterleave] that read +/// through this interface are guaranteed to never observe uncommitted state. +/// This is the primary reason exists as a wrapper +/// rather than extension methods on . +/// +/// +/// Usage: Inject as normal via +/// [PersistentState], then wrap it during OnActivateAsync: +/// +/// stateManager = storage.AsStateManager(); +/// +/// The raw should not be accessed +/// directly after wrapping — doing so bypasses the committed-state fence. +/// +/// +/// Recovery: On ambiguous write failure, the manager re-reads from +/// storage. 
If the write actually persisted (detected via version or equality +/// check), it swallows the exception. If the write did not persist, it rethrows. +/// If both write and re-read fail (double failure), the manager reverts to the +/// previous state and rethrows — the grain must call +/// before its next write to refresh the ETag. +/// +/// +/// +/// The grain state type. Must be a reference type (atomic pointer swap for +/// interleaved reads) and implement (recovery path +/// compares server-side state to attempted write). Records satisfy both for free. +/// For state containing +/// or other types with reference-based equality, inherit from +/// — the recovery path pattern-matches against it +/// and compares directly, bypassing +/// Equals entirely. +/// +/// Deep immutability: Every type referenced from the state record +/// should also be immutable. The strength of this requirement depends on +/// how the grain is used: +/// +/// +/// Required when the grain uses [AlwaysInterleave] on any +/// read method. Interleaved readers access concurrently +/// with an in-flight command building the next state value. If the state +/// graph contains mutable reference types, a reader could observe a +/// partially mutated object even though the root reference hasn't +/// been swapped yet (the old state's inner mutable object is being modified +/// in place by the command). +/// +/// +/// Strongly recommended even without interleaving. Orleans default +/// turn-based concurrency prevents concurrent access, so torn reads cannot +/// occur. However, immutable state still provides value: it makes the +/// functional-command pattern (with { ... }) predictable, prevents +/// accidental mutation of the "previous" snapshot held by the recovery +/// path, and avoids subtle bugs if [AlwaysInterleave] is added later. +/// +/// +/// Use ImmutableArray<T>, ImmutableDictionary<K,V>, +/// records, and value objects throughout. 
Mutable collections +/// (, +/// ) inside +/// the state graph undermine these guarantees. +/// +/// +public interface IStateManager + where T : class, IEquatable +{ + /// + /// Gets the last successfully committed state snapshot. + /// + /// + /// Safe to read from [AlwaysInterleave] methods — returns only + /// committed values, never in-flight uncommitted state. This is the + /// committed-state fence that justifies the wrapper over raw + /// . + /// + T State { get; } + + /// + /// Re-reads state from durable storage, replacing the current + /// snapshot. + /// + /// + /// Not required during activation — + /// auto-hydrates before OnActivateAsync. Use only when the grain + /// needs to force a re-read mid-activation (e.g., after a known external + /// mutation or to recover from a double-failure scenario). + /// + Task ReadAsync(); + + /// + /// Atomically writes to durable storage. + /// On success, reflects . + /// + /// + /// + /// If derives from , + /// a fresh version (v7) is stamped on it before writing. + /// The caller's reference is mutated — this is a documented contract. + /// + /// + /// Recovery on failure: Re-reads from storage. If the write actually + /// landed (version or equality match), swallows the exception and returns + /// normally. If it did not land, rethrows the original exception. + /// InconsistentStateException always rethrows + /// even if equality matches — a coincidental match must not hide a real + /// concurrent write. + /// + /// + /// Double failure: If both write and re-read fail, reverts + /// to its pre-write value and rethrows. The grain + /// holds correct data but a stale ETag — call + /// before the next write. + /// + /// + /// The new state value to persist. + /// + /// Concurrency policy. (default) uses + /// the provider's ETag check. bypasses it. + /// + Task WriteAsync(T newState, WritePolicy policy = WritePolicy.Concurrent); + + /// + /// Clears the persisted state, resetting it to default. 
+ /// + Task ClearAsync(); +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTracker.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTracker.cs new file mode 100644 index 0000000..12b8a5b --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTracker.cs @@ -0,0 +1,230 @@ +using System.Collections.Immutable; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// Receiver-side, persisted dedup state. Tracks the high-water position from +/// each upstream message source so the grain can detect and reject duplicates. +/// +/// +/// +/// Two source kinds: +/// +/// Orleans streams — keyed by ; position +/// is a wrapping (StreamId, StreamSequenceToken). +/// Outbox messages — keyed by sender ; +/// position is an . +/// +/// +/// +/// Sealed class, not record. Consistent with — +/// avoids time field participating in record-synthesized equality, and +/// prevents with { ... } expressions that could bypass invariants. +/// +/// +/// ProcessMessage semantics (streams): +/// +/// No prior entry → Accept, insert (LastPosition = cursor, Received = now). +/// cursor > stored.LastPosition → Accept, update position + received. +/// cursor <= stored.LastPosition → Reject (duplicate), no change. +/// +/// +/// +/// ProcessMessage semantics (outbox): +/// +/// No prior entry → Accept, insert. +/// token.Epoch > stored.Epoch → Accept (sender reset), replace entry. +/// Same epoch, token.Seq > stored.LastSeq → Accept, update. +/// Same epoch, token.Seq <= stored.LastSeq → Reject (duplicate). +/// token.Epoch < stored.Epoch → Reject (stale epoch). +/// +/// +/// +/// Eviction: Five overloads, one rule — remove entries where +/// entry.Received < olderThan. No separate Forget API. +/// Evict(id, DateTimeOffset.MaxValue) is the documented idiom for +/// unconditional removal of a single source entry. +/// +/// +/// TimeProvider: Non-persisted ([NonSerialized], +/// [JsonIgnore], no [Id]). 
After deserialization, the grain +/// must call to inject a test-friendly +/// clock. Falls back to if skipped. +/// +/// +/// Serialization: Decorated with [GenerateSerializer] for Orleans +/// and [JsonConverter] for STJ. The custom converter keeps private backing +/// fields encapsulated. +/// +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.MessageTracker")] +// [JsonConverter(typeof(MessageTrackerJsonConverter))] +public sealed class MessageTracker : IEquatable +{ + [Id(0)] private readonly ImmutableDictionary streams; + [Id(1)] private readonly ImmutableDictionary outbox; + + /// + /// Non-persisted service reference. No [Id], no serialization. + /// Falls back to when not explicitly set. + /// + [NonSerialized] + [JsonIgnore] + private TimeProvider time = TimeProvider.System; + + /// + /// Creates an empty with no tracked sources. + /// + public MessageTracker() + { + streams = ImmutableDictionary.Empty; + outbox = ImmutableDictionary.Empty; + } + + private MessageTracker( + ImmutableDictionary streams, + ImmutableDictionary outbox) + { + this.streams = streams; + this.outbox = outbox; + } + + /// + /// Registers a for Received timestamps. + /// Must be called after deserialization to inject a test-friendly clock. + /// + public void RegisterTimeProvider(TimeProvider time) => this.time = time; + + /// + /// Evaluates a stream message for acceptance. Returns true if the + /// advances past the stored high-water mark. + /// + /// The stream cursor to evaluate. + /// + /// When accepted, a new with the updated + /// position. When rejected, equals this. + /// + /// true if accepted (new message); false if duplicate. + public bool ProcessMessage(StreamCursor cursor, out MessageTracker next) + { + throw new NotImplementedException(); + } + + /// + /// Evaluates an outbox message for acceptance. Returns true if the + /// advances past the stored position or carries + /// a newer epoch. + /// + /// The outbox sequence token to evaluate. 
+ /// + /// When accepted, a new with the updated + /// position. When rejected, equals this. + /// + /// true if accepted; false if duplicate or stale. + public bool ProcessMessage(OutboxSequenceToken token, out MessageTracker next) + { + throw new NotImplementedException(); + } + + /// + /// Returns the last accepted for the given + /// , or null if no messages from that + /// stream have been tracked. + /// + public StreamCursor? LatestStream(StreamId stream) + { + throw new NotImplementedException(); + } + + /// + /// Returns the last accepted for the + /// given , or null if no outbox messages + /// from that sender have been tracked. + /// + public OutboxSequenceToken? LatestOutbox(GrainId sender) + { + throw new NotImplementedException(); + } + + /// + /// Removes all entries (both stream and outbox) where + /// entry.Received < . + /// + public MessageTracker Evict(DateTimeOffset olderThan) + { + throw new NotImplementedException(); + } + + /// + /// Removes stream entries where + /// entry.Received < . + /// Outbox entries are unaffected. + /// + public MessageTracker EvictStreams(DateTimeOffset olderThan) + { + throw new NotImplementedException(); + } + + /// + /// Removes outbox entries where + /// entry.Received < . + /// Stream entries are unaffected. + /// + public MessageTracker EvictOutboxes(DateTimeOffset olderThan) + { + throw new NotImplementedException(); + } + + /// + /// Removes the entry for the given if + /// entry.Received < . + /// Use DateTimeOffset.MaxValue to unconditionally remove. + /// + public MessageTracker Evict(StreamId stream, DateTimeOffset olderThan) + { + throw new NotImplementedException(); + } + + /// + /// Removes the entry for the given outbox if + /// entry.Received < . + /// Use DateTimeOffset.MaxValue to unconditionally remove. + /// + public MessageTracker Evict(GrainId sender, DateTimeOffset olderThan) + { + throw new NotImplementedException(); + } + + /// + public bool Equals(MessageTracker? 
other) + { + throw new NotImplementedException(); + } + + /// + public override bool Equals(object? obj) => obj is MessageTracker o && Equals(o); + + /// + public override int GetHashCode() => HashCode.Combine(streams.Count, outbox.Count); + + /// + /// Internal entry tracking a stream source's last known position and + /// the wall-clock time it was received. + /// + [GenerateSerializer] + internal readonly record struct StreamEntry( + [property: Id(0)] StreamCursor LastPosition, + [property: Id(1)] DateTimeOffset Received); + + /// + /// Internal entry tracking an outbox source's last known epoch, + /// sequence number, and the wall-clock time it was received. + /// + [GenerateSerializer] + internal readonly record struct OutboxEntry( + [property: Id(0)] DateTimeOffset Epoch, + [property: Id(1)] long LastSequenceNumber, + [property: Id(2)] DateTimeOffset Received); +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTrackerJsonConverter.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTrackerJsonConverter.cs new file mode 100644 index 0000000..f06ecdf --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/MessageTrackerJsonConverter.cs @@ -0,0 +1,38 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// STJ converter for . Serializes and +/// deserializes the tracker's internal stream-position and outbox-position +/// dictionaries without exposing private fields. +/// +/// +/// +/// is a sealed class with private +/// +/// backing fields. Exposing them via [JsonInclude] would leak +/// internal structure and weaken encapsulation. This converter controls +/// the exact wire format. +/// +/// +/// Registered on via [JsonConverter]. +/// STJ discovers the attribute automatically — no user-side +/// configuration needed. +/// +/// +internal sealed class MessageTrackerJsonConverter : JsonConverter +{ + /// + public override MessageTracker? 
Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } + + /// + public override void Write(Utf8JsonWriter writer, MessageTracker value, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/NoPostmanRegisteredException.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/NoPostmanRegisteredException.cs new file mode 100644 index 0000000..3ff3ec6 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/NoPostmanRegisteredException.cs @@ -0,0 +1,45 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Thrown (internally, via ) +/// when an outbox item's runtime type does not match any registered postman. +/// +/// +/// +/// This exception is never thrown out of . +/// It is surfaced through the +/// callback as the Error member of the failure tuple, allowing the grain +/// to decide how to handle unmatched items (log, dead-letter, remove, etc.). +/// +/// +/// Common cause: A new subtype of the outbox base type was added but no +/// corresponding +/// call was registered. Fix by adding the missing postman registration in +/// OnActivateAsync. +/// +/// +/// Postman ordering: Postmen are matched first-registered-wins. If a +/// less-specific postman is registered before a more-specific one, the +/// more-specific subtype may be consumed by the less-specific postman instead +/// of reaching this exception. Register from most specific to least specific. +/// +/// +public sealed class NoPostmanRegisteredException : InvalidOperationException +{ + /// + /// Creates a new for the + /// given . + /// + /// The runtime type of the unmatched outbox item. 
+ public NoPostmanRegisteredException(Type itemType) + : base($"No postman registered for outbox item type '{itemType.FullName}'.") + { + ItemType = itemType; + } + + /// + /// The runtime type of the outbox item that could not be matched to any + /// registered postman. + /// + public Type ItemType { get; } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Outbox.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Outbox.cs new file mode 100644 index 0000000..2eba2bd --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/Outbox.cs @@ -0,0 +1,339 @@ +using System.Collections; +using System.Collections.Immutable; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// A per-grain durable buffer of messages that have been announced +/// (committed alongside a state change) but not yet delivered +/// (handed off to a postman successfully). Lives as a property on the grain's +/// state record so it participates in atomic WriteStateAsync calls. +/// +/// +/// +/// Immutable-collection semantics: Behaves like +/// — read-only iteration, indexer access, +/// mutators (, , , +/// ) return new instances. The original is never +/// modified. Assign the return value back to the state property and write. +/// +/// +/// Sequence ownership: Only assigns sequence numbers. +/// Callers supply the payload; the outbox stamps +/// with a monotonically increasing and the +/// current . This is a hard invariant — there is no public +/// constructor that accepts a pre-built sequence number. +/// +/// +/// Epoch semantics: +/// +/// Epoch = null, +/// LatestSequenceNumber = 0. Use at construction time or for deliberate +/// ops-level sequence-space resets. +/// First → stamps Epoch = now. Persisted with state. +/// Subsequent → same epoch, incrementing sequence number. +/// → removes all items but preserves +/// and . This is the normal +/// "postman drained successfully" path. 
+/// +/// Grains should almost never call on an active outbox. +/// +/// +/// Equality: Two instances are equal when +/// sender, sequence metadata, epoch, count, first sequence number, and last +/// sequence number are equal. Equality is O(1). +/// +/// +/// Unbounded growth risk: If postman targets are down, the outbox grows +/// without limit. Mitigations: telemetry (gauge for depth per grain type), +/// configurable max via (oldest +/// messages auto-dropped FIFO), and documentation of storage-provider entity +/// size limits (e.g., Azure Table = 1 MB). +/// +/// +/// Serialization: Decorated with [GenerateSerializer] for Orleans. +/// A [JsonConverter] attribute (via JsonConverterFactory) ensures +/// System.Text.Json round-trips without exposing private backing fields. +/// Newtonsoft.Json is not supported out of the box; users can write and +/// register their own Newtonsoft converter if needed. +/// +/// +/// TimeProvider: The time field is non-persisted +/// ([NonSerialized], [JsonIgnore], no [Id]). After +/// deserialization the grain must call to +/// inject a test-friendly clock. Falls back to +/// if skipped — correct for production, breaks fake-clock tests. +/// +/// +/// +/// The user-defined message payload type. Must be serializable by Orleans +/// ([GenerateSerializer]) and by System.Text.Json if the storage provider +/// uses STJ. +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.Outbox`1")] +// [JsonConverter(typeof(OutboxJsonConverterFactory))] +public sealed class Outbox : IReadOnlyList>, IEquatable> +{ + [Id(0)] private readonly GrainId sender; + [Id(1)] private readonly long latestSequenceNumber; + [Id(2)] private readonly ImmutableArray> items; + [Id(3)] private readonly DateTimeOffset? epoch; + + /// + /// Non-persisted service reference. No [Id], no serialization. + /// Falls back to when not explicitly set. 
+ /// + [NonSerialized] + [JsonIgnore] + private TimeProvider time = TimeProvider.System; + + /// + /// Internal constructor used by mutation methods to produce new instances. + /// Not user-callable — use to create the + /// initial outbox, then to append messages. + /// + internal Outbox( + GrainId sender, + long latestSequenceNumber, + ImmutableArray> items, + DateTimeOffset? epoch) + { + this.sender = sender; + this.latestSequenceNumber = latestSequenceNumber; + this.items = items; + this.epoch = epoch; + } + + /// + /// Creates a fresh, empty outbox for the given . + /// is null and + /// is 0. The next stamps a new epoch. + /// + /// + /// Use at grain-state construction time (default property initializer). + /// Calling on an active outbox is a nuclear reset — the next + /// starts a fresh epoch. Receivers see the epoch change + /// and accept unconditionally. Prefer for the normal + /// "postman drained" path. + /// + /// + /// The of the grain that owns this outbox. Baked + /// into every produced by . + /// + public static Outbox Create(GrainId sender) => + new(sender, latestSequenceNumber: 0, items: [], epoch: null); + + /// + /// Registers a for timestamp generation. + /// Must be called after deserialization to inject a test-friendly clock. + /// + /// + /// This is a void mutator on a non-persisted field — it does not produce + /// a new instance. The provider is carried forward + /// to successor instances created by , , + /// , and . + /// + public void RegisterTimeProvider(TimeProvider time) => this.time = time; + + /// + /// The of the grain that owns this outbox. Baked + /// into every . + /// + public GrainId Sender => sender; + + /// + /// The highest sequence number ever assigned in this outbox, including + /// items that have been removed. Persists independently of item contents — + /// does not reset it. + /// + public long LatestSequenceNumber => latestSequenceNumber; + + /// + /// The epoch marker stamped on the first call. 
null + /// only for a freshly constructed (Create) outbox that + /// has never had an item added. + /// + public DateTimeOffset? Epoch => epoch; + + /// Gets the number of pending messages. + public int Count => items.Length; + + /// + /// true when the outbox contains no pending messages. Note that + /// LatestSequenceNumber may be non-zero even when empty + /// (items were drained by the postman). + /// + public bool IsEmpty => items.IsDefaultOrEmpty; + + /// Gets the envelope at the specified index. + public OutboxMessageEnvelope<T> this[int index] => items[index]; + + /// + public IEnumerator<OutboxMessageEnvelope<T>> GetEnumerator() + => ((IEnumerable<OutboxMessageEnvelope<T>>)items).GetEnumerator(); + + /// + IEnumerator IEnumerable.GetEnumerator() => GetEnumerator(); + + /// + /// Appends a message to the outbox, assigning the next sequence number + /// and stamping the epoch on first call. + /// + /// + /// Returns a new instance; the original + /// is not modified. Assign the result back to the state property: + /// + /// state = state with { Outbox = state.Outbox.Add(myEvent) }; + /// await stateManager.WriteAsync(state); + /// + /// + /// The user-defined payload to enqueue. + /// A new outbox containing the appended message. + public Outbox<T> Add(T message) + { + var now = time.GetUtcNow(); + var epoch = this.epoch ?? now; + var sequenceNumber = latestSequenceNumber + 1; + var token = new OutboxSequenceToken(sequenceNumber, sender, now, epoch); + var next = new Outbox<T>( + sender, + sequenceNumber, + items.Add(new OutboxMessageEnvelope<T>(token, message)), + epoch); + + next.RegisterTimeProvider(time); + return next; + } + + /// + /// Removes the message identified by token from the outbox. + /// + /// + /// Matches the full OutboxSequenceToken identity against the + /// first pending item. If the token is not the FIFO head, returns the same + /// instance unchanged. Does not affect LatestSequenceNumber + /// or Epoch. + /// + /// The token of the message to remove. + /// A new outbox without the specified message, or this instance if the token is not the FIFO head. 
public Outbox<T> Remove(OutboxSequenceToken token) + { + if (items.IsDefaultOrEmpty || items[0].Token != token) + { + return this; + } + + var next = new Outbox<T>( + sender, + latestSequenceNumber, + items.RemoveAt(0), + epoch); + + next.RegisterTimeProvider(time); + return next; + } + + /// + /// Batch-removes messages identified by tokens. + /// + /// + /// Removes the longest FIFO prefix whose tokens appear in + /// tokens. Tokens not found at the head of the outbox + /// are silently ignored. + /// + /// The tokens of the messages to remove. + /// A new outbox without the removed prefix, or this instance if nothing was removed. + public Outbox<T> RemoveRange(IEnumerable<OutboxSequenceToken> tokens) + { + var tokenSet = tokens.ToHashSet(); + if (items.IsDefaultOrEmpty || tokenSet.Count == 0) + { + return this; + } + + var removeCount = 0; + while (removeCount < items.Length && tokenSet.Contains(items[removeCount].Token)) + { + removeCount++; + } + + var remaining = items.RemoveRange(0, removeCount); + if (remaining.Length == items.Length) + { + return this; + } + + var next = new Outbox<T>( + sender, + latestSequenceNumber, + remaining, + epoch); + next.RegisterTimeProvider(time); + return next; + } + + /// + /// Removes all pending messages but preserves + /// LatestSequenceNumber and Epoch. + /// + /// + /// This is the normal path after the postman has successfully drained all + /// items. The high-water mark persists so subsequent Add calls + /// continue the sequence without gaps. Receivers see monotonically increasing + /// sequence numbers within the same epoch. + /// + /// A new empty outbox preserving sequence metadata. + public Outbox<T> Clear() + { + if (items.IsDefaultOrEmpty) + { + return this; + } + + var next = new Outbox<T>( + sender, + latestSequenceNumber, + [], + epoch); + + next.RegisterTimeProvider(time); + return next; + } + + /// + /// O(1) equality over sender, sequence metadata, epoch, count, first + /// sequence number, and last sequence number. + /// + public bool Equals(Outbox<T>? 
other) + { + return ReferenceEquals(this, other) + || (other is not null + && sender.Equals(other.sender) + && latestSequenceNumber == other.latestSequenceNumber + && epoch == other.epoch + && items.Length == other.items.Length + && FirstSequenceNumber == other.FirstSequenceNumber + && LastSequenceNumber == other.LastSequenceNumber); + } + + /// + public override bool Equals(object? obj) => obj is Outbox<T> o && Equals(o); + + /// + public override int GetHashCode() + => HashCode.Combine( + sender, + latestSequenceNumber, + epoch, + items.Length, + FirstSequenceNumber, + LastSequenceNumber); + + private long FirstSequenceNumber => + items.IsDefaultOrEmpty ? 0 : items[0].Token.SequenceNumber; + + private long LastSequenceNumber => + items.IsDefaultOrEmpty ? 0 : items[^1].Token.SequenceNumber; +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxJsonConverterFactory.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxJsonConverterFactory.cs new file mode 100644 index 0000000..5e6b225 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxJsonConverterFactory.cs @@ -0,0 +1,35 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// STJ converter factory that creates closed JsonConverter{T} +/// instances for Outbox{T}. +/// +/// +/// +/// Registered on Outbox{T} via [JsonConverter]. STJ +/// discovers the attribute automatically; no user-side +/// configuration is needed. +/// +/// +/// The converter serializes only the structural data needed to reconstruct +/// the outbox (sender, epoch, sequence numbers, items). The non-persisted +/// time provider is excluded. +/// +/// +internal sealed class OutboxJsonConverterFactory : JsonConverterFactory +{ + /// + public override bool CanConvert(Type typeToConvert) + { + throw new NotImplementedException(); + } + + /// + public override JsonConverter? 
CreateConverter(Type typeToConvert, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelope.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelope.cs new file mode 100644 index 0000000..f52b667 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelope.cs @@ -0,0 +1,45 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Wraps a user-defined message with its +/// , forming the unit of storage and +/// dispatch within an . +/// +/// +/// +/// Immutability: Envelopes are immutable records. Retry diagnostics +/// (attempt count, last exception) are not stored on the envelope — +/// the tracks attempts in-memory, +/// keyed by . On grain reactivation, attempt counts +/// restart from zero. +/// +/// +/// Serialization: Decorated with [GenerateSerializer] for +/// Orleans and [JsonConverter] (via JsonConverterFactory) for +/// System.Text.Json. The factory creates a closed +/// JsonConverter<OutboxMessageEnvelope<T>> for the specific +/// at runtime. +/// +/// +/// Storage co-location: Envelopes live inside , +/// which lives inside the grain's state record. They are serialized as part +/// of the grain's single WriteStateAsync call — there is no separate +/// "outbox store." +/// +/// +/// +/// The user-defined message payload type. Must be serializable by both Orleans +/// ([GenerateSerializer]) and System.Text.Json if the storage provider +/// uses STJ. +/// +/// +/// The sequence token identifying this message. Assigned by +/// — never user-constructed. +/// +/// The user-defined message payload. 
+[GenerateSerializer] +[Alias("egil.orleans.messaging.OutboxMessageEnvelope`1")] +// [JsonConverter(typeof(OutboxMessageEnvelopeJsonConverterFactory))] +public sealed record OutboxMessageEnvelope( + [property: Id(0)] OutboxSequenceToken Token, + [property: Id(1)] T Message); diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelopeJsonConverterFactory.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelopeJsonConverterFactory.cs new file mode 100644 index 0000000..1f4399d --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxMessageEnvelopeJsonConverterFactory.cs @@ -0,0 +1,30 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// STJ converter factory that creates closed +/// instances for . +/// +/// +/// +/// Registered on via +/// [JsonConverter]. STJ discovers the attribute automatically — +/// no user-side configuration needed. +/// +/// +internal sealed class OutboxMessageEnvelopeJsonConverterFactory : JsonConverterFactory +{ + /// + public override bool CanConvert(Type typeToConvert) + { + throw new NotImplementedException(); + } + + /// + public override JsonConverter? CreateConverter(Type typeToConvert, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessor.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessor.cs new file mode 100644 index 0000000..60f1757 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessor.cs @@ -0,0 +1,170 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Grain-scoped component that owns the timer, reminder, and postman dispatch +/// lifecycle for draining an . +/// +/// +/// +/// Architecture: Each grain with an outbox gets its own +/// — no external scan, no registry, +/// no second store. +/// +/// GrainTimer for in-process fast retry while activated. 
+/// Durable Reminder for cross-activation recovery. Reactivates +/// the grain if it deactivates with pending items. +/// +/// +/// +/// Non-reentrant assumption: Relies on Orleans default turn-based +/// concurrency — adds no internal locks. Not safe on +/// [Reentrant] grains. Document and enforce via code review. +/// +/// +/// Postman dispatch: The grain registers one or more postmen via +/// , each handling a +/// subtype of . Matching is +/// first-registered-wins against the item's runtime type — register +/// from most specific to least specific (like a switch). +/// +/// Each item dispatches to exactly one postman. +/// Items whose runtime type matches no postman → reported as failed +/// with via +/// . +/// Per-item exceptions are caught and surfaced through +/// with +/// an in-memory attempt count that resets on grain reactivation. +/// +/// +/// +/// error contract: Only throws +/// (per-run timeout), +/// (caller token), or callback +/// exceptions from +/// / . Per-item +/// postman failures are swallowed and routed to the error callback. +/// +/// +/// Grain integration: Two obligations (both compiler-enforced): +/// +/// Implement . +/// Call InitializeOutboxProcessor(...) in OnActivateAsync. +/// +/// No ReceiveReminder override needed — the +/// DIM handles it. No manual timer/reminder lifecycle. No telemetry wiring. +/// +/// +/// Escape hatch: Grains with their own reminders can forward unknown +/// reminder names to : +/// +/// public async Task ReceiveReminder(string name, TickStatus status) +/// { +/// if (name == MyOwnReminder) { await DoMyWork(); return; } +/// await outboxProcessor.ReceiveReminderAsync(name, status); +/// } +/// +/// +/// +/// Telemetry: Emits to meter egil.orleans.messaging: +/// outbox.post.duration (histogram), outbox.post.item.duration +/// (histogram), outbox.post.items (counter), outbox.post.errors +/// (counter), outbox.depth (gauge). Tags: grain.type, +/// event.type, success. 
+/// +/// +/// +/// The base type of items in the outbox. Postmen can handle subtypes via +/// . +/// +public sealed partial class OutboxProcessor : IOutboxComponent + where TOutbox : notnull +{ + internal OutboxProcessor() { } + + /// + /// Registers a postman that handles items of type . + /// + /// + /// Order matters. Postmen are matched first-registered-wins against + /// the item's runtime type. Register from most specific to least specific. + /// Returns this for fluent chaining during OnActivateAsync. + /// + /// The subtype this postman handles. + /// + /// Async callback that delivers the item. Per-item exceptions are caught + /// and surfaced through . + /// + /// This processor for chaining. + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox + { + throw new NotImplementedException(); + } + + /// + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox + { + throw new NotImplementedException(); + } + + /// + /// + /// Async callback with cancellation support. The token is the per-run + /// timeout from . + /// + public OutboxProcessor AddPostman( + Func postman) where TSub : TOutbox + { + throw new NotImplementedException(); + } + + /// + /// Posts all pending items by dispatching each to its matching postman. + /// Arms timer/reminder if items remain; unregisters if the outbox is empty. + /// + /// + /// Safe to call from the grain's task scheduler (turn-based). Does not + /// throw for per-item postman failures — those are routed to + /// . Only + /// throws , + /// , or callback exceptions. + /// + /// Optional cancellation token. + public ValueTask PostAsync(CancellationToken cancellationToken = default) + { + throw new NotImplementedException(); + } + + /// + /// Called by the DIM when a reminder fires. + /// No-ops for reminder names not owned by this processor. + /// + /// The reminder name from Orleans. + /// The tick status from Orleans. 
+ public ValueTask ReceiveReminderAsync(string reminderName, TickStatus status) + { + throw new NotImplementedException(); + } + + /// + /// Attaches this processor to the grain's as + /// a component so the DIM can discover it. + /// Called internally by InitializeOutboxProcessor. + /// + internal void AttachToGrain() + { + throw new NotImplementedException(); + } +} + +/// +/// Internal interface registered as a grain context component so the +/// DIM can forward reminder callbacks without +/// knowing the outbox generic type. +/// +internal interface IOutboxComponent +{ + /// Forwards a reminder callback to the outbox processor. + ValueTask ReceiveReminderAsync(string reminderName, TickStatus status); +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorExtensions.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorExtensions.cs new file mode 100644 index 0000000..ca6d69c --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorExtensions.cs @@ -0,0 +1,66 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Extension methods for wiring into +/// a grain's activation lifecycle. +/// +public static class OutboxProcessorExtensions +{ + /// + /// Creates and attaches an to the + /// grain. Call in OnActivateAsync and chain + /// + /// calls on the result. + /// + /// + /// + /// Registers the processor as an on the + /// grain context so the DIM can discover it + /// for reminder forwarding. 
+ /// + /// + /// Usage: + /// + /// public override async Task OnActivateAsync(CancellationToken ct) + /// { + /// outboxProcessor = this.InitializeOutboxProcessor(new OutboxProcessorOptions<IMyEvent> + /// { + /// GetPending = () => stateManager.State.Outbox + /// .Select(e => e.Message).ToImmutableArray(), + /// OnPostCompletedAsync = async (items, ct) => + /// { + /// // remove delivered items and persist + /// }, + /// OnPostErrorAsync = async (failures, ct) => + /// { + /// // log, dead-letter, or leave for retry + /// }, + /// }) + /// .AddPostman<PriceCalculated>(PublishPriceCalculatedAsync) + /// .AddPostman<InvoiceReady>(SendInvoiceReadyAsync); + /// } + /// + /// + /// + /// + /// The grain type. Must implement (for reminder + /// DIM) and (for grain context access). Both + /// constraints are compiler-enforced. + /// + /// The base type of outbox items. + /// The grain instance (this). + /// Processor configuration. + /// + /// The processor instance. Chain AddPostman calls on it, then store + /// in a grain field for later + /// calls. + /// + public static OutboxProcessor InitializeOutboxProcessor( + this TGrain grain, + OutboxProcessorOptions options) + where TGrain : IGrainBase, IOutboxGrain + where TOutbox : notnull + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorOptions.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorOptions.cs new file mode 100644 index 0000000..96951a1 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxProcessorOptions.cs @@ -0,0 +1,98 @@ +using System.Collections.Immutable; + +namespace Egil.Orleans.Messaging; + +/// +/// Configuration for . Defines how the +/// processor reads pending items, reports successes/failures, and controls +/// timing. 
+/// +/// +/// +/// Callback model: The processor uses callbacks rather than DI services +/// so the grain controls dispatch logic and can pass its own state to the +/// handlers. This keeps the processor grain-scoped with no ambient dependencies. +/// +/// +/// Required callbacks: +/// +/// — snapshot of pending items, called once +/// per post run. +/// — items successfully posted. +/// The grain must remove them from its backing collection and persist. +/// +/// +/// +/// Optional callback: +/// +/// — failed items with exception and +/// attempt count. If null, failed items retry silently on the next +/// run. The grain decides: leave the item in state to retry, or remove it +/// to dead-letter after N attempts. +/// +/// +/// +/// +/// The base type of outbox items. Must match the type parameter of the +/// this options instance configures. +/// +public sealed class OutboxProcessorOptions + where TOutbox : notnull +{ + /// + /// Snapshot of pending items. Called once per post run. The processor + /// iterates the returned array and dispatches each item to its matching + /// postman. + /// + /// + /// Typically implemented as a lambda reading the grain's current outbox: + /// + /// GetPending = () => stateManager.State.Outbox.Select(e => e.Message).ToImmutableArray() + /// + /// + public required Func> GetPending { get; init; } + + /// + /// Called with the items that were successfully dispatched by their postmen. + /// The grain must remove these items from its backing collection (e.g., + /// ) and persist the updated state. + /// + /// + /// Exceptions thrown from this callback propagate out of + /// — the processor does + /// not catch callback failures. + /// + public required Func, CancellationToken, ValueTask> + OnPostCompletedAsync { get; init; } + + /// + /// Called with items that failed dispatch, along with the exception and + /// the in-memory attempt count (resets on grain reactivation). 
The grain + /// decides: leave the item in state to retry on the next run, or remove + /// it to dead-letter after N attempts. + /// + /// + /// If null, failed items retry silently on the next post run. + /// Exceptions thrown from this callback propagate out of + /// . + /// + public Func, + CancellationToken, ValueTask>? OnPostErrorAsync { get; init; } + + /// + /// Maximum time per post run. Set below the grain's response timeout to + /// avoid grain-level timeouts during dispatch. Default: 20 seconds. + /// + public TimeSpan ProcessingTimeout { get; init; } = TimeSpan.FromSeconds(20); + + /// + /// Timer and reminder period. Controls how frequently the processor + /// retries pending items. Default: 2 minutes. + /// + /// + /// Orleans reminders fire at most once per minute, so values below 1 + /// minute are effectively timer-only (the reminder still fires at its + /// minimum interval for cross-activation recovery). + /// + public TimeSpan RetryDelay { get; init; } = TimeSpan.FromMinutes(2); +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceToken.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceToken.cs new file mode 100644 index 0000000..5bb74d6 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceToken.cs @@ -0,0 +1,59 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Identifies a specific message within an . Combines a +/// monotonic sequence number with the sender's identity and timing information +/// to form a globally unique, totally ordered token per sender. +/// +/// +/// +/// Identity: The token is the identity of an outbox message. Receivers +/// use and to detect duplicates +/// via . +/// +/// +/// Epoch semantics: changes only when the sender +/// calls (nuclear reset). When a receiver sees +/// a token with Epoch > stored.Epoch, it accepts unconditionally, +/// knowing the sender intentionally reset its sequence space. 
Same-epoch +/// tokens are compared by <see cref="SequenceNumber"/> — higher wins, equal +/// or lower is a duplicate. +/// +/// +/// Ordering: Within a single sender, tokens are totally ordered by +/// (Epoch, SequenceNumber). Across different senders, tokens are not +/// comparable — use <see cref="Sender"/> to partition. +/// +/// +/// Serialization: Decorated with [GenerateSerializer] for +/// Orleans and [JsonConverter] for System.Text.Json. Round-trips +/// correctly through any Orleans storage provider using either serializer. +/// +/// +/// +/// Monotonically increasing sequence number within a single . +/// Assigned by — callers cannot fabricate or choose +/// sequence numbers. Starts at 1 for each new epoch. +/// +/// +/// The <see cref="GrainId"/> of the grain that owns the outbox. Receivers use +/// this to partition dedup tracking per sender. +/// +/// +/// Wall-clock time when the message was added to the outbox. Informational — +/// not used for ordering or dedup. Stamped by the sender's +/// <see cref="TimeProvider"/>. +/// +/// +/// Opaque marker that changes only on (full +/// sequence-space reset). Receivers compare epochs to detect resets. A token +/// with a newer epoch supersedes all prior sequence numbers from the same sender. 
+/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.OutboxSequenceToken")] +// [JsonConverter(typeof(OutboxSequenceTokenJsonConverter))] +public sealed record OutboxSequenceToken( + [property: Id(0)] long SequenceNumber, + [property: Id(1)] GrainId Sender, + [property: Id(2)] DateTimeOffset Timestamp, + [property: Id(3)] DateTimeOffset Epoch); diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceTokenJsonConverter.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceTokenJsonConverter.cs new file mode 100644 index 0000000..a15ac73 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/OutboxSequenceTokenJsonConverter.cs @@ -0,0 +1,32 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// STJ converter for <see cref="OutboxSequenceToken"/>. Serializes and +/// deserializes the token's , +/// , and +/// properties. +/// +/// +/// +/// Registered on <see cref="OutboxSequenceToken"/> via [JsonConverter]. +/// STJ discovers the attribute automatically — no user-side +/// configuration needed. +/// +/// +internal sealed class OutboxSequenceTokenJsonConverter : JsonConverter<OutboxSequenceToken> +{ + /// + public override OutboxSequenceToken? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } + + /// + public override void Write(Utf8JsonWriter writer, OutboxSequenceToken value, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManager.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManager.cs new file mode 100644 index 0000000..efc4b5e --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManager.cs @@ -0,0 +1,124 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Default implementation of <see cref="IStateManager{T}"/>. Wraps an +/// <see cref="IPersistentState{T}"/> and adds committed-state fencing, +/// version stamping for <see cref="VersionedState"/>-derived types, and +/// automatic write recovery. 
+/// +/// +/// +/// Committed-state fence: <see cref="State"/> always returns the +/// last successfully written value. During <see cref="WriteAsync"/>, the +/// underlying <see cref="IPersistentState{T}"/>.State is mutated, but the +/// caller's view is updated only after the write succeeds. On failure, the +/// recovery path re-reads from storage to determine whether the write +/// actually landed. +/// +/// +/// Version stamping: If <typeparamref name="T"/> derives from +/// <see cref="VersionedState"/>, the manager stamps a fresh +/// <see cref="Guid"/> on the <see cref="VersionedState.Version"/> +/// property before every write. The recovery path then compares versions +/// directly (pattern-matched via is VersionedState) instead of +/// relying on <see cref="IEquatable{T}.Equals(T)"/>, which avoids the +/// <see cref="System.Collections.Immutable.ImmutableArray{T}"/> +/// reference-equality trap. +/// +/// +/// Recovery matrix: +/// +/// +/// Failure +/// Outcome after WriteAsync +/// +/// +/// Success +/// State == newState, returns normally. +/// +/// +/// Write throws, re-read succeeds, state matches +/// Lost-response — write landed. Exception swallowed. +/// +/// +/// Write throws, re-read succeeds, state does not match +/// Write genuinely failed. Original exception rethrown. +/// +/// +/// Write throws, re-read also throws +/// Double failure. State reverts to pre-write snapshot. +/// Original exception rethrown. Caller must call +/// <see cref="ReadAsync"/> before the next write to re-sync. +/// +/// +/// +/// +/// Concurrency: <see cref="WritePolicy.Concurrent"/> (default) uses +/// the storage provider's ETag for optimistic concurrency. Writes against +/// stale ETags fail with InconsistentStateException. +/// <see cref="WritePolicy.Force"/> nulls the ETag before writing, bypassing +/// the concurrency check — use only for admin repair operations. +/// +/// +/// Thread safety: Relies on Orleans turn-based concurrency. Not safe +/// for [Reentrant] grains unless <see cref="State"/> is only read +/// (never written) from interleaved calls. +/// +/// +/// +/// +/// +internal sealed class StateManager<T> : IStateManager<T> + where T : class, IEquatable<T> +{ + private readonly IPersistentState<T> storage; + + /// + /// Creates a new <see cref="StateManager{T}"/> wrapping the given + /// <see cref="IPersistentState{T}"/> facet. + /// + /// + /// The Orleans persistent state facet to wrap. 
The manager takes full + /// ownership — callers must not read or write <paramref name="storage"/> + /// directly after wrapping. + /// + internal StateManager(IPersistentState<T> storage) + { + this.storage = storage; + } + + /// + public T State + { + get => throw new NotImplementedException(); + } + + /// + public Task ReadAsync() + { + throw new NotImplementedException(); + } + + /// + public Task WriteAsync(T newState, WritePolicy policy = WritePolicy.Concurrent) + { + throw new NotImplementedException(); + } + + /// + public Task ClearAsync() + { + throw new NotImplementedException(); + } + + /// + /// Determines whether the persisted state matches the attempted write. + /// For <see cref="VersionedState"/>-derived types, compares + /// <see cref="VersionedState.Version"/> directly. For plain types, falls + /// back to <see cref="IEquatable{T}.Equals(T)"/>. + /// + private static bool IsEquivalent(T persisted, T attempted) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManagerExtensions.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManagerExtensions.cs new file mode 100644 index 0000000..4df55e3 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StateManagerExtensions.cs @@ -0,0 +1,58 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Provides the <see cref="AsStateManager{T}"/> extension method that wraps an +/// <see cref="IPersistentState{T}"/> in an <see cref="IStateManager{T}"/>. +/// +public static class StateManagerExtensions +{ + /// + /// Wraps the given <paramref name="storage"/> in an +/// <see cref="IStateManager{T}"/> that provides atomic write recovery and a committed-state fence. + /// See <see cref="IStateManager{T}"/> for full behavioral contract, recovery + /// semantics, and deep-immutability requirements. + /// + /// + /// + /// Call site: Typically called once in OnActivateAsync: + /// + /// [PersistentState("state")] IPersistentState<MyState> storage; + /// IStateManager<MyState> stateManager; + /// + /// public override Task OnActivateAsync(CancellationToken ct) + /// { + /// stateManager = storage.AsStateManager(); + /// // ... + /// } + /// + /// + /// + /// Important: After calling this method, the grain should not access + /// <paramref name="storage"/> directly. 
Doing so bypasses the committed-state + /// fence. See <see cref="IStateManager{T}"/> remarks for details. + /// + /// + /// Provider-specific overrides (future): In v1, this always returns + /// the default StateManager<T>. In a future version, this method + /// will check ActivationServices for a registered + /// IStateManagerFactory to resolve provider-specific implementations + /// that can classify exceptions more precisely (e.g., "definitely did not + /// persist" vs. "unknown outcome") and skip the re-read when safe. + /// + /// + /// + /// + /// The Orleans-managed persistent state facet, typically injected via + /// [PersistentState("name")]. Must already be hydrated (Orleans + /// hydrates it during the SetupState lifecycle stage, before + /// OnActivateAsync). + /// + /// + /// An <see cref="IStateManager{T}"/> wrapping <paramref name="storage"/>. + /// + public static IStateManager<T> AsStateManager<T>(this IPersistentState<T> storage) + where T : class, IEquatable<T> + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursor.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursor.cs new file mode 100644 index 0000000..d1967d0 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursor.cs @@ -0,0 +1,125 @@ +using Orleans.Streams; + +namespace Egil.Orleans.Messaging; + +/// +/// Wraps a <see cref="StreamId"/> and its associated +/// <see cref="StreamSequenceToken"/> into a single value that <see cref="MessageTracker"/> uses for dedup and +/// <see cref="StreamManager"/> uses for resume. +/// +/// +/// +/// Projection constraint: <see cref="StreamSequenceToken"/> is an abstract +/// type. The library ships an STJ converter that handles a closed set of known +/// subtypes via a $kind discriminator: +/// +/// EventSequenceToken — Orleans SimpleMessageStream +/// EventHubSequenceToken — Orleans Event Hub provider v1 +/// EventHubSequenceTokenV2 — Orleans Event Hub provider v2 +/// — library-shipped v2 +/// subclass carrying +/// +/// Unknown subtypes throw at serialization time — silently dropping the cursor +/// would corrupt dedup state. 
+/// +/// +/// Enriched time: Use to extract the +/// broker-side enqueue time when the token is an . +/// Returns false for all other token types. This enables end-to-end lag +/// histograms in the telemetry without coupling the +/// cursor type to a specific streaming provider. +/// +/// +/// Serialization: Decorated with [GenerateSerializer] for Orleans +/// and [JsonConverter] for System.Text.Json. The STJ converter handles +/// the polymorphic via the discriminator. +/// +/// +/// The Orleans stream identity. +/// +/// The opaque sequence token for resumption. May be null when no prior +/// position exists (subscribe from provider default). +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.StreamCursor")] +// [JsonConverter(typeof(StreamCursorJsonConverter))] +public sealed record StreamCursor( + [property: Id(0)] StreamId StreamId, + [property: Id(1)] StreamSequenceToken? Token) +{ + /// + /// Attempts to extract the broker-side enqueue time from the underlying + /// . + /// + /// + /// Returns true only when the token is an + /// . Callers can use this to + /// compute end-to-end lag (now - enqueuedTime) without coupling + /// to a specific streaming provider. + /// + /// + /// The time the event was enqueued at the broker, or default if + /// the token type does not carry this information. + /// + /// + /// true if is an + /// and the time was extracted; + /// false otherwise. + /// + public bool TryGetEnqueuedTime(out DateTimeOffset enqueuedTime) + { + throw new NotImplementedException(); + } + + /// + /// Attempts to extract the stream provider name from the underlying + /// . + /// + /// + /// Returns true only when the token is an + /// . Enables provider-aware + /// dedup and diagnostics without coupling to a specific streaming provider. + /// + /// + /// The name of the stream provider that delivered this event, or + /// null if the token type does not carry this information. 
+ /// + /// + /// true if is an + /// and the name was extracted; + /// false otherwise. + /// + public bool TryGetStreamProviderName([NotNullWhen(true)] out string? streamProviderName) + { + throw new NotImplementedException(); + } + + /// + /// Attempts to extract the W3C traceparent value from the underlying + /// . + /// + /// + /// Returns true only when the token is an + /// whose + /// is not null. + /// uses this to create + /// s — correlating consumer + /// spans to producer spans without creating multi-hour parent-child traces. + /// + /// + /// The W3C traceparent value from the producer-side + /// , or null if the token + /// type does not carry this information or no activity was active at + /// publish time. + /// + /// + /// true if is an + /// with a non-null + /// ; false + /// otherwise. + /// + public bool TryGetTraceParent([NotNullWhen(true)] out string? traceParent) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursorJsonConverter.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursorJsonConverter.cs new file mode 100644 index 0000000..3fcee0b --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamCursorJsonConverter.cs @@ -0,0 +1,40 @@ +using System.Text.Json; +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// STJ converter for . Serializes and +/// deserializes the cursor's and +/// sequence token. +/// +/// +/// +/// wraps a StreamId and an optional +/// StreamSequenceToken. The token is polymorphic — it may be an +/// , +/// EventHubSequenceTokenV2, +/// or another provider-specific subclass. The converter round-trips the +/// concrete type via a discriminator so deserialization restores the +/// original token type. +/// +/// +/// Registered on via [JsonConverter]. +/// STJ discovers the attribute automatically — no user-side +/// configuration needed. 
+/// +/// +internal sealed class StreamCursorJsonConverter : JsonConverter<StreamCursor> +{ + /// + public override StreamCursor? Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } + + /// + public override void Write(Utf8JsonWriter writer, StreamCursor value, JsonSerializerOptions options) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManager.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManager.cs new file mode 100644 index 0000000..31bd7a4 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManager.cs @@ -0,0 +1,161 @@ +using Orleans.Streams; + +namespace Egil.Orleans.Messaging; + +/// +/// Grain-level facade around Orleans implicit subscriptions. Manages subscribe, +/// resume, dispatch, and per-subscription error handling. +/// +/// +/// +/// Composition over inheritance: The grain inherits from Grain +/// and uses <see cref="StreamManager"/> as a field — no base-class coupling. +/// +/// +/// Typical wiring: +/// +/// public override async Task OnActivateAsync(CancellationToken ct) +/// { +/// streamManager = this.InitializeStreamManager(state.Tracker) +/// .Subscribe("electricity-prices", HandlePriceTickAsync, LogStreamError) +/// .Subscribe("tariff-events", HandleTariffChangedAsync); +/// } +/// +/// +/// +/// Resume semantics: each subscription reads +/// trackerSnapshot.LatestStream(streamId) once per subscription. +/// If a cursor exists, the <see cref="StreamCursor.Token"/> is passed to +/// SubscribeAsync. If null, subscribes without a token +/// (provider default, typically: start from current). The tracker is read +/// once at subscription time, never again — the handler is the only path that +/// mutates the tracker. 
+/// +/// +/// OTel trace correlation: The OnNext wrapper reads +/// when the stream +/// token is enriched, parses it into an ActivityContext, and starts +/// the handler span with ActivityKind.Consumer and an +/// ActivityLink back to the producer. This produces separate traces +/// per delivery without collapsing weeks of traffic into one distributed trace. +/// +/// +/// Telemetry: Emits counters for messages delivered per +/// (streamNamespace, accepted|rejected), subscriptions +/// established/torn-down/errored, histograms for handler latency, and +/// (when is available) +/// end-to-end lag. +/// +/// +public sealed class StreamManager +{ + private StreamManager() { } + + /// + /// Creates a for the given grain. + /// + /// The grain that owns this stream manager. + /// + /// A snapshot of the grain's at activation + /// time. Used to look up resume tokens. Read once per subscription — not + /// held afterward. + /// + /// A for fluent subscription configuration. + internal static StreamManager Create( + IGrainBase owner, + MessageTracker trackerSnapshot) + { + throw new NotImplementedException(); + } + + /// + /// Declares a subscription to the given . + /// + /// + /// The expected event type on this stream namespace. Must match the type + /// published by the producer. Usually inferred from + /// ; specify explicitly only for inline + /// lambdas or ambiguous method groups. + /// + /// + /// The Orleans stream namespace to subscribe to. Must follow the + /// one-provider-per-namespace convention (code-review enforced, not + /// runtime enforced). + /// + /// + /// Receives the deserialized event and the + /// representing this event's position. The handler should update the grain's + /// and persist state before returning. + /// + /// + /// Optional synchronous error handler. Receives the stream namespace and the + /// exception. If omitted, the default behavior is to log and emit a counter + /// without rethrowing. 
+ /// + /// + /// When true (default), the subscription looks up the last accepted + /// from the snapshot + /// and passes its token to SubscribeAsync for resumption. When + /// false, subscribes from the provider's default position (typically: + /// current/latest). + /// + /// + /// This instance for fluent subscription chaining. + /// + public StreamManager Subscribe( + string streamNamespace, + Func onNextAsync, + Action? onError = default, + bool passLatestSequenceTokenOnResume = true) + { + throw new NotImplementedException(); + } + + /// + /// Declares a subscription to the given + /// using a -returning event handler. + /// + /// + /// Convenience overload for handlers that already return . + /// Implementations should adapt this to the overload. + /// + /// + /// The expected event type on this stream namespace. Must match the type + /// published by the producer. Usually inferred from + /// ; specify explicitly only for inline + /// lambdas or ambiguous method groups. + /// + /// + /// The Orleans stream namespace to subscribe to. Must follow the + /// one-provider-per-namespace convention (code-review enforced, not + /// runtime enforced). + /// + /// + /// Receives the deserialized event and the + /// representing this event's position. The handler should update the grain's + /// and persist state before returning. + /// + /// + /// Optional synchronous error handler. Receives the stream namespace and the + /// exception. If omitted, the default behavior is to log and emit a counter + /// without rethrowing. + /// + /// + /// When true (default), the subscription looks up the last accepted + /// from the snapshot + /// and passes its token to SubscribeAsync for resumption. When + /// false, subscribes from the provider's default position (typically: + /// current/latest). + /// + /// + /// This instance for fluent subscription chaining. + /// + public StreamManager Subscribe( + string streamNamespace, + Func onNextAsync, + Action? 
onError = default, + bool passLatestSequenceTokenOnResume = true) + { + throw new NotImplementedException(); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManagerExtensions.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManagerExtensions.cs new file mode 100644 index 0000000..cece3fc --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/StreamManagerExtensions.cs @@ -0,0 +1,49 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Extension methods for wiring <see cref="StreamManager"/> into a grain's +/// activation lifecycle. +/// +public static class StreamManagerExtensions +{ + /// + /// Creates a <see cref="StreamManager"/> for the grain and returns it for + /// fluent stream subscription configuration. + /// + /// + /// + /// Call from OnActivateAsync, then chain + /// Subscribe calls on the returned manager. + /// + /// + /// Usage: + /// + /// public override async Task OnActivateAsync(CancellationToken ct) + /// { + /// streamManager = this.InitializeStreamManager(state.Tracker) + /// .Subscribe("electricity-prices", HandlePriceTickAsync); + /// } + /// + /// + /// + /// + /// The grain type. Must implement <see cref="IGrainBase"/> so stream providers + /// and telemetry infrastructure can be resolved through activation services. + /// + /// The grain instance (this). + /// + /// A snapshot of the grain's <see cref="MessageTracker"/> at activation + /// time. Used to look up resume tokens. + /// + /// + /// The initialized <see cref="StreamManager"/>. Store it in a grain field so + /// the subscriptions remain rooted for the activation lifetime. 
+ /// + public static StreamManager InitializeStreamManager<TGrain>( + this TGrain grain, + MessageTracker trackerSnapshot) + where TGrain : IGrainBase + { + return StreamManager.Create(grain, trackerSnapshot); + } +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/VersionedState.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/VersionedState.cs new file mode 100644 index 0000000..2ad1484 --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/VersionedState.cs @@ -0,0 +1,64 @@ +using System.Text.Json.Serialization; + +namespace Egil.Orleans.Messaging; + +/// +/// Abstract base for version-stamped grain state. Provides the +/// <see cref="Version"/> property that <see cref="IStateManager{T}"/> +/// uses for its write-recovery comparison. +/// +/// +/// +/// <see cref="IStateManager{T}"/> detects this base type at runtime via +/// pattern matching (if (newState is VersionedState v)). If the state +/// derives from <see cref="VersionedState"/>, the manager stamps a fresh +/// <see cref="Guid"/> on <see cref="Version"/> before every +/// write and compares <see cref="Version"/> directly on the recovery path. +/// This means the recovery comparison does not use +/// T.Equals() for <see cref="VersionedState"/>-derived types, avoiding +/// problems with types like <see cref="System.Collections.Immutable.ImmutableArray{T}"/> +/// whose Equals uses reference equality. For non-<see cref="VersionedState"/> +/// types the manager falls back to T.Equals(). +/// +/// +/// <see cref="Version"/> has internal set — library code can stamp it; +/// user code cannot. This is a hard compile-time fence. The set (not +/// init) accessor allows mutation of the caller's reference during +/// <see cref="IStateManager{T}.WriteAsync"/>. +/// +/// +/// Usage: +/// +/// [GenerateSerializer] +/// public sealed record MyState : VersionedState +/// { +/// [Id(0)] public Outbox<MyEvent> Outbox { get; init; } +/// [Id(1)] public ImmutableArray<Something> Items { get; init; } = []; +/// } +/// +/// +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.VersionedState")] +public abstract record VersionedState +{ + /// + /// Gets the version identifier stamped by <see cref="IStateManager{T}"/> + /// on every successful write. UUID v7, sortable, collision-free at grain + /// write rates. 
+ /// + /// + /// + /// Fresh records get a non-empty version from the initializer. The manager + /// overwrites it with a new v7 UUID before each WriteAsync call. + /// + /// + /// For System.Text.Json serialization: the internal set accessor + /// is invisible to STJ by default. The <see cref="JsonIncludeAttribute"/> + /// makes it visible without requiring a custom converter. + /// + /// + [Id(0)] + [JsonInclude] + public Guid Version { get; internal set; } = Guid.CreateVersion7(); +} diff --git a/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/WritePolicy.cs b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/WritePolicy.cs new file mode 100644 index 0000000..c64c1fd --- /dev/null +++ b/Egil.Orleans.Messaging/src/Egil.Orleans.Messaging/WritePolicy.cs @@ -0,0 +1,24 @@ +namespace Egil.Orleans.Messaging; + +/// +/// Controls how <see cref="IStateManager{T}.WriteAsync"/> handles optimistic concurrency. +/// +[GenerateSerializer] +[Alias("egil.orleans.messaging.WritePolicy")] +public enum WritePolicy +{ + /// + /// Default. Uses the storage provider's ETag-based optimistic concurrency check. + /// If another writer changed the state since the last read, the write fails with + /// InconsistentStateException. + /// + Concurrent = 0, + + /// + /// Nulls the ETag before writing, bypassing the provider's concurrency check. + /// Use as a last-resort escape hatch when a grain needs to force-overwrite + /// state regardless of concurrent modifications (e.g., admin repair operations). + /// The recovery path and version stamping still run identically. 
+ /// + Force = 1, +} diff --git a/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj index a26f041..01209a6 100644 --- a/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj +++ b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/Egil.Orleans.Messaging.Tests.csproj @@ -24,6 +24,7 @@ + diff --git a/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/OutboxTests.cs b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/OutboxTests.cs new file mode 100644 index 0000000..34dfc05 --- /dev/null +++ b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/OutboxTests.cs @@ -0,0 +1,210 @@ +using TimeProviderExtensions; + +namespace Egil.Orleans.Messaging.Tests; + +public sealed class OutboxTests +{ + [Fact] + public void Add_appends_message_with_next_sequence_sender_and_time() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + + var next = outbox.Add("created"); + + Assert.Empty(outbox); + Assert.Single(next); + Assert.Equal(1, next.LatestSequenceNumber); + Assert.Equal(now, next.Epoch); + Assert.Equal("created", next[0].Message); + Assert.Equal(new OutboxSequenceToken(1, sender, now, now), next[0].Token); + } + + [Fact] + public void Add_preserves_epoch_and_increments_sequence_for_later_messages() + { + var sender = GrainId.Create("test/sender", "one"); + var epoch = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var later = epoch.AddMinutes(5); + var time = new ManualTimeProvider(epoch); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + + var next = outbox.Add("first"); + time.Advance(TimeSpan.FromMinutes(5)); + next = next.Add("second"); + + 
Assert.Equal(2, next.Count); + Assert.Equal(2, next.LatestSequenceNumber); + Assert.Equal(epoch, next.Epoch); + Assert.Equal(new OutboxSequenceToken(1, sender, epoch, epoch), next[0].Token); + Assert.Equal(new OutboxSequenceToken(2, sender, later, epoch), next[1].Token); + } + + [Fact] + public void Remove_deletes_matching_sequence_and_preserves_sequence_metadata() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first").Add("second"); + + var next = outbox.Remove(outbox[0].Token); + + Assert.Single(next); + Assert.Equal("second", next[0].Message); + Assert.Equal(2, next.LatestSequenceNumber); + Assert.Equal(now, next.Epoch); + } + + [Fact] + public void Remove_ignores_token_from_different_sender_with_same_sequence() + { + var sender = GrainId.Create("test/sender", "one"); + var otherSender = GrainId.Create("test/sender", "two"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first"); + var otherToken = new OutboxSequenceToken(1, otherSender, now, now); + + var next = outbox.Remove(otherToken); + + Assert.Same(outbox, next); + Assert.Single(next); + } + + [Fact] + public void Remove_ignores_matching_token_that_is_not_fifo_head() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first").Add("second"); + + var next = outbox.Remove(outbox[1].Token); + + Assert.Same(outbox, next); + Assert.Equal(2, next.Count); + } + + [Fact] + public void 
RemoveRange_deletes_matching_fifo_prefix_and_ignores_missing_tokens() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first").Add("second").Add("third"); + var missing = new OutboxSequenceToken(99, sender, now, now); + + var next = outbox.RemoveRange([outbox[0].Token, outbox[1].Token, missing]); + + Assert.Single(next); + Assert.Equal("third", next[0].Message); + Assert.Equal(3, next.LatestSequenceNumber); + Assert.Equal(now, next.Epoch); + } + + [Fact] + public void RemoveRange_stops_at_first_token_gap_to_preserve_fifo_order() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var time = new ManualTimeProvider(now); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first").Add("second").Add("third"); + + var next = outbox.RemoveRange([outbox[0].Token, outbox[2].Token]); + + Assert.Equal(2, next.Count); + Assert.Equal("second", next[0].Message); + Assert.Equal("third", next[1].Message); + } + + [Fact] + public void Clear_removes_items_and_keeps_sequence_metadata_for_next_add() + { + var sender = GrainId.Create("test/sender", "one"); + var epoch = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var later = epoch.AddMinutes(5); + var time = new ManualTimeProvider(epoch); + var outbox = Outbox.Create(sender); + outbox.RegisterTimeProvider(time); + outbox = outbox.Add("first").Add("second"); + + var cleared = outbox.Clear(); + time.Advance(TimeSpan.FromMinutes(5)); + var next = cleared.Add("third"); + + Assert.Empty(cleared); + Assert.Equal(2, cleared.LatestSequenceNumber); + Assert.Equal(epoch, cleared.Epoch); + Assert.Equal(new OutboxSequenceToken(3, sender, later, epoch), next[0].Token); + } + + [Fact] + public 
void Equals_treats_matching_sequence_window_as_same_pending_items() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var leftTime = new ManualTimeProvider(now); + var rightTime = new ManualTimeProvider(now); + var left = Outbox.Create(sender); + var right = Outbox.Create(sender); + left.RegisterTimeProvider(leftTime); + right.RegisterTimeProvider(rightTime); + + left = left.Add("left").Add("middle-left").Add("last"); + right = right.Add("right").Add("middle-right").Add("last"); + + Assert.Equal(left, right); + } + + [Fact] + public void Equals_returns_true_when_sender_sequence_epoch_and_envelopes_match() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var leftTime = new ManualTimeProvider(now); + var rightTime = new ManualTimeProvider(now); + var left = Outbox.Create(sender); + var right = Outbox.Create(sender); + left.RegisterTimeProvider(leftTime); + right.RegisterTimeProvider(rightTime); + + left = left.Add("first").Add("second"); + right = right.Add("first").Add("second"); + + Assert.Equal(left, right); + Assert.Equal(left.GetHashCode(), right.GetHashCode()); + } + + [Fact] + public void Equals_returns_false_when_sequence_metadata_differs() + { + var sender = GrainId.Create("test/sender", "one"); + var now = new DateTimeOffset(2026, 5, 23, 12, 30, 0, TimeSpan.Zero); + var leftTime = new ManualTimeProvider(now); + var rightTime = new ManualTimeProvider(now); + var left = Outbox.Create(sender); + var right = Outbox.Create(sender); + left.RegisterTimeProvider(leftTime); + right.RegisterTimeProvider(rightTime); + + left = left.Add("first").Add("second"); + right = right.Add("first").Add("second").Add("third"); + + Assert.NotEqual(left, right); + } +} diff --git a/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/xunit.runner.json 
b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/xunit.runner.json new file mode 100644 index 0000000..da63501 --- /dev/null +++ b/Egil.Orleans.Messaging/test/Egil.Orleans.Messaging.Tests/xunit.runner.json @@ -0,0 +1,5 @@ +{ + "$schema": "https://xunit.net/schema/current/xunit.runner.schema.json", + "methodDisplay": "method", + "methodDisplayOptions": "all" +} \ No newline at end of file From a777a292c597df57d95c9ce9fc2acb46a5e3fbb8 Mon Sep 17 00:00:00 2001 From: Egil Hansen Date: Sat, 23 May 2026 22:12:40 +0000 Subject: [PATCH 04/16] ai: add skills --- .agents/skills/caveman/SKILL.md | 49 +++++ .agents/skills/diagnose/SKILL.md | 117 ++++++++++++ .../diagnose/scripts/hitl-loop.template.sh | 41 +++++ .agents/skills/grill-me/SKILL.md | 10 ++ .agents/skills/grill-with-docs/ADR-FORMAT.md | 47 +++++ .../skills/grill-with-docs/CONTEXT-FORMAT.md | 63 +++++++ .agents/skills/grill-with-docs/SKILL.md | 88 +++++++++ .agents/skills/handoff/SKILL.md | 15 ++ .../DEEPENING.md | 37 ++++ .../HTML-REPORT.md | 123 +++++++++++++ .../INTERFACE-DESIGN.md | 44 +++++ .../improve-codebase-architecture/LANGUAGE.md | 53 ++++++ .../improve-codebase-architecture/SKILL.md | 81 +++++++++ .agents/skills/prototype/LOGIC.md | 79 ++++++++ .agents/skills/prototype/SKILL.md | 30 ++++ .agents/skills/prototype/UI.md | 112 ++++++++++++ .agents/skills/testing/SKILL.md | 119 +++++++++++++ .agents/skills/testing/builders.md | 101 +++++++++++ .agents/skills/testing/commit-discipline.md | 39 ++++ .agents/skills/testing/e2e.md | 101 +++++++++++ .agents/skills/testing/fakes.md | 65 +++++++ .agents/skills/testing/interface-design.md | 96 ++++++++++ .agents/skills/testing/orleans.md | 93 ++++++++++ .agents/skills/testing/parallelization.md | 47 +++++ .agents/skills/testing/refactoring.md | 28 +++ .agents/skills/testing/tdd-workflow.md | 88 +++++++++ .agents/skills/testing/test-after.md | 48 +++++ .agents/skills/testing/tests.md | 128 +++++++++++++ .agents/skills/to-issues/SKILL.md | 83 +++++++++ 
.agents/skills/to-prd/SKILL.md | 76 ++++++++ .agents/skills/triage/AGENT-BRIEF.md | 168 ++++++++++++++++++ .agents/skills/triage/OUT-OF-SCOPE.md | 101 +++++++++++ .agents/skills/triage/SKILL.md | 103 +++++++++++ .agents/skills/write-a-skill/SKILL.md | 117 ++++++++++++ .agents/skills/zoom-out/SKILL.md | 7 + 35 files changed, 2597 insertions(+) create mode 100644 .agents/skills/caveman/SKILL.md create mode 100644 .agents/skills/diagnose/SKILL.md create mode 100644 .agents/skills/diagnose/scripts/hitl-loop.template.sh create mode 100644 .agents/skills/grill-me/SKILL.md create mode 100644 .agents/skills/grill-with-docs/ADR-FORMAT.md create mode 100644 .agents/skills/grill-with-docs/CONTEXT-FORMAT.md create mode 100644 .agents/skills/grill-with-docs/SKILL.md create mode 100644 .agents/skills/handoff/SKILL.md create mode 100644 .agents/skills/improve-codebase-architecture/DEEPENING.md create mode 100644 .agents/skills/improve-codebase-architecture/HTML-REPORT.md create mode 100644 .agents/skills/improve-codebase-architecture/INTERFACE-DESIGN.md create mode 100644 .agents/skills/improve-codebase-architecture/LANGUAGE.md create mode 100644 .agents/skills/improve-codebase-architecture/SKILL.md create mode 100644 .agents/skills/prototype/LOGIC.md create mode 100644 .agents/skills/prototype/SKILL.md create mode 100644 .agents/skills/prototype/UI.md create mode 100644 .agents/skills/testing/SKILL.md create mode 100644 .agents/skills/testing/builders.md create mode 100644 .agents/skills/testing/commit-discipline.md create mode 100644 .agents/skills/testing/e2e.md create mode 100644 .agents/skills/testing/fakes.md create mode 100644 .agents/skills/testing/interface-design.md create mode 100644 .agents/skills/testing/orleans.md create mode 100644 .agents/skills/testing/parallelization.md create mode 100644 .agents/skills/testing/refactoring.md create mode 100644 .agents/skills/testing/tdd-workflow.md create mode 100644 .agents/skills/testing/test-after.md create mode 100644 
.agents/skills/testing/tests.md create mode 100644 .agents/skills/to-issues/SKILL.md create mode 100644 .agents/skills/to-prd/SKILL.md create mode 100644 .agents/skills/triage/AGENT-BRIEF.md create mode 100644 .agents/skills/triage/OUT-OF-SCOPE.md create mode 100644 .agents/skills/triage/SKILL.md create mode 100644 .agents/skills/write-a-skill/SKILL.md create mode 100644 .agents/skills/zoom-out/SKILL.md diff --git a/.agents/skills/caveman/SKILL.md b/.agents/skills/caveman/SKILL.md new file mode 100644 index 0000000..85770a3 --- /dev/null +++ b/.agents/skills/caveman/SKILL.md @@ -0,0 +1,49 @@ +--- +name: caveman +description: > + Ultra-compressed communication mode. Cuts token usage ~75% by dropping + filler, articles, and pleasantries while keeping full technical accuracy. + Use when user says "caveman mode", "talk like caveman", "use caveman", + "less tokens", "be brief", or invokes /caveman. +--- + +Respond terse like smart caveman. All technical substance stay. Only fluff die. + +## Persistence + +ACTIVE EVERY RESPONSE once triggered. No revert after many turns. No filler drift. Still active if unsure. Off only when user says "stop caveman" or "normal mode". + +## Rules + +Drop: articles (a/an/the), filler (just/really/basically/actually/simply), pleasantries (sure/certainly/of course/happy to), hedging. Fragments OK. Short synonyms (big not extensive, fix not "implement a solution for"). Abbreviate common terms (DB/auth/config/req/res/fn/impl). Strip conjunctions. Use arrows for causality (X -> Y). One word when one word enough. + +Technical terms stay exact. Code blocks unchanged. Errors quoted exact. + +Pattern: `[thing] [action] [reason]. [next step].` + +Not: "Sure! I'd be happy to help you with that. The issue you're experiencing is likely caused by..." +Yes: "Bug in auth middleware. Token expiry check use `<` not `<=`. Fix:" + +### Examples + +**"Why React component re-render?"** + +> Inline obj prop -> new ref -> re-render. `useMemo`. 
+ +**"Explain database connection pooling."** + +> Pool = reuse DB conn. Skip handshake -> fast under load. + +## Auto-Clarity Exception + +Drop caveman temporarily for: security warnings, irreversible action confirmations, multi-step sequences where fragment order risks misread, user asks to clarify or repeats question. Resume caveman after clear part done. + +Example -- destructive op: + +> **Warning:** This will permanently delete all rows in the `users` table and cannot be undone. +> +> ```sql +> DROP TABLE users; +> ``` +> +> Caveman resume. Verify backup exist first. diff --git a/.agents/skills/diagnose/SKILL.md b/.agents/skills/diagnose/SKILL.md new file mode 100644 index 0000000..ed55bda --- /dev/null +++ b/.agents/skills/diagnose/SKILL.md @@ -0,0 +1,117 @@ +--- +name: diagnose +description: Disciplined diagnosis loop for hard bugs and performance regressions. Reproduce → minimise → hypothesise → instrument → fix → regression-test. Use when user says "diagnose this" / "debug this", reports a bug, says something is broken/throwing/failing, or describes a performance regression. +--- + +# Diagnose + +A discipline for hard bugs. Skip phases only when explicitly justified. + +When exploring the codebase, use the project's domain glossary to get a clear mental model of the relevant modules, and check ADRs in the area you're touching. + +## Phase 1 — Build a feedback loop + +**This is the skill.** Everything else is mechanical. If you have a fast, deterministic, agent-runnable pass/fail signal for the bug, you will find the cause — bisection, hypothesis-testing, and instrumentation all just consume that signal. If you don't have one, no amount of staring at code will save you. + +Spend disproportionate effort here. **Be aggressive. Be creative. Refuse to give up.** + +### Ways to construct one — try them in roughly this order + +1. **Failing test** at whatever seam reaches the bug — unit, integration, e2e. +2. **Curl / HTTP script** against a running dev server. 
+3. **CLI invocation** with a fixture input, diffing stdout against a known-good snapshot. +4. **Headless browser script** (Playwright / Puppeteer) — drives the UI, asserts on DOM/console/network. +5. **Replay a captured trace.** Save a real network request / payload / event log to disk; replay it through the code path in isolation. +6. **Throwaway harness.** Spin up a minimal subset of the system (one service, mocked deps) that exercises the bug code path with a single function call. +7. **Property / fuzz loop.** If the bug is "sometimes wrong output", run 1000 random inputs and look for the failure mode. +8. **Bisection harness.** If the bug appeared between two known states (commit, dataset, version), automate "boot at state X, check, repeat" so you can `git bisect run` it. +9. **Differential loop.** Run the same input through old-version vs new-version (or two configs) and diff outputs. +10. **HITL bash script.** Last resort. If a human must click, drive _them_ with `scripts/hitl-loop.template.sh` so the loop is still structured. Captured output feeds back to you. + +Build the right feedback loop, and the bug is 90% fixed. + +### Iterate on the loop itself + +Treat the loop as a product. Once you have _a_ loop, ask: + +- Can I make it faster? (Cache setup, skip unrelated init, narrow the test scope.) +- Can I make the signal sharper? (Assert on the specific symptom, not "didn't crash".) +- Can I make it more deterministic? (Pin time, seed RNG, isolate filesystem, freeze network.) + +A 30-second flaky loop is barely better than no loop. A 2-second deterministic loop is a debugging superpower. + +### Non-deterministic bugs + +The goal is not a clean repro but a **higher reproduction rate**. Loop the trigger 100×, parallelise, add stress, narrow timing windows, inject sleeps. A 50%-flake bug is debuggable; 1% is not — keep raising the rate until it's debuggable. + +### When you genuinely cannot build a loop + +Stop and say so explicitly. List what you tried. 
Ask the user for: (a) access to whatever environment reproduces it, (b) a captured artifact (HAR file, log dump, core dump, screen recording with timestamps), or (c) permission to add temporary production instrumentation. Do **not** proceed to hypothesise without a loop. + +Do not proceed to Phase 2 until you have a loop you believe in. + +## Phase 2 — Reproduce + +Run the loop. Watch the bug appear. + +Confirm: + +- [ ] The loop produces the failure mode the **user** described — not a different failure that happens to be nearby. Wrong bug = wrong fix. +- [ ] The failure is reproducible across multiple runs (or, for non-deterministic bugs, reproducible at a high enough rate to debug against). +- [ ] You have captured the exact symptom (error message, wrong output, slow timing) so later phases can verify the fix actually addresses it. + +Do not proceed until you reproduce the bug. + +## Phase 3 — Hypothesise + +Generate **3–5 ranked hypotheses** before testing any of them. Single-hypothesis generation anchors on the first plausible idea. + +Each hypothesis must be **falsifiable**: state the prediction it makes. + +> Format: "If {X} is the cause, then {changing Y} will make the bug disappear / {changing Z} will make it worse." + +If you cannot state the prediction, the hypothesis is a vibe — discard or sharpen it. + +**Show the ranked list to the user before testing.** They often have domain knowledge that re-ranks instantly ("we just deployed a change to #3"), or know hypotheses they've already ruled out. Cheap checkpoint, big time saver. Don't block on it — proceed with your ranking if the user is AFK. + +## Phase 4 — Instrument + +Each probe must map to a specific prediction from Phase 3. **Change one variable at a time.** + +Tool preference: + +1. **Debugger / REPL inspection** if the env supports it. One breakpoint beats ten logs. +2. **Targeted logs** at the boundaries that distinguish hypotheses. +3. Never "log everything and grep". + +**Tag every debug log** with a unique prefix, e.g. 
`[DEBUG-a4f2]`. Cleanup at the end becomes a single grep. Untagged logs survive; tagged logs die. + +**Perf branch.** For performance regressions, logs are usually wrong. Instead: establish a baseline measurement (timing harness, `performance.now()`, profiler, query plan), then bisect. Measure first, fix second. + +## Phase 5 — Fix + regression test + +Write the regression test **before the fix** — but only if there is a **correct seam** for it. + +A correct seam is one where the test exercises the **real bug pattern** as it occurs at the call site. If the only available seam is too shallow (single-caller test when the bug needs multiple callers, unit test that can't replicate the chain that triggered the bug), a regression test there gives false confidence. + +**If no correct seam exists, that itself is the finding.** Note it. The codebase architecture is preventing the bug from being locked down. Flag this for the next phase. + +If a correct seam exists: + +1. Turn the minimised repro into a failing test at that seam. +2. Watch it fail. +3. Apply the fix. +4. Watch it pass. +5. Re-run the Phase 1 feedback loop against the original (un-minimised) scenario. + +## Phase 6 — Cleanup + post-mortem + +Required before declaring done: + +- [ ] Original repro no longer reproduces (re-run the Phase 1 loop) +- [ ] Regression test passes (or absence of seam is documented) +- [ ] All `[DEBUG-...]` instrumentation removed (`grep` the prefix) +- [ ] Throwaway prototypes deleted (or moved to a clearly-marked debug location) +- [ ] The hypothesis that turned out correct is stated in the commit / PR message — so the next debugger learns + +**Then ask: what would have prevented this bug?** If the answer involves architectural change (no good test seam, tangled callers, hidden coupling) hand off to the `/improve-codebase-architecture` skill with the specifics. Make the recommendation **after** the fix is in, not before — you have more information now than when you started. 
diff --git a/.agents/skills/diagnose/scripts/hitl-loop.template.sh b/.agents/skills/diagnose/scripts/hitl-loop.template.sh new file mode 100644 index 0000000..40afc46 --- /dev/null +++ b/.agents/skills/diagnose/scripts/hitl-loop.template.sh @@ -0,0 +1,41 @@ +#!/usr/bin/env bash +# Human-in-the-loop reproduction loop. +# Copy this file, edit the steps below, and run it. +# The agent runs the script; the user follows prompts in their terminal. +# +# Usage: +# bash hitl-loop.template.sh +# +# Two helpers: +# step "" → show instruction, wait for Enter +# capture VAR "" → show question, read response into VAR +# +# At the end, captured values are printed as KEY=VALUE for the agent to parse. + +set -euo pipefail + +step() { + printf '\n>>> %s\n' "$1" + read -r -p " [Enter when done] " _ +} + +capture() { + local var="$1" question="$2" answer + printf '\n>>> %s\n' "$question" + read -r -p " > " answer + printf -v "$var" '%s' "$answer" +} + +# --- edit below --------------------------------------------------------- + +step "Open the app at http://localhost:3000 and sign in." + +capture ERRORED "Click the 'Export' button. Did it throw an error? (y/n)" + +capture ERROR_MSG "Paste the error message (or 'none'):" + +# --- edit above --------------------------------------------------------- + +printf '\n--- Captured ---\n' +printf 'ERRORED=%s\n' "$ERRORED" +printf 'ERROR_MSG=%s\n' "$ERROR_MSG" diff --git a/.agents/skills/grill-me/SKILL.md b/.agents/skills/grill-me/SKILL.md new file mode 100644 index 0000000..bd04394 --- /dev/null +++ b/.agents/skills/grill-me/SKILL.md @@ -0,0 +1,10 @@ +--- +name: grill-me +description: Interview the user relentlessly about a plan or design until reaching shared understanding, resolving each branch of the decision tree. Use when user wants to stress-test a plan, get grilled on their design, or mentions "grill me". +--- + +Interview me relentlessly about every aspect of this plan until we reach a shared understanding. 
Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. + +Ask the questions one at a time. + +If a question can be answered by exploring the codebase, explore the codebase instead. diff --git a/.agents/skills/grill-with-docs/ADR-FORMAT.md b/.agents/skills/grill-with-docs/ADR-FORMAT.md new file mode 100644 index 0000000..da7e78e --- /dev/null +++ b/.agents/skills/grill-with-docs/ADR-FORMAT.md @@ -0,0 +1,47 @@ +# ADR Format + +ADRs live in `docs/adr/` and use sequential numbering: `0001-slug.md`, `0002-slug.md`, etc. + +Create the `docs/adr/` directory lazily — only when the first ADR is needed. + +## Template + +```md +# {Short title of the decision} + +{1-3 sentences: what's the context, what did we decide, and why.} +``` + +That's it. An ADR can be a single paragraph. The value is in recording *that* a decision was made and *why* — not in filling out sections. + +## Optional sections + +Only include these when they add genuine value. Most ADRs won't need them. + +- **Status** frontmatter (`proposed | accepted | deprecated | superseded by ADR-NNNN`) — useful when decisions are revisited +- **Considered Options** — only when the rejected alternatives are worth remembering +- **Consequences** — only when non-obvious downstream effects need to be called out + +## Numbering + +Scan `docs/adr/` for the highest existing number and increment by one. + +## When to offer an ADR + +All three of these must be true: + +1. **Hard to reverse** — the cost of changing your mind later is meaningful +2. **Surprising without context** — a future reader will look at the code and wonder "why on earth did they do it this way?" +3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons + +If a decision is easy to reverse, skip it — you'll just reverse it. If it's not surprising, nobody will wonder why. 
If there was no real alternative, there's nothing to record beyond "we did the obvious thing." + +### What qualifies + +- **Architectural shape.** "We're using a monorepo." "The write model is event-sourced, the read model is projected into Postgres." +- **Integration patterns between contexts.** "Ordering and Billing communicate via domain events, not synchronous HTTP." +- **Technology choices that carry lock-in.** Database, message bus, auth provider, deployment target. Not every library — just the ones that would take a quarter to swap out. +- **Boundary and scope decisions.** "Customer data is owned by the Customer context; other contexts reference it by ID only." The explicit no-s are as valuable as the yes-s. +- **Deliberate deviations from the obvious path.** "We're using manual SQL instead of an ORM because X." Anything where a reasonable reader would assume the opposite. These stop the next engineer from "fixing" something that was deliberate. +- **Constraints not visible in the code.** "We can't use AWS because of compliance requirements." "Response times must be under 200ms because of the partner API contract." +- **Rejected alternatives when the rejection is non-obvious.** If you considered GraphQL and picked REST for subtle reasons, record it — otherwise someone will suggest GraphQL again in six months. diff --git a/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md b/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md new file mode 100644 index 0000000..0830255 --- /dev/null +++ b/.agents/skills/grill-with-docs/CONTEXT-FORMAT.md @@ -0,0 +1,63 @@ +# CONTEXT.md Format + +## Structure + +```md +# {Context Name} + +{One or two sentence description of what this context is and why it exists.} + +## Language + +**Order**: +{A one or two sentence description of the term} +_Avoid_: Purchase, transaction + +**Invoice**: +A request for payment sent to a customer after delivery. 
+_Avoid_: Bill, payment request + +**Customer**: +A person or organization that places orders. +_Avoid_: Client, buyer, account +``` + +## Rules + +- **Be opinionated.** When multiple words exist for the same concept, pick the best one and list the others as aliases to avoid. +- **Flag conflicts explicitly.** If a term is used ambiguously, call it out in "Flagged ambiguities" with a clear resolution. +- **Keep definitions tight.** One or two sentences max. Define what it IS, not what it does. +- **Show relationships.** Use bold term names and express cardinality where obvious. +- **Only include terms specific to this project's context.** General programming concepts (timeouts, error types, utility patterns) don't belong even if the project uses them extensively. Before adding a term, ask: is this a concept unique to this context, or a general programming concept? Only the former belongs. +- **Group terms under subheadings** when natural clusters emerge. If all terms belong to a single cohesive area, a flat list is fine. +- **Write an example dialogue.** A conversation between a dev and a domain expert that demonstrates how the terms interact naturally and clarifies boundaries between related concepts. + +## Single vs multi-context repos + +**Single context (most repos):** One `CONTEXT.md` at the repo root. 
+ +**Multiple contexts:** A `CONTEXT-MAP.md` at the repo root lists the contexts, where they live, and how they relate to each other: + +```md +# Context Map + +## Contexts + +- [Ordering](./src/ordering/CONTEXT.md) — receives and tracks customer orders +- [Billing](./src/billing/CONTEXT.md) — generates invoices and processes payments +- [Fulfillment](./src/fulfillment/CONTEXT.md) — manages warehouse picking and shipping + +## Relationships + +- **Ordering → Fulfillment**: Ordering emits `OrderPlaced` events; Fulfillment consumes them to start picking +- **Fulfillment → Billing**: Fulfillment emits `ShipmentDispatched` events; Billing consumes them to generate invoices +- **Ordering ↔ Billing**: Shared types for `CustomerId` and `Money` +``` + +The skill infers which structure applies: + +- If `CONTEXT-MAP.md` exists, read it to find contexts +- If only a root `CONTEXT.md` exists, single context +- If neither exists, create a root `CONTEXT.md` lazily when the first term is resolved + +When multiple contexts exist, infer which one the current topic relates to. If unclear, ask. diff --git a/.agents/skills/grill-with-docs/SKILL.md b/.agents/skills/grill-with-docs/SKILL.md new file mode 100644 index 0000000..5ea0aa9 --- /dev/null +++ b/.agents/skills/grill-with-docs/SKILL.md @@ -0,0 +1,88 @@ +--- +name: grill-with-docs +description: Grilling session that challenges your plan against the existing domain model, sharpens terminology, and updates documentation (CONTEXT.md, ADRs) inline as decisions crystallise. Use when user wants to stress-test a plan against their project's language and documented decisions. +--- + + + +Interview me relentlessly about every aspect of this plan until we reach a shared understanding. Walk down each branch of the design tree, resolving dependencies between decisions one-by-one. For each question, provide your recommended answer. + +Ask the questions one at a time, waiting for feedback on each question before continuing. 
+ +If a question can be answered by exploring the codebase, explore the codebase instead. + + + + + +## Domain awareness + +During codebase exploration, also look for existing documentation: + +### File structure + +Most repos have a single context: + +``` +/ +├── CONTEXT.md +├── docs/ +│ └── adr/ +│ ├── 0001-event-sourced-orders.md +│ └── 0002-postgres-for-write-model.md +└── src/ +``` + +If a `CONTEXT-MAP.md` exists at the root, the repo has multiple contexts. The map points to where each one lives: + +``` +/ +├── CONTEXT-MAP.md +├── docs/ +│ └── adr/ ← system-wide decisions +├── src/ +│ ├── ordering/ +│ │ ├── CONTEXT.md +│ │ └── docs/adr/ ← context-specific decisions +│ └── billing/ +│ ├── CONTEXT.md +│ └── docs/adr/ +``` + +Create files lazily — only when you have something to write. If no `CONTEXT.md` exists, create one when the first term is resolved. If no `docs/adr/` exists, create it when the first ADR is needed. + +## During the session + +### Challenge against the glossary + +When the user uses a term that conflicts with the existing language in `CONTEXT.md`, call it out immediately. "Your glossary defines 'cancellation' as X, but you seem to mean Y — which is it?" + +### Sharpen fuzzy language + +When the user uses vague or overloaded terms, propose a precise canonical term. "You're saying 'account' — do you mean the Customer or the User? Those are different things." + +### Discuss concrete scenarios + +When domain relationships are being discussed, stress-test them with specific scenarios. Invent scenarios that probe edge cases and force the user to be precise about the boundaries between concepts. + +### Cross-reference with code + +When the user states how something works, check whether the code agrees. If you find a contradiction, surface it: "Your code cancels entire Orders, but you just said partial cancellation is possible — which is right?" + +### Update CONTEXT.md inline + +When a term is resolved, update `CONTEXT.md` right there. 
Don't batch these up — capture them as they happen. Use the format in [CONTEXT-FORMAT.md](./CONTEXT-FORMAT.md). + +`CONTEXT.md` should be totally devoid of implementation details. Do not treat `CONTEXT.md` as a spec, a scratch pad, or a repository for implementation decisions. It is a glossary and nothing else. + +### Offer ADRs sparingly + +Only offer to create an ADR when all three are true: + +1. **Hard to reverse** — the cost of changing your mind later is meaningful +2. **Surprising without context** — a future reader will wonder "why did they do it this way?" +3. **The result of a real trade-off** — there were genuine alternatives and you picked one for specific reasons + +If any of the three is missing, skip the ADR. Use the format in [ADR-FORMAT.md](./ADR-FORMAT.md). + + diff --git a/.agents/skills/handoff/SKILL.md b/.agents/skills/handoff/SKILL.md new file mode 100644 index 0000000..0aa5b99 --- /dev/null +++ b/.agents/skills/handoff/SKILL.md @@ -0,0 +1,15 @@ +--- +name: handoff +description: Compact the current conversation into a handoff document for another agent to pick up. +argument-hint: "What will the next session be used for?" +--- + +Write a handoff document summarising the current conversation so a fresh agent can continue the work. Save to the temporary directory of the user's OS - not the current workspace. + +Include a "suggested skills" section in the document, which suggests skills that the agent should invoke. + +Do not duplicate content already captured in other artifacts (PRDs, plans, ADRs, issues, commits, diffs). Reference them by path or URL instead. + +Redact any sensitive information, such as API keys, passwords, or personally identifiable information. + +If the user passed arguments, treat them as a description of what the next session will focus on and tailor the doc accordingly. 
diff --git a/.agents/skills/improve-codebase-architecture/DEEPENING.md b/.agents/skills/improve-codebase-architecture/DEEPENING.md new file mode 100644 index 0000000..ecaf5d7 --- /dev/null +++ b/.agents/skills/improve-codebase-architecture/DEEPENING.md @@ -0,0 +1,37 @@ +# Deepening + +How to deepen a cluster of shallow modules safely, given its dependencies. Assumes the vocabulary in [LANGUAGE.md](LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**. + +## Dependency categories + +When assessing a candidate for deepening, classify its dependencies. The category determines how the deepened module is tested across its seam. + +### 1. In-process + +Pure computation, in-memory state, no I/O. Always deepenable — merge the modules and test through the new interface directly. No adapter needed. + +### 2. Local-substitutable + +Dependencies that have local test stand-ins (PGLite for Postgres, in-memory filesystem). Deepenable if the stand-in exists. The deepened module is tested with the stand-in running in the test suite. The seam is internal; no port at the module's external interface. + +### 3. Remote but owned (Ports & Adapters) + +Your own services across a network boundary (microservices, internal APIs). Define a **port** (interface) at the seam. The deep module owns the logic; the transport is injected as an **adapter**. Tests use an in-memory adapter. Production uses an HTTP/gRPC/queue adapter. + +Recommendation shape: *"Define a port at the seam, implement an HTTP adapter for production and an in-memory adapter for testing, so the logic sits in one deep module even though it's deployed across a network."* + +### 4. True external (Mock) + +Third-party services (Stripe, Twilio, etc.) you don't control. The deepened module takes the external dependency as an injected port; tests provide a mock adapter. + +## Seam discipline + +- **One adapter means a hypothetical seam. 
Two adapters means a real one.** Don't introduce a port unless at least two adapters are justified (typically production + test). A single-adapter seam is just indirection. +- **Internal seams vs external seams.** A deep module can have internal seams (private to its implementation, used by its own tests) as well as the external seam at its interface. Don't expose internal seams through the interface just because tests use them. + +## Testing strategy: replace, don't layer + +- Old unit tests on shallow modules become waste once tests at the deepened module's interface exist — delete them. +- Write new tests at the deepened module's interface. The **interface is the test surface**. +- Tests assert on observable outcomes through the interface, not internal state. +- Tests should survive internal refactors — they describe behaviour, not implementation. If a test has to change when the implementation changes, it's testing past the interface. diff --git a/.agents/skills/improve-codebase-architecture/HTML-REPORT.md b/.agents/skills/improve-codebase-architecture/HTML-REPORT.md new file mode 100644 index 0000000..8adc368 --- /dev/null +++ b/.agents/skills/improve-codebase-architecture/HTML-REPORT.md @@ -0,0 +1,123 @@ +# HTML Report Format + +The architectural review is rendered as a single self-contained HTML file in the OS temp directory. Tailwind and Mermaid both come from CDNs. Mermaid handles graph-shaped diagrams reliably; hand-built divs and inline SVG handle the more editorial visuals (mass diagrams, cross-sections). Mix the two — don't lean on Mermaid for everything, it'll start to look generic. + +## Scaffold + +```html + + + + + Architecture review — {{repo name}} + + + + + +
+
...
+
...
+
...
+
+ + +``` + +## Header + +Repo name, date, and a compact legend: solid box = module, dashed line = seam, red arrow = leakage, thick dark box = deep module. No introduction paragraph — straight into the candidates. + +## Candidate card + +The diagrams carry the weight. Prose is sparse, plain, and uses the glossary terms ([LANGUAGE.md](LANGUAGE.md)) without ceremony. + +Each candidate is one `
`: + +- **Title** — short, names the deepening (e.g. "Collapse the Order intake pipeline"). +- **Badge row** — recommendation strength (`Strong` = emerald, `Worth exploring` = amber, `Speculative` = slate), plus a tag for the dependency category (`in-process`, `local-substitutable`, `ports & adapters`, `mock`). +- **Files** — monospaced list, `font-mono text-sm`. +- **Before / After diagram** — the centrepiece. Two columns, side by side. See patterns below. +- **Problem** — one sentence. What hurts. +- **Solution** — one sentence. What changes. +- **Wins** — bullets, ≤6 words each. e.g. "Tests hit one interface", "Pricing logic stops leaking", "Delete 4 shallow wrappers". +- **ADR callout** (if applicable) — one line in an amber-tinted box. + +No paragraphs of explanation. If the diagram needs a paragraph to be understood, redraw the diagram. + +## Diagram patterns + +Pick the pattern that fits the candidate. Mix them. Don't make every diagram look the same — variety is part of the point. + +### Mermaid graph (the workhorse for dependencies / call flow) + +Use a Mermaid `flowchart` or `graph` when the point is "X calls Y calls Z, and look at the mess." Wrap it in a Tailwind-styled card so it doesn't feel parachuted in. Style with classDef to colour leakage edges red and the deep module dark. Sequence diagrams work well for "before: 6 round-trips; after: 1." + +```html +
+
+    flowchart LR
+      A[OrderHandler] --> B[OrderValidator]
+      B --> C[OrderRepo]
+      C -.leak.-> D[PricingClient]
+      classDef leak stroke:#dc2626,stroke-width:2px;
+      class C,D leak
+  
+
+``` + +### Hand-built boxes-and-arrows (when Mermaid's layout fights you) + +Modules as `<div>`s with borders and labels. Arrows as inline SVG `` or `` elements positioned absolutely over a relative container. Reach for this when you want the "after" diagram to feel like one thick-bordered deep module with greyed-out internals — Mermaid won't render that with the right weight. + +### Cross-section (good for layered shallowness) + +Stack horizontal bands (`h-12 border-l-4`) to show layers a call passes through. Before: 6 thin layers each doing nothing. After: 1 thick band labelled with the consolidated responsibility. + +### Mass diagram (good for "interface as wide as implementation") + +Two rectangles per module — one for interface surface area, one for implementation. Before: interface rectangle is nearly as tall as the implementation rectangle (shallow). After: interface rectangle is short, implementation rectangle is tall (deep). + +### Call-graph collapse + +Before: a tree of function calls rendered as nested boxes. After: the same tree collapsed into one box, with the now-internal calls shown faded inside it. + +## Style guidance + +- Lean editorial, not corporate-dashboard. Generous whitespace. Serif optional for headings (`font-serif` works well with stone/slate). +- Colour sparingly: one accent (emerald or indigo) plus red for leakage and amber for warnings. +- Keep diagrams ~320px tall so before/after sits comfortably side by side without scrolling. +- Use `text-xs uppercase tracking-wider` for module labels inside diagrams — they should read as schematic, not as UI. +- The only scripts are the Tailwind CDN and the Mermaid ESM import. The report is otherwise static — no app code, no interactivity beyond Mermaid's own rendering. + +## Top recommendation section + +One larger card. Candidate name, one sentence on why, anchor link to its card. That's it. + +## Tone + +Plain English, concise — but the architectural nouns and verbs come straight from [LANGUAGE.md](LANGUAGE.md). Concision is not an excuse to drift.
+ +**Use exactly:** module, interface, implementation, depth, deep, shallow, seam, adapter, leverage, locality. + +**Never substitute:** component, service, unit (for module) · API, signature (for interface) · boundary (for seam) · layer, wrapper (for module, when you mean module). + +**Phrasings that fit the style:** + +- "Order intake module is shallow — interface nearly matches the implementation." +- "Pricing leaks across the seam." +- "Deepen: one interface, one place to test." +- "Two adapters justify the seam: HTTP in prod, in-memory in tests." + +**Wins bullets** name the gain in glossary terms: *"locality: bugs concentrate in one module"*, *"leverage: one interface, N call sites"*, *"interface shrinks; implementation absorbs the wrappers"*. Don't write *"easier to maintain"* or *"cleaner code"* — those terms aren't in the glossary and don't earn their place. + +No hedging, no throat-clearing, no "it's worth noting that…". If a sentence could be a bullet, make it a bullet. If a bullet could be cut, cut it. If a term isn't in [LANGUAGE.md](LANGUAGE.md), reach for one that is before inventing a new one. diff --git a/.agents/skills/improve-codebase-architecture/INTERFACE-DESIGN.md b/.agents/skills/improve-codebase-architecture/INTERFACE-DESIGN.md new file mode 100644 index 0000000..3197723 --- /dev/null +++ b/.agents/skills/improve-codebase-architecture/INTERFACE-DESIGN.md @@ -0,0 +1,44 @@ +# Interface Design + +When the user wants to explore alternative interfaces for a chosen deepening candidate, use this parallel sub-agent pattern. Based on "Design It Twice" (Ousterhout) — your first idea is unlikely to be the best. + +Uses the vocabulary in [LANGUAGE.md](LANGUAGE.md) — **module**, **interface**, **seam**, **adapter**, **leverage**. + +## Process + +### 1. 
Frame the problem space + +Before spawning sub-agents, write a user-facing explanation of the problem space for the chosen candidate: + +- The constraints any new interface would need to satisfy +- The dependencies it would rely on, and which category they fall into (see [DEEPENING.md](DEEPENING.md)) +- A rough illustrative code sketch to ground the constraints — not a proposal, just a way to make the constraints concrete + +Show this to the user, then immediately proceed to Step 2. The user reads and thinks while the sub-agents work in parallel. + +### 2. Spawn sub-agents + +Spawn 3+ sub-agents in parallel using the Agent tool. Each must produce a **radically different** interface for the deepened module. + +Prompt each sub-agent with a separate technical brief (file paths, coupling details, dependency category from [DEEPENING.md](DEEPENING.md), what sits behind the seam). The brief is independent of the user-facing problem-space explanation in Step 1. Give each agent a different design constraint: + +- Agent 1: "Minimise the interface — aim for 1–3 entry points max. Maximise leverage per entry point." +- Agent 2: "Maximise flexibility — support many use cases and extension." +- Agent 3: "Optimise for the most common caller — make the default case trivial." +- Agent 4 (if applicable): "Design around ports & adapters for cross-seam dependencies." + +Include both [LANGUAGE.md](LANGUAGE.md) vocabulary and CONTEXT.md vocabulary in the brief so each sub-agent names things consistently with the architecture language and the project's domain language. + +Each sub-agent outputs: + +1. Interface (types, methods, params — plus invariants, ordering, error modes) +2. Usage example showing how callers use it +3. What the implementation hides behind the seam +4. Dependency strategy and adapters (see [DEEPENING.md](DEEPENING.md)) +5. Trade-offs — where leverage is high, where it's thin + +### 3.
Present and compare + +Present designs sequentially so the user can absorb each one, then compare them in prose. Contrast by **depth** (leverage at the interface), **locality** (where change concentrates), and **seam placement**. + +After comparing, give your own recommendation: which design you think is strongest and why. If elements from different designs would combine well, propose a hybrid. Be opinionated — the user wants a strong read, not a menu. diff --git a/.agents/skills/improve-codebase-architecture/LANGUAGE.md b/.agents/skills/improve-codebase-architecture/LANGUAGE.md new file mode 100644 index 0000000..530c276 --- /dev/null +++ b/.agents/skills/improve-codebase-architecture/LANGUAGE.md @@ -0,0 +1,53 @@ +# Language + +Shared vocabulary for every suggestion this skill makes. Use these terms exactly — don't substitute "component," "service," "API," or "boundary." Consistent language is the whole point. + +## Terms + +**Module** +Anything with an interface and an implementation. Deliberately scale-agnostic — applies equally to a function, class, package, or tier-spanning slice. +_Avoid_: unit, component, service. + +**Interface** +Everything a caller must know to use the module correctly. Includes the type signature, but also invariants, ordering constraints, error modes, required configuration, and performance characteristics. +_Avoid_: API, signature (too narrow — those refer only to the type-level surface). + +**Implementation** +What's inside a module — its body of code. Distinct from **Adapter**: a thing can be a small adapter with a large implementation (a Postgres repo) or a large adapter with a small implementation (an in-memory fake). Reach for "adapter" when the seam is the topic; "implementation" otherwise. + +**Depth** +Leverage at the interface — the amount of behaviour a caller (or test) can exercise per unit of interface they have to learn. A module is **deep** when a large amount of behaviour sits behind a small interface. 
A module is **shallow** when the interface is nearly as complex as the implementation. + +**Seam** _(from Michael Feathers)_ +A place where you can alter behaviour without editing in that place. The *location* at which a module's interface lives. Choosing where to put the seam is its own design decision, distinct from what goes behind it. +_Avoid_: boundary (overloaded with DDD's bounded context). + +**Adapter** +A concrete thing that satisfies an interface at a seam. Describes *role* (what slot it fills), not substance (what's inside). + +**Leverage** +What callers get from depth. More capability per unit of interface they have to learn. One implementation pays back across N call sites and M tests. + +**Locality** +What maintainers get from depth. Change, bugs, knowledge, and verification concentrate at one place rather than spreading across callers. Fix once, fixed everywhere. + +## Principles + +- **Depth is a property of the interface, not the implementation.** A deep module can be internally composed of small, mockable, swappable parts — they just aren't part of the interface. A module can have **internal seams** (private to its implementation, used by its own tests) as well as the **external seam** at its interface. +- **The deletion test.** Imagine deleting the module. If complexity vanishes, the module wasn't hiding anything (it was a pass-through). If complexity reappears across N callers, the module was earning its keep. +- **The interface is the test surface.** Callers and tests cross the same seam. If you want to test *past* the interface, the module is probably the wrong shape. +- **One adapter means a hypothetical seam. Two adapters means a real one.** Don't introduce a seam unless something actually varies across it. + +## Relationships + +- A **Module** has exactly one **Interface** (the surface it presents to callers and tests). +- **Depth** is a property of a **Module**, measured against its **Interface**. 
+- A **Seam** is where a **Module**'s **Interface** lives. +- An **Adapter** sits at a **Seam** and satisfies the **Interface**. +- **Depth** produces **Leverage** for callers and **Locality** for maintainers. + +## Rejected framings + +- **Depth as ratio of implementation-lines to interface-lines** (Ousterhout): rewards padding the implementation. We use depth-as-leverage instead. +- **"Interface" as the TypeScript `interface` keyword or a class's public methods**: too narrow — interface here includes every fact a caller must know. +- **"Boundary"**: overloaded with DDD's bounded context. Say **seam** or **interface**. diff --git a/.agents/skills/improve-codebase-architecture/SKILL.md b/.agents/skills/improve-codebase-architecture/SKILL.md new file mode 100644 index 0000000..c12b263 --- /dev/null +++ b/.agents/skills/improve-codebase-architecture/SKILL.md @@ -0,0 +1,81 @@ +--- +name: improve-codebase-architecture +description: Find deepening opportunities in a codebase, informed by the domain language in CONTEXT.md and the decisions in docs/adr/. Use when the user wants to improve architecture, find refactoring opportunities, consolidate tightly-coupled modules, or make a codebase more testable and AI-navigable. +--- + +# Improve Codebase Architecture + +Surface architectural friction and propose **deepening opportunities** — refactors that turn shallow modules into deep ones. The aim is testability and AI-navigability. + +## Glossary + +Use these terms exactly in every suggestion. Consistent language is the point — don't drift into "component," "service," "API," or "boundary." Full definitions in [LANGUAGE.md](LANGUAGE.md). + +- **Module** — anything with an interface and an implementation (function, class, package, slice). +- **Interface** — everything a caller must know to use the module: types, invariants, error modes, ordering, config. Not just the type signature. +- **Implementation** — the code inside. 
+- **Depth** — leverage at the interface: a lot of behaviour behind a small interface. **Deep** = high leverage. **Shallow** = interface nearly as complex as the implementation. +- **Seam** — where an interface lives; a place behaviour can be altered without editing in place. (Use this, not "boundary.") +- **Adapter** — a concrete thing satisfying an interface at a seam. +- **Leverage** — what callers get from depth. +- **Locality** — what maintainers get from depth: change, bugs, knowledge concentrated in one place. + +Key principles (see [LANGUAGE.md](LANGUAGE.md) for the full list): + +- **Deletion test**: imagine deleting the module. If complexity vanishes, it was a pass-through. If complexity reappears across N callers, it was earning its keep. +- **The interface is the test surface.** +- **One adapter = hypothetical seam. Two adapters = real seam.** + +This skill is _informed_ by the project's domain model. The domain language gives names to good seams; ADRs record decisions the skill should not re-litigate. + +## Process + +### 1. Explore + +Read the project's domain glossary and any ADRs in the area you're touching first. + +Then use the Agent tool with `subagent_type=Explore` to walk the codebase. Don't follow rigid heuristics — explore organically and note where you experience friction: + +- Where does understanding one concept require bouncing between many small modules? +- Where are modules **shallow** — interface nearly as complex as the implementation? +- Where have pure functions been extracted just for testability, but the real bugs hide in how they're called (no **locality**)? +- Where do tightly-coupled modules leak across their seams? +- Which parts of the codebase are untested, or hard to test through their current interface? + +Apply the **deletion test** to anything you suspect is shallow: would deleting it concentrate complexity, or just move it? A "yes, concentrates" is the signal you want. + +### 2. 
Present candidates as an HTML report + +Write a self-contained HTML file to the OS temp directory so nothing lands in the repo. Resolve the temp dir from `$TMPDIR`, falling back to `/tmp` (or `%TEMP%` on Windows), and write to `/architecture-review-.html` so each run gets a fresh file. Open it for the user — `xdg-open ` on Linux, `open ` on macOS, `start ` on Windows — and tell them the absolute path. + +The report uses **Tailwind via CDN** for layout and styling, and **Mermaid via CDN** for diagrams where a graph/flow/sequence reliably communicates the structure. Mix Mermaid with hand-crafted CSS/SVG visuals — use Mermaid when relationships are graph-shaped (call graphs, dependencies, sequences), and hand-built divs/SVG when you want something more editorial (mass diagrams, cross-sections, collapse animations). Each candidate gets a **before/after visualisation**. Be visual. + +For each candidate, the same template as before, but rendered as a card: + +- **Files** — which files/modules are involved +- **Problem** — why the current architecture is causing friction +- **Solution** — plain English description of what would change +- **Benefits** — explained in terms of locality and leverage, and how tests would improve +- **Before / After diagram** — side-by-side, custom-drawn, illustrating the shallowness and the deepening +- **Recommendation strength** — one of `Strong`, `Worth exploring`, `Speculative`, rendered as a badge + +End the report with a **Top recommendation** section: which candidate you'd tackle first and why. + +**Use CONTEXT.md vocabulary for the domain, and [LANGUAGE.md](LANGUAGE.md) vocabulary for the architecture.** If `CONTEXT.md` defines "Order," talk about "the Order intake module" — not "the FooBarHandler," and not "the Order service." + +**ADR conflicts**: if a candidate contradicts an existing ADR, only surface it when the friction is real enough to warrant revisiting the ADR. Mark it clearly in the card (e.g. 
a warning callout: _"contradicts ADR-0007 — but worth reopening because…"_). Don't list every theoretical refactor an ADR forbids. + +See [HTML-REPORT.md](HTML-REPORT.md) for the full HTML scaffold, diagram patterns, and styling guidance. + +Do NOT propose interfaces yet. After the file is written, ask the user: "Which of these would you like to explore?" + +### 3. Grilling loop + +Once the user picks a candidate, drop into a grilling conversation. Walk the design tree with them — constraints, dependencies, the shape of the deepened module, what sits behind the seam, what tests survive. + +Side effects happen inline as decisions crystallize: + +- **Naming a deepened module after a concept not in `CONTEXT.md`?** Add the term to `CONTEXT.md` — same discipline as `/grill-with-docs` (see [CONTEXT-FORMAT.md](../grill-with-docs/CONTEXT-FORMAT.md)). Create the file lazily if it doesn't exist. +- **Sharpening a fuzzy term during the conversation?** Update `CONTEXT.md` right there. +- **User rejects the candidate with a load-bearing reason?** Offer an ADR, framed as: _"Want me to record this as an ADR so future architecture reviews don't re-suggest it?"_ Only offer when the reason would actually be needed by a future explorer to avoid re-suggesting the same thing — skip ephemeral reasons ("not worth it right now") and self-evident ones. See [ADR-FORMAT.md](../grill-with-docs/ADR-FORMAT.md). +- **Want to explore alternative interfaces for the deepened module?** See [INTERFACE-DESIGN.md](INTERFACE-DESIGN.md). diff --git a/.agents/skills/prototype/LOGIC.md b/.agents/skills/prototype/LOGIC.md new file mode 100644 index 0000000..526ecb1 --- /dev/null +++ b/.agents/skills/prototype/LOGIC.md @@ -0,0 +1,79 @@ +# Logic Prototype + +A tiny interactive terminal app that lets the user drive a state model by hand. 
Use this when the question is about **business logic, state transitions, or data shape** — the kind of thing that looks reasonable on paper but only feels wrong once you push it through real cases. + +## When this is the right shape + +- "I'm not sure if this state machine handles the edge case where X then Y." +- "Does this data model actually let me represent the case where..." +- "I want to feel out what the API should look like before writing it." +- Anything where the user wants to **press buttons and watch state change**. + +If the question is "what should this look like" — wrong branch. Use [UI.md](UI.md). + +## Process + +### 1. State the question + +Before writing code, write down what state model and what question you're prototyping. One paragraph, in the prototype's README or a comment at the top of the file. A logic prototype that answers the wrong question is pure waste — make the question explicit so it can be checked later, whether the user is watching now or returning to it AFK. + +### 2. Pick the language + +Use whatever the host project uses. If the project has no obvious runtime (e.g. a docs repo), ask. + +Match the project's existing conventions for tooling — don't add a new package manager or runtime just for the prototype. + +### 3. Isolate the logic in a portable module + +Put the actual logic — the bit that's answering the question — behind a small, pure interface that could be lifted out and dropped into the real codebase later. The TUI around it is throwaway; the logic module shouldn't be. + +The right shape depends on the question: + +- **A pure reducer** — `(state, action) => state`. Good when actions are discrete events and state is a single value. +- **A state machine** — explicit states and transitions. Good when "which actions are even legal right now" is part of the question. +- **A small set of pure functions** over a plain data type. Good when there's no implicit current state — just transformations. 
+- **A class or module with a clear method surface** when the logic genuinely owns ongoing internal state. + +Pick whichever shape best fits the question being asked, *not* whichever is easiest to wire to a TUI. Keep it pure: no I/O, no terminal code, no `console.log` for control flow. The TUI imports it and calls into it; nothing flows the other direction. + +This is what makes the prototype useful past its own lifetime. When the question's been answered, the validated reducer / machine / function set can be lifted into the real module — the TUI shell gets deleted. + +### 4. Build the smallest TUI that exposes the state + +Build it as a **lightweight TUI** — on every tick, clear the screen (`console.clear()` / `print("\033[2J\033[H")` / equivalent) and re-render the whole frame. The user should always see one stable view, not an ever-growing scrollback. + +Each frame has two parts, in this order: + +1. **Current state**, pretty-printed and diff-friendly (one field per line, or formatted JSON). Use **bold** for field names or section headers and **dim** for less important context (timestamps, IDs, derived values). Native ANSI escape codes are fine — `\x1b[1m` bold, `\x1b[2m` dim, `\x1b[0m` reset. No need to pull in a styling library unless one is already in the project. +2. **Keyboard shortcuts**, listed at the bottom: `[a] add user [d] delete user [t] tick clock [q] quit`. Bold the key, dim the description, or vice-versa — whatever reads cleanly. + +Behaviour: + +1. **Initialise state** — a single in-memory object/struct. Render the first frame on start. +2. **Read one keystroke (or one line)** at a time, dispatch to a handler that mutates state. +3. **Re-render** the full frame after every action — don't append, replace. +4. **Loop until quit.** + +The whole frame should fit on one screen. + +### 5. Make it runnable in one command + +Add a script to the project's existing task runner (`package.json` scripts, `Makefile`, `justfile`, `pyproject.toml`). 
The user should run `pnpm run ` or equivalent — never need to remember a path. + +If the host project has no task runner, just put the command at the top of the prototype's README. + +### 6. Hand it over + +Give the user the run command. They'll drive it themselves; the interesting moments are when they say "wait, that shouldn't be possible" or "huh, I assumed X would be different" — those are the bugs in the _idea_, which is the whole point. If they want new actions added, add them. Prototypes evolve. + +### 7. Capture the answer + +When the prototype has done its job, the answer to the question is the only thing worth keeping. If the user is around, ask what it taught them. If not, leave a `NOTES.md` next to the prototype so the answer can be filled in (or filled in by you, if you've watched the session) before the prototype gets deleted. + +## Anti-patterns + +- **Don't add tests.** A prototype that needs tests is no longer a prototype. +- **Don't wire it to the real database.** Use an in-memory store unless the question is specifically about persistence. +- **Don't generalise.** No "what if we wanted to support X later." The prototype answers one question. +- **Don't blur the logic and the TUI together.** If the reducer / state machine references `console.log`, prompts, or terminal escape codes, it's no longer portable. Keep the TUI as a thin shell over a pure module. +- **Don't ship the TUI shell into production.** The shell is optimised for being driven by hand from a terminal. The logic module behind it is the bit worth keeping. diff --git a/.agents/skills/prototype/SKILL.md b/.agents/skills/prototype/SKILL.md new file mode 100644 index 0000000..64f3e61 --- /dev/null +++ b/.agents/skills/prototype/SKILL.md @@ -0,0 +1,30 @@ +--- +name: prototype +description: Build a throwaway prototype to flesh out a design before committing to it. 
Routes between two branches — a runnable terminal app for state/business-logic questions, or several radically different UI variations toggleable from one route. Use when the user wants to prototype, sanity-check a data model or state machine, mock up a UI, explore design options, or says "prototype this", "let me play with it", "try a few designs". +--- + +# Prototype + +A prototype is **throwaway code that answers a question**. The question decides the shape. + +## Pick a branch + +Identify which question is being answered — from the user's prompt, the surrounding code, or by asking if the user is around: + +- **"Does this logic / state model feel right?"** → [LOGIC.md](LOGIC.md). Build a tiny interactive terminal app that pushes the state machine through cases that are hard to reason about on paper. +- **"What should this look like?"** → [UI.md](UI.md). Generate several radically different UI variations on a single route, switchable via a URL search param and a floating bottom bar. + +The two branches produce very different artifacts — getting this wrong wastes the whole prototype. If the question is genuinely ambiguous and the user isn't reachable, default to whichever branch better matches the surrounding code (a backend module → logic; a page or component → UI) and state the assumption at the top of the prototype. + +## Rules that apply to both + +1. **Throwaway from day one, and clearly marked as such.** Locate the prototype code close to where it will actually be used (next to the module or page it's prototyping for) so context is obvious — but name it so a casual reader can see it's a prototype, not production. For throwaway UI routes, obey whatever routing convention the project already uses; don't invent a new top-level structure. +2. **One command to run.** Whatever the project's existing task runner supports — `pnpm `, `python `, `bun `, etc. The user must be able to start it without thinking. +3. **No persistence by default.** State lives in memory. 
Persistence is the thing the prototype is _checking_, not something it should depend on. If the question explicitly involves a database, hit a scratch DB or a local file with a clear "PROTOTYPE — wipe me" name. +4. **Skip the polish.** No tests, no error handling beyond what makes the prototype _runnable_, no abstractions. The point is to learn something fast and then delete it. +5. **Surface the state.** After every action (logic) or on every variant switch (UI), print or render the full relevant state so the user can see what changed. +6. **Delete or absorb when done.** When the prototype has answered its question, either delete it or fold the validated decision into the real code — don't leave it rotting in the repo. + +## When done + +The _answer_ is the only thing worth keeping from a prototype. Capture it somewhere durable (commit message, ADR, issue, or a `NOTES.md` next to the prototype) along with the question it was answering. If the user is around, that capture is a quick conversation; if not, leave the placeholder so they (or you, on the next pass) can fill in the verdict before deleting the prototype. diff --git a/.agents/skills/prototype/UI.md b/.agents/skills/prototype/UI.md new file mode 100644 index 0000000..f3b6e64 --- /dev/null +++ b/.agents/skills/prototype/UI.md @@ -0,0 +1,112 @@ +# UI Prototype + +Generate **several radically different UI variations** on a single route, switchable from a floating bottom bar. The user flips between variants in the browser, picks one (or steals bits from each), then throws the rest away. + +If the question is about logic/state rather than what something looks like — wrong branch. Use [LOGIC.md](LOGIC.md). + +## When this is the right shape + +- "What should this page look like?" +- "I want to see a few options for this dashboard before committing." +- "Try a different layout for the settings screen." +- Any time the user would otherwise spend a day picking between three vague mockups in their head. 
+ +## Two sub-shapes — strongly prefer sub-shape A + +A UI prototype is much easier to judge when it's **butting up against the rest of the app** — real header, real sidebar, real data, real density. A throwaway route on its own is a vacuum: every variant looks fine in isolation. Default to sub-shape A whenever there's a plausible existing page to host the variants. Only reach for sub-shape B if the prototype genuinely has no nearby home. + +### Sub-shape A — adjustment to an existing page (preferred) + +The route already exists. Variants are rendered **on the same route**, gated by a `?variant=` URL search param. The existing data fetching, params, and auth all stay — only the rendering swaps. This is the default; pick it unless there's a specific reason not to. + +If the prototype is for something that doesn't yet have a page but *would naturally live inside one* (a new section of the dashboard, a new card on the settings screen, a new step in an existing flow) — that's still sub-shape A. Mount the variants inside the host page. + +### Sub-shape B — a new page (last resort) + +Only use this when the thing being prototyped genuinely has no existing page to live inside — e.g. an entirely new top-level surface, or a flow that can't be embedded anywhere sensible. + +Create a **throwaway route** following whatever routing convention the project already uses — don't invent a new top-level structure. Name it so it's obviously a prototype (e.g. include the word `prototype` in the path or filename). Same `?variant=` pattern. + +Before committing to sub-shape B, sanity-check: is there really no existing page this could be embedded in? An empty route hides design problems that a populated one would expose. + +In both sub-shapes the floating bottom bar is identical. + +## Process + +### 1. State the question and pick N + +Default to **3 variants**. More than 5 stops being radically different and starts being noise — cap there. 
+ +Write down the plan in one line, in the prototype's location or a top-of-file comment: + +> "Three variants of the settings page, switchable via `?variant=`, on the existing `/settings` route." + +This works whether the user is here to push back or not. + +### 2. Generate radically different variants + +Draft each variant. Hold each one to: + +- The page's purpose and the data it has access to. +- The project's component library / styling system (TailwindCSS, shadcn, MUI, plain CSS, whatever). +- A clear exported component name, e.g. `VariantA`, `VariantB`, `VariantC`. + +Variants must be **structurally different** — different layout, different information hierarchy, different primary affordance, not just different colours. Three slightly-tweaked card grids isn't a UI prototype, it's wallpaper. If two drafts come out too similar, redo one with explicit "do not use a card grid" guidance. + +### 3. Wire them together + +Create a single switcher component on the route: + +```tsx +// pseudo-code — adapt to the project's framework +const variant = searchParams.get('variant') ?? 'A'; +return ( + <> + {variant === 'A' && <VariantA />} + {variant === 'B' && <VariantB />} + {variant === 'C' && <VariantC />} + {/* floating switcher (step 4) renders here */} + </> +); +``` + +For sub-shape A (existing page): keep all the existing data fetching above the switcher; only the rendered subtree changes per variant. + +For sub-shape B (new page): the throwaway route under `/prototype/` mounts the same switcher. + +### 4. Build the floating switcher + +A small fixed-position bar at the bottom-centre of the screen with three pieces: + +- **Left arrow** — cycles to the previous variant (wraps around). +- **Variant label** — shows the current variant key and, if the variant exports a name, that name too. e.g. `B — Sidebar layout`. +- **Right arrow** — cycles forward (wraps around).
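The wrap-around behaviour of the two arrows can be sketched as a small pure function. This is a minimal sketch under stated assumptions, not part of the skill itself: `cycleVariant`, `Direction`, and the key list `["A", "B", "C"]` are hypothetical names chosen for illustration, and falling back to the first variant on an unrecognised key is an assumed policy.

```typescript
// Hypothetical helper: names and fallback policy are illustrative assumptions.
type Direction = -1 | 1;

function cycleVariant(
  keys: readonly string[],
  current: string,
  direction: Direction,
): string {
  if (keys.length === 0) {
    throw new Error("at least one variant key is required");
  }
  const index = keys.indexOf(current);
  // Unknown key (for example a stale ?variant= value): fall back to the first variant.
  if (index === -1) {
    return keys[0];
  }
  // Adding keys.length keeps the operand non-negative, so % wraps in both directions.
  return keys[(index + direction + keys.length) % keys.length];
}

const variantKeys = ["A", "B", "C"] as const;
console.log(cycleVariant(variantKeys, "C", 1)); // prints "A" (right arrow on the last variant)
console.log(cycleVariant(variantKeys, "A", -1)); // prints "C" (left arrow on the first variant)
```

Keeping the cycling logic free of framework calls means the same function could serve both the arrow buttons and the keyboard handler, with only the URL update left to the router.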
+ +Behaviour: + +- Clicking an arrow updates the URL search param (use the framework's router — `router.replace` on Next, `navigate` on React Router, etc) so the variant is shareable and reload-stable. +- Keyboard: `←` and `→` arrow keys also cycle. Don't intercept arrow keys when an ``, `