[WIP] Lark bot enhancement #562
Conversation
Two production failure modes observed 2026-05-03 leave a Lark user visibly stuck even though the cluster is healthy. Both are self-healed locally; no NyxID-side changes.

(1) /llm /route /model says "binding 已失效, 请发送 /init" ("binding has expired, please send /init") while /init says "已绑定, 请先 /unbind" ("already bound, please /unbind first"). After PR #558 re-DCR'd the cluster's OAuth client to add the `proxy` scope, the user's existing NyxID binding (issued for the previous client_id) is rejected at broker token-exchange. ModelChannelSlashCommandHandler caught the rejection and pointed the user at /init, which then refused because the local readmodel still holds the (now-dead) binding_id, looping the user forever. Self-heal: when the broker throws BindingRevokedException / BindingNotFoundException / BindingScopeMismatchException, dispatch RevokeBindingCommand to the local ExternalIdentityBindingGAgent (reason="auto_self_heal_*") so the readmodel flips to revoked, and tell the user the binding was cleared and /init will work now. Mirrors the dispatch shape UnbindChannelSlashCommandHandler uses for explicit /unbind, but skips the NyxID-side revoke (NyxID is the one that just told us the binding is gone).

(2) Bot replies as "..." forever when the LLM call fails. The streaming sink fires the first chunk via channel-relay/reply, consuming the single-use reply token and creating a placeholder message. If the LLM then fails (e.g. upstream 429), the pre-fix runtime fell through to RunLlmReplyAsync, which issued a fresh /reply against the dead token and got `401 Reply token already used`, leaving the user staring at "..." with no error explanation. Self-heal: ConversationGAgent.TryCompleteStreamedReplyAsync now takes the Failed branch when streaming has already committed the placeholder. It edits the existing message in place via RunStreamChunkAsync (channel-relay/reply/update, which needs no reply token) with the classified failure text, then persists ConversationTurnCompletedEvent so the runtime envelope retry loop does not refire and consume the dead token again. If the edit also fails (rare: Lark may refuse stale-message edits), it persists the last flushed partial as terminal, the same defence-in-depth pattern the existing PR #374 fix uses for the Completed path.

Tests:
- 4 new ModelSlashCommandHandlerTests pinning the binding self-heal for each rejection shape + the degraded path when IActorRuntime is missing.
- 2 new ConversationGAgentDedupTests pinning that the streaming-Failed branch edits the placeholder + falls through to "persist last flushed partial" when the edit also fails.

Verification:
dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (851/851)
dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests --no-build (36/36 in dedup suite)
dotnet test test/Aevatar.Foundation.Core.Tests (230/230)
bash tools/ci/test_stability_guards.sh (passed)
bash tools/ci/query_projection_priming_guard.sh (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
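A minimal sketch of the self-heal shape described above. Only the exception names, the command name, and the reason prefix come from this PR; the actor interface and handler wiring are illustrative stand-ins:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical sketch of the /model self-heal branch. Exception and command
// names follow the PR description; everything else is illustrative.
public sealed class BindingRevokedException : Exception { }
public sealed class BindingNotFoundException : Exception { }
public sealed class BindingScopeMismatchException : Exception { }

public sealed record RevokeBindingCommand(string BindingId, string Reason);

public interface IBindingActor
{
    Task HandleAsync(RevokeBindingCommand command, CancellationToken ct);
}

public sealed class ModelCommandSelfHeal
{
    private readonly IBindingActor _bindingActor;

    public ModelCommandSelfHeal(IBindingActor bindingActor) => _bindingActor = bindingActor;

    public async Task<string> ListModelsAsync(Func<Task<string>> brokerCall, string bindingId, CancellationToken ct)
    {
        try
        {
            return await brokerCall();
        }
        catch (Exception ex) when (ex is BindingRevokedException
                                   or BindingNotFoundException
                                   or BindingScopeMismatchException)
        {
            // NyxID just told us the binding is dead: flip the local readmodel
            // to revoked so /init stops refusing, but skip the NyxID-side revoke.
            await _bindingActor.HandleAsync(
                new RevokeBindingCommand(bindingId, Reason: "auto_self_heal_" + ex.GetType().Name),
                ct);
            return "Binding was stale and has been cleared locally; /init will work now.";
        }
    }
}
```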
… path

Both reviewers caught the same bug: before the fix, the self-heal returned the "本地已自动清理" ("cleaned up locally") message even when IActorRuntime was unregistered or when CreateAsync / HandleEventAsync threw. The local readmodel was NOT actually cleaned in those paths, so the user follows the message to /init, /init still sees the stale binding and refuses, recreating the exact loop this PR exists to break.

Split the self-heal API into cleanedMessage + degradedMessage. The cleaned message is returned ONLY when the local revoke envelope was actually dispatched to the binding actor; otherwise the degraded message points the user at /unbind explicitly. Also add a single retry on the dispatch path (mirroring UnbindChannelSlashCommandHandler's PR #521 v4-pro review fix) so a one-off Orleans hiccup doesn't downgrade an otherwise self-healable binding to manual /unbind guidance.

Tests:
- List_DegradesToUnbindGuidance_WhenSelfHealActorRuntimeMissing now asserts the reply contains "/unbind" and DOES NOT contain "已自动清理", pinning the no-runtime degraded path.
- List_DegradesToUnbindGuidance_WhenSelfHealDispatchKeepsThrowing is a new test exercising ThrowingActorRuntime (every CreateAsync throws) and asserts AttemptCount == 2 (the retry-once contract) plus the degraded reply shape.

Verification:
dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (852/852)
bash tools/ci/test_stability_guards.sh (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
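A minimal sketch of the cleaned/degraded split plus the retry-once contract, assuming a hypothetical IRevokeDispatcher in place of the real actor-runtime dispatch path:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public interface IRevokeDispatcher
{
    Task DispatchRevokeAsync(string bindingId, CancellationToken ct);
}

public static class SelfHealMessages
{
    // Return the "cleaned" message ONLY when the revoke envelope was actually
    // dispatched; otherwise fall back to explicit /unbind guidance.
    public static async Task<string> DispatchWithRetryOnceAsync(
        IRevokeDispatcher? dispatcher, string bindingId, CancellationToken ct)
    {
        if (dispatcher is null)
            return "Self-heal unavailable; please run /unbind, then /init.";

        for (var attempt = 1; attempt <= 2; attempt++)
        {
            try
            {
                await dispatcher.DispatchRevokeAsync(bindingId, ct);
                return "Stale binding cleared locally; /init will work now.";
            }
            catch (OperationCanceledException) { throw; }
            catch (Exception) when (attempt == 1)
            {
                // One-off hiccup: retry once before degrading to manual guidance.
            }
            catch (Exception)
            {
                break;
            }
        }
        return "Self-heal could not clean the local binding; please run /unbind, then /init.";
    }
}
```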
…ly-self-heal Self-heal binding rejections + streaming-failed reply-token loop
DeepSeek v4-pro with thinking mode rejects requests when reasoning_content from prior assistant turns is not echoed back. This change:
- Adds ReasoningContent to ChatMessage, LLMResponse, and LLMStreamChunk
- Propagates reasoning content through ChatRuntime streaming rounds
- Appends reasoning_content to conversation history for multi-turn
- Implements ExtractReasoningContent in MEAILLMProvider
- Wires reasoning content into non-streaming ConvertResponse

Closes #563
Channel conversation LLM runs inside Orleans actors with no HTTP auth context. The scope resolver returns null, causing ActorBackedUserMemoryStore to throw InvalidOperationException on every turn and log a warning.
- Add TryResolveScopeId/TryResolveWriteActorId that return null instead of throwing
- ReadProjectedStateAsync returns null when no scope is available
- GetAsync returns UserMemoryDocument.Empty, BuildPromptSectionAsync returns empty string; no warning logged
- Write operations (Save/Add/Remove) still throw since they only run from Studio API with auth context

Closes #564
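A minimal sketch of the Try-pattern degradation described above. The real store is ActorBackedUserMemoryStore; the types below are illustrative stand-ins:

```csharp
using System;
using System.Threading.Tasks;

public interface IScopeResolver
{
    string? TryResolveScopeId();
}

public sealed class GracefulUserMemoryStore
{
    private readonly IScopeResolver _scopes;
    public GracefulUserMemoryStore(IScopeResolver scopes) => _scopes = scopes;

    public Task<string> BuildPromptSectionAsync()
    {
        // Inside an Orleans actor there is no HTTP auth context: resolve to
        // null, return an empty section, and log nothing.
        var scopeId = _scopes.TryResolveScopeId();
        if (scopeId is null)
            return Task.FromResult(string.Empty);
        return LoadSectionForScopeAsync(scopeId);
    }

    public Task SaveAsync(string entry)
    {
        // Writes only run from the Studio API with auth context, so a missing
        // scope here is a real bug and still throws.
        var scopeId = _scopes.TryResolveScopeId()
            ?? throw new InvalidOperationException("No scope available for write.");
        return PersistAsync(scopeId, entry);
    }

    private static Task<string> LoadSectionForScopeAsync(string scopeId) => Task.FromResult("");
    private static Task PersistAsync(string scopeId, string entry) => Task.CompletedTask;
}
```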
…ths, history persistence
…, remove dead try-catch
Gracefully handle missing scope in UserMemoryStore (fixes #564)
…ing-content Propagate reasoning_content through LLM pipeline (fixes #563)
| "/model encountered NyxID-side binding rejection ({Reason}) but IProjectionReadinessPort is not registered; cannot verify local readmodel cleanup. actor={ActorId}, subject={Platform}:{Tenant}:{User}", | ||
| reason, | ||
| actorId, | ||
| context.Subject.Platform, context.Subject.Tenant, context.Subject.ExternalUserId); |
[Consensus: 3 models] severity=blocker, category=arch
The new TryDispatchLocalBindingRevokeAsync runs EnsureProjectionForActorAsync + WaitForBindingStateAsync inside the slash-command request path. Three reviewers flag this as a CLAUDE.md violation: query-time projection priming is forbidden — the application/query layer must not activate a projection scope and synchronously wait for read-model materialization on the request hot path. IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Refactor self-heal so the revocation is fire-and-forget (publish a continuation event, return early to the user with a 'binding cleanup in progress' message), or perform the projection wait off the request path. CLAUDE.md ref: "query-time replay/priming is forbidden" / "QueryPort/QueryService/ApplicationService must not synchronously backfill projections on the request path."
Per-model verbatim
- codex: This makes the /model read/list path activate a projection scope and then synchronously wait for read-model cleanup. That violates the CLAUDE.md rule forbidding query-time replay/priming and the ADR note that IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Move this repair behind an explicit write-side command/port that owns projection activation, and keep /model honest if the cleanup cannot already be observed.
- v4-pro: TryDispatchLocalBindingRevokeAsync calls EnsureProjectionForActorAsync + WaitForBindingStateAsync synchronously in the slash-command handler path. This is projection priming in the request path: activating a projection and waiting for the readmodel to refresh within the request execution. Violates CLAUDE.md: "No query-time priming: if a query first needs to 'ensure the projection exists / refresh the read model', that must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden." Move projection activation/binder setup upstream of the slash-command path (e.g. in the conversation pipeline when the inbound message arrives), or make the self-heal fire-and-forget (dispatch the revoke without synchronously waiting on the readmodel). The degraded-message fallback already handles cases where the readmodel isn't cleaned within the request.
- kimi: TryDispatchLocalBindingRevokeAsync calls _bindingProjectionPort.EnsureProjectionForActorAsync(actorId, ct) synchronously inside the slash-command handler request path. This is projection priming in the application layer. CLAUDE.md forbids query-time priming: "if a query first needs to 'ensure the projection exists / refresh the read model', that must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden." While the OAuth endpoint does similar priming, endpoints are Host layer; a per-user-message slash handler is not. Consider activating binding projection scopes eagerly at startup (e.g. a hosted service that primes all known binding actors) instead of per-request.
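For reference, a minimal sketch of the fire-and-forget shape the reviewers converge on, with a hypothetical IRevokePublisher standing in for the write-side port that owns projection activation:

```csharp
using System.Threading;
using System.Threading.Tasks;

public interface IRevokePublisher
{
    Task PublishRevokeAsync(string bindingId, CancellationToken ct);
}

public sealed class FireAndForgetSelfHeal
{
    private readonly IRevokePublisher _publisher;
    public FireAndForgetSelfHeal(IRevokePublisher publisher) => _publisher = publisher;

    public async Task<string> HealAsync(string bindingId, CancellationToken ct)
    {
        // Publish the continuation event and return immediately; the write side
        // owns projection activation and read-model materialization, not this
        // handler, so nothing here waits on the readmodel.
        await _publisher.PublishRevokeAsync(bindingId, ct);
        return "Binding cleanup in progress; retry /init in a moment.";
    }
}
```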
    var actor = await _actorRuntime
        .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct)
        .ConfigureAwait(false);
    var envelope = new EventEnvelope
[Consensus: 4 models] severity=blocker, category=arch
actor.HandleEventAsync(envelope, ct) dispatches RevokeBindingCommand directly through IActorRuntime, bypassing IActorDispatchPort and the actor's inbox / serialization guarantees. Four reviewers flag this as a CLAUDE.md violation of the "Runtime vs Dispatch separation of responsibilities" rule: IActorRuntime may resolve lifecycle/topology, but command delivery must go through the runtime-neutral publisher (publish/send). Inline HandleEventAsync reduces to direct method invocation on the target actor — exactly the pattern CLAUDE.md forbids: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, business layers, or intermediate adapter layers is forbidden." Replace with the runtime-neutral dispatch port (IActorDispatchPort.PublishAsync(...) or the equivalent) and let the actor's inbox process the revoke command.
Per-model verbatim
- codex: Calling actor.HandleEventAsync directly bypasses IActorDispatchPort and manually inlines delivery into the target actor. This violates the CLAUDE.md "Runtime vs Dispatch separation of responsibilities" / "delivery semantics must be runtime-neutral" rules; IActorRuntime should only resolve lifecycle/topology, while command delivery should go through IActorDispatchPort.DispatchAsync with EnvelopeRouteSemantics.CreateDirect(...) or the existing command-dispatch helper.
- v4-pro: actor.HandleEventAsync(envelope, ct) dispatches the RevokeBindingCommand directly to the actor via an inline method call, bypassing the actor's inbox queue and serialization guarantees. Violates CLAUDE.md: "Delivery semantics must be runtime-neutral: publish/send uniformly means 'enter the target actor's inbox and await processing'; it must not degrade to inline dispatch because the target is self or because of underlying runtime differences." The comment at L126 says this mirrors UnbindChannelSlashCommandHandler — but that handler also uses the same pattern and should be fixed too. Use IActorDispatchPort (or an equivalent standard dispatch mechanism) instead of CreateAsync + HandleEventAsync.
- kimi: The handler directly calls actor.HandleEventAsync(envelope, ct) via IActorRuntime, bypassing the standard IActorDispatchPort. CLAUDE.md requires runtime-neutral dispatch: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, business layers, or intermediate adapter layers is forbidden." While UnbindChannelSlashCommandHandler uses the same pattern, the new self-heal path amplifies the violation by coupling it with projection priming and retry logic. Refactor to use IActorDispatchPort or a dedicated binding command port so the slash handler does not depend on runtime internals.
- mimo-v2.5-pro: The slash-command handler now directly orchestrates actor dispatch via IActorRuntime, EventEnvelope, and RevokeBindingCommand. Per CLAUDE.md, slash handlers are channel-routed application services — they should not construct and dispatch actor envelopes. Extract SelfHealRevokedBindingAsync / TryDispatchLocalBindingRevokeAsync into a dedicated IBindingSelfHealService (application layer) and inject that instead. This also simplifies testing: the current test file has to mock IActorRuntime, ExternalIdentityBindingProjectionPort, AND IProjectionReadinessPort just to cover one catch branch.
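A minimal sketch of dispatch through a runtime-neutral port as the reviewers suggest; the exact IActorDispatchPort signature is an assumption, not confirmed by this repo:

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed record RevokeBindingCommand(string BindingId, string Reason);

public interface IActorDispatchPort
{
    // publish/send means "enqueue into the target actor's inbox and await
    // processing"; it must not degrade to an inline method call on the target.
    Task PublishAsync(string actorId, object command, CancellationToken ct);
}

public sealed class BindingRevokeDispatcher
{
    private readonly IActorDispatchPort _dispatch;
    public BindingRevokeDispatcher(IActorDispatchPort dispatch) => _dispatch = dispatch;

    public Task DispatchAsync(string actorId, RevokeBindingCommand command, CancellationToken ct)
        => _dispatch.PublishAsync(actorId, command, ct); // inbox + serialization guarantees preserved
}
```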
    private readonly ILogger<ModelChannelSlashCommandHandler> _logger;

    public ModelChannelSlashCommandHandler(
        ILogger<ModelChannelSlashCommandHandler> logger,
[Consensus: 2 models] severity=major, category=di
Constructor injects the concrete ExternalIdentityBindingProjectionPort (and accepts nullable IActorRuntime + IProjectionReadinessPort). Two issues: (a) DI: CLAUDE.md's dependency-inversion rule requires depending on abstractions; define IExternalIdentityBindingProjectionPort and inject the interface (the current registration in IdentityServiceCollectionExtensions.cs is services.TryAddSingleton<ExternalIdentityBindingProjectionPort>() against the concrete type). (b) Silent degradation: nullable self-heal collaborators mean that if any registration is missing in production DI, the self-heal (the user-facing fix this PR delivers) silently disappears with no error. Either make these required (non-nullable), or add startup validation that logs a warning when self-heal will be inactive.
Per-model verbatim
- glm-5.1: IActorRuntime, ExternalIdentityBindingProjectionPort, and IProjectionReadinessPort are all nullable and degrade gracefully, but the self-heal feature is the primary user-facing fix in this PR. If any of these isn't registered in the production DI container, the handler silently falls back to the degraded message and the binding loop is unbroken. Verify the registrations exist in the host composition root — if they're not yet wired, this should be a blocker.
- kimi: Constructor injects the concrete ExternalIdentityBindingProjectionPort instead of an abstraction. CLAUDE.md: "Dependency inversion: upper layers depend on abstractions; cross-layer reverse dependencies and direct coupling to concrete implementations are forbidden." The port is registered as a singleton concrete class (IdentityServiceCollectionExtensions.cs:86, services.TryAddSingleton<ExternalIdentityBindingProjectionPort>()). Define IExternalIdentityBindingProjectionPort, register against it, and inject the interface.
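A minimal sketch of the requested interface extraction and registration; the port's member list here is illustrative:

```csharp
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

public interface IExternalIdentityBindingProjectionPort
{
    Task EnsureProjectionForActorAsync(string actorId, CancellationToken ct);
}

public sealed class ExternalIdentityBindingProjectionPort : IExternalIdentityBindingProjectionPort
{
    public Task EnsureProjectionForActorAsync(string actorId, CancellationToken ct)
        => Task.CompletedTask; // real implementation elided
}

public static class IdentityServiceCollectionExtensionsSketch
{
    public static IServiceCollection AddBindingProjectionPort(this IServiceCollection services)
    {
        // Register against the interface so consumers depend on the abstraction;
        // keep a concrete registration only if legacy call sites still need it.
        services.TryAddSingleton<IExternalIdentityBindingProjectionPort,
                                 ExternalIdentityBindingProjectionPort>();
        return services;
    }
}
```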
    public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
    {
        var actor = await EnsureWriteActorAsync(ct);
[Consensus: 3 models] severity=major, category=concurrency
RemoveEntryAsync was reordered so EnsureWriteActorAsync runs before ReadProjectedStateAsync. Three reviewers flag this side-effect change: previously, calling RemoveEntryAsync("missing") was a no-op that returned false without materializing any actor; now it always activates the user-memory actor first (state hydration, reminder registration, storage I/O), even when the entry does not exist (returns false at the existence check). This violates the CLAUDE.md rule that default paths must define resource semantics — a no-op delete now implicitly creates a resource, which has cost (cluster activation, cache pressure) and changes resource lifecycle. Move actor resolution after the existence check; only resolve the scope/actor when there is real work to do.
Per-model verbatim
- codex: RemoveEntryAsync("missing") now creates or materializes the user-memory actor before checking whether the entry exists. That turns a no-op delete into implicit resource creation, violating the CLAUDE.md rule that default paths must define resource semantics; keep the projected read first, and if the no-scope case must throw, resolve the scope ID explicitly without calling EnsureWriteActorAsync until a matching entry is found.
- kimi: RemoveEntryAsync was reordered to call EnsureWriteActorAsync before ReadProjectedStateAsync. This causes unnecessary actor activation when the entry does not exist. Actor creation has side effects (state hydration, reminder registration, storage reads). Move actor resolution back after the existence check, or document why eager activation is required.
- mimo-v2.5-pro: The reorder of EnsureWriteActorAsync before ReadProjectedStateAsync in RemoveEntryAsync changes the side-effect ordering. Previously the actor was only created when the entry existed. Now every RemoveEntryAsync call creates the actor even when the entry is missing (returns false at line 133). The actor creation is idempotent, but it's an unnecessary round-trip in the common "entry not found" case. Consider moving EnsureWriteActorAsync back after the existence check, or documenting that the actor must exist before the read (if there's a race-condition fix reason for the reorder).
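A minimal sketch of the ordering the reviewers want restored, with stand-in types for the projection read and the write actor:

```csharp
using System.Threading;
using System.Threading.Tasks;

public sealed class ProjectedState
{
    public bool ContainsEntry(string id) => false; // stand-in
}

public interface IWriteActor
{
    Task RemoveAsync(string id, CancellationToken ct);
}

public sealed class UserMemoryStoreSketch
{
    public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
    {
        var state = await ReadProjectedStateAsync(ct);
        if (state is null || !state.ContainsEntry(id))
            return false; // no-op delete: no actor activation, no storage I/O

        var actor = await EnsureWriteActorAsync(ct); // only materialize when removing
        await actor.RemoveAsync(id, ct);
        return true;
    }

    // Stand-ins for the real projection read and actor resolution.
    private Task<ProjectedState?> ReadProjectedStateAsync(CancellationToken ct)
        => Task.FromResult<ProjectedState?>(null);
    private Task<IWriteActor> EnsureWriteActorAsync(CancellationToken ct)
        => Task.FromResult<IWriteActor>(null!);
}
```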
    }

    public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
    {
[codex] severity=major, category=bug
Removing the previous try/catch means a transient projection/read-model failure in BuildPromptSectionAsync will now fail the whole chat/prompt construction path instead of degrading by omitting memory. Restore the best-effort behavior here: catch non-cancellation exceptions from GetAsync, log the warning, and return string.Empty.
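A minimal sketch of the best-effort behavior codex asks to restore, assuming a hypothetical delegate in place of the real GetAsync:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Extensions.Logging;

public sealed class PromptSectionBuilder
{
    private readonly ILogger<PromptSectionBuilder> _logger;
    public PromptSectionBuilder(ILogger<PromptSectionBuilder> logger) => _logger = logger;

    public async Task<string> BuildPromptSectionAsync(
        Func<CancellationToken, Task<string>> getMemoryAsync,
        int maxChars = 2000,
        CancellationToken ct = default)
    {
        try
        {
            var section = await getMemoryAsync(ct);
            return section.Length <= maxChars ? section : section[..maxChars];
        }
        catch (OperationCanceledException)
        {
            throw; // cancellation is not a degradation case
        }
        catch (Exception ex)
        {
            // Transient projection/read-model failure: omit memory instead of
            // failing the whole chat/prompt construction path.
            _logger.LogWarning(ex, "User memory unavailable; omitting memory section from prompt.");
            return string.Empty;
        }
    }
}
```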
    runtime.Actors.Should().NotContainKey("user-memory-user-1",
        "no actor should be created when entry is missing");
    var actor = runtime.Actors["user-memory-user-1"];
    actor.ReceivedEnvelopes.Should().BeEmpty("no remove command should be dispatched when entry is missing");
[v4-pro] severity=major, category=test
RemoveEntryAsync_MissingEntry_ReturnsFalse changed from asserting runtime.Actors.Should().NotContainKey(...) (actor NOT created) to asserting runtime.Actors["user-memory-user-1"] EXISTS with empty envelopes. But the new RemoveEntryAsync reads state first and returns false before calling EnsureWriteActorAsync when the entry is missing — the actor should not be created at all. Either the test is wrong or the test infrastructure (EmptyReader, FakeActorRuntime) triggers an unexpected actor creation. Verify the test passes against the actual code change.
    AttachOpenAIRawRepresentationForReasoning(meaiMsg, msg);
    result.Add(meaiMsg);
}
[Consensus: 3 models] severity=major, category=arch
AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's serialized JSON via rawMessage.Patch.Set("$.reasoning_content"u8, ...) under #pragma warning disable SCME0001. Three reviewers flag the same fragility: Patch is an SDK-internal experimental surface (SCME0001), reasoning_content is not a documented field on AssistantChatMessage, and the JSONPath relies on the SDK's current serialization shape. Any SDK update that changes the patch model, the serializer, or the message structure will silently break or drop reasoning content propagation — there is no compile-time guarantee. Mitigations: (a) pin the OpenAI SDK version, (b) add an integration test that asserts the serialized JSON contains reasoning_content, and (c) file a feature request upstream for first-class reasoning content support so the workaround can be retired.
Per-model verbatim
- glm-5.1: rawMessage.Patch.Set("$.reasoning_content"u8, ...) relies on the OpenAI SDK's internal Patch/SCME0001 surface to inject reasoning_content into the serialized JSON. This is fragile — any SDK update that changes the patch model or serialization pipeline will silently drop reasoning content or throw at runtime. Consider isolating this behind a version-checked helper (or a feature flag) so a future SDK bump doesn't cause a silent regression.
- v4-pro: AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's AssistantChatMessage via rawMessage.Patch.Set("$.reasoning_content"u8, ...). This hardcoded JSON path depends on the SDK's internal serialization structure (the AssistantChatMessage.ModelReaderWriter.Write output), which could change between versions and silently break reasoning propagation. Consider a custom AssistantChatMessage subclass or an explicit BinaryContent wrapper if the SDK doesn't natively support reasoning_content.
- mimo-v2.5-pro: AttachOpenAIRawRepresentationForReasoning uses rawMessage.Patch.Set("$.reasoning_content"u8, ...) with #pragma warning disable SCME0001. This is a fragile internal API — the OpenAI SDK doesn't document reasoning_content as a supported field on AssistantChatMessage, and the Patch property is an implementation detail. If the SDK changes wire format or removes the Patch escape hatch, this will silently break reasoning round-tripping. Add a fallback: if the patch throws (or if RawRepresentation doesn't survive a serialize-then-deserialize cycle), log a warning and degrade gracefully rather than crashing the entire chat call.
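A minimal sketch of the serialization pin the reviewers suggest, using a stand-in serializer in place of the SDK's ModelReaderWriter output (a real test would serialize the patched AssistantChatMessage and assert on that JSON):

```csharp
using System.Text.Json;
using Xunit;

public class ReasoningContentSerializationTests
{
    [Fact]
    public void PatchedAssistantMessage_SerializesReasoningContent()
    {
        // BuildPatchedAssistantMessageJson is a stand-in for serializing the
        // AssistantChatMessage after AttachOpenAIRawRepresentationForReasoning;
        // this assertion is the thing that should go red on a breaking SDK bump.
        string json = BuildPatchedAssistantMessageJson("chain of thought");

        using var doc = JsonDocument.Parse(json);
        Assert.True(doc.RootElement.TryGetProperty("reasoning_content", out var prop),
            "SDK update dropped the reasoning_content patch");
        Assert.Equal("chain of thought", prop.GetString());
    }

    private static string BuildPatchedAssistantMessageJson(string reasoning)
        => JsonSerializer.Serialize(new { role = "assistant", content = "", reasoning_content = reasoning });
}
```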
    CancellationToken ct)
{
    try
    {
[v4-pro] severity=minor, category=perf
MergeProxyRouteCandidatesAsync calls _nyxClient.DiscoverProxyServicesAsync synchronously within GetServicesAsync. If the /api/v1/proxy/services endpoint is slow or unreachable (and the catch swallows the exception with a warning), the catalog load time doubles on every /model or /route invocation. Consider making the proxy-service discovery happen on a background refresh, or caching with a TTL, rather than per-request.
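A minimal sketch of the TTL-cache alternative, assuming a delegate in place of _nyxClient.DiscoverProxyServicesAsync:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class CachedProxyServiceCatalog
{
    private readonly Func<CancellationToken, Task<string[]>> _discover;
    private readonly TimeSpan _ttl;
    private readonly SemaphoreSlim _gate = new(1, 1);
    private string[] _cached = Array.Empty<string>();
    private DateTimeOffset _expiresAt = DateTimeOffset.MinValue;

    public CachedProxyServiceCatalog(Func<CancellationToken, Task<string[]>> discover, TimeSpan ttl)
        => (_discover, _ttl) = (discover, ttl);

    public async Task<string[]> GetAsync(CancellationToken ct)
    {
        if (DateTimeOffset.UtcNow < _expiresAt)
            return _cached; // fresh: no upstream round-trip on the request path

        await _gate.WaitAsync(ct);
        try
        {
            if (DateTimeOffset.UtcNow < _expiresAt)
                return _cached;
            try
            {
                _cached = await _discover(ct);
                _expiresAt = DateTimeOffset.UtcNow + _ttl;
            }
            catch (OperationCanceledException) { throw; }
            catch
            {
                // Discovery failure: serve the stale list rather than doubling
                // catalog load time on every /model or /route invocation.
            }
            return _cached;
        }
        finally
        {
            _gate.Release();
        }
    }
}
```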
    NyxIdLlmServicesResult result,
    string bearerToken,
    CancellationToken ct)
{
[mimo-v2.5-pro] severity=minor, category=design
The proxy endpoint path /api/v1/proxy/services?per_page=100 is hardcoded here AND in NyxIdLlmServiceCatalogClient.cs:63. Two independent HTTP clients call the same endpoint. Extract the path into a shared constant (or better, into NyxIdLlmProviderSource / a config key) so the two call-sites stay in sync if the path changes.
… tools
CC Switch (and other Anthropic→OpenAI Responses translators) advertise
Claude Code tools as a mix of `{type: "function", name, parameters, …}`
function declarations and OpenAI built-in tool declarations like
`{type: "web_search_preview"}` / `{type: "file_search", vector_store_ids: […]}`
/ `{type: "code_interpreter"}` / `{type: "computer_use_preview"}`. The
built-in forms have no `name` or `function` block — they're routing
hints to the model provider, not custom function definitions.
Previously the normalizer required every tool entry to have a name,
so a single built-in declaration anywhere in the array failed the
whole request with `invalid_tools: "Each tool requires a non-empty
name."` — exactly what the user hit when configuring CC Switch
against `chrono-llm/gpt-5.5`.
Skip non-function-typed entries silently (aevatar's classifier only
owns forward / substitute / additive function tools), keep validating
name for function-type entries. Improve the error message to include
the failing tool index for easier diagnosis on future malformed
inputs.
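A minimal sketch of the normalizer rule, with an illustrative ToolEntry shape in place of the real request model:

```csharp
using System;
using System.Collections.Generic;

public sealed record ToolEntry(string Type, string? Name);

public static class ToolNormalizer
{
    public static IReadOnlyList<ToolEntry> NormalizeTools(IReadOnlyList<ToolEntry> tools)
    {
        var functions = new List<ToolEntry>();
        for (var i = 0; i < tools.Count; i++)
        {
            var tool = tools[i];
            if (!string.Equals(tool.Type, "function", StringComparison.Ordinal))
                continue; // built-ins like web_search_preview are routing hints: drop silently

            if (string.IsNullOrEmpty(tool.Name))
                throw new ArgumentException(
                    $"invalid_tools: function tool at index {i} requires a non-empty name.");

            functions.Add(tool);
        }
        return functions;
    }
}
```

With this shape, a mixed array of web_search_preview, file_search, and a Bash function tool yields exactly the one function tool, matching the tests below.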
Tests:
- A mixed array (`web_search_preview` + `file_search` + a real
function tool `Bash`) now returns 200 and the LLM provider sees
exactly the one function tool; built-ins are dropped.
- A function-type entry without a name still 400s with
`function tool at index 1 requires a non-empty name`.
Revert-and-rerun confirmed the built-in pass-through test goes red
when the skip branch is disabled.
…data fallback

Two issues surfaced when CC Switch users tried multi-turn tool conversations on chrono-llm / llm-deepseek:

1. `function_call_output requires previous_response_id` 400 on the follow-up turn. CC Switch / Codex, translating Claude Code's prior tool-result turn, forwards `function_call_output` items in `input` but does NOT propagate `previous_response_id`; they don't model OpenAI's server-side session. Strict #629 §13 normalization rejected the request and the agent couldn't continue any multi-turn tool conversation. Fix: when `previous_response_id` is absent, fold `function_call_output` entries into the user prompt with synthetic `[tool_result call_id=…]` markers and clear ToolResults. The continuation contract still applies when the client actually sends `previous_response_id`.

2. `Model metadata for `llm-deepseek/deepseek-v4-pro` not found` warning from CC Switch: upstream deepseek `/v1/models` is OpenAI-spec minimal (no context_length / max_output_tokens), and the no-hardcode rule forbids in-code defaults, but the user explicitly asked for a fallback ("兜底") so requests don't get crippled by CC Switch's conservative fallback metadata. Fix: config-driven `Aevatar:Responses:ModelMetadataFallbacks`. Lookup precedence per entry: upstream fields → exact `{slug}/{model}` match → group-wide `{slug}` match. Fallback only fills nulls; it never overwrites upstream values. Empty config = no-op. The default deployment ships with deepseek + llm-deepseek 64k/8k defaults in appsettings.json; deployments can override or remove them. This stays out of code per the `feedback_no_hardcoded_metadata` rule, which explicitly permits config-driven slug→config.

Tests cover both regressions, and revert-and-rerun confirms the fold test catches the exact `previous_response_required` 400 the user reported.
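A minimal sketch of the config-driven fallback lookup described above; the section key comes from the commit, while the option/record names here are assumptions:

```csharp
using System.Collections.Generic;

public sealed class ModelMetadataFallbackOptions
{
    // Bound from Aevatar:Responses:ModelMetadataFallbacks; keyed by
    // "{slug}/{model}" for exact matches or "{slug}" for group-wide defaults.
    public Dictionary<string, ModelMetadataFallback> ModelMetadataFallbacks { get; set; } = new();
}

public sealed record ModelMetadataFallback(int? ContextLength, int? MaxOutputTokens);

public static class ModelMetadataResolver
{
    public static ModelMetadataFallback Resolve(
        ModelMetadataFallbackOptions options,
        string slug, string model,
        int? upstreamContextLength, int? upstreamMaxOutputTokens)
    {
        options.ModelMetadataFallbacks.TryGetValue($"{slug}/{model}", out var exact);
        options.ModelMetadataFallbacks.TryGetValue(slug, out var groupWide);

        // Precedence per field: upstream -> exact "{slug}/{model}" -> "{slug}".
        // Fallback fills nulls only; upstream values are never overwritten.
        return new ModelMetadataFallback(
            upstreamContextLength ?? exact?.ContextLength ?? groupWide?.ContextLength,
            upstreamMaxOutputTokens ?? exact?.MaxOutputTokens ?? groupWide?.MaxOutputTokens);
    }
}
```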
Covers nyxid CLI API key issuance, cc-switch codex provider TOML, and end-to-end curl smoke test. Notes Aevatar is auto_connected so users only add the LLM provider they want to route through it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds §6 to the cc-switch runbook describing the planned /v1/messages endpoint: capability matrix vs /v1/responses, protocol-mismatch rationale (Messages is stateless, aevatar runtime is stateful — surface is intentionally narrow), and the cc-switch Claude-tab config to use once it lands. Renumbers downstream sections and updates the troubleshooting row that previously said Messages was not on the roadmap.
The actor state, proto, and ports are protocol-neutral — they carry scope/owner/status and forwarded tool calls without any Responses-only fields. Naming them ResponseSession blocks reuse by /v1/messages (Anthropic Messages, planned Path B) and misleads readers into thinking the actor is OpenAI-Responses-specific. Mechanical rename across 25 files (400 lines changed, 400 added): ResponseSession -> LlmSession response_session(s) -> llm_session(s) RESPONSE_SESSION_ -> LLM_SESSION_ IResponseSession* -> ILlmSession* Field `response_id` is kept on the record because Path A (Responses) keeps emitting OpenAI-style response ids on the wire; Path B will synthesize its own session identifier into the same field. Refs #642
…cade
Path B is a stateless facade over the same LlmSessionGAgent +
NyxIdLLMProvider + IResponsesCompletionApplicationService pipeline that
serves /v1/responses. No new GAgent type is introduced — the previously
renamed LlmSessionGAgent is protocol-neutral and the facade reuses it
verbatim. cc-switch users running Claude Code can now point at the
Aevatar/NyxID base URL and get an end-to-end Messages-protocol session.
Surface (src/Aevatar.Mainnet.Host.Api/Messages/):
POST /v1/messages AllowAnonymous, manual bearer extraction,
mirrors /v1/responses auth model
- non-streaming returns the Anthropic message envelope JSON
- streaming emits the Anthropic SSE schedule:
  message_start
  content_block_start (text or tool_use)
  content_block_delta (text_delta / input_json_delta)
  content_block_stop
  message_delta
  message_stop
Lossy translation boundary (documented inline + tested; see the sketch after this list):
- thinking block -> ChatMessage.ReasoningContent (lossless)
- tool_use block -> ToolCall.ArgumentsJson (text args lossless;
image args dropped)
- tool_result block -> ChatMessage.Tool(callId, output) (text only)
- cache_control -> ignored in v1
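A minimal sketch of that boundary under assumed internal types (ContentBlock and ChatMessage below are stand-ins; only the block names and drop rules come from the commit):

```csharp
using System.Collections.Generic;

public sealed record ContentBlock(string Type, string? Text = null,
    string? ToolUseId = null, string? Name = null, string? InputJson = null);

public sealed record ChatMessage(string Role, string Content,
    string? ReasoningContent = null, string? ToolCallId = null);

public static class MessagesTranslation
{
    public static IEnumerable<ChatMessage> Flatten(IEnumerable<ContentBlock> blocks)
    {
        foreach (var block in blocks)
        {
            switch (block.Type)
            {
                case "thinking":
                    // Lossless: thinking becomes ReasoningContent.
                    yield return new ChatMessage("assistant", "", ReasoningContent: block.Text);
                    break;
                case "tool_result":
                    // Text only; image parts in the result are dropped.
                    yield return new ChatMessage("tool", block.Text ?? "", ToolCallId: block.ToolUseId);
                    break;
                case "text":
                    yield return new ChatMessage("user", block.Text ?? "");
                    break;
                // cache_control and unknown block types: ignored in v1.
            }
        }
    }
}
```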
Stateless contract: Anthropic Messages has no previous_response_id, so
each POST opens + closes its own LlmSession (24h TTL kept for parity
with Path A audit/projection).
Tests (test/Aevatar.Hosting.Tests/MainnetMessagesEndpointsTests.cs):
PostMessages_NonStreaming_ShouldReturnAnthropicMessageEnvelope
PostMessages_Streaming_ShouldEmitAnthropicSseFrames
PostMessages_WithToolCall_ShouldEmitToolUseContentBlock
PostMessages_WithoutBearer_ShouldReturn401WithAnthropicErrorEnvelope
PostMessages_WithToolResultBlockInUserContent_ShouldFlattenIntoToolRoleMessage
Refs #642
Defines the four-phase chain contract (accepted/committed/delivered/finalized) across ConversationGAgent / IChannelLlmReplyRunDispatcher / AgentRunGAgent with a single observable state per phase. Locks in:
- AgentRunStatus 5-state proto enum, with REPLY_HANDED_OFF replacing the reply_dispatched bool (which becomes reserved).
- ConversationState.last_reply_delivery single field carrying the user-visible delivery ack from the channel sink.
- Typed DispatchOutcome on IChannelLlmReplyRunDispatcher.DispatchAsync so the synchronous return point only promises accepted (see the sketch below).
- finalized as an absorbing state; late/stale signals must no-op.

Companion docs/canon/lark-reply-completion-semantics.md adds sequence diagrams, a failure matrix, state machine views, and the implementation checklist used by the upcoming code-side commits (issues #647 / #648 / #649).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
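A minimal sketch of the typed outcome; the phase names come from the companion implementation commit, while the record layout is an assumption:

```csharp
// The dispatcher's synchronous return promises only "accepted"; delivered and
// finalized are observed later via conversation/run actor state.
public enum DispatchPhase
{
    Accepted,
    RejectedStale,
    RejectedDuplicate,
}

public sealed record DispatchOutcome(DispatchPhase Phase, string? ExistingActorId = null)
{
    public static DispatchOutcome Accepted() => new(DispatchPhase.Accepted);
    public static DispatchOutcome Stale() => new(DispatchPhase.RejectedStale);
    public static DispatchOutcome Duplicate(string actorId) => new(DispatchPhase.RejectedDuplicate, actorId);
}
```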
Lands the chain-level contract pieces of ADR-0021 that the rest of the
reply chain work depends on:
agent_run.proto
- AgentRunStatus gains AGENT_RUN_STATUS_REPLY_HANDED_OFF.
- AgentRunGAgentState.reply_dispatched (field 12) marked reserved; the
explicit status replaces the implicit bool. Legacy event replay now
promotes straight to REPLY_HANDED_OFF.
- New cleanup_completed_at_unix_ms field (combined with terminal status,
this is the chain.finalized observable).
- New AgentRunCleanupCompletedEvent for the terminal-cleanup write.
conversation_state.proto / conversation_events.proto
- New ReplyDeliveryStatus message (Pending / Delivered / DeliveryFailed)
on ConversationGAgentState.last_reply_delivery as the
chain.delivered observable.
- New LlmReplyDeliveredEvent + LlmReplyDeliveryFailedEvent that drive
ConversationGAgent into those outcomes.
IChannelLlmReplyRunDispatcher
- DispatchAsync now returns Task<DispatchOutcome> with a typed phase
(Accepted / RejectedStale / RejectedDuplicate). The interface has no
NuGet consumers and the three in-repo call sites are adapted in this
commit; behaviour is preserved.
AgentRunDispatcher
- Performs a cheap freshness check at the boundary (mirrors
AgentRunGAgent.MaxRunRequestAgeMs) and returns RejectedStale instead
of enqueuing requests that the run actor would only drop.
AgentRunGAgent
- All read sites of State.ReplyDispatched move to status checks:
REPLY_HANDED_OFF replaces (ReplyProduced && ReplyDispatched).
- ApplyReplyProduced legacy-event path promotes status to
REPLY_HANDED_OFF; ApplyReplyDispatched promotes committed ->
handed-off; the new-event path leaves status at REPLY_PRODUCED until
the dispatched event lands.
Tests
- ConversationGAgentDedupTests RecordingRunDispatcher mock adapted to
the new return type.
- AgentRunGAgentTests assertions migrated from ReplyDispatched bool to
explicit REPLY_PRODUCED / REPLY_HANDED_OFF status expectations.
Docs
- canon checklist fixed to reflect that the handed-off transition uses
the existing AgentRunReplyDispatchedEvent (no new event required) and
that AgentRunCleanupCompletedEvent drives cleanup_completed_at.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tionGAgent
Makes ADR-0021 chain.delivered observable from
ConversationGAgentState.last_reply_delivery instead of inference from
log lines or channel sink return codes. ConversationGAgent now:
* Persists LlmReplyDeliveredEvent before ConversationTurnCompletedEvent
on the non-streaming success path (HandleLlmReplyReadyAsync ->
RunLlmReplyAsync) and on the streaming completion path
(PersistStreamedCompletionAsync — which is the unified streaming
sink that all partial / full / failure-self-heal branches funnel
through, so any user-visible content counts as delivered).
* Persists LlmReplyDeliveryFailedEvent before
ConversationContinueFailedEvent on the non-streaming failure path
so DeliveryFailed carries the structured reason while the
chain-finalizing failure event is still last.
* Wires two new state-matcher entries (ApplyLastReplyDelivered /
ApplyLastReplyDeliveryFailed) that populate
ConversationGAgentState.last_reply_delivery — single-field by
design, multi-turn history reconstructable from event log.
Raise order is delivered → completed (resp. failed) so existing
consumers of "events.Last() is ConversationTurnCompletedEvent /
ConversationContinueFailedEvent" stay correct.
Tests
- HandleLlmReplyReadyAsync_WhenDuplicateCorrelationId_CollapsesToSingleOutboundCommit
now expects 3 events (the new Delivered event sits between
NeedsLlmReplyEvent and TurnCompleted), with a regression check
that LlmReplyDeliveredEvent is present in the log.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §8 streaming closeout contract — stream-local
terminal, aggregated Usage on a single closeout point, FinishReason
visibility — implemented at the run-actor boundary instead of inside
ChatRuntime. Respects aevatar#596 phasing: ChatRuntime stays a
transitional local loop (Phase A); the run-actor edge owns the
contract surface (Phase A targets) so the eventual ChatRuntime
tear-down in Phase B does not need to renegotiate it.
IConversationReplyGenerator
- GenerateReplyAsync now returns Task<ConversationReplyResult> with
Text, Usage (ReplyTokenUsage), and FinishReason. ReplyTokenUsage is
a Channel.Runtime-local projection mirroring AI.Abstractions.TokenUsage
so Channel.Runtime does not gain a reverse layer dependency on
AI.Abstractions (the CLAUDE.md "dependency inversion" rule).
NyxIdConversationReplyGenerator
- Aggregates Usage across all internal LLM rounds (tool-call loop)
via SumUsage and tracks the last non-empty FinishReason. The
foreach over the ChatRuntime stream extracts both before falling
through to the DeltaContent-only forwarding to the streaming sink,
so neither metric is dropped when DeltaContent is empty.
AgentRunGAgent
- Consumes ConversationReplyResult at the run-actor edge and logs
the closeout once (runId / correlation / prompt|completion|total
tokens / finishReason). LlmReplyReadyEvent semantics unchanged
until a follow-up PR persists the closeout into actor state.
ChatRuntime
- Field-level patch only: NormalizeStreamChunk now forwards
chunk.FinishReason on the projected stream chunk (previously
swallowed). No restructuring; ChatRuntime remains transitional.
Tests
- 3 IConversationReplyGenerator mocks in AgentRunGAgentTests adapted
to the new return type.
- 5 ConversationReplyGeneratorTests reply assertions migrated to
reply.Text.
- New regression GenerateReplyAsync_AggregatesUsageAndFinishReasonAtActorEdge
using UsageReportingProviderFactory mock — provider emits Usage on
a mid-stream bookkeeping chunk and IsLast separately; the test
asserts the actor-edge result carries both the aggregated tokens
(7/11/18) and FinishReason ("stop").
Test state: 803 (ChannelRuntime) + 134 (Channel.Protocol) + 542 (AI)
= 1479 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §9 absorbing-finalized contract: late and stale
reply-chain signals must no-op once a run has reached
chain.finalized (terminal AgentRunStatus + cleanup_completed_at != 0
on the run actor; ProcessedCommandIds containment on the
conversation actor). Lifts the previously ad-hoc terminal checks
into single helpers and applies them to every handler entry.
AgentRunGAgent
- New AgentRunGAgent.IsTerminal(status) helper (internal static for
test access) and instance overload covering Dropped / Failed /
ReplyHandedOff. New IsCleanupAlreadyCompleted() against
AgentRunGAgentState.cleanup_completed_at_unix_ms.
- HandleStartAsync uses IsTerminal at entry; only re-schedules
cleanup when cleanup has not already completed.
- HandleCleanupAsync uses IsTerminal at entry, then short-circuits
on cleanup_completed_at != 0 to keep destroy idempotent. On the
valid path it now persists AgentRunCleanupCompletedEvent before
IActorRuntime.DestroyAsync, so the chain.finalized observable
(cleanup_completed_at != 0) survives replay.
- State matcher wires ApplyCleanupCompleted to write
cleanup_completed_at_unix_ms.
ConversationGAgent
- New IsLlmReplyTurnFinalized(correlationId) helper centralizes the
`ProcessedCommandIds.Contains("llm:<corr>")` check.
- HandleLlmReplyReadyAsync, HandleLlmReplyCardStreamChunkAsync, and
HandleNyxRelayStreamingChunkCoreAsync all dedupe through the
helper.
- HandleDeferredLlmReplyDroppedAsync now dedupes the same way: a
late drop notification for an already-finalized turn (run-actor
cleanup callback fires after a successful reply already landed)
no-ops instead of overwriting last_reply_delivery with a
synthetic NotRetryable ConversationContinueFailedEvent.
Tests
- Five new #649 regressions on AgentRunGAgent covering the late-
signal classes called out in the issue:
* Duplicate cleanup callback destroys actor once + persists
cleanup_completed_at.
* Cleanup for a stale RunId no-ops, leaves cleanup_completed_at
zero.
* Cleanup before terminal status no-ops.
* Duplicate start after cleanup_completed does not re-schedule
a fresh cleanup callback or re-run the LLM.
* Duplicate start after stale-gate Drop does not re-run the
LLM and does not persist additional drop events.
Test state: 808 ChannelRuntime + 134 Channel.Protocol + 542 AI =
1484 pass.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
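A minimal sketch of the absorbing-finalized checks described above; the status names follow the commit, while the enum numbering and state-holder shape are illustrative:

```csharp
public enum AgentRunStatus
{
    Unspecified = 0,
    Started = 1,
    ReplyProduced = 2,
    ReplyHandedOff = 3,
    Dropped = 4,
    Failed = 5,
}

public sealed class AgentRunTerminalState
{
    public AgentRunStatus Status { get; set; }
    public long CleanupCompletedAtUnixMs { get; set; }

    // Terminal statuses: Dropped / Failed / ReplyHandedOff.
    internal static bool IsTerminal(AgentRunStatus status)
        => status is AgentRunStatus.Dropped
                  or AgentRunStatus.Failed
                  or AgentRunStatus.ReplyHandedOff;

    internal bool IsCleanupAlreadyCompleted() => CleanupCompletedAtUnixMs != 0;

    // chain.finalized = terminal status + cleanup_completed_at != 0;
    // late/stale signals must no-op once this holds.
    public bool ShouldNoOpLateSignal()
        => IsTerminal(Status) && IsCleanupAlreadyCompleted();
}
```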
…-05-12_lark-bot-reply-chain-regressions
…letons

Issue #664 step 3b: ResponsesUserSkillsToolProvider must ctor-inject the same singleton instances that the IAgentToolSource enumeration returns to ChatRuntime, otherwise the two paths could diverge silently. Switch AddSkills / AddOrnnSkills to TryAddSingleton<concrete> plus a factory-based IAgentToolSource alias guarded against duplicate registration, so concrete-by-type and enumerable-by-interface resolve to one instance.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
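A minimal sketch of the registration pattern, using TryAddEnumerable with a factory descriptor so the interface alias resolves to the same singleton (type names mirror the commit; the exact guard mechanics in the repo are an assumption):

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

public interface IAgentToolSource { }
public sealed class SkillsAgentToolSource : IAgentToolSource { }

public static class SkillsRegistrationSketch
{
    public static IServiceCollection AddSkills(this IServiceCollection services)
    {
        // Concrete-by-type registration for ctor injection sites.
        services.TryAddSingleton<SkillsAgentToolSource>();

        // Interface alias backed by a factory that resolves the same instance;
        // TryAddEnumerable dedupes by (service, implementation) type, so calling
        // AddSkills twice cannot register the source twice.
        services.TryAddEnumerable(
            ServiceDescriptor.Singleton<IAgentToolSource, SkillsAgentToolSource>(
                sp => sp.GetRequiredService<SkillsAgentToolSource>()));
        return services;
    }
}
```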
The NyxID Responses direct path previously only injected the local tools exposed by ResponsesAevatarToolProvider (TodoWrite/Task/WebFetch/WebSearch) into the LLMRequest, without hooking into the unified skill backbone, so callers reaching aevatar through a NyxID apikey (codex, Cursor, etc.) could not use the skills they had configured in Ornn.

Main changes:
- IResponsesToolProvider becomes async and carries a typed ResponsesToolProviderContext (caller scope + tool metadata), aligning with the future IAgentToolSource.DiscoverToolsAsync; ResponsesToolClassifier.ClassifyAsync is renamed to match and dedupes additive tools by effective tool name, so a same-named tool is not sent to the LLM twice, and conflicts log a warning.
- New ResponsesUserSkillsToolProvider: explicitly ctor-injects the two skill backbones SkillsAgentToolSource + OrnnAgentToolSource and bridges the two additive tools use_skill + ornn_search_skills; it does not enumerate the IAgentToolSource collection, so sources added later cannot be bridged by accident.
- The Mainnet host appends TryAddEnumerable<IResponsesToolProvider, ResponsesUserSkillsToolProvider>().
- /v1/messages keeps explicitly passing Array.Empty<IResponsesToolProvider>() so the Anthropic client's tool harness is not shadowed by Aevatar tools.
- Tests: classifier same-name additive dedup + the async signature; ResponsesUserSkillsToolProvider returns exactly the two skill-backbone tools; /v1/responses end-to-end verifies bridged tools reach LLMRequest.Tools; a /v1/messages regression test verifies DI-registered providers are not injected into the Messages path. The last two were revert-and-rerun verified to catch the corresponding bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/canon/nyxid-responses-direct.md: new canon doc describing the /v1/models /v1/responses /v1/messages surface exposed via the NyxID proxy, OpenRouter-style <service-slug>/<model> routing, and the Responses caller-scope/credential contract.
- docs/canon/nyxid-llm-integration.md: cross-link the Responses direct path and note the legacy bare-model gateway routing.
- docs/operations/2026-05-13-aevatar-responses-via-nyxid-setup.md: refresh the Responses-via-NyxID setup walkthrough.
- src/Aevatar.Mainnet.Host.Api/README.md: list /v1/models and /v1/messages alongside /v1/responses with the routing and tool-shadow caveats.
- docs/2026-05-14-gagent-types-analysis.md: snapshot of the current GAgent inheritance landscape used while scoping issue #664 follow-ups.
- docs/README.md: regenerate the index for the new canon, ADR, and history entries that have landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paseo writes per-session run-continuation JSON to ./.sisyphus/run-continuation/ to track agent IDs across invocations. It's local tool state, not source. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eue failure
ADR-0021 defines DispatchPhase.RejectedDuplicate but the dispatcher
previously did a "GetAsync ?? CreateAsync" that silently re-enqueued
start commands for a runId whose actor already existed. The envelope
also used Guid.NewGuid() for the command id, so the run actor inbox
had no envelope-level idempotency to fall back on.
Changes:
- Serialize dispatch under a SemaphoreSlim and reject when
IActorRuntime.ExistsAsync(actorId) is true; outcome carries
RejectedDuplicate + the existing actorId.
- Build a deterministic commandId ("agent-run-start:{runId}") and put
it on EventEnvelope.Runtime.Deduplication.OperationId so the inbox
has a stable dedup key if the dispatcher boundary is ever bypassed.
- If ProduceAsync fails after CreateAsync succeeded, destroy the
freshly created run actor so a retry can re-create instead of
bouncing on RejectedDuplicate.
- Tests: parallel WhenAll dispatch returns exactly one Accepted + one
RejectedDuplicate with the existing inbox unchanged; enqueue failure
destroys the actor and a follow-up dispatch is accepted; stale
request never reaches CreateAsync.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
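A minimal sketch of the duplicate-rejecting dispatch; the runtime interface and actor id format are stand-ins, while the deterministic command id format and the destroy-on-enqueue-failure behavior come from the commit:

```csharp
using System.Threading;
using System.Threading.Tasks;

public enum DispatchPhase { Accepted, RejectedStale, RejectedDuplicate }
public sealed record DispatchOutcome(DispatchPhase Phase, string? ExistingActorId = null);

public interface IRunActorRuntime
{
    Task<bool> ExistsAsync(string actorId, CancellationToken ct);
    Task CreateAsync(string actorId, CancellationToken ct);
    Task ProduceAsync(string actorId, string operationId, object command, CancellationToken ct);
    Task DestroyAsync(string actorId, CancellationToken ct);
}

public sealed class AgentRunDispatcherSketch
{
    private readonly IRunActorRuntime _runtime;
    private readonly SemaphoreSlim _gate = new(1, 1);

    public AgentRunDispatcherSketch(IRunActorRuntime runtime) => _runtime = runtime;

    public async Task<DispatchOutcome> DispatchAsync(string runId, object startCommand, CancellationToken ct)
    {
        await _gate.WaitAsync(ct); // serialize dispatch so parallel callers cannot race Create
        try
        {
            var actorId = $"agent-run-{runId}"; // illustrative id scheme
            if (await _runtime.ExistsAsync(actorId, ct))
                return new DispatchOutcome(DispatchPhase.RejectedDuplicate, actorId);

            await _runtime.CreateAsync(actorId, ct);
            try
            {
                // Deterministic dedup key so the inbox can still dedupe if the
                // dispatcher boundary is ever bypassed.
                await _runtime.ProduceAsync(actorId, $"agent-run-start:{runId}", startCommand, ct);
            }
            catch
            {
                // Enqueue failed after create: destroy so a retry can re-create
                // instead of bouncing on RejectedDuplicate forever.
                await _runtime.DestroyAsync(actorId, ct);
                throw;
            }
            return new DispatchOutcome(DispatchPhase.Accepted);
        }
        finally
        {
            _gate.Release();
        }
    }
}
```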
Commit 632d09f documented /v1/responses with conservative wording from before issue #664 landed: it said "Ornn 还没桥接" ("Ornn is not bridged yet") and "aevatar_* additive tool 接口已预留" ("the aevatar_* additive tool surface is reserved"). Issue #664 (commits 6c11877, 4c15423) now ships ResponsesUserSkillsToolProvider, so the docs need to describe the shipped behavior:
- canon/nyxid-responses-direct.md: split tool classes into three (substitute / additive / forwarded), name use_skill + ornn_search_skills as the live additive set, and call out that they walk through the NyxID proxy under the caller's bearer (so a restricted NyxID API key must keep the Ornn API service in --allowed-services).
- operations/2026-05-13-aevatar-responses-via-nyxid-setup.md: note that the bridge is /v1/responses-only, mention the Ornn-service allowlist step in the codex/Responses configuration block, refresh the Messages-client notes, and replace the "Ornn is not bridged yet" troubleshooting row with bridge-aware ones.
- src/Aevatar.Mainnet.Host.Api/README.md: replace the aevatar_* placeholder line with the use_skill / ornn_search_skills additives and spell out that /v1/messages does NOT inject them.
…-chain-regressions Test/2026 05 12 lark bot reply chain regressions
No description provided.