
[WIP] Lark bot enhancement#562

Open
eanzhao wants to merge 133 commits into dev from feature/lark-bot

Conversation

@eanzhao
Contributor

@eanzhao eanzhao commented May 4, 2026

No description provided.

eanzhao and others added 3 commits May 3, 2026 18:47
Two production failure modes observed on 2026-05-03 leave a Lark user
visibly stuck even though the cluster is healthy. Both are healed
locally, with no NyxID-side changes:

(1) /llm /route /model says "binding 已失效, 请发送 /init" ("binding is
no longer valid, please send /init") while /init says "已绑定, 请先
/unbind" ("already bound, please /unbind first"). After PR #558
re-DCR'd the cluster's OAuth client to add the `proxy` scope, the
user's existing NyxID binding (issued for the previous client_id) is
rejected at broker token-exchange. ModelChannelSlashCommandHandler
caught the rejection and pointed the user at /init, which then refused
because the local readmodel still holds the (now-dead) binding_id,
looping the user forever.

Self-heal: when the broker throws BindingRevokedException /
BindingNotFoundException / BindingScopeMismatchException, dispatch
RevokeBindingCommand to the local ExternalIdentityBindingGAgent
(reason="auto_self_heal_*") so the readmodel flips to revoked, and
tell the user the binding was cleared and /init will work now.
Mirrors the dispatch shape UnbindChannelSlashCommandHandler uses
for explicit /unbind, but skips the NyxID-side revoke (NyxID is the
one that just told us the binding is gone).
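The catch-and-heal shape described above can be sketched as follows. This is a hypothetical fragment, not the PR's actual code: the exception, command, and runtime type names come from this commit message, while `ExchangeBrokerTokenAsync`, `EventEnvelope.For`, and `SlashReply.Text` are assumed helper shapes.

```csharp
// Sketch only: heal the local readmodel when NyxID rejects the binding.
try
{
    return await ExchangeBrokerTokenAsync(binding, ct);
}
catch (Exception ex) when (ex is BindingRevokedException
                        or BindingNotFoundException
                        or BindingScopeMismatchException)
{
    // NyxID just told us the binding is gone; flip the local readmodel
    // to revoked instead of bouncing the user between /init and /unbind.
    var actor = await _actorRuntime
        .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct);
    await actor.HandleEventAsync(
        EventEnvelope.For(new RevokeBindingCommand
        {
            BindingId = binding.Id,
            Reason = $"auto_self_heal_{ex.GetType().Name}",
        }), ct);
    return SlashReply.Text("binding cleared locally; /init will work now");
}
```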

(2) Bot replies as "..." forever when the LLM call fails. The
streaming sink fires the first chunk via channel-relay/reply,
consuming the single-use reply token and creating a placeholder
message. If the LLM then fails (e.g. upstream 429), pre-fix the
runtime fell through to RunLlmReplyAsync which issued a fresh /reply
against the dead token and got `401 Reply token already used` —
leaving the user staring at "..." with no error explanation.

Self-heal: ConversationGAgent.TryCompleteStreamedReplyAsync now
takes the Failed branch when streaming has already committed the
placeholder. Edits the existing message in place via
RunStreamChunkAsync (channel-relay/reply/update — no reply token
needed) with the classified failure text, then persists
ConversationTurnCompletedEvent so the runtime envelope retry loop
does not refire and consume the dead token again. If the edit also
fails (rare: Lark may refuse stale-message edits), persist the
last flushed partial as terminal — same defence-in-depth pattern
the existing PR #374 fix uses for the Completed path.
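The streaming-Failed branch above can be sketched as below. Names like `streamCommittedPlaceholder`, `placeholderMessageId`, and `PersistAsync` are assumptions for illustration; `RunStreamChunkAsync` and `ConversationTurnCompletedEvent` are taken from this commit message.

```csharp
// Sketch only: complete a failed streamed reply without re-using the
// single-use reply token.
if (streamCommittedPlaceholder)
{
    try
    {
        // channel-relay/reply/update edits the placeholder in place;
        // no reply token is needed.
        await RunStreamChunkAsync(placeholderMessageId, classifiedFailureText, ct);
    }
    catch
    {
        // Rare: Lark may refuse a stale-message edit. Fall back to
        // persisting the last flushed partial as terminal content.
        classifiedFailureText = lastFlushedPartial;
    }
    // Persisting the terminal event stops the runtime envelope retry loop
    // from refiring and consuming the dead reply token again.
    await PersistAsync(new ConversationTurnCompletedEvent(), ct);
}
```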

Tests:
  - 4 new ModelSlashCommandHandlerTests pinning the binding self-heal
    for each rejection shape + degraded-path when IActorRuntime is
    missing.
  - 2 new ConversationGAgentDedupTests pinning the streaming-Failed
    branch edits the placeholder + falls through to "persist last
    flushed partial" when the edit also fails.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (851/851)
  dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests --no-build (36/36 in dedup suite)
  dotnet test test/Aevatar.Foundation.Core.Tests             (230/230)
  bash tools/ci/test_stability_guards.sh                     (passed)
  bash tools/ci/query_projection_priming_guard.sh            (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… path

Both reviewers caught the same bug: pre-fix, the self-heal returned the
"本地已自动清理" ("cleaned up locally") message even when IActorRuntime
was unregistered or when CreateAsync / HandleEventAsync threw. The
local readmodel was NOT actually cleaned in those paths, so the user
follows the message to /init, /init still sees the stale binding and
refuses, recreating the exact loop this PR exists to break.

Split the self-heal API into cleanedMessage + degradedMessage. The
cleaned message is returned ONLY when the local revoke envelope was
actually dispatched to the binding actor; otherwise the degraded
message points the user at /unbind explicitly.

Also add a single retry on the dispatch path (mirror
UnbindChannelSlashCommandHandler's PR #521 v4-pro review fix) so a
one-off Orleans hiccup doesn't downgrade an otherwise self-healable
binding to manual /unbind guidance.
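A minimal sketch of the split API plus the retry-once contract, with hypothetical member names (`SelfHealAsync`, `CleanedMessage`, `DegradedMessage`, `RevokeEnvelope`); only the retry count and the cleaned/degraded split come from this commit message.

```csharp
// Sketch only: return the "cleaned" message strictly after a successful
// dispatch; otherwise degrade to explicit /unbind guidance.
async Task<string> SelfHealAsync(ActorId actorId, CancellationToken ct)
{
    if (_actorRuntime is null)
        return DegradedMessage; // runtime missing: never claim cleanup happened

    for (var attempt = 1; attempt <= 2; attempt++) // retry once on an Orleans hiccup
    {
        try
        {
            var actor = await _actorRuntime
                .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct);
            await actor.HandleEventAsync(RevokeEnvelope(), ct);
            return CleanedMessage; // only after the revoke envelope was dispatched
        }
        catch (Exception) when (attempt < 2) { /* retry once */ }
        catch { return DegradedMessage; } // both attempts failed: point at /unbind
    }
    return DegradedMessage;
}
```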

Tests:
  - List_DegradesToUnbindGuidance_WhenSelfHealActorRuntimeMissing now
    asserts the reply contains "/unbind" and DOES NOT contain
    "已自动清理" ("auto-cleaned") — pinning the no-runtime degraded path.
  - List_DegradesToUnbindGuidance_WhenSelfHealDispatchKeepsThrowing
    is a new test exercising ThrowingActorRuntime (every CreateAsync
    throws) and asserts AttemptCount == 2 (retry-once contract) plus
    the degraded reply shape.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (852/852)
  bash tools/ci/test_stability_guards.sh                     (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ly-self-heal

Self-heal binding rejections + streaming-failed reply-token loop
@eanzhao eanzhao changed the title Lark bot enhancement [WIP] Lark bot enhancement May 4, 2026
@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 70.16208% with 1215 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.50%. Comparing base (6dfd516) to head (814fcd1).
⚠️ Report is 2 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...r.Mainnet.Host.Api/Responses/ResponsesEndpoints.cs | 66.11% | 313 Missing and 33 partials ⚠️ |
| ...tar.Mainnet.Host.Api/Messages/MessagesApiModels.cs | 52.52% | 119 Missing and 50 partials ⚠️ |
| ...tar.Mainnet.Host.Api/Messages/MessagesEndpoints.cs | 65.22% | 125 Missing and 12 partials ⚠️ |
| ...Host.Api/Responses/ResponsesAevatarToolProvider.cs | 58.33% | 106 Missing and 24 partials ⚠️ |
| ...et.Host.Api/Responses/ResponsesModelsAggregator.cs | 48.52% | 97 Missing and 8 partials ⚠️ |
| ...r.Mainnet.Host.Api/Responses/ResponsesApiModels.cs | 84.17% | 28 Missing and 22 partials ⚠️ |
| ...on/Studio/Services/NyxIdLlmServiceCatalogParser.cs | 79.82% | 23 Missing and 23 partials ⚠️ |
| ...rc/Aevatar.AI.LLMProviders.MEAI/MEAILLMProvider.cs | 68.81% | 22 Missing and 7 partials ⚠️ |
| ...c/Aevatar.AI.ToolProviders.Web/WebFetchUrlGuard.cs | 71.26% | 19 Missing and 6 partials ⚠️ |
| ...actions/Queries/ResponsesAgentToolStateSnapshot.cs | 51.06% | 23 Missing ⚠️ |
| ... and 25 more | | |
@@            Coverage Diff             @@
##              dev     #562      +/-   ##
==========================================
+ Coverage   72.13%   72.50%   +0.36%     
==========================================
  Files        1261     1296      +35     
  Lines       91065    96570    +5505     
  Branches    11924    12648     +724     
==========================================
+ Hits        65693    70014    +4321     
- Misses      20673    21506     +833     
- Partials     4699     5050     +351     
| Flag | Coverage Δ |
|---|---|
| ci | 72.50% <70.16%> (+0.36%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...evatar.AI.Abstractions/LLMProviders/LLMResponse.cs | 100.00% <100.00%> (ø) |
| src/Aevatar.AI.Core/Chat/ChatHistory.cs | 81.91% <100.00%> (+0.59%) ⬆️ |
| ...atar.AI.LLMProviders.Tornado/TornadoLLMProvider.cs | 61.19% <100.00%> (+13.19%) ⬆️ |
| ...evatar.AI.ToolProviders.Lark/ILarkCardKitClient.cs | 100.00% <100.00%> (ø) |
| ....ToolProviders.Lark/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø) |
| ...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs | 69.50% <100.00%> (-0.20%) ⬇️ |
| ...Aevatar.AI.ToolProviders.NyxId/NyxIdToolOptions.cs | 100.00% <100.00%> (ø) |
| src/Aevatar.AI.ToolProviders.Ornn/OrnnOptions.cs | 100.00% <100.00%> (ø) |
| ...I.ToolProviders.Web/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø) |
| ...strap.Extensions.AI/ServiceCollectionExtensions.cs | 84.32% <100.00%> (-2.06%) ⬇️ |
| ... and 58 more | |

... and 12 files with indirect coverage changes

eanzhao and others added 14 commits May 4, 2026 16:58
DeepSeek v4-pro with thinking mode rejects requests when reasoning_content
from prior assistant turns is not echoed back. This change:

- Adds ReasoningContent to ChatMessage, LLMResponse, and LLMStreamChunk
- Propagates reasoning content through ChatRuntime streaming rounds
- Appends reasoning_content to conversation history for multi-turn
- Implements ExtractReasoningContent in MEAILLMProvider
- Wires reasoning content into non-streaming ConvertResponse

Closes #563
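The multi-turn propagation can be sketched as below. `ReasoningContent` on ChatMessage/LLMResponse comes from this commit message; the provider call and `ChatRole`-style history API are assumptions for illustration.

```csharp
// Sketch only: carry reasoning_content alongside the assistant turn so
// DeepSeek v4-pro thinking mode sees it echoed back on the next round.
var response = await provider.GetResponseAsync(history, ct);
history.Add(new ChatMessage(ChatRole.Assistant, response.Content)
{
    // Appended to conversation history; the provider serializes it back
    // out as reasoning_content on subsequent requests.
    ReasoningContent = response.ReasoningContent,
});
```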
Channel conversation LLM runs inside Orleans actors with no HTTP auth
context. The scope resolver returns null, causing
ActorBackedUserMemoryStore to throw InvalidOperationException on every
turn and log a warning.

- Add TryResolveScopeId/TryResolveWriteActorId that return null instead
  of throwing
- ReadProjectedStateAsync returns null when no scope is available
- GetAsync returns UserMemoryDocument.Empty, BuildPromptSectionAsync
  returns empty string — no warning logged
- Write operations (Save/Add/Remove) still throw since they only run
  from Studio API with auth context

Closes #564
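The Try-variant described above might look like this sketch. The method names `TryResolveScopeId` and `ReadProjectedStateAsync` come from this commit message; the claim lookup and accessor wiring are assumptions.

```csharp
// Sketch only: return null instead of throwing when there is no HTTP
// auth context (e.g. LLM runs inside Orleans actors).
public string? TryResolveScopeId()
{
    var user = _httpContextAccessor.HttpContext?.User;
    return user?.FindFirst("sub")?.Value; // null when no HTTP context exists
}

public async Task<UserMemoryDocument> GetAsync(CancellationToken ct)
{
    var scopeId = TryResolveScopeId();
    if (scopeId is null)
        return UserMemoryDocument.Empty; // degrade quietly, no warning logged
    return await ReadProjectedStateAsync(scopeId, ct) ?? UserMemoryDocument.Empty;
}
```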
Gracefully handle missing scope in UserMemoryStore (fixes #564)
…ing-content

Propagate reasoning_content through LLM pipeline (fixes #563)
"/model encountered NyxID-side binding rejection ({Reason}) but IProjectionReadinessPort is not registered; cannot verify local readmodel cleanup. actor={ActorId}, subject={Platform}:{Tenant}:{User}",
reason,
actorId,
context.Subject.Platform, context.Subject.Tenant, context.Subject.ExternalUserId);
Contributor Author


[Consensus: 3 models] severity=blocker, category=arch

The new TryDispatchLocalBindingRevokeAsync runs EnsureProjectionForActorAsync + WaitForBindingStateAsync inside the slash-command request path. Three reviewers flag this as a CLAUDE.md violation: query-time projection priming is forbidden — the application/query layer must not activate a projection scope and synchronously wait for read-model materialization on the request hot path. IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Refactor self-heal so the revocation is fire-and-forget (publish a continuation event, return early to the user with a 'binding cleanup in progress' message), or perform the projection wait off the request path. CLAUDE.md ref: "query-time replay/priming is forbidden" / "QueryPort/QueryService/ApplicationService must not ... synchronously backfill projections on the request path".

Per-model verbatim
  • codex: This makes the /model read/list path activate a projection scope and then synchronously wait for read-model cleanup. That violates CLAUDE.md ("query-time replay/priming is forbidden") and the ADR note that IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Move this repair behind an explicit write-side command/port that owns projection activation, and keep /model honest if the cleanup cannot already be observed.

  • v4-pro: TryDispatchLocalBindingRevokeAsync calls EnsureProjectionForActorAsync + WaitForBindingStateAsync synchronously in the slash-command handler path. This is projection priming in the request path: activating a projection and waiting for the readmodel to refresh within the request execution. Violates CLAUDE.md:

Query-time priming is forbidden: if a query first needs to "ensure the projection exists / refresh the read model", that action must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden.

Move projection activation/binder setup before the slash-command path (e.g. upstream in the conversation pipeline when the inbound message arrives), or make the self-heal fire-and-forget (dispatch revoke + don't synchronously wait for readmodel). The degraded-message fallback already handles cases where the readmodel isn't cleaned within the request.

  • kimi: TryDispatchLocalBindingRevokeAsync calls _bindingProjectionPort.EnsureProjectionForActorAsync(actorId, ct) synchronously inside the slash command handler request path. This is projection priming in the application layer. CLAUDE.md forbids query-time priming: "if a query first needs to 'ensure the projection exists / refresh the read model', that action must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden." While the OAuth endpoint does similar priming, endpoints are Host layer; a per-user-message slash handler is not. Consider activating binding projection scopes eagerly at startup (e.g. a hosted service that primes all known binding actors) instead of per-request.

var actor = await _actorRuntime
.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct)
.ConfigureAwait(false);
var envelope = new EventEnvelope
Contributor Author


[Consensus: 4 models] severity=blocker, category=arch

actor.HandleEventAsync(envelope, ct) dispatches RevokeBindingCommand directly through IActorRuntime, bypassing IActorDispatchPort and the actor's inbox / serialization guarantees. Four reviewers flag this as a CLAUDE.md violation of the Runtime vs Dispatch separation of responsibilities (Runtime 与 Dispatch 分责): IActorRuntime may resolve lifecycle/topology, but command delivery must go through the runtime-neutral publisher (publish/send). Inline HandleEventAsync reduces to direct method invocation on the target actor — exactly the pattern CLAUDE.md forbids: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, the business layer, or intermediate adapter layers is forbidden". Replace with the runtime-neutral dispatch port (IActorDispatchPort.PublishAsync(...) or the equivalent) and let the actor's inbox process the revoke command.

Per-model verbatim
  • codex: Calling actor.HandleEventAsync directly bypasses IActorDispatchPort and manually inlines delivery into the target actor. This violates CLAUDE.md's Runtime vs Dispatch separation rule (delivery semantics must be runtime-neutral); IActorRuntime should only resolve lifecycle/topology, while command delivery should go through IActorDispatchPort.DispatchAsync with EnvelopeRouteSemantics.CreateDirect(...) or the existing command-dispatch helper.

  • v4-pro: actor.HandleEventAsync(envelope, ct) dispatches the RevokeBindingCommand directly to the actor via inline method call, bypassing the actor's inbox queue and serialization guarantees. Violates CLAUDE.md:

Delivery semantics must be runtime-neutral: publish/send uniformly means "enter the target actor's inbox and await processing"; it must not degrade to inline dispatch because the target is self or because the underlying runtime differs.

The comment at L126 says this mirrors UnbindChannelSlashCommandHandler — but that handler also uses the same pattern and should be fixed too. Use IActorDispatchPort (or equivalent standard dispatch mechanism) instead of CreateAsync + HandleEventAsync.

  • kimi: The handler directly calls actor.HandleEventAsync(envelope, ct) via IActorRuntime, bypassing the standard IActorDispatchPort. CLAUDE.md requires runtime-neutral dispatch: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, the business layer, or intermediate adapter layers is forbidden." While UnbindChannelSlashCommandHandler uses the same pattern, the new self-heal path amplifies the violation by coupling it with projection priming and retry logic. Refactor to use IActorDispatchPort or a dedicated binding command port so the slash handler does not depend on runtime internals.

  • mimo-v2.5-pro: The slash-command handler now directly orchestrates actor dispatch via IActorRuntime, EventEnvelope, and RevokeBindingCommand. Per CLAUDE.md, slash handlers are channel-routed application services — they should not construct and dispatch actor envelopes. Extract SelfHealRevokedBindingAsync / TryDispatchLocalBindingRevokeAsync into a dedicated IBindingSelfHealService (application layer) and inject that instead. This also simplifies testing: the current test file has to mock IActorRuntime, ExternalIdentityBindingProjectionPort, AND IProjectionReadinessPort just to cover one catch branch.
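The reviewers' suggested shape might look like the sketch below. IActorDispatchPort and EnvelopeRouteSemantics.CreateDirect are named in the review itself; the exact method signature is an assumption.

```csharp
// Sketch only: deliver the revoke command via the actor's inbox instead
// of invoking HandleEventAsync inline on the resolved actor.
await _dispatchPort.DispatchAsync(
    envelope,                                   // the RevokeBindingCommand envelope
    EnvelopeRouteSemantics.CreateDirect(actorId), // route, not lifecycle
    ct);
```

The point of the change is that the slash handler no longer touches runtime internals: the dispatch port owns serialization and inbox ordering, and IActorRuntime is left to lifecycle/topology concerns only.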

private readonly ILogger<ModelChannelSlashCommandHandler> _logger;

public ModelChannelSlashCommandHandler(
ILogger<ModelChannelSlashCommandHandler> logger,
Contributor Author


[Consensus: 2 models] severity=major, category=di

Constructor injects the concrete ExternalIdentityBindingProjectionPort (and accepts nullable IActorRuntime + IProjectionReadinessPort). Two issues: (a) DI: CLAUDE.md's dependency-inversion rule requires depending on abstractions — define IExternalIdentityBindingProjectionPort and inject the interface (the current registration in IdentityServiceCollectionExtensions.cs is services.TryAddSingleton<ExternalIdentityBindingProjectionPort>() against the concrete type). (b) silent degradation: nullable self-heal collaborators mean that if any registration is missing in production DI, the self-heal — the user-facing fix this PR delivers — silently disappears with no error. Either make these required (non-nullable), or add a startup validation that logs a warning when self-heal will be inactive.

Per-model verbatim
  • glm-5.1: IActorRuntime, ExternalIdentityBindingProjectionPort, and IProjectionReadinessPort are all nullable and degrade gracefully, but the self-heal feature is the primary user-facing fix in this PR. If any of these aren't registered in the production DI container, the handler silently falls back to the degraded message and the binding loop is unbroken. Verify registrations exist in the host composition root — if they're not yet wired, this should be a blocker.

  • kimi: Constructor injects the concrete ExternalIdentityBindingProjectionPort instead of an abstraction. CLAUDE.md: "Dependency inversion: upper layers depend on abstractions; reverse cross-layer dependencies and direct coupling to concrete implementations are forbidden." The port is registered as a singleton concrete class (IdentityServiceCollectionExtensions.cs:86, services.TryAddSingleton<ExternalIdentityBindingProjectionPort>()). Define IExternalIdentityBindingProjectionPort, register against it, and inject the interface.
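The suggested DI fix can be sketched as below; the interface name is the one proposed in the review, and its member is modeled on the EnsureProjectionForActorAsync call discussed in the earlier comment.

```csharp
// Sketch only: depend on an abstraction and register the concrete port
// against it, per CLAUDE.md's dependency-inversion rule.
public interface IExternalIdentityBindingProjectionPort
{
    Task EnsureProjectionForActorAsync(ActorId actorId, CancellationToken ct);
}

// In IdentityServiceCollectionExtensions:
services.TryAddSingleton<IExternalIdentityBindingProjectionPort,
                         ExternalIdentityBindingProjectionPort>();
```

The handler constructor then takes IExternalIdentityBindingProjectionPort, and tests can fake the interface instead of the concrete class.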


public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
var actor = await EnsureWriteActorAsync(ct);
Contributor Author


[Consensus: 3 models] severity=major, category=concurrency

RemoveEntryAsync was reordered so EnsureWriteActorAsync runs before ReadProjectedStateAsync. Three reviewers flag this side-effect change: previously, calling RemoveEntryAsync("missing") was a no-op that returned false without materializing any actor; now it always activates the user-memory actor first (state hydration, reminder registration, storage I/O), even when the entry does not exist (returns false at the existence check). This violates CLAUDE.md's "default paths must define resource semantics" rule — a no-op delete now implicitly creates a resource, which has cost (cluster activation, cache pressure) and changes resource lifecycle. Move actor resolution after the existence check; only resolve the scope/actor when there is real work to do.

Per-model verbatim
  • codex: RemoveEntryAsync("missing") now creates or materializes the user-memory actor before checking whether the entry exists. That turns a no-op delete into implicit resource creation, violating CLAUDE.md's "default paths must define resource semantics" rule; keep the projected read first, and if the no-scope case must throw, resolve the scope ID explicitly without calling EnsureWriteActorAsync until a matching entry is found.

  • kimi: RemoveEntryAsync was reordered to call EnsureWriteActorAsync before ReadProjectedStateAsync. This causes unnecessary actor activation when the entry does not exist. Actor creation has side effects (state hydration, reminder registration, storage reads). Move actor resolution back after the existence check, or document why eager activation is required.

  • mimo-v2.5-pro: The reorder of EnsureWriteActorAsync before ReadProjectedStateAsync in RemoveEntryAsync changes the side-effect ordering. Previously the actor was only created when the entry existed. Now every RemoveEntryAsync call creates the actor even when the entry is missing (returns false at line 133). The actor creation is idempotent, but it's an unnecessary round-trip in the common "entry not found" case. Consider moving EnsureWriteActorAsync back after the existence check, or documenting that the actor must exist before the read (if there's a race-condition fix reason for the reorder).
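The ordering the reviewers ask for might look like this sketch; `RemoveEnvelope` and the `Entries` shape are assumptions, while EnsureWriteActorAsync and ReadProjectedStateAsync come from the review.

```csharp
// Sketch only: check existence against the projected read model first;
// only materialize the write actor when there is an entry to remove.
public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
    var state = await ReadProjectedStateAsync(ct);
    if (state?.Entries.Any(e => e.Id == id) != true)
        return false; // no-op delete: no actor activation, no side effects

    var actor = await EnsureWriteActorAsync(ct);
    await actor.HandleEventAsync(RemoveEnvelope(id), ct);
    return true;
}
```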

}

public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
{
Contributor Author


[codex] severity=major, category=bug

Removing the previous try/catch means a transient projection/read-model failure in BuildPromptSectionAsync will now fail the whole chat/prompt construction path instead of degrading by omitting memory. Restore the best-effort behavior here: catch non-cancellation exceptions from GetAsync, log the warning, and return string.Empty.
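The best-effort behavior the review asks to restore could be sketched as below; `ToPromptSection` and the logger field are assumptions, while the GetAsync call, the maxChars default, and the degrade-to-empty contract come from the review and the earlier commit message.

```csharp
// Sketch only: a transient projection/read-model failure degrades to an
// empty memory section instead of failing the whole prompt construction.
public async Task<string> BuildPromptSectionAsync(int maxChars = 2000,
                                                  CancellationToken ct = default)
{
    try
    {
        var doc = await GetAsync(ct);
        return doc.ToPromptSection(maxChars);
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        _logger.LogWarning(ex, "User memory unavailable; omitting memory section");
        return string.Empty; // chat continues without the memory section
    }
}
```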

runtime.Actors.Should().NotContainKey("user-memory-user-1",
"no actor should be created when entry is missing");
var actor = runtime.Actors["user-memory-user-1"];
actor.ReceivedEnvelopes.Should().BeEmpty("no remove command should be dispatched when entry is missing");
Contributor Author


[v4-pro] severity=major, category=test

RemoveEntryAsync_MissingEntry_ReturnsFalse changed from asserting runtime.Actors.Should().NotContainKey(...) (actor NOT created) to asserting runtime.Actors["user-memory-user-1"] EXISTS with empty envelopes. But the new RemoveEntryAsync reads state first and returns false before calling EnsureWriteActorAsync when the entry is missing — the actor should not be created at all. Either the test is wrong or the test infrastructure (EmptyReader, FakeActorRuntime) triggers an unexpected actor creation. Verify the test passes against the actual code change.

AttachOpenAIRawRepresentationForReasoning(meaiMsg, msg);
result.Add(meaiMsg);
}

Contributor Author


[Consensus: 3 models] severity=major, category=arch

AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's serialized JSON via rawMessage.Patch.Set("$.reasoning_content"u8, ...) under #pragma warning disable SCME0001. Three reviewers flag the same fragility: Patch is an SDK-internal experimental surface (SCME0001), reasoning_content is not a documented field on AssistantChatMessage, and the JSONPath relies on the SDK's current serialization shape. Any SDK update that changes the patch model, the serializer, or the message structure will silently break or drop reasoning content propagation — there is no compile-time guarantee. Mitigations: (a) pin the OpenAI SDK version, (b) add an integration test that asserts the serialized JSON contains reasoning_content, and (c) file a feature request upstream for first-class reasoning content support so the workaround can be retired.

Per-model verbatim
  • glm-5.1: rawMessage.Patch.Set("$.reasoning_content"u8, ...) relies on the OpenAI SDK's internal Patch/SCME0001 surface to inject reasoning_content into the serialized JSON. This is fragile — any SDK update that changes the patch model or serialization pipeline will silently drop reasoning content or throw at runtime. Consider isolating this behind a version-checked helper (or a feature flag) so a future SDK bump doesn't cause a silent regression.

  • v4-pro: AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's AssistantChatMessage via rawMessage.Patch.Set("$.reasoning_content"u8, ...). This hardcoded JSON-path depends on the SDK's internal serialization structure (AssistantChatMessage.ModelReaderWriter.Write output) which could change between versions and silently break reasoning propagation. Consider a custom AssistantChatMessage subclass or an explicit BinaryContent wrapper if the SDK doesn't natively support reasoning_content.

  • mimo-v2.5-pro: AttachOpenAIRawRepresentationForReasoning uses rawMessage.Patch.Set("$.reasoning_content"u8, ...) with #pragma warning disable SCME0001. This is a fragile internal API — the OpenAI SDK doesn't document reasoning_content as a supported field on AssistantChatMessage, and the Patch property is implementation-detail. If the SDK changes wire format or removes the Patch escape hatch, this will silently break reasoning round-tripping. Add a fallback: if the patch throws (or if RawRepresentation doesn't survive a serialize-then-deserialize cycle), log a warning and degrade gracefully rather than crashing the entire chat call.

CancellationToken ct)
{
try
{
Contributor Author


[v4-pro] severity=minor, category=perf

MergeProxyRouteCandidatesAsync calls _nyxClient.DiscoverProxyServicesAsync synchronously within GetServicesAsync. If the /api/v1/proxy/services endpoint is slow or unreachable (and the catch swallows the exception with a warning), the catalog load time doubles on every /model or /route invocation. Consider making the proxy-service discovery happen on a background refresh, or caching with a TTL, rather than per-request.
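The TTL-cache alternative the review suggests might look like this sketch; the cache fields and the five-minute TTL are hypothetical choices, while DiscoverProxyServicesAsync is the call named in the review.

```csharp
// Sketch only: cache proxy-service discovery with a TTL so /model and
// /route don't pay a second upstream round-trip on every invocation.
private IReadOnlyList<ProxyService>? _proxyCache;
private DateTimeOffset _proxyCacheExpiry;
private static readonly TimeSpan ProxyCacheTtl = TimeSpan.FromMinutes(5); // assumed

private async Task<IReadOnlyList<ProxyService>> GetProxyServicesAsync(CancellationToken ct)
{
    if (_proxyCache is not null && DateTimeOffset.UtcNow < _proxyCacheExpiry)
        return _proxyCache; // serve from cache inside the TTL window

    _proxyCache = await _nyxClient.DiscoverProxyServicesAsync(ct);
    _proxyCacheExpiry = DateTimeOffset.UtcNow.Add(ProxyCacheTtl);
    return _proxyCache;
}
```

A background refresh (hosted service) would remove the latency from the request path entirely; the TTL cache is the smaller change.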

NyxIdLlmServicesResult result,
string bearerToken,
CancellationToken ct)
{
Contributor Author


[mimo-v2.5-pro] severity=minor, category=design

The proxy endpoint path /api/v1/proxy/services?per_page=100 is hardcoded here AND in NyxIdLlmServiceCatalogClient.cs:63. Two independent HTTP clients call the same endpoint. Extract the path into a shared constant (or better, into NyxIdLlmProviderSource / a config key) so the two call-sites stay in sync if the path changes.
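The shared constant the review proposes could be as small as this sketch (the class name is hypothetical; the path is the one quoted in the review):

```csharp
// Sketch only: one definition of the proxy-services path, referenced by
// both HTTP clients so the call-sites cannot drift apart.
internal static class NyxIdProxyEndpoints
{
    public const string Services = "/api/v1/proxy/services?per_page=100";
}
```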

eanzhao and others added 17 commits May 13, 2026 17:25
… tools

CC Switch (and other Anthropic→OpenAI Responses translators) advertise
Claude Code tools as a mix of `{type: "function", name, parameters, …}`
function declarations and OpenAI built-in tool declarations like
`{type: "web_search_preview"}` / `{type: "file_search", vector_store_ids: […]}`
/ `{type: "code_interpreter"}` / `{type: "computer_use_preview"}`. The
built-in forms have no `name` or `function` block — they're routing
hints to the model provider, not custom function definitions.

Previously the normalizer required every tool entry to have a name,
so a single built-in declaration anywhere in the array failed the
whole request with `invalid_tools: "Each tool requires a non-empty
name."` — exactly what the user hit when configuring CC Switch
against `chrono-llm/gpt-5.5`.

Skip non-function-typed entries silently (aevatar's classifier only
owns forward / substitute / additive function tools), keep validating
name for function-type entries. Improve the error message to include
the failing tool index for easier diagnosis on future malformed
inputs.

Tests:
- A mixed array (`web_search_preview` + `file_search` + a real
  function tool `Bash`) now returns 200 and the LLM provider sees
  exactly the one function tool; built-ins are dropped.
- A function-type entry without a name still 400s with
  `function tool at index 1 requires a non-empty name`.

Revert-and-rerun confirmed the built-in pass-through test goes red
when the skip branch is disabled.
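The skip rule described above can be sketched as follows; the tool model types (`tool.Type`, `tool.Name`, `FunctionTool`, `InvalidToolsException`) are assumed shapes, while the skip-built-ins rule and the indexed error message come from this commit message.

```csharp
// Sketch only: classify function-typed entries, silently skip built-in
// tool declarations, and keep the name check for function tools.
var functionTools = new List<FunctionTool>();
for (var i = 0; i < tools.Count; i++)
{
    var tool = tools[i];
    if (tool.Type != "function")
        continue; // built-ins (web_search_preview, file_search, ...) are
                  // routing hints to the provider, not custom functions

    if (string.IsNullOrEmpty(tool.Name))
        throw new InvalidToolsException(
            $"function tool at index {i} requires a non-empty name");

    functionTools.Add(FunctionTool.From(tool));
}
```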
…data fallback

Two issues surfaced when CC Switch users tried multi-turn tool conversations
on chrono-llm / llm-deepseek:

1. `function_call_output requires previous_response_id` 400 on the followup
   turn. CC Switch / Codex translating Claude Code's prior tool-result turn
   forwards `function_call_output` items in `input` but does NOT propagate
   `previous_response_id` — they don't model OpenAI's server-side session.
   Strict #629 §13 normalization rejected the request and the agent couldn't
   continue any multi-turn tool conversation.

   Fix: when `previous_response_id` is absent, fold `function_call_output`
   entries into the user prompt with synthetic `[tool_result call_id=…]`
   markers and clear ToolResults. Continuation contract still applies when
   the client actually sends `previous_response_id`.
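The fold described above might look like this sketch; the request model (`PreviousResponseId`, `ToolResults`, `UserPrompt`) is an assumed shape, while the synthetic marker format and the clear-ToolResults step come from this commit message.

```csharp
// Sketch only: without previous_response_id, inline tool results into the
// user prompt with synthetic markers instead of rejecting the request.
if (request.PreviousResponseId is null && request.ToolResults.Count > 0)
{
    var sb = new StringBuilder(request.UserPrompt);
    foreach (var r in request.ToolResults)
        sb.Append($"\n[tool_result call_id={r.CallId}]\n{r.Output}");
    request.UserPrompt = sb.ToString();
    request.ToolResults.Clear(); // continuation contract still applies
                                 // when the client does send the id
}
```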

2. `Model metadata for `llm-deepseek/deepseek-v4-pro` not found` warning
   from CC Switch — upstream deepseek `/v1/models` is OpenAI-spec minimal
   (no context_length / max_output_tokens), the no-hardcode rule forbids
   in-code defaults, but the user explicitly asked for "兜底" (a
   safety-net fallback) so requests don't get crippled by CC Switch's
   conservative fallback metadata.

   Fix: config-driven `Aevatar:Responses:ModelMetadataFallbacks`.
   Lookup precedence per entry: upstream fields → exact `{slug}/{model}`
   match → group-wide `{slug}` match. Fallback only fills nulls; never
   overwrites upstream values. Empty config = no-op. Default deployment
   ships with deepseek + llm-deepseek 64k/8k defaults in appsettings.json;
   deployment can override / remove. Stays out of code per the
   `feedback_no_hardcoded_metadata` rule that explicitly permits
   config-driven slug→config.

Tests cover both regressions + revert-and-rerun confirms the fold test
catches the exact `previous_response_required` 400 the user reported.
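The lookup precedence described above can be sketched as below; the `ModelMetadataFallback` type and method shape are assumptions, while the precedence order and the fill-nulls-only rule come from this commit message.

```csharp
// Sketch only: upstream value wins; then exact {slug}/{model}; then
// group-wide {slug}. Fallback fills nulls, never overwrites upstream.
int? ResolveContextLength(string slug, string model, int? upstream,
                          IReadOnlyDictionary<string, ModelMetadataFallback> fallbacks)
{
    if (upstream is not null)
        return upstream;                              // never overwrite upstream
    if (fallbacks.TryGetValue($"{slug}/{model}", out var exact)
        && exact.ContextLength is not null)
        return exact.ContextLength;                   // exact entry
    if (fallbacks.TryGetValue(slug, out var group))
        return group.ContextLength;                   // group-wide entry
    return null;                                      // empty config = no-op
}
```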
Covers nyxid CLI API key issuance, cc-switch codex provider TOML, and
end-to-end curl smoke test. Notes Aevatar is auto_connected so users
only add the LLM provider they want to route through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds §6 to the cc-switch runbook describing the planned /v1/messages endpoint:
capability matrix vs /v1/responses, protocol-mismatch rationale (Messages is
stateless, aevatar runtime is stateful — surface is intentionally narrow), and
the cc-switch Claude-tab config to use once it lands. Renumbers downstream
sections and updates the troubleshooting row that previously said Messages was
not on the roadmap.
The actor state, proto, and ports are protocol-neutral — they carry
scope/owner/status and forwarded tool calls without any Responses-only
fields. Naming them ResponseSession blocks reuse by /v1/messages
(Anthropic Messages, planned Path B) and misleads readers into thinking
the actor is OpenAI-Responses-specific.

Mechanical rename across 25 files (400 lines changed, 400 added):
  ResponseSession        -> LlmSession
  response_session(s)    -> llm_session(s)
  RESPONSE_SESSION_      -> LLM_SESSION_
  IResponseSession*      -> ILlmSession*

Field `response_id` is kept on the record because Path A (Responses)
keeps emitting OpenAI-style response ids on the wire; Path B will
synthesize its own session identifier into the same field.

Refs #642
…cade

Path B is a stateless facade over the same LlmSessionGAgent +
NyxIdLLMProvider + IResponsesCompletionApplicationService pipeline that
serves /v1/responses. No new GAgent type is introduced — the previously
renamed LlmSessionGAgent is protocol-neutral and the facade reuses it
verbatim. cc-switch users running Claude Code can now point at the
Aevatar/NyxID base URL and get an end-to-end Messages-protocol session.

Surface (src/Aevatar.Mainnet.Host.Api/Messages/):
  POST /v1/messages         AllowAnonymous, manual bearer extraction,
                            mirrors /v1/responses auth model
  - non-streaming           returns Anthropic message envelope JSON
  - streaming               emits Anthropic SSE schedule:
                              message_start
                              content_block_start (text or tool_use)
                              content_block_delta (text_delta /
                                                  input_json_delta)
                              content_block_stop
                              message_delta
                              message_stop

Lossy translation boundary (documented inline + tested):
  - thinking block       -> ChatMessage.ReasoningContent (lossless)
  - tool_use block       -> ToolCall.ArgumentsJson      (text args lossless;
                                                       image args dropped)
  - tool_result block    -> ChatMessage.Tool(callId, output) (text only)
  - cache_control        -> ignored in v1

Stateless contract: Anthropic Messages has no previous_response_id, so
each POST opens + closes its own LlmSession (24h TTL kept for parity
with Path A audit/projection).

Tests (test/Aevatar.Hosting.Tests/MainnetMessagesEndpointsTests.cs):
  PostMessages_NonStreaming_ShouldReturnAnthropicMessageEnvelope
  PostMessages_Streaming_ShouldEmitAnthropicSseFrames
  PostMessages_WithToolCall_ShouldEmitToolUseContentBlock
  PostMessages_WithoutBearer_ShouldReturn401WithAnthropicErrorEnvelope
  PostMessages_WithToolResultBlockInUserContent_ShouldFlattenIntoToolRoleMessage

Refs #642
Defines the four-phase chain contract (accepted/committed/delivered/finalized)
across ConversationGAgent / IChannelLlmReplyRunDispatcher / AgentRunGAgent
with a single observable state per phase. Locks in:

- AgentRunStatus 5-state proto enum, with REPLY_HANDED_OFF replacing
  the reply_dispatched bool (which becomes reserved).
- ConversationState.last_reply_delivery single field carrying
  user-visible delivery ack from the channel sink.
- Typed DispatchOutcome on IChannelLlmReplyRunDispatcher.DispatchAsync
  so the synchronous return point only promises accepted.
- finalized as an absorbing state; late/stale signals must no-op.

Companion docs/canon/lark-reply-completion-semantics.md adds
sequence diagrams, failure matrix, state machine views, and the
implementation checklist used by the upcoming code-side commits
(issues #647 / #648 / #649).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the chain-level contract pieces of ADR-0021 that the rest of the
reply chain work depends on:

agent_run.proto
  - AgentRunStatus gains AGENT_RUN_STATUS_REPLY_HANDED_OFF.
  - AgentRunGAgentState.reply_dispatched (field 12) marked reserved; the
    explicit status replaces the implicit bool. Legacy event replay now
    promotes straight to REPLY_HANDED_OFF.
  - New cleanup_completed_at_unix_ms field (combined with terminal status,
    this is the chain.finalized observable).
  - New AgentRunCleanupCompletedEvent for the terminal-cleanup write.
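
The enum and state shape described above can be sketched as follows (value numbering, field numbers, and the zero value are assumptions; only the names quoted in these commits are taken as given):

```proto
// Sketch of agent_run.proto after this change; numbering illustrative.
enum AgentRunStatus {
  AGENT_RUN_STATUS_UNSPECIFIED      = 0;
  AGENT_RUN_STATUS_COMMITTED        = 1;
  AGENT_RUN_STATUS_REPLY_PRODUCED   = 2;
  AGENT_RUN_STATUS_REPLY_HANDED_OFF = 3;  // replaces the reply_dispatched bool
  AGENT_RUN_STATUS_DROPPED          = 4;  // terminal
  AGENT_RUN_STATUS_FAILED           = 5;  // terminal
}

message AgentRunGAgentState {
  reserved 12;                  // was reply_dispatched (bool)
  reserved "reply_dispatched";
  AgentRunStatus status = 1;    // field numbers here are illustrative
  // != 0, combined with a terminal status, is the chain.finalized observable.
  int64 cleanup_completed_at_unix_ms = 2;
}
```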

conversation_state.proto / conversation_events.proto
  - New ReplyDeliveryStatus message (Pending / Delivered / DeliveryFailed)
    on ConversationGAgentState.last_reply_delivery as the
    chain.delivered observable.
  - New LlmReplyDeliveredEvent + LlmReplyDeliveryFailedEvent that drive
    ConversationGAgent into those outcomes.
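
A sketch of the delivery observable (exact member names and numbering are assumptions; the three outcomes are from the commit):

```proto
// Sketch; the actual message lives in conversation_state.proto.
message ReplyDeliveryStatus {
  enum Outcome {
    OUTCOME_PENDING         = 0;
    OUTCOME_DELIVERED       = 1;
    OUTCOME_DELIVERY_FAILED = 2;
  }
  Outcome outcome          = 1;
  string correlation_id    = 2;  // ties back to the llm:<corr> turn key
  string failure_reason    = 3;  // populated by LlmReplyDeliveryFailedEvent
  int64 updated_at_unix_ms = 4;
}
```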

IChannelLlmReplyRunDispatcher
  - DispatchAsync now returns Task<DispatchOutcome> with a typed phase
    (Accepted / RejectedStale / RejectedDuplicate). The interface has no
    NuGet consumers and the three in-repo call sites are adapted in this
    commit; behaviour is preserved.
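
A minimal sketch of the typed outcome (type and phase names from the commit; member shapes and the request type are assumptions):

```csharp
// The synchronous return point only promises "accepted"; delivery and
// finalization are observed later via actor state, per ADR-0021.
public enum DispatchPhase { Accepted, RejectedStale, RejectedDuplicate }

public sealed record DispatchOutcome(DispatchPhase Phase, string? ExistingActorId = null);

public interface IChannelLlmReplyRunDispatcher
{
    // LlmReplyRunRequest is a stand-in name for the existing request type.
    Task<DispatchOutcome> DispatchAsync(LlmReplyRunRequest request, CancellationToken ct = default);
}
```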

AgentRunDispatcher
  - Performs a cheap freshness check at the boundary (mirrors
    AgentRunGAgent.MaxRunRequestAgeMs) and returns RejectedStale instead
    of enqueuing requests that the run actor would only drop.

AgentRunGAgent
  - All read sites of State.ReplyDispatched move to status checks:
    REPLY_HANDED_OFF replaces (ReplyProduced && ReplyDispatched).
  - ApplyReplyProduced legacy-event path promotes status to
    REPLY_HANDED_OFF; ApplyReplyDispatched promotes committed ->
    handed-off; the new-event path leaves status at REPLY_PRODUCED until
    the dispatched event lands.

Tests
  - ConversationGAgentDedupTests RecordingRunDispatcher mock adapted to
    the new return type.
  - AgentRunGAgentTests assertions migrated from ReplyDispatched bool to
    explicit REPLY_PRODUCED / REPLY_HANDED_OFF status expectations.

Docs
  - canon checklist fixed to reflect that the handed-off transition uses
    the existing AgentRunReplyDispatchedEvent (no new event required) and
    that AgentRunCleanupCompletedEvent drives cleanup_completed_at.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tionGAgent

Makes ADR-0021 chain.delivered observable from
ConversationGAgentState.last_reply_delivery instead of inference from
log lines or channel sink return codes. ConversationGAgent now:

  * Persists LlmReplyDeliveredEvent before ConversationTurnCompletedEvent
    on the non-streaming success path (HandleLlmReplyReadyAsync ->
    RunLlmReplyAsync) and on the streaming completion path
    (PersistStreamedCompletionAsync — which is the unified streaming
    sink that all partial / full / failure-self-heal branches funnel
    through, so any user-visible content counts as delivered).
  * Persists LlmReplyDeliveryFailedEvent before
    ConversationContinueFailedEvent on the non-streaming failure path
    so DeliveryFailed carries the structured reason while the
    chain-finalizing failure event is still last.
  * Wires two new state-matcher entries (ApplyLastReplyDelivered /
    ApplyLastReplyDeliveryFailed) that populate
    ConversationGAgentState.last_reply_delivery — single-field by
    design, multi-turn history reconstructable from event log.

Raise order is delivered → completed (resp. failed) so existing
consumers of "events.Last() is ConversationTurnCompletedEvent /
ConversationContinueFailedEvent" stay correct.

Tests
  - HandleLlmReplyReadyAsync_WhenDuplicateCorrelationId_CollapsesToSingleOutboundCommit
    now expects 3 events (the new Delivered event sits between
    NeedsLlmReplyEvent and TurnCompleted), with a regression check
    that LlmReplyDeliveredEvent is present in the log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §8 streaming closeout contract — stream-local
terminal, aggregated Usage on a single closeout point, FinishReason
visibility — implemented at the run-actor boundary instead of inside
ChatRuntime. Respects aevatar#596 phasing: ChatRuntime stays a
transitional local loop (Phase A); the run-actor edge owns the
contract surface (Phase A targets) so the eventual ChatRuntime
tear-down in Phase B does not need to renegotiate it.

IConversationReplyGenerator
  - GenerateReplyAsync now returns Task<ConversationReplyResult> with
    Text, Usage (ReplyTokenUsage), and FinishReason. ReplyTokenUsage is
    a Channel.Runtime-local projection mirroring AI.Abstractions.TokenUsage
    so Channel.Runtime does not gain a reverse layer dependency on
    AI.Abstractions (CLAUDE.md "依赖反转", i.e. dependency inversion).
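
A sketch of the new return shape (names Text/Usage/FinishReason are from the commit; member layout and the request type are assumptions):

```csharp
// Channel.Runtime-local projection mirroring AI.Abstractions.TokenUsage
// without referencing it, so no reverse layer dependency is introduced.
public sealed record ReplyTokenUsage(int PromptTokens, int CompletionTokens, int TotalTokens);

public sealed record ConversationReplyResult(
    string Text,
    ReplyTokenUsage Usage,
    string? FinishReason);

public interface IConversationReplyGenerator
{
    // ConversationReplyRequest is a stand-in name for the existing parameter type.
    Task<ConversationReplyResult> GenerateReplyAsync(ConversationReplyRequest request, CancellationToken ct);
}
```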

NyxIdConversationReplyGenerator
  - Aggregates Usage across all internal LLM rounds (tool-call loop)
    via SumUsage and tracks the last non-empty FinishReason. The
    foreach over the ChatRuntime stream extracts both before falling
    through to the DeltaContent-only forwarding to the streaming sink,
    so neither metric is dropped when DeltaContent is empty.

AgentRunGAgent
  - Consumes ConversationReplyResult at the run-actor edge and logs
    the closeout once (runId / correlation / prompt|completion|total
    tokens / finishReason). LlmReplyReadyEvent semantics unchanged
    until a follow-up PR persists the closeout into actor state.

ChatRuntime
  - Field-level patch only: NormalizeStreamChunk now forwards
    chunk.FinishReason on the projected stream chunk (previously
    swallowed). No restructuring; ChatRuntime remains transitional.

Tests
  - 3 IConversationReplyGenerator mocks in AgentRunGAgentTests adapted
    to the new return type.
  - 5 ConversationReplyGeneratorTests reply assertions migrated to
    reply.Text.
  - New regression GenerateReplyAsync_AggregatesUsageAndFinishReasonAtActorEdge
    using UsageReportingProviderFactory mock — provider emits Usage on
    a mid-stream bookkeeping chunk and IsLast separately; the test
    asserts the actor-edge result carries both the aggregated tokens
    (7/11/18) and FinishReason ("stop").

Test state: 803 (ChannelRuntime) + 134 (Channel.Protocol) + 542 (AI)
= 1479 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §9 absorbing-finalized contract: late and stale
reply-chain signals must no-op once a run has reached
chain.finalized (terminal AgentRunStatus + cleanup_completed_at != 0
on the run actor; ProcessedCommandIds containment on the
conversation actor). Lifts the previously ad-hoc terminal checks
into single helpers and applies them to every handler entry.

AgentRunGAgent
  - New AgentRunGAgent.IsTerminal(status) helper (internal static for
    test access) and instance overload covering Dropped / Failed /
    ReplyHandedOff. New IsCleanupAlreadyCompleted() against
    AgentRunGAgentState.cleanup_completed_at_unix_ms.
  - HandleStartAsync uses IsTerminal at entry; only re-schedules
    cleanup when cleanup has not already completed.
  - HandleCleanupAsync uses IsTerminal at entry, then short-circuits
    on cleanup_completed_at != 0 to keep destroy idempotent. On the
    valid path it now persists AgentRunCleanupCompletedEvent before
    IActorRuntime.DestroyAsync, so the chain.finalized observable
    (cleanup_completed_at != 0) survives replay.
  - State matcher wires ApplyCleanupCompleted to write
    cleanup_completed_at_unix_ms.
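
The two guards can be sketched as below (helper names are from the commit; the state member spelling is an assumption):

```csharp
// Terminal statuses per the absorbing-finalized contract: once reached,
// every handler entry must no-op on late or stale signals.
internal static bool IsTerminal(AgentRunStatus status) =>
    status is AgentRunStatus.Dropped
           or AgentRunStatus.Failed
           or AgentRunStatus.ReplyHandedOff;

// chain.finalized == terminal status AND cleanup timestamp persisted,
// so the observable survives replay after DestroyAsync.
private bool IsCleanupAlreadyCompleted() =>
    State.CleanupCompletedAtUnixMs != 0;
```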

ConversationGAgent
  - New IsLlmReplyTurnFinalized(correlationId) helper centralizes the
    `ProcessedCommandIds.Contains("llm:<corr>")` check.
  - HandleLlmReplyReadyAsync, HandleLlmReplyCardStreamChunkAsync, and
    HandleNyxRelayStreamingChunkCoreAsync all dedupe through the
    helper.
  - HandleDeferredLlmReplyDroppedAsync now dedupes the same way: a
    late drop notification for an already-finalized turn (run-actor
    cleanup callback fires after a successful reply already landed)
    no-ops instead of overwriting last_reply_delivery with a
    synthetic NotRetryable ConversationContinueFailedEvent.

Tests
  - Five new #649 regressions on AgentRunGAgent covering the late-
    signal classes called out in the issue:
      * Duplicate cleanup callback destroys actor once + persists
        cleanup_completed_at.
      * Cleanup for a stale RunId no-ops, leaves cleanup_completed_at
        zero.
      * Cleanup before terminal status no-ops.
      * Duplicate start after cleanup_completed does not re-schedule
        a fresh cleanup callback or re-run the LLM.
      * Duplicate start after stale-gate Drop does not re-run the
        LLM and does not persist additional drop events.

Test state: 808 ChannelRuntime + 134 Channel.Protocol + 542 AI =
1484 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
YueZh127 and others added 11 commits May 15, 2026 15:52
…letons

Issue #664 step 3b: ResponsesUserSkillsToolProvider must ctor-inject
the same singleton instances that the IAgentToolSource enumeration
returns to ChatRuntime, otherwise the two paths could diverge silently.
Switch AddSkills / AddOrnnSkills to TryAddSingleton<concrete> plus a
factory-based IAgentToolSource alias guarded against duplicate
registration, so concrete-by-type and enumerable-by-interface resolve
to one instance.
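
The registration pattern can be sketched as follows (SkillsAgentToolSource stands in for each concrete source; the duplicate-registration guard shape is illustrative, not the repo's exact check):

```csharp
// Register the concrete type once (idempotent)…
services.TryAddSingleton<SkillsAgentToolSource>();

// …then alias the interface to the SAME instance via a factory, guarded
// so repeated AddSkills calls do not register a second alias. Resolving
// SkillsAgentToolSource directly and enumerating IAgentToolSource now
// yield one shared singleton.
if (!services.Any(d => d.ServiceType == typeof(IAgentToolSource)
                    && d.ImplementationFactory is not null))
{
    services.AddSingleton<IAgentToolSource>(
        sp => sp.GetRequiredService<SkillsAgentToolSource>());
}
```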

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The NyxID Responses direct path previously injected only the local tools
exposed by ResponsesAevatarToolProvider (TodoWrite/Task/WebFetch/WebSearch)
into LLMRequest and never attached to the unified skill backbone, so
callers reaching aevatar through a NyxID API key (codex, Cursor, etc.)
could not use the skills they had configured in Ornn.

Main changes:

- IResponsesToolProvider becomes async and takes a typed
  ResponsesToolProviderContext (caller scope + tool metadata), aligned
  with the future IAgentToolSource.DiscoverToolsAsync;
  ResponsesToolClassifier.ClassifyAsync is renamed to match and now
  dedupes additive tools by effective tool name, so same-named tools are
  not sent to the LLM twice; name collisions are logged as warnings.
- New ResponsesUserSkillsToolProvider: explicitly ctor-injects the two
  skill backbones, SkillsAgentToolSource + OrnnAgentToolSource, and
  bridges the two additive tools use_skill + ornn_search_skills; it does
  not enumerate the IAgentToolSource collection, so sources added later
  cannot be bridged by accident.
- Mainnet host appends TryAddEnumerable<IResponsesToolProvider,
  ResponsesUserSkillsToolProvider>().
- /v1/messages keeps explicitly passing Array.Empty<IResponsesToolProvider>()
  so the Anthropic client's tool harness is not shadowed by Aevatar tools.
- Tests: classifier same-name additive dedup + async signature;
  ResponsesUserSkillsToolProvider returns only the two skill-backbone
  tools; a /v1/responses end-to-end test verifies bridged tools land in
  LLMRequest.Tools; a /v1/messages regression verifies DI-registered
  providers are not injected into the Messages path. The last two were
  validated by revert-and-rerun to confirm they catch the corresponding
  bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/canon/nyxid-responses-direct.md: new canon doc describing the
  /v1/models /v1/responses /v1/messages surface exposed via the NyxID
  proxy, OpenRouter-style <service-slug>/<model> routing, and the
  Responses caller-scope/credential contract.
- docs/canon/nyxid-llm-integration.md: cross-link Responses direct path
  and note the legacy bare-model gateway routing.
- docs/operations/2026-05-13-aevatar-responses-via-nyxid-setup.md:
  refresh the Responses-via-NyxID setup walkthrough.
- src/Aevatar.Mainnet.Host.Api/README.md: list /v1/models and
  /v1/messages alongside /v1/responses with the routing and tool-shadow
  caveats.
- docs/2026-05-14-gagent-types-analysis.md: snapshot of the current
  GAgent inheritance landscape used while scoping issue #664 follow-ups.
- docs/README.md: regenerate index for the new canon, ADR, and history
  entries that have landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paseo writes per-session run-continuation JSON to ./.sisyphus/run-continuation/
to track agent IDs across invocations. It's local tool state, not source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eue failure

ADR-0021 defines DispatchPhase.RejectedDuplicate but the dispatcher
previously did a "GetAsync ?? CreateAsync" that silently re-enqueued
start commands for a runId whose actor already existed. The envelope
also used Guid.NewGuid() for the command id, so the run actor inbox
had no envelope-level idempotency to fall back on.

Changes:

- Serialize dispatch under a SemaphoreSlim and reject when
  IActorRuntime.ExistsAsync(actorId) is true; outcome carries
  RejectedDuplicate + the existing actorId.
- Build a deterministic commandId ("agent-run-start:{runId}") and put
  it on EventEnvelope.Runtime.Deduplication.OperationId so the inbox
  has a stable dedup key if the dispatcher boundary is ever bypassed.
- If ProduceAsync fails after CreateAsync succeeded, destroy the
  freshly created run actor so a retry can re-create instead of
  bouncing on RejectedDuplicate.
- Tests: parallel WhenAll dispatch returns exactly one Accepted + one
  RejectedDuplicate with the existing inbox unchanged; enqueue failure
  destroys the actor and a follow-up dispatch is accepted; stale
  request never reaches CreateAsync.
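
The dispatch sequence above can be sketched as (structure per the commit; runtime API shapes and names like `_actorRuntime`/`ProduceAsync` are assumptions):

```csharp
private readonly SemaphoreSlim _dispatchLock = new(1, 1);

public async Task<DispatchOutcome> DispatchAsync(LlmReplyRunRequest request)
{
    await _dispatchLock.WaitAsync();  // serialize: at most one create per runId wins
    try
    {
        var actorId = ActorIdFor(request.RunId);
        if (await _actorRuntime.ExistsAsync(actorId))
            return new DispatchOutcome(DispatchPhase.RejectedDuplicate, actorId);

        await _actorRuntime.CreateAsync(actorId);

        // Deterministic command id doubles as the envelope-level dedup key
        // (EventEnvelope.Runtime.Deduplication.OperationId) if this boundary
        // is ever bypassed.
        var commandId = $"agent-run-start:{request.RunId}";
        try
        {
            await ProduceAsync(actorId, commandId, request);
        }
        catch
        {
            // Enqueue failed after create succeeded: destroy the fresh actor
            // so a retry re-creates instead of bouncing on RejectedDuplicate.
            await _actorRuntime.DestroyAsync(actorId);
            throw;
        }
        return new DispatchOutcome(DispatchPhase.Accepted, actorId);
    }
    finally
    {
        _dispatchLock.Release();
    }
}
```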

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 632d09f documented /v1/responses with conservative wording from
before issue #664 landed: it said "Ornn 还没桥接" ("Ornn is not bridged
yet") and "aevatar_* additive tool 接口已预留" ("the aevatar_* additive
tool interface is reserved"). Issue #664 (commits 6c11877, 4c15423) now
ships ResponsesUserSkillsToolProvider, so the docs need to describe the
shipped behavior:

- canon/nyxid-responses-direct.md: split tool classes into three
  (substitute / additive / forwarded), name use_skill +
  ornn_search_skills as the live additive set, and call out that they
  walk through NyxID proxy under the caller's bearer (so a restricted
  NyxID API key must keep the Ornn API service in --allowed-services).
- operations/2026-05-13-aevatar-responses-via-nyxid-setup.md: note that
  the bridge is /v1/responses-only, mention the Ornn-service allowlist
  step in the codex/Responses configuration block, refresh the
  Messages-client notes, and replace the "Ornn 还没桥接" troubleshooting
  row with bridge-aware ones.
- src/Aevatar.Mainnet.Host.Api/README.md: replace the aevatar_*
  placeholder line with use_skill / ornn_search_skills additives and
  spell out that /v1/messages does NOT inject them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-chain-regressions

Test/2026 05 12 lark bot reply chain regressions