
[WIP] Lark bot enhancement#562

Open
eanzhao wants to merge 133 commits into dev from feature/lark-bot

Conversation

@eanzhao
Contributor

@eanzhao eanzhao commented May 4, 2026

No description provided.

eanzhao and others added 3 commits May 3, 2026 18:47
Two production failure modes observed on 2026-05-03 leave a Lark user
visibly stuck even though the cluster is healthy. Both are healed
locally, with no NyxID-side changes:

(1) /llm /route /model says "binding 已失效, 请发送 /init" ("binding is
no longer valid, please send /init") while /init says "已绑定, 请先
/unbind" ("already bound, please /unbind first"). After PR #558
re-DCR'd the cluster's OAuth client to add the `proxy` scope, the
user's existing NyxID binding (issued for the previous client_id) is
rejected at broker token-exchange. ModelChannelSlashCommandHandler
caught the rejection and pointed the user at /init, which then refused
because the local readmodel still holds the (now-dead) binding_id,
looping the user forever.

Self-heal: when the broker throws BindingRevokedException /
BindingNotFoundException / BindingScopeMismatchException, dispatch
RevokeBindingCommand to the local ExternalIdentityBindingGAgent
(reason="auto_self_heal_*") so the readmodel flips to revoked, and
tell the user the binding was cleared and /init will work now.
Mirrors the dispatch shape UnbindChannelSlashCommandHandler uses
for explicit /unbind, but skips the NyxID-side revoke (NyxID is the
one that just told us the binding is gone).
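The catch-and-heal shape described above can be sketched as follows. This is a hypothetical fragment, not the PR's actual code: the exception, command, and runtime type names come from this commit message, while `ExchangeBrokerTokenAsync`, `EventEnvelope.For`, and `SlashReply.Text` are assumed helper shapes.

```csharp
// Sketch only: heal the local readmodel when NyxID rejects the binding.
try
{
    return await ExchangeBrokerTokenAsync(binding, ct);
}
catch (Exception ex) when (ex is BindingRevokedException
                        or BindingNotFoundException
                        or BindingScopeMismatchException)
{
    // NyxID just told us the binding is gone; flip the local readmodel
    // to revoked instead of bouncing the user between /init and /unbind.
    var actor = await _actorRuntime
        .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct);
    await actor.HandleEventAsync(
        EventEnvelope.For(new RevokeBindingCommand
        {
            BindingId = binding.Id,
            Reason = $"auto_self_heal_{ex.GetType().Name}",
        }), ct);
    return SlashReply.Text("binding cleared locally; /init will work now");
}
```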

(2) Bot replies as "..." forever when the LLM call fails. The
streaming sink fires the first chunk via channel-relay/reply,
consuming the single-use reply token and creating a placeholder
message. If the LLM then fails (e.g. upstream 429), pre-fix the
runtime fell through to RunLlmReplyAsync which issued a fresh /reply
against the dead token and got `401 Reply token already used` —
leaving the user staring at "..." with no error explanation.

Self-heal: ConversationGAgent.TryCompleteStreamedReplyAsync now
takes the Failed branch when streaming has already committed the
placeholder. Edits the existing message in place via
RunStreamChunkAsync (channel-relay/reply/update — no reply token
needed) with the classified failure text, then persists
ConversationTurnCompletedEvent so the runtime envelope retry loop
does not refire and consume the dead token again. If the edit also
fails (rare: Lark may refuse stale-message edits), persist the
last flushed partial as terminal — same defence-in-depth pattern
the existing PR #374 fix uses for the Completed path.
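The streaming-Failed branch above can be sketched as below. Names like `streamCommittedPlaceholder`, `placeholderMessageId`, and `PersistAsync` are assumptions for illustration; `RunStreamChunkAsync` and `ConversationTurnCompletedEvent` are taken from this commit message.

```csharp
// Sketch only: complete a failed streamed reply without re-using the
// single-use reply token.
if (streamCommittedPlaceholder)
{
    try
    {
        // channel-relay/reply/update edits the placeholder in place;
        // no reply token is needed.
        await RunStreamChunkAsync(placeholderMessageId, classifiedFailureText, ct);
    }
    catch
    {
        // Rare: Lark may refuse a stale-message edit. Fall back to
        // persisting the last flushed partial as terminal content.
        classifiedFailureText = lastFlushedPartial;
    }
    // Persisting the terminal event stops the runtime envelope retry loop
    // from refiring and consuming the dead reply token again.
    await PersistAsync(new ConversationTurnCompletedEvent(), ct);
}
```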

Tests:
  - 4 new ModelSlashCommandHandlerTests pinning the binding self-heal
    for each rejection shape + degraded-path when IActorRuntime is
    missing.
  - 2 new ConversationGAgentDedupTests pinning the streaming-Failed
    branch edits the placeholder + falls through to "persist last
    flushed partial" when the edit also fails.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (851/851)
  dotnet test test/Aevatar.GAgents.Channel.Protocol.Tests --no-build (36/36 in dedup suite)
  dotnet test test/Aevatar.Foundation.Core.Tests             (230/230)
  bash tools/ci/test_stability_guards.sh                     (passed)
  bash tools/ci/query_projection_priming_guard.sh            (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
… path

Both reviewers caught the same bug: pre-fix, the self-heal returned the
"本地已自动清理" ("cleaned up locally") message even when IActorRuntime
was unregistered or when CreateAsync / HandleEventAsync threw. The
local readmodel was NOT actually cleaned in those paths, so the user
follows the message to /init, /init still sees the stale binding and
refuses, recreating the exact loop this PR exists to break.

Split the self-heal API into cleanedMessage + degradedMessage. The
cleaned message is returned ONLY when the local revoke envelope was
actually dispatched to the binding actor; otherwise the degraded
message points the user at /unbind explicitly.

Also add a single retry on the dispatch path (mirror
UnbindChannelSlashCommandHandler's PR #521 v4-pro review fix) so a
one-off Orleans hiccup doesn't downgrade an otherwise self-healable
binding to manual /unbind guidance.
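A minimal sketch of the split API plus the retry-once contract, with hypothetical member names (`SelfHealAsync`, `CleanedMessage`, `DegradedMessage`, `RevokeEnvelope`); only the retry count and the cleaned/degraded split come from this commit message.

```csharp
// Sketch only: return the "cleaned" message strictly after a successful
// dispatch; otherwise degrade to explicit /unbind guidance.
async Task<string> SelfHealAsync(ActorId actorId, CancellationToken ct)
{
    if (_actorRuntime is null)
        return DegradedMessage; // runtime missing: never claim cleanup happened

    for (var attempt = 1; attempt <= 2; attempt++) // retry once on an Orleans hiccup
    {
        try
        {
            var actor = await _actorRuntime
                .CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct);
            await actor.HandleEventAsync(RevokeEnvelope(), ct);
            return CleanedMessage; // only after the revoke envelope was dispatched
        }
        catch (Exception) when (attempt < 2) { /* retry once */ }
        catch { return DegradedMessage; } // both attempts failed: point at /unbind
    }
    return DegradedMessage;
}
```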

Tests:
  - List_DegradesToUnbindGuidance_WhenSelfHealActorRuntimeMissing now
    asserts the reply contains "/unbind" and DOES NOT contain
    "已自动清理" ("auto-cleaned") — pinning the no-runtime degraded path.
  - List_DegradesToUnbindGuidance_WhenSelfHealDispatchKeepsThrowing
    is a new test exercising ThrowingActorRuntime (every CreateAsync
    throws) and asserts AttemptCount == 2 (retry-once contract) plus
    the degraded reply shape.

Verification:
  dotnet test test/Aevatar.GAgents.ChannelRuntime.Tests --no-build (852/852)
  bash tools/ci/test_stability_guards.sh                     (passed)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ly-self-heal

Self-heal binding rejections + streaming-failed reply-token loop
@eanzhao eanzhao changed the title Lark bot enhancement [WIP] Lark bot enhancement May 4, 2026
@codecov

codecov Bot commented May 4, 2026

Codecov Report

❌ Patch coverage is 70.16208% with 1215 lines in your changes missing coverage. Please review.
✅ Project coverage is 72.50%. Comparing base (6dfd516) to head (814fcd1).
⚠️ Report is 2 commits behind head on dev.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| ...r.Mainnet.Host.Api/Responses/ResponsesEndpoints.cs | 66.11% | 313 Missing and 33 partials ⚠️ |
| ...tar.Mainnet.Host.Api/Messages/MessagesApiModels.cs | 52.52% | 119 Missing and 50 partials ⚠️ |
| ...tar.Mainnet.Host.Api/Messages/MessagesEndpoints.cs | 65.22% | 125 Missing and 12 partials ⚠️ |
| ...Host.Api/Responses/ResponsesAevatarToolProvider.cs | 58.33% | 106 Missing and 24 partials ⚠️ |
| ...et.Host.Api/Responses/ResponsesModelsAggregator.cs | 48.52% | 97 Missing and 8 partials ⚠️ |
| ...r.Mainnet.Host.Api/Responses/ResponsesApiModels.cs | 84.17% | 28 Missing and 22 partials ⚠️ |
| ...on/Studio/Services/NyxIdLlmServiceCatalogParser.cs | 79.82% | 23 Missing and 23 partials ⚠️ |
| ...rc/Aevatar.AI.LLMProviders.MEAI/MEAILLMProvider.cs | 68.81% | 22 Missing and 7 partials ⚠️ |
| ...c/Aevatar.AI.ToolProviders.Web/WebFetchUrlGuard.cs | 71.26% | 19 Missing and 6 partials ⚠️ |
| ...actions/Queries/ResponsesAgentToolStateSnapshot.cs | 51.06% | 23 Missing ⚠️ |
| ... and 25 more | | |
@@            Coverage Diff             @@
##              dev     #562      +/-   ##
==========================================
+ Coverage   72.13%   72.50%   +0.36%     
==========================================
  Files        1261     1296      +35     
  Lines       91065    96570    +5505     
  Branches    11924    12648     +724     
==========================================
+ Hits        65693    70014    +4321     
- Misses      20673    21506     +833     
- Partials     4699     5050     +351     
| Flag | Coverage Δ |
|---|---|
| ci | 72.50% <70.16%> (+0.36%) ⬆️ |

Flags with carried forward coverage won't be shown.

| Files with missing lines | Coverage Δ |
|---|---|
| ...evatar.AI.Abstractions/LLMProviders/LLMResponse.cs | 100.00% <100.00%> (ø) |
| src/Aevatar.AI.Core/Chat/ChatHistory.cs | 81.91% <100.00%> (+0.59%) ⬆️ |
| ...atar.AI.LLMProviders.Tornado/TornadoLLMProvider.cs | 61.19% <100.00%> (+13.19%) ⬆️ |
| ...evatar.AI.ToolProviders.Lark/ILarkCardKitClient.cs | 100.00% <100.00%> (ø) |
| ....ToolProviders.Lark/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø) |
| ...c/Aevatar.AI.ToolProviders.NyxId/NyxIdApiClient.cs | 69.50% <100.00%> (-0.20%) ⬇️ |
| ...Aevatar.AI.ToolProviders.NyxId/NyxIdToolOptions.cs | 100.00% <100.00%> (ø) |
| src/Aevatar.AI.ToolProviders.Ornn/OrnnOptions.cs | 100.00% <100.00%> (ø) |
| ...I.ToolProviders.Web/ServiceCollectionExtensions.cs | 100.00% <100.00%> (ø) |
| ...strap.Extensions.AI/ServiceCollectionExtensions.cs | 84.32% <100.00%> (-2.06%) ⬇️ |
| ... and 58 more | |

... and 12 files with indirect coverage changes

eanzhao and others added 14 commits May 4, 2026 16:58
DeepSeek v4-pro with thinking mode rejects requests when reasoning_content
from prior assistant turns is not echoed back. This change:

- Adds ReasoningContent to ChatMessage, LLMResponse, and LLMStreamChunk
- Propagates reasoning content through ChatRuntime streaming rounds
- Appends reasoning_content to conversation history for multi-turn
- Implements ExtractReasoningContent in MEAILLMProvider
- Wires reasoning content into non-streaming ConvertResponse

Closes #563
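The multi-turn propagation can be sketched as below. `ReasoningContent` on ChatMessage/LLMResponse comes from this commit message; the provider call and `ChatRole`-style history API are assumptions for illustration.

```csharp
// Sketch only: carry reasoning_content alongside the assistant turn so
// DeepSeek v4-pro thinking mode sees it echoed back on the next round.
var response = await provider.GetResponseAsync(history, ct);
history.Add(new ChatMessage(ChatRole.Assistant, response.Content)
{
    // Appended to conversation history; the provider serializes it back
    // out as reasoning_content on subsequent requests.
    ReasoningContent = response.ReasoningContent,
});
```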
Channel conversation LLM runs inside Orleans actors with no HTTP auth
context. The scope resolver returns null, causing
ActorBackedUserMemoryStore to throw InvalidOperationException on every
turn and log a warning.

- Add TryResolveScopeId/TryResolveWriteActorId that return null instead
  of throwing
- ReadProjectedStateAsync returns null when no scope is available
- GetAsync returns UserMemoryDocument.Empty, BuildPromptSectionAsync
  returns empty string — no warning logged
- Write operations (Save/Add/Remove) still throw since they only run
  from Studio API with auth context

Closes #564
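The Try-variant described above might look like this sketch. The method names `TryResolveScopeId` and `ReadProjectedStateAsync` come from this commit message; the claim lookup and accessor wiring are assumptions.

```csharp
// Sketch only: return null instead of throwing when there is no HTTP
// auth context (e.g. LLM runs inside Orleans actors).
public string? TryResolveScopeId()
{
    var user = _httpContextAccessor.HttpContext?.User;
    return user?.FindFirst("sub")?.Value; // null when no HTTP context exists
}

public async Task<UserMemoryDocument> GetAsync(CancellationToken ct)
{
    var scopeId = TryResolveScopeId();
    if (scopeId is null)
        return UserMemoryDocument.Empty; // degrade quietly, no warning logged
    return await ReadProjectedStateAsync(scopeId, ct) ?? UserMemoryDocument.Empty;
}
```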
Gracefully handle missing scope in UserMemoryStore (fixes #564)
…ing-content

Propagate reasoning_content through LLM pipeline (fixes #563)
"/model encountered NyxID-side binding rejection ({Reason}) but IProjectionReadinessPort is not registered; cannot verify local readmodel cleanup. actor={ActorId}, subject={Platform}:{Tenant}:{User}",
reason,
actorId,
context.Subject.Platform, context.Subject.Tenant, context.Subject.ExternalUserId);
Contributor Author


[Consensus: 3 models] severity=blocker, category=arch

The new TryDispatchLocalBindingRevokeAsync runs EnsureProjectionForActorAsync + WaitForBindingStateAsync inside the slash-command request path. Three reviewers flag this as a CLAUDE.md violation: query-time projection priming is forbidden — the application/query layer must not activate a projection scope and synchronously wait for read-model materialization on the request hot path. IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Refactor self-heal so the revocation is fire-and-forget (publish a continuation event, return early to the user with a 'binding cleanup in progress' message), or perform the projection wait off the request path. CLAUDE.md ref: "query-time replay/priming is forbidden" / "QueryPort/QueryService/ApplicationService must not ... synchronously backfill projections on the request path".

Per-model verbatim
  • codex: This makes the /model read/list path activate a projection scope and then synchronously wait for read-model cleanup. That violates CLAUDE.md ("query-time replay/priming is forbidden") and the ADR note that IProjectionReadinessPort is a write-side completion port, not a turn/query dependency. Move this repair behind an explicit write-side command/port that owns projection activation, and keep /model honest if the cleanup cannot already be observed.

  • v4-pro: TryDispatchLocalBindingRevokeAsync calls EnsureProjectionForActorAsync + WaitForBindingStateAsync synchronously in the slash-command handler path. This is projection priming in the request path: activating a projection and waiting for the readmodel to refresh within the request execution. Violates CLAUDE.md:

Query-time priming is forbidden: if a query first needs to "ensure the projection exists / refresh the read model", that action must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden.

Move projection activation/binder setup before the slash-command path (e.g. upstream in the conversation pipeline when the inbound message arrives), or make the self-heal fire-and-forget (dispatch revoke + don't synchronously wait for readmodel). The degraded-message fallback already handles cases where the readmodel isn't cleaned within the request.

  • kimi: TryDispatchLocalBindingRevokeAsync calls _bindingProjectionPort.EnsureProjectionForActorAsync(actorId, ct) synchronously inside the slash command handler request path. This is projection priming in the application layer. CLAUDE.md forbids query-time priming: "if a query first needs to 'ensure the projection exists / refresh the read model', that action must happen in an explicit activation, lease, binder, or background materialization flow; synchronously backfilling the projection inside a query method is forbidden." While the OAuth endpoint does similar priming, endpoints are Host layer; a per-user-message slash handler is not. Consider activating binding projection scopes eagerly at startup (e.g. a hosted service that primes all known binding actors) instead of per-request.

var actor = await _actorRuntime
.CreateAsync<ExternalIdentityBindingGAgent>(actorId, ct)
.ConfigureAwait(false);
var envelope = new EventEnvelope
Contributor Author


[Consensus: 4 models] severity=blocker, category=arch

actor.HandleEventAsync(envelope, ct) dispatches RevokeBindingCommand directly through IActorRuntime, bypassing IActorDispatchPort and the actor's inbox / serialization guarantees. Four reviewers flag this as a CLAUDE.md violation of the Runtime vs Dispatch separation of responsibilities (Runtime 与 Dispatch 分责): IActorRuntime may resolve lifecycle/topology, but command delivery must go through the runtime-neutral publisher (publish/send). Inline HandleEventAsync reduces to direct method invocation on the target actor — exactly the pattern CLAUDE.md forbids: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, the business layer, or intermediate adapter layers is forbidden". Replace with the runtime-neutral dispatch port (IActorDispatchPort.PublishAsync(...) or the equivalent) and let the actor's inbox process the revoke command.

Per-model verbatim
  • codex: Calling actor.HandleEventAsync directly bypasses IActorDispatchPort and manually inlines delivery into the target actor. This violates CLAUDE.md's Runtime vs Dispatch separation rule (delivery semantics must be runtime-neutral); IActorRuntime should only resolve lifecycle/topology, while command delivery should go through IActorDispatchPort.DispatchAsync with EnvelopeRouteSemantics.CreateDirect(...) or the existing command-dispatch helper.

  • v4-pro: actor.HandleEventAsync(envelope, ct) dispatches the RevokeBindingCommand directly to the actor via inline method call, bypassing the actor's inbox queue and serialization guarantees. Violates CLAUDE.md:

Delivery semantics must be runtime-neutral: publish/send uniformly means "enter the target actor's inbox and await processing"; it must not degrade to inline dispatch because the target is self or because the underlying runtime differs.

The comment at L126 says this mirrors UnbindChannelSlashCommandHandler — but that handler also uses the same pattern and should be fixed too. Use IActorDispatchPort (or equivalent standard dispatch mechanism) instead of CreateAsync + HandleEventAsync.

  • kimi: The handler directly calls actor.HandleEventAsync(envelope, ct) via IActorRuntime, bypassing the standard IActorDispatchPort. CLAUDE.md requires runtime-neutral dispatch: "bypassing the standard publisher to directly manipulate the underlying transport object from base classes, the business layer, or intermediate adapter layers is forbidden." While UnbindChannelSlashCommandHandler uses the same pattern, the new self-heal path amplifies the violation by coupling it with projection priming and retry logic. Refactor to use IActorDispatchPort or a dedicated binding command port so the slash handler does not depend on runtime internals.

  • mimo-v2.5-pro: The slash-command handler now directly orchestrates actor dispatch via IActorRuntime, EventEnvelope, and RevokeBindingCommand. Per CLAUDE.md, slash handlers are channel-routed application services — they should not construct and dispatch actor envelopes. Extract SelfHealRevokedBindingAsync / TryDispatchLocalBindingRevokeAsync into a dedicated IBindingSelfHealService (application layer) and inject that instead. This also simplifies testing: the current test file has to mock IActorRuntime, ExternalIdentityBindingProjectionPort, AND IProjectionReadinessPort just to cover one catch branch.
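The reviewers' suggested shape might look like the sketch below. IActorDispatchPort and EnvelopeRouteSemantics.CreateDirect are named in the review itself; the exact method signature is an assumption.

```csharp
// Sketch only: deliver the revoke command via the actor's inbox instead
// of invoking HandleEventAsync inline on the resolved actor.
await _dispatchPort.DispatchAsync(
    envelope,                                   // the RevokeBindingCommand envelope
    EnvelopeRouteSemantics.CreateDirect(actorId), // route, not lifecycle
    ct);
```

The point of the change is that the slash handler no longer touches runtime internals: the dispatch port owns serialization and inbox ordering, and IActorRuntime is left to lifecycle/topology concerns only.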

private readonly ILogger<ModelChannelSlashCommandHandler> _logger;

public ModelChannelSlashCommandHandler(
ILogger<ModelChannelSlashCommandHandler> logger,
Contributor Author


[Consensus: 2 models] severity=major, category=di

Constructor injects the concrete ExternalIdentityBindingProjectionPort (and accepts nullable IActorRuntime + IProjectionReadinessPort). Two issues: (a) DI: CLAUDE.md's dependency-inversion rule requires depending on abstractions — define IExternalIdentityBindingProjectionPort and inject the interface (the current registration in IdentityServiceCollectionExtensions.cs is services.TryAddSingleton<ExternalIdentityBindingProjectionPort>() against the concrete type). (b) silent degradation: nullable self-heal collaborators mean that if any registration is missing in production DI, the self-heal — the user-facing fix this PR delivers — silently disappears with no error. Either make these required (non-nullable), or add a startup validation that logs a warning when self-heal will be inactive.

Per-model verbatim
  • glm-5.1: IActorRuntime, ExternalIdentityBindingProjectionPort, and IProjectionReadinessPort are all nullable and degrade gracefully, but the self-heal feature is the primary user-facing fix in this PR. If any of these aren't registered in the production DI container, the handler silently falls back to the degraded message and the binding loop is unbroken. Verify registrations exist in the host composition root — if they're not yet wired, this should be a blocker.

  • kimi: Constructor injects the concrete ExternalIdentityBindingProjectionPort instead of an abstraction. CLAUDE.md: "Dependency inversion: upper layers depend on abstractions; reverse cross-layer dependencies and direct coupling to concrete implementations are forbidden." The port is registered as a singleton concrete class (IdentityServiceCollectionExtensions.cs:86, services.TryAddSingleton<ExternalIdentityBindingProjectionPort>()). Define IExternalIdentityBindingProjectionPort, register against it, and inject the interface.
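The suggested DI fix can be sketched as below; the interface name is the one proposed in the review, and its member is modeled on the EnsureProjectionForActorAsync call discussed in the earlier comment.

```csharp
// Sketch only: depend on an abstraction and register the concrete port
// against it, per CLAUDE.md's dependency-inversion rule.
public interface IExternalIdentityBindingProjectionPort
{
    Task EnsureProjectionForActorAsync(ActorId actorId, CancellationToken ct);
}

// In IdentityServiceCollectionExtensions:
services.TryAddSingleton<IExternalIdentityBindingProjectionPort,
                         ExternalIdentityBindingProjectionPort>();
```

The handler constructor then takes IExternalIdentityBindingProjectionPort, and tests can fake the interface instead of the concrete class.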


public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
var actor = await EnsureWriteActorAsync(ct);
Contributor Author


[Consensus: 3 models] severity=major, category=concurrency

RemoveEntryAsync was reordered so EnsureWriteActorAsync runs before ReadProjectedStateAsync. Three reviewers flag this side-effect change: previously, calling RemoveEntryAsync("missing") was a no-op that returned false without materializing any actor; now it always activates the user-memory actor first (state hydration, reminder registration, storage I/O), even when the entry does not exist (returns false at the existence check). This violates CLAUDE.md's "default paths must define resource semantics" rule — a no-op delete now implicitly creates a resource, which has cost (cluster activation, cache pressure) and changes resource lifecycle. Move actor resolution after the existence check; only resolve the scope/actor when there is real work to do.

Per-model verbatim
  • codex: RemoveEntryAsync("missing") now creates or materializes the user-memory actor before checking whether the entry exists. That turns a no-op delete into implicit resource creation, violating CLAUDE.md's "default paths must define resource semantics" rule; keep the projected read first, and if the no-scope case must throw, resolve the scope ID explicitly without calling EnsureWriteActorAsync until a matching entry is found.

  • kimi: RemoveEntryAsync was reordered to call EnsureWriteActorAsync before ReadProjectedStateAsync. This causes unnecessary actor activation when the entry does not exist. Actor creation has side effects (state hydration, reminder registration, storage reads). Move actor resolution back after the existence check, or document why eager activation is required.

  • mimo-v2.5-pro: The reorder of EnsureWriteActorAsync before ReadProjectedStateAsync in RemoveEntryAsync changes the side-effect ordering. Previously the actor was only created when the entry existed. Now every RemoveEntryAsync call creates the actor even when the entry is missing (returns false at line 133). The actor creation is idempotent, but it's an unnecessary round-trip in the common "entry not found" case. Consider moving EnsureWriteActorAsync back after the existence check, or documenting that the actor must exist before the read (if there's a race-condition fix reason for the reorder).
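The ordering the reviewers ask for might look like this sketch; `RemoveEnvelope` and the `Entries` shape are assumptions, while EnsureWriteActorAsync and ReadProjectedStateAsync come from the review.

```csharp
// Sketch only: check existence against the projected read model first;
// only materialize the write actor when there is an entry to remove.
public async Task<bool> RemoveEntryAsync(string id, CancellationToken ct = default)
{
    var state = await ReadProjectedStateAsync(ct);
    if (state?.Entries.Any(e => e.Id == id) != true)
        return false; // no-op delete: no actor activation, no side effects

    var actor = await EnsureWriteActorAsync(ct);
    await actor.HandleEventAsync(RemoveEnvelope(id), ct);
    return true;
}
```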

}

public async Task<string> BuildPromptSectionAsync(int maxChars = 2000, CancellationToken ct = default)
{
Contributor Author


[codex] severity=major, category=bug

Removing the previous try/catch means a transient projection/read-model failure in BuildPromptSectionAsync will now fail the whole chat/prompt construction path instead of degrading by omitting memory. Restore the best-effort behavior here: catch non-cancellation exceptions from GetAsync, log the warning, and return string.Empty.
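The best-effort behavior the review asks to restore could be sketched as below; `ToPromptSection` and the logger field are assumptions, while the GetAsync call, the maxChars default, and the degrade-to-empty contract come from the review and the earlier commit message.

```csharp
// Sketch only: a transient projection/read-model failure degrades to an
// empty memory section instead of failing the whole prompt construction.
public async Task<string> BuildPromptSectionAsync(int maxChars = 2000,
                                                  CancellationToken ct = default)
{
    try
    {
        var doc = await GetAsync(ct);
        return doc.ToPromptSection(maxChars);
    }
    catch (Exception ex) when (ex is not OperationCanceledException)
    {
        _logger.LogWarning(ex, "User memory unavailable; omitting memory section");
        return string.Empty; // chat continues without the memory section
    }
}
```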

runtime.Actors.Should().NotContainKey("user-memory-user-1",
"no actor should be created when entry is missing");
var actor = runtime.Actors["user-memory-user-1"];
actor.ReceivedEnvelopes.Should().BeEmpty("no remove command should be dispatched when entry is missing");
Contributor Author


[v4-pro] severity=major, category=test

RemoveEntryAsync_MissingEntry_ReturnsFalse changed from asserting runtime.Actors.Should().NotContainKey(...) (actor NOT created) to asserting runtime.Actors["user-memory-user-1"] EXISTS with empty envelopes. But the new RemoveEntryAsync reads state first and returns false before calling EnsureWriteActorAsync when the entry is missing — the actor should not be created at all. Either the test is wrong or the test infrastructure (EmptyReader, FakeActorRuntime) triggers an unexpected actor creation. Verify the test passes against the actual code change.

AttachOpenAIRawRepresentationForReasoning(meaiMsg, msg);
result.Add(meaiMsg);
}

Contributor Author


[Consensus: 3 models] severity=major, category=arch

AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's serialized JSON via rawMessage.Patch.Set("$.reasoning_content"u8, ...) under #pragma warning disable SCME0001. Three reviewers flag the same fragility: Patch is an SDK-internal experimental surface (SCME0001), reasoning_content is not a documented field on AssistantChatMessage, and the JSONPath relies on the SDK's current serialization shape. Any SDK update that changes the patch model, the serializer, or the message structure will silently break or drop reasoning content propagation — there is no compile-time guarantee. Mitigations: (a) pin the OpenAI SDK version, (b) add an integration test that asserts the serialized JSON contains reasoning_content, and (c) file a feature request upstream for first-class reasoning content support so the workaround can be retired.

Per-model verbatim
  • glm-5.1: rawMessage.Patch.Set("$.reasoning_content"u8, ...) relies on the OpenAI SDK's internal Patch/SCME0001 surface to inject reasoning_content into the serialized JSON. This is fragile — any SDK update that changes the patch model or serialization pipeline will silently drop reasoning content or throw at runtime. Consider isolating this behind a version-checked helper (or a feature flag) so a future SDK bump doesn't cause a silent regression.

  • v4-pro: AttachOpenAIRawRepresentationForReasoning injects reasoning_content into the OpenAI SDK's AssistantChatMessage via rawMessage.Patch.Set("$.reasoning_content"u8, ...). This hardcoded JSON-path depends on the SDK's internal serialization structure (AssistantChatMessage.ModelReaderWriter.Write output) which could change between versions and silently break reasoning propagation. Consider a custom AssistantChatMessage subclass or an explicit BinaryContent wrapper if the SDK doesn't natively support reasoning_content.

  • mimo-v2.5-pro: AttachOpenAIRawRepresentationForReasoning uses rawMessage.Patch.Set("$.reasoning_content"u8, ...) with #pragma warning disable SCME0001. This is a fragile internal API — the OpenAI SDK doesn't document reasoning_content as a supported field on AssistantChatMessage, and the Patch property is implementation-detail. If the SDK changes wire format or removes the Patch escape hatch, this will silently break reasoning round-tripping. Add a fallback: if the patch throws (or if RawRepresentation doesn't survive a serialize-then-deserialize cycle), log a warning and degrade gracefully rather than crashing the entire chat call.

CancellationToken ct)
{
try
{
Contributor Author


[v4-pro] severity=minor, category=perf

MergeProxyRouteCandidatesAsync calls _nyxClient.DiscoverProxyServicesAsync synchronously within GetServicesAsync. If the /api/v1/proxy/services endpoint is slow or unreachable (and the catch swallows the exception with a warning), the catalog load time doubles on every /model or /route invocation. Consider making the proxy-service discovery happen on a background refresh, or caching with a TTL, rather than per-request.
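The TTL-cache alternative the review suggests might look like this sketch; the cache fields and the five-minute TTL are hypothetical choices, while DiscoverProxyServicesAsync is the call named in the review.

```csharp
// Sketch only: cache proxy-service discovery with a TTL so /model and
// /route don't pay a second upstream round-trip on every invocation.
private IReadOnlyList<ProxyService>? _proxyCache;
private DateTimeOffset _proxyCacheExpiry;
private static readonly TimeSpan ProxyCacheTtl = TimeSpan.FromMinutes(5); // assumed

private async Task<IReadOnlyList<ProxyService>> GetProxyServicesAsync(CancellationToken ct)
{
    if (_proxyCache is not null && DateTimeOffset.UtcNow < _proxyCacheExpiry)
        return _proxyCache; // serve from cache inside the TTL window

    _proxyCache = await _nyxClient.DiscoverProxyServicesAsync(ct);
    _proxyCacheExpiry = DateTimeOffset.UtcNow.Add(ProxyCacheTtl);
    return _proxyCache;
}
```

A background refresh (hosted service) would remove the latency from the request path entirely; the TTL cache is the smaller change.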

NyxIdLlmServicesResult result,
string bearerToken,
CancellationToken ct)
{
Contributor Author


[mimo-v2.5-pro] severity=minor, category=design

The proxy endpoint path /api/v1/proxy/services?per_page=100 is hardcoded here AND in NyxIdLlmServiceCatalogClient.cs:63. Two independent HTTP clients call the same endpoint. Extract the path into a shared constant (or better, into NyxIdLlmProviderSource / a config key) so the two call-sites stay in sync if the path changes.
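The shared constant the review proposes could be as small as this sketch (the class name is hypothetical; the path is the one quoted in the review):

```csharp
// Sketch only: one definition of the proxy-services path, referenced by
// both HTTP clients so the call-sites cannot drift apart.
internal static class NyxIdProxyEndpoints
{
    public const string Services = "/api/v1/proxy/services?per_page=100";
}
```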

eanzhao and others added 17 commits May 13, 2026 17:25
… tools

CC Switch (and other Anthropic→OpenAI Responses translators) advertise
Claude Code tools as a mix of `{type: "function", name, parameters, …}`
function declarations and OpenAI built-in tool declarations like
`{type: "web_search_preview"}` / `{type: "file_search", vector_store_ids: […]}`
/ `{type: "code_interpreter"}` / `{type: "computer_use_preview"}`. The
built-in forms have no `name` or `function` block — they're routing
hints to the model provider, not custom function definitions.

Previously the normalizer required every tool entry to have a name,
so a single built-in declaration anywhere in the array failed the
whole request with `invalid_tools: "Each tool requires a non-empty
name."` — exactly what the user hit when configuring CC Switch
against `chrono-llm/gpt-5.5`.

Skip non-function-typed entries silently (aevatar's classifier only
owns forward / substitute / additive function tools), keep validating
name for function-type entries. Improve the error message to include
the failing tool index for easier diagnosis on future malformed
inputs.

Tests:
- A mixed array (`web_search_preview` + `file_search` + a real
  function tool `Bash`) now returns 200 and the LLM provider sees
  exactly the one function tool; built-ins are dropped.
- A function-type entry without a name still 400s with
  `function tool at index 1 requires a non-empty name`.

Revert-and-rerun confirmed the built-in pass-through test goes red
when the skip branch is disabled.
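The skip rule described above can be sketched as follows; the tool model types (`tool.Type`, `tool.Name`, `FunctionTool`, `InvalidToolsException`) are assumed shapes, while the skip-built-ins rule and the indexed error message come from this commit message.

```csharp
// Sketch only: classify function-typed entries, silently skip built-in
// tool declarations, and keep the name check for function tools.
var functionTools = new List<FunctionTool>();
for (var i = 0; i < tools.Count; i++)
{
    var tool = tools[i];
    if (tool.Type != "function")
        continue; // built-ins (web_search_preview, file_search, ...) are
                  // routing hints to the provider, not custom functions

    if (string.IsNullOrEmpty(tool.Name))
        throw new InvalidToolsException(
            $"function tool at index {i} requires a non-empty name");

    functionTools.Add(FunctionTool.From(tool));
}
```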
…data fallback

Two issues surfaced when CC Switch users tried multi-turn tool conversations
on chrono-llm / llm-deepseek:

1. `function_call_output requires previous_response_id` 400 on the followup
   turn. CC Switch / Codex translating Claude Code's prior tool-result turn
   forwards `function_call_output` items in `input` but does NOT propagate
   `previous_response_id` — they don't model OpenAI's server-side session.
   Strict #629 §13 normalization rejected the request and the agent couldn't
   continue any multi-turn tool conversation.

   Fix: when `previous_response_id` is absent, fold `function_call_output`
   entries into the user prompt with synthetic `[tool_result call_id=…]`
   markers and clear ToolResults. Continuation contract still applies when
   the client actually sends `previous_response_id`.
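The fold described above might look like this sketch; the request model (`PreviousResponseId`, `ToolResults`, `UserPrompt`) is an assumed shape, while the synthetic marker format and the clear-ToolResults step come from this commit message.

```csharp
// Sketch only: without previous_response_id, inline tool results into the
// user prompt with synthetic markers instead of rejecting the request.
if (request.PreviousResponseId is null && request.ToolResults.Count > 0)
{
    var sb = new StringBuilder(request.UserPrompt);
    foreach (var r in request.ToolResults)
        sb.Append($"\n[tool_result call_id={r.CallId}]\n{r.Output}");
    request.UserPrompt = sb.ToString();
    request.ToolResults.Clear(); // continuation contract still applies
                                 // when the client does send the id
}
```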

2. `Model metadata for `llm-deepseek/deepseek-v4-pro` not found` warning
   from CC Switch — upstream deepseek `/v1/models` is OpenAI-spec minimal
   (no context_length / max_output_tokens), the no-hardcode rule forbids
   in-code defaults, but the user explicitly asked for "兜底" (a
   safety-net fallback) so requests don't get crippled by CC Switch's
   conservative fallback metadata.

   Fix: config-driven `Aevatar:Responses:ModelMetadataFallbacks`.
   Lookup precedence per entry: upstream fields → exact `{slug}/{model}`
   match → group-wide `{slug}` match. Fallback only fills nulls; never
   overwrites upstream values. Empty config = no-op. Default deployment
   ships with deepseek + llm-deepseek 64k/8k defaults in appsettings.json;
   deployment can override / remove. Stays out of code per the
   `feedback_no_hardcoded_metadata` rule that explicitly permits
   config-driven slug→config.

Tests cover both regressions + revert-and-rerun confirms the fold test
catches the exact `previous_response_required` 400 the user reported.
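The lookup precedence described above can be sketched as below; the `ModelMetadataFallback` type and method shape are assumptions, while the precedence order and the fill-nulls-only rule come from this commit message.

```csharp
// Sketch only: upstream value wins; then exact {slug}/{model}; then
// group-wide {slug}. Fallback fills nulls, never overwrites upstream.
int? ResolveContextLength(string slug, string model, int? upstream,
                          IReadOnlyDictionary<string, ModelMetadataFallback> fallbacks)
{
    if (upstream is not null)
        return upstream;                              // never overwrite upstream
    if (fallbacks.TryGetValue($"{slug}/{model}", out var exact)
        && exact.ContextLength is not null)
        return exact.ContextLength;                   // exact entry
    if (fallbacks.TryGetValue(slug, out var group))
        return group.ContextLength;                   // group-wide entry
    return null;                                      // empty config = no-op
}
```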
Covers nyxid CLI API key issuance, cc-switch codex provider TOML, and
end-to-end curl smoke test. Notes Aevatar is auto_connected so users
only add the LLM provider they want to route through it.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds §6 to the cc-switch runbook describing the planned /v1/messages endpoint:
capability matrix vs /v1/responses, protocol-mismatch rationale (Messages is
stateless, aevatar runtime is stateful — surface is intentionally narrow), and
the cc-switch Claude-tab config to use once it lands. Renumbers downstream
sections and updates the troubleshooting row that previously said Messages was
not on the roadmap.
The actor state, proto, and ports are protocol-neutral — they carry
scope/owner/status and forwarded tool calls without any Responses-only
fields. Naming them ResponseSession blocks reuse by /v1/messages
(Anthropic Messages, planned Path B) and misleads readers into thinking
the actor is OpenAI-Responses-specific.

Mechanical rename across 25 files (400 lines changed, 400 added):
  ResponseSession        -> LlmSession
  response_session(s)    -> llm_session(s)
  RESPONSE_SESSION_      -> LLM_SESSION_
  IResponseSession*      -> ILlmSession*

Field `response_id` is kept on the record because Path A (Responses)
keeps emitting OpenAI-style response ids on the wire; Path B will
synthesize its own session identifier into the same field.

Refs #642
…cade

Path B is a stateless facade over the same LlmSessionGAgent +
NyxIdLLMProvider + IResponsesCompletionApplicationService pipeline that
serves /v1/responses. No new GAgent type is introduced — the previously
renamed LlmSessionGAgent is protocol-neutral and the facade reuses it
verbatim. cc-switch users running Claude Code can now point at the
Aevatar/NyxID base URL and get an end-to-end Messages-protocol session.

Surface (src/Aevatar.Mainnet.Host.Api/Messages/):
  POST /v1/messages         AllowAnonymous, manual bearer extraction,
                            mirrors /v1/responses auth model
  - non-streaming           returns Anthropic message envelope JSON
  - streaming               emits Anthropic SSE schedule:
                              message_start
                              content_block_start (text or tool_use)
                              content_block_delta (text_delta /
                                                  input_json_delta)
                              content_block_stop
                              message_delta
                              message_stop

Lossy translation boundary (documented inline + tested):
  - thinking block       -> ChatMessage.ReasoningContent (lossless)
  - tool_use block       -> ToolCall.ArgumentsJson      (text args lossless;
                                                       image args dropped)
  - tool_result block    -> ChatMessage.Tool(callId, output) (text only)
  - cache_control        -> ignored in v1

Stateless contract: Anthropic Messages has no previous_response_id, so
each POST opens + closes its own LlmSession (24h TTL kept for parity
with Path A audit/projection).

Tests (test/Aevatar.Hosting.Tests/MainnetMessagesEndpointsTests.cs):
  PostMessages_NonStreaming_ShouldReturnAnthropicMessageEnvelope
  PostMessages_Streaming_ShouldEmitAnthropicSseFrames
  PostMessages_WithToolCall_ShouldEmitToolUseContentBlock
  PostMessages_WithoutBearer_ShouldReturn401WithAnthropicErrorEnvelope
  PostMessages_WithToolResultBlockInUserContent_ShouldFlattenIntoToolRoleMessage

Refs #642
Defines the four-phase chain contract (accepted/committed/delivered/finalized)
across ConversationGAgent / IChannelLlmReplyRunDispatcher / AgentRunGAgent
with a single observable state per phase. Locks in:

- AgentRunStatus 5-state proto enum, with REPLY_HANDED_OFF replacing
  the reply_dispatched bool (which becomes reserved).
- ConversationState.last_reply_delivery single field carrying
  user-visible delivery ack from the channel sink.
- Typed DispatchOutcome on IChannelLlmReplyRunDispatcher.DispatchAsync
  so the synchronous return point only promises accepted.
- finalized as an absorbing state; late/stale signals must no-op.

Companion docs/canon/lark-reply-completion-semantics.md adds
sequence diagrams, failure matrix, state machine views, and the
implementation checklist used by the upcoming code-side commits
(issues #647 / #648 / #649).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Lands the chain-level contract pieces of ADR-0021 that the rest of the
reply chain work depends on:

agent_run.proto
  - AgentRunStatus gains AGENT_RUN_STATUS_REPLY_HANDED_OFF.
  - AgentRunGAgentState.reply_dispatched (field 12) marked reserved; the
    explicit status replaces the implicit bool. Legacy event replay now
    promotes straight to REPLY_HANDED_OFF.
  - New cleanup_completed_at_unix_ms field (combined with terminal status,
    this is the chain.finalized observable).
  - New AgentRunCleanupCompletedEvent for the terminal-cleanup write.
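
The enum and state shape described above can be sketched as follows (value numbering, field numbers, and the zero value are assumptions; only the names quoted in these commits are taken as given):

```proto
// Sketch of agent_run.proto after this change; numbering illustrative.
enum AgentRunStatus {
  AGENT_RUN_STATUS_UNSPECIFIED      = 0;
  AGENT_RUN_STATUS_COMMITTED        = 1;
  AGENT_RUN_STATUS_REPLY_PRODUCED   = 2;
  AGENT_RUN_STATUS_REPLY_HANDED_OFF = 3;  // replaces the reply_dispatched bool
  AGENT_RUN_STATUS_DROPPED          = 4;  // terminal
  AGENT_RUN_STATUS_FAILED           = 5;  // terminal
}

message AgentRunGAgentState {
  reserved 12;                  // was reply_dispatched (bool)
  reserved "reply_dispatched";
  AgentRunStatus status = 1;    // field numbers here are illustrative
  // != 0, combined with a terminal status, is the chain.finalized observable.
  int64 cleanup_completed_at_unix_ms = 2;
}
```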

conversation_state.proto / conversation_events.proto
  - New ReplyDeliveryStatus message (Pending / Delivered / DeliveryFailed)
    on ConversationGAgentState.last_reply_delivery as the
    chain.delivered observable.
  - New LlmReplyDeliveredEvent + LlmReplyDeliveryFailedEvent that drive
    ConversationGAgent into those outcomes.
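
A sketch of the delivery observable (exact member names and numbering are assumptions; the three outcomes are from the commit):

```proto
// Sketch; the actual message lives in conversation_state.proto.
message ReplyDeliveryStatus {
  enum Outcome {
    OUTCOME_PENDING         = 0;
    OUTCOME_DELIVERED       = 1;
    OUTCOME_DELIVERY_FAILED = 2;
  }
  Outcome outcome          = 1;
  string correlation_id    = 2;  // ties back to the llm:<corr> turn key
  string failure_reason    = 3;  // populated by LlmReplyDeliveryFailedEvent
  int64 updated_at_unix_ms = 4;
}
```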

IChannelLlmReplyRunDispatcher
  - DispatchAsync now returns Task<DispatchOutcome> with a typed phase
    (Accepted / RejectedStale / RejectedDuplicate). The interface has no
    NuGet consumers and the three in-repo call sites are adapted in this
    commit; behaviour is preserved.
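
A minimal sketch of the typed outcome (type and phase names from the commit; member shapes and the request type are assumptions):

```csharp
// The synchronous return point only promises "accepted"; delivery and
// finalization are observed later via actor state, per ADR-0021.
public enum DispatchPhase { Accepted, RejectedStale, RejectedDuplicate }

public sealed record DispatchOutcome(DispatchPhase Phase, string? ExistingActorId = null);

public interface IChannelLlmReplyRunDispatcher
{
    // LlmReplyRunRequest is a stand-in name for the existing request type.
    Task<DispatchOutcome> DispatchAsync(LlmReplyRunRequest request, CancellationToken ct = default);
}
```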

AgentRunDispatcher
  - Performs a cheap freshness check at the boundary (mirrors
    AgentRunGAgent.MaxRunRequestAgeMs) and returns RejectedStale instead
    of enqueuing requests that the run actor would only drop.

AgentRunGAgent
  - All read sites of State.ReplyDispatched move to status checks:
    REPLY_HANDED_OFF replaces (ReplyProduced && ReplyDispatched).
  - ApplyReplyProduced legacy-event path promotes status to
    REPLY_HANDED_OFF; ApplyReplyDispatched promotes committed ->
    handed-off; the new-event path leaves status at REPLY_PRODUCED until
    the dispatched event lands.

Tests
  - ConversationGAgentDedupTests RecordingRunDispatcher mock adapted to
    the new return type.
  - AgentRunGAgentTests assertions migrated from ReplyDispatched bool to
    explicit REPLY_PRODUCED / REPLY_HANDED_OFF status expectations.

Docs
  - canon checklist fixed to reflect that the handed-off transition uses
    the existing AgentRunReplyDispatchedEvent (no new event required) and
    that AgentRunCleanupCompletedEvent drives cleanup_completed_at.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…tionGAgent

Makes ADR-0021 chain.delivered observable from
ConversationGAgentState.last_reply_delivery instead of inference from
log lines or channel sink return codes. ConversationGAgent now:

  * Persists LlmReplyDeliveredEvent before ConversationTurnCompletedEvent
    on the non-streaming success path (HandleLlmReplyReadyAsync ->
    RunLlmReplyAsync) and on the streaming completion path
    (PersistStreamedCompletionAsync — which is the unified streaming
    sink that all partial / full / failure-self-heal branches funnel
    through, so any user-visible content counts as delivered).
  * Persists LlmReplyDeliveryFailedEvent before
    ConversationContinueFailedEvent on the non-streaming failure path
    so DeliveryFailed carries the structured reason while the
    chain-finalizing failure event is still last.
  * Wires two new state-matcher entries (ApplyLastReplyDelivered /
    ApplyLastReplyDeliveryFailed) that populate
    ConversationGAgentState.last_reply_delivery — single-field by
    design, multi-turn history reconstructable from event log.

Raise order is delivered → completed (resp. failed) so existing
consumers of "events.Last() is ConversationTurnCompletedEvent /
ConversationContinueFailedEvent" stay correct.

Tests
  - HandleLlmReplyReadyAsync_WhenDuplicateCorrelationId_CollapsesToSingleOutboundCommit
    now expects 3 events (the new Delivered event sits between
    NeedsLlmReplyEvent and TurnCompleted), with a regression check
    that LlmReplyDeliveredEvent is present in the log.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §8 streaming closeout contract — stream-local
terminal, aggregated Usage on a single closeout point, FinishReason
visibility — implemented at the run-actor boundary instead of inside
ChatRuntime. Respects aevatar#596 phasing: ChatRuntime stays a
transitional local loop (Phase A); the run-actor edge owns the
contract surface (Phase A targets) so the eventual ChatRuntime
tear-down in Phase B does not need to renegotiate it.

IConversationReplyGenerator
  - GenerateReplyAsync now returns Task<ConversationReplyResult> with
    Text, Usage (ReplyTokenUsage), and FinishReason. ReplyTokenUsage is
    a Channel.Runtime-local projection mirroring AI.Abstractions.TokenUsage
    so Channel.Runtime does not gain a reverse layer dependency on
    AI.Abstractions (CLAUDE.md "依赖反转", i.e. dependency inversion).
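
A sketch of the new return shape (names Text/Usage/FinishReason are from the commit; member layout and the request type are assumptions):

```csharp
// Channel.Runtime-local projection mirroring AI.Abstractions.TokenUsage
// without referencing it, so no reverse layer dependency is introduced.
public sealed record ReplyTokenUsage(int PromptTokens, int CompletionTokens, int TotalTokens);

public sealed record ConversationReplyResult(
    string Text,
    ReplyTokenUsage Usage,
    string? FinishReason);

public interface IConversationReplyGenerator
{
    // ConversationReplyRequest is a stand-in name for the existing parameter type.
    Task<ConversationReplyResult> GenerateReplyAsync(ConversationReplyRequest request, CancellationToken ct);
}
```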

NyxIdConversationReplyGenerator
  - Aggregates Usage across all internal LLM rounds (tool-call loop)
    via SumUsage and tracks the last non-empty FinishReason. The
    foreach over the ChatRuntime stream extracts both before falling
    through to the DeltaContent-only forwarding to the streaming sink,
    so neither metric is dropped when DeltaContent is empty.

AgentRunGAgent
  - Consumes ConversationReplyResult at the run-actor edge and logs
    the closeout once (runId / correlation / prompt|completion|total
    tokens / finishReason). LlmReplyReadyEvent semantics unchanged
    until a follow-up PR persists the closeout into actor state.

ChatRuntime
  - Field-level patch only: NormalizeStreamChunk now forwards
    chunk.FinishReason on the projected stream chunk (previously
    swallowed). No restructuring; ChatRuntime remains transitional.

Tests
  - 3 IConversationReplyGenerator mocks in AgentRunGAgentTests adapted
    to the new return type.
  - 5 ConversationReplyGeneratorTests reply assertions migrated to
    reply.Text.
  - New regression GenerateReplyAsync_AggregatesUsageAndFinishReasonAtActorEdge
    using UsageReportingProviderFactory mock — provider emits Usage on
    a mid-stream bookkeeping chunk and IsLast separately; the test
    asserts the actor-edge result carries both the aggregated tokens
    (7/11/18) and FinishReason ("stop").

Test state: 803 (ChannelRuntime) + 134 (Channel.Protocol) + 542 (AI)
= 1479 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
ADR-0021 §6 / canon §9 absorbing-finalized contract: late and stale
reply-chain signals must no-op once a run has reached
chain.finalized (terminal AgentRunStatus + cleanup_completed_at != 0
on the run actor; ProcessedCommandIds containment on the
conversation actor). Lifts the previously ad-hoc terminal checks
into single helpers and applies them to every handler entry.

AgentRunGAgent
  - New AgentRunGAgent.IsTerminal(status) helper (internal static for
    test access) and instance overload covering Dropped / Failed /
    ReplyHandedOff. New IsCleanupAlreadyCompleted() against
    AgentRunGAgentState.cleanup_completed_at_unix_ms.
  - HandleStartAsync uses IsTerminal at entry; only re-schedules
    cleanup when cleanup has not already completed.
  - HandleCleanupAsync uses IsTerminal at entry, then short-circuits
    on cleanup_completed_at != 0 to keep destroy idempotent. On the
    valid path it now persists AgentRunCleanupCompletedEvent before
    IActorRuntime.DestroyAsync, so the chain.finalized observable
    (cleanup_completed_at != 0) survives replay.
  - State matcher wires ApplyCleanupCompleted to write
    cleanup_completed_at_unix_ms.
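
The two guards can be sketched as below (helper names are from the commit; the state member spelling is an assumption):

```csharp
// Terminal statuses per the absorbing-finalized contract: once reached,
// every handler entry must no-op on late or stale signals.
internal static bool IsTerminal(AgentRunStatus status) =>
    status is AgentRunStatus.Dropped
           or AgentRunStatus.Failed
           or AgentRunStatus.ReplyHandedOff;

// chain.finalized == terminal status AND cleanup timestamp persisted,
// so the observable survives replay after DestroyAsync.
private bool IsCleanupAlreadyCompleted() =>
    State.CleanupCompletedAtUnixMs != 0;
```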

ConversationGAgent
  - New IsLlmReplyTurnFinalized(correlationId) helper centralizes the
    `ProcessedCommandIds.Contains("llm:<corr>")` check.
  - HandleLlmReplyReadyAsync, HandleLlmReplyCardStreamChunkAsync, and
    HandleNyxRelayStreamingChunkCoreAsync all dedupe through the
    helper.
  - HandleDeferredLlmReplyDroppedAsync now dedupes the same way: a
    late drop notification for an already-finalized turn (run-actor
    cleanup callback fires after a successful reply already landed)
    no-ops instead of overwriting last_reply_delivery with a
    synthetic NotRetryable ConversationContinueFailedEvent.

Tests
  - Five new #649 regressions on AgentRunGAgent covering the late-
    signal classes called out in the issue:
      * Duplicate cleanup callback destroys actor once + persists
        cleanup_completed_at.
      * Cleanup for a stale RunId no-ops, leaves cleanup_completed_at
        zero.
      * Cleanup before terminal status no-ops.
      * Duplicate start after cleanup_completed does not re-schedule
        a fresh cleanup callback or re-run the LLM.
      * Duplicate start after stale-gate Drop does not re-run the
        LLM and does not persist additional drop events.

Test state: 808 ChannelRuntime + 134 Channel.Protocol + 542 AI =
1484 pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
YueZh127 and others added 11 commits May 15, 2026 15:52
…letons

Issue #664 step 3b: ResponsesUserSkillsToolProvider must ctor-inject
the same singleton instances that the IAgentToolSource enumeration
returns to ChatRuntime, otherwise the two paths could diverge silently.
Switch AddSkills / AddOrnnSkills to TryAddSingleton<concrete> plus a
factory-based IAgentToolSource alias guarded against duplicate
registration, so concrete-by-type and enumerable-by-interface resolve
to one instance.
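
The registration pattern can be sketched as follows (SkillsAgentToolSource stands in for each concrete source; the duplicate-registration guard shape is illustrative, not the repo's exact check):

```csharp
// Register the concrete type once (idempotent)…
services.TryAddSingleton<SkillsAgentToolSource>();

// …then alias the interface to the SAME instance via a factory, guarded
// so repeated AddSkills calls do not register a second alias. Resolving
// SkillsAgentToolSource directly and enumerating IAgentToolSource now
// yield one shared singleton.
if (!services.Any(d => d.ServiceType == typeof(IAgentToolSource)
                    && d.ImplementationFactory is not null))
{
    services.AddSingleton<IAgentToolSource>(
        sp => sp.GetRequiredService<SkillsAgentToolSource>());
}
```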

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The NyxID Responses direct path previously injected only the local tools
exposed by ResponsesAevatarToolProvider (TodoWrite/Task/WebFetch/WebSearch)
into LLMRequest and never attached to the unified skill backbone, so
callers reaching aevatar through a NyxID API key (codex, Cursor, etc.)
could not use the skills they had configured in Ornn.

Main changes:

- IResponsesToolProvider becomes async and takes a typed
  ResponsesToolProviderContext (caller scope + tool metadata), aligned
  with the future IAgentToolSource.DiscoverToolsAsync;
  ResponsesToolClassifier.ClassifyAsync is renamed to match and now
  dedupes additive tools by effective tool name, so same-named tools are
  not sent to the LLM twice; name collisions are logged as warnings.
- New ResponsesUserSkillsToolProvider: explicitly ctor-injects the two
  skill backbones, SkillsAgentToolSource + OrnnAgentToolSource, and
  bridges the two additive tools use_skill + ornn_search_skills; it does
  not enumerate the IAgentToolSource collection, so sources added later
  cannot be bridged by accident.
- Mainnet host appends TryAddEnumerable<IResponsesToolProvider,
  ResponsesUserSkillsToolProvider>().
- /v1/messages keeps explicitly passing Array.Empty<IResponsesToolProvider>()
  so the Anthropic client's tool harness is not shadowed by Aevatar tools.
- Tests: classifier same-name additive dedup + async signature;
  ResponsesUserSkillsToolProvider returns only the two skill-backbone
  tools; a /v1/responses end-to-end test verifies bridged tools land in
  LLMRequest.Tools; a /v1/messages regression verifies DI-registered
  providers are not injected into the Messages path. The last two were
  validated by revert-and-rerun to confirm they catch the corresponding
  bugs.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
- docs/canon/nyxid-responses-direct.md: new canon doc describing the
  /v1/models /v1/responses /v1/messages surface exposed via the NyxID
  proxy, OpenRouter-style <service-slug>/<model> routing, and the
  Responses caller-scope/credential contract.
- docs/canon/nyxid-llm-integration.md: cross-link Responses direct path
  and note the legacy bare-model gateway routing.
- docs/operations/2026-05-13-aevatar-responses-via-nyxid-setup.md:
  refresh the Responses-via-NyxID setup walkthrough.
- src/Aevatar.Mainnet.Host.Api/README.md: list /v1/models and
  /v1/messages alongside /v1/responses with the routing and tool-shadow
  caveats.
- docs/2026-05-14-gagent-types-analysis.md: snapshot of the current
  GAgent inheritance landscape used while scoping issue #664 follow-ups.
- docs/README.md: regenerate index for the new canon, ADR, and history
  entries that have landed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Paseo writes per-session run-continuation JSON to ./.sisyphus/run-continuation/
to track agent IDs across invocations. It's local tool state, not source.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…eue failure

ADR-0021 defines DispatchPhase.RejectedDuplicate but the dispatcher
previously did a "GetAsync ?? CreateAsync" that silently re-enqueued
start commands for a runId whose actor already existed. The envelope
also used Guid.NewGuid() for the command id, so the run actor inbox
had no envelope-level idempotency to fall back on.

Changes:

- Serialize dispatch under a SemaphoreSlim and reject when
  IActorRuntime.ExistsAsync(actorId) is true; outcome carries
  RejectedDuplicate + the existing actorId.
- Build a deterministic commandId ("agent-run-start:{runId}") and put
  it on EventEnvelope.Runtime.Deduplication.OperationId so the inbox
  has a stable dedup key if the dispatcher boundary is ever bypassed.
- If ProduceAsync fails after CreateAsync succeeded, destroy the
  freshly created run actor so a retry can re-create instead of
  bouncing on RejectedDuplicate.
- Tests: parallel WhenAll dispatch returns exactly one Accepted + one
  RejectedDuplicate with the existing inbox unchanged; enqueue failure
  destroys the actor and a follow-up dispatch is accepted; stale
  request never reaches CreateAsync.
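
The dispatch sequence above can be sketched as (structure per the commit; runtime API shapes and names like `_actorRuntime`/`ProduceAsync` are assumptions):

```csharp
private readonly SemaphoreSlim _dispatchLock = new(1, 1);

public async Task<DispatchOutcome> DispatchAsync(LlmReplyRunRequest request)
{
    await _dispatchLock.WaitAsync();  // serialize: at most one create per runId wins
    try
    {
        var actorId = ActorIdFor(request.RunId);
        if (await _actorRuntime.ExistsAsync(actorId))
            return new DispatchOutcome(DispatchPhase.RejectedDuplicate, actorId);

        await _actorRuntime.CreateAsync(actorId);

        // Deterministic command id doubles as the envelope-level dedup key
        // (EventEnvelope.Runtime.Deduplication.OperationId) if this boundary
        // is ever bypassed.
        var commandId = $"agent-run-start:{request.RunId}";
        try
        {
            await ProduceAsync(actorId, commandId, request);
        }
        catch
        {
            // Enqueue failed after create succeeded: destroy the fresh actor
            // so a retry re-creates instead of bouncing on RejectedDuplicate.
            await _actorRuntime.DestroyAsync(actorId);
            throw;
        }
        return new DispatchOutcome(DispatchPhase.Accepted, actorId);
    }
    finally
    {
        _dispatchLock.Release();
    }
}
```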

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Commit 632d09f documented /v1/responses with conservative wording from
before issue #664 landed: it said "Ornn 还没桥接" ("Ornn is not bridged
yet") and "aevatar_* additive tool 接口已预留" ("the aevatar_* additive
tool interface is reserved"). Issue #664 (commits 6c11877, 4c15423) now
ships ResponsesUserSkillsToolProvider, so the docs need to describe the
shipped behavior:

- canon/nyxid-responses-direct.md: split tool classes into three
  (substitute / additive / forwarded), name use_skill +
  ornn_search_skills as the live additive set, and call out that they
  walk through NyxID proxy under the caller's bearer (so a restricted
  NyxID API key must keep the Ornn API service in --allowed-services).
- operations/2026-05-13-aevatar-responses-via-nyxid-setup.md: note that
  the bridge is /v1/responses-only, mention the Ornn-service allowlist
  step in the codex/Responses configuration block, refresh the
  Messages-client notes, and replace the "Ornn 还没桥接" troubleshooting
  row with bridge-aware ones.
- src/Aevatar.Mainnet.Host.Api/README.md: replace the aevatar_*
  placeholder line with use_skill / ornn_search_skills additives and
  spell out that /v1/messages does NOT inject them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…-chain-regressions

Test/2026 05 12 lark bot reply chain regressions