
Fix provider, tool safety, i18n, and cloud stability issues#97

Merged
guima-why merged 16 commits into main from fix_issue_260603 on Jun 5, 2026

@guima-why (Collaborator)

Summary

This PR fixes a batch of agent runtime, provider streaming, tool safety, cloud deployment, persistence, UI/i18n, and documentation issues reported across GitHub issues #67, #68, #76, #77, #78, #80, #81, #82, #83, #84, #85, #87, #88, #89, #90, and #92.

It also rebases the branch onto the latest origin/main, resolves translation conflicts, runs make translate, and keeps generated translation artifacts (messages.pot, messages.mo) and root docs/ files out of the final branch diff while leaving them available locally.

Fixed Issues

Fixes #67 - Provider streaming fallback, watchdog, and qwenpaw shutdown safety

  • Prevented transient provider failures from permanently downgrading the active model by using a temporary fallback provider/model for the retry path instead of mutating ProviderManager state.
  • Wrapped streaming __anext__() calls with an idle timeout so half-open or stalled streams can fall back instead of hanging forever.
  • Converted qwenpaw config-load failures from sys.exit(1) inside the async generator into a provider configuration error that is surfaced as a non-retryable stream error event.
  • Added regression coverage for fallback state preservation, stream timeout behavior, fallback telemetry, and configuration error handling.
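A minimal sketch of the idle-timeout idea from the second bullet, assuming the provider stream is exposed as an async iterator. The names iter_with_idle_timeout and StreamIdleTimeout are hypothetical, not the PR's code; the caller would catch the exception and retry through a temporary fallback provider instead of mutating ProviderManager state.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class StreamIdleTimeout(Exception):
    """Hypothetical error raised when a stream produces nothing within the idle window."""


async def iter_with_idle_timeout(stream: AsyncIterator[T], idle_timeout: float) -> AsyncIterator[T]:
    """Re-yield chunks from `stream`, failing if any single __anext__() stalls too long."""
    iterator = stream.__aiter__()
    while True:
        try:
            # Bound each individual read, not the whole stream: long but active
            # streams keep going, while half-open connections are detected.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_timeout)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise StreamIdleTimeout(f"no chunk received for {idle_timeout}s") from None
        yield chunk
```

Timing each read separately, rather than the whole response, is what lets a slow but healthy stream finish while a silent one fails fast.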

Fixes #68 - Skill renderer shell execution injection after parameter substitution

  • Hardened skill rendering so shell execution checks apply after arguments are rendered, preventing $ARGUMENTS or named parameters from injecting new shell execution blocks that bypass the original permission check.
  • Added escaping/sanitization paths for rendered skill parameters and tightened project skill shell execution handling.
  • Updated skill renderer and skill tool tests to cover injected fenced shell blocks, bang-prefixed shell lines, and post-render permission behavior.
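The ordering fix can be sketched as "render first, then check the rendered text". This assumes a skill body where shell execution is marked by a fenced bash/sh/shell block or a line starting with "!"; render_skill, has_shell_execution, and both patterns are hypothetical illustrations, not the PR's renderer.

```python
import re

# Hypothetical patterns for "shell execution" markers in a skill body:
# a fenced block tagged bash/sh/shell, or a line starting with "!".
SHELL_FENCE = re.compile(r"^[ \t]*`{3,}[ \t]*(?:bash|sh|shell)\b", re.MULTILINE)
BANG_LINE = re.compile(r"^[ \t]*!", re.MULTILINE)


def has_shell_execution(text: str) -> bool:
    return bool(SHELL_FENCE.search(text) or BANG_LINE.search(text))


def render_skill(template: str, arguments: str, allow_shell: bool) -> str:
    """Substitute $ARGUMENTS, then run the permission check on the rendered text."""
    rendered = template.replace("$ARGUMENTS", arguments)
    # Checking `template` alone would miss blocks smuggled in through `arguments`.
    if not allow_shell and has_shell_execution(rendered):
        raise PermissionError("rendered skill contains shell execution")
    return rendered
```

A template such as "Summarize: $ARGUMENTS" contains no shell markers, but the argument "x\n!rm -rf /" introduces one; only a post-render check sees it.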

Fixes #76 - /status context token estimation excludes tools and undercounts non-OpenAI models

  • Added tool definition token accounting to ContextManager and synchronized tool definitions from AgentLoop whenever the registry/provider changes.
  • Added model-family-aware fallback token estimation for Qwen/Kimi/GLM/Doubao/MiniMax/Gemini-style models, improving CJK-heavy token estimates where cl100k_base is a poor proxy.
  • Updated /status and context manager tests so context usage includes tool schemas and no longer significantly underestimates API input tokens.
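A rough sketch of the two ideas above: counting serialized tool schemas toward context usage, and a character-class fallback estimator whose CJK ratio depends on the model family. The ratios in CJK_TOKENS_PER_CHAR are placeholders chosen for illustration, not the values used in the PR, and every function name here is hypothetical.

```python
import json
import math
from typing import Dict, List

# Placeholder ratios for illustration only; the PR's per-family values are not given here.
CJK_TOKENS_PER_CHAR: Dict[str, float] = {"qwen": 0.7, "kimi": 0.7, "glm": 0.7}
DEFAULT_CJK_TOKENS_PER_CHAR = 1.0


def _is_cjk(ch: str) -> bool:
    # CJK Unified Ideographs, Hiragana/Katakana, and Hangul syllables.
    return "\u4e00" <= ch <= "\u9fff" or "\u3040" <= ch <= "\u30ff" or "\uac00" <= ch <= "\ud7af"


def cjk_ratio_for(model: str) -> float:
    name = model.lower()
    for prefix, ratio in CJK_TOKENS_PER_CHAR.items():
        if name.startswith(prefix):
            return ratio
    return DEFAULT_CJK_TOKENS_PER_CHAR


def estimate_tokens(text: str, model: str = "") -> int:
    cjk = sum(1 for ch in text if _is_cjk(ch))
    other = len(text) - cjk
    # Rough heuristic: about 4 non-CJK characters per token.
    return math.ceil(other / 4 + cjk * cjk_ratio_for(model))


def estimate_context_tokens(messages: List[str], tools: List[dict], model: str = "") -> int:
    total = sum(estimate_tokens(m, model) for m in messages)
    # Tool schemas are sent with every request, so they count toward input tokens.
    total += sum(estimate_tokens(json.dumps(t, ensure_ascii=False), model) for t in tools)
    return total
```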

Fixes #77 - ROS stack status, telemetry, and async blocking behavior

  • Narrowed stack success detection to genuine success states, excluding rollback and delete completion states from is_success.
  • Moved deployment success telemetry to the point where polling confirms the final stack operation succeeded, rather than immediately after create/update API submission.
  • Wrapped synchronous ROS SDK calls in async thread execution so stack create/update/delete/status/resource calls do not block the event loop.
  • Added ROS stack tests for rollback/delete statuses, telemetry timing, async SDK wrapping, stack outputs, and failure reporting.
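The success-state and telemetry-timing fixes can be illustrated with a polling loop that runs a synchronous status call through asyncio.to_thread and reports success only once a terminal state is reached. The status names assume CloudFormation-style ROS stack states, and wait_for_stack, get_status, and on_success are hypothetical stand-ins for the real SDK wrapper.

```python
import asyncio
from typing import Callable, Optional

# Assumed ROS status names; only genuine completions count as success.
SUCCESS_STATES = frozenset({"CREATE_COMPLETE", "UPDATE_COMPLETE"})


def is_success(status: str) -> bool:
    # ROLLBACK_COMPLETE and DELETE_COMPLETE are terminal but are not successes.
    return status in SUCCESS_STATES


async def wait_for_stack(
    get_status: Callable[[str], str],
    stack_id: str,
    poll_interval: float = 0.01,
    on_success: Optional[Callable[[str], None]] = None,
) -> str:
    """Poll until the stack leaves *_IN_PROGRESS; emit success telemetry only then."""
    while True:
        # The SDK call is synchronous, so run it in a worker thread.
        status = await asyncio.to_thread(get_status, stack_id)
        if status.endswith("_IN_PROGRESS"):
            await asyncio.sleep(poll_interval)
            continue
        if is_success(status) and on_success is not None:
            on_success(status)
        return status
```

Firing on_success here, instead of right after the create/update request is accepted, is what keeps a stack that later rolls back from being counted as a successful deployment.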

Fixes #78 - Bash readonly classification: broad pip matching and duplicated sed detection

  • Tightened pip-like command matching so pip, pip3, and versioned pip executables match, while unrelated commands such as pipx, pip-audit, and pipeline-* do not inherit pip readonly behavior.
  • Centralized sed in-place detection so permission and readonly classification use one shared implementation.
  • Added readonly command tests for pip-like names and sed in-place variants.
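A minimal sketch of both checks, assuming argv has already been split. The regex accepts pip, pip3, and versioned names like pip3.12 via a full match on the basename, and sed_is_in_place is a hypothetical shared helper that walks GNU-style short option clusters so that "-ni" is caught while the script argument of "-e" is skipped.

```python
import os
import re
from typing import List

# pip, pip3, pip3.12 ... but not pipx, pip-audit, or pipeline-*.
PIP_LIKE = re.compile(r"pip(?:\d+(?:\.\d+)*)?")


def is_pip_like(executable: str) -> bool:
    return PIP_LIKE.fullmatch(os.path.basename(executable)) is not None


# Short sed options that consume an argument (GNU: -e script, -f file, -l N).
_SED_ARG_SHORT = set("efl")
_SED_ARG_LONG = {"--expression", "--file", "--line-length"}


def sed_is_in_place(argv: List[str]) -> bool:
    """Shared check used by both permission and readonly classification."""
    args = iter(argv[1:])
    for arg in args:
        if arg == "--":
            break
        if arg == "--in-place" or arg.startswith("--in-place="):
            return True
        if arg in _SED_ARG_LONG:
            next(args, None)  # skip the option's argument
            continue
        if arg.startswith("-") and not arg.startswith("--") and len(arg) > 1:
            cluster = arg[1:]
            for idx, ch in enumerate(cluster):
                if ch == "i":
                    return True  # -i, -i.bak, -ni, -Ei ...
                if ch in _SED_ARG_SHORT:
                    if idx == len(cluster) - 1:
                        next(args, None)  # argument is the next argv element
                    break  # the rest of the cluster is the option's argument
    return False
```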

Fixes #80 - Atomic writes and incorrect decoding/version comparison

  • Corrected session index fallback JSON string decoding so escaped backslashes are handled before newline/tab/quote unescaping.
  • Switched config and input history persistence to atomic temp-file plus replace flows to avoid partial/corrupt writes on concurrent sessions or interrupted processes.
  • Changed invalid-version update comparisons to return False instead of lexicographic ordering, preventing bogus update decisions such as 9.0.0-local > 10.0.0-local.
  • Added tests for atomic config/history writes, session index decoding, and invalid version handling.
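The atomic write and strict version comparison can be sketched with the standard library alone. atomic_write_text, parse_version, and is_newer are hypothetical names; the strict parser here rejects any suffix such as "-local", which is one way to get the "return False for invalid versions" behavior, not necessarily the PR's exact rule.

```python
import os
import re
import tempfile
from pathlib import Path
from typing import Optional, Tuple, Union


def atomic_write_text(path: Union[str, os.PathLike], text: str, encoding: str = "utf-8") -> None:
    """Write to a temp file in the same directory, then os.replace() it into place."""
    target = Path(path)
    # Same directory, so os.replace() stays on one filesystem and is atomic.
    fd, tmp_name = tempfile.mkstemp(dir=target.parent, prefix=f".{target.name}.", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as fh:
            fh.write(text)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        # Never leave a half-written temp file behind.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise


_VERSION = re.compile(r"\d+(?:\.\d+)*")


def parse_version(text: str) -> Optional[Tuple[int, ...]]:
    if _VERSION.fullmatch(text) is None:
        return None
    return tuple(int(part) for part in text.split("."))


def is_newer(latest: str, current: str) -> bool:
    """Return False when either side is unparseable, never a string comparison."""
    a, b = parse_version(latest), parse_version(current)
    if a is None or b is None:
        return False
    return a > b
```

Readers of the config file see either the old contents or the new contents, never a partial write, and numeric tuples order 1.10.0 above 1.9.0 where a string comparison would not.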

Fixes #81 - Aliyun credential robustness and OAuth HTTP client lifecycle

  • Made Aliyun CLI config parsing skip malformed profiles instead of crashing when a profile lacks a name field.
  • Added defensive handling for malformed profile data and credential lookup fallbacks.
  • Added lifecycle management for internally-created OAuth HTTP clients and updated OAuth flows to close clients after browser login/refresh operations.
  • Added tests for malformed CLI configs, profile selection, OAuth refresh, and HTTP client close behavior.
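Skipping malformed profiles can be sketched as below, assuming the Aliyun CLI config has been loaded from JSON into a dict with a "current" key and a "profiles" list whose entries carry a "name" field (an assumption about the file layout). load_profiles and select_profile are hypothetical helpers.

```python
from typing import Any, Dict, Optional


def load_profiles(config: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Index profiles by name, skipping entries that are malformed."""
    profiles: Dict[str, Dict[str, Any]] = {}
    raw = config.get("profiles")
    if not isinstance(raw, list):
        return profiles
    for entry in raw:
        if not isinstance(entry, dict):
            continue  # e.g. a stray string or null in the list
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            continue  # profile without a usable name: skip instead of KeyError
        profiles[name] = entry
    return profiles


def select_profile(config: Dict[str, Any], requested: Optional[str] = None) -> Optional[Dict[str, Any]]:
    """Prefer the requested profile, then config["current"], then "default"."""
    profiles = load_profiles(config)
    for candidate in (requested, config.get("current"), "default"):
        if isinstance(candidate, str) and candidate in profiles:
            return profiles[candidate]
    return None
```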

Fixes #82 - Context compaction splits tool-use round trips

  • Reworked compaction split selection so preserved messages respect tool-use/tool-result boundaries instead of blindly using preserve_recent_turns * 2.
  • Prevented compacted contexts from starting with orphaned tool results or separating assistant tool_use blocks from their matching tool_result messages.
  • Added context manager tests for tool-use round trips and safe split points.
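The safe split point can be sketched as "start from the naive preserve_recent_turns * 2 cut, then walk backwards until the preserved suffix contains no orphaned tool results". The message shape (tool_use_ids / tool_result_ids lists on plain dicts) is a hypothetical simplification of the real message format.

```python
from typing import Any, Dict, List, Set

Message = Dict[str, Any]


def _ids(messages: List[Message], key: str) -> Set[str]:
    return {i for m in messages for i in m.get(key, ())}


def safe_split_index(messages: List[Message], preserve_recent_turns: int) -> int:
    """Return the index where the preserved (un-summarized) suffix should start.

    Starts from the naive `preserve_recent_turns * 2` split and walks backwards
    until every tool_result in the suffix has its tool_use in the suffix too.
    """
    idx = max(0, len(messages) - preserve_recent_turns * 2)
    while idx > 0:
        preserved = messages[idx:]
        if _ids(preserved, "tool_result_ids") <= _ids(preserved, "tool_use_ids"):
            break
        idx -= 1
    return idx
```

With a four-message history of user prompt, assistant tool_use, tool_result, and final answer, the naive cut for one turn would start at the tool_result; the loop moves it back one message so the round trip stays together.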

Fixes #83 - Tool interface and fallback correctness issues

  • Updated memory tools to match the base tool interface contract by using keyword-only execute parameters.
  • Made Python grep fallback match globs against relative paths so src/**/*.py behaves consistently with rg --glob.
  • Made list_files skip entries that cannot be statted, including broken symlinks, instead of failing the entire directory listing.
  • Added tests for memory tool signatures, grep fallback path globs, and broken symlink directory listing.

Fixes #84 - UI/status display and shell history performance

  • Fixed /model custom base URL telemetry by reading apiBase from the active provider config instead of treating activeProvider as a dict.
  • Improved compact token formatting so thousands are rounded with better precision instead of truncating with integer division.
  • Added mtime/size-based shell history caching so suggestions do not reread and parse the full shell history file on every keystroke.
  • Added tests for model telemetry, status formatting, and shell history cache refresh behavior.
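The formatting change can be illustrated by contrast with integer division: f"{n // 1000}k" renders 1999 as "1k". The sketch below rounds to one decimal place, which is an assumption (the PR only says "better precision"), and promotes to "M" when rounding would otherwise produce "1000k". format_tokens is a hypothetical name.

```python
def format_tokens(n: int) -> str:
    """Compact token count: 1500 -> "1.5k", 12345 -> "12.3k", 2_000_000 -> "2M"."""
    if n < 1000:
        return str(n)
    value, suffix = round(n / 1000, 1), "k"
    if value >= 1000:  # e.g. 999_999 would otherwise render as "1000k"
        value, suffix = round(n / 1_000_000, 1), "M"
    # One decimal place, dropping a trailing ".0" ("128.0k" -> "128k").
    text = f"{value:.1f}".rstrip("0").rstrip(".")
    return text + suffix
```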

Fixes #85 - i18n pluralization uses English suffix rules

  • Added ngettext support in the i18n layer and replaced hardcoded English plural suffix formatting in resume picker UI with gettext plural forms.
  • Updated translation catalogs for all supported locales via make translate.
  • Added tests for localized plural rendering and translation completeness.
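The plural fix amounts to routing counts through ngettext so each catalog's Plural-Forms header selects the form. In this sketch the "messages" domain mirrors messages.pot/messages.mo from the PR, while the localedir layout and both helper names are assumptions; with fallback=True, a missing catalog yields NullTranslations, which applies English rules.

```python
import gettext


def load_translations(localedir: str, language: str) -> gettext.NullTranslations:
    # Domain "messages" mirrors messages.pot/messages.mo; localedir layout is assumed.
    return gettext.translation("messages", localedir=localedir, languages=[language], fallback=True)


def sessions_label(t: gettext.NullTranslations, n: int) -> str:
    # Wrong: f"{n} session{'s' if n != 1 else ''}" hardcodes English plural rules.
    # ngettext lets each catalog's Plural-Forms header pick the right form.
    return t.ngettext("{n} session", "{n} sessions", n).format(n=n)
```

Languages such as Chinese have a single plural form and Russian or Polish have several; only the catalog knows which, which is why the suffix logic cannot live in the UI code.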

Fixes #87 - Read-only tools auto-approve sensitive file reads

  • Added path safety checks for read-class tools so out-of-project and sensitive paths are not silently auto-approved.
  • Added a trusted root mechanism for explicitly allowed read directories and wired it through permission contexts, loaders, ACP terminal rules, and shell escape flows.
  • Applied sensitive-path checks to read-only bash commands such as cat, grep, head, and related file-reading commands.
  • Added tests for project boundaries, trusted roots, sensitive files, ACP permission rules, and shell read permissions.

Fixes #88 - Readonly whitelist commands allow dangerous exec/delete arguments

  • Added argument-level bash safety checks for dangerous readonly-looking commands: find/fd exec/delete flags, GNU sed shell execution, ripgrep --pre, sort --compress-program, and related unsafe forms.
  • Added robust argv parsing helpers so command checks reason over parsed arguments rather than only command basenames or raw strings.
  • Added tests for dangerous find, fd, sed, rg, and sort invocations across permission and readonly classification paths.
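Argument-level checks can be sketched by splitting the command with shlex and comparing each option (including the "--flag=value" form) against a per-command deny-list. The table below is an illustrative subset; GNU sed's e command and abbreviated long options such as "--compress-prog" are not handled by this sketch.

```python
import os
import shlex
from typing import Dict, FrozenSet

# Flags that turn an otherwise read-only command into exec/write/delete.
# Illustrative subset; GNU sed's `e` command/flag needs script parsing and is not covered.
DANGEROUS_FLAGS: Dict[str, FrozenSet[str]] = {
    "find": frozenset({"-exec", "-execdir", "-ok", "-okdir", "-delete",
                       "-fprint", "-fprint0", "-fprintf", "-fls"}),
    "fd": frozenset({"-x", "--exec", "-X", "--exec-batch"}),
    "rg": frozenset({"--pre"}),
    "sort": frozenset({"--compress-program"}),
}


def is_dangerous_readonly_invocation(command: str) -> bool:
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unbalanced quotes: fail closed rather than guess
    if not argv:
        return False
    flags = DANGEROUS_FLAGS.get(os.path.basename(argv[0]), frozenset())
    for arg in argv[1:]:
        # Handle "--pre=cmd" and "--compress-program=prog" as well as separate values.
        if arg.split("=", 1)[0] in flags:
            return True
    return False
```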

Fixes #89 - read_file and web_fetch read full content before truncation

  • Added hard read limits for file reads, including byte and line caps, binary/oversized file handling, and clearer truncation reporting.
  • Reworked web fetch to stream response bytes up to the configured cap instead of reading the complete body first via response.text.
  • Added resource-exhaustion tests for large files, large HTTP responses, streaming truncation, and decode behavior.

Fixes #90 - AgentTool background task lifecycle and event queue cross-talk

  • Tracked background agent tasks through task state/manager plumbing so task lifecycle can be updated and stopped more predictably.
  • Ensured background task cancellation and exceptions close event queues and update task state instead of leaving fire-and-forget tasks unmanaged.
  • Moved per-call event queue handling out of shared, cross-talk-prone state and aligned AgentTool's concurrency-safe classification with its actual lifecycle constraints.
  • Added tests for background success/failure/cancellation, queue closure, task stop behavior, and concurrent event queue isolation.
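The lifecycle tracking can be sketched with a manager that owns each background asyncio task, records its final state in a done-callback, and always closes that call's own queue. BackgroundTaskManager and CLOSED are hypothetical simplifications of the PR's TaskManager plumbing; giving each call its own asyncio.Queue is what removes the cross-talk.

```python
import asyncio
from typing import Any, Coroutine, Dict

CLOSED = object()  # sentinel telling a consumer that no more events will arrive


class BackgroundTaskManager:
    """Hypothetical stand-in for the PR's TaskManager: owns tasks and their queues."""

    def __init__(self) -> None:
        self._tasks: Dict[str, asyncio.Task] = {}
        self.states: Dict[str, str] = {}

    def start(self, task_id: str, coro: Coroutine[Any, Any, Any], queue: asyncio.Queue) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks[task_id] = task
        self.states[task_id] = "running"
        task.add_done_callback(lambda t: self._on_done(task_id, t, queue))
        return task

    def _on_done(self, task_id: str, task: asyncio.Task, queue: asyncio.Queue) -> None:
        if task.cancelled():
            self.states[task_id] = "cancelled"
        elif task.exception() is not None:  # retrieving it avoids "never retrieved" warnings
            self.states[task_id] = "failed"
        else:
            self.states[task_id] = "completed"
        queue.put_nowait(CLOSED)  # always close the per-call queue
        self._tasks.pop(task_id, None)

    def stop(self, task_id: str) -> bool:
        task = self._tasks.get(task_id)
        if task is None:
            return False
        task.cancel()
        return True
```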

Fixes #92 - ShellHistoryProvider rereads full history on each keystroke

  • Added shell history caching keyed by file metadata so repeated suggestions reuse parsed entries until the history file changes.
  • Preserved deduplication and suggestion filtering while avoiding repeated full-file reads during interactive typing.
  • Added tests for cache hits, cache invalidation, and bounded suggestion behavior.

Additional Notes

  • Updated ACP protocol reference docs under website/docs and all website i18n copies.
  • Kept root docs/ planning files, src/iac_code/i18n/messages.pot, and compiled messages.mo files out of git tracking; they remain generated/ignored artifacts.
  • Rebased onto the latest origin/main and reviewed compatibility with new main changes in update checking, provider robustness, and agent stream termination.

Test Plan

  • PATH=/Users/ehzyo/.local/bin:$PATH make translate
  • PATH=/Users/ehzyo/.local/bin:$PATH make lint
  • PATH=/Users/ehzyo/.local/bin:$PATH make test (4454 passed, 244 warnings)
  • Rebase compatibility spot-check for agent_loop, update_checker, OpenAI/OpenRouter provider changes, and ProviderManager fallback behavior

guima-why added 16 commits June 5, 2026 00:01
Fix context compaction so tool_use/tool_result round trips are not split across summarized and preserved messages.

Track background agent asyncio tasks in TaskManager so /tasks stop and task_stop cancel running coroutines, consume background task exceptions, and avoid shared event_queue state on concurrency-safe AgentTool calls.

Fixes #82. Fixes #90.
Fix issue #80 by adding a shared atomic text writer, using it for YAML config and input history saves, preserving truncated session JSON escapes, and disabling lexicographic comparison for invalid update versions.
Fixes #92, #84, #85, and the approved medium scope of #76.

- Cache shell history suggestions by history file identity and cap results.
- Fix /model custom apiBase detection and preserve custom apiBase on model switch.
- Improve /status compact token formatting.
- Use gettext ngettext for resume picker plurals and update catalogs.
- Include tool definitions in context usage and add model-family token estimate fallbacks.
Fix ROS stack success detection, terminal telemetry timing, and blocking SDK calls.

Harden Aliyun CLI credential loading and close OAuth clients on owned lifecycle paths.

Refs #77, #81.
@guima-why merged commit fce3881 into main on Jun 5, 2026
14 checks passed
@guima-why deleted the fix_issue_260603 branch on June 5, 2026 02:00