Fix provider, tool safety, i18n, and cloud stability issues by guima-why · Pull Request #97 · aliyun/iac-code

guima-why · 2026-06-05T01:45:50Z

Summary

This PR fixes a batch of agent runtime, provider streaming, tool safety, cloud deployment, persistence, UI/i18n, and documentation issues reported across GitHub issues #67, #68, #76, #77, #78, #80, #81, #82, #83, #84, #85, #87, #88, #89, #90, and #92.

It also rebases the branch onto the latest origin/main, resolves translation conflicts, runs make translate, and keeps generated translation artifacts (messages.pot, messages.mo) and root docs/ files out of the final branch diff while leaving them available locally.

Fixed Issues

Fixes #67 - Provider streaming fallback, watchdog, and qwenpaw shutdown safety

Prevented transient provider failures from permanently downgrading the active model by using a temporary fallback provider/model for the retry path instead of mutating ProviderManager state.
Wrapped streaming __anext__() calls with an idle timeout so half-open or stalled streams can fall back instead of hanging forever.
Converted qwenpaw config-load failures from sys.exit(1) inside the async generator into a provider configuration error that is surfaced as a non-retryable stream error event.
Added regression coverage for fallback state preservation, stream timeout behavior, fallback telemetry, and configuration error handling.
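
The idle-timeout idea can be sketched as below. This is a minimal illustration, not the PR's code: `iter_with_idle_timeout` and `StreamIdleTimeout` are hypothetical names, and the only assumption is that the provider stream is an ordinary async iterator.

```python
import asyncio
from typing import AsyncIterator, TypeVar

T = TypeVar("T")


class StreamIdleTimeout(Exception):
    """Hypothetical error raised when no chunk arrives within the idle window."""


async def iter_with_idle_timeout(
    stream: AsyncIterator[T], idle_seconds: float
) -> AsyncIterator[T]:
    """Yield chunks from `stream`, failing if any single wait exceeds idle_seconds."""
    iterator = stream.__aiter__()
    while True:
        try:
            # The timeout applies to each chunk, not to the whole stream.
            chunk = await asyncio.wait_for(iterator.__anext__(), timeout=idle_seconds)
        except StopAsyncIteration:
            return
        except asyncio.TimeoutError:
            raise StreamIdleTimeout(f"no data for {idle_seconds}s") from None
        yield chunk
```

A caller could catch `StreamIdleTimeout` and switch to a temporary fallback provider for the retry, leaving the configured provider state untouched.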

Fixes #68 - Skill renderer shell execution injection after parameter substitution

Hardened skill rendering so shell execution checks apply after arguments are rendered, preventing $ARGUMENTS or named parameters from injecting new shell execution blocks that bypass the original permission check.
Added escaping/sanitization paths for rendered skill parameters and tightened project skill shell execution handling.
Updated skill renderer and skill tool tests to cover injected fenced shell blocks, bang-prefixed shell lines, and post-render permission behavior.
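
A rough sketch of checking after rendering rather than before. Everything here is an assumption for illustration: the function names are hypothetical, and only bang-prefixed lines are treated as shell execution (the real renderer also handles fenced shell blocks).

```python
def render_skill(template: str, arguments: str) -> str:
    """Substitute the $ARGUMENTS placeholder (simplified stand-in renderer)."""
    return template.replace("$ARGUMENTS", arguments)


def shell_lines(text: str) -> list[str]:
    """Return lines that would trigger shell execution (bang-prefixed only here)."""
    return [line.strip() for line in text.splitlines() if line.lstrip().startswith("!")]


def render_checked(template: str, arguments: str, shell_allowed: bool) -> str:
    """Render first, then apply the shell execution checks to the rendered text."""
    rendered = render_skill(template, arguments)
    declared = shell_lines(template)
    found = shell_lines(rendered)
    if len(found) > len(declared):
        # The arguments introduced shell lines the skill author never wrote.
        raise PermissionError("arguments introduced new shell execution lines")
    if found and not shell_allowed:
        raise PermissionError("shell execution is not permitted for this skill")
    return rendered
```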

Fixes #76 - `/status` context token estimation excludes tools and undercounts non-OpenAI models

Added tool definition token accounting to ContextManager and synchronized tool definitions from AgentLoop whenever the registry/provider changes.
Added model-family-aware fallback token estimation for Qwen/Kimi/GLM/Doubao/MiniMax/Gemini-style models, improving CJK-heavy token estimates where cl100k_base is a poor proxy.
Updated /status and context manager tests so context usage includes tool schemas and no longer significantly underestimates API input tokens.

Fixes #77 - ROS stack status, telemetry, and async blocking behavior

Narrowed stack success detection to genuine success states, excluding rollback and delete completion states from is_success.
Moved deployment success telemetry to the point where polling confirms the final stack operation succeeded, rather than immediately after create/update API submission.
Wrapped synchronous ROS SDK calls in async thread execution so stack create/update/delete/status/resource calls do not block the event loop.
Added ROS stack tests for rollback/delete statuses, telemetry timing, async SDK wrapping, stack outputs, and failure reporting.
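
The two ideas above (a narrow success set and off-loop SDK calls) might look roughly like this. The status names are an illustrative subset, and `blocking_get_status` is a hypothetical stand-in for a synchronous SDK call, not the real ROS client API.

```python
import asyncio
import time

# Only genuine success states count; rollback and delete completion do not.
SUCCESS_STATUSES = frozenset({"CREATE_COMPLETE", "UPDATE_COMPLETE"})


def is_success(status: str) -> bool:
    return status in SUCCESS_STATUSES


def blocking_get_status(stack_id: str) -> str:
    """Stand-in for a synchronous SDK call that blocks on network I/O."""
    time.sleep(0.05)
    return "CREATE_COMPLETE"


async def get_status(stack_id: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays responsive.
    return await asyncio.to_thread(blocking_get_status, stack_id)
```

With this shape, success telemetry would be emitted only once a polled status passes `is_success`, not when the create or update request is submitted.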

Fixes #78 - Bash readonly classification: broad pip matching and duplicated sed detection

Tightened pip-like command matching so pip, pip3, and versioned pip executables match, while unrelated commands such as pipx, pip-audit, and pipeline-* do not inherit pip readonly behavior.
Centralized sed in-place detection so permission and readonly classification use one shared implementation.
Added readonly command tests for pip-like names and sed in-place variants.
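
One way to express the tightened matching is a full-match pattern on the executable's basename. This is a sketch under that assumption; `is_pip_command` is a hypothetical name and the PR's actual pattern may differ.

```python
import os
import re

# Matches pip, pip3, pip3.11 and similar, but nothing with extra suffixes.
PIP_NAME = re.compile(r"pip(\d+(\.\d+)*)?")


def is_pip_command(executable: str) -> bool:
    """Return True only for pip itself or a versioned pip executable."""
    return PIP_NAME.fullmatch(os.path.basename(executable)) is not None
```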

Fixes #80 - Atomic writes and incorrect decoding/version comparison

Corrected session index fallback JSON string decoding so escaped backslashes are handled before newline/tab/quote unescaping.
Switched config and input history persistence to atomic temp-file plus replace flows to avoid partial/corrupt writes on concurrent sessions or interrupted processes.
Changed invalid-version update comparisons to return False instead of lexicographic ordering, preventing bogus update decisions such as 9.0.0-local > 10.0.0-local.
Added tests for atomic config/history writes, session index decoding, and invalid version handling.
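
The temp-file plus replace flow generally follows the pattern below. This is a generic sketch, not the shared writer added in the PR; `atomic_write_text` is a hypothetical name.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path: Path, text: str, encoding: str = "utf-8") -> None:
    """Write text so readers see either the old file or the complete new one."""
    path = Path(path)
    # The temp file lives in the target directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".", suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)
    except BaseException:
        # Clean up the temp file if the write or the replace failed.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```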

Fixes #81 - Aliyun credential robustness and OAuth HTTP client lifecycle

Made Aliyun CLI config parsing skip malformed profiles instead of crashing when a profile lacks a name field.
Added defensive handling for malformed profile data and credential lookup fallbacks.
Added lifecycle management for internally-created OAuth HTTP clients and updated OAuth flows to close clients after browser login/refresh operations.
Added tests for malformed CLI configs, profile selection, OAuth refresh, and HTTP client close behavior.
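
Skipping malformed profiles could be done along these lines. The sketch assumes a config document with a top-level `profiles` list whose entries carry a `name` field, as the bullet above implies; `load_profiles` is a hypothetical helper.

```python
import json
from typing import Any


def load_profiles(raw_json: str) -> dict[str, dict[str, Any]]:
    """Index profiles by name, ignoring entries that are malformed."""
    data = json.loads(raw_json)
    entries = data.get("profiles") if isinstance(data, dict) else None
    if not isinstance(entries, list):
        return {}
    profiles: dict[str, dict[str, Any]] = {}
    for entry in entries:
        if not isinstance(entry, dict):
            continue  # not a profile object at all
        name = entry.get("name")
        if not isinstance(name, str) or not name:
            continue  # skip instead of raising KeyError
        profiles[name] = entry
    return profiles
```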

Fixes #82 - Context compaction splits tool-use round trips

Reworked compaction split selection so preserved messages respect tool-use/tool-result boundaries instead of blindly using preserve_recent_turns * 2.
Prevented compacted contexts from starting with orphaned tool results or separating assistant tool_use blocks from their matching tool_result messages.
Added context manager tests for tool-use round trips and safe split points.
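
A simplified sketch of choosing a safe split point. The message shape is an assumption made for illustration: each message is a dict that may carry `tool_use_ids` and `tool_result_ids`, which is not necessarily how ContextManager models messages.

```python
from typing import Any


def has_orphan_result(tail: list[dict[str, Any]]) -> bool:
    """True if the tail contains a tool result whose tool use is not in the tail."""
    seen_uses: set[str] = set()
    for message in tail:
        seen_uses.update(message.get("tool_use_ids", ()))
        if any(rid not in seen_uses for rid in message.get("tool_result_ids", ())):
            return True
    return False


def safe_split_index(messages: list[dict[str, Any]], preserve_count: int) -> int:
    """Return the index where preserved messages start, moved earlier if needed."""
    split = max(0, len(messages) - preserve_count)
    # Walk backwards until every preserved tool result has its tool use preserved too.
    while split > 0 and has_orphan_result(messages[split:]):
        split -= 1
    return split
```

Messages before the returned index would be summarized; messages from the index onward are kept verbatim.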

Fixes #83 - Tool interface and fallback correctness issues

Updated memory tools to match the base tool interface contract by using keyword-only execute parameters.
Made Python grep fallback match globs against relative paths so src/**/*.py behaves consistently with rg --glob.
Made list_files skip entries that cannot be statted, including broken symlinks, instead of failing the entire directory listing.
Added tests for memory tool signatures, grep fallback path globs, and broken symlink directory listing.
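
For the directory listing fix, the general pattern is to treat a failed stat as "skip this entry". A minimal sketch, with `list_entries` as a hypothetical helper rather than the actual list_files tool:

```python
import os


def list_entries(directory: str) -> list[tuple[str, int]]:
    """Return (name, size) pairs, skipping entries that cannot be statted."""
    results: list[tuple[str, int]] = []
    with os.scandir(directory) as iterator:
        for entry in iterator:
            try:
                # Follows symlinks, so a broken symlink raises here.
                size = entry.stat().st_size
            except OSError:
                continue
            results.append((entry.name, size))
    return sorted(results)
```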

Fixes #84 - UI/status display and shell history performance

Fixed /model custom base URL telemetry by reading apiBase from the active provider config instead of treating activeProvider as a dict.
Improved compact token formatting so thousands are rounded with better precision instead of truncating with integer division.
Added mtime/size-based shell history caching so suggestions do not reread and parse the full shell history file on every keystroke.
Added tests for model telemetry, status formatting, and shell history cache refresh behavior.
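
To illustrate the formatting change: integer division turns 1999 tokens into "1k", while rounding a float keeps one decimal of precision. The function name and exact output format below are assumptions, not the PR's implementation.

```python
def format_tokens_compact(count: int) -> str:
    """Format a token count compactly, rounding instead of truncating."""
    if count < 1000:
        return str(count)
    if count < 1_000_000:
        return f"{count / 1000:.1f}k"
    return f"{count / 1_000_000:.1f}M"
```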

Fixes #85 - i18n pluralization uses English suffix rules

Added ngettext support in the i18n layer and replaced hardcoded English plural suffix formatting in resume picker UI with gettext plural forms.
Updated translation catalogs for all supported locales via make translate.
Added tests for localized plural rendering and translation completeness.
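
For reference, plural selection through the standard library's gettext looks like this. The sketch uses `NullTranslations`, which falls back to English rules; with a real catalog loaded, the locale's plural forms apply instead. The message strings are hypothetical.

```python
import gettext

# NullTranslations stands in for a loaded catalog in this sketch.
translations = gettext.NullTranslations()


def session_count_label(count: int) -> str:
    """Pick the plural form via ngettext instead of appending an English 's'."""
    template = translations.ngettext("%(n)d session", "%(n)d sessions", count)
    return template % {"n": count}
```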

Fixes #87 - Read-only tools auto-approve sensitive file reads

Added path safety checks for read-class tools so out-of-project and sensitive paths are not silently auto-approved.
Added a trusted root mechanism for explicitly allowed read directories and wired it through permission contexts, loaders, ACP terminal rules, and shell escape flows.
Applied sensitive-path checks to read-only bash commands such as cat, grep, head, and related file-reading commands.
Added tests for project boundaries, trusted roots, sensitive files, ACP permission rules, and shell read permissions.
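
The boundary and sensitive-path checks can be approximated as below. The sensitive-name list and the function name are illustrative assumptions; the PR's actual rules are likely more detailed.

```python
from pathlib import Path
from typing import Iterable, Union

PathLike = Union[str, Path]

# Illustrative only: path components that should never be auto-approved.
SENSITIVE_NAMES = frozenset({".env", ".ssh", ".aws", "id_rsa", "credentials"})


def can_auto_approve_read(
    target: PathLike, project_root: PathLike, trusted_roots: Iterable[PathLike] = ()
) -> bool:
    """Auto-approve only non-sensitive paths inside the project or a trusted root."""
    resolved = Path(target).expanduser().resolve()
    if SENSITIVE_NAMES.intersection(resolved.parts):
        return False
    allowed = [Path(root).resolve() for root in (project_root, *trusted_roots)]
    return any(resolved.is_relative_to(root) for root in allowed)
```

Anything that fails this check would fall back to an explicit permission prompt rather than being denied outright.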

Fixes #88 - Readonly whitelist commands allow dangerous exec/delete arguments

Added argument-level bash safety checks for dangerous readonly-looking commands: find/fd exec/delete flags, GNU sed shell execution, ripgrep --pre, sort --compress-program, and related unsafe forms.
Added robust argv parsing helpers so command checks reason over parsed arguments rather than only command basenames or raw strings.
Added tests for dangerous find, fd, sed, rg, and sort invocations across permission and readonly classification paths.
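
Argument-level checking might be sketched as follows. The flag table covers only a few well-known cases (find's -exec and -delete, fd's --exec, ripgrep's --pre, sort's --compress-program) and `has_dangerous_args` is a hypothetical helper; it does not handle abbreviated long options or sed scripts.

```python
import os
import shlex

# Illustrative subset of flags that make a "readonly" command unsafe.
DANGEROUS_FLAGS = {
    "find": {"-exec", "-execdir", "-ok", "-okdir", "-delete"},
    "fd": {"-x", "--exec", "-X", "--exec-batch"},
    "rg": {"--pre"},
    "sort": {"--compress-program"},
}


def has_dangerous_args(command: str) -> bool:
    """Inspect parsed argv rather than the raw string or just the basename."""
    try:
        argv = shlex.split(command)
    except ValueError:
        return True  # unparseable quoting: treat as unsafe
    if not argv:
        return False
    flags = DANGEROUS_FLAGS.get(os.path.basename(argv[0]), set())
    # Compare the part before "=" so both --pre=cmd and --pre cmd are caught.
    return any(arg.split("=", 1)[0] in flags for arg in argv[1:])
```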

Fixes #89 - `read_file` and `web_fetch` read full content before truncation

Added hard read limits for file reads, including byte and line caps, binary/oversized file handling, and clearer truncation reporting.
Reworked web fetch to stream response bytes up to the configured cap instead of reading the complete body first via response.text.
Added resource-exhaustion tests for large files, large HTTP responses, streaming truncation, and decode behavior.
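
The core of a hard read limit is to never ask the OS for more than the cap plus one byte. A minimal sketch for the file case, with `read_capped` as a hypothetical helper (the line cap and binary detection are omitted):

```python
def read_capped(path: str, max_bytes: int) -> tuple[str, bool]:
    """Read at most max_bytes from a file and report whether it was truncated."""
    with open(path, "rb") as handle:
        # One extra byte is enough to tell "exactly at the cap" from "over the cap".
        data = handle.read(max_bytes + 1)
    truncated = len(data) > max_bytes
    text = data[:max_bytes].decode("utf-8", errors="replace")
    return text, truncated
```

The same principle applies to HTTP responses: consume the body in chunks and stop once the cap is reached, instead of materializing the whole body first.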

Fixes #90 - AgentTool background task lifecycle and event queue cross-talk

Tracked background agent tasks through task state/manager plumbing so task lifecycle can be updated and stopped more predictably.
Ensured background task cancellation and exceptions close event queues and update task state instead of leaving fire-and-forget tasks unmanaged.
Moved per-call event queue handling out of shared, cross-talk-prone state so that AgentTool's concurrency-safe behavior matches its actual lifecycle constraints.
Added tests for background success/failure/cancellation, queue closure, task stop behavior, and concurrent event queue isolation.
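
Tracking background tasks instead of firing and forgetting them could look like this. `BackgroundTasks` is a hypothetical, stripped-down stand-in for the task manager plumbing described above; event queue closure is left out.

```python
import asyncio
from typing import Any, Coroutine


class BackgroundTasks:
    """Keep a handle on every background task so it can be observed and stopped."""

    def __init__(self) -> None:
        self._tasks: dict[str, asyncio.Task] = {}
        self.states: dict[str, str] = {}

    def start(self, task_id: str, coro: Coroutine[Any, Any, Any]) -> asyncio.Task:
        task = asyncio.create_task(coro)
        self._tasks[task_id] = task
        self.states[task_id] = "running"
        task.add_done_callback(lambda done: self._finish(task_id, done))
        return task

    def _finish(self, task_id: str, task: asyncio.Task) -> None:
        self._tasks.pop(task_id, None)
        if task.cancelled():
            self.states[task_id] = "cancelled"
        elif task.exception() is not None:
            # Reading the exception also marks it as retrieved.
            self.states[task_id] = "failed"
        else:
            self.states[task_id] = "completed"

    async def stop(self, task_id: str) -> bool:
        task = self._tasks.get(task_id)
        if task is None:
            return False
        task.cancel()
        await asyncio.gather(task, return_exceptions=True)
        return True
```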

Fixes #92 - ShellHistoryProvider rereads full history on each keystroke

Added shell history caching keyed by file metadata so repeated suggestions reuse parsed entries until the history file changes.
Preserved deduplication and suggestion filtering while avoiding repeated full-file reads during interactive typing.
Added tests for cache hits, cache invalidation, and bounded suggestion behavior.
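
A cache keyed by file metadata can be sketched as follows. `HistoryCache` is a hypothetical class; the `reads` counter exists only to make the caching observable in this example.

```python
import os


class HistoryCache:
    """Reparse the history file only when its mtime or size changes."""

    def __init__(self, path: str) -> None:
        self.path = path
        self._key: tuple[int, int] | None = None
        self._entries: list[str] = []
        self.reads = 0  # number of full-file reads, for illustration

    def entries(self) -> list[str]:
        stat = os.stat(self.path)
        key = (stat.st_mtime_ns, stat.st_size)
        if key != self._key:
            with open(self.path, encoding="utf-8", errors="replace") as handle:
                self._entries = [line.rstrip("\n") for line in handle if line.strip()]
            self._key = key
            self.reads += 1
        return self._entries
```

A stat call per keystroke is much cheaper than rereading and parsing the whole file, which is the trade-off this approach relies on.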

Additional Notes

Updated ACP protocol reference docs under website/docs and all website i18n copies.
Kept root docs/ planning files, src/iac_code/i18n/messages.pot, and compiled messages.mo files out of git tracking; they remain generated/ignored artifacts.
Rebased onto the latest origin/main and reviewed compatibility with new main changes in update checking, provider robustness, and agent stream termination.

Test Plan

PATH=/Users/ehzyo/.local/bin:$PATH make translate
PATH=/Users/ehzyo/.local/bin:$PATH make lint
PATH=/Users/ehzyo/.local/bin:$PATH make test (4454 passed, 244 warnings)
Rebase compatibility spot-check for agent_loop, update_checker, OpenAI/OpenRouter provider changes, and ProviderManager fallback behavior

Fix context compaction so tool_use/tool_result round trips are not split across summarized and preserved messages. Track background agent asyncio tasks in TaskManager so /tasks stop and task_stop cancel running coroutines, consume background task exceptions, and avoid shared event_queue state on concurrency-safe AgentTool calls. Fixes #82. Fixes #90.

Fix issue #80 by adding a shared atomic text writer, using it for YAML config and input history saves, preserving truncated session JSON escapes, and disabling lexicographic comparison for invalid update versions.

Fixes #92, #84, #85, and the approved medium scope of #76.

- Cache shell history suggestions by history file identity and cap results.
- Fix /model custom apiBase detection and preserve custom apiBase on model switch.
- Improve /status compact token formatting.
- Use gettext ngettext for resume picker plurals and update catalogs.
- Include tool definitions in context usage and add model-family token estimate fallbacks.

Fix ROS stack success detection, terminal telemetry timing, and blocking SDK calls. Harden Aliyun CLI credential loading and close OAuth clients on owned lifecycle paths. Refs #77, #81.

guima-why added 16 commits June 5, 2026 00:01

fix: stabilize provider streaming fallback

9a142ee

docs: document provider streaming integration

16c3e36

fix: isolate skill renderer shell execution

98fae95

docs: document skill renderer shell fix merge

06c0d01

fix: correct tool implementation edge cases

7000a1d

fix: make state persistence atomic

3231712

fix: stabilize cloud provider operations

647d0e5

fix: harden tool safety permissions

f743443

docs: document tool safety permissions merge

a16f3b9

fix: stabilize session trusted root config lookup

af9cc38

Fix bash safety and agent i18n review issues

9b9e65e

chore: stop tracking ignored artifacts

25c4b86

chore: refresh translations after rebase

25a1a96

test: make read_file utf8 limit test portable

3a6929c

guima-why merged commit fce3881 into main Jun 5, 2026
14 checks passed

guima-why deleted the fix_issue_260603 branch June 5, 2026 02:00

guima-why merged 16 commits into main from fix_issue_260603