feat: add agent content focus extraction#143

Merged
chaliy merged 1 commit into main from codex/agent-extraction-mode
Jul 4, 2026

Conversation

@chaliy chaliy commented Jul 4, 2026

Contributor

What

Add low-noise content_focus modes for AI agents: readable and agent. The default fetcher now reports the extraction method in page metadata, and MCP, help, and docs expose the option. Also relax the git attribution guidance so that GIT_USER_NAME/GIT_USER_EMAIL are not a blocker and YOLOP attribution is allowed when requested.

Why

FetchKit is primarily an agent tool, so callers need cleaner content than full-page HTML conversion when pages include nav/sidebar/footer noise. Contributors also should not be blocked by env-only git user setup when normal git config is present.

How

Implemented deterministic readability scoring over bounded semantic/content containers, with fallback to existing semantic boilerplate stripping. Added unit and fetcher coverage for the new extraction path, updated specs, README, and tool help.
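
For readers unfamiliar with this kind of extraction, here is a minimal sketch of the idea, not the code in this PR. It assumes a simple scoring rule (plain-text length weighted by link density, plus a small bonus for semantic tags), a cap on the number of containers scored, and a fallback that only strips boilerplate elements. All names, thresholds, and the "readability"/"fallback" labels are hypothetical, and Python is used purely for illustration.

```python
from html.parser import HTMLParser

# Hypothetical tag sets and thresholds; the real implementation may differ.
CANDIDATE_TAGS = {"article", "main", "section", "div"}
BOILERPLATE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
SEMANTIC_BONUS = {"article": 25.0, "main": 25.0}
MAX_CANDIDATES = 50   # bound on how many containers are scored
MIN_SCORE = 50.0      # below this, use the boilerplate-stripping fallback


def _node(tag):
    return {"tag": tag, "text": [], "chars": 0, "link_chars": 0}


class _Collector(HTMLParser):
    """Builds per-container text statistics in a single pass."""

    def __init__(self):
        super().__init__()
        self.stack = [_node("#root")]
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(_node(tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # normalize whitespace
        if text:
            top = self.stack[-1]
            top["text"].append(text)
            top["chars"] += len(text)

    def handle_endtag(self, tag):
        # Ignore stray end tags; otherwise close everything up to the match.
        if any(n["tag"] == tag for n in self.stack[1:]):
            while self.stack[-1]["tag"] != tag:
                self._pop()
            self._pop()

    def _pop(self):
        node = self.stack.pop()
        if node["tag"] in BOILERPLATE_TAGS:
            return  # dropped: its text never reaches the parent
        if node["tag"] == "a":
            node["link_chars"] = node["chars"]
        in_boilerplate = any(n["tag"] in BOILERPLATE_TAGS for n in self.stack)
        if (node["tag"] in CANDIDATE_TAGS and not in_boilerplate
                and len(self.candidates) < MAX_CANDIDATES):
            self.candidates.append(node)
        parent = self.stack[-1]
        parent["text"].extend(node["text"])
        parent["chars"] += node["chars"]
        parent["link_chars"] += node["link_chars"]

    def finish(self):
        self.close()
        while len(self.stack) > 1:  # close any unclosed elements
            self._pop()
        return self.stack[0]


def _score(node):
    if node["chars"] == 0:
        return 0.0
    link_density = node["link_chars"] / node["chars"]
    plain = node["chars"] - node["link_chars"]
    return plain * (1.0 - link_density) + SEMANTIC_BONUS.get(node["tag"], 0.0)


def extract(html):
    """Return (text, method); method is a hypothetical metadata label."""
    collector = _Collector()
    collector.feed(html)
    root = collector.finish()
    # max() keeps the first maximal item, so ties resolve deterministically
    # toward the container that closed first (the innermost one).
    best = max(collector.candidates, key=_score, default=None)
    if best is not None and _score(best) >= MIN_SCORE:
        return "\n".join(best["text"]), "readability"
    return "\n".join(root["text"]), "fallback"
```

Calling `extract(html)` returns the text of the best-scoring container when one clears the threshold. Otherwise it returns the whole page text with nav, header, footer, and similar elements removed, which stands in for the existing semantic boilerplate stripping mentioned above.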

Risk

  • Low / Medium
  • The new behavior is opt-in via content_focus; the default full behavior remains unchanged. Heuristic extraction may miss on some pages and fall back to main.

Checklist

  • Unit tests pass
  • Smoke tests pass
  • Documentation is updated
  • Specs are up to date and not in conflict

@chaliy chaliy marked this pull request as ready for review July 4, 2026 22:29
@chaliy chaliy merged commit 9b6e75c into main Jul 4, 2026
11 checks passed
@chaliy chaliy deleted the codex/agent-extraction-mode branch July 4, 2026 22:29