5 changes: 5 additions & 0 deletions .changeset/merge-read-media-into-read.md
@@ -0,0 +1,5 @@
---
"@moonshot-ai/kimi-code": minor
---

Merge media reading into the Read tool: image and video files are now returned as multimodal content directly, replacing the separate ReadMediaFile tool.
@@ -83,8 +83,13 @@ const editChip: ChipProvider = (toolCall) => {

const writeChip: ChipProvider = (toolCall) => formatWriteChip(computeWriteStats(toolCall.args));

-const readChip: ChipProvider = (_toolCall, result) =>
-  pluralize(countNonEmptyLines(result.output), 'line');
+const readChip: ChipProvider = (toolCall, result) => {
+  // Media reads carry a content-part envelope; readMediaChip returns ''
+  // for anything else, falling back to the text line count.
+  const media = readMediaChip(toolCall, result);
+  if (media !== '') return media;
+  return pluralize(countNonEmptyLines(result.output), 'line');
+};

const grepChip: ChipProvider = (_toolCall, result) => {
const matches = countNonEmptyLines(result.output);
@@ -118,6 +123,7 @@ const REGISTRY: Record<string, ChipProvider> = {
   Edit: editChip,
   Write: writeChip,
   Read: readChip,
+  // Pre-merge media tool — kept so recorded sessions still render.
+  ReadMediaFile: readMediaChip,
   Grep: grepChip,
   Glob: globChip,
41 changes: 32 additions & 9 deletions apps/kimi-code/src/tui/components/messages/tool-renderers/media.ts
@@ -1,12 +1,16 @@
/**
- * ReadMediaFile renderer.
+ * Media result renderer.
  *
- * The ReadMediaFile tool `output` is the JSON-serialized array of
- * content parts the tool returned — which includes the full base64 of
- * the image/video. Dumping that string into the transcript blasts a
- * multi-screen blob of base64. This renderer parses the envelope and
- * surfaces just the human-readable bits (kind, path, mime, size) via
- * a header chip + a tiny expanded body. It never emits the base64.
+ * When Read hits an image or video, the tool `output` is the
+ * JSON-serialized array of content parts it returned — which includes
+ * the full base64 of the media. Dumping that string into the transcript
+ * blasts a multi-screen blob of base64. This renderer parses the
+ * envelope and surfaces just the human-readable bits (kind, path, mime,
+ * size) via a header chip + a tiny expanded body. It never emits the
+ * base64. Text reads fall through to the regular read summary.
+ *
+ * `ReadMediaFile` (the pre-merge media tool) keeps its registry entry so
+ * sessions recorded before the merge still render.
*
* On error, or when the output isn't the expected media envelope, we
* fall back to the truncated renderer so the user still sees the raw
@@ -18,6 +22,7 @@ import { Text } from '@earendil-works/pi-tui';
 import chalk from 'chalk';

 import type { ChipProvider } from './chip';
+import { readSummary } from './summary';
 import { renderTruncated } from './truncated';
 import type { ResultRenderer } from './types';

@@ -31,7 +36,8 @@ export interface ReadMediaSummary {
 }

 const PATH_TAG_RE = /^<(image|video)\s+path="([^"]+)">$/;
-const ORIGINAL_SIZE_RE = /original size\s+(\d+x\d+px)/;
+const ORIGINAL_SIZE_RE =
+  /original size\s+(\d+x\d+px)|original dimensions:\s+(\d+)x(\d+)\s+pixels?/i;
const DATA_URL_RE = /^data:([^;]+);base64,(.*)$/s;

function bytesFromBase64(b64: string): number {
@@ -72,7 +78,13 @@ export function parseReadMediaOutput(output: string): ReadMediaSummary | null {
         continue;
       }
       const size = ORIGINAL_SIZE_RE.exec(text);
-      if (size) originalSize = size[1];
+      if (size) {
+        if (size[1] !== undefined) {
+          originalSize = size[1];
+        } else if (size[2] !== undefined && size[3] !== undefined) {
+          originalSize = `${size[2]}x${size[3]}px`;
+        }
+      }
continue;
}
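
The hunk above widens `ORIGINAL_SIZE_RE` so that both the older `original size 1x1px` wording and the newer `Original dimensions: 1x1 pixels` wording normalize to the same `WxHpx` string. A minimal standalone sketch of that normalization follows; the regex is copied from the diff, while `extractOriginalSize` is a hypothetical helper name used only for illustration and is not part of the PR.

```typescript
// Regex copied from the diff: alternative 1 captures "WxHpx" whole,
// alternative 2 captures width and height separately.
const ORIGINAL_SIZE_RE =
  /original size\s+(\d+x\d+px)|original dimensions:\s+(\d+)x(\d+)\s+pixels?/i;

// Hypothetical helper (not in the PR): returns a normalized "WxHpx"
// string, or undefined when the text carries no size information.
function extractOriginalSize(text: string): string | undefined {
  const m = ORIGINAL_SIZE_RE.exec(text);
  if (!m) return undefined;
  // Old wording: the first group already has the final shape.
  if (m[1] !== undefined) return m[1];
  // New wording: rebuild the same shape from the two numeric groups.
  if (m[2] !== undefined && m[3] !== undefined) return `${m[2]}x${m[3]}px`;
  return undefined;
}
```

Checking each capture group against `undefined` matters because only one side of the alternation participates in any given match.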

@@ -150,3 +162,14 @@ export const readMediaSummary: ResultRenderer = (toolCall, result, ctx) => {
   out.push(new Text(` ${dim(tail.join(' · '))}`, 0, 0));
   return out;
 };
+
+/**
+ * Read renders by content: a media envelope gets the media summary,
+ * anything else (numbered text lines, errors) the regular read summary.
+ */
+export const readOrMediaSummary: ResultRenderer = (toolCall, result, ctx) => {
+  if (!result.is_error && parseReadMediaOutput(result.output) !== null) {
+    return readMediaSummary(toolCall, result, ctx);
+  }
+  return readSummary(toolCall, result, ctx);
+};
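
To illustrate the content-based dispatch in `readOrMediaSummary`, here is a self-contained sketch under stated assumptions. The body of `parseReadMediaOutput` is not shown in this diff, so `looksLikeMediaEnvelope` below is a hypothetical stand-in that only checks for an `image_url` part (the shape used by the test fixture in this PR); the real parser may be stricter and also handles video. `ToolResult` is likewise a simplified placeholder type.

```typescript
// Simplified placeholder for the real result type (assumption).
interface ToolResult {
  output: string;
  is_error?: boolean;
}

// Hypothetical stand-in for parseReadMediaOutput: treats the output as a
// media envelope when it is a JSON array with at least one image_url part.
function looksLikeMediaEnvelope(output: string): boolean {
  let parsed: unknown;
  try {
    parsed = JSON.parse(output);
  } catch {
    // Numbered text lines such as "1\thello" are not valid JSON.
    return false;
  }
  if (!Array.isArray(parsed)) return false;
  return parsed.some(
    (part) =>
      typeof part === 'object' &&
      part !== null &&
      (part as { type?: unknown }).type === 'image_url',
  );
}

// Mirrors the dispatch rule: errors and non-envelopes take the text path.
function pickReadRenderer(result: ToolResult): 'media' | 'text' {
  return !result.is_error && looksLikeMediaEnvelope(result.output) ? 'media' : 'text';
}
```

Dispatching on the result content rather than on the tool arguments means recorded sessions render the same way regardless of which tool name produced the envelope.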
@@ -10,15 +10,14 @@
  * choose, so adding a new tool means appending one case.
  */

-import { readMediaSummary } from './media';
+import { readMediaSummary, readOrMediaSummary } from './media';
import { shellExecutionResultRenderer } from '../shell-execution';
import { goalSummary } from './goal';
import {
editSummary,
fetchSummary,
globSummary,
grepSummary,
-  readSummary,
thinkSummary,
webSearchSummary,
writeSummary,
@@ -39,7 +38,8 @@ export function isGenericToolResult(toolName: string): boolean {
 export function pickResultRenderer(toolName: string): ResultRenderer {
   switch (toolName) {
     case 'Read':
-      return readSummary;
+      return readOrMediaSummary;
+    // Pre-merge media tool — kept so recorded sessions still render.
     case 'ReadMediaFile':
       return readMediaSummary;
     case 'Grep':
2 changes: 1 addition & 1 deletion apps/kimi-code/src/tui/utils/image-attachment-store.ts
@@ -6,7 +6,7 @@
  * (640×480)]` / `[video #2 sample.mov]`). The placeholder is what the
  * user sees in the input field; on submit, `extractMediaAttachments`
  * walks the text and expands image placeholders to image content parts
- * and video placeholders to file-path tags for `ReadMediaFile`.
+ * and video placeholders to file-path tags for `Read`.
  *
  * Scope is per-`KimiTUI` instance. Reloads (`/new`, `/clear`,
  * session switch) call `clear()` so ids restart from 1 and stale
2 changes: 1 addition & 1 deletion apps/kimi-code/src/tui/utils/image-placeholder.ts
@@ -9,7 +9,7 @@
  * - Order is preserved for text/image/video segments. Image placeholders
  *   expand to image content parts so the prompt reaches the provider
  *   without relying on a model tool call. Video placeholders still expand
- *   to file-path tags so `ReadMediaFile` can own video upload behavior.
+ *   to file-path tags so `Read` can own video upload behavior.
  * - Adjacent text segments are flattened — empty / whitespace-only
  *   segments drop out so we never emit `{type:'text', text:' '}`
  *   noise between two media parts.
@@ -5,7 +5,9 @@ import {
   parseReadMediaOutput,
   readMediaChip,
   readMediaSummary,
+  readOrMediaSummary,
 } from '#/tui/components/messages/tool-renderers/media';
+import { pickChip } from '#/tui/components/messages/tool-renderers/chip';
 import { darkColors } from '#/tui/theme/colors';
 import type { ToolCallBlockData, ToolResultBlockData } from '#/tui/types';

@@ -35,10 +37,17 @@ const PNG_DATA_URL = `data:image/png;base64,${PNG_B64}`;

 function imageOutput(path: string, b64 = PNG_B64, mime = 'image/png'): string {
   return JSON.stringify([
+    {
+      type: 'text',
+      text:
+        `<system>Read image file. Mime type: ${mime}. Size: 70 bytes. ` +
+        'Original dimensions: 1x1 pixels. If you need to output coordinates, ' +
+        'output relative coordinates first and compute absolute coordinates using the original image size. ' +
+        'If you generate or edit images or videos via commands or scripts, read the result back immediately before continuing.</system>',
+    },
     { type: 'text', text: `<image path="${path}">` },
     { type: 'image_url', imageUrl: { url: `data:${mime};base64,${b64}` } },
     { type: 'text', text: '</image>' },
-    { type: 'text', text: `Loaded image file "${path}" (${mime}, 70 bytes, original size 1x1px).` },
   ]);
 }

@@ -121,6 +130,7 @@ describe('readMediaSummary renderer', () => {
     );
     expect(out).toContain('/tmp/a.png');
     expect(out).toContain('image/png');
+    expect(out).toContain('1x1px');
     // Crucially: the base64 must never reach the screen.
     expect(out).not.toContain(PNG_B64);
     expect(out).not.toContain(PNG_DATA_URL);
@@ -148,3 +158,36 @@ describe('readMediaSummary renderer', () => {
     expect(out).toContain('some plain string output');
   });
 });
+
+describe('Read content dispatch', () => {
+  it('routes a media envelope to the media summary', () => {
+    const out = strip(
+      joinRender(readOrMediaSummary(call('Read'), result(imageOutput('/tmp/a.png')), expandedCtx)),
+    );
+    expect(out).toContain('/tmp/a.png');
+    expect(out).toContain('image/png');
+    expect(out).not.toContain(PNG_B64);
+  });
+
+  it('routes numbered text lines to the regular read summary (empty collapsed body)', () => {
+    const out = joinRender(readOrMediaSummary(call('Read'), result('1\thello\n2\tworld'), ctx));
+    expect(out.trim()).toBe('');
+  });
+
+  it('Read chip shows media meta for media reads and line counts for text reads', () => {
+    const chip = pickChip('Read');
+    expect(chip).toBeDefined();
+    const mediaText = strip(chip!(call('Read'), result(imageOutput('/tmp/a.png'))));
+    expect(mediaText).toMatch(/image/);
+    expect(mediaText).toContain('image/png');
+    const textText = strip(chip!(call('Read'), result('1\thello\n2\tworld')));
+    expect(textText).toBe('2 lines');
+  });
+
+  it('keeps the legacy ReadMediaFile chip entry for recorded sessions', () => {
+    const chip = pickChip('ReadMediaFile');
+    expect(chip).toBeDefined();
+    const text = strip(chip!(call('ReadMediaFile'), result(imageOutput('/tmp/a.png'))));
+    expect(text).toMatch(/image/);
+  });
+});
7 changes: 2 additions & 5 deletions docs/en/reference/tools.md
@@ -10,14 +10,13 @@ File tools handle reading, writing, and searching the local filesystem — the f

 | Tool | Default Approval | Description |
 | --- | --- | --- |
-| `Read` | Auto-allow | Read a text file's contents |
+| `Read` | Auto-allow | Read a text, image, or video file |
 | `Write` | Requires approval | Create or overwrite a file |
 | `Edit` | Requires approval | Precise string replacement |
 | `Grep` | Auto-allow | Full-text search powered by ripgrep |
 | `Glob` | Auto-allow | Find files by glob pattern |
-| `ReadMediaFile` | Auto-allow | Read an image or video file |

-**`Read`** accepts a file path (`path`) plus optional `line_offset` (starting line number; negative values count from the end) and `n_lines` (maximum number of lines to read). Returns at most 1000 lines or 100 KB per call; content beyond that limit is accompanied by a truncation notice. If the file is an image or video, the tool suggests using `ReadMediaFile` instead.
+**`Read`** accepts a file path (`path`) plus optional `line_offset` (starting line number; negative values count from the end) and `n_lines` (maximum number of lines to read). The file kind is detected by extension and magic bytes: text files return at most 1000 lines or 100 KB per call, with a truncation notice beyond that limit; images and videos are sent to the model as multimodal content (`line_offset` / `n_lines` are ignored for them) with a 100 MB size limit, subject to the current model's vision capabilities (`image_in` / `video_in`).
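
The documentation above says the file kind is detected "by extension and magic bytes", but the detection code itself is not part of the visible diff. The sketch below shows one way such a check could look. Everything in it is an assumption for illustration: the names `detectFileKind` and `sniffKind`, the extension lists, the set of signatures checked, and the choice to trust magic bytes before the extension. Only the byte signatures themselves (PNG, JPEG, GIF, and the ISO base media `ftyp` box) are standard.

```typescript
type FileKind = 'text' | 'image' | 'video';

// Illustrative extension lists (assumption; the real lists are not shown).
const IMAGE_EXTS = new Set(['.png', '.jpg', '.jpeg', '.gif', '.webp']);
const VIDEO_EXTS = new Set(['.mp4', '.mov', '.webm']);

// True when `bytes` contains `sig` starting at `offset`.
function hasSignature(bytes: Uint8Array, sig: number[], offset = 0): boolean {
  if (bytes.length < offset + sig.length) return false;
  return sig.every((b, i) => bytes[offset + i] === b);
}

// Sniff a few well-known signatures from the first bytes of the file.
function sniffKind(header: Uint8Array): FileKind | null {
  if (hasSignature(header, [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])) return 'image'; // PNG
  if (hasSignature(header, [0xff, 0xd8, 0xff])) return 'image'; // JPEG
  if (hasSignature(header, [0x47, 0x49, 0x46, 0x38])) return 'image'; // "GIF8"
  // "ftyp" at offset 4 marks ISO base media files (MP4/MOV). HEIC and AVIF
  // images share this box, so this sketch would misclassify them as video.
  if (hasSignature(header, [0x66, 0x74, 0x79, 0x70], 4)) return 'video';
  return null;
}

// Assumed precedence: magic bytes first, then extension, else text.
function detectFileKind(path: string, header: Uint8Array): FileKind {
  const sniffed = sniffKind(header);
  if (sniffed !== null) return sniffed;
  const dot = path.lastIndexOf('.');
  const ext = dot >= 0 ? path.slice(dot).toLowerCase() : '';
  if (IMAGE_EXTS.has(ext)) return 'image';
  if (VIDEO_EXTS.has(ext)) return 'video';
  return 'text';
}
```

Checking content before the extension lets a misnamed file (for example a PNG saved as `.txt`) still take the media path, which is why `line_offset` / `n_lines` can be ignored for it.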

**`Write`** accepts `path`, `content`, and an optional `mode` (`overwrite` or `append`; defaults to overwrite). The parent directory must already exist; `append` mode appends content to the end of the file without automatically adding a newline.

@@ -27,8 +26,6 @@ File tools handle reading, writing, and searching the local filesystem — the f

 **`Glob`** matches files in a specified directory (`path`; defaults to the working directory) by glob pattern (`pattern`). Results are sorted by modification time in descending order, with a maximum of 1000 entries. Pure wildcard patterns (e.g., `**`) and patterns containing brace expansion (`{a,b,c}`) are rejected.

-**`ReadMediaFile`** sends an image or video to the model as multimodal content. Accepts only `path`; the file size limit is 100 MB. Availability depends on the current model's vision capabilities (`image_in` / `video_in`).
-
 ## Shell

 | Tool | Default Approval | Description |
7 changes: 2 additions & 5 deletions docs/zh/reference/tools.md
@@ -10,14 +10,13 @@

 | 工具 | 默认审批 | 说明 |
 | --- | --- | --- |
-| `Read` | 自动放行 | 读取文本文件内容 |
+| `Read` | 自动放行 | 读取文本、图片或视频文件 |
 | `Write` | 需审批 | 创建或覆盖文件 |
 | `Edit` | 需审批 | 精确字符串替换 |
 | `Grep` | 自动放行 | 基于 ripgrep 的全文搜索 |
 | `Glob` | 自动放行 | 按 glob 模式查找文件 |
-| `ReadMediaFile` | 自动放行 | 读取图片或视频文件 |

-**`Read`** 接受文件路径(`path`)以及可选的 `line_offset`(起始行号,支持负数从末尾倒数)和 `n_lines`(读取行数上限)。单次最多返回 1000 行或 100 KB,超出部分会附带截断提示。如果文件是图片或视频,工具会提示改用 `ReadMediaFile`
+**`Read`** 接受文件路径(`path`)以及可选的 `line_offset`(起始行号,支持负数从末尾倒数)和 `n_lines`(读取行数上限)。文件类型由扩展名和魔数自动识别:文本文件单次最多返回 1000 行或 100 KB,超出部分会附带截断提示;图片和视频以多模态内容发送给模型(`line_offset` / `n_lines` 对其无效),文件大小上限 100 MB,是否支持取决于当前模型的视觉能力(`image_in` / `video_in`)

**`Write`** 接受 `path`、`content` 和可选的 `mode`(`overwrite` 或 `append`,默认覆盖)。父目录必须已存在;`append` 模式将内容追加到文件末尾,不自动添加换行。

@@ -27,8 +26,6 @@

 **`Glob`** 按 glob 模式(`pattern`)在指定目录(`path`,默认工作目录)中匹配文件,结果按修改时间倒序排列,最多返回 1000 条。纯通配符模式(如 `**`)和含花括号扩展(`{a,b,c}`)的模式会被拒绝。

-**`ReadMediaFile`** 将图片或视频以多模态内容发送给模型,仅接受 `path`,文件大小上限 100 MB。是否可用取决于当前模型的视觉能力(`image_in` / `video_in`)。
-
 ## Shell

 | 工具 | 默认审批 | 说明 |
4 changes: 2 additions & 2 deletions packages/acp-adapter/src/kaos-acp.ts
@@ -133,7 +133,7 @@ export class AcpKaos implements Kaos {
   /**
    * Binary reads bypass the ACP text RPC by design: `fs/readTextFile`
    * returns a decoded string and would corrupt or reject non-UTF-8
-   * payloads (images, video, archives — anything `ReadMediaFile` may
+   * payloads (images, video, archives — anything the media read path may
    * touch). The ACP bridge only owns the *text* surface; raw bytes
    * stay on the local filesystem via `inner`.
    */
@@ -144,7 +144,7 @@ export class AcpKaos implements Kaos {
   /**
    * Return a small UTF-8 header derived from the same ACP text source as
    * `readText` / `readLines`, used only by text-read callers for sniffing.
-   * Keep `readBytes` local so binary callers such as ReadMediaFile stay safe.
+   * Keep `readBytes` local so binary callers such as Read's media path stay safe.
    */
   async readTextPreview(path: string, n: number): Promise<Buffer> {
     const text = await this.readText(path);
@@ -4,7 +4,6 @@ const DEFAULT_APPROVE_TOOLS = new Set([
   'Read',
   'Grep',
   'Glob',
-  'ReadMediaFile',
   'SetTodoList',
   'TodoList',
   'TaskList',
4 changes: 1 addition & 3 deletions packages/agent-core/src/agent/tool/index.ts
@@ -378,16 +378,14 @@ export class ToolManager {
     const goalToolsEnabled = this.agent.type === 'main';
     this.builtinTools = new Map(
       [
-        new b.ReadTool(kaos, workspace),
+        new b.ReadTool(kaos, workspace, modelCapabilities, videoUploader),
         new b.WriteTool(kaos, workspace),
         new b.EditTool(kaos, workspace),
         new b.GrepTool(kaos, workspace),
         new b.GlobTool(kaos, workspace),
         new b.BashTool(kaos, cwd, background, {
           allowBackground,
         }),
-        (modelCapabilities.image_in || modelCapabilities.video_in) &&
-          new b.ReadMediaFileTool(kaos, workspace, modelCapabilities, videoUploader),
         new b.EnterPlanModeTool(this.agent),
         new b.ExitPlanModeTool(this.agent),
         // Goal tools are main-agent-only.
2 changes: 1 addition & 1 deletion packages/agent-core/src/mcp/output.ts
@@ -7,7 +7,7 @@
  *    (dropping unsupported shapes).
  * 2. Wrap media-only outputs in `<mcp_tool_result name="…">` tags so the
  *    model can attribute binary output when several tools return media.
- *    Mirrors the in-tree `ReadMediaFile` convention.
+ *    Mirrors the in-tree `Read` media convention.
  * 3. Apply size limits: text/think share a 100K character budget; binary
  *    parts (image/audio/video URLs) each carry an independent 10 MB cap and
  *    collapse to a notice when oversize, so a single screenshot cannot
1 change: 0 additions & 1 deletion packages/agent-core/src/profile/default/agent.yaml
@@ -18,7 +18,6 @@ tools:
   - CronCreate
   - CronList
   - CronDelete
-  - ReadMediaFile
   - TodoList
   - Skill
   - WebSearch
1 change: 0 additions & 1 deletion packages/agent-core/src/profile/default/coder.yaml
@@ -8,7 +8,6 @@ whenToUse: |
 tools:
   - Bash
   - Read
-  - ReadMediaFile
   - Glob
   - Grep
   - Write
1 change: 0 additions & 1 deletion packages/agent-core/src/profile/default/explore.yaml
@@ -29,7 +29,6 @@ whenToUse: |
 tools:
   - Bash
   - Read
-  - ReadMediaFile
   - Glob
   - Grep
   - WebSearch
1 change: 0 additions & 1 deletion packages/agent-core/src/profile/default/plan.yaml
@@ -12,7 +12,6 @@ whenToUse: |
   Use this agent when the parent agent needs a step-by-step implementation plan, key file identification, and architectural trade-off analysis before code changes are made.
 tools:
   - Read
-  - ReadMediaFile
   - Glob
   - Grep
   - WebSearch
13 changes: 0 additions & 13 deletions packages/agent-core/src/tools/builtin/file/read-media.md

This file was deleted.
