Merged
Commits
23 commits
b0e5686
feat: refactor MCP, fix Codex errors, reorganize AI agents documentat…
andrii-harbour Apr 2, 2026
bc66b32
feat(evals): Level 3 DOCX agent benchmark suite (#2664)
tupizz Apr 3, 2026
ca7b783
feat(evals): update agent conditions and documentation for SuperDoc M…
andrii-harbour Apr 6, 2026
92c0299
feat(sdk): optimize tool definitions and prompts for efficient MCP wo…
tupizz Apr 6, 2026
6ae414c
feat(document-api): update JSON schema and documentation for mutation…
andrii-harbour Apr 7, 2026
9ea2a77
refactor: enhance evaluation suite with new configurations and documents
andrii-harbour Apr 8, 2026
da29e95
refactor(evals): update entity names in documentation and tasks
andrii-harbour Apr 8, 2026
dc61f7a
feat(docs): add AI documentation and enhance getting started guide
andrii-harbour Apr 8, 2026
d450ee9
fix(evals): update execution promptfoo configuration and remove obsol…
andrii-harbour Apr 8, 2026
d09ce20
chore(evals): update README and remove obsolete DOCX files
andrii-harbour Apr 8, 2026
a2f7e31
Merge remote-tracking branch 'origin/main' into andrii/sd-2451-refact…
tupizz Apr 9, 2026
003f808
feat: improve agent redline targeting and validation (#2764)
tupizz Apr 9, 2026
85e4ed5
fix: refresh lockfile for evals deps
tupizz Apr 9, 2026
cd17f4f
Merge remote-tracking branch 'origin/main' into andrii/sd-2451-refact…
tupizz Apr 9, 2026
e3fb8cc
fix: update documentation and address comments
andrii-harbour Apr 13, 2026
d346ad5
refactor: consolidate shared logic for benchmark providers
andrii-harbour Apr 13, 2026
3a78044
docs: update LLM tools documentation with action details
andrii-harbour Apr 13, 2026
a3b6227
feat(session-manager): add telemetry metadata for document editing so…
andrii-harbour Apr 13, 2026
60733d2
docs: add new AI getting started page with redirect to overview
andrii-harbour Apr 13, 2026
3f86e32
Merge branch 'main' into andrii/sd-2451-refactor-mcp-set-up
andrii-harbour Apr 13, 2026
99e518d
chore: update pnpm-lock.yaml with new dependencies and version updates
andrii-harbour Apr 13, 2026
a9aab66
docs(sd-2451): align AI doc voice with brand guidelines (#2802)
caio-pizzol Apr 14, 2026
252dfbc
Merge branch 'main' into andrii/sd-2451-refactor-mcp-set-up
caio-pizzol Apr 14, 2026
1 change: 0 additions & 1 deletion AGENTS.md
@@ -124,7 +124,6 @@ The `evals/` directory contains a Promptfoo-based evaluation suite for validatin
|---------|-------------|------|
| `pnpm --filter @superdoc-testing/evals run eval` | Run deterministic evals (reading + argument tests) | ~$0.30 |
| `pnpm --filter @superdoc-testing/evals run eval:reading` | Run reading tool tests only | ~$0.15 |
| `pnpm --filter @superdoc-testing/evals run eval:gdpval` | Run GDPval benchmark (Model+SuperDoc vs Model-Only) | ~$1-2 |
| `pnpm --filter @superdoc-testing/evals run eval:view` | Open Promptfoo web UI with results | Free |
| `pnpm --filter @superdoc-testing/evals run baseline:save <label>` | Save versioned results snapshot | Free |

171 changes: 171 additions & 0 deletions apps/docs/ai/agents/best-practices.mdx
@@ -0,0 +1,171 @@
---
title: Best practices
sidebarTitle: Best practices
description: Get better results from LLM document editing — prompting, tool call patterns, and workflow tips
keywords: "llm best practices, ai document editing, prompt engineering, superdoc tools, tool calling, document automation"
---

These patterns help your LLM agent produce reliable, efficient document edits.

## Use the bundled system prompt

`getSystemPrompt()` returns a tested prompt that teaches the model how to use SuperDoc tools — targeting, workflow order, and multi-action tools. Load it once and pass it as the system message.

```typescript
import { getSystemPrompt } from '@superdoc-dev/sdk';

const systemPrompt = await getSystemPrompt();
// Pass as the system message in your LLM call
```

You can extend it with task-specific instructions. Append your own rules after the bundled prompt:

```typescript
const systemPrompt = await getSystemPrompt();
const fullPrompt = `${systemPrompt}\n\n## Additional rules\n- Use tracked changes for all edits.\n- Always search before editing.`;
```

Or start from scratch with something like this:

````markdown
You edit `.docx` files using SuperDoc intent tools. Be efficient and minimize tool calls.

## Workflow

1. **Read** — Use `superdoc_get_content` to understand the document.
2. **Search** — Use `superdoc_search` to find stable handles or block addresses.
3. **Edit** — Use the focused tool that matches the job:
- `superdoc_edit` for insert, replace, delete, undo, redo
- `superdoc_format` for inline or paragraph formatting
- `superdoc_create` for paragraphs and headings
- `superdoc_comment` for comment threads
- `superdoc_track_changes` for review decisions
4. **Batch only when useful** — Use `superdoc_mutations` for preview/apply or atomic multi-step edits.

## Rules

- Search before mutating so targets come from fresh results.
- Use focused intent tools for normal edits.
- Use `superdoc_mutations` when you need an atomic batch or preview/apply flow.
- Set `changeMode: "tracked"` when edits need human review.
- Feed tool errors back so you can recover.
````

## Read first, search, then edit

A typical edit takes 3-5 tool calls:

1. `superdoc_get_content` — understand what's in the document
2. `superdoc_search` — find the exact location (returns stable handles/addresses)
3. Edit tool (`superdoc_edit`, `superdoc_format`, etc.) — apply the change using targets from search

This matters because handles from search results point to the exact location in the current document. If the model guesses a block address instead of searching for it, edits land in the wrong place.
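
Driven by hand through `dispatchSuperDocTool`, the sequence looks roughly like the sketch below. The argument and result field names (`query`, `results`, `handle`) are illustrative assumptions; the authoritative shapes come from the tool schemas returned by `chooseTools()`.

```typescript
// A sketch of the read, search, edit sequence, invoked directly for illustration.
// Field names such as `query`, `results`, and `handle` are assumptions; check the
// tool schemas for the real argument and result shapes.
const overview = await dispatchSuperDocTool(doc, 'superdoc_get_content', {});

const search = await dispatchSuperDocTool(doc, 'superdoc_search', {
  query: 'termination clause', // hypothetical search text
});

// Target the edit with a handle taken from the search result, never a guessed address.
await dispatchSuperDocTool(doc, 'superdoc_edit', {
  action: 'replace',
  target: { handle: search.results?.[0]?.handle }, // assumed result shape
  content: "Either party may terminate this Agreement with 30 days' written notice.",
});
```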

## Minimize tool calls

Instruct the LLM to plan all edits before calling tools. A well-structured prompt like "Find the termination clause and rewrite it to allow 30-day notice" should take 3-5 calls, not 15.

Batch multiple changes only when atomic execution is genuinely helpful — use `superdoc_mutations` for that.

## Use focused tools; `superdoc_mutations` is an escape hatch

For straightforward edits, use the focused intent tools (`superdoc_edit`, `superdoc_format`, `superdoc_create`, `superdoc_list`, `superdoc_comment`). They validate arguments, give clear errors, and are easier for models to call correctly.

Reach for `superdoc_mutations` only when you need:
- Preview/apply semantics (show what will change before committing)
- Atomic multi-step edits (all-or-nothing batch)
- A workflow that would otherwise require refreshing targets between steps

## Feed errors back

`dispatchSuperDocTool` returns structured errors. Pass them back as tool results — most models self-correct on the next turn.

```typescript
try {
const result = await dispatchSuperDocTool(doc, toolCall.function.name, JSON.parse(toolCall.function.arguments));
messages.push({ role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify(result) });
} catch (err: any) {
// Return the error as a tool result — the model will see it and adjust
messages.push({ role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify({ error: err.message }) });
}
```

## Add examples for repeatable workflows

If the same kind of edit runs across many documents (e.g., always rewriting a specific clause, always adding a comment to a section), include a concrete tool call example in your system prompt. Models that see a working example of the exact tool invocation produce correct calls more reliably than models that only see the schema.
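
For instance, if your agent always leaves a review comment on the indemnification section, a snippet like the one below in the system prompt tends to make calls more reliable. The argument shape shown is illustrative; mirror whatever your actual tool schema expects.

````markdown
## Example: flag the indemnification clause

1. Call `superdoc_search` with `{ "query": "indemnification" }`.
2. Call `superdoc_comment` with:

```json
{
  "action": "add",
  "target": { "handle": "<handle from the search result>" },
  "text": "Legal to review: confirm the indemnification cap matches the MSA."
}
```
````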

## Use tracked changes for review workflows

Add `changeMode: "tracked"` to edit tool calls, or instruct the model via the system prompt:

```
Use tracked changes for all edits so a human can review them.
```

This way every AI edit appears as a tracked change that users can accept or reject in SuperDoc or Microsoft Word.
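
If you want to enforce this in code rather than rely on the prompt, you can merge the flag into the parsed arguments before dispatching. A minimal sketch, assuming `changeMode` is accepted as a top-level argument and that the tool list below covers the editing tools in your setup:

```typescript
const args = JSON.parse(toolCall.function.arguments);

// Force tracked changes on editing tools regardless of what the model requested.
// The tool list and the top-level `changeMode` argument are assumptions; adjust
// to match your tool schemas.
const editingTools = ['superdoc_edit', 'superdoc_format', 'superdoc_create'];
const finalArgs = editingTools.includes(toolCall.function.name)
  ? { ...args, changeMode: 'tracked' }
  : args;

const result = await dispatchSuperDocTool(doc, toolCall.function.name, finalArgs);
```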

## Pin your model version

Use a specific model ID (e.g., `gpt-4.1` or `claude-sonnet-4-6`) rather than an alias like `gpt-4o`. Aliases can change behavior between releases and break working tool call patterns.
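
One small pattern that helps is declaring the model in a single place so upgrades become a deliberate, reviewable change. The model name below is only an example:

```typescript
// Pin the model in one place; upgrading becomes an explicit code change.
const MODEL = 'gpt-4.1'; // or a dated snapshot if your provider publishes one

const response = await openai.chat.completions.create({
  model: MODEL,
  messages,
  tools,
});
```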

## Cache tools and prompts

Tools and the system prompt don't change between requests. Load them once at startup and reuse across all conversations.

```typescript
let cachedTools: any[] | null = null;
let cachedSystemPrompt: string | null = null;

async function ensureToolsLoaded() {
if (!cachedTools) {
const result = await chooseTools({ provider: 'openai' });
cachedTools = result.tools;
}
if (!cachedSystemPrompt) {
cachedSystemPrompt = await getSystemPrompt();
}
return { tools: cachedTools, systemPrompt: cachedSystemPrompt };
}
```
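
At request time, pull the cached values rather than re-reading them from disk. A minimal usage sketch (the `openai` client and the user message are placeholders):

```typescript
const { tools, systemPrompt } = await ensureToolsLoaded();

const response = await openai.chat.completions.create({
  model: 'gpt-4.1',
  messages: [
    { role: 'system', content: systemPrompt },
    // Example user request; replace with your conversation history.
    { role: 'user', content: 'Rewrite the termination clause to allow 30-day notice.' },
  ],
  tools,
});
```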

## Prompt examples

These prompts have been tested against the SuperDoc tool set. Use them as inspiration for your own workflows, or include them as few-shot examples in your system prompt.

### Document review

- "Find the termination clause and rewrite it to require 30-day written notice. Use tracked changes."
- "Apply yellow highlight to every sentence that contains an indemnification obligation."
- "Replace all references to 'Contractor' with 'Service Provider' and make each replacement italic with tracked changes enabled."
- "Underline every sentence that references payment terms or late fees."
- "Insert CONFIDENTIAL — DO NOT DISTRIBUTE at the very top of the document and make it bold, red, 14pt."
- "Scan the document for inconsistent capitalization of defined terms and fix them with tracked changes enabled."

### Formatting and structure

- "Format the entire document in Times New Roman, 12-point."
- "Make all Heading 2 paragraphs bold and set them to 14-point font."
- "Keep each section heading with the paragraph that follows it so they don't split across pages."
- "Remove all extra blank paragraphs and convert all double spaces after periods to single spaces."
- "Right-align all section headings."

### Content generation and editing

- "Add a new heading 'Learning Objectives' at the top, followed by a bullet list with 3 key takeaways from the document content."
- "Read the document and add a heading 'Executive Summary' at the end, followed by a one-paragraph summary and a bullet list of the 5 key provisions."
- "Find the governing law section and insert a new paragraph after it: 'Any disputes arising under this Agreement shall be resolved through binding arbitration.'"
- "Find all paragraphs that mention 'personally identifiable information' and add a comment: 'Verify PII handling complies with current data retention policy.'"
- "Convert the list of references at the end into a numbered list and restart numbering at 1."

### Search and replace

- "Rewrite all dates in this document in the format January 1, 2026."
- "Replace every occurrence of 'FY2024' with 'FY2025' throughout the document."
- "Add the § symbol before every section number reference."

## Related

- [LLM tools](/document-engine/ai-agents/llm-tools) — tool catalog and SDK functions
- [How to use](/document-engine/ai-agents/integrations) — step-by-step integration guide
- [Debugging](/document-engine/ai-agents/debugging) — troubleshoot tool call failures
- [Document API](/document-api/overview) — the operation set behind the tools
142 changes: 142 additions & 0 deletions apps/docs/ai/agents/debugging.mdx
@@ -0,0 +1,142 @@
---
title: Debugging
sidebarTitle: Debugging
description: Troubleshoot LLM tool calls — logging, error shapes, and common failure modes
keywords: "llm debugging, tool call errors, superdoc tools, ai document editing, troubleshooting, document api"
---

When tool calls fail or produce unexpected results, use these patterns to diagnose the issue.

## LLM tools wrap the Document API

Every LLM tool call maps to a [Document API](/document-api/overview) operation under the hood. `superdoc_edit` with `action: "replace"` calls the same function as `doc.replace()`.

This gives you a clear debugging strategy:

1. **Test the Document API directly.** Call the underlying SDK method with the same arguments. If it works, the operation is fine — the problem is in the prompt or the tool schema.
2. **If the API call fails,** the issue is in the operation itself — check arguments, targets, and document state.
3. **If the API call succeeds but the LLM tool call fails,** the model is calling the tool incorrectly. Fix the prompt, add examples, or check the tool schema.

```typescript
// Instead of going through the LLM, test the operation directly:
const result = await doc.replace({
target: { handle: 'some-handle' },
content: 'New text',
});
console.log(result); // Does this work?
```

This narrows every issue to one of two layers: the operation or the prompt.

## Log tool calls and results

Add logging around `dispatchSuperDocTool` to see exactly what the model is requesting and what comes back.

```typescript
for (const toolCall of choice.message.tool_calls) {
const args = JSON.parse(toolCall.function.arguments);

// Log what the model wants to do
console.log(`[agent] tool: ${toolCall.function.name}`, JSON.stringify(args, null, 2));

try {
const result = await dispatchSuperDocTool(doc, toolCall.function.name, args);

// Log the result (truncate large responses)
const resultStr = JSON.stringify(result);
console.log(`[agent] result: ${resultStr.substring(0, 500)}`);

messages.push({ role: 'tool', tool_call_id: toolCall.id, content: resultStr });
} catch (err: any) {
console.error(`[agent] error: ${err.message}`);
messages.push({ role: 'tool', tool_call_id: toolCall.id, content: JSON.stringify({ error: err.message }) });
}
}
```

What to look for in logs:
- **Tool name** — is the model calling the right tool?
- **Arguments** — are required fields present? Is the `action` correct?
- **Targets** — are handles/addresses from a recent search, or did the model guess?
- **Result** — did the operation return data or an error?

## Error shapes

`dispatchSuperDocTool` throws errors in two categories:

**Validation errors** — bad arguments before the operation runs:
```json
{ "error": "Missing required parameter: action" }
{ "error": "Unknown action 'bold' for tool superdoc_format. Valid actions: inline, set_style, set_alignment, set_indentation, set_spacing" }
{ "error": "Parameter 'target' is required for action 'replace'" }
```

**Execution errors** — the operation ran but failed:
```json
{ "error": "Target not found: no node matches the given handle" }
{ "error": "Invalid address: block at index 42 does not exist" }
```

Both types are returned as strings in `err.message`. Pass them back as tool results — the model usually self-corrects.

## Common failure modes

| Symptom | Cause | Fix |
| --- | --- | --- |
| Model calls the wrong tool | System prompt missing or too vague | Use `getSystemPrompt()` or add workflow instructions |
| "Target not found" errors | Model uses stale or guessed handles | Instruct model to always search before editing |
| Edits land in the wrong place | Model invented a block address | Use `superdoc_search` to get fresh handles |
| Infinite tool call loop | Model never reaches a stopping point | Add a max iterations guard (see below) |
| Model doesn't use tools at all | Tools not passed to the API call | Verify `chooseTools()` result is in the `tools` param |
| "Missing required parameter" | Model forgot `action` or another field | Check the tool schema — add examples to the prompt |
| Collaboration edits not appearing | SDK not in the same collab room | Verify the collaboration URL and documentId match |
| Operation works via API but fails via tool | Model passes wrong argument types/names | Log the parsed arguments and compare to the API signature |

## Inspect tools directly

Dump the tool schemas to verify the SDK loaded correctly:

```typescript
import { listTools, getToolCatalog } from '@superdoc-dev/sdk';

// See all tools for a provider
const tools = await listTools('openai');
console.log(JSON.stringify(tools, null, 2));

// Get the full catalog with metadata
const catalog = await getToolCatalog();
console.log(`Loaded ${catalog.tools.length} tools`);
```

## Max iterations guard

Prevent runaway loops by capping the number of iterations:

```typescript
const MAX_ITERATIONS = 20;
let finished = false;

for (let i = 0; i < MAX_ITERATIONS; i++) {
  const response = await openai.chat.completions.create({ model, messages, tools });
  const message = response.choices[0].message;
  messages.push(message);

  // No tool calls means the model produced its final answer.
  if (!message.tool_calls?.length) {
    finished = true;
    break;
  }

  for (const call of message.tool_calls) {
    const result = await dispatchSuperDocTool(doc, call.function.name, JSON.parse(call.function.arguments));
    messages.push({ role: 'tool', tool_call_id: call.id, content: JSON.stringify(result) });
  }
}

if (!finished) {
  console.warn('[agent] Hit max iterations — stopping');
}
```

## Related

- [LLM tools](/document-engine/ai-agents/llm-tools) — tool catalog and SDK functions
- [How to use](/document-engine/ai-agents/integrations) — step-by-step integration guide
- [Best practices](/document-engine/ai-agents/best-practices) — prompting and workflow tips
- [Document API](/document-api/overview) — the underlying operations that tools call