12 changes: 11 additions & 1 deletion README.md
@@ -12,6 +12,7 @@
- 🌐 **Safari** - Control tabs, navigate, execute JavaScript
- 🌍 **Chrome (CDP)** - Open sessions, navigate, click/type, extract data, screenshots
- 📸 **Screen Capture** - Capture the active display and share image output with the model
- 🔎 **Screen Text** - Extract visible text from the active display with Vision OCR, plus optional visual summaries on macOS 27
- 🖥️ **System** - Open apps, adjust brightness/volume, visual effects

## Available Skills
@@ -32,6 +33,7 @@ This repo currently includes one shareable skill:
- Safari: open/close/switch/navigate/reload/history/page-info scripts
- System: `open-application.applescript`, brightness + volume scripts
- Screenshot: `capture-screenshot.applescript`
- Screen text MCP: `extract_screen_text`
- Files/Finder MCP: `find_files`, `list_directory`, `get_file_info`, `copy_file`, `copy_directory`, `move_file`, `rename_file`, `trash_file`, `reveal_in_finder`, `get_finder_selection`
- Clipboard MCP: `get_clipboard_text`, `set_clipboard_text`, `clear_clipboard`, `get_clipboard_files`, `set_clipboard_files`, `save_clipboard_image`, `set_clipboard_image`
- Window/Workspace MCP: `get_frontmost_app`, `list_windows`, `focus_window`, `move_window`, `resize_window`, `center_window`, `tile_windows`, `minimize`, `hide_app`, `quit_app`
@@ -115,7 +117,8 @@ Replace `/FULL/PATH/TO/altic-mcp` with your actual path (e.g., `/Users/johndoe/D
- ✅ **Automation** - Allow Claude to control apps (Messages, Notes, Safari)
- ✅ **Finder Automation** - For Finder selection, reveal, and Trash file tools
- ✅ **Accessibility** - Required for screen glow, system controls, and window management tools such as focus_window, move_window, resize_window, center_window, tile_windows, minimize, hide_app, and quit_app
- ✅ **Screen Recording** - Required for screenshot capture tools and improves window title/id discovery for list_windows on recent macOS versions
- ✅ **Screen Recording** - Required for screenshot and screen text extraction tools and improves window title/id discovery for list_windows on recent macOS versions
- ✅ **macOS 27 Apple Intelligence / Foundation Models availability** - Required only for `extract_screen_text` visual summary mode; OCR-only mode works without it
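
Because visual summary mode depends on the OS release, a client could run a quick preflight before asking for it. The sketch below is an assumption rather than part of this PR: `macos_major_version` and `visual_summary_supported` are hypothetical helpers built on Python's `platform.mac_ver()`, and they only check the version number, not whether Apple Foundation Models are actually enabled on the machine.

```python
import platform
from typing import Optional


def macos_major_version(release: str) -> Optional[int]:
    """Parse the major version from a macOS release string such as '27.0' or '14.5'."""
    head = release.split(".", 1)[0]
    return int(head) if head.isdigit() else None


def visual_summary_supported(release: Optional[str] = None) -> bool:
    """Return True when the host reports macOS 27 or later.

    Hypothetical helper: this checks the OS version only and cannot tell
    whether Foundation Models are available or enabled.
    """
    if release is None:
        # platform.mac_ver() returns an empty release string on non-macOS hosts.
        release = platform.mac_ver()[0]
    major = macos_major_version(release)
    return major is not None and major >= 27
```

If the check fails, call `extract_screen_text` with `include_visual_summary=false` and rely on OCR alone.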

Clipboard text operations normally do not require extra permissions. Clipboard
file and image operations use macOS pasteboard APIs and may prompt for security
@@ -163,6 +166,13 @@ echo "hello" > /tmp/altic-file-smoke/source/example.txt
- Copy an image or screenshot, then call `save_clipboard_image`
- Use `set_clipboard_image` with an existing PNG or JPEG file, then paste into an app that accepts images

## Manual Smoke Tests For Screen Text Tools

- Open a window with visible text, then call `extract_screen_text` with `include_visual_summary=false`.
- Confirm the returned JSON includes visible text, `line_count`, `average_confidence`, and a valid `screenshot_path`.
- On macOS 27 with Foundation Models available, call `extract_screen_text` with `include_visual_summary=true` and confirm `visual_summary` is populated.
- On systems without visual summary support, confirm OCR text is still returned and `visual_error` explains the missing macOS 27/Foundation Models capability.
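
The JSON checks in the list above can be scripted. This is a minimal sketch, assuming the response is the JSON string returned by `extract_screen_text`: the function name is hypothetical, the OCR text key (`text`) is a guess because the README only says "visible text", and the 0 to 1 range for `average_confidence` assumes the tool reports raw Vision confidence scores.

```python
import json
import os
from typing import Any


def check_screen_text_result(raw: str, expect_summary: bool) -> list[str]:
    """Return a list of problems found in an extract_screen_text JSON response.

    Hypothetical checker. The "text" key is an assumption; line_count,
    average_confidence, screenshot_path, visual_summary, and visual_error
    are the field names listed in the smoke tests.
    """
    problems: list[str] = []
    data: dict[str, Any] = json.loads(raw)

    if not data.get("text"):
        problems.append("no OCR text returned")
    if not isinstance(data.get("line_count"), int):
        problems.append("line_count missing or not an integer")
    confidence = data.get("average_confidence")
    # Assumes Vision-style confidence values between 0.0 and 1.0.
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("average_confidence missing or outside 0..1")
    path = data.get("screenshot_path")
    if not path or not os.path.isfile(path):
        problems.append("screenshot_path missing or file does not exist")

    if expect_summary and not data.get("visual_summary"):
        # Without macOS 27 / Foundation Models, visual_error should explain why.
        if not data.get("visual_error"):
            problems.append("no visual_summary and no visual_error")
    return problems
```

An empty list means the response passed every check for the chosen mode.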

## Manual Smoke Tests For Window Tools

- Call `get_frontmost_app` while Finder or Safari is active.
31 changes: 31 additions & 0 deletions server.py
@@ -15,6 +15,7 @@
    notes,
    reminders,
    safari,
    screen_text,
    screenshot,
    system,
    window,
@@ -1056,6 +1057,36 @@ async def capture_active_screen(
    return screenshot.capture_active_screen(output_path)


@mcp.tool()
async def extract_screen_text(
    output_path: str = Field(default=""),
    max_chars: int = Field(default=20000, ge=1, le=200000),
    include_visual_summary: bool = Field(default=False),
    visual_prompt: str = Field(
        default=screen_text.DEFAULT_VISUAL_PROMPT,
    ),
) -> str:
    """
    Capture the active display and extract visible text with macOS Vision OCR.
    Optionally request a macOS 27 Foundation Models visual summary.

    Args:
        output_path: Optional file path for the captured PNG
        max_chars: Maximum OCR text characters to return
        include_visual_summary: Ask macOS 27 Foundation Models to summarize the image
        visual_prompt: Prompt used when visual summary is enabled

    Returns:
        JSON string with OCR text, screenshot path, confidence metadata, and optional visual summary.
    """
    return screen_text.extract_screen_text(
        output_path,
        max_chars,
        include_visual_summary,
        visual_prompt,
    )


@mcp.tool()
async def add_screen_glow() -> str:
"""
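
The tool delegates to `screen_text.extract_screen_text`, whose implementation is not part of this diff. As a hedged sketch of how the JSON described in the docstring could be assembled, assuming OCR yields per-line text with a Vision confidence between 0 and 1: `OcrLine`, `build_screen_text_json`, and the `text` and `truncated` keys are hypothetical, while `line_count`, `average_confidence`, `screenshot_path`, `visual_summary`, and `visual_error` are the field names used in the README smoke tests.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class OcrLine:
    """One recognized line: text plus an assumed Vision confidence (0.0 to 1.0)."""
    text: str
    confidence: float


def build_screen_text_json(
    lines: list[OcrLine],
    screenshot_path: str,
    max_chars: int,
    visual_summary: Optional[str] = None,
    visual_error: Optional[str] = None,
) -> str:
    """Join OCR lines, clip the text to max_chars, and serialize the response."""
    full_text = "\n".join(line.text for line in lines)
    average = sum(line.confidence for line in lines) / len(lines) if lines else 0.0
    payload = {
        "text": full_text[:max_chars],           # hypothetical key name
        "truncated": len(full_text) > max_chars,  # hypothetical key name
        "line_count": len(lines),
        "average_confidence": round(average, 3),
        "screenshot_path": screenshot_path,
    }
    # Only include visual fields when summary mode produced a result or an error.
    if visual_summary is not None:
        payload["visual_summary"] = visual_summary
    if visual_error is not None:
        payload["visual_error"] = visual_error
    return json.dumps(payload)
```

Clipping after joining keeps `max_chars` a hard limit on the returned text, which matches the `le=200000` bound the tool places on that parameter.
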
38 changes: 37 additions & 1 deletion skills/altic-studio/SKILL.md
@@ -13,9 +13,10 @@ license: Apache-2.0
3. MCP file mode for safe Finder and filesystem operations
4. MCP clipboard mode for text, file, and image pasteboard operations
5. MCP window/workspace mode for arranging macOS apps and windows
6. MCP screen text mode for reading visible text from the active display

It also includes Swift utility scripts for active-display screenshots, clipboard
file/image operations, and window/workspace management on macOS.
file/image operations, screen OCR, and window/workspace management on macOS.

## Mode A: AppleScript (macOS apps)

@@ -61,6 +62,7 @@ The full Altic automation surface is exposed as scripts under `skills/altic-stud
- `turn-down-volume.applescript` - args: `[amount_0_to_100]`
- `capture-screenshot.applescript` - args: `[output_path] [full|interactive|window]`
- `capture-active-screen.swift` - args: `<output_path>` (captures full display containing frontmost app)
- `extract-screen-text.swift` - args: `<output_path> [include_visual_summary] [visual_prompt]` (captures active display and extracts OCR text)
- `clipboard.swift` - subcommands: `get-files`, `set-files <paths...>`, `save-image <output_path>`, `set-image <image_path>`
- `window-manager.swift` - subcommands: `get_frontmost_app`, `list_windows`, `focus_window`, `move_window`, `resize_window`, `center_window`, `tile_windows`, `minimize`, `hide_app`, `quit_app`

@@ -70,6 +72,12 @@ Swift command template (for active-display screenshots):
swift "skills/altic-studio/scripts/capture-active-screen.swift" "/tmp/active-screen.png"
```

Swift command template (for screen OCR):

```bash
swift "skills/altic-studio/scripts/extract-screen-text.swift" "/tmp/screen-text.png" false
```
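
To drive the same script from Python (for example in a test harness), a small wrapper can build the argument list. A hedged sketch: `build_ocr_command` and `run_ocr` are hypothetical names, the `true` spelling for the visual-summary flag is assumed to mirror the documented `false`, and `run_ocr` assumes the script prints its JSON result to stdout.

```python
import subprocess
from typing import Optional

# Path from the documented template; assumes the current directory is the repo root.
SCRIPT = "skills/altic-studio/scripts/extract-screen-text.swift"


def build_ocr_command(
    output_path: str,
    include_visual_summary: bool = False,
    visual_prompt: Optional[str] = None,
) -> list[str]:
    """Build argv for: swift <script> <output_path> [include_visual_summary] [visual_prompt]."""
    # "false" matches the documented template; "true" is assumed to be accepted too.
    argv = ["swift", SCRIPT, output_path, "true" if include_visual_summary else "false"]
    if visual_prompt is not None:
        argv.append(visual_prompt)
    return argv


def run_ocr(
    output_path: str,
    include_visual_summary: bool = False,
    visual_prompt: Optional[str] = None,
) -> str:
    """Run the script and return its stdout (assumed to be the JSON result)."""
    completed = subprocess.run(
        build_ocr_command(output_path, include_visual_summary, visual_prompt),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit
    )
    return completed.stdout
```

Passing arguments as a list avoids shell quoting problems when `visual_prompt` contains spaces or quotes.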

Swift command template (for window management):

```bash
@@ -90,6 +98,7 @@ Use MCP tools for deterministic Chrome automation:
- `chrome_close_session`
- `chrome_list_sessions`
- `capture_active_screen`
- `extract_screen_text`

Execution pattern:

@@ -100,6 +109,29 @@ Execution pattern:
5. Capture screenshots on checkpoints or failures.
6. Close session.

## Mode B2: Screen Text and Visual Understanding (MCP)

Use `extract_screen_text` when the user asks to read, transcribe, copy, or inspect
visible text on the active screen. Default to OCR-only mode because it is faster,
deterministic, and does not require Foundation Models availability.

Available tool:

- `extract_screen_text` - args: `[output_path] [max_chars] [include_visual_summary] [visual_prompt]`

Screen text workflow rules:

- Use `extract_screen_text` with `include_visual_summary=false` for requests like
"read the screen", "what text is visible", or "extract the error message".
- Use `include_visual_summary=true` only when the user asks what is shown, what
to click, or how to interpret the visible UI, or otherwise needs visual
understanding beyond raw text.
- Visual summary mode requires macOS 27 plus Apple Foundation Models
availability. If unavailable, use the OCR result and report the returned
`visual_error`.
- Use `capture_active_screen` instead when the user needs image inspection or a
screenshot artifact rather than extracted text.
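
The routing rules above can be expressed as a small decision function. This is an illustrative sketch only: the keyword lists are hypothetical stand-ins for the agent's own judgment, and `ScreenAction` and `choose_screen_action` are not part of the skill.

```python
from enum import Enum


class ScreenAction(Enum):
    OCR_ONLY = "extract_screen_text(include_visual_summary=False)"
    VISUAL_SUMMARY = "extract_screen_text(include_visual_summary=True)"
    SCREENSHOT = "capture_active_screen"


# Hypothetical keyword lists; a real agent would rely on the model's judgment.
SCREENSHOT_HINTS = ("screenshot", "save an image", "capture the screen")
VISUAL_HINTS = ("what is shown", "what to click", "interpret", "what am i looking at")


def choose_screen_action(request: str) -> ScreenAction:
    """Map a user request to a tool call, following the Mode B2 rules."""
    text = request.lower()
    if any(hint in text for hint in SCREENSHOT_HINTS):
        return ScreenAction.SCREENSHOT
    if any(hint in text for hint in VISUAL_HINTS):
        return ScreenAction.VISUAL_SUMMARY
    # Default: OCR-only is faster, deterministic, and needs no Foundation Models.
    return ScreenAction.OCR_ONLY
```

Checking for screenshot requests first keeps artifact requests from being routed to text extraction.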

## Mode C: File Finder and File Operations (MCP)

Use MCP file tools instead of shell commands when the user asks to find, inspect,
@@ -211,6 +243,8 @@ Window workflow rules:
confirmation.
- For window mutations, verify with `list_windows` when the user needs
confirmation.
- For screen text extraction, use the returned OCR JSON as the source of truth.
If visual summary mode fails, continue with OCR text when it is sufficient.
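
One way to follow the source-of-truth rule when composing a reply, sketched under the assumption that the OCR text sits under a `text` key (hypothetical): always lead with the OCR text, append the visual summary when present, and otherwise surface `visual_error` instead of failing. The function name `compose_screen_answer` is also hypothetical.

```python
import json


def compose_screen_answer(raw: str) -> str:
    """Build a reply grounded in OCR text, adding the visual summary when present."""
    data = json.loads(raw)
    # OCR text is the source of truth; the "text" key name is an assumption.
    parts = [data.get("text", "")]
    if data.get("visual_summary"):
        parts.append("Visual summary: " + data["visual_summary"])
    elif data.get("visual_error"):
        # Visual mode failed: keep the OCR result and report the reason.
        parts.append("Visual summary unavailable: " + data["visual_error"])
    return "\n\n".join(parts)
```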

## Permissions Checklist

@@ -220,6 +254,8 @@ Window workflow rules:
- Automation permission for app control
- Accessibility permission for system controls and window management
- Screen Recording permission for screenshots and improved window discovery
- Screen Recording permission for screen text extraction
- macOS 27 and Apple Foundation Models availability for screen visual summary mode
- Safari setting: Allow JavaScript from Apple Events
- Google Chrome installed for CDP tools
- Full Disk Access for reading Messages database
1 change: 1 addition & 0 deletions skills/altic-studio/scripts/README.md
@@ -19,6 +19,7 @@ osascript "skills/altic-studio/scripts/create-calendar-event.applescript" "Team
osascript "skills/altic-studio/scripts/navigate-safari.applescript" "https://example.com"
osascript "skills/altic-studio/scripts/capture-screenshot.applescript" "/tmp/screen.png" "full"
swift "skills/altic-studio/scripts/capture-active-screen.swift" "/tmp/active-screen.png"
swift "skills/altic-studio/scripts/extract-screen-text.swift" "/tmp/screen-text.png" false
swift "skills/altic-studio/scripts/clipboard.swift" get-files
swift "skills/altic-studio/scripts/clipboard.swift" set-files "/Users/example/Desktop/report.pdf"
swift "skills/altic-studio/scripts/clipboard.swift" save-image "/tmp/clipboard.png"