ghbaud/WheelHouse


WheelHouse


Control your PC with your voice and mouse.
WheelHouse combines speech recognition, mouse integration, and smart automation to give you a powerful new way to interact with Windows. Issue commands, dictate text with intelligent formatting, and use your mouse to control brightness, volume, and more. It is useful both for accessibility users and for users who want a 10-foot interface to their Windows computer.

⚠️ Development Status: This project is actively under development. It is a personal project with a single developer/user. The system is functional but continuously evolving. Expect frequent changes, refactoring, and improvements.


Quick Start

git clone <repo-url> WheelHouse
cd WheelHouse
.\bootstrap.ps1

The bootstrap script automatically:

  • Checks for Python 3.12 and Poetry (installs via winget if missing)
  • Runs poetry install for all services (dynamically discovered)
  • Installs Ollama and pulls the embedding model for semantic search
  • Seeds grepai configuration and starts indexing
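The bootstrap script itself is PowerShell. Purely as an illustration of the "dynamically discovered" step, the Python sketch below assumes that a service is any direct subdirectory of services/ containing a pyproject.toml, and runs `poetry install` inside each one. The function names `discover_services` and `install_all` are hypothetical, and the real script may apply different rules.

```python
import subprocess
from pathlib import Path


def discover_services(repo_root: Path) -> list[Path]:
    """Return service directories under services/ that contain a pyproject.toml.

    Assumption: only direct children of services/ are treated as services.
    """
    services_dir = repo_root / "services"
    if not services_dir.is_dir():
        return []
    return sorted(
        child
        for child in services_dir.iterdir()
        if child.is_dir() and (child / "pyproject.toml").is_file()
    )


def install_all(repo_root: Path, dry_run: bool = True) -> list[Path]:
    """Run `poetry install` in each discovered service (or just report it)."""
    services = discover_services(repo_root)
    for service in services:
        if dry_run:
            print(f"would run: poetry install (cwd={service})")
        else:
            # Each service has its own pyproject.toml, so run Poetry from its directory.
            subprocess.run(["poetry", "install"], cwd=service, check=True)
    return services
```

Discovering services from the filesystem means a newly added service directory is picked up without editing the script.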

If PowerShell blocks the script:

powershell -ExecutionPolicy Bypass -File .\bootstrap.ps1

Overview

WheelHouse is a comprehensive voice and automation control system for Windows, designed to streamline workflows through advanced input handling, smart home integrations, and AI-powered command processing.

This repository contains the entire WheelHouse system, organized as a Poetry-managed monorepo with two primary services:

  • google_stt_server: A standalone speech-to-text transcription server (Python with Poetry).
  • wheelhouse: The core desktop application that processes commands and controls the system (Python with Poetry).

Both services use Poetry for dependency management and packaging. Each service has its own pyproject.toml in its respective directory (services/google_stt_server/ and services/wheelhouse/).

TL;DR: WheelHouse provides a 10-foot interface for controlling Windows using both voice commands and mouse input. It also supports intelligent text dictation into any focused text element.


How It Works

When you speak into your computer’s default microphone, WheelHouse:

  1. Captures audio and sends it to the Google Speech-to-Text API.
  2. Processes the transcription against user-defined and built-in regular expressions:
    • Text replacements (e.g., replacing “exclamation point” with !).
    • Command detection (e.g., “browser” to switch to a browser window, “notepad” to open Notepad, “backspace 3” to send three backspaces).
  3. Executes commands immediately if the utterance matches a standalone command phrase.
  4. Handles dictation intelligently if the input is not a command:
    • Applies context-aware spacing and capitalization.
    • Correctly formats after punctuation or sentence boundaries.
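The steps above can be sketched in Python as follows. This is a minimal illustration, not WheelHouse's implementation: the rule tables, the `route` and `format_dictation` functions, the choice to check commands before replacements, and the simple spacing and capitalization rules are all assumptions chosen to mirror the examples in the list.

```python
import re
from dataclasses import dataclass

# Hypothetical text-replacement rules: spoken form -> written form.
REPLACEMENTS = [
    (re.compile(r"\s*\bexclamation point\b", re.IGNORECASE), "!"),
    (re.compile(r"\s*\bquestion mark\b", re.IGNORECASE), "?"),
]

# Hypothetical command patterns; each must match the whole utterance.
COMMANDS = [
    (re.compile(r"browser", re.IGNORECASE), "focus_browser"),
    (re.compile(r"notepad", re.IGNORECASE), "open_notepad"),
    (re.compile(r"backspace(?: (\d+))?", re.IGNORECASE), "backspace"),
]


@dataclass
class Result:
    kind: str       # "command" or "dictation"
    value: str      # command name, or text to type
    count: int = 1  # repeat count for commands such as "backspace 3"


def apply_replacements(text: str) -> str:
    for pattern, replacement in REPLACEMENTS:
        text = pattern.sub(replacement, text)
    return text


def format_dictation(text: str, before_cursor: str) -> str:
    """Capitalize at sentence starts and add a leading space when needed."""
    text = text.strip()
    if not text:
        return ""
    prev = before_cursor.rstrip()
    if not prev or prev[-1] in ".!?":
        text = text[0].upper() + text[1:]
    if before_cursor and not before_cursor[-1].isspace() and text[0] not in ".,!?;:":
        text = " " + text
    return text


def route(utterance: str, before_cursor: str = "") -> Result:
    utterance = utterance.strip()
    # A standalone command phrase is executed instead of being typed.
    for pattern, name in COMMANDS:
        match = pattern.fullmatch(utterance)
        if match:
            count = int(match.group(1)) if match.groups() and match.group(1) else 1
            return Result("command", name, count)
    # Otherwise treat the utterance as dictation.
    text = apply_replacements(utterance)
    return Result("dictation", format_dictation(text, before_cursor))
```

For example, `route("backspace 3")` yields a `backspace` command with a count of 3, while `route("open the browser")` is not a standalone command and is typed as "Open the browser".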

Text appears with streaming incremental updates as you speak: the first word typically arrives within 1.5 to 2 seconds, and subsequent words follow continuously at 100 to 200 ms intervals. Because results stream in while you are still speaking, you see text immediately instead of waiting for a pause, as traditional dictation systems do. You can edit inserted text by typing, speaking again, or issuing commands such as "delete."
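Streaming recognizers send interim hypotheses that can revise words already shown. One common way to apply such revisions incrementally is to keep the longest common prefix of the typed text and the new hypothesis, send backspaces for the divergent tail, and type only the new suffix. The sketch below illustrates that technique under the assumption that WheelHouse tracks what it has typed; the function names are hypothetical and this is not necessarily how WheelHouse does it.

```python
import os


def incremental_edit(typed: str, hypothesis: str) -> tuple[int, str]:
    """Return (backspaces, text_to_type) that turns `typed` into `hypothesis`.

    Keeps the longest common prefix and retypes only the part that changed.
    """
    common = len(os.path.commonprefix([typed, hypothesis]))
    return len(typed) - common, hypothesis[common:]


def replay(interim_results: list[str]) -> str:
    """Simulate a text field receiving the edits for a stream of interim results."""
    field = ""
    for hypothesis in interim_results:
        backspaces, insert = incremental_edit(field, hypothesis)
        field = field[: len(field) - backspaces] + insert
    return field
```

For instance, `incremental_edit("Carl and eye", "Carl and I are")` returns `(3, "I are")`: three backspaces remove "eye" and only "I are" is typed. The approach assumes the user has not edited the field between updates; otherwise the tracked text no longer matches what is on screen.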


Intelligent Text Handling

WheelHouse leverages Google’s cloud-based AI for accurate transcription, including:

  • Proper nouns (e.g., John Smith, Washington DC).
  • Context-aware formatting of numbers (e.g., $7.47, Boeing 747, 7:47 a.m.).
  • Real-world awareness (e.g., “Carl and I are flying out of SeaTac on United Wednesday and landing in DC around 7:00.”).

A major challenge, resolving behavioral differences between Windows text controls, has been solved so that dictation works reliably and with very low latency across a wide range of applications.


Auxiliary Features

  • Audio Awareness

    • Automatically disables speech recognition when the system or specific Sonos speakers are playing audio to avoid misinterpretation.
  • Mouse Wheel Controls (Logitech MX Series)

    • Adjust screen brightness or system audio volume using the thumbwheel.
    • Supports both hardware-level brightness controls (where available) and software-defined brightness.
    • Volume adjustments work directly on the output device or, in advanced setups, on connected audio hardware (e.g., Sonos Beam).

Getting Started

Development Setup

This project uses Poetry for Python dependency management. To work with the codebase:

  1. Install Poetry: Follow instructions at python-poetry.org

  2. Navigate to a service directory:

    cd services/wheelhouse
    # or
    cd services/google_stt_server
  3. Install dependencies:

    poetry install
  4. Run commands within the Poetry environment:

    poetry run python main.py
    poetry run pytest

Each service is independently managed with its own pyproject.toml and virtual environment.

AI-Assisted Development

This project uses GitHub Copilot with Model Context Protocol (MCP) servers for enhanced development workflows. The MCP servers are optional; the project works without them.

Using WheelHouse Dev Mode

This project includes a custom VS Code chat mode (.github/chatmodes/wheelhouse-dev.chatmode.md) that provides consistent AI behavior across sessions.

To use the chat mode:

  1. Open GitHub Copilot Chat in VS Code
  2. Select "WheelHouse Dev" from the mode dropdown (top of chat panel)
  3. The mode automatically:
    • Enforces MCP-first tool usage (no unnecessary terminal commands)
    • Routes to appropriate skills based on your request (commit, merge, release, docs, etc.)
    • Requires explicit approval before commits/pushes
    • Maintains token discipline (loads only needed context)

What it does:

  • Skill Routing: Recognizes workflow triggers and loads appropriate skills

    • "commit" → commit-workflow
    • "merge" → merge-workflow
    • "release" → release-workflow
    • "docstring" / ":flow:" → documentation-system
    • "refactor" / "architecture" → architecture-patterns
    • "development" / "guidelines" → development-guidelines
  • MCP-First: Checks for MCP tools before falling back to terminal

  • Approval Gates: Never auto-commits without explicit user request

  • Adaptive: User intent overrides precedence (you can explicitly request any skill)

See .github/copilot-instructions.md for initialization protocol and .github/chatmodes/wheelhouse-dev.chatmode.md for complete mode specification.

MCP Servers Configured:

  1. Memory Server (context persistence)

    • Maintains a knowledge graph of architectural patterns, design decisions, and project context
    • Helps AI assistants provide accurate suggestions based on project-specific conventions
    • Stores memory in your user profile (%USERPROFILE%\.mcp\WheelHouse\memory.db)
  2. WheelHouse Workflows (development automation)

    • Automates pre-commit validation, release preparation, and memory management
    • Provides natural language workflow commands (e.g., "Am I ready to commit?")
    • Reduces token usage by 85-90% compared to memory-based procedures
    • See services/mcp_workflows/TOOLS.md for details
  3. WheelHouse Skills (lazy-loaded development procedures)

    • Progressive disclosure skill system (metadata → SKILL.md → references/)
    • Skills discoverable via mcp_wheelhouse-sk_list_skills
    • Load full skill with mcp_wheelhouse-sk_get_skill(name)
    • See .skills/README.md for skill architecture
  4. Filesystem (file operations without terminal interruptions)

    • Enables frictionless file operations (read, write, edit, move)
    • Configured with workspace and trash folder access
  5. GitKraken (git operations)

    • Provides git operations (status, branch, commit, push, etc.) without terminal context switches
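The progressive disclosure used by WheelHouse Skills (metadata → SKILL.md → references/) can be illustrated with a small Python sketch. It assumes a layout of `<root>/<skill-name>/SKILL.md` with a leading `---` frontmatter block containing a `description:` line, plus an optional `references/` folder; these layout details and the function names are assumptions, so see .skills/README.md for the actual architecture.

```python
from pathlib import Path


def _read_metadata(skill_md: Path) -> dict[str, str]:
    """Parse simple `key: value` lines from a leading '---' frontmatter block."""
    lines = skill_md.read_text(encoding="utf-8").splitlines()
    meta: dict[str, str] = {}
    if not lines or lines[0].strip() != "---":
        return meta
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return meta


def list_skills(root: Path) -> dict[str, str]:
    """Level 1: return only name -> description, without exposing skill bodies."""
    return {
        skill_md.parent.name: _read_metadata(skill_md).get("description", "")
        for skill_md in sorted(root.glob("*/SKILL.md"))
    }


def get_skill(root: Path, name: str) -> str:
    """Level 2: load the full SKILL.md for one skill."""
    return (root / name / "SKILL.md").read_text(encoding="utf-8")


def get_reference(root: Path, name: str, ref: str) -> str:
    """Level 3: load a single file from the skill's references/ folder on demand."""
    return (root / name / "references" / ref).read_text(encoding="utf-8")
```

Returning only names and descriptions at the first level keeps the listing small; full procedures and reference files are passed to the assistant only when a specific skill is requested.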

Documentation

For complete technical details, architecture, and development information, please refer to the documentation in docs/developers/.


License

This project is licensed under the Apache License 2.0.
See the LICENSE file for details.
