Skip to content

rwforest/databricks-reviewer

Repository files navigation

DOCX Tech Review Tool

Automated technical review of DOCX book chapters using Databricks Foundation Model API. Generates review comments and adds them as proper MS Word comments.

Features

  • 🔍 Auto-discovers skills from .agent/skills/ directory
  • 💬 Real Word comments (not inline text) - visible in Review pane
  • 📝 Editable CSV output - review and filter before applying
  • 📊 Batch processing with progress logging

Prerequisites

  • Python 3.10+
  • Databricks workspace with Foundation Model API access
  • API token with serving endpoint permissions

Installation

1. Initialize Skills (git submodules)

Skills are managed as git submodules and auto-update on every pull:

# First-time setup: initialize and fetch all skills
git submodule update --init --recursive

To refresh skills to the latest version at any time:

git submodule update --remote --recursive

To auto-update submodules on every git pull:

git config submodule.recurse true

Skills are sourced from two repositories:

  • Databricks skills: databricks-solutions/ai-dev-kit.agent/skills/ai-dev-kit/databricks-skills/
  • MLflow skills: mlflow/skills.agent/skills/mlflow-skills/

2. Set up Python environment

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

3. Configure environment

cp .env.example .env

Edit .env with your Databricks credentials:

DATABRICKS_HOST=https://your-workspace.cloud.databricks.com
DATABRICKS_TOKEN=your-api-token
LLM_MODEL=databricks-claude-sonnet-4
AUTHOR_NAME=Your Name  # Optional - used for review comments
EDITOR_NAME=Editor Name  # Optional - used for Track Changes edits (falls back to AUTHOR_NAME if not set)

Usage

Batch Processing (All DOCX files in folder)

Process all DOCX files in a folder with three workflow modes:

# Mode 3 (default): process current directory
python3 batch_review.py

# Process a specific folder
python3 batch_review.py --folder "/path/to/chapters"
python3 batch_review.py --folder "../Ready for Review" --mode 1

# Mode 1: Generate CSV + apply comments only → _reviewed.docx
python3 batch_review.py --mode 1

# Mode 2: Use existing CSV + apply track changes only → _edited.docx
python3 batch_review.py --mode 2

# Process specific file in a folder
python3 batch_review.py --folder "/path/to/chapters" --file "Chapter 11.docx" --mode 3

# Continue processing if a file fails
python3 batch_review.py --mode 3 --continue-on-error

Workflow Modes:

  • Mode 1 - Generate CSV + apply comments only → _reviewed.docx

    • Generates unified review CSV with both comments and suggested edits
    • Applies only Word comments (no track changes)
    • Use this to review comments first before applying edits
  • Mode 2 - Use existing CSV + apply track changes only → _edited.docx

    • Loads review items from existing CSV (from mode 1)
    • Applies only track changes (no comments, no CSV regeneration)
    • Use this after reviewing/editing the CSV from mode 1
  • Mode 3 - Generate CSV + apply both (default) → _reviewed_edited.docx

    • Generates unified review CSV
    • Applies both Word comments AND track changes
    • Use this for complete review in one step

Single File Processing

Full pipeline (comments + suggested edits)

python3 docx_tech_review.py "Chapter 11 - MLflow 3.docx"

Generates both:

  • Review comments (Word sidebar comments) → _reviewed.docx
  • Suggested edits (Track Changes) → applied to _reviewed.docx

Generate CSV only (for review before applying)

python3 docx_tech_review.py "Chapter 11.docx" --generate

Edit CSV then apply

# Edit the CSV in Excel/Numbers - remove unwanted comments
python3 docx_tech_review.py "Chapter 11.docx" --apply

Specify author name

python3 docx_tech_review.py "Chapter 11.docx" --author "Tech Reviewer"

Track Changes Mode (Edit Suggestions)

Generate text edits with Track Changes (strikethrough + insertions):

Generate edit suggestions

python3 docx_tech_review.py "Chapter 11.docx" --generate-edits

Creates Chapter 11_edits.csv with suggested changes.

Review and apply edits

# Edit the CSV - remove/modify unwanted changes
python3 docx_tech_review.py "Chapter 11.docx" --apply-edits

Creates Chapter 11_edited.docx with Track Changes enabled.

Output

File Description
<name>_review.csv Review comments (paragraph_index, issue_type, comment, severity)
<name>_reviewed.docx Document with Word comments added
review_*.log Detailed processing log

Files

File Purpose
docx_tech_review.py Main review script (single file)
batch_review.py Batch processing script (all DOCX files)
.env.example Configuration template
requirements.txt Python dependencies

How It Works

  1. Extract paragraphs from DOCX file
  2. Load skill context from .agent/skills/ (auto-discovered)
  3. Send to LLM in batches for technical review
  4. Parse comments from LLM response (JSON format)
  5. Save to CSV for optional editing
  6. Apply as Word comments using python-docx 1.2.0+

License

MIT

About

Review Databricks write up with Databricks Skills and FMAPI

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors