Merged
56 changes: 49 additions & 7 deletions .github/workflows/ci.yml
@@ -5,9 +5,57 @@ on:
    branches: [main, develop]
  pull_request:

env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true

jobs:
  lint:
    name: Lint & Format
    runs-on: ubuntu-latest
    permissions:
      contents: write # Needed for auto-commit

    steps:
      - uses: actions/checkout@v4
        with:
          ref: ${{ github.head_ref }}

      - name: Set up Rust
        uses: dtolnay/rust-toolchain@stable

      - name: Install uv
        uses: astral-sh/setup-uv@v5
        with:
          cache-dependency-glob: "pyproject.toml"

      - name: Set up Python
        run: uv python install 3.12

      - name: Sync dependencies
        run: uv sync --all-extras

      - name: Run pre-commit hooks
        id: precommit
        run: uv run pre-commit run --all-files

      - name: Check README sync
        run: uv run python scripts/sync_readme.py --check

      - name: Auto-commit formatting fixes
        if: failure()
        uses: stefanzweifel/git-auto-commit-action@v5
        with:
          commit_message: "style: auto-fix formatting and linting issues"
          commit_options: "--no-verify"
      # Fail the job even if we commit so the user knows they need to pull

      - name: Fail if pre-commit failed
        if: steps.precommit.outcome == 'failure'
        run: exit 1

  test:
    name: ${{ matrix.os }} / py${{ matrix.python-version }}
    needs: lint
    runs-on: ${{ matrix.os }}

    strategy:
@@ -41,17 +89,11 @@ jobs:
      - name: Check native import
        run: uv run python -c "import scriber._native; print('native ok')"

      - name: Rust format check
        run: cargo fmt --check

      - name: Rust clippy
        run: cargo clippy --all-targets -- -D warnings

      - name: Rust tests
        run: cargo test

      - name: Run tests
        run: uv run pytest

      - name: CLI smoke
        run: uv run scriber . --only-tree --output -
5 changes: 4 additions & 1 deletion .github/workflows/release.yml
@@ -5,6 +5,9 @@ on:
    tags:
      - "v*"

env:
  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: true

jobs:
  build:
    name: Build ${{ matrix.os }}
@@ -82,4 +85,4 @@ jobs:
      - name: Publish
        uses: pypa/gh-action-pypi-publish@release/v1
        with:
          packages-dir: dist-artifacts
57 changes: 57 additions & 0 deletions .gitignore
@@ -0,0 +1,57 @@
# Environments
.env
.venv
env/
venv/
ENV/

# Python
__pycache__/
*.py[cod]
*$py.class
*.so
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# Rust
target/
**/*.rs.bk

# Caches and tooling
.pytest_cache/
.ruff_cache/
.mypy_cache/
.coverage
htmlcov/
.tox/
.nox/

# Scriber specific
.scriber/
scriber_pack.md
*.scriber_pack.md
context.md

# IDEs and Editors
.idea/
.vscode/
*.swp
*.swo
*~
.DS_Store
30 changes: 30 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,30 @@
repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.5.0
    hooks:
      - id: trailing-whitespace
      - id: end-of-file-fixer
      - id: check-yaml
      - id: check-toml

  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.3.5
    hooks:
      - id: ruff
        args: [ --fix ]
      - id: ruff-format

  - repo: local
    hooks:
      - id: cargo-fmt
        name: cargo fmt
        entry: cargo fmt --manifest-path Cargo.toml --
        language: system
        types: [rust]
        pass_filenames: true
      - id: cargo-clippy
        name: cargo clippy
        entry: cargo clippy --manifest-path Cargo.toml -- -D warnings
        language: system
        types: [rust]
        pass_filenames: false
28 changes: 22 additions & 6 deletions CHANGELOG.md
@@ -5,13 +5,29 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).


## [2.1.0] - 2026-05-31

### Added
- **Frontend Graph Tracking**: Added dependency parsing support for modern frontend frameworks (`.vue`, `.svelte`, `.astro`), HTML templates, and CSS stylesheets within JS/TS graph construction.
- **Packaging Profiles (`--profile`)**: Added `default`, `audit`, `debug`, `refactor`, and `docs` profiles to quickly bias the file scoring and inclusion criteria without manually tweaking config options.
- **CLI Introspection**: Added `--explain` as an alias for `--explain-selection`. Enhanced `--why` output to show estimated token cost, content mode, and omission reasons for any target file.
- **Automated README Sync**: Added the `scripts/sync_readme.py` tool, which syncs CLI arguments, profile documentation, and version tags into `README.md`.
- **AI-Native Navigation & Optimization**: Implemented XML anchors for symbols, aggressive test file quarantine, and support file pruning to keep focused mode clean and strictly token-capped.
- **Dependency Limiting**: Introduced `top_dependencies` (defaulting to 10) in the configuration to limit the width of the graph traversal and pull in only the highest-confidence dependencies per file.
- **Version Alignment**: Synchronized Python and Rust crate versions. `scriber --version` now reports both Python and native API versions.
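
The `top_dependencies` limit described above can be pictured as a per-file top-k selection. The following is a minimal sketch only: the function name, the `edges` mapping of target path to confidence, and the tie-breaking rule are illustrative assumptions, not ProjectScriber's actual implementation.

```python
import heapq


def top_dependencies(edges: dict[str, float], limit: int = 10) -> list[str]:
    """Return up to `limit` dependency targets with the highest confidence.

    `edges` maps a dependency path to a confidence score (hypothetical shape).
    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = heapq.nsmallest(
        limit,
        edges.items(),
        key=lambda item: (-item[1], item[0]),  # highest score first, then name
    )
    return [path for path, _ in ranked]


if __name__ == "__main__":
    sample = {"a.py": 0.9, "b.py": 0.5, "c.py": 0.9, "d.py": 0.1}
    print(top_dependencies(sample, limit=2))
```

Capping the fan-out per file keeps graph traversal narrow, so low-confidence edges cannot pull large parts of the project into the pack.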

### Fixed
- **Cache Stability**: Fixed graph warm-cache edge generation and stale import cache validation (now strictly validating `mtime` and `size`).
- **Resilience & Scanners**: Added pure-Python fallback for `read_text_lossy`, optimized scanner ordering (whitelist before binary check), and corrected the test role classifier to prevent false positives on files naturally named `tests.py`.
- **Excerpt Fallback Bug**: Fixed rendering and token estimations for empty excerpt files; they now correctly fall back to outline AST structures or full content if budget allows.
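
The `mtime` and `size` validation mentioned under Cache Stability could look roughly like the sketch below. The `CacheStamp` class and both helper names are hypothetical, and the real cache format is not shown in this diff.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CacheStamp:
    """Hypothetical fingerprint stored alongside a cached import list."""

    mtime_ns: int
    size: int


def stamp_for(path: Path) -> CacheStamp:
    """Build a fingerprint from the file's current mtime and size."""
    st = path.stat()
    return CacheStamp(mtime_ns=st.st_mtime_ns, size=st.st_size)


def is_fresh(path: Path, cached: CacheStamp) -> bool:
    """Return True only if both mtime and size still match the cached stamp."""
    try:
        return stamp_for(path) == cached
    except OSError:
        # A missing or unreadable file always invalidates the cache entry.
        return False
```

Checking both fields catches edits that preserve one of them, for example a rewrite within the same timestamp granularity that changes the file length.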

## [2.0.0] - 2026-05-30

### Added
- **Native Rust Acceleration (`scriber._native`)**: Full transition of filesystem scanning, high-performance file reading/writing, and binary classification to a compiled Rust extension built using Maturin and PyO3.
- **🌳 Fast Parallel Scanner**: Re-engineered directory scanning utilizing the `WalkBuilder` from the `ignore` crate, fully respecting `.gitignore` rules with blazing fast native execution.
- **🧪 Rigorous Verification & Equivalence Testing**: Comprehensive suite of regression and equivalence tests validating 100% exact matching behavior between Rust and Python scanner modules.
- **📦 Multi-Platform Binary Wheels**: CI/CD integration using `PyO3/maturin-action` to compile and distribute native wheels across Linux, macOS, and Windows.
- **Native Rust Acceleration (`scriber._native`)**: Full transition of filesystem scanning, high-performance file reading/writing, and binary classification to a compiled Rust extension built using Maturin and PyO3.
- **Fast Parallel Scanner**: Re-engineered directory scanning utilizing the `WalkBuilder` from the `ignore` crate, fully respecting `.gitignore` rules with blazing fast native execution.
- **Rigorous Verification & Equivalence Testing**: Comprehensive suite of regression and equivalence tests validating 100% exact matching behavior between Rust and Python scanner modules.
- **Multi-Platform Binary Wheels**: CI/CD integration using `PyO3/maturin-action` to compile and distribute native wheels across Linux, macOS, and Windows.


## [1.1.2] - 2025-09-30
@@ -53,7 +69,7 @@ The CLI now falls back to simple text-based output if `rich` is not installed.

### Added
- Configured a GitHub Actions pipeline for automated testing and releases.
- `-v` and `--version` flags for the `scriber` CLI.
- The `--config` flag now accepts a path to a `pyproject.toml` file, providing more flexibility for monorepo configurations.

### Fixed
@@ -68,4 +84,4 @@ The CLI now falls back to simple text-based output if `rich` is not installed.
- **Clipboard Integration**: Enabled copying the generated project structure to the clipboard.
- **Command-Line Interface**: Created a command-line tool with a configurable `init` command for saving settings to `pyproject.toml`.
- **Configuration**: Introduced `pyproject.toml` as the single source of truth for project metadata and configuration.
- **Testing**: Added a test suite using `pytest` to ensure core functionality and CLI commands work as expected.
3 changes: 1 addition & 2 deletions Cargo.toml
@@ -1,6 +1,6 @@
[package]
name = "project-scriber-native"
version = "2.0.0"
version = "2.1.0"
edition = "2021"

[lib]
@@ -16,4 +16,3 @@ serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
memchr = "2.7"
regex = "1.10"

96 changes: 71 additions & 25 deletions README.md
@@ -9,13 +9,13 @@
<a href="https://pypi.org/project/project-scriber/"><img src="https://img.shields.io/pypi/v/project-scriber?style=flat" alt="PyPI Version"></a>
</p>

An intelligent tool to map, analyze, and compile project source code into a single, context-optimized text file for Large Language Models (LLMs). **Version 2.0** brings advanced dependency graph analysis, strict whitelist-based file inclusion, zero-dependency lightweight execution, and progress tracking!
An intelligent tool to map, analyze, and compile project source code into a single, context-optimized text file for Large Language Models (LLMs). **Version 2** brings advanced dependency graph analysis, strict whitelist-based file inclusion, zero-dependency lightweight execution, and progress tracking!

-----

## 📖 Table of Contents

- [🤔 Why ProjectScriber 2.0?](#-why-projectscriber-20)
- [🤔 Why ProjectScriber?](#-why-projectscriber)
- [✨ Key Features](#-key-features)
- [🚀 Quick Start](#-quick-start)
- [💾 Installation](#-installation)
@@ -25,14 +25,14 @@ An intelligent tool to map, analyze, and compile project source code into a sing

-----

## 🤔 Why ProjectScriber 2.0?
## 🤔 Why ProjectScriber?

When working with Large Language Models, providing the full context of a codebase is crucial for getting accurate analysis, documentation, or refactoring suggestions. However, blindly pasting an entire project wastes tokens and introduces noise.

**ProjectScriber 2.0** automates context building using a **Whitelist-First** philosophy and an **Intelligent Scoring Engine**. It analyzes your codebase's dependency graph (e.g., Python imports), determines which files are most relevant to the code you're working on, and bundles them into a single, clean markdown file, strictly respecting your token budgets and file-type configurations.
**ProjectScriber** automates context building using a **Whitelist-First** philosophy and an **Intelligent Scoring Engine**. It analyzes your codebase's dependency graph (e.g., Python imports), determines which files are most relevant to the code you're working on, and bundles them into a single, clean markdown file, strictly respecting your token budgets and file-type configurations.

<p align="center">
📁 <b>Your Codebase</b> → 📦 <b>ProjectScriber 2.0</b> → 📋 <b>LLM-Ready Context</b>
📁 <b>Your Codebase</b> → 📦 <b>ProjectScriber</b> → 📋 <b>LLM-Ready Context</b>
</p>

-----
@@ -123,32 +123,78 @@ uv pip install project-scriber

### CLI Options

<!-- BEGIN SCRIBER:CLI_OPTIONS -->
| Option | Description |
|:---|:---|
| `paths` | Project file/folder paths used as seeds. Defaults to current directory `.`. |
| `--config [path]` | Path to `pyproject.toml`. Its parent directory becomes the project root. |
| `--path-base [base]`| Base for relative paths: `project` (default) or `cwd`. |
| `--format [md, txt]` | Output format. Defaults to `md` (Markdown). |
| `--output [file]` | Output file path. Use `-` for stdout. |
| `--dry-run` | Show pack summary without writing the output file. |
| `--open` | Open the generated file in the default editor. |
| `--validate-config`| Validate the `[tool.scriber]` configuration and exit. |
| `--only-tree` | Render only the scored tree/map, without any file contents. |
| `--[no-]modules` | Enable/Disable automatic related module selection (dependency graph scanning). |
| `--[no-]support` | Enable/Disable support files (like `.env.example`, `.github/workflows`). |
| `--support-content` | Override support file content policy (`full`, `auto`, `tree_only`). |
| `paths` | Project file/folder paths used as seeds. Defaults to current directory. |
| `--profile` | Preset configuration profile. |
| `--config` | Path to pyproject.toml. Its parent directory becomes the project root. |
| `--path-base` | Base directory for relative paths when --config is used. |
| `--format` | Output format. |
| `--output` | Output file path, relative to project root unless absolute. Use '-' for stdout. |
| `--only-tree` | Render only scored tree/map, without file contents. |
| `--modules` | Enable automatic related module selection. |
| `--no-modules` | Disable automatic related module selection. |
| `--support` | Enable support files. |
| `--no-support` | Disable support files. |
| `--support-content` | Override default support file content policy. |
| `--max-files` | Maximum number of files in the pack. |
| `--max-tokens` | Approximate token budget using char-based estimation. `0` disables budget. |
| `--min-score` | Minimum relevance score (0-100) for non-seed files to be included. |
| `--init` | Append a default `[tool.scriber]` config to `pyproject.toml` and exit. |
| `--force` | Force overwrite of the config block when used with `--init`. |
| `--version` | Show program's version number and exit. |
| `--max-tokens` | Approximate token budget for included file contents. 0 disables budget. |
| `--min-score` | Minimum score for non-seed files. |
| `--init` | Append a default [tool.scriber] config to pyproject.toml and exit. |
| `--force` | Allow --init to append even if [tool.scriber] already exists. |
| `--project` | Force project snapshot mode. |
| `--explain, --explain-selection` | Explain reason for file selection in detail. |
| `--explain-graph` | Print relation graph statistics and relations. |
| `--why` | Print exactly which rules/edges pulled the specified file into the pack. |
| `--graph-json` | Export the RelationGraph as a JSON file to the specified path. |
| `--validate-config` | Validate pyproject.toml scriber config. |
| `--dry-run` | Perform a dry run without saving the pack file. |
| `--open` | Open the output file automatically after creation. |
| `--timings` | Show execution timings for each phase. |
| `--version` | Show version information and exit. |
<!-- END SCRIBER:CLI_OPTIONS -->
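
The `--max-tokens` budget is described as approximate. A minimal sketch of how such a budget might be applied is shown below, assuming a simple characters-per-token heuristic. The ratio of 4 and both function names are illustrative assumptions, not taken from ProjectScriber.

```python
import math

CHARS_PER_TOKEN = 4  # assumed ratio, for illustration only


def estimate_tokens(text: str) -> int:
    """Approximate the token count of `text` from its character length."""
    return math.ceil(len(text) / CHARS_PER_TOKEN)


def select_within_budget(files: list[tuple[str, str]], max_tokens: int) -> list[str]:
    """Return the paths whose contents fit the budget, in the given order.

    `files` is a list of (path, content) pairs, assumed to be sorted by
    relevance already. A budget of 0 disables the limit, mirroring the
    documented `--max-tokens` behavior.
    """
    if max_tokens == 0:
        return [path for path, _ in files]
    selected: list[str] = []
    used = 0
    for path, text in files:
        cost = estimate_tokens(text)
        if used + cost <= max_tokens:
            selected.append(path)
            used += cost
    return selected
```

A greedy pass like this skips a file that is too large but still admits smaller files that follow it, which is one of several reasonable policies.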

<!-- BEGIN SCRIBER:PROFILES -->
### Profiles

ProjectScriber comes with several preset profiles to quickly bias the file scoring and inclusion criteria:

| Profile | Description |
|:---|:---|
| `default` | Standard scoring behavior. |
| `audit` | Boosts tests, config files, CI environments, and dependency files. Assumes full support content inclusion. |
| `debug` | Boosts direct/reverse dependencies, tests, runtime support, and files close to the seed path. |
| `refactor` | Boosts files within the same package, related tests, and direct dependencies. |
| `docs` | Heavily boosts documentation files while suppressing test and code file scores. Assumes tree_only support content by default. |
<!-- END SCRIBER:PROFILES -->

-----

## 🛠️ IDE Integrations

### PyCharm / IntelliJ IDEA (External Tools)

You can integrate ProjectScriber directly into PyCharm's right-click context menu to quickly generate LLM context packs for any selected file or folder!

1. Open **Settings / Preferences** ➔ **Tools** ➔ **External Tools**.
2. Click the **`+`** button to add a new tool.
3. Configure it as follows:

* **Name:** `Scriber`
* **Group:** `External Tools`
* **Description:** `Runs ProjectScriber on the selected directory and copies output to clipboard`
* **Program:** `scriber` *(or the absolute path to your `scriber.exe` e.g., `C:\Tools\Python\Python313\Scripts\scriber.exe`)*
* **Arguments:** `"$FilePath$" --config "$ProjectFileDir$/pyproject.toml"`
* **Working directory:** `$ProjectFileDir$`

Now, you can simply right-click any file or directory in your Project tree, select **External Tools** ➔ **Scriber**, and the context pack will be generated instantly based on your project configuration!

-----

## ⚙️ Configuration

ProjectScriber 2.0 configures itself through the standard `pyproject.toml` using the `[tool.scriber]` table.
ProjectScriber 2.1.0 configures itself through the standard `pyproject.toml` using the `[tool.scriber]` table.
Generate the default block using:

```shell
@@ -217,7 +263,7 @@ patterns = [
```

### Whitelist Policy
ProjectScriber 2.0 uses a strict **whitelist** approach:
ProjectScriber 2.1.0 uses a strict **whitelist** approach:
1. Files must match either a `code_pattern` or a `support_pattern` to be considered.
2. Unrecognized extensions and binary files are automatically excluded, keeping your LLM context safe from binary garbage.
3. Lock files are included in the tree by default, but their contents are omitted to save tokens.
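
The whitelist rules above can be sketched as a small classifier. This is a hedged illustration: the pattern lists, the `LOCK_PATTERNS` name, and the `classify` function are hypothetical stand-ins for whatever `code_pattern` and `support_pattern` entries a project configures, and binary-content sniffing is left out.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical pattern lists standing in for the configured whitelist.
CODE_PATTERNS = ["*.py", "*.rs"]
SUPPORT_PATTERNS = ["pyproject.toml", "*.md"]
LOCK_PATTERNS = ["*.lock"]


def _matches(name: str, patterns: list[str]) -> bool:
    """Case-sensitive glob match of a file name against any pattern."""
    return any(fnmatchcase(name, pattern) for pattern in patterns)


def classify(path: str) -> str:
    """Classify a path as 'code', 'support', 'tree_only', or 'excluded'."""
    name = PurePosixPath(path).name
    if _matches(name, LOCK_PATTERNS):
        return "tree_only"  # listed in the tree, contents omitted
    if _matches(name, CODE_PATTERNS):
        return "code"
    if _matches(name, SUPPORT_PATTERNS):
        return "support"
    return "excluded"  # anything not whitelisted is dropped


if __name__ == "__main__":
    for candidate in ["src/app.py", "README.md", "uv.lock", "logo.png"]:
        print(candidate, "->", classify(candidate))
```

Because exclusion is the default branch, a new or unknown file type stays out of the pack until a pattern explicitly admits it.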
@@ -246,4 +292,4 @@ Contributions are welcome!
3. **Run Tests**:
```shell
uv run pytest
```