Commit 9d474ee

initial commit
0 parents  commit 9d474ee

84 files changed

Lines changed: 12605 additions & 0 deletions

.github/workflows/docs.yml

Lines changed: 25 additions & 0 deletions
@@ -0,0 +1,25 @@
+name: docs
+
+on:
+  push:
+    branches: [master]
+  workflow_dispatch:
+
+permissions:
+  contents: write
+
+jobs:
+  deploy:
+    runs-on: ubuntu-latest
+    env:
+      SITE_URL: https://${{ github.repository_owner }}.github.io/${{ github.event.repository.name }}/
+      REPO_URL: https://github.com/${{ github.repository }}
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: astral-sh/setup-uv@v5
+        with:
+          python-version: "3.12"
+      - run: uv sync --extra dev --extra docs
+      - run: uv run mkdocs gh-deploy --force
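
The `SITE_URL` and `REPO_URL` expressions in the workflow above are plain string interpolations over the GitHub context. A minimal sketch of how they would resolve, assuming a hypothetical owner `octocat` and repository `demo` (neither appears in this commit, and both function names are illustrative):

```python
def pages_site_url(owner: str, repo_name: str) -> str:
    """Mirror of the SITE_URL expression: owner and repository name only."""
    return f"https://{owner}.github.io/{repo_name}/"


def repo_url(repository: str) -> str:
    """Mirror of the REPO_URL expression; `repository` is the 'owner/name' form."""
    return f"https://github.com/{repository}"


if __name__ == "__main__":
    # Hypothetical values, used only for illustration.
    print(pages_site_url("octocat", "demo"))
    print(repo_url("octocat/demo"))
```

How the docs build consumes these two variables is not shown in this part of the commit.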

.github/workflows/publish.yml

Lines changed: 115 additions & 0 deletions
@@ -0,0 +1,115 @@
+name: publish
+
+on:
+  push:
+    tags:
+      - "v*"
+    branches:
+      - master
+  workflow_dispatch:
+    inputs:
+      dry_run:
+        description: "Skip actual PyPI publish"
+        required: false
+        default: "false"
+        type: choice
+        options:
+          - "true"
+          - "false"
+
+permissions:
+  contents: write
+
+jobs:
+  validate:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: astral-sh/setup-uv@v5
+        with:
+          python-version: "3.12"
+          cache-dependency-glob: "uv.lock"
+      - name: Install dev dependencies
+        run: uv sync --extra dev
+      - name: ruff check
+        run: uv run ruff check .
+      - name: ruff format check
+        run: uv run ruff format --check .
+      - name: mypy
+        run: uv run mypy sourcery
+      - name: pytest
+        run: uv run pytest -q
+      - name: Version validation
+        if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
+        run: |
+          TAG_VERSION="${GITHUB_REF#refs/tags/v}"
+          TOML_VERSION=$(grep '^version = ' pyproject.toml | head -1 | sed 's/version = "\([^"]*\)"/\1/')
+          echo "Tag version: $TAG_VERSION"
+          echo "TOML version: $TOML_VERSION"
+          if [ "$TAG_VERSION" != "$TOML_VERSION" ]; then
+            echo "Error: Tag version ($TAG_VERSION) does not match pyproject.toml version ($TOML_VERSION)"
+            exit 1
+          fi
+      - name: Branch protection
+        if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
+        run: |
+          git branch -r --contains "${GITHUB_SHA}" | grep -q 'origin/master' || {
+            echo "Error: Tag must be on master branch"
+            exit 1
+          }
+
+  publish:
+    needs: validate
+    runs-on: ubuntu-latest
+    if: (github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')) || github.event_name == 'workflow_dispatch'
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+      - uses: astral-sh/setup-uv@v5
+        with:
+          python-version: "3.12"
+          cache-dependency-glob: "uv.lock"
+      - name: Build
+        run: uv build
+      - name: Upload build artifacts
+        uses: actions/upload-artifact@v4
+        with:
+          name: dist
+          path: dist/
+          retention-days: 7
+      - name: Publish to PyPI
+        if: github.event_name != 'workflow_dispatch' || github.event.inputs.dry_run != 'true'
+        env:
+          PYPI_TOKEN: ${{ secrets.PYPI_TOKEN }}
+        run: uv publish --token "$PYPI_TOKEN"
+      - name: Create GitHub Release
+        if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
+        uses: softprops/action-gh-release@v2
+        with:
+          generate_release_notes: true
+          files: dist/*
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+      - name: Build summary
+        run: |
+          DRY_RUN="${{ github.event.inputs.dry_run || 'false' }}"
+          if [ "$DRY_RUN" = "true" ]; then
+            echo "## Dry Run - Build completed (not published)" >> "$GITHUB_STEP_SUMMARY"
+          elif [ "${{ github.event_name }}" = "workflow_dispatch" ]; then
+            echo "## Manual Publish (no GitHub Release)" >> "$GITHUB_STEP_SUMMARY"
+          else
+            echo "## Published ${{ github.ref_name }}" >> "$GITHUB_STEP_SUMMARY"
+          fi
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "### Files" >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
+          ls -lah dist/ >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
+          echo "" >> "$GITHUB_STEP_SUMMARY"
+          echo "### Hashes" >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"
+          cd dist && sha256sum * >> "$GITHUB_STEP_SUMMARY"
+          echo '```' >> "$GITHUB_STEP_SUMMARY"

.gitignore

Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+# Python caches and bytecode
+__pycache__/
+*.py[cod]
+*$py.class
+
+# Tool caches
+.mypy_cache/
+.pytest_cache/
+.ruff_cache/
+.coverage
+.coverage.*
+htmlcov/
+
+# Virtual environments
+.venv/
+venv/
+env/
+
+# Local runtime and generated artifacts
+.blackgeorge/
+.sourcery/
+site/
+benchmark_results/
+examples/output/
+
+# Environment and secrets
+.env
+.env.*
+
+# Editor / OS noise
+.DS_Store
+Thumbs.db
+.idea/
+.vscode/

AGENTS.md

Lines changed: 93 additions & 0 deletions
@@ -0,0 +1,93 @@
+## Project Overview
+Sourcery is a schema-first extraction framework built on BlackGeorge runtime primitives.
+
+Primary goal:
+- extract typed entities/claims from unstructured text and documents,
+- ground extractions to source spans,
+- provide deterministic post-processing and reviewable output.
+
+Core runtime model:
+- Sourcery owns extraction domain logic (chunking, prompts, alignment, merge, reconciliation).
+- BlackGeorge owns model execution, flow/workforce orchestration, events, pause/resume, and run storage.
+
+## Tech Stack
+- Python 3.12+
+- `uv` for environment and dependency management
+- Pydantic v2 for contracts
+- BlackGeorge for orchestration/runtime
+- pytest + ruff + mypy for quality gates
+- MkDocs for docs
+
+## Repository Layout
+- `sourcery/contracts/` data contracts and public typed models
+- `sourcery/pipeline/` chunking, prompt compilation, alignment, merge, validation
+- `sourcery/runtime/` engine and BlackGeorge integration
+- `sourcery/ingest/` source loaders (text/file/pdf/html/url/ocr)
+- `sourcery/io/` JSONL, visualization, reviewer UI
+- `sourcery/observability/` trace/event collection
+- `sourcery/benchmarks/` Sourcery vs LangExtract benchmark runner
+- `tests/` pytest suite
+- `docs/` MkDocs pages
+
+## Setup Commands
+- Base install: `uv sync`
+- Dev tooling: `uv sync --extra dev`
+- Ingestion extras: `uv sync --extra ingest`
+- Benchmark extras: `uv sync --extra benchmark`
+- Docs tooling: `uv sync --extra docs`
+- All common extras: `uv sync --extra dev --extra ingest --extra docs --extra benchmark`
+
+## Development Commands
+- Run tests: `uv run --extra dev pytest -q`
+- Lint: `uv run ruff check .`
+- Format (if needed): `uv run ruff format .`
+- Type check: `uv run mypy .`
+- Serve docs: `uv run mkdocs serve`
+- Build docs: `uv run mkdocs build`
+- Run benchmark: `uv run sourcery-benchmark --text-types english,japanese,french,spanish --max-chars 4500 --max-passes 2 --sourcery-model deepseek/deepseek-chat`
+
+## Code Style and Conventions
+- Keep all public/runtime code fully type-annotated.
+- Keep `mypy` strict-clean (`[tool.mypy] strict = true`).
+- Keep `ruff` clean; line length is 100.
+- Prefer explicit, deterministic logic over implicit behavior.
+- Use Pydantic contracts for cross-module boundaries.
+- Keep black box boundaries clear:
+  - contracts: types only,
+  - pipeline: deterministic extraction transforms,
+  - runtime: orchestration/provider integration,
+  - io/observability: output + telemetry.
+
+## Testing Requirements
+- Any behavior change must include or update tests.
+- Bug fixes should add a regression test when feasible.
+- Keep all existing tests green before finishing.
+- Prefer focused unit tests near changed modules plus one integration test when runtime behavior changes.
+
+## Documentation Requirements
+- If public API, runtime behavior, or config changes, update docs in `docs/`.
+- If you add a new docs page, update `mkdocs.yml` navigation.
+- Keep `README.md`, `USAGE.md`, and `CODE_EXAMPLES.md` consistent with code behavior.
+
+## Runtime and Safety Notes
+- `RuntimeConfig.model` must be set to a valid provider/model route.
+- Provider keys must come from environment variables; never hardcode secrets.
+- Do not commit `.env` or API keys.
+- Do not commit runtime state/databases under `.sourcery/` unless explicitly requested.
+- Do not use destructive git commands (`reset --hard`, `checkout --`) unless explicitly asked.
+
+## PR / Commit Checklist
+- `uv run ruff check .`
+- `uv run mypy .`
+- `uv run --extra dev pytest -q`
+- Update docs if behavior/API changed
+- Keep changes minimal and scoped to the task
+
+## Preferred Commit Style
+- Conventional commits:
+  - `feat: ...`
+  - `fix: ...`
+  - `refactor: ...`
+  - `docs: ...`
+  - `test: ...`
+  - `chore: ...`
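
The commit style listed at the end of `AGENTS.md` could be checked mechanically. Below is a minimal sketch, assuming only the six prefixes named there and a `type: subject` shape; scopes such as `feat(io): ...` are deliberately not handled, and `is_conventional` is a hypothetical helper rather than anything in this repository.

```python
import re

# The six types listed under "Preferred Commit Style" in AGENTS.md.
ALLOWED_TYPES = ("feat", "fix", "refactor", "docs", "test", "chore")

# "<type>: <non-empty subject>", with exactly one space after the colon.
_SUBJECT = re.compile(rf"^(?:{'|'.join(ALLOWED_TYPES)}): \S.*$")


def is_conventional(subject: str) -> bool:
    """Return True if the first line of a commit message fits the listed style."""
    return _SUBJECT.match(subject) is not None


if __name__ == "__main__":
    for line in ("feat: add pdf loader", "initial commit", "fix:missing space"):
        print(f"{line!r}: {is_conventional(line)}")
```

Under this check, the message of this very commit (`initial commit`) would not pass.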

CLAUDE.md

Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
+@AGENTS.md