22 changes: 22 additions & 0 deletions .github/agents/unsafe-reviewer.agent.md
@@ -0,0 +1,22 @@
---
name: unsafe-reviewer
description: Reviews unsafe Rust code for soundness and proper SAFETY documentation
tools: ['read', 'search']
---

# Unsafe Code Reviewer

You are a specialist in reviewing `unsafe` Rust code for soundness.

## Focus Areas

- Every `unsafe` block must have a `// SAFETY:` comment explaining the invariant
- Check that safety invariants actually hold (not just documented)
- Verify SIMD intrinsics match the target architecture feature gates
- Look for undefined behavior: uninitialized memory, alignment violations, data races
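
As a rough illustration of the feature-gate check listed above, the sketch below pairs an AVX2 code path with a runtime `is_x86_feature_detected!` check and a portable fallback. The function names `sum_i32`, `sum_i32_scalar`, and `sum_i32_avx2` are hypothetical and are not taken from the DiskANN sources.

```rust
/// Sums a slice of `i32` with wrapping arithmetic, using AVX2 when the CPU supports it.
pub fn sum_i32(values: &[i32]) -> i32 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: the runtime check above guarantees AVX2 is available,
            // which is the only precondition of `sum_i32_avx2`.
            return unsafe { sum_i32_avx2(values) };
        }
    }
    sum_i32_scalar(values)
}

/// Portable fallback used on other architectures or when AVX2 is missing.
fn sum_i32_scalar(values: &[i32]) -> i32 {
    values.iter().fold(0i32, |acc, &v| acc.wrapping_add(v))
}

/// AVX2 path. The `target_feature` attribute must match the feature
/// that the caller checks at runtime.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn sum_i32_avx2(values: &[i32]) -> i32 {
    use std::arch::x86_64::*;

    let mut acc = _mm256_setzero_si256();
    let mut chunks = values.chunks_exact(8);
    for chunk in &mut chunks {
        // SAFETY: `chunk` holds exactly 8 `i32`s (32 bytes), and
        // `_mm256_loadu_si256` has no alignment requirement.
        let lane = unsafe { _mm256_loadu_si256(chunk.as_ptr().cast::<__m256i>()) };
        acc = _mm256_add_epi32(acc, lane);
    }

    let mut lanes = [0i32; 8];
    // SAFETY: `lanes` is 32 bytes of writable memory, and
    // `_mm256_storeu_si256` has no alignment requirement.
    unsafe { _mm256_storeu_si256(lanes.as_mut_ptr().cast::<__m256i>(), acc) };

    // Fold the eight partial sums together with the leftover elements.
    let tail = sum_i32_scalar(chunks.remainder());
    lanes.iter().fold(tail, |acc, &v| acc.wrapping_add(v))
}
```

A reviewer would look for a mismatch between the string passed to `is_x86_feature_detected!` and the `target_feature(enable = ...)` attribute, since that is where an unsupported instruction could slip through.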

## Constraints

- Review only — do not modify code
- Flag issues with specific line references and suggested fixes
<!-- TODO: Add repo-specific unsafe patterns and known-safe abstractions -->
10 changes: 10 additions & 0 deletions .github/hooks/security.json
@@ -0,0 +1,10 @@
{
  "version": 1,
  "hooks": {
    "preToolUse": [{
      "type": "command",
      "bash": "./scripts/security-check.sh",
      "powershell": "pwsh -File scripts/security-check.ps1"
    }]
  }
}
9 changes: 9 additions & 0 deletions .github/instructions/ci-workflows.instructions.md
@@ -0,0 +1,9 @@
---
applyTo: '.github/workflows/*.yml'
---
# CI Workflow Conventions
**When to read:** Modifying GitHub Actions workflow files.

- Do not modify CI workflows unless explicitly asked
- The `RUST_CONFIG` env var in ci.yml sets `-Dwarnings` — do not remove or weaken this
- Test jobs run on both Linux and Windows — ensure changes are cross-platform
9 changes: 9 additions & 0 deletions .github/instructions/unsafe-code.instructions.md
@@ -0,0 +1,9 @@
---
applyTo: '**/*.rs'
---
# Unsafe Code Conventions
**When to read:** Writing or modifying Rust source files that use `unsafe`.

- Document every `unsafe` block with a `// SAFETY:` comment explaining the invariant
- Prefer safe abstractions over raw unsafe code
- When touching architecture-specific intrinsics (SIMD), validate across platforms per diskann-wide/README.md
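
A minimal sketch of the `// SAFETY:` convention above, assuming a hypothetical helper `middle` that is not part of the repository. The unsafe operation sits behind a safe function, and the comment states the invariant that makes it sound.

```rust
/// Returns the middle element of a slice, or `None` if the slice is empty.
pub fn middle(values: &[f32]) -> Option<f32> {
    if values.is_empty() {
        return None;
    }
    let mid = values.len() / 2;
    // SAFETY: `values` is non-empty, so `mid = len / 2 < len` and the
    // index is in bounds.
    Some(unsafe { *values.get_unchecked(mid) })
}
```

In practice `values.get(mid).copied()` would be the preferred safe form here; the unsafe version is shown only to illustrate how the comment ties the block to a checked invariant.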
14 changes: 14 additions & 0 deletions .github/prompts/generate-tests.prompt.md
@@ -0,0 +1,14 @@
---
mode: 'agent'
description: 'Generate tests for the current file following DiskANN conventions'
---

Generate tests for the code in #{file:${file}}.

Follow these DiskANN testing conventions:
- Do not add tests for derived traits (Clone, Debug, PartialEq)
- Do not add tests for enums unless they have explicit functionality
- Look for existing test infrastructure before creating new patterns
- Use `VirtualStorageProvider::new_overlay()` for storage in tests (never `new_physical()`)
- Use functions from `random.rs` instead of `rand::thread_rng`
- Check if the crate has a `test` module with shared helpers
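
To illustrate the first two conventions, here is a small sketch with a hypothetical enum `ElementWidth` (an assumption for this example, not a DiskANN type). The test covers only the hand-written method and leaves the derived traits untested.

```rust
/// Hypothetical enum used only for this illustration.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ElementWidth {
    Byte,
    Half,
    Full,
}

impl ElementWidth {
    /// Explicit functionality: the storage size of one element in bytes.
    pub fn bytes(self) -> usize {
        match self {
            ElementWidth::Byte => 1,
            ElementWidth::Half => 2,
            ElementWidth::Full => 4,
        }
    }
}

#[cfg(test)]
mod tests {
    use super::*;

    // Exercises the hand-written method. The derived `Clone`, `Debug`,
    // and `PartialEq` implementations are deliberately not tested.
    #[test]
    fn bytes_matches_variant() {
        assert_eq!(ElementWidth::Byte.bytes(), 1);
        assert_eq!(ElementWidth::Half.bytes(), 2);
        assert_eq!(ElementWidth::Full.bytes(), 4);
    }
}
```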
41 changes: 41 additions & 0 deletions .github/skills/ci-failure-triage/SKILL.md
@@ -0,0 +1,41 @@
---
name: ci-failure-triage
description: 'Diagnose CI failures. Use when "CI failed", "build broken", "test failure in CI", or "triage CI".'
---

# CI Failure Triage

Systematic approach to diagnosing failures in the DiskANN CI pipeline.

## When to Use

- A CI workflow run has failed
- Tests pass locally but fail in CI
- Platform-specific failures (Linux vs Windows)

## Process

### Step 1: Identify Failure Type

Check which CI job failed:
- `format` — run `cargo fmt --all --check`
- `clippy` / `clippy-features` — run `cargo clippy --workspace --all-targets --config 'build.rustflags=["-Dwarnings"]'`
Contributor

Can these not be picked from verify scripts? Why are they replicated here? Further, the flags are not the same.

- `test` — run `cargo test`
- `sde-*` — architecture-specific issues (AVX-512, baseline x86-64)

### Step 2: Reproduce Locally

<!-- TODO: Add SDE setup instructions for architecture emulation -->
- For format/clippy/test: run `./scripts/verify`
- For feature-gated failures: check `DISKANN_FEATURES` env var in ci.yml
- For SDE failures: see diskann-wide/README.md for cross-platform validation

### Step 3: Fix and Verify

- Apply fix, then run `./scripts/verify`
- If SIMD-related, validate per diskann-wide/README.md

## Constraints

- Do not disable or weaken CI checks to fix failures
- Do not skip platform-specific test runs
47 changes: 47 additions & 0 deletions .github/skills/new-crate/SKILL.md
@@ -0,0 +1,47 @@
---
name: new-crate
description: 'Scaffold a new workspace crate. Use when "add a crate", "create a new crate", "new package", or "scaffold crate".'
---

# New Crate Scaffolding

Scaffold a new Cargo workspace crate following the tiered architecture.

## When to Use

- Adding a new crate to the workspace
- Creating a new module that should be a separate crate

## Process

### Step 1: Determine Tier

<!-- TODO: Document criteria for tier placement -->
- Tier 1 (Foundation): SIMD, vector primitives, platform utils
- Tier 2 (Core Libraries): Linear algebra, utilities, quantization
- Tier 3 (Algorithm & Storage): Core algorithm, providers, disk indexing
- Tier 4 (Infrastructure & Tools): Benchmarks, CLI tools

### Step 2: Create Crate

1. Run `cargo new <crate-name> --lib` in the repo root
2. Add to `[workspace.members]` in root `Cargo.toml`
3. Add to `default-members` if it's Tier 1 or Tier 2
4. Set `version`, `edition`, `license` from `workspace.package`
5. Add `[lints] workspace = true` to inherit workspace lints
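
Steps 4 and 5 might look like the following manifest for the new crate. This is a sketch that assumes the root `Cargo.toml` defines `version`, `edition`, and `license` under `[workspace.package]` and has a `[workspace.lints]` table; the crate name `diskann-example` is hypothetical.

```toml
[package]
name = "diskann-example"    # hypothetical crate name
version.workspace = true    # inherited from [workspace.package]
edition.workspace = true
license.workspace = true

[lints]
workspace = true            # inherit the workspace lint configuration
```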

### Step 3: Configure Dependencies

- Only depend on crates from equal or lower tiers
- Add workspace dependency entry in root `Cargo.toml` if other crates will depend on it
- Follow dependency rules documented in AGENTS.md

## Constraints

- Never add Tier 3 dependencies to benchmark-runner, benchmark-core, or benchmark-simd
- Use `workspace = true` for shared dependencies

## Validation

- `cargo check -p <crate-name>`
- `cargo clippy -p <crate-name> --all-targets --config 'build.rustflags=["-Dwarnings"]'`
13 changes: 13 additions & 0 deletions .github/workflows/copilot-setup-steps.yml
@@ -0,0 +1,13 @@
name: "Copilot Setup Steps"
Contributor

What does this do?

on: workflow_dispatch
jobs:
  copilot-setup-steps:
    runs-on: ubuntu-latest
    environment: copilot
    timeout-minutes: 30
    permissions:
      contents: read
    steps:
      - uses: actions/checkout@v5
      - name: Setup environment
        run: ./scripts/setup
104 changes: 104 additions & 0 deletions AGENTS.md
@@ -0,0 +1,104 @@
# AGENTS.md

## Verification

- Run: `cargo fmt --all --check && cargo clippy --workspace --all-targets --config 'build.rustflags=["-Dwarnings"]' && cargo test`
- Or: `./scripts/verify` (bash) / `./scripts/verify.ps1` (PowerShell)
- If it fails, fix the root cause and re-run before committing.

## Environment

- Rust edition 2021, toolchain version in `rust-toolchain.toml`
- Cargo workspace with 18 crates across 4 tiers (see `Cargo.toml`)
- CI uses `cargo-nextest` — see `.cargo/nextest.toml` for timeouts/retries
- `rustfmt` and `clippy` not installed by default — run `./scripts/setup` first

### Crate Organization

**Tier 1: Foundation**
- `diskann-wide/` - Low-level SIMD, bit manipulation, type width abstractions
- `diskann-vector/` - Vector primitives and operations
- `diskann-platform/` - Platform-specific utilities

**Tier 2: Core Libraries**
- `diskann-linalg/` - Linear algebra operations
- `diskann-utils/` - Shared utilities (Reborrow, MatrixView traits)
- `diskann-quantization/` - Vector quantization (PQ, SQ)

**Tier 3: Algorithm & Storage**
- `diskann/` - Core ANN graph algorithm and in-memory indexing (CENTRAL crate)
- `diskann-providers/` - Storage abstraction layer
- `diskann-disk/` - Disk-based indexing with io_uring support
- `diskann-label-filter/` - Inverted index for filtered search

**Tier 4: Infrastructure & Tools**
- `diskann-benchmark-runner/` - Test runner infrastructure
- `diskann-benchmark-core/` - Benchmark framework
- `diskann-benchmark-simd/` - SIMD-specific benchmarks
- `diskann-benchmark/` - Benchmark definitions and runners
- `diskann-tools/` - CLI utilities (autotuner, etc.)

## Crate Dependency Rules

- Tier 1 and Tier 2 crates may be added as dependencies of any internal crate
- `diskann` may be added as a dependency of any equal or higher tier internal crate except those below
- Do not add Tier 3 crates as dependencies of these Tier 4 crates:
  - `diskann-benchmark-runner`
  - `diskann-benchmark-core` (`diskann` is allowed)
  - `diskann-benchmark-simd`

## Error Handling

Three regimes — use the one matching the crate's tier:

- **Low-level crates**: Bespoke, precise, non-allocating error types. Use `thiserror`. Chain with `std::error::Error::source`. Do not use `ANNError`.
- **Mid-level (diskann algorithms)**: Use `diskann::ANNError` with its context machinery. Use `#[track_caller]` when converting to ANNError. Use only for unrecoverable errors. Consider `diskann::error::ToRanked` for traits with associated error types.
- **High-level (tooling)**: Use `anyhow::Error`.
- **Never** use a single crate-level error enum — it provides no per-function failure docs, generates oversized structs, and bloats code.
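
As a sketch of the low-level regime, the example below defines a small, non-allocating error type scoped to one function and chains the underlying cause through `std::error::Error::source`. The names `ParseDimError` and `parse_dim` are hypothetical. The trait implementations are written by hand so the example needs only the standard library; with `thiserror`, the `Display` and `source` implementations would typically come from its derive attributes instead.

```rust
use std::error::Error;
use std::fmt;
use std::num::ParseIntError;

/// Hypothetical error for a single function: parsing a vector dimension.
#[derive(Debug)]
pub enum ParseDimError {
    /// The input was not a valid unsigned integer.
    NotANumber(ParseIntError),
    /// The input parsed, but a dimension of zero is not allowed.
    Zero,
}

impl fmt::Display for ParseDimError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::NotANumber(_) => f.write_str("dimension is not a valid integer"),
            Self::Zero => f.write_str("dimension must be non-zero"),
        }
    }
}

impl Error for ParseDimError {
    // Expose the underlying cause so callers can walk the error chain.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::NotANumber(e) => Some(e),
            Self::Zero => None,
        }
    }
}

/// Parses a non-zero dimension from text.
pub fn parse_dim(input: &str) -> Result<usize, ParseDimError> {
    let dim: usize = input.trim().parse().map_err(ParseDimError::NotANumber)?;
    if dim == 0 {
        return Err(ParseDimError::Zero);
    }
    Ok(dim)
}
```

Because the error type lists only the ways `parse_dim` can fail, its variants double as per-function failure documentation, which is the property the last bullet says a crate-level enum loses.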

## Guardrails

- Do not use `rand::thread_rng` — use functions from `random.rs`
- Do not use rayon global thread pool methods — use `*_in_pool` wrappers from `rayon_utils.rs` (see `clippy.toml`)
- Do not use `VirtualStorageProvider::new_physical()` in tests — use `new_overlay()`
- Do not use `vfs::PhysicalFS::new` in tests — use `VirtualStorageProvider::new_overlay()`
- When touching architecture-specific intrinsics, run cross-platform validation per `diskann-wide/README.md`

## Testing

- Baseline caching system for regression detection — see `diskann/src/test/cache.rs` and `diskann/src/test/cmp.rs`
- Do not add tests for derived traits (Clone, Debug, PartialEq)
- Do not add tests for enums unless they have explicit functionality
- Look for existing setup/execution infrastructure before creating new patterns
- Code coverage of changes is required for PRs

## AVX-512, Aarch64, and multi-platform

When touching architecture-specific intrinsics, run cross-platform validation per `diskann-wide/README.md`:

- Testing AVX-512 code on non-AVX-512 capable x86-64 machines.
- Testing Aarch64 code on x86-64 machines.
- Testing that code compiled for and running on the baseline `x86-64` CPU (no AVX/AVX2) does not execute unsupported instructions.

## Constraints

Contributor

Are these really needed? Can't we assume an agent knows not to do these?

- Keep diffs minimal and scoped to the request
- Update or add tests for any behavior change
- Do not modify CI, dependency versions, or security settings unless asked
- Never print, log, or commit secrets

## Definition of Done

- `./scripts/verify` passes (fmt + clippy + tests)
- No new lint warnings introduced
- Changes are scoped to the request — no drive-by refactors
Contributor

This does not belong here.


## Where to find more

- Path-specific rules: `.github/instructions/`
- Multi-step workflows: `.github/skills/*/SKILL.md`

## Best Practices for Writing an Effective AGENTS.md
Contributor

remove


- Keep AGENTS.md concise — ideally within 30–80 lines.
- Do not include structural details such as directory listings or README‑style content, as research shows these can degrade agent performance.
Comment on lines +101 to +104