Skip to content

Update instructions for agents#1014

Open
harsha-simhadri wants to merge 11 commits into
mainfrom
users/harshasi/update-readme-instructions
Open

Update instructions for agents#1014
harsha-simhadri wants to merge 11 commits into
mainfrom
users/harshasi/update-readme-instructions

Conversation

@harsha-simhadri
Copy link
Copy Markdown
Contributor

No description provided.

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 5, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.47%. Comparing base (45428af) to head (bfc12be).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1014      +/-   ##
==========================================
- Coverage   89.49%   89.47%   -0.02%     
==========================================
  Files         448      461      +13     
  Lines       84118    85559    +1441     
==========================================
+ Hits        75282    76558    +1276     
- Misses       8836     9001     +165     
Flag Coverage Δ
miri 89.47% <ø> (-0.02%) ⬇️
unittests 89.32% <ø> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.
see 69 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread agents.md Outdated

---

## Quick Reference
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we can probably remove since agents would not how to work with Rust workspace

@harsha-simhadri harsha-simhadri changed the title Update README and instructions for agents Update instructions for agents May 9, 2026
harsha-simhadri and others added 3 commits May 10, 2026 16:12
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@harsha-simhadri harsha-simhadri marked this pull request as ready for review May 11, 2026 00:04
@harsha-simhadri harsha-simhadri requested review from a team and Copilot May 11, 2026 00:04
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Updates and consolidates agent/onboarding and review guidance across the repository, including moving GitHub Copilot-specific review instructions into .github/copilot-instructions.md and splitting agent guidance into crate-local AGENTS.md files.

Changes:

  • Fixes/updates documentation references for the test baseline caching system in diskann/README.md.
  • Introduces new onboarding/guidance documents: root AGENTS.md, plus crate-local diskann/AGENTS.md and diskann-wide/AGENTS.md.
  • Replaces .github/instructions.md with .github/copilot-instructions.md and removes the legacy agents.md.

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
diskann/README.md Updates developer docs for test baselines and comparison helpers.
diskann/AGENTS.md Adds crate-local agent notes for baseline caching usage.
diskann-wide/AGENTS.md Adds crate-local agent notes for cross-platform/SIMD validation.
AGENTS.md Adds a consolidated, repo-wide agent onboarding guide and boundaries.
agents.md Removes legacy agent onboarding guide (superseded by AGENTS.md + crate-local docs).
.github/instructions.md Removes old GitHub review instructions file.
.github/copilot-instructions.md Adds Copilot-specific code review instruction set.
Comments suppressed due to low confidence (2)

diskann/README.md:13

  • The baseline cache directory is documented as diskann/tests/generated, but the crate actually writes to diskann/test/generated (see diskann/src/test/cache.rs uses CARGO_MANIFEST_DIR/test/generated). Update this path here (and any later examples) so developers don’t regenerate into a non-existent tests/ directory.
Developers are strongly encouraged to consider the [caching infrastructure](src/test/cache.rs)
when writing index tests to provide an early warning of algorithmic changes.

This infrastructure serializes test results into a file in `diskann/tests/generated`
that serves as the baseline in the normal test flow. Any difference between the baseline

diskann/README.md:48

  • This section references diskann::test::cmp::*, but the test module is #[cfg(test)] mod test; and its helpers are pub(crate) (internal-only). To avoid implying this is a public API (and to match in-crate usage), the README should prefer crate::test::cmp::VerboseEq / crate::test::cmp::verbose_eq in examples/instructions.
When comparing baselines, developers should use the `diskann::test::cmp::VerboseEq`
which provides more diagnostics regarding the source of structural inequality than the
standard libraries `PartialEq` trait. Additional utilities include

* `diskann::test::cmp::verbose_eq!`: A macro for automatically implementing `VerboseEq`.
  This macro can be used until a proper `derive` macro is implemented:
  ```rust
  use diskann::test::cmp::verbose_eq;


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann/AGENTS.md Outdated
Comment thread diskann/AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md
Comment on lines +49 to +55
### Never

- Modify files in `diskann/tests/generated/` by hand — these are auto-generated baselines. Regenerate with `DISKANN_TEST=overwrite`.
- Modify `rust-toolchain.toml`, `.github/workflows/`, or `.codecov.yml` without explicit approval.
- Use the global Rayon thread pool — use `RayonThreadPool`/`RayonThreadPoolRef` (enforced by `clippy.toml` disallowed methods).
- Use `rand::thread_rng` — use the project's `random.rs` utilities instead (enforced by `clippy.toml`).
- Use `vfs::PhysicalFS::new` or `VirtualStorageProvider::new_physical()` in tests — use `VirtualStorageProvider::new_overlay()`.
Comment thread .github/copilot-instructions.md Outdated
Comment thread diskann/README.md
Comment on lines 22 to 27

Before checking in new test results, it's a good idea to completely delete `diskann/tests/generated`
to ensure that unused baselines get removed from the repository.

The API for registering and retrieving test results is in `diskann/src/tests/cache`
The API for registering and retrieving test results is in [`diskann/src/test/cache.rs`](src/test/cache.rs)
and consists of:
Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few small nits, but otherwise looks good.


- Do not introduce `panic!` paths for recoverable errors — propagate with `Result` instead.
- Keep error types small. Avoid large enums/structs that blow up the stack; look for ways to reduce field sizes (e.g., compute derivable fields, use enums instead of `&'static str`).
- Prefer `ANNError::new(ANNErrorKind::…, e)` over the old `log_*`-style constructors, which force eager string formatting and double-log errors.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: There is not double-log from these APIs (the name is a misnomer). The biggest issue is eager string formatting.

- Doc comments and README examples must match actual API signatures and serialized shapes.
- Stale examples that fail to compile or deserialize are treated as bugs.
- Do not leave dead references to APIs that no longer exist.
- When changing a function signature or removing a parameter, update all doc comments that mention the old signature.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More suggestions on documentation:

  • Doc and inline comments should describe what the code does. Do not describe behavior by contrasting it with other code, except when referencing documented external behavior.
  • Avoid comments that simply restate what is already clear from function signatures or where clauses.
  • Do not list functions or types in module docs that rustdoc already documents.
  • Module-level docs should describe the purpose and structure of the module, not its contents.


## Rayon and Parallelism

- Never use the global Rayon thread pool. Always execute parallel work within the provided `RayonThreadPool` or `RayonThreadPoolRef`.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This guidance is more for diskann-providers/diskann-disk than lower level crates. Lower level crates like diskann-quantization should use the dynamically scoped thread pool, but advertise this using the Parallelism enum.

harsha-simhadri and others added 2 commits May 11, 2026 12:07
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Comment thread AGENTS.md
@@ -0,0 +1,182 @@
# DiskANN Repository - Agent Onboarding Guide

**Last Updated**: 2026-05-04 (based on v0.50.1, Rust 1.92)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a lot of previous commentary on testing is removed and not reproduced anywhere else. Is this purposeful? It seems like useful information that shouldn't be lost.

Same comment on code coverage. Should we add something on both topics to the "Always" list?

Comment thread AGENTS.md
- `diskann-label-filter/` - Inverted index for filtered search
- `diskann-garnet/` - Garnet (Redis-compatible) Provider and FFI endpoints for vector sets

**Tier 4: Infrastructure & Tools**
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"Infrastructure" is a bit of a vague name here, it kind of seems like every tier contains something that could be called infrastructure. Maybe "Benchmarks and Tools" would be better?

Comment thread AGENTS.md

## Error Handling

There are three regimes of error handling and the strategy to use depends on the regime.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It sounds like these categories may conform neatly to the tiers discussed above, although maybe there are exceptions. If so it could be good to state that explicitly.

Comment thread AGENTS.md
Ok(())
}

// ❌ Bad — eager string formatting, double-logs on creation
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment from Mark about double-logging

@@ -0,0 +1,64 @@
When performing a code review, check that:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can copilot instructions point to the existing AGENTS.md files where applicable? It looks like a lot of instructions are manually duplicated here, so we risk them getting out of sync.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
Do we really need .github/copilot-instructions.md? if so, how is it semantically different from /AGENTS.md? Why can't we keep all agent-related instructions in /AGENTS.md?


## Error Handling

- Do not introduce `panic!` paths for recoverable errors — propagate with `Result` instead.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This probably deserves a mention/move to the top-level AGENTS.md

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@@ -0,0 +1,64 @@
When performing a code review, check that:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1.
Do we really need .github/copilot-instructions.md? if so, how is it semantically different from /AGENTS.md? Why can't we keep all agent-related instructions in /AGENTS.md?

Comment thread AGENTS.md
@@ -0,0 +1,182 @@
# DiskANN Repository - Agent Onboarding Guide
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would explicitly follow next structure in this document:

  1. How code is organized (Crate Organization)
  2. How to contribute (aka How we write code) - and move all stuff from copilot-instructions to this section.
  3. How to validate code before submitting a PR + briefly mention that CI will perform extensive validation in .github/workflows/ci.yml.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants