Promising Spec Library

Acceptance specs and scaffolds for Polarity Keystone evals.

License: Apache 2.0 · Status: living · Keystone v0.1.13

Agents · Specs · Quickstart · Learnings · Docs


What this is

Nine agent personas, twelve acceptance specs, one reference implementation, and a field notebook of what actually runs on Keystone. Build evals quickly without rediscovering the platform's gotchas.

The repo holds descriptions and tests, not finished agents. When you want to run a spec, hand the matching scaffold to your AI coding tool and have it generate the agent for you. The reference implementation at agents/stripe-refund-aud/ shows the working pattern.


Agents

| # | Slug | Specialty | Status |
|---|------|-----------|--------|
| 1 | general-coder | Generalist coding | ✓ tested |
| 2 | bug-fixer | Diagnose + minimal patches | ✓ tested |
| 3 | db-architect | Postgres schema, seeds, SQL | needs Postgres infra |
| 4 | security-auditor | Vulnerability detection | ✓ tested |
| 5 | web-builder | HTTP servers, REST APIs | ✓ tested |
| 6 | data-pipeline | ETL across services | needs multi-service infra |
| 7 | devops-shell | Dockerfiles, infra-as-code | needs Docker infra |
| 8 | research-summarizer | Read docs, write summaries | ✓ tested |
| 9 | stripe-refund-aud | Refund Stripe charges, AUD only | ✓ implemented |

✓ tested = a throwaway agent built from the scaffold passed the linked spec on Keystone, then was deleted (test artifacts stay outside the repo). ✓ implemented = real code lives in the folder, ready to upload.


Specs

| # | Spec | Domain | Agent | Status |
|---|------|--------|-------|--------|
| 0 | hello-world | general | (cli) | ✓ runs |
| 1 | summarize-changelog | general | research-summarizer | ✓ tested |
| 2 | bugfix-linked-list | code-agents | bug-fixer | ✓ tested |
| 3 | refactor-god-class | code-agents | general-coder | ✓ tested |
| 4 | language-matrix-csv | code-agents | general-coder | ✓ tested (Python only) |
| 5 | rest-api-todo | web-agents | web-builder | ✓ tested |
| 6 | webhook-receiver-hmac | web-agents | web-builder | ✓ tested |
| 7 | postgres-ecommerce | data-agents | db-architect | pending |
| 8 | security-review | security-agents | security-auditor | ✓ tested |
| 9 | dockerize-flask-app | devops-agents | devops-shell | pending |
| 10 | enterprise-reconciliation | data-agents | data-pipeline | pending |
|  | refund-aud-only | finance-agents | stripe-refund-aud | ✓ implemented |
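
If you want to script over the table above, a minimal sketch might look like the following. It transcribes the twelve rows into a list and filters out the pending ones. The `Spec` class, the `not_pending` helper, and the `specs/<domain>/<slug>.yaml` path pattern are assumptions for illustration; the pattern is inferred from the two spec paths shown in the Quickstart and is not confirmed for every spec.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spec:
    """One row of the Specs table (hypothetical helper, not part of the repo)."""
    slug: str
    domain: str
    agent: str
    status: str

    @property
    def path(self) -> str:
        # Assumption: every spec lives at specs/<domain>/<slug>.yaml, as the
        # two paths in the Quickstart do.
        return f"specs/{self.domain}/{self.slug}.yaml"


# Rows transcribed from the Specs table.
SPECS = [
    Spec("hello-world", "general", "(cli)", "runs"),
    Spec("summarize-changelog", "general", "research-summarizer", "tested"),
    Spec("bugfix-linked-list", "code-agents", "bug-fixer", "tested"),
    Spec("refactor-god-class", "code-agents", "general-coder", "tested"),
    Spec("language-matrix-csv", "code-agents", "general-coder", "tested (Python only)"),
    Spec("rest-api-todo", "web-agents", "web-builder", "tested"),
    Spec("webhook-receiver-hmac", "web-agents", "web-builder", "tested"),
    Spec("postgres-ecommerce", "data-agents", "db-architect", "pending"),
    Spec("security-review", "security-agents", "security-auditor", "tested"),
    Spec("dockerize-flask-app", "devops-agents", "devops-shell", "pending"),
    Spec("enterprise-reconciliation", "data-agents", "data-pipeline", "pending"),
    Spec("refund-aud-only", "finance-agents", "stripe-refund-aud", "implemented"),
]


def not_pending(specs):
    """Return the specs whose status is anything other than 'pending'."""
    return [s for s in specs if s.status != "pending"]


if __name__ == "__main__":
    for spec in not_pending(SPECS):
        print(f"{spec.path}  ({spec.agent}: {spec.status})")
```

Note that "tested" only means a throwaway agent once passed the spec; apart from hello-world and refund-aud-only, you still have to build the agent before the spec will run.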

Quickstart

You need a Keystone API key. Get one at https://app.paragon.run/app/keystone/settings.

# install ks, wire your key + AI-coder skill files, run the baseline
curl -fsSL https://ks.polarity.so/install.sh | bash
ks setup
ks eval run specs/general/hello-world.yaml

ks setup is the full wizard. It drops AI-coder skill files for Claude Code, Cursor, Gemini CLI, OpenCode, Codex, Windsurf, etc. into the matching .claude/, .cursor/, etc. directories, so your tool already knows the Keystone shape. Those directories are gitignored on purpose because they are regenerated per machine.
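
To confirm that the skill directories really are ignored in a fork or a fresh clone, a rough check such as the one below may help. It is only a sketch: `SKILL_DIRS` lists just the two directories named above (the rest are elided in this README), `unignored` is a hypothetical helper, and the matching is a naive literal comparison that ignores globs and negation patterns.

```python
from pathlib import Path

# Only the two directories this README names; the others are not listed here.
SKILL_DIRS = [".claude", ".cursor"]


def unignored(gitignore_text, dirs=SKILL_DIRS):
    """Return the directories with no literal entry in the .gitignore text.

    Naive on purpose: comments and blank lines are skipped, leading and
    trailing slashes are stripped, and nothing else is interpreted.
    """
    entries = set()
    for line in gitignore_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        entries.add(line.strip("/"))
    return [d for d in dirs if d not in entries]


def check_repo(root="."):
    """Read <root>/.gitignore, if present, and report unignored skill dirs."""
    gitignore = Path(root) / ".gitignore"
    text = gitignore.read_text() if gitignore.exists() else ""
    return unignored(text)
```

Calling `check_repo()` from the repository root returns an empty list when every listed directory has its own .gitignore entry. For an authoritative answer, `git check-ignore` understands the full pattern syntax.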

To run the one spec with a real agent committed in this repo:

# install the SDK locally so we can upload the snapshot
pip install polarity-keystone

# upload the agent (one-time, per code change)
python - <<'PY'
import polarity_keystone as pk
snap = pk.Keystone().agents.upload(
    name="stripe-refund-aud",
    path="agents/stripe-refund-aud",
    entrypoint=["python3", "/agent/agent.py"],
    runtime="python:3.11",
)
print(snap.id, snap.version)
PY

# run the eval (XAI_API_KEY needed because the agent calls Grok-4)
XAI_API_KEY=xai-... ks eval run specs/finance-agents/refund-aud-only.yaml
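
If you drive evals from a script, it can be convenient to fail early when a required key is missing rather than partway through a run. The sketch below wraps the `ks eval run` command shown above. `REQUIRED_ENV`, `build_eval_command`, and `run_eval` are hypothetical names, and the mapping contains only the one requirement this README states.

```python
import os
import subprocess

# Assumption: only the requirement documented in this README is recorded here.
REQUIRED_ENV = {
    "specs/finance-agents/refund-aud-only.yaml": ["XAI_API_KEY"],
}


def build_eval_command(spec_path, env=None):
    """Return the argv for `ks eval run <spec_path>`.

    Raises RuntimeError if a variable listed for the spec is unset or empty.
    """
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_ENV.get(spec_path, []) if not env.get(name)]
    if missing:
        raise RuntimeError(f"{spec_path} needs these variables set: {', '.join(missing)}")
    return ["ks", "eval", "run", spec_path]


def run_eval(spec_path):
    """Run the eval through the ks CLI; raises CalledProcessError on failure."""
    return subprocess.run(build_eval_command(spec_path), check=True)
```

For example, `run_eval("specs/general/hello-world.yaml")` runs the baseline spec, while the refund spec raises before invoking ks unless XAI_API_KEY is set.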

Every other spec needs you to build the agent first from its scaffold's instructions. Read LEARNINGS.md before you do; six undocumented Keystone behaviors have already cost us hours.


Creating specs from natural language

Hand a plain-English description of what you want to test to your AI coding tool (Claude Code, Cursor, etc.) and let it draft the spec. After you've run ks setup once, the skill files under .claude/, .cursor/, etc. teach those tools the canonical Keystone spec shape.

A natural-language prompt becomes a spec yaml file

Watch the full walkthrough (1 min).


Documentation


Contributing

PRs welcome. The full guide is in .github/CONTRIBUTING.md. Quick version: open an issue, copy the matching template (agents/_template.md or specs/_template.yaml), validate locally with bash scripts/validate.sh, open a PR.

Security issues: see SECURITY.md. Conduct: Code of Conduct.


License

Apache 2.0. See LICENSE.

Copyright © 2026 Polarity, Inc.
