Commit a6a219f

feat: add OpenRouter provider support
Add OpenRouter as a first-class LLM provider alongside OpenAI, Gemini, and Anthropic. OpenRouter aggregates models from many upstream providers behind a single API key, enabling access to a wide range of models (e.g. openai/gpt-4o-mini, anthropic/claude-3.5-sonnet, deepseek/deepseek-v3.2).

Changes:
- Register 'openrouter' in supported_providers and default_models.
- Add OpenRouterProvider (extends OpenAIProvider with custom base_url).
- Add OPENROUTER_API_KEY to APIKeyManager config.
- Auto-detect OpenRouter from slash-style model IDs (e.g. provider/model).
- Accept any listed OpenRouter model in test_query (catalog-based check).
- Add structured-output JSON fallback for models that don't support OpenAI's .parse() endpoint (with text-repair recovery path).
- Fix BaseAnnotator.query_llm signature to accept agent_description kwarg.
- Update README with OpenRouter setup and usage documentation.
- Add tutorial notebook: 110_openrouter_sample_annotation.ipynb.
- Extend tests for providers, API keys, model detection, and integration.

Made-with: Cursor
1 parent e66c712 commit a6a219f
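
The commit message mentions a structured-output JSON fallback with a text-repair recovery path, but that code is not part of the hunks shown on this page. Purely as an illustration of the general idea, and not the commit's actual implementation, one common way to recover a JSON object from a free-text model reply is sketched below. The helper name `parse_json_with_repair` and the sample reply are hypothetical.

```python
import json
import re


def parse_json_with_repair(text: str) -> dict:
    """Parse a model reply as a JSON object, with one simple repair attempt.

    Hypothetical helper for illustration; not taken from the commit.
    """
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Repair step: keep only the span from the first '{' to the last '}',
        # which drops surrounding chatter around the JSON payload.
        match = re.search(r"\{.*\}", text, flags=re.DOTALL)
        if match is None:
            raise ValueError("No JSON object found in model reply")
        parsed = json.loads(match.group(0))
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed


# Made-up reply with prose wrapped around the JSON payload.
reply = 'Sure, here is the annotation: {"cell_type": "beta cell", "confidence": "high"} Hope that helps.'
print(parse_json_with_repair(reply))
```

A real fallback would likely also validate the parsed object against the expected response schema before using it.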

13 files changed

Lines changed: 608 additions & 30 deletions

README.md

Lines changed: 31 additions & 3 deletions
@@ -17,12 +17,12 @@
 [badge-zenodo]: https://zenodo.org/badge/899554552.svg
 
 
-🧬 CellAnnotator is an [scverse ecosystem package](https://scverse.org/packages/#ecosystem), designed to annotate cell types in scRNA-seq data based on marker genes using large language models (LLMs). It supports OpenAI, Google Gemini, and Anthropic Claude models out of the box, with more providers planned for the future.
+🧬 CellAnnotator is an [scverse ecosystem package](https://scverse.org/packages/#ecosystem), designed to annotate cell types in scRNA-seq data based on marker genes using large language models (LLMs). It supports OpenAI, Google Gemini, Anthropic Claude, and OpenRouter models out of the box.
 
 
 ## ✨ Key Features
 
-- 🤖 **LLM-agnostic backend**: Seamlessly use models from OpenAI, Anthropic (Claude), and Gemini (Google) — just set your provider and API key.
+- 🤖 **LLM-agnostic backend**: Seamlessly use models from OpenAI, Anthropic (Claude), Gemini (Google), or OpenRouter — just set your provider and API key.
 - 🧬 **Automatically annotate cells** including type, state, and confidence fields.
 - 🔄 **Consistent annotations** across all samples in your study.
 - 🧠 **Infuse prior knowledge** by providing information about your biological system.
@@ -60,6 +60,7 @@ After installation, head over to the LLM provider of your choice to generate an
 - OpenAI: [API key](https://help.openai.com/en/articles/4936850-where-do-i-find-my-openai-api-key)
 - Google (Gemini): [API key](https://ai.google.dev/gemini-api/docs/api-key)
 - Anthropic (Claude): [API key](https://docs.anthropic.com/en/docs/get-started)
+- OpenRouter: [API key](https://openrouter.ai/settings/keys)
 
 
 🔒 Keep this key private and don't share it with anyone. `CellAnnotator` will try to read the key as an environmental variable - either expose it to the environment yourself, or store it as an `.env` file anywhere within the repository where you conduct your analysis and plan to run `CellAnnotator`. The package will then use [dotenv](https://pypi.org/project/python-dotenv/) to export the key from the `env` file as an environmental variable.
@@ -78,6 +79,31 @@ cell_ann = CellAnnotator(
 
 By default, this will store annotations in `adata.obs['cell_type_predicted']`. Head over to our 📚 [tutorials](https://cell-annotator.readthedocs.io/en/latest/notebooks/tutorials/index.html) to see more advanced use cases, and learn how to adapt this to your own data. You can run `CellAnnotator` for just a single sample of data, or across multiple samples. In the latter case, it will attempt to harmonize annotations across samples.
 
+### Advanced provider options
+
+`CellAnnotator` can also be used in single-sample mode by setting `sample_key=None`.
+
+Example:
+
+```python
+from cell_annotator import CellAnnotator
+
+cell_ann = CellAnnotator(
+    adata=adata,
+    species="human",
+    tissue="pancreas",
+    cluster_key="leiden_1",
+    sample_key=None,  # single-sample mode
+    provider="openrouter",
+    model="openai/gpt-4o-mini",
+    api_key="YOUR_OPENROUTER_API_KEY",
+)
+
+cell_ann.get_expected_cell_type_markers(n_markers=3)
+cell_ann.get_cluster_markers()
+cell_ann.annotate_clusters(key_added="cell_type_predicted")
+```
+
 
 
 ## 💸 Costs and models
@@ -89,12 +115,14 @@ CellAnnotator is LLM-agnostic and works with multiple providers:
 
 - **Anthropic Claude:** Claude models are supported. See the [Anthropic pricing page](https://docs.anthropic.com/claude/docs/pricing) for details.
 
+- **OpenRouter:** OpenRouter routes requests to many model families (including OpenAI, Anthropic, and others) behind a single API key. Use `provider="openrouter"` and pass a model slug such as `openai/gpt-4o-mini` or `anthropic/claude-3.5-sonnet`.
+
 You can select your provider and model by setting the appropriate parameters. More providers may be supported in the future as the LLM ecosystem evolves.
 
 
 
 ## 🔐 Data privacy
-This package sends cluster marker genes, and the `species` and `tissue` you define, to the selected LLM provider (e.g., OpenAI, Google, or Anthropic). **No actual gene expression values are sent.**
+This package sends cluster marker genes, and the `species` and `tissue` you define, to the selected LLM provider (e.g., OpenAI, Google, Anthropic, or OpenRouter routes). **No actual gene expression values are sent.**
 
 Please ensure your usage of this package aligns with your institution's guidelines on data privacy and the use of external AI models. Each provider has its own privacy policy and terms of service. Review these carefully before using CellAnnotator with sensitive or regulated data.
 
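
The README text in this diff says the key is read from an environment variable, and the `_api_keys.py` change registers `OPENROUTER_API_KEY` as the variable name for OpenRouter. A minimal sketch of checking for that variable before constructing an annotator might look like this. The helper `read_openrouter_key` is hypothetical and uses only the standard library; CellAnnotator's own key handling may differ.

```python
import os


def read_openrouter_key(env=None) -> str:
    """Return the OpenRouter key from a mapping (default: os.environ).

    Hypothetical helper for illustration; not part of the package.
    """
    env = os.environ if env is None else env
    key = env.get("OPENROUTER_API_KEY", "").strip()
    if not key:
        raise RuntimeError(
            "OPENROUTER_API_KEY is not set; export it or add it to a .env file."
        )
    return key


# An explicit mapping keeps the snippet runnable without a real key.
print(read_openrouter_key({"OPENROUTER_API_KEY": "sk-or-v1-example"}))
```

Reading the key this way keeps it out of notebooks and scripts, which fits the README's advice to keep the key private.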
Lines changed: 174 additions & 0 deletions
@@ -0,0 +1,174 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# OpenRouter sample annotation with Leiden clusters\n",
+    "\n",
+    "This tutorial shows how to annotate one or more samples with `CellAnnotator`\n",
+    "using an OpenRouter model and a user-provided Leiden key."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import scanpy as sc\n",
+    "from cell_annotator import CellAnnotator"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Configuration\n",
+    "\n",
+    "- `OPENROUTER_API_KEY`: your OpenRouter API key\n",
+    "- `OPENROUTER_MODEL`: model slug (e.g. `openai/gpt-4o-mini`)\n",
+    "- `LEIDEN_KEY`: cluster column in `adata.obs`\n",
+    "- `SAMPLE_KEY`: sample column in `adata.obs`, or `None` for a single sample\n",
+    "- `ADATA_PATH`: path to your `.h5ad` dataset"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "OPENROUTER_API_KEY = \"\"  # e.g. sk-or-v1-...\n",
+    "OPENROUTER_MODEL = \"openai/gpt-4o-mini\"\n",
+    "LEIDEN_KEY = \"leiden\"\n",
+    "ADATA_PATH = \"path/to/your_data.h5ad\"\n",
+    "\n",
+    "SPECIES = \"human\"\n",
+    "TISSUE = \"pancreas\"\n",
+    "STAGE = \"adult\"\n",
+    "SAMPLE_KEY = \"sample\"  # set to None for single-sample datasets\n",
+    "\n",
+    "if not OPENROUTER_API_KEY:\n",
+    "    raise ValueError(\"Set OPENROUTER_API_KEY before continuing.\")\n",
+    "if not OPENROUTER_MODEL:\n",
+    "    raise ValueError(\"Set OPENROUTER_MODEL before continuing.\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Load data"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata = sc.read_h5ad(ADATA_PATH)\n",
+    "\n",
+    "if LEIDEN_KEY not in adata.obs.columns:\n",
+    "    raise KeyError(f\"Column '{LEIDEN_KEY}' was not found in adata.obs\")\n",
+    "if SAMPLE_KEY is not None and SAMPLE_KEY not in adata.obs.columns:\n",
+    "    raise KeyError(f\"Column '{SAMPLE_KEY}' was not found in adata.obs\")\n",
+    "\n",
+    "print(adata)\n",
+    "print(\"Leiden key:\", LEIDEN_KEY)\n",
+    "print(\"Sample key:\", SAMPLE_KEY)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Initialize annotator"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cell_ann = CellAnnotator(\n",
+    "    adata=adata,\n",
+    "    species=SPECIES,\n",
+    "    tissue=TISSUE,\n",
+    "    stage=STAGE,\n",
+    "    cluster_key=LEIDEN_KEY,\n",
+    "    sample_key=SAMPLE_KEY,\n",
+    "    provider=\"openrouter\",\n",
+    "    model=OPENROUTER_MODEL,\n",
+    "    api_key=OPENROUTER_API_KEY,\n",
+    ")\n",
+    "\n",
+    "cell_ann"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Run annotation"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "cell_ann.get_expected_cell_type_markers(n_markers=3)\n",
+    "cell_ann.get_cluster_markers()\n",
+    "cell_ann.annotate_clusters(key_added=\"cell_type_predicted\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Inspect and save results"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "adata.obs[[LEIDEN_KEY, \"cell_type_predicted\"]].head(10)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "if \"X_umap\" in adata.obsm:\n",
+    "    sc.pl.umap(adata, color=[LEIDEN_KEY, \"cell_type_predicted\"], wspace=0.35)\n",
+    "else:\n",
+    "    print(\"No UMAP embedding found; skipping plot.\")\n",
+    "\n",
+    "output_path = ADATA_PATH.replace(\".h5ad\", \"_annotated.h5ad\")\n",
+    "adata.write(output_path)\n",
+    "print(f\"Saved annotated object to: {output_path}\")"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}
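
The notebook's last cell derives the output path with `str.replace`, which rewrites every `.h5ad` occurrence in the path and leaves the path unchanged (so the write would target the input file) when the suffix differs. A possible alternative, offered as a sketch rather than a required change, builds the name with `pathlib`. The helper name `annotated_path` is hypothetical.

```python
from pathlib import Path


def annotated_path(adata_path: str) -> str:
    """Append '_annotated' to the file stem, keeping directory and suffix.

    Hypothetical helper for illustration; not part of the notebook.
    """
    path = Path(adata_path)
    return str(path.with_name(f"{path.stem}_annotated{path.suffix}"))


print(annotated_path("path/to/your_data.h5ad"))
```

Only the final path component is changed, so a directory whose name happens to contain `.h5ad` is left alone.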

src/cell_annotator/_constants.py

Lines changed: 2 additions & 1 deletion
@@ -19,9 +19,10 @@ class PackageConstants:
         "openai": "gpt-4o-mini",
         "gemini": "gemini-2.5-flash-lite",
         "anthropic": "claude-haiku-4-5",
+        "openrouter": "openai/gpt-4o-mini",
     }
     # Supported LLM providers
-    supported_providers: list[str] = ["openai", "gemini", "anthropic"]
+    supported_providers: list[str] = ["openai", "gemini", "anthropic", "openrouter"]
     default_cluster_key: str = "leiden"
     cell_type_key: str = "cell_type_harmonized"
 
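
To illustrate how the two constants touched here could work together, the sketch below resolves a model name from a provider and falls back to the provider's default. The dictionary and list values are copied from the diff, but they are standalone copies, and `resolve_model` is a hypothetical helper. How the package itself consumes these constants is not shown in this hunk.

```python
from __future__ import annotations

# Standalone copies of the values shown in the diff (assumed layout).
DEFAULT_MODELS = {
    "openai": "gpt-4o-mini",
    "gemini": "gemini-2.5-flash-lite",
    "anthropic": "claude-haiku-4-5",
    "openrouter": "openai/gpt-4o-mini",
}
SUPPORTED_PROVIDERS = ["openai", "gemini", "anthropic", "openrouter"]


def resolve_model(provider: str, model: str | None = None) -> str:
    """Return the explicit model, or the provider's default when none is given."""
    if provider not in SUPPORTED_PROVIDERS:
        raise ValueError(
            f"Unsupported provider '{provider}'; expected one of {SUPPORTED_PROVIDERS}"
        )
    return model or DEFAULT_MODELS[provider]


print(resolve_model("openrouter"))
print(resolve_model("openrouter", "anthropic/claude-3.5-sonnet"))
```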
src/cell_annotator/model/_api_keys.py

Lines changed: 9 additions & 0 deletions
@@ -32,6 +32,11 @@ class APIKeyManager:
             "setup_url": "https://console.anthropic.com/settings/keys",
             "description": "Anthropic Claude models",
         },
+        "openrouter": {
+            "env_var": "OPENROUTER_API_KEY",
+            "setup_url": "https://openrouter.ai/settings/keys",
+            "description": "OpenRouter models (aggregated providers)",
+        },
     }
 
     def __init__(self, auto_load_env: bool = True):
@@ -186,6 +191,10 @@ def validate_model_access(self, model: str) -> tuple[bool, str | None]:
             provider = "gemini"
         elif any(claude_name in model_lower for claude_name in ["claude", "anthropic"]):
             provider = "anthropic"
+        elif "/" in model and not model_lower.startswith("models/"):
+            # OpenRouter uses '<provider>/<model>' slugs (e.g. 'openai/gpt-4o-mini').
+            # The 'models/' guard avoids false-matching Gemini IDs like 'models/gemini-1.5-flash'.
+            provider = "openrouter"
         elif any(openai_name in model_lower for openai_name in ["gpt", "o1", "davinci", "curie", "babbage", "ada"]):
             provider = "openai"
         else:

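
The slash rule added to `validate_model_access` can be looked at in isolation. The sketch below reproduces only that one condition as a standalone function, which is hypothetical and not part of the package, to show which model IDs it accepts.

```python
def looks_like_openrouter_slug(model: str) -> bool:
    """True for '<provider>/<model>' slugs, excluding Gemini-style 'models/...' IDs.

    Hypothetical standalone copy of the condition added in the diff.
    """
    return "/" in model and not model.lower().startswith("models/")


for name in ["openai/gpt-4o-mini", "models/gemini-1.5-flash", "gpt-4o-mini"]:
    print(name, looks_like_openrouter_slug(name))
```

In the hunk above, this rule sits after the Gemini and Claude checks in the `elif` chain. As far as the visible branches show, a slug such as `anthropic/claude-3.5-sonnet` would therefore match the Claude check first and resolve to `anthropic`, not `openrouter`. The README example in this commit passes `provider="openrouter"` explicitly, which does not depend on this detection.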