2 changes: 1 addition & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -539,4 +539,4 @@ to this version by default. Legacy v0.x users can pin `graphrag-sdk==0.8.2`.
### Fixed

- `hnswlib` import guard in SemanticResolution and LLMVerifiedResolution — raises clear `ImportError` instead of `AttributeError` when hnswlib is not installed.
- 14 ruff lint errors (import sorting, line length) resolved; CI no longer ignores lint rules.
- 14 ruff lint errors (import sorting, line length) resolved; CI no longer ignores lint rules.
63 changes: 54 additions & 9 deletions docs/api-reference.md
Expand Up @@ -8,6 +8,7 @@ Complete reference for all public classes and methods exported by `graphrag_sdk`
- [Connection](#connection)
- [Providers](#providers)
- [Data Models](#data-models)
- [TokenUsage](#tokenusage)
- [Schema](#schema)
- [Ingestion Strategies](#ingestion-strategies)
- [Ingestion Pipeline](#ingestion-pipeline)
Expand Down Expand Up @@ -269,11 +270,11 @@ LLMInterface(model_name: str, model_params: dict | None = None, max_concurrency:
| Method | Signature | Description |
|--------|-----------|-------------|
| `invoke` | `(prompt: str, **kwargs) -> LLMResponse` | Sync text generation (abstract) |
| `ainvoke` | `(prompt: str, *, max_retries=3, **kwargs) -> LLMResponse` | Async with retry + backoff |
| `ainvoke_messages` | `(messages: list[ChatMessage], *, max_retries=3, **kwargs) -> LLMResponse` | Multi-turn native messages (see below) |
| `ainvoke` | `(prompt: str, *, ctx=None, max_retries=3, **kwargs) -> LLMResponse` | Async with retry + backoff; records usage into `ctx` |
| `ainvoke_messages` | `(messages: list[ChatMessage], *, ctx=None, max_retries=3, **kwargs) -> LLMResponse` | Multi-turn native messages; records usage into `ctx` |
| `invoke_with_model` | `(prompt: str, response_model: Type[BaseModel], **kwargs) -> BaseModel` | Structured output |
| `ainvoke_with_model` | `(prompt: str, response_model: Type[BaseModel], *, max_retries=3) -> BaseModel` | Async structured output |
| `abatch_invoke` | `(prompts: list[str], *, max_concurrency=None, max_retries=3) -> list[LLMBatchItem]` | Concurrent batch |
| `abatch_invoke` | `(prompts: list[str], *, ctx=None, max_concurrency=None, max_retries=3) -> list[LLMBatchItem]` | Concurrent batch; threads `ctx` to each `ainvoke` |

`ainvoke_messages()` is used by `completion()` when conversation history is provided. The default implementation concatenates messages into a single prompt string and calls `ainvoke()`, so custom providers work without changes. `LiteLLM` and `OpenRouterLLM` override this with native multi-turn implementations.
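
The fallback described above can be pictured with a minimal sketch. The `ChatMessage` stand-in and the `flatten_messages` helper are hypothetical, and the `role: content` line format is an assumption; the source does not state how the SDK formats the concatenated prompt.

```python
from dataclasses import dataclass


@dataclass
class ChatMessage:
    """Hypothetical stand-in for the SDK's ChatMessage (role and content only)."""

    role: str
    content: str


def flatten_messages(messages: list[ChatMessage]) -> str:
    """Join a conversation into one prompt string, one 'role: content' line per turn."""
    # Assumed format: the real default implementation may separate turns differently.
    return "\n".join(f"{m.role}: {m.content}" for m in messages)


history = [
    ChatMessage("system", "Answer briefly."),
    ChatMessage("user", "Who founded Acme Corp?"),
]
prompt = flatten_messages(history)
print(prompt)
```

A provider with a native chat API would skip this flattening step and pass the message list through unchanged, which is what the overrides mentioned above are for.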

Expand All @@ -283,9 +284,9 @@ LLMInterface(model_name: str, model_params: dict | None = None, max_concurrency:
|--------|-----------|-------------|
| `model_name` | `@property -> str` | Embedding model identifier (abstract) |
| `embed_query` | `(text: str, **kwargs) -> list[float]` | Single text embedding (abstract) |
| `aembed_query` | `(text: str, **kwargs) -> list[float]` | Async single (default: thread pool) |
| `aembed_query` | `(text: str, *, ctx=None, **kwargs) -> list[float]` | Async single; records `embedding_tokens` into `ctx` |
| `embed_documents` | `(texts: list[str], **kwargs) -> list[list[float]]` | Batch (default: sequential) |
| `aembed_documents` | `(texts: list[str], **kwargs) -> list[list[float]]` | Async batch (default: thread pool) |
| `aembed_documents` | `(texts: list[str], *, ctx=None, **kwargs) -> list[list[float]]` | Async batch; records `embedding_tokens` into `ctx` |

### LLMBatchItem

Expand Down Expand Up @@ -411,25 +412,34 @@ class IngestionResult(DataModel):
    relationships_created: int = 0
    chunks_indexed: int = 0
    metadata: dict[str, Any] = {}
    usage: TokenUsage = TokenUsage()  # Accumulated token counts for this ingest
```

See [Token Usage](token-usage.md) for what each counter covers.

### RagResult

```python
class RagResult(DataModel):
    answer: str
    retriever_result: RetrieverResult | None = None  # Populated when return_context=True
    metadata: dict[str, Any] = {}  # Contains model, num_context_items, strategy
    usage: TokenUsage = TokenUsage()  # Accumulated token counts for this completion
```

See [Token Usage](token-usage.md) for what each counter covers.

### RetrieverResult

```python
class RetrieverResult(DataModel):
    items: list[RetrieverResultItem] = []
    metadata: dict[str, Any] = {}
    usage: TokenUsage = TokenUsage()  # Accumulated token counts for this retrieve
```

See [Token Usage](token-usage.md) for what each counter covers.

### RetrieverResultItem

```python
Expand Down Expand Up @@ -519,6 +529,32 @@ Deterministic entity ID from normalized name and optional type. When `entity_typ

---

## TokenUsage

```python
from graphrag_sdk import TokenUsage

class TokenUsage(DataModel):
    prompt_tokens: int = 0      # Total tokens sent to LLM (input)
    completion_tokens: int = 0  # Total tokens generated by LLM (output)
    embedding_tokens: int = 0   # Total tokens sent to the embedder
```

All fields default to `0`. Supports `+` (returns new instance) and `+=` (in-place accumulation) for aggregation across batch results.

```python
# Aggregate batch ingest usage
results = await rag.ingest(["a.pdf", "b.pdf"])
total = sum(
    (r.usage for r in results if isinstance(r, IngestionResult)),
    start=TokenUsage(),
)
```

See the full guide in [Token Usage](token-usage.md).
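
To make the `+` versus `+=` distinction concrete, here is a minimal sketch of a counter with the same three fields. `UsageSketch` is a hypothetical stand-in built on a plain dataclass; it is not the SDK's `TokenUsage` implementation, only an illustration of the semantics described above.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Hypothetical stand-in for TokenUsage; not the SDK's implementation."""

    prompt_tokens: int = 0
    completion_tokens: int = 0
    embedding_tokens: int = 0

    def __add__(self, other: "UsageSketch") -> "UsageSketch":
        # `+` builds a fresh instance and leaves both operands untouched.
        return UsageSketch(
            self.prompt_tokens + other.prompt_tokens,
            self.completion_tokens + other.completion_tokens,
            self.embedding_tokens + other.embedding_tokens,
        )

    def __iadd__(self, other: "UsageSketch") -> "UsageSketch":
        # `+=` mutates the left operand in place and returns it.
        self.prompt_tokens += other.prompt_tokens
        self.completion_tokens += other.completion_tokens
        self.embedding_tokens += other.embedding_tokens
        return self


a = UsageSketch(prompt_tokens=100, completion_tokens=20)
b = UsageSketch(prompt_tokens=50, embedding_tokens=30)

# sum() needs an explicit start value because the default start is the int 0.
total = sum([a, b], start=UsageSketch())
print(total)
```

Passing `start=` is what lets `sum()` work on non-numeric objects, which is why the batch example above does the same with `TokenUsage()`.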

---

## Schema

```python
Expand Down Expand Up @@ -745,16 +781,25 @@ VectorStore(connection, embedder=None, index_name="chunk_embeddings", embedding_
from graphrag_sdk import Context
```

Execution context for logging and budget tracking.
Execution context threaded through every strategy call for logging, budget tracking, and token usage accumulation.

```python
Context(tenant_id: str = "default", latency_budget_ms: float = 60000.0)
Context(
    tenant_id: str = "default",
    latency_budget_ms: float | None = None,
    metadata: dict[str, Any] = {},
)
```

| Method/Property | Description |
|----------------|-------------|
| `ctx.log(message, log_level=logging.INFO)` | Log a message |
| `ctx.budget_exceeded` | True if elapsed time > latency_budget_ms |
| `ctx.log(message, log_level=logging.INFO)` | Log a message with tenant/trace prefix |
| `ctx.budget_exceeded` | True if elapsed time > `latency_budget_ms` |
| `ctx.remaining_budget_ms` | Remaining budget in ms, or `None` |
| `ctx.elapsed_ms` | Milliseconds since context creation |
| `ctx.usage` | `TokenUsage` accumulator for this operation |
| `ctx.record_usage(*, prompt_tokens=0, completion_tokens=0, embedding_tokens=0)` | Add token counts to the accumulator |
| `ctx.child(**overrides)` | Create a child context with inherited tenant/trace |
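
How the budget and usage members in the table relate to each other can be sketched with a small stand-in class. `ContextSketch` is hypothetical: it reuses the names from the table, stores usage in a plain dict instead of a `TokenUsage`, and assumes that the remaining budget is simply the budget minus elapsed time. The SDK's actual `Context` may differ on all of these points.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


def _empty_usage() -> Dict[str, int]:
    return {"prompt_tokens": 0, "completion_tokens": 0, "embedding_tokens": 0}


@dataclass
class ContextSketch:
    """Hypothetical stand-in for Context; not the SDK's implementation."""

    tenant_id: str = "default"
    latency_budget_ms: Optional[float] = None
    # A plain dict keeps the sketch self-contained; the SDK exposes a TokenUsage.
    usage: Dict[str, int] = field(default_factory=_empty_usage)
    _started: float = field(default_factory=time.monotonic)

    @property
    def elapsed_ms(self) -> float:
        # Milliseconds since this context object was created.
        return (time.monotonic() - self._started) * 1000.0

    @property
    def remaining_budget_ms(self) -> Optional[float]:
        # None means no budget was set (assumed semantics).
        if self.latency_budget_ms is None:
            return None
        return self.latency_budget_ms - self.elapsed_ms

    @property
    def budget_exceeded(self) -> bool:
        remaining = self.remaining_budget_ms
        return remaining is not None and remaining < 0

    def record_usage(
        self,
        *,
        prompt_tokens: int = 0,
        completion_tokens: int = 0,
        embedding_tokens: int = 0,
    ) -> None:
        # Each call adds to the running totals for this operation.
        self.usage["prompt_tokens"] += prompt_tokens
        self.usage["completion_tokens"] += completion_tokens
        self.usage["embedding_tokens"] += embedding_tokens


ctx = ContextSketch()
ctx.record_usage(prompt_tokens=120, completion_tokens=30)
ctx.record_usage(embedding_tokens=8)
print(ctx.usage, ctx.budget_exceeded)
```

Keyword-only arguments on `record_usage` make call sites self-describing, so a provider cannot accidentally swap prompt and completion counts.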

---

Expand Down
22 changes: 17 additions & 5 deletions docs/getting-started.md
Expand Up @@ -118,12 +118,17 @@ The default is `256` (matched-Matryoshka dimensions of `text-embedding-3-large`)
result = await rag.ingest("path/to/document.txt")
print(f"Created {result.nodes_created} nodes, {result.relationships_created} relationships")
print(f"Indexed {result.chunks_indexed} chunks")
# Token costs for this ingest (extraction LLM + chunk embeddings)
print(f"Prompt tokens: {result.usage.prompt_tokens}")
print(f"Completion tokens: {result.usage.completion_tokens}")
print(f"Embedding tokens: {result.usage.embedding_tokens}")
```

### From raw text

```python
result = await rag.ingest("acme_doc", text="Acme Corp was founded in 1985 by Jane Doe in Austin, Texas.")
result = await rag.ingest(
    text="Acme Corp was founded in 1985 by Jane Doe in Austin, Texas.",
    document_id="acme_doc",
)
print(f"Created {result.nodes_created} nodes, {result.relationships_created} relationships")
print(f"Indexed {result.chunks_indexed} chunks")
```
Expand Down Expand Up @@ -151,6 +156,10 @@ Use `completion()` for the full RAG pipeline — retrieval + answer generation:
```python
result = await rag.completion("Who works at Acme Corp?")
print(result.answer)
# See what it cost
print(f"Tokens used — prompt: {result.usage.prompt_tokens}, "
      f"completion: {result.usage.completion_tokens}, "
      f"embedding: {result.usage.embedding_tokens}")
```

### With context inspection
Expand Down Expand Up @@ -201,14 +210,16 @@ Supported roles: `"system"`, `"user"`, `"assistant"`. Invalid roles raise `Value
Use `get_statistics()` to see a summary of what the graph contains:

```python
stats = await rag.graph_store.get_statistics()
stats = await rag.get_statistics()
print(f"Nodes: {stats['node_count']}, Edges: {stats['edge_count']}")
```

You can also run raw Cypher queries against the graph:

```python
results = await rag.graph_store.query_raw("MATCH (p:Person)-[:WORKS_AT]->(o:Organization) RETURN p.name, o.name LIMIT 10")
results = await rag.graph_store.query_raw(
    "MATCH (p:Person)-[:WORKS_AT]->(o:Organization) RETURN p.name, o.name LIMIT 10"
)
for row in results.result_set:
print(row)
```
Expand All @@ -221,8 +232,8 @@ After all documents have been ingested, run `finalize()` to deduplicate entities

```python
results = await rag.finalize()
print(f"Deduplicated: {results['entities_deduplicated']}")
print(f"Embedded: {results['entities_embedded']} entities, {results['relationships_embedded']} relationships")
print(f"Deduplicated: {results.entities_deduplicated}")
print(f"Embedded: {results.entities_embedded} entities, {results.relationships_embedded} relationships")
```

This step is important for query accuracy. It merges duplicate entities (e.g., "J. Doe" and "Jane Doe") and ensures all entities have vector embeddings for semantic search.
Expand All @@ -233,6 +244,7 @@ This step is important for query accuracy. It merges duplicate entities (e.g., "

- [docs/configuration.md](configuration.md) -- Tuning connection settings, chunking parameters, and retrieval options.
- [docs/strategies.md](strategies.md) -- Custom extraction and resolution strategies.
- [docs/token-usage.md](token-usage.md) -- Cost tracking, billing dashboards, and observability patterns.
- [docs/benchmark.md](benchmark.md) -- Reproducing benchmark results on the GraphRAG-Bench Novel corpus (20 novels, 2,010 questions).

---
Expand Down
2 changes: 2 additions & 0 deletions docs/index.md
Expand Up @@ -12,6 +12,7 @@ GraphRAG SDK builds knowledge graphs from documents and answers questions over t
- **Fully modular** -- swap chunking, extraction, resolution, retrieval, and reranking strategies
- **Production-ready** -- async-first, connection pooling, circuit breaker, batched writes
- **Full provenance** -- every answer traces back to its source document and chunk
- **Built-in cost tracking** -- every result exposes `usage.prompt_tokens`, `usage.completion_tokens`, and `usage.embedding_tokens`

## Quick Start

Expand Down Expand Up @@ -43,5 +44,6 @@ asyncio.run(main())
- [Getting Started](getting-started.md) -- Full tutorial from install to first query
- [Architecture](architecture.md) -- How the 9-step pipeline works
- [Strategies](strategies.md) -- All swappable strategy ABCs and built-in options
- [Token Usage](token-usage.md) -- Cost tracking and observability
- [Benchmark](benchmark.md) -- Methodology and reproduction instructions
- [API Reference](api-reference.md) -- Full API documentation
1 change: 1 addition & 0 deletions docs/ingestion.md
Expand Up @@ -277,6 +277,7 @@ stats = await rag.finalize()
- **Fastest step:** Quality filter, prune, and resolve — all in-memory, sub-second.
- **Parallelism:** Steps 8-9 run in parallel. Step 1 NER uses a semaphore (default 12 concurrent calls).
- **Batch size:** The benchmark uses 1500-character chunks. 20 documents (~4.7 MB total) take ~47 minutes to ingest.
- **Cost tracking:** Check `result.usage.prompt_tokens`, `result.usage.completion_tokens`, and `result.usage.embedding_tokens` after each `ingest()` call. See [Token Usage](token-usage.md) for aggregation patterns across batch ingestion.

---

Expand Down
14 changes: 9 additions & 5 deletions docs/providers.md
Expand Up @@ -137,11 +137,13 @@ class MyLLM(LLMInterface):

| Method | Default Behavior | Override When |
|--------|-----------------|--------------|
| `ainvoke(prompt, max_retries=3)` | Runs `invoke()` in a thread pool with retry | You have a native async client |
| `ainvoke_messages(messages, max_retries=3)` | Concatenates messages into a single prompt and calls `ainvoke()` | You have a native multi-turn chat API |
| `ainvoke(prompt, *, ctx=None, max_retries=3)` | Runs `invoke()` in a thread pool with retry; records usage if `ctx` provided | You have a native async client |
| `ainvoke_messages(messages, *, ctx=None, max_retries=3)` | Concatenates messages into a single prompt and calls `ainvoke()` | You have a native multi-turn chat API |
| `invoke_with_model(prompt, response_model)` | Calls `invoke()` and parses JSON into Pydantic model | Your provider has native structured output |
| `ainvoke_with_model(prompt, response_model)` | Calls `ainvoke()` and parses JSON | Same, async version |
| `abatch_invoke(prompts, max_concurrency)` | Concurrent `ainvoke()` with semaphore | You have a native batch API |
| `abatch_invoke(prompts, *, ctx=None, max_concurrency)` | Concurrent `ainvoke()` with semaphore; threads `ctx` to each call | You have a native batch API |

> **Token usage:** pass the current `ctx` to record prompt/completion tokens automatically. See [Token Usage](token-usage.md).

`ainvoke_messages()` is called by `completion()` when conversation history is provided. Override it to pass messages natively to your LLM's chat API for proper multi-turn handling:

Expand Down Expand Up @@ -195,9 +197,11 @@ The `model_name` property is used by the graph config node to validate that the

| Method | Default Behavior | Override When |
|--------|-----------------|--------------|
| `aembed_query(text)` | Runs `embed_query()` in thread pool | You have async embedding |
| `aembed_query(text, *, ctx=None)` | Runs `embed_query()` in thread pool; records embedding tokens if `ctx` provided | You have async embedding |
| `embed_documents(texts)` | Sequential `embed_query()` per text | You can batch embeddings |
| `aembed_documents(texts)` | Runs `embed_documents()` in thread pool | You have async batch |
| `aembed_documents(texts, *, ctx=None)` | Runs `embed_documents()` in thread pool; records embedding tokens if `ctx` provided | You have async batch |

> **Token usage:** pass the current `ctx` to record embedding tokens automatically. See [Token Usage](token-usage.md).

### Batch Embedding

Expand Down
21 changes: 21 additions & 0 deletions docs/retrieval.md
Expand Up @@ -312,11 +312,32 @@ reranker = CosineReranker(embedder=embedder, top_k=10)
result = await rag.completion("Your question", reranker=reranker)
```

### Token Usage

Both `retrieve()` and `completion()` attach token counters to the result:

```python
# Retrieval only
result = await rag.retrieve("What did Professor Harmon discover?")
print(result.usage.embedding_tokens) # query embedding tokens
print(result.usage.prompt_tokens) # keyword-extraction LLM tokens

# Full completion
result = await rag.completion("What did Professor Harmon discover?")
print(result.usage.prompt_tokens) # retrieval + answer generation LLM input
print(result.usage.completion_tokens) # answer tokens
print(result.usage.embedding_tokens) # query embedding tokens
```

See [Token Usage](token-usage.md) for cost estimation helpers and observability patterns.
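
As a rough illustration of turning these counters into a cost figure, the sketch below multiplies each counter by a per-1,000-token rate. The `estimate_cost` helper is hypothetical and is not part of the SDK, and the rates are placeholders chosen for easy arithmetic, not real provider prices.

```python
def estimate_cost(
    prompt_tokens: int,
    completion_tokens: int,
    embedding_tokens: int,
    *,
    prompt_rate_per_1k: float,
    completion_rate_per_1k: float,
    embedding_rate_per_1k: float,
) -> float:
    """Return an estimated cost in whatever currency the rates are expressed in."""
    weighted = (
        prompt_tokens * prompt_rate_per_1k
        + completion_tokens * completion_rate_per_1k
        + embedding_tokens * embedding_rate_per_1k
    )
    # Rates are quoted per 1,000 tokens, so scale the weighted sum down.
    return weighted / 1000.0


# Placeholder counts and rates for illustration only.
cost = estimate_cost(
    prompt_tokens=1200,
    completion_tokens=300,
    embedding_tokens=40,
    prompt_rate_per_1k=1.0,
    completion_rate_per_1k=2.0,
    embedding_rate_per_1k=0.5,
)
print(f"Estimated cost: {cost:.2f}")
```

In practice the three arguments would come from `result.usage` on a `retrieve()` or `completion()` result, with rates taken from the pricing page of whichever provider is configured.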

---


## File Reference

| File | What it contains |
|------|-----------------|
| [`multi_path.py`](https://github.com/FalkorDB/GraphRAG-SDK/blob/main/graphrag_sdk/src/graphrag_sdk/retrieval/strategies/multi_path.py) | Main orchestrator — coordinates all 9 steps |
| [`entity_discovery.py`](https://github.com/FalkorDB/GraphRAG-SDK/blob/main/graphrag_sdk/src/graphrag_sdk/retrieval/strategies/entity_discovery.py) | RELATES vector search + 2-path entity discovery |
Expand Down