diff --git a/.github/actions/agents-schema-memory/action.yml b/.github/actions/agents-schema-memory/action.yml
new file mode 100644
index 0000000..e6bb566
--- /dev/null
+++ b/.github/actions/agents-schema-memory/action.yml
@@ -0,0 +1,21 @@
+name: 'Agents Schema Memory'
+description: >
+ Ingests durable agent memories from a YAML file into the AGENTS.MEMORY and AGENTS.MEMORY_ANCHOR tables.
+
+inputs:
+ memory-file:
+ description: 'Path to a YAML file containing durable agent memories'
+ required: true
+runs:
+ using: composite
+ steps:
+ - name: Set up uv
+ uses: astral-sh/setup-uv@v5
+ with:
+ python-version: '3.12'
+
+ - name: Run memory ingestion
+ shell: bash
+ run: |
+ uvx --from "agents-schema==0.0.6" \
+ agents-schema memory --memory-file "${{ inputs.memory-file }}"
diff --git a/.github/workflows/agents-schema-memory.yml b/.github/workflows/agents-schema-memory.yml
new file mode 100644
index 0000000..de6da37
--- /dev/null
+++ b/.github/workflows/agents-schema-memory.yml
@@ -0,0 +1,27 @@
+name: Agents Schema Memory
+
+on:
+ workflow_call:
+ inputs:
+ memory-file:
+ description: 'Path to a YAML file containing durable agent memories'
+ type: string
+ required: true
+ secrets:
+ WAREHOUSE_CREDENTIALS:
+ required: true
+
+jobs:
+ ingest:
+ runs-on: ubuntu-latest
+
+ steps:
+ - name: Check out repository
+ uses: actions/checkout@v4
+
+ - name: Run memory ingestion
+ uses: fivetran/agents_schema/.github/actions/agents-schema-memory@v0.0.6
+ with:
+ memory-file: ${{ inputs.memory-file }}
+ env:
+ WAREHOUSE_CREDENTIALS: ${{ secrets.WAREHOUSE_CREDENTIALS }}
diff --git a/README.md b/README.md
index ac8244e..b5a0200 100644
--- a/README.md
+++ b/README.md
@@ -32,6 +32,7 @@ available before writing or explaining queries.
- [Sync dbt](#sync-dbt)
- [Sync Looker](#sync-looker)
- [Sync OSI](#sync-osi)
+ - [Sync Memory](#sync-memory)
- [Sync Multiple Sources](#sync-multiple-sources)
- [Query with an agent](#query-with-an-agent)
- [Why Agents Schema](#why-agents-schema)
@@ -43,7 +44,7 @@ available before writing or explaining queries.
## Getting Started
-There are three supported metadata sources. Pick one to get started quickly.
+There are several supported source types. Pick one to get started quickly.
### Prerequisites
@@ -68,6 +69,12 @@ files.
Use [OSI Setup Guide](osi-setup.md) when your repository contains Open Semantic
Interchange `*.osi.yaml` files.
+### Sync Memory
+
+Use [Memory Setup Guide](memory-setup.md) to publish durable, anchored agent
+notes from a YAML file when you do not run a semantic layer. (If you already use
+OSI, prefer object-local `ai_context` instead.)
+
### Sync Multiple Sources
Use the reusable workflows together when one repository contains multiple
@@ -153,6 +160,7 @@ The GitHub Actions call the CLI with explicit source arguments:
agents-schema dbt --project-dir dbt_project
agents-schema looker --lookml-dir lookml
agents-schema osi --osi-dir osi
+agents-schema memory --memory-file memory.yml
```
The CLI reads warehouse credentials from `WAREHOUSE_CREDENTIALS`.
diff --git a/SPEC.md b/SPEC.md
index 7e5f62c..0103095 100644
--- a/SPEC.md
+++ b/SPEC.md
@@ -20,6 +20,7 @@ The implementation writes unquoted identifiers, so Snowflake stores table and co
| `text` | `TEXT` | Longer free-form text. |
| `boolean` | `BOOLEAN` | Boolean values. |
| `array` | `VARIANT` | Inserted as JSON via `PARSE_JSON`. |
+| `float` | `FLOAT` | Floating-point values, such as memory `confidence`. |
+| `timestamp` | `TIMESTAMP` | Timestamp values accepted by the warehouse. |
---
@@ -79,11 +80,162 @@ The current package delivers one table family per metadata source:
| dbt | `AGENTS.DBT_MODEL`, `AGENTS.DBT_COLUMN`, `AGENTS.DBT_DEPENDENCY` |
| LookML | `AGENTS.LOOKML_VIEW`, `AGENTS.LOOKML_DIMENSION`, `AGENTS.LOOKML_MEASURE`, `AGENTS.LOOKML_EXPLORE` |
| OSI | `AGENTS.OSI_DATASET`, `AGENTS.OSI_FIELD`, `AGENTS.OSI_METRIC`, `AGENTS.OSI_RELATIONSHIP` |
+| Memory | `AGENTS.MEMORY`, `AGENTS.MEMORY_ANCHOR` |
Each ingestion replaces its own table family with `CREATE OR REPLACE TABLE` and then inserts the rows parsed from the source metadata.
---
+## Source: Memory
+
+The memory ingestion reads a YAML file of durable semantic memories and writes anchored records that agents can retrieve near relevant schema context. Memories are intended for query rules, join caveats, unit conversions, status meanings, grain warnings, and other project-specific guidance that should survive beyond a single agent session.
+
+### When to use memory
+
+Memory is the lightweight path to anchored, agent-retrievable notes for deployments **without a semantic layer**. It attaches notes directly to physical warehouse objects (schema/table/column) plus logical metrics and relationships, so an agent can pull "the notes relevant to this column" with a simple join — without standing up and maintaining a full semantic model.
+
+If you already run an OSI semantic model, you usually do **not** need memory: OSI carries object-local `ai_context` on every dataset, field, and metric, which is the natural home for the same notes (for example, "this field is in cents" belongs on the OSI field). Memory overlaps that and is largely redundant for OSI-native teams. Reach for memory when there is no semantic layer, or for notes about raw warehouse objects that OSI does not model. Anchors target physical objects on purpose: a column anchor is reachable both from raw schema and, because OSI fields map down to columns, from an OSI-aware consumer, whereas a note attached only to an OSI field is not reachable from raw schema.
+
+Memory can also summarize other providers. A process can scan richer provider tables such as LookML, dbt, OSI, catalog metadata, query history, or reviewed agent discoveries, then publish the compact facts worth carrying forward as memories with provenance back to the source material.
+
+### YAML shape
+
+The CLI accepts this canonical list form:
+
+```yaml
+memories:
+ - memory_id: stripe_amounts_are_cents
+ memory_kind: unit_rule
+ title: Stripe amounts
+ content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ source: memories.yaml
+ confidence: 0.9
+ anchors:
+ - anchor_id: invoice_amount_due
+ anchor_type: column
+ schema_name: stripe
+ table_name: invoice
+ column_name: amount_due
+
+ - memory_id: ticket_assignee_join
+ memory_kind: join_rule
+ content: For ticket owner reporting, join ticket.assignee_id to user.id.
+ anchors:
+ - anchor_id: ticket_to_user
+ anchor_type: relationship
+ relationship_name: ticket_to_user
+ from_schema: zendesk
+ from_table: ticket
+ from_columns: [assignee_id]
+ to_schema: zendesk
+ to_table: user
+ to_columns: [id]
+```
+
+Tools may keep their own project memory file and import it into this canonical
+shape. For example, a Dataface project could keep a compact `memories.yaml`
+that is optimized for authors:
+
+```yaml
+memories:
+ stripe_amounts_are_cents:
+ kind: unit_rule
+ content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ applies_to:
+ columns:
+ - stripe.invoice.amount_due
+ - stripe.charge.amount
+ metrics:
+ - revenue
+```
+
+That file is not a second Agents Schema standard. It is an authoring shape a
+tool can map into the canonical rows: the memory key becomes `memory_id`, `kind`
+becomes `memory_kind`, `content` maps directly, and each `applies_to` entry
+becomes a row in `AGENTS.MEMORY_ANCHOR`.
+
+### Schema graph
+
+```mermaid
+graph TD
+ ROOT["AGENTS.ROOT<br/>provider = memory"] --> MEMORY["AGENTS.MEMORY<br/>memory_id"]
+ MEMORY --> ANCHOR["AGENTS.MEMORY_ANCHOR<br/>memory_id, anchor_id"]
+ ANCHOR --> LOCATORS["Anchor locator columns<br/>schema_name/table_name/column_name · metric_id<br/>relationship_name + from_*/to_* join columns"]
+```
+
+### `AGENTS.MEMORY`
+
+One row per durable semantic memory.
+
+```sql
+CREATE OR REPLACE TABLE AGENTS.MEMORY (
+ memory_id VARCHAR NOT NULL,
+ memory_kind VARCHAR NOT NULL,
+ title VARCHAR,
+ content TEXT NOT NULL,
+ source VARCHAR,
+ confidence FLOAT,
+ PRIMARY KEY (memory_id)
+);
+```
+
+| Column | Source field |
+|---|---|
+| `memory_id` | Stable identifier unique within `AGENTS.MEMORY`. Mirrors the OSI parent/child convention: this is the entity's own key, and `AGENTS.MEMORY_ANCHOR` references it under the same prefixed name. |
+| `memory_kind` | Tool-defined kind, such as `unit_rule`, `join_rule`, or `grain_warning`. |
+| `title` | Optional short human-readable title for prompt rendering and review. |
+| `content` | Durable guidance, caveat, or context the agent should remember. |
+| `source` | Optional source reference, such as a file path, URL, provider object id, or import job label. |
+| `confidence` | Optional confidence in `[0, 1]`, so consumers can threshold or rank. |
+
+### `AGENTS.MEMORY_ANCHOR`
+
+One row per retrieval anchor for a memory.
+
+```sql
+CREATE OR REPLACE TABLE AGENTS.MEMORY_ANCHOR (
+ memory_id VARCHAR NOT NULL,
+ anchor_id VARCHAR NOT NULL,
+ anchor_type VARCHAR NOT NULL,
+ schema_name VARCHAR,
+ table_name VARCHAR,
+ column_name VARCHAR,
+ metric_id VARCHAR,
+ relationship_name VARCHAR,
+ from_schema VARCHAR,
+ from_table VARCHAR,
+ from_columns VARIANT,
+ to_schema VARCHAR,
+ to_table VARCHAR,
+ to_columns VARIANT,
+ PRIMARY KEY (
+ memory_id,
+ anchor_id
+ )
+);
+```
+
+Each anchor type uses a specific subset of the locator columns; ingestion rejects locators that do not belong to the anchor's type.
+
+| Column | Used by | Source field |
+|---|---|---|
+| `memory_id` | all | Copied from the parent memory. |
+| `anchor_id` | all | Stable identifier for the anchor, unique within its parent memory. |
+| `anchor_type` | all | Retrieval scope: `column`, `table`, `relationship`, or `metric`. |
+| `schema_name` | column, table | Schema name when known. |
+| `table_name` | column, table | Table-like object name (required for `column` and `table`). |
+| `column_name` | column | Column-like object name (required for `column`). |
+| `metric_id` | metric | Metric identifier (required for `metric`). |
+| `relationship_name` | relationship | Optional free-text relationship label; not a foreign key. |
+| `from_schema` / `from_table` | relationship | Left side of the join (`from_table` required). |
+| `from_columns` | relationship | Left-side join columns, paired positionally with `to_columns`. |
+| `to_schema` / `to_table` | relationship | Right side of the join (`to_table` required). |
+| `to_columns` | relationship | Right-side join columns, paired positionally with `from_columns`. |
+
+Relationship anchors carry the join inline rather than referencing a canonical relationship, because memory does not depend on OSI being present. When OSI is available, `relationship_name` can record the matching `OSI_RELATIONSHIP.name` as a best-effort, unenforced pointer.
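
Because the join is stored inline, a consumer can rebuild the join condition from a single anchor row. The helper below is a hypothetical sketch: it assumes `from_columns`/`to_columns` have already been decoded from `VARIANT` into Python lists, pairs them by position, does not quote identifiers, and raises when the anchor records no join columns (they are optional on relationship anchors).

```python
from __future__ import annotations

from typing import Any


def _qualified(schema: str | None, table: str) -> str:
    """Prefix the table with its schema when the anchor recorded one."""
    return f"{schema}.{table}" if schema else table


def join_predicate(anchor: dict[str, Any]) -> str:
    """Render an inline relationship anchor as a SQL ON condition."""
    if anchor.get("anchor_type") != "relationship":
        raise ValueError("expected a relationship anchor")
    left = _qualified(anchor.get("from_schema"), anchor["from_table"])
    right = _qualified(anchor.get("to_schema"), anchor["to_table"])
    from_columns = anchor.get("from_columns") or []
    to_columns = anchor.get("to_columns") or []
    if not from_columns or len(from_columns) != len(to_columns):
        raise ValueError("need non-empty from_columns/to_columns of equal length")
    # Positional pairing: from_columns[i] joins to_columns[i]; composite keys are ANDed.
    return " AND ".join(
        f"{left}.{src} = {right}.{dst}" for src, dst in zip(from_columns, to_columns)
    )
```

For the `ticket_to_user` anchor in the YAML example, this produces `zendesk.ticket.assignee_id = zendesk.user.id`.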
+
+---
+
## Source: dbt
The dbt ingestion reads a compiled dbt `manifest.json` and writes normalized model, column, and dependency tables. It captures the transformation layer that is useful from the warehouse: what models exist, how they are documented, and which upstream nodes they depend on.
diff --git a/memory-setup.md b/memory-setup.md
new file mode 100644
index 0000000..fd3141a
--- /dev/null
+++ b/memory-setup.md
@@ -0,0 +1,123 @@
+# Memory Setup
+
+## When to use memory
+
+Memory is the lightweight path to anchored, agent-retrievable notes — query
+rules, join caveats, unit conversions, status meanings, and grain warnings —
+for deployments that **do not run a semantic layer**.
+
+If you already maintain an OSI semantic model, prefer object-local `ai_context`
+on the relevant dataset, field, or metric: it carries the same notes right on
+the object and memory is largely redundant. Reach for memory when you have no
+semantic layer, or for notes about raw warehouse objects that OSI does not
+model. See [SPEC.md](./SPEC.md) for the full table contract.
+
+## Prerequisites
+
+The workflow needs destination warehouse credentials so it can create and
+replace tables in the `AGENTS` schema.
+
+Create one required GitHub Actions secret in the repository that calls these
+workflows: `WAREHOUSE_CREDENTIALS`.
+
+Snowflake is the only supported destination today, with more destination
+support coming soon. We recommend key-pair authentication:
+
+**Example key-pair auth secret:**
+
+```yaml
+type: snowflake
+account: abc123
+user: AGENTS_SCHEMA_BOT
+warehouse: COMPUTE_WH
+database: ANALYTICS
+role: TRANSFORMER
+private_key_pem: |
+ -----BEGIN ENCRYPTED PRIVATE KEY-----
+ MIIEvQIBADANBgkqh...
+ -----END ENCRYPTED PRIVATE KEY-----
+private_key_passphrase: your-passphrase # only if the key is encrypted
+```
+
+**Note:**
+- `role` is optional.
+- An unencrypted key uses `-----BEGIN PRIVATE KEY-----` / `-----END PRIVATE KEY-----` markers and omits `private_key_passphrase`.
+
+## Author a memory file
+
+The CLI reads a single YAML file. Each memory has an id, a kind, durable
+`content`, and one or more anchors that attach it to the objects where it is
+relevant.
+
+```yaml
+memories:
+ - memory_id: stripe_amounts_are_cents
+ memory_kind: unit_rule
+ title: Stripe amounts
+ content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ source: memories.yaml
+ confidence: 0.9 # optional, 0..1
+ anchors:
+ - anchor_id: invoice_amount_due
+ anchor_type: column
+ schema_name: stripe
+ table_name: invoice
+ column_name: amount_due
+
+ - memory_id: ticket_assignee_join
+ memory_kind: join_rule
+ content: For ticket owner reporting, join ticket.assignee_id to user.id.
+ anchors:
+ - anchor_id: ticket_to_user
+ anchor_type: relationship
+ relationship_name: ticket_to_user # optional label
+ from_schema: zendesk
+ from_table: ticket
+ from_columns: [assignee_id]
+ to_schema: zendesk
+ to_table: user
+ to_columns: [id]
+```
+
+Anchor types and the locator columns each one uses:
+
+| `anchor_type` | Required locators | Optional |
+|---|---|---|
+| `column` | `table_name`, `column_name` | `schema_name` |
+| `table` | `table_name` | `schema_name` |
+| `metric` | `metric_id` | |
+| `relationship` | `from_table`, `to_table` | `from_schema`, `to_schema`, `from_columns`/`to_columns` (paired, equal length), `relationship_name` |
+
+Validation fails fast on unknown fields, wrong scalar types, a `confidence`
+outside `0..1`, duplicate memory ids, duplicate anchors, unsupported anchor
+types, locators that do not belong to the anchor type, and missing required
+locators.
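
The anchor rules in the table above can be approximated in a few lines. This is an illustrative sketch, not the CLI's validator: it checks anchor types, foreign and missing locators, paired column lengths, repeated `anchor_id` values within one memory (an assumption about what counts as a duplicate anchor), and the `confidence` range. It skips the unknown-field and scalar-type checks.

```python
from __future__ import annotations

from typing import Any

# anchor_type -> (required locators, optional locators), per the table above.
LOCATORS: dict[str, tuple[frozenset[str], frozenset[str]]] = {
    "column": (frozenset({"table_name", "column_name"}), frozenset({"schema_name"})),
    "table": (frozenset({"table_name"}), frozenset({"schema_name"})),
    "metric": (frozenset({"metric_id"}), frozenset()),
    "relationship": (
        frozenset({"from_table", "to_table"}),
        frozenset({"from_schema", "to_schema", "from_columns", "to_columns", "relationship_name"}),
    ),
}
IDENTITY = {"anchor_id", "anchor_type"}


def anchor_errors(anchor: dict[str, Any]) -> list[str]:
    """List problems with one anchor; an empty list means it passed."""
    kind = anchor.get("anchor_type")
    if kind not in LOCATORS:
        return [f"unsupported anchor_type: {kind!r}"]
    required, optional = LOCATORS[kind]
    # Treat explicit nulls as absent, so `table_name: null` counts as missing.
    present = {key for key, value in anchor.items() if value is not None} - IDENTITY
    errors = [f"{name} is not a locator for {kind} anchors" for name in sorted(present - required - optional)]
    errors += [f"{kind} anchor is missing {name}" for name in sorted(required - present)]
    if kind == "relationship":
        if len(anchor.get("from_columns") or []) != len(anchor.get("to_columns") or []):
            errors.append("from_columns and to_columns must have equal length")
    if not anchor.get("anchor_id"):
        errors.append("anchor is missing anchor_id")
    return errors


def memory_errors(memory: dict[str, Any]) -> list[str]:
    """Check one memory's anchors, anchor_id uniqueness, and confidence range."""
    errors: list[str] = []
    seen: set[str] = set()
    for anchor in memory.get("anchors", []):
        anchor_id = anchor.get("anchor_id")
        if anchor_id in seen:
            errors.append(f"duplicate anchor_id: {anchor_id}")
        seen.add(anchor_id)
        errors.extend(anchor_errors(anchor))
    confidence = memory.get("confidence")
    if confidence is not None and not (
        isinstance(confidence, (int, float)) and 0 <= confidence <= 1
    ):
        errors.append("confidence must be a number in [0, 1]")
    return errors
```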
+
+## Run the Memory Sync Workflow
+
+```yaml
+name: Agents Schema Memory
+
+on:
+ workflow_dispatch:
+ push:
+ branches: [main]
+
+jobs:
+ agents-schema-memory:
+ uses: fivetran/agents_schema/.github/workflows/agents-schema-memory.yml@v0.0.6
+ with:
+ memory-file: memory.yml
+ secrets: inherit
+```
+
+`memory-file` is required — set it to the path of your memory YAML file. The
+example uses `memory.yml`; change it to match your repository.
+
+The workflow writes:
+
+- `AGENTS.MEMORY`
+- `AGENTS.MEMORY_ANCHOR`
+
+This job does not need to depend on dbt, Looker, or OSI jobs unless your
+repository has its own ordering requirement.
diff --git a/proposals/memory-provider.md b/proposals/memory-provider.md
new file mode 100644
index 0000000..eadd6f8
--- /dev/null
+++ b/proposals/memory-provider.md
@@ -0,0 +1,273 @@
+# Memory Provider Proposal
+
+**Status:** Proposal
+**Branch:** `notes_provider`
+
+## Summary
+
+Add a general `memory` provider to Agents Schema for durable, agent-facing data memories: query rules, join caveats, unit conversions, status meanings, grain warnings, and project-specific semantic guidance.
+
+The provider is intentionally tool-agnostic. An internal platform job, semantic modeling tool, catalog process, agent workflow, or human-curated repository could all publish memories through the same provider contract.
+
+## When To Use Memory
+
+Memory is the **lightweight path** to anchored, agent-retrievable notes for deployments **without a semantic layer**. It attaches notes directly to physical warehouse objects (schema/table/column) plus logical metrics and relationships, so an agent can pull "the notes relevant to this column" with a simple join — without standing up and maintaining a full semantic model.
+
+If you already run an OSI semantic model, you usually do **not** need memory. OSI carries object-local `ai_context` on every dataset, field, and metric, and that is the natural home for the same notes: "this field is in cents" belongs on the OSI field. Memory overlaps that and is largely redundant for OSI-native teams. In short, reach for memory when there is no semantic layer, or for notes about raw warehouse objects that a semantic model does not cover.
+
+Anchors target physical objects deliberately. A column anchor is reachable both from raw schema and, because OSI fields map down to columns, from an OSI-aware consumer; a note attached only to an OSI field is not reachable from raw schema. The only logical anchors are `metric` and `relationship`, for things that have no single physical column to point at.
+
+## Motivation
+
+`AGENTS.ROOT` can already hold free-form context and query recipes, and several source providers expose object-local `ai_context`. Those are useful, but they do not give agents a structured way to retrieve "the memories relevant to this table, column, relationship, or metric."
+
+`ROOT` is a discovery surface: it tells an agent which providers and tables exist, and it can carry broad prose. It is not shaped for object-level retrieval. A row like `(internal, revenue_guidance)` might explain cents-to-dollars conversions, but an agent looking at `stripe.invoice.amount_due` has no deterministic join from that column to the `ROOT` row. The agent would have to read broad prose and infer relevance.
+
+Object-local `ai_context` solves a different problem. It works when the context belongs to the same source object that published it, such as a LookML measure or OSI field. It does not work as well for memories that cross providers, attach to a warehouse column that came from dbt but was learned by a catalog tool, describe a relationship between two objects, or need to be updated by an agent after a SQL debugging session. Those memories need their own records plus typed anchors.
+
+Examples agents need at SQL-authoring time:
+
+- Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+- Tickets join to assignees through `ticket.assignee_id = user.id`.
+- Ticket events fan out ticket rows unless collapsed first.
+- Use `closed_at`, not `created_at`, for won revenue recognition.
+- `solved` and `closed` both count as resolved tickets.
+- Current headcount comes from the latest worker history row per worker.
+
+These are durable data semantics. They are not dashboard inventory, open questions, or scratchpad state.
+
+## Memory As Aggregator
+
+The memory provider is useful even when richer raw providers exist.
+
+A Looker, dbt, OSI, catalog, BI, or query-history provider may expose a large amount of source-specific detail. That detail is valuable for deep inspection, but it can be too broad or noisy to send to agents on every request. A curation process can scan those provider tables, user-approved agent discoveries, SQL reviews, failed-query fixes, and dashboard migrations, then write compact durable memories.
+
+In that pattern, memory becomes a summarizer and routing layer:
+
+- Raw providers keep the complete source data.
+- Memory records keep the distilled facts agents should usually remember.
+- Memory anchors attach those facts to the tables, columns, metrics, and relationships where they are relevant.
+- The optional `source` field points back to the provider object, file, URL, or import job when the agent needs to drill into the raw material.
+
+For example, a Looker provider might expose explores, views, dimensions, measures, dashboard elements, and LookML comments. Instead of always exposing that full provider surface, an offline job or interactive agent could extract durable rules such as "net revenue excludes refunded orders" or "this explore joins order_items at item grain and fans out orders." Those distilled rules can be written as memories anchored to the relevant tables, columns, measures, and relationships. Agents can use the small memory set by default and consult the Looker provider only when they need deeper provenance.
+
+This makes memory a bridge between high-fidelity source providers and compact agent context. It is not a replacement for raw provider tables; it is the layer that captures the parts worth carrying forward.
+
+## Provider Registration
+
+`AGENTS.ROOT` rows:
+
+```text
+provider key content
+memory overview Durable semantic memories for agents: query rules, join caveats, units, status meanings, and project-specific data guidance.
+memory memory One row per durable semantic memory. See AGENTS.MEMORY.
+memory memory_anchor One row per retrieval anchor for a memory. See AGENTS.MEMORY_ANCHOR.
+```
+
+The `memory` provider owns the table contract. The memory tables do not repeat a row-level `provider` column: unlike `AGENTS.ROOT`, the provider is already implied by the table family. If a deployment aggregates memories from many tools, it should record each memory's origin in `source` and keep every `memory_id` stable and unique across tools, since it is the primary key of `AGENTS.MEMORY`.
+
+## Example Project Memory File
+
+The implemented CLI accepts the canonical list form:
+
+```yaml
+memories:
+ - memory_id: stripe_amounts_are_cents
+ memory_kind: unit_rule
+ title: Stripe amounts
+ content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ source: memories.yaml
+ confidence: 0.9
+ anchors:
+ - anchor_id: invoice_amount_due
+ anchor_type: column
+ schema_name: stripe
+ table_name: invoice
+ column_name: amount_due
+```
+
+Tools may keep their own project memory file and import it into this canonical shape. For example, a Dataface project could keep a compact `memories.yaml` that is optimized for authors:
+
+```yaml
+memories:
+ stripe_amounts_are_cents:
+ kind: unit_rule
+ content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ applies_to:
+ columns:
+ - stripe.invoice.amount_due
+ - stripe.charge.amount
+ metrics:
+ - revenue
+```
+
+That file is not a second Agents Schema standard. It is an authoring shape a tool can map into the canonical rows: the memory key becomes `memory_id`, `kind` becomes `memory_kind`, `content` maps directly, and each `applies_to` entry becomes a row in `AGENTS.MEMORY_ANCHOR`.
+
+## Tables
+
+Use singular names to match the existing style: `DBT_MODEL`, `OSI_FIELD`, `LOOKML_VIEW`.
+
+```sql
+CREATE OR REPLACE TABLE AGENTS.MEMORY (
+ memory_id VARCHAR NOT NULL,
+ memory_kind VARCHAR NOT NULL,
+ title VARCHAR,
+ content TEXT NOT NULL,
+ source VARCHAR,
+ confidence FLOAT,
+ PRIMARY KEY (memory_id)
+);
+
+CREATE OR REPLACE TABLE AGENTS.MEMORY_ANCHOR (
+ memory_id VARCHAR NOT NULL,
+ anchor_id VARCHAR NOT NULL,
+ anchor_type VARCHAR NOT NULL,
+ schema_name VARCHAR,
+ table_name VARCHAR,
+ column_name VARCHAR,
+ metric_id VARCHAR,
+ relationship_name VARCHAR,
+ from_schema VARCHAR,
+ from_table VARCHAR,
+ from_columns VARIANT,
+ to_schema VARCHAR,
+ to_table VARCHAR,
+ to_columns VARIANT,
+ PRIMARY KEY (
+ memory_id,
+ anchor_id
+ )
+);
+```
+
+`anchor_id` is unique and stable within its `memory_id` and gives each anchor a real key independent of the locator shape. `confidence` is a float in `[0, 1]` so consumers can threshold or rank. Column and table anchors use the `schema_name`/`table_name`/`column_name` locators; metric anchors use `metric_id`; relationship anchors carry the join inline via `from_*`/`to_*` (with positionally paired `from_columns`/`to_columns` arrays for composite keys) plus an optional free-text `relationship_name`. Relationships are carried inline rather than referenced because memory must work when OSI is absent; when OSI is present, `relationship_name` can record the matching `OSI_RELATIONSHIP.name` as a best-effort, unenforced pointer. Ingestion rejects locators that do not belong to an anchor's type.
+
+## Schema Graph
+
+```mermaid
+graph TD
+ ROOT["AGENTS.ROOT<br/>provider = memory"] --> MEMORY["AGENTS.MEMORY<br/>memory_id"]
+ MEMORY --> ANCHOR["AGENTS.MEMORY_ANCHOR<br/>memory_id, anchor_id"]
+ ANCHOR --> LOCATORS["Anchor locator columns<br/>schema_name/table_name/column_name · metric_id<br/>relationship_name + from_*/to_* join columns"]
+```
+
+## Anchor Types
+
+| Anchor | Use for | Delivered when |
+|---|---|---|
+| `column` | Units, enum meanings, null semantics, timezone | A column is selected, searched, or used in SQL |
+| `table` | Grain, soft deletes, row meaning, required filters | A table is inspected or selected |
+| `relationship` | Join path, multiplicity, fanout warnings | A join is planned or both sides appear |
+| `metric` | Business calculation, exclusions, date policy | A KPI or metric is requested |
+
+Memories can have multiple anchors. A cents conversion memory might anchor to several amount columns and to a revenue metric.
+
+## Example Rows
+
+```text
+AGENTS.MEMORY
+memory_id memory_kind title content
+stripe_amounts_are_cents unit_rule Stripe amounts Stripe amount columns are stored in cents; divide by 100 for dollar measures.
+ticket_assignee_join join_rule Ticket assignee For ticket owner reporting, join ticket.assignee_id to user.id.
+```
+
+```text
+AGENTS.MEMORY_ANCHOR
+memory_id anchor_id anchor_type schema_name table_name column_name metric_id from_table from_columns to_table to_columns
+stripe_amounts_are_cents invoice_amount_due column stripe invoice amount_due null null null null null
+stripe_amounts_are_cents revenue_metric metric null null null revenue null null null null
+ticket_assignee_join ticket_to_user relationship null null null null ticket ["assignee_id"] user ["id"]
+```
+
+## Retrieval Examples
+
+Column-scoped memories:
+
+```sql
+SELECT m.memory_id, m.content
+FROM AGENTS.MEMORY m
+JOIN AGENTS.MEMORY_ANCHOR a
+ ON m.memory_id = a.memory_id
+WHERE a.anchor_type = 'column'
+ AND LOWER(a.schema_name) = 'stripe'
+ AND LOWER(a.table_name) = 'invoice'
+ AND LOWER(a.column_name) = 'amount_due';
+```
+
+Relationship/fanout memories for a join plan:
+
+```sql
+SELECT m.memory_id, m.content
+FROM AGENTS.MEMORY m
+JOIN AGENTS.MEMORY_ANCHOR a
+ ON m.memory_id = a.memory_id
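+WHERE a.anchor_type = 'relationship'
+ -- Relationship anchors locate the join through from_*/to_*; schemas are optional.
+ AND LOWER(a.from_table) IN ('ticket', 'user')
+ AND LOWER(a.to_table) IN ('ticket', 'user')
+ AND COALESCE(LOWER(a.from_schema), 'zendesk') = 'zendesk'
+ AND COALESCE(LOWER(a.to_schema), 'zendesk') = 'zendesk';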
+```
+
+Metric memories:
+
+```sql
+SELECT m.memory_id, m.content
+FROM AGENTS.MEMORY m
+JOIN AGENTS.MEMORY_ANCHOR a
+ ON m.memory_id = a.memory_id
+WHERE a.anchor_type = 'metric'
+ AND LOWER(a.metric_id) IN ('revenue', 'arr', 'mrr');
+```
+
+## Delivery To Agents
+
+Agents should not receive the entire memory corpus. Context builders should:
+
+1. Select schema objects relevant to the request.
+2. Retrieve memories anchored to those objects.
+3. Add a compact "Relevant memories" block near schema metadata.
+4. Preserve structured memory rows in JSON output for citations and context assembly.
+
+Rendered prompt fragment:
+
+```text
+Relevant memories:
+- stripe_amounts_are_cents: Stripe amount columns are cents. Divide by 100 for dollar measures.
+- ticket_assignee_join: For owner reporting, join ticket.assignee_id to user.id. Avoid ticket events unless collapsed first.
+```
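
The delivery steps can be sketched as a filter over anchor rows followed by rendering. The code below is hypothetical: it handles only `column` and `metric` anchors, matches names case-insensitively (mirroring the `LOWER(...)` comparisons in the retrieval queries), treats an anchor without `schema_name` as matching any schema, and assumes an optional `min_confidence` cutoff chosen by the consumer.

```python
from __future__ import annotations

from typing import Any


def _lower(value: str | None) -> str | None:
    return value.lower() if value is not None else None


def _matches(anchor: dict[str, Any], columns: set[tuple[str, str, str]], metrics: set[str]) -> bool:
    """True when an anchor points at a selected column or metric."""
    if anchor["anchor_type"] == "metric":
        return _lower(anchor["metric_id"]) in metrics
    if anchor["anchor_type"] == "column":
        schema = _lower(anchor.get("schema_name"))
        table, column = _lower(anchor["table_name"]), _lower(anchor["column_name"])
        # An anchor without schema_name matches that table/column in any schema.
        return any(
            t == table and c == column and (schema is None or s == schema)
            for s, t, c in columns
        )
    return False  # table and relationship anchors are out of scope for this sketch


def render_relevant(
    memories: list[dict[str, Any]],
    anchors: list[dict[str, Any]],
    columns: set[tuple[str, str, str]],
    metrics: set[str],
    min_confidence: float = 0.0,
) -> str:
    """Render a compact 'Relevant memories' block, or '' when nothing matches."""
    wanted_columns = {(s.lower(), t.lower(), c.lower()) for s, t, c in columns}
    wanted_metrics = {m.lower() for m in metrics}
    selected = {a["memory_id"] for a in anchors if _matches(a, wanted_columns, wanted_metrics)}
    lines = ["Relevant memories:"]
    for memory in memories:  # corpus order; a memory matched by several anchors appears once
        if memory["memory_id"] not in selected:
            continue
        confidence = memory.get("confidence")
        if confidence is not None and confidence < min_confidence:
            continue
        lines.append(f"- {memory['memory_id']}: {memory['content']}")
    return "\n".join(lines) if len(lines) > 1 else ""
```

With the example rows, selecting `stripe.invoice.amount_due` (in any letter case) returns the cents memory once, even though it also matches through its `revenue` metric anchor.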
+
+## Example Consumer / Creator: Dataface `dft schema`
+
+Dataface is one possible consumer and creator of memories; the provider should not be designed around it.
+
+As a consumer, `dft schema` could query `AGENTS.MEMORY` and `AGENTS.MEMORY_ANCHOR` while building table, column, relationship, and metric context. Instead of dumping every memory into an agent prompt, `dft schema --table invoice --column amount_due --json` could return only memories anchored to `stripe.invoice.amount_due`, plus metric memories relevant to the user's request.
+
+As a creator, an agent using Dataface could propose a new memory after learning a durable SQL rule:
+
+```text
+I found that stripe.invoice.amount_due is stored in cents. Add a column-anchored memory?
+```
+
+If the user approves, Dataface could write the memory to its project source file, or to another configured source of truth, then a sync job could publish it into `AGENTS.MEMORY` and `AGENTS.MEMORY_ANCHOR`. The write-back path should be explicit and reviewable; agents should not silently add memories from failed query attempts.
+
+The same pattern applies to other tools: learn a durable rule, ask for approval, write it to the tool's source of truth, then publish it into the shared memory provider.
+
+## Non-Goals
+
+- Replacing dbt, LookML, OSI, or catalog descriptions.
+- Storing open questions or task state.
+- Mirroring dashboard inventories.
+- Storing raw table schemas or column catalogs.
+- Becoming a general document store.
+
+## Resolved Decisions
+
+- **`confidence` is a `FLOAT` in `[0, 1]`**, not a free-form label, so consumers can threshold and rank.
+- **Provenance stays flat** on `AGENTS.MEMORY` (`source` + `confidence`) for v1. A separate `MEMORY_SOURCE` table (per-source confidence on a many-to-many edge) is deferred until multi-source provenance is a real need.
+- **Relationship anchors carry the join inline** (`from_*`/`to_*` with paired array columns) rather than referencing `OSI_RELATIONSHIP`, because memory must work without OSI. `relationship_name` is an optional best-effort pointer when OSI is present.
+- **Key naming follows the OSI parent/child convention**: `MEMORY.memory_id` is the entity key, referenced unchanged as `MEMORY_ANCHOR.memory_id`. `memory_kind` keeps its prefix to match `field_kind` / `upstream_type`.
+- **The CLI source ships now** as `agents-schema memory --memory-file memory.yml`.
+
+## Open Questions
+
+- Should `memory_kind` be a constrained enum or remain tool-defined text? (Currently free text.)
+- Agent write-back of learned rules is out of scope for v1 (the CLI only reads a file); what is the reviewed write-back path when it lands?
+- Should `AGENTS.MEMORY` move into the core spec, or stay an optional provider table family?
diff --git a/src/agents_schema/cli.py b/src/agents_schema/cli.py
index d379262..ef3e06f 100644
--- a/src/agents_schema/cli.py
+++ b/src/agents_schema/cli.py
@@ -6,7 +6,7 @@
from pathlib import Path
from typing import Any
-from . import __version__, dbt, lookml, osi
+from . import __version__, dbt, lookml, memory, osi
from .config import ConfigError
from .dbt_profiles import dbt_adapter_package_from_profiles_file
from .destinations import warehouse_type_from_env
@@ -64,6 +64,17 @@ def _build_parser() -> argparse.ArgumentParser:
help="path to a directory containing *.osi.yaml files",
)
+ memory_parser = sub.add_parser(
+ "memory",
+ help="ingest durable agent memories from YAML into AGENTS.MEMORY_*",
+ )
+ memory_parser.add_argument(
+ "--memory-file",
+ required=True,
+ type=Path,
+ help="path to a YAML file containing durable agent memories",
+ )
+
return parser
@@ -76,6 +87,8 @@ def main(argv: list[str] | None = None) -> int:
lookml.run(_config("looker", args.lookml_dir))
elif args.source_type == "osi":
osi.run(_config("osi", args.osi_dir))
+ elif args.source_type == "memory":
+ memory.run(_config("memory", args.memory_file))
else:
raise ConfigError(f"unsupported source type: {args.source_type}")
except (ConfigError, FileNotFoundError) as e:
diff --git a/src/agents_schema/destinations.py b/src/agents_schema/destinations.py
index 94493ba..2a2a587 100644
--- a/src/agents_schema/destinations.py
+++ b/src/agents_schema/destinations.py
@@ -282,8 +282,12 @@ def _type_sql(kind: str) -> str:
return "VARIANT"
if kind == "boolean":
return "BOOLEAN"
+ if kind == "float":
+ return "FLOAT"
if kind == "text":
return "TEXT"
+ if kind == "timestamp":
+ return "TIMESTAMP"
if kind == "varchar":
return "VARCHAR"
raise ValueError(f"unsupported column kind: {kind}")
diff --git a/src/agents_schema/memory.py b/src/agents_schema/memory.py
new file mode 100644
index 0000000..d5792fb
--- /dev/null
+++ b/src/agents_schema/memory.py
@@ -0,0 +1,260 @@
+"""Memory connector: writes durable agent memories from YAML.
+
+Memory is the lightweight path to anchored, agent-retrievable notes for
+deployments without a semantic layer. A team running OSI already has
+object-local `ai_context` and rarely needs this; see SPEC.md "Source: Memory".
+
+Naming follows the OSI parent/child convention: child rows reference their parent
+by a parent-prefixed key. Here the entity table's key is ``AGENTS.MEMORY.memory_id``
+and child rows reference it by the same name (``AGENTS.MEMORY_ANCHOR.memory_id``),
+just as ``OSI_FIELD.dataset_name`` references ``OSI_DATASET.name``.
+"""
+from __future__ import annotations
+
+from pathlib import Path
+from typing import Any
+
+import yaml
+
+from .config import ConfigError
+from .destinations import Column, Destination, TableSchema, open_destination
+from .root import upsert_provider_root
+
+__all__ = ["MEMORY", "MEMORY_ANCHOR", "load_memory_file", "run"]
+
+MEMORY = TableSchema(
+ "agents.memory",
+ (
+ Column("memory_id", "varchar", nullable=False),
+ Column("memory_kind", "varchar", nullable=False),
+ Column("title", "varchar"),
+ Column("content", "text", nullable=False),
+ Column("source", "varchar"),
+ Column("confidence", "float"),
+ ),
+ primary_key=("memory_id",),
+)
+
+MEMORY_ANCHOR = TableSchema(
+ "agents.memory_anchor",
+ (
+ Column("memory_id", "varchar", nullable=False),
+ Column("anchor_id", "varchar", nullable=False),
+ Column("anchor_type", "varchar", nullable=False),
+ Column("schema_name", "varchar"),
+ Column("table_name", "varchar"),
+ Column("column_name", "varchar"),
+ Column("metric_id", "varchar"),
+ Column("relationship_name", "varchar"),
+ Column("from_schema", "varchar"),
+ Column("from_table", "varchar"),
+ Column("from_columns", "array"),
+ Column("to_schema", "varchar"),
+ Column("to_table", "varchar"),
+ Column("to_columns", "array"),
+ ),
+ primary_key=(
+ "memory_id",
+ "anchor_id",
+ ),
+)
+
+MEMORY_FIELDS = (
+ "memory_id",
+ "memory_kind",
+ "title",
+ "content",
+ "source",
+ "confidence",
+)
+
+ANCHOR_FIELDS = (
+ "memory_id",
+ "anchor_id",
+ "anchor_type",
+ "schema_name",
+ "table_name",
+ "column_name",
+ "metric_id",
+ "relationship_name",
+ "from_schema",
+ "from_table",
+ "from_columns",
+ "to_schema",
+ "to_table",
+ "to_columns",
+)
+
+# from_columns/to_columns hold a relationship's join columns as arrays; every
+# other anchor field is a scalar string.
+ANCHOR_ARRAY_FIELDS = frozenset({"from_columns", "to_columns"})
+
+TOP_LEVEL_FIELDS = frozenset({"memories"})
+MEMORY_FIELD_SET = frozenset(MEMORY_FIELDS) | {"anchors"}
+ANCHOR_FIELD_SET = frozenset(ANCHOR_FIELDS) - {"memory_id"}
+STRING_MEMORY_FIELDS = frozenset(MEMORY_FIELDS) - {"confidence"}
+STRING_ANCHOR_FIELDS = frozenset(ANCHOR_FIELDS) - {"memory_id"} - ANCHOR_ARRAY_FIELDS
+ANCHOR_TYPES = frozenset({"column", "table", "relationship", "metric"})
+
+# Which locator columns each anchor type is allowed to set. Anything outside the
+# allowed set for a type is rejected so rows stay clean and unambiguous.
+ALL_LOCATORS = frozenset(ANCHOR_FIELD_SET - {"anchor_id", "anchor_type"})
+ALLOWED_LOCATORS = {
+ "column": frozenset({"schema_name", "table_name", "column_name"}),
+ "table": frozenset({"schema_name", "table_name"}),
+ "metric": frozenset({"metric_id"}),
+ "relationship": frozenset(
+ {
+ "relationship_name",
+ "from_schema",
+ "from_table",
+ "from_columns",
+ "to_schema",
+ "to_table",
+ "to_columns",
+ }
+ ),
+}
+
+
+def run(cfg: dict) -> None:
+ memory_file = Path(cfg["metadata_connection"]["path"])
+ memories, anchors = load_memory_file(memory_file)
+ with open_destination(cfg) as dest:
+ upsert_provider_root(dest, "memory")
+ _create_tables(dest)
+ if memories:
+ dest.insert_rows(MEMORY, memories)
+ if anchors:
+ dest.insert_rows(MEMORY_ANCHOR, anchors)
+ print(f" memory: {len(memories)} memories, {len(anchors)} anchors")
+
+
+def load_memory_file(path: Path) -> tuple[list[tuple[Any, ...]], list[tuple[Any, ...]]]:
+ try:
+ data = yaml.safe_load(path.read_text()) or {}
+ except yaml.YAMLError as e:
+ raise ConfigError(f"memory file is not valid YAML: {e}") from e
+ if not isinstance(data, dict):
+ raise ConfigError("memory file must be a YAML object")
+ _reject_unknown_fields(data, TOP_LEVEL_FIELDS, "memory file")
+ raw_memories = data.get("memories")
+ if not isinstance(raw_memories, list):
+ raise ConfigError("memories must be a list")
+
+ memory_rows: list[tuple[Any, ...]] = []
+ anchor_rows: list[tuple[Any, ...]] = []
+ seen_memories: set[str] = set()
+ seen_anchors: set[tuple[str, str]] = set()
+ for index, raw_memory in enumerate(raw_memories):
+ memory_loc = f"memories[{index}]"
+ if not isinstance(raw_memory, dict):
+ raise ConfigError(f"{memory_loc} must be an object")
+ _reject_unknown_fields(raw_memory, MEMORY_FIELD_SET, memory_loc)
+ _validate_strings(raw_memory, STRING_MEMORY_FIELDS, memory_loc)
+ _validate_confidence(raw_memory, memory_loc)
+ memory_id = _required_str(raw_memory, "memory_id", memory_loc)
+ _required_str(raw_memory, "memory_kind", memory_loc)
+ _required_str(raw_memory, "content", memory_loc)
+ if memory_id in seen_memories:
+ raise ConfigError(f"duplicate memory: {memory_id}")
+ seen_memories.add(memory_id)
+ memory_rows.append(tuple(raw_memory.get(field) for field in MEMORY_FIELDS))
+
+ raw_anchors = raw_memory.get("anchors", [])
+ if not isinstance(raw_anchors, list):
+ raise ConfigError(f"{memory_loc}.anchors must be a list")
+ for anchor_index, raw_anchor in enumerate(raw_anchors):
+ anchor_loc = f"{memory_loc}.anchors[{anchor_index}]"
+ if not isinstance(raw_anchor, dict):
+ raise ConfigError(f"{anchor_loc} must be an object")
+ _reject_unknown_fields(raw_anchor, ANCHOR_FIELD_SET, anchor_loc)
+ _validate_strings(raw_anchor, STRING_ANCHOR_FIELDS, anchor_loc)
+ for field in ANCHOR_ARRAY_FIELDS:
+ _validate_string_list(raw_anchor, field, anchor_loc)
+ anchor_id = _required_str(raw_anchor, "anchor_id", anchor_loc)
+ anchor_type = _required_str(raw_anchor, "anchor_type", anchor_loc)
+ if anchor_type not in ANCHOR_TYPES:
+ raise ConfigError(f"{anchor_loc}.anchor_type is not supported: {anchor_type}")
+ _validate_anchor_locator(raw_anchor, anchor_loc, anchor_type)
+ anchor_key = (memory_id, anchor_id)
+ if anchor_key in seen_anchors:
+ raise ConfigError(f"duplicate memory anchor: {memory_id}.{anchor_id}")
+ seen_anchors.add(anchor_key)
+ anchor = dict(raw_anchor)
+ anchor["memory_id"] = memory_id
+ anchor_rows.append(tuple(anchor.get(field) for field in ANCHOR_FIELDS))
+
+ return memory_rows, anchor_rows
+
+
+def _create_tables(dest: Destination) -> None:
+ dest.replace_table(MEMORY)
+ dest.replace_table(MEMORY_ANCHOR)
+
+
+def _required_str(data: dict[str, Any], field: str, path: str) -> str:
+ value = data.get(field)
+ if not isinstance(value, str) or not value:
+ raise ConfigError(f"{path}.{field} is required")
+ return value
+
+
+def _reject_unknown_fields(data: dict[str, Any], allowed: frozenset[str], path: str) -> None:
+ unknown = sorted(set(data) - allowed)
+ if unknown:
+ raise ConfigError(f"{path} has unknown field: {unknown[0]}")
+
+
+def _validate_strings(data: dict[str, Any], fields: frozenset[str], path: str) -> None:
+    for field in sorted(fields):
+ value = data.get(field)
+ if value is not None and not isinstance(value, str):
+ raise ConfigError(f"{path}.{field} must be a string")
+
+
+def _validate_confidence(data: dict[str, Any], path: str) -> None:
+ value = data.get("confidence")
+ if value is None:
+ return
+ # bool is a subclass of int; reject it explicitly so True/False is not stored as 1.0/0.0.
+ if isinstance(value, bool) or not isinstance(value, (int, float)):
+ raise ConfigError(f"{path}.confidence must be a number")
+ if not 0.0 <= float(value) <= 1.0:
+ raise ConfigError(f"{path}.confidence must be between 0 and 1")
+
+
+def _validate_string_list(data: dict[str, Any], field: str, path: str) -> None:
+ value = data.get(field)
+ if value is None:
+ return
+ if not isinstance(value, list) or not all(isinstance(item, str) and item for item in value):
+        raise ConfigError(f"{path}.{field} must be a list of non-empty strings")
+
+
+def _validate_anchor_locator(anchor: dict[str, Any], path: str, anchor_type: str) -> None:
+    disallowed = sorted(
+        field
+        for field in ALL_LOCATORS - ALLOWED_LOCATORS[anchor_type]
+        if anchor.get(field) is not None
+    )
+ if disallowed:
+ raise ConfigError(f"{path}: {anchor_type} anchors do not use {disallowed[0]}")
+
+ if anchor_type == "column":
+ if not anchor.get("table_name") or not anchor.get("column_name"):
+ raise ConfigError(f"{path}: column anchors require table_name and column_name")
+ elif anchor_type == "table":
+ if not anchor.get("table_name"):
+ raise ConfigError(f"{path}: table anchors require table_name")
+ elif anchor_type == "metric":
+ if not anchor.get("metric_id"):
+ raise ConfigError(f"{path}: metric anchors require metric_id")
+ elif anchor_type == "relationship":
+ _validate_relationship_locator(anchor, path)
+
+
+def _validate_relationship_locator(anchor: dict[str, Any], path: str) -> None:
+ if not anchor.get("from_table") or not anchor.get("to_table"):
+ raise ConfigError(f"{path}: relationship anchors require from_table and to_table")
+ from_columns = anchor.get("from_columns")
+ to_columns = anchor.get("to_columns")
+ if bool(from_columns) != bool(to_columns):
+ raise ConfigError(f"{path}: relationship anchors need from_columns and to_columns together")
+ if from_columns and len(from_columns) != len(to_columns):
+ raise ConfigError(f"{path}: from_columns and to_columns must be the same length")
diff --git a/src/agents_schema/root.py b/src/agents_schema/root.py
index c10b125..2cc46ff 100644
--- a/src/agents_schema/root.py
+++ b/src/agents_schema/root.py
@@ -36,6 +36,17 @@
("metric", "One row per OSI metric. See AGENTS.OSI_METRIC."),
("relationship", "One row per OSI relationship. See AGENTS.OSI_RELATIONSHIP."),
),
+ "memory": (
+ (
+ "overview",
+ "# Memory\nDurable semantic memories for agents: query rules, join caveats, units, "
+ "status meanings, and project-specific data guidance. A lightweight path to anchored, "
+ "agent-retrievable notes when there is no semantic layer; teams running OSI usually use "
+ "object-local ai_context instead.",
+ ),
+ ("memory", "One row per durable semantic memory. See AGENTS.MEMORY."),
+ ("memory_anchor", "One row per retrieval anchor for a memory. See AGENTS.MEMORY_ANCHOR."),
+ ),
}
diff --git a/tests/test_cli.py b/tests/test_cli.py
new file mode 100644
index 0000000..7307482
--- /dev/null
+++ b/tests/test_cli.py
@@ -0,0 +1,30 @@
+import contextlib
+import io
+import tempfile
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+
+from agents_schema import cli
+
+
+class CliTests(unittest.TestCase):
+ def test_memory_validation_errors_return_clean_cli_error(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text("memories: {}\n")
+ stderr = io.StringIO()
+
+ with (
+ patch("agents_schema.cli.warehouse_type_from_env", return_value="snowflake"),
+ contextlib.redirect_stderr(stderr),
+ ):
+ status = cli.main(["memory", "--memory-file", str(path)])
+
+ self.assertEqual(status, 1)
+ self.assertIn("agents-schema: error: memories must be a list", stderr.getvalue())
+ self.assertNotIn("Traceback", stderr.getvalue())
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/test_connector_root.py b/tests/test_connector_root.py
index 043917c..3be0729 100644
--- a/tests/test_connector_root.py
+++ b/tests/test_connector_root.py
@@ -1,7 +1,7 @@
import unittest
from unittest.mock import patch
-from agents_schema import dbt, lookml, osi
+from agents_schema import dbt, lookml, memory, osi
class FakeDestination:
@@ -75,6 +75,21 @@ def test_osi_run_upserts_root_before_source_tables(self):
self.assertEqual({row[0] for row in dest.calls[0][2]}, {"osi"})
self.assertEqual([call[0] for call in dest.calls[1:5]], ["replace", "replace", "replace", "replace"])
+ def test_memory_run_upserts_root_before_source_tables(self):
+ dest = FakeDestination()
+ cfg = {"metadata_connection": {"path": "."}}
+
+ with (
+ patch("agents_schema.memory.open_destination", return_value=DestinationContext(dest)),
+ patch("agents_schema.memory.load_memory_file", return_value=([], [])),
+ patch("builtins.print"),
+ ):
+ memory.run(cfg)
+
+ self.assertEqual(dest.calls[0][0], "upsert")
+ self.assertEqual({row[0] for row in dest.calls[0][2]}, {"memory"})
+ self.assertEqual([call[0] for call in dest.calls[1:3]], ["replace", "replace"])
+
if __name__ == "__main__":
unittest.main()
diff --git a/tests/test_destinations.py b/tests/test_destinations.py
index 5aa0a72..37779a9 100644
--- a/tests/test_destinations.py
+++ b/tests/test_destinations.py
@@ -1,6 +1,6 @@
import unittest
-from agents_schema.destinations import _create_table_if_not_exists_sql, _merge_sql
+from agents_schema.destinations import Column, TableSchema, _create_table_if_not_exists_sql, _merge_sql
from agents_schema.root import ROOT
@@ -35,6 +35,13 @@ def test_root_merge_upserts_on_provider_and_key(self):
sql,
)
+ def test_create_table_supports_timestamp_columns(self):
+ table = TableSchema("agents.example", (Column("created_at", "timestamp"),))
+
+ sql = _create_table_if_not_exists_sql(table, "agents")
+
+ self.assertIn("created_at TIMESTAMP", sql)
+
if __name__ == "__main__":
unittest.main()
diff --git a/tests/test_memory.py b/tests/test_memory.py
new file mode 100644
index 0000000..a8bb523
--- /dev/null
+++ b/tests/test_memory.py
@@ -0,0 +1,308 @@
+import tempfile
+import textwrap
+import unittest
+from pathlib import Path
+from unittest.mock import patch
+
+from agents_schema import memory
+from agents_schema.config import ConfigError
+
+
+class FakeDestination:
+ def __init__(self):
+ self.calls = []
+
+ def upsert_rows(self, table, rows):
+ self.calls.append(("upsert", table.name, list(rows)))
+
+ def replace_table(self, table):
+ self.calls.append(("replace", table.name))
+
+ def insert_rows(self, table, rows):
+ self.calls.append(("insert", table.name, list(rows)))
+
+
+class DestinationContext:
+ def __init__(self, dest):
+ self.dest = dest
+
+ def __enter__(self):
+ return self.dest
+
+ def __exit__(self, exc_type, exc, tb):
+ return None
+
+
+class MemoryTests(unittest.TestCase):
+ def test_load_memory_file_requires_memories_list(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text("memories: {}\n")
+
+ with self.assertRaisesRegex(ConfigError, "memories must be a list"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_rejects_unknown_memory_fields(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: typo
+ memory_kind: unit_rule
+ content: typo should fail
+ summmary: typo should fail
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "unknown field"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_rejects_duplicate_memory_ids(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: duplicate
+ memory_kind: unit_rule
+ content: duplicate
+ - memory_id: duplicate
+ memory_kind: unit_rule
+ content: duplicate
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "duplicate memory"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_rejects_column_anchor_without_locator(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: missing_locator
+ memory_kind: unit_rule
+ content: Missing locator.
+ anchors:
+ - anchor_id: amount_due
+ anchor_type: column
+ table_name: invoice
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "column anchors require table_name and column_name"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_rejects_locator_not_allowed_for_type(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: cross_wired
+ memory_kind: metric_rule
+ content: A metric anchor should not carry a table_name.
+ anchors:
+ - anchor_id: revenue
+ anchor_type: metric
+ metric_id: revenue
+ table_name: invoice
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "metric anchors do not use table_name"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_rejects_confidence_out_of_range(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: too_confident
+ memory_kind: unit_rule
+ content: Confidence must be 0..1.
+ confidence: 2
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "confidence must be between 0 and 1"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_builds_relationship_anchor_rows(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: ticket_assignee_join
+ memory_kind: join_rule
+ content: Join ticket.assignee_id to user.id.
+ anchors:
+ - anchor_id: ticket_to_user
+ anchor_type: relationship
+ relationship_name: ticket_to_user
+ from_schema: zendesk
+ from_table: ticket
+ from_columns: [assignee_id]
+ to_schema: zendesk
+ to_table: user
+ to_columns: [id]
+ """
+ )
+ )
+
+ _, anchors = memory.load_memory_file(path)
+
+ self.assertEqual(
+ anchors,
+ [
+ (
+ "ticket_assignee_join",
+ "ticket_to_user",
+ "relationship",
+ None, # schema_name
+ None, # table_name
+ None, # column_name
+ None, # metric_id
+ "ticket_to_user",
+ "zendesk",
+ "ticket",
+ ["assignee_id"],
+ "zendesk",
+ "user",
+ ["id"],
+ )
+ ],
+ )
+
+ def test_load_memory_file_rejects_wrong_content_type(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: bad_content
+ memory_kind: unit_rule
+ content: [not, scalar]
+ """
+ )
+ )
+
+ with self.assertRaisesRegex(ConfigError, "content must be a string"):
+ memory.load_memory_file(path)
+
+ def test_load_memory_file_builds_memory_and_anchor_rows(self):
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: stripe_amounts_are_cents
+ memory_kind: unit_rule
+ title: Stripe amounts
+ content: Divide Stripe amount columns by 100 for dollars.
+ source: memories.yaml
+ confidence: 0.9
+ anchors:
+ - anchor_id: invoice_amount_due
+ anchor_type: column
+ schema_name: stripe
+ table_name: invoice
+ column_name: amount_due
+ """
+ )
+ )
+
+ memories, anchors = memory.load_memory_file(path)
+
+ self.assertEqual(
+ memories,
+ [
+ (
+ "stripe_amounts_are_cents",
+ "unit_rule",
+ "Stripe amounts",
+ "Divide Stripe amount columns by 100 for dollars.",
+ "memories.yaml",
+ 0.9,
+ )
+ ],
+ )
+ self.assertEqual(
+ anchors,
+ [
+ (
+ "stripe_amounts_are_cents",
+ "invoice_amount_due",
+ "column",
+ "stripe", # schema_name
+ "invoice", # table_name
+ "amount_due", # column_name
+ None, # metric_id
+ None, # relationship_name
+ None, # from_schema
+ None, # from_table
+ None, # from_columns
+ None, # to_schema
+ None, # to_table
+ None, # to_columns
+ )
+ ],
+ )
+
+ def test_run_upserts_root_and_writes_memory_tables(self):
+ dest = FakeDestination()
+ with tempfile.TemporaryDirectory() as tmp:
+ path = Path(tmp) / "memory.yml"
+ path.write_text(
+ textwrap.dedent(
+ """
+ memories:
+ - memory_id: ticket_assignee_join
+ memory_kind: join_rule
+ content: Join ticket.assignee_id to user.id.
+ anchors:
+ - anchor_id: ticket_to_user
+ anchor_type: relationship
+ from_schema: zendesk
+ from_table: ticket
+ from_columns: [assignee_id]
+ to_schema: zendesk
+ to_table: user
+ to_columns: [id]
+ """
+ )
+ )
+ cfg = {"metadata_connection": {"path": str(path)}}
+
+ with (
+ patch("agents_schema.memory.open_destination", return_value=DestinationContext(dest)),
+ patch("builtins.print"),
+ ):
+ memory.run(cfg)
+
+ self.assertEqual(dest.calls[0][0], "upsert")
+ self.assertEqual({row[0] for row in dest.calls[0][2]}, {"memory"})
+ self.assertEqual([call[0] for call in dest.calls[1:3]], ["replace", "replace"])
+ self.assertEqual(dest.calls[3][1], "agents.memory")
+ self.assertEqual(dest.calls[4][1], "agents.memory_anchor")
+
+
+if __name__ == "__main__":
+ unittest.main()
diff --git a/tests/test_root.py b/tests/test_root.py
index b5078e1..92ebdaa 100644
--- a/tests/test_root.py
+++ b/tests/test_root.py
@@ -32,6 +32,14 @@ def test_upsert_provider_root_has_osi_entries(self):
_, rows = dest.upserts[0]
self.assertEqual({row[1] for row in rows}, {"overview", "dataset", "field", "metric", "relationship"})
+ def test_upsert_provider_root_has_memory_entries(self):
+ dest = FakeDestination()
+
+ upsert_provider_root(dest, "memory")
+
+ _, rows = dest.upserts[0]
+ self.assertEqual({row[1] for row in rows}, {"overview", "memory", "memory_anchor"})
+
if __name__ == "__main__":
unittest.main()