21 changes: 21 additions & 0 deletions .github/actions/agents-schema-memory/action.yml
@@ -0,0 +1,21 @@
name: 'Agents Schema Memory'
description: >
  Ingests durable agent memories from a YAML file into AGENTS.MEMORY and AGENTS.MEMORY_ANCHOR.

inputs:
  memory-file:
    description: 'Path to a YAML file containing durable agent memories'
    required: true
runs:
  using: composite
  steps:
    - name: Set up uv
      uses: astral-sh/setup-uv@v5
      with:
        python-version: '3.12'

    - name: Run memory ingestion
      shell: bash
      run: |
        uvx --from "agents-schema==0.0.6" \
          agents-schema memory --memory-file "${{ inputs.memory-file }}"
27 changes: 27 additions & 0 deletions .github/workflows/agents-schema-memory.yml
@@ -0,0 +1,27 @@
name: Agents Schema Memory

on:
  workflow_call:
    inputs:
      memory-file:
        description: 'Path to a YAML file containing durable agent memories'
        type: string
        required: true
    secrets:
      WAREHOUSE_CREDENTIALS:
        required: true

jobs:
  ingest:
    runs-on: ubuntu-latest

    steps:
      - name: Check out repository
        uses: actions/checkout@v4

      - name: Run memory ingestion
        uses: fivetran/agents_schema/.github/actions/agents-schema-memory@v0.0.6
        with:
          memory-file: ${{ inputs.memory-file }}
        env:
          WAREHOUSE_CREDENTIALS: ${{ secrets.WAREHOUSE_CREDENTIALS }}
10 changes: 9 additions & 1 deletion README.md
@@ -32,6 +32,7 @@ available before writing or explaining queries.
- [Sync dbt](#sync-dbt)
- [Sync Looker](#sync-looker)
- [Sync OSI](#sync-osi)
- [Sync Memory](#sync-memory)
- [Sync Multiple Sources](#sync-multiple-sources)
- [Query with an agent](#query-with-an-agent)
- [Why Agents Schema](#why-agents-schema)
@@ -43,7 +44,7 @@ available before writing or explaining queries.

## Getting Started

There are three supported metadata sources. Pick one to get started quickly.
There are several supported source types. Pick one to get started quickly.

### Prerequisites

@@ -68,6 +69,12 @@ files.
Use [OSI Setup Guide](osi-setup.md) when your repository contains Open Semantic
Interchange `*.osi.yaml` files.

### Sync Memory

Use [Memory Setup Guide](memory-setup.md) to publish durable, anchored agent
notes from a YAML file when you do not run a semantic layer. (If you already use
OSI, prefer object-local `ai_context` instead.)

### Sync Multiple Sources

Use the reusable workflows together when one repository contains multiple
@@ -153,6 +160,7 @@ The GitHub Actions call the CLI with explicit source arguments:
agents-schema dbt --project-dir dbt_project
agents-schema looker --lookml-dir lookml
agents-schema osi --osi-dir osi
agents-schema memory --memory-file memory.yml
```

The CLI reads warehouse credentials from `WAREHOUSE_CREDENTIALS`.
152 changes: 152 additions & 0 deletions SPEC.md
@@ -20,6 +20,7 @@ The implementation writes unquoted identifiers, so Snowflake stores table and co
| `text` | `TEXT` | Longer free-form text. |
| `boolean` | `BOOLEAN` | Boolean values. |
| `array` | `VARIANT` | Inserted as JSON via `PARSE_JSON`. |
| `timestamp` | `TIMESTAMP` | Timestamp values accepted by the warehouse. |

---

@@ -79,11 +80,162 @@ The current package delivers one table family per metadata source:
| dbt | `AGENTS.DBT_MODEL`, `AGENTS.DBT_COLUMN`, `AGENTS.DBT_DEPENDENCY` |
| LookML | `AGENTS.LOOKML_VIEW`, `AGENTS.LOOKML_DIMENSION`, `AGENTS.LOOKML_MEASURE`, `AGENTS.LOOKML_EXPLORE` |
| OSI | `AGENTS.OSI_DATASET`, `AGENTS.OSI_FIELD`, `AGENTS.OSI_METRIC`, `AGENTS.OSI_RELATIONSHIP` |
| Memory | `AGENTS.MEMORY`, `AGENTS.MEMORY_ANCHOR` |

Each ingestion replaces its own table family with `CREATE OR REPLACE TABLE` and then inserts the rows parsed from the source metadata.

---

## Source: Memory

The memory ingestion reads a YAML file of durable semantic memories and writes anchored records that agents can retrieve near relevant schema context. Memories are intended for query rules, join caveats, unit conversions, status meanings, grain warnings, and other project-specific guidance that should survive beyond a single agent session.

### When to use memory

Memory is the lightweight path to anchored, agent-retrievable notes for deployments **without a semantic layer**. It attaches notes directly to physical warehouse objects (schema/table/column) plus logical metrics and relationships, so an agent can pull "the notes relevant to this column" with a simple join — without standing up and maintaining a full semantic model.

If you already run an OSI semantic model, you usually do **not** need memory: OSI carries object-local `ai_context` on every dataset, field, and metric, which is the natural home for the same notes (for example, "this field is in cents" belongs on the OSI field). For OSI-native teams, memory largely duplicates that mechanism. Reach for memory when there is no semantic layer, or for notes about raw warehouse objects that OSI does not model. Anchors target physical objects on purpose: a column anchor is reachable from raw schema metadata and, because OSI fields map down to columns, from an OSI-aware consumer as well. An anchor placed on an OSI field instead would not be reachable from raw schema alone.

Memory can also summarize other providers. A process can scan richer provider tables such as LookML, dbt, OSI, catalog metadata, query history, or reviewed agent discoveries, then publish the compact facts worth carrying forward as memories with provenance back to the source material.

### YAML shape

The CLI accepts this canonical list form:

```yaml
memories:
  - memory_id: stripe_amounts_are_cents
    memory_kind: unit_rule
    title: Stripe amounts
    content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
    source: memories.yaml
    confidence: 0.9
    anchors:
      - anchor_id: invoice_amount_due
        anchor_type: column
        schema_name: stripe
        table_name: invoice
        column_name: amount_due

  - memory_id: ticket_assignee_join
    memory_kind: join_rule
    content: For ticket owner reporting, join ticket.assignee_id to user.id.
    anchors:
      - anchor_id: ticket_to_user
        anchor_type: relationship
        relationship_name: ticket_to_user
        from_schema: zendesk
        from_table: ticket
        from_columns: [assignee_id]
        to_schema: zendesk
        to_table: user
        to_columns: [id]
```

Tools may keep their own project memory file and import it into this canonical
shape. For example, a Dataface project could keep a compact `memories.yaml`
that is optimized for authors:

```yaml
memories:
  stripe_amounts_are_cents:
    kind: unit_rule
    content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
    applies_to:
      columns:
        - stripe.invoice.amount_due
        - stripe.charge.amount
      metrics:
        - revenue
```

That file is not a second Agents Schema standard. It is an authoring shape a
tool can map into the canonical rows: the memory key becomes `memory_id`, `kind`
becomes `memory_kind`, `content` maps directly, and each `applies_to` entry
becomes a row in `AGENTS.MEMORY_ANCHOR`.
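As a minimal sketch of that mapping, the Python below converts an already-parsed authoring file (a plain dict, so no YAML library is needed) into the canonical list form. It is not part of Agents Schema: `to_canonical`, the `tables` key, the `<anchor_type>:<target>` anchor-id scheme, and the assumption that column targets are dotted `schema.table.column` strings are all hypothetical choices a tool would make for itself.

```python
from typing import Any

# Hypothetical mapping from authoring `applies_to` keys to canonical anchor types.
# Only `columns` and `metrics` appear in the example above; `tables` is an assumption.
APPLIES_TO_TYPES = {"columns": "column", "tables": "table", "metrics": "metric"}


def to_canonical(authoring: dict[str, Any]) -> dict[str, Any]:
    """Convert a keyed authoring file into the canonical `memories:` list form."""
    canonical = []
    for memory_id, body in authoring["memories"].items():
        anchors = []
        for key, targets in body.get("applies_to", {}).items():
            anchor_type = APPLIES_TO_TYPES[key]
            for target in targets:
                # Assumed anchor_id scheme: "<anchor_type>:<target>", unique per memory.
                anchor: dict[str, Any] = {
                    "anchor_id": f"{anchor_type}:{target}",
                    "anchor_type": anchor_type,
                }
                if anchor_type == "metric":
                    anchor["metric_id"] = target
                elif anchor_type == "column":
                    # Assumes fully qualified "schema.table.column" strings.
                    schema, table, column = target.split(".")
                    anchor.update(schema_name=schema, table_name=table, column_name=column)
                else:
                    # Assumes "schema.table" strings for table anchors.
                    schema, table = target.split(".")
                    anchor.update(schema_name=schema, table_name=table)
                anchors.append(anchor)
        canonical.append({
            "memory_id": memory_id,       # memory key becomes memory_id
            "memory_kind": body["kind"],  # kind becomes memory_kind
            "content": body["content"],   # content maps directly
            "anchors": anchors,           # one anchor row per applies_to entry
        })
    return {"memories": canonical}


authoring = {
    "memories": {
        "stripe_amounts_are_cents": {
            "kind": "unit_rule",
            "content": "Stripe amount columns are stored in cents; divide by 100 for dollar measures.",
            "applies_to": {
                "columns": ["stripe.invoice.amount_due", "stripe.charge.amount"],
                "metrics": ["revenue"],
            },
        }
    }
}
result = to_canonical(authoring)
```

The example memory yields three anchors: two `column` anchors and one `metric` anchor.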

### Schema graph

```mermaid
graph TD
ROOT["AGENTS.ROOT<br/>provider = memory"] --> MEMORY["AGENTS.MEMORY<br/>memory_id"]
MEMORY --> ANCHOR["AGENTS.MEMORY_ANCHOR<br/>memory_id, anchor_id"]
ANCHOR --> LOCATORS["Anchor locator columns<br/>schema_name/table_name/column_name · metric_id<br/>relationship_name + from_*/to_* join columns"]
```
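To illustrate the "simple join" retrieval described earlier, here is a sketch that uses Python's built-in `sqlite3` as a stand-in for the warehouse, with trimmed-down versions of the two tables specified below. The function name `notes_for_column`, the second (table-level) example memory, and the choices to treat a missing `schema_name` as a wildcard and to include table anchors are assumptions for illustration, not part of the spec.

```python
import sqlite3

# SQLite stands in for the warehouse; the tables keep only the columns used here.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE memory (
        memory_id   TEXT PRIMARY KEY,
        memory_kind TEXT NOT NULL,
        content     TEXT NOT NULL
    );
    CREATE TABLE memory_anchor (
        memory_id   TEXT NOT NULL,
        anchor_id   TEXT NOT NULL,
        anchor_type TEXT NOT NULL,
        schema_name TEXT,
        table_name  TEXT,
        column_name TEXT,
        PRIMARY KEY (memory_id, anchor_id)
    );
    """
)
conn.executemany(
    "INSERT INTO memory VALUES (?, ?, ?)",
    [
        ("stripe_amounts_are_cents", "unit_rule",
         "Stripe amount columns are stored in cents; divide by 100 for dollar measures."),
        # Hypothetical table-level memory, added to show table anchors matching too.
        ("invoice_grain_warning", "grain_warning",
         "One row per invoice; do not join line items before summing."),
    ],
)
conn.executemany(
    "INSERT INTO memory_anchor VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("stripe_amounts_are_cents", "invoice_amount_due", "column", "stripe", "invoice", "amount_due"),
        ("invoice_grain_warning", "invoice_table", "table", "stripe", "invoice", None),
    ],
)


def notes_for_column(schema: str, table: str, column: str) -> list[tuple[str, str]]:
    """Return (memory_id, content) for memories anchored to this column or its table."""
    return conn.execute(
        """
        SELECT DISTINCT m.memory_id, m.content
        FROM memory AS m
        JOIN memory_anchor AS a ON a.memory_id = m.memory_id
        WHERE a.table_name = ?
          AND (a.schema_name IS NULL OR a.schema_name = ?)  -- schema is optional
          AND ((a.anchor_type = 'column' AND a.column_name = ?)
               OR a.anchor_type = 'table')
        ORDER BY m.memory_id
        """,
        (table, schema, column),
    ).fetchall()


for memory_id, content in notes_for_column("stripe", "invoice", "amount_due"):
    print(f"{memory_id}: {content}")
```

Asking about `stripe.invoice.amount_due` returns both notes, while asking about another column on the same table returns only the table-level note.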

### `AGENTS.MEMORY`

One row per durable semantic memory.

```sql
CREATE OR REPLACE TABLE AGENTS.MEMORY (
    memory_id VARCHAR NOT NULL,
    memory_kind VARCHAR NOT NULL,
    title VARCHAR,
    content TEXT NOT NULL,
    source VARCHAR,
    confidence FLOAT,
    PRIMARY KEY (memory_id)
);
```

| Column | Source field |
|---|---|
| `memory_id` | Stable identifier unique within `AGENTS.MEMORY`. Mirrors the OSI parent/child convention: this is the entity's own key, and `AGENTS.MEMORY_ANCHOR` references it under the same prefixed name. |
| `memory_kind` | Tool-defined kind, such as `unit_rule`, `join_rule`, or `grain_warning`. |
| `title` | Optional short human-readable title for prompt rendering and review. |
| `content` | Durable guidance, caveat, or context the agent should remember. |
| `source` | Optional source reference, such as a file path, URL, provider object id, or import job label. |
| `confidence` | Optional confidence in `[0, 1]`, so consumers can threshold or rank. |
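A sketch of how a consumer might threshold and rank on `confidence`. The 0.5 cutoff, keeping unscored memories, and sorting them after scored ones are hypothetical policy choices; the spec only defines the `[0, 1]` range.

```python
from typing import Optional, TypedDict


class Memory(TypedDict):
    memory_id: str
    content: str
    confidence: Optional[float]


def select_memories(
    memories: list[Memory], min_confidence: float = 0.5, keep_unscored: bool = True
) -> list[Memory]:
    """Drop memories below min_confidence, then order the rest, highest first."""
    kept = [
        m
        for m in memories
        if (m["confidence"] is None and keep_unscored)
        or (m["confidence"] is not None and m["confidence"] >= min_confidence)
    ]
    # Scored memories come first (highest confidence first); unscored ones follow.
    # memory_id breaks ties so the order is deterministic.
    return sorted(
        kept,
        key=lambda m: (m["confidence"] is None, -(m["confidence"] or 0.0), m["memory_id"]),
    )


memories: list[Memory] = [
    {"memory_id": "ticket_assignee_join", "content": "...", "confidence": None},
    {"memory_id": "stripe_amounts_are_cents", "content": "...", "confidence": 0.9},
    {"memory_id": "stale_guess", "content": "...", "confidence": 0.2},  # hypothetical
]
print([m["memory_id"] for m in select_memories(memories)])
# ['stripe_amounts_are_cents', 'ticket_assignee_join']
```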

### `AGENTS.MEMORY_ANCHOR`

One row per retrieval anchor for a memory.

```sql
CREATE OR REPLACE TABLE AGENTS.MEMORY_ANCHOR (
    memory_id VARCHAR NOT NULL,
    anchor_id VARCHAR NOT NULL,
    anchor_type VARCHAR NOT NULL,
    schema_name VARCHAR,
    table_name VARCHAR,
    column_name VARCHAR,
    metric_id VARCHAR,
    relationship_name VARCHAR,
    from_schema VARCHAR,
    from_table VARCHAR,
    from_columns VARIANT,
    to_schema VARCHAR,
    to_table VARCHAR,
    to_columns VARIANT,
    PRIMARY KEY (
        memory_id,
        anchor_id
    )
);
```

Each anchor type uses a specific subset of the locator columns; ingestion rejects locators that do not belong to the anchor's type.

| Column | Used by | Source field |
|---|---|---|
| `memory_id` | all | Copied from the parent memory. |
| `anchor_id` | all | Stable identifier for the anchor, unique within its memory. |
| `anchor_type` | all | Retrieval scope: `column`, `table`, `relationship`, or `metric`. |
| `schema_name` | column, table | Schema name when known. |
| `table_name` | column, table | Table-like object name (required for `column` and `table`). |
| `column_name` | column | Column-like object name (required for `column`). |
| `metric_id` | metric | Metric identifier (required for `metric`). |
| `relationship_name` | relationship | Optional free-text relationship label; not a foreign key. |
| `from_schema` / `from_table` | relationship | Left side of the join (`from_table` required). |
| `from_columns` | relationship | Left-side join columns, paired positionally with `to_columns`. |
| `to_schema` / `to_table` | relationship | Right side of the join (`to_table` required). |
| `to_columns` | relationship | Right-side join columns, paired positionally with `from_columns`. |
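A minimal Python sketch of the locator rule, built from the table above. It is not the CLI's actual validator; `validate_anchor` and the `REQUIRED`/`OPTIONAL` maps are hypothetical names, and the equal-length check reflects the positional pairing of `from_columns` and `to_columns`.

```python
from typing import Any

# Locator columns each anchor_type must have, and may additionally have.
REQUIRED = {
    "column": {"table_name", "column_name"},
    "table": {"table_name"},
    "metric": {"metric_id"},
    "relationship": {"from_table", "to_table"},
}
OPTIONAL = {
    "column": {"schema_name"},
    "table": {"schema_name"},
    "metric": set(),
    "relationship": {"relationship_name", "from_schema", "from_columns",
                     "to_schema", "to_columns"},
}
BASE_FIELDS = {"anchor_id", "anchor_type"}


def validate_anchor(anchor: dict[str, Any]) -> None:
    """Raise ValueError if an anchor's locators do not fit its anchor_type."""
    anchor_type = anchor.get("anchor_type")
    if anchor_type not in REQUIRED:
        raise ValueError(f"unsupported anchor_type: {anchor_type!r}")
    present = {k for k, v in anchor.items() if v is not None} - BASE_FIELDS
    stray = present - REQUIRED[anchor_type] - OPTIONAL[anchor_type]
    if stray:
        raise ValueError(f"{anchor_type} anchor has foreign locators: {sorted(stray)}")
    missing = REQUIRED[anchor_type] - present
    if missing:
        raise ValueError(f"{anchor_type} anchor is missing: {sorted(missing)}")
    if anchor_type == "relationship":
        # Join columns are paired positionally, so both lists must line up.
        from_cols = anchor.get("from_columns") or []
        to_cols = anchor.get("to_columns") or []
        if len(from_cols) != len(to_cols):
            raise ValueError("from_columns and to_columns must have equal length")
```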

Relationship anchors carry the join inline rather than referencing a canonical relationship, because memory does not depend on OSI being present. When OSI is available, `relationship_name` can record the matching `OSI_RELATIONSHIP.name` as a best-effort, unenforced pointer.

---

## Source: dbt

The dbt ingestion reads a compiled dbt `manifest.json` and writes normalized model, column, and dependency tables. It captures the transformation layer that is useful from the warehouse: what models exist, how they are documented, and which upstream nodes they depend on.
123 changes: 123 additions & 0 deletions memory-setup.md
@@ -0,0 +1,123 @@
# Memory Setup

## When to use memory

Memory is the lightweight path to anchored, agent-retrievable notes — query
rules, join caveats, unit conversions, status meanings, and grain warnings —
for deployments that **do not run a semantic layer**.

If you already maintain an OSI semantic model, prefer object-local `ai_context`
on the relevant dataset, field, or metric: it carries the same notes directly on
the object, which makes memory largely redundant. Reach for memory when you have
no semantic layer, or for notes about raw warehouse objects that OSI does not
model. See [SPEC.md](./SPEC.md) for the full table contract.

## Prerequisites

The workflow needs destination warehouse credentials so it can create and
replace tables in the `AGENTS` schema.

Create one required GitHub Actions secret in the repository that calls these
workflows: `WAREHOUSE_CREDENTIALS`.

Snowflake is the only supported destination today, with more destination
support coming soon. We recommend key-pair authentication:

**Example key-pair auth secret:**

```yaml
type: snowflake
account: abc123
user: AGENTS_SCHEMA_BOT
warehouse: COMPUTE_WH
database: ANALYTICS
role: TRANSFORMER
private_key_pem: |
  -----BEGIN ENCRYPTED PRIVATE KEY-----
  MIIEvQIBADANBgkqh...
  -----END ENCRYPTED PRIVATE KEY-----
private_key_passphrase: your-passphrase # only if the key is encrypted
```

**Note:**
- `role` is optional.
- An unencrypted key uses `-----BEGIN PRIVATE KEY-----` / `-----END PRIVATE KEY-----` markers and omits `private_key_passphrase`.
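Before storing the secret, you can sanity-check it locally. The sketch below takes the secret already parsed into a dict (for example with PyYAML's `yaml.safe_load`) and applies the two notes above. The helper name `check_keypair_secret` is hypothetical, and treating every field in the example except `role` as required is an assumption, not documented CLI behavior.

```python
from typing import Any

ENCRYPTED_HEADER = "-----BEGIN ENCRYPTED PRIVATE KEY-----"
PLAIN_HEADER = "-----BEGIN PRIVATE KEY-----"
# Assumption: every field shown in the example except `role` is required.
REQUIRED_FIELDS = ("type", "account", "user", "warehouse", "database", "private_key_pem")


def check_keypair_secret(secret: dict[str, Any]) -> list[str]:
    """Return human-readable problems with a parsed key-pair secret (empty if none)."""
    problems = [f"missing {field}" for field in REQUIRED_FIELDS if not secret.get(field)]
    pem = (secret.get("private_key_pem") or "").strip()
    has_passphrase = bool(secret.get("private_key_passphrase"))
    if pem.startswith(ENCRYPTED_HEADER) and not has_passphrase:
        problems.append("encrypted key needs private_key_passphrase")
    elif pem.startswith(PLAIN_HEADER) and has_passphrase:
        problems.append("unencrypted key should omit private_key_passphrase")
    elif pem and not pem.startswith((ENCRYPTED_HEADER, PLAIN_HEADER)):
        problems.append("private_key_pem has an unexpected header")
    return problems
```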

## Author a memory file

The CLI reads a single YAML file. Each memory has an id, a kind, durable
`content`, and one or more anchors that attach it to the objects where it is
relevant.

```yaml
memories:
  - memory_id: stripe_amounts_are_cents
    memory_kind: unit_rule
    title: Stripe amounts
    content: Stripe amount columns are stored in cents; divide by 100 for dollar measures.
    source: memories.yaml
    confidence: 0.9 # optional, 0..1
    anchors:
      - anchor_id: invoice_amount_due
        anchor_type: column
        schema_name: stripe
        table_name: invoice
        column_name: amount_due

  - memory_id: ticket_assignee_join
    memory_kind: join_rule
    content: For ticket owner reporting, join ticket.assignee_id to user.id.
    anchors:
      - anchor_id: ticket_to_user
        anchor_type: relationship
        relationship_name: ticket_to_user # optional label
        from_schema: zendesk
        from_table: ticket
        from_columns: [assignee_id]
        to_schema: zendesk
        to_table: user
        to_columns: [id]
```

Anchor types and the locator columns each one uses:

| `anchor_type` | Required locators | Optional |
|---|---|---|
| `column` | `table_name`, `column_name` | `schema_name` |
| `table` | `table_name` | `schema_name` |
| `metric` | `metric_id` | |
| `relationship` | `from_table`, `to_table` | `from_schema`, `to_schema`, `from_columns`/`to_columns` (paired, equal length), `relationship_name` |

Validation fails fast on unknown fields, wrong scalar types, a `confidence`
outside `0..1`, duplicate memory ids, duplicate anchors, unsupported anchor
types, locators that do not belong to the anchor type, and missing required
locators.
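The memory-level checks in that list can be sketched in Python as below. This is an illustration, not the CLI's validator: `validate_memories` and its field map are hypothetical, "duplicate anchors" is read as a repeated `anchor_id` within one memory (matching the `(memory_id, anchor_id)` primary key), and the anchor-type locator checks are left out to keep it short.

```python
from typing import Any

# Allowed memory fields and their expected Python types (assumed from the example).
MEMORY_FIELDS = {
    "memory_id": str,
    "memory_kind": str,
    "title": str,
    "content": str,
    "source": str,
    "confidence": (int, float),
    "anchors": list,
}
REQUIRED_FIELDS = {"memory_id", "memory_kind", "content", "anchors"}


def validate_memories(doc: dict[str, Any]) -> None:
    """Raise ValueError on the first memory-level problem found."""
    seen_ids: set[str] = set()
    for memory in doc["memories"]:
        unknown = set(memory) - set(MEMORY_FIELDS)
        if unknown:
            raise ValueError(f"unknown fields: {sorted(unknown)}")
        missing = REQUIRED_FIELDS - set(memory)
        if missing:
            raise ValueError(f"missing fields: {sorted(missing)}")
        for field, value in memory.items():
            if value is None and field not in REQUIRED_FIELDS:
                continue  # an explicit null on an optional field is tolerated here
            # bool is a subclass of int, so reject it explicitly (e.g. confidence: true).
            if isinstance(value, bool) or not isinstance(value, MEMORY_FIELDS[field]):
                raise ValueError(f"{field} has the wrong type: {type(value).__name__}")
        confidence = memory.get("confidence")
        if confidence is not None and not 0 <= confidence <= 1:
            raise ValueError(f"confidence {confidence} is outside 0..1")
        if memory["memory_id"] in seen_ids:
            raise ValueError(f"duplicate memory_id: {memory['memory_id']}")
        seen_ids.add(memory["memory_id"])
        if not memory["anchors"]:
            raise ValueError(f"{memory['memory_id']} needs at least one anchor")
        anchor_ids = [anchor.get("anchor_id") for anchor in memory["anchors"]]
        if len(anchor_ids) != len(set(anchor_ids)):
            raise ValueError(f"duplicate anchor_id in {memory['memory_id']}")
```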

## Run the Memory Sync Workflow

```yaml
name: Agents Schema Memory

on:
  workflow_dispatch:
  push:
    branches: [main]

jobs:
  agents-schema-memory:
    uses: fivetran/agents_schema/.github/workflows/agents-schema-memory.yml@v0.0.6
    with:
      memory-file: memory.yml
    secrets: inherit
```

`memory-file` is required — set it to the path of your memory YAML file. The
example uses `memory.yml`; change it to match your repository.

The workflow writes:

- `AGENTS.MEMORY`
- `AGENTS.MEMORY_ANCHOR`

These jobs do not need to depend on dbt, Looker, or OSI jobs unless your
repository has its own ordering requirement.