As I continue working on the memory system from different angles, here is what I've learned:
- Document processing improves when it is designed around the downstream use case.
- Original sources, such as a PDF file or an audio segment, already exist somewhere, and we hold a pointer to them. We should not duplicate every source document.
- Temporal data is informative.
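A minimal sketch of the pointer idea, assuming a source is identified by a URI and carries two timestamps. The names `SourcePointer`, `SourceRegistry`, `createdAt`, and `ingestedAt` are hypothetical and not part of the actual system; the point is only that we store a reference plus temporal data, never the document bytes.

```typescript
// Hypothetical shape of a source record: a pointer plus timestamps, no document contents.
interface SourcePointer {
  id: string;
  uri: string;        // where the original PDF, audio segment, etc. lives
  mediaType: string;  // e.g. "application/pdf"
  createdAt: number;  // when the original was authored, ms since epoch (assumed field)
  ingestedAt: number; // when the pointer was first registered (assumed field)
}

class SourceRegistry {
  private byUri = new Map<string, SourcePointer>();

  // Register a pointer. Registering the same URI again returns the existing
  // record instead of creating a duplicate.
  register(uri: string, mediaType: string, createdAt: number, now: number = Date.now()): SourcePointer {
    const existing = this.byUri.get(uri);
    if (existing) return existing;
    const record: SourcePointer = {
      id: `src_${this.byUri.size + 1}`,
      uri,
      mediaType,
      createdAt,
      ingestedAt: now,
    };
    this.byUri.set(uri, record);
    return record;
  }

  // Sources ordered by original creation time, oldest first.
  timeline(): SourcePointer[] {
    return Array.from(this.byUri.values()).sort((a, b) => a.createdAt - b.createdAt);
  }

  get size(): number {
    return this.byUri.size;
  }
}
```

Keeping both timestamps separate lets a downstream task distinguish when something happened from when the system learned about it.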
All Convex tables:
- sources
- journals
- entities
- relations
- knowledge
- embeddings
- metadata
- operations
Architecture
Storage Layer: A set of sources and their parts. When used in downstream tasks, these are cited.
Distilled Layer: A set of structured journals that always cite sources. These should be validated for correctness before being added to the index.
Knowledge Layer: A mapping of associations between entities and their relations. These are grounded in journals or human intervention, not sources.
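The grounding rule across the three layers can be sketched with plain types. This is an illustration under assumptions, not the actual Convex schema: `JournalEntry`, `Grounding`, `Relation`, and `supportingSources` are hypothetical names. It shows a relation that may only point at journals or a human edit, with sources reachable indirectly through the journals.

```typescript
type SourceId = string;
type JournalId = string;

// Distilled layer: a journal always cites entries in the storage layer.
interface JournalEntry {
  id: JournalId;
  text: string;
  citedSources: SourceId[];
}

// Knowledge layer: grounded in journals or a human edit, never directly in a source.
type Grounding =
  | { kind: "journal"; journalIds: JournalId[] }
  | { kind: "human"; editor: string };

interface Relation {
  subject: string;   // entity name
  predicate: string;
  object: string;    // entity name
  grounding: Grounding;
}

// Collect the sources that indirectly support a relation by walking through its journals.
function supportingSources(relation: Relation, journals: Map<JournalId, JournalEntry>): SourceId[] {
  const grounding = relation.grounding;
  if (grounding.kind === "human") return [];
  const found = new Set<SourceId>();
  for (const journalId of grounding.journalIds) {
    const journal = journals.get(journalId);
    if (!journal) continue; // dangling reference; a real system would likely flag this
    for (const sourceId of journal.citedSources) found.add(sourceId);
  }
  return Array.from(found);
}
```

Because `Grounding` has no variant that holds a source ID, the type checker rejects a relation that tries to cite a source directly.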
Journals should be:
- Easy for the user to read, understand, and update
- Grounded in source documents. We are agnostic to a journal's structure and contents, as long as it holds a valid reference to a source and its parts.
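One way the grounding requirement could be checked before a journal is indexed is sketched below. The types and the `validateJournal` function are hypothetical. The sketch assumes a citation names a source ID plus zero or more part IDs, and it treats an empty part list as a reference to the whole source, which is my assumption rather than a stated rule.

```typescript
interface SourceWithParts {
  id: string;
  partIds: string[]; // e.g. page or audio-segment identifiers (assumed naming)
}

interface Citation {
  sourceId: string;
  partIds: string[]; // empty is treated here as "the whole source" (assumption)
}

interface Journal {
  body: unknown;        // structure and contents are deliberately unconstrained
  citations: Citation[];
}

// Return a list of problems; an empty list means every reference resolves.
function validateJournal(journal: Journal, sources: Map<string, SourceWithParts>): string[] {
  const errors: string[] = [];
  if (journal.citations.length === 0) {
    errors.push("journal cites no source");
  }
  for (const citation of journal.citations) {
    const source = sources.get(citation.sourceId);
    if (!source) {
      errors.push(`unknown source: ${citation.sourceId}`);
      continue;
    }
    for (const partId of citation.partIds) {
      if (source.partIds.indexOf(partId) === -1) {
        errors.push(`unknown part ${partId} in source ${citation.sourceId}`);
      }
    }
  }
  return errors;
}
```

The validator never inspects `body`, which keeps it agnostic to how a journal is written while still rejecting ungrounded entries.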