
Commit 64f49dd

Initial commit: llmwiki knowledge compiler

55 files changed

Lines changed: 7728 additions & 0 deletions


.gitignore

Lines changed: 10 additions & 0 deletions
```
.env
node_modules/
dist/
.llmwiki/
.llm-wiki/
/wiki/
/sources/
*.tmp
.DS_Store
localdocs/
```

AGENTS.md

Lines changed: 38 additions & 0 deletions
# llmwiki

A knowledge compiler CLI. Raw sources in, interlinked wiki out.

## Development Guidelines

### Code Style & Standards

- Files must be smaller than 400 lines, excluding comments. Once 400 is exceeded, initiate a refactor.
- Functions must be smaller than 40 lines, excluding comments and the catch/finally blocks of try/catch sections. If a function exceeds that, refactor it.

### Clean Code Rules

- Meaningful Names: Name variables and functions to reveal their purpose, not just their value.
- One Function, One Responsibility: Functions should do one thing.
- Avoid Magic Numbers: Replace hard-coded values with named constants to give them meaning.
- Use Descriptive Booleans: Boolean names should state a condition, not just its value.
- Keep Code DRY: Duplicate code means duplicate bugs. Reuse logic where it makes sense.
- Avoid Deep Nesting: Flatten your code flow to improve clarity and reduce cognitive load.
- Comment Why, Not What: Explain the intention behind your code, not the obvious mechanics.
- Limit Function Arguments: Too many parameters confuse readers. Group related data into objects.
- Code Should Be Self-Explanatory: Well-written code needs fewer comments because it reads like a story.

### Comments and Documentation

- Include a substantial JSDoc comment at the top of each file. For Python files, use Google-style docstrings.
- Write clear comments for complex logic.
- Document public APIs and functions.
- Use JSDoc comments for functions.
- Keep comments up to date with code changes.
- Document any non-obvious behavior.

## General Rules

- First, think through the problem and read the codebase for relevant files.
- Make every task and code change as simple as possible. Avoid massive or complex changes; every change should impact as little code as possible. Everything is about simplicity.
- Never speculate about code you have not opened. If the user references a specific file, you MUST read the file before answering. Investigate and read relevant files BEFORE answering questions about the codebase. Never make claims about code before investigating unless you are certain of the correct answer; give grounded, hallucination-free answers.

CLAUDE.md

Lines changed: 38 additions & 0 deletions
# llmwiki

A knowledge compiler CLI. Raw sources in, interlinked wiki out.

## Development Guidelines

### Code Style & Standards

- Files must be smaller than 400 lines, excluding comments. Once 400 is exceeded, initiate a refactor.
- Functions must be smaller than 40 lines, excluding comments and the catch/finally blocks of try/catch sections. If a function exceeds that, refactor it.

### Clean Code Rules

- Meaningful Names: Name variables and functions to reveal their purpose, not just their value.
- One Function, One Responsibility: Functions should do one thing.
- Avoid Magic Numbers: Replace hard-coded values with named constants to give them meaning.
- Use Descriptive Booleans: Boolean names should state a condition, not just its value.
- Keep Code DRY: Duplicate code means duplicate bugs. Reuse logic where it makes sense.
- Avoid Deep Nesting: Flatten your code flow to improve clarity and reduce cognitive load.
- Comment Why, Not What: Explain the intention behind your code, not the obvious mechanics.
- Limit Function Arguments: Too many parameters confuse readers. Group related data into objects.
- Code Should Be Self-Explanatory: Well-written code needs fewer comments because it reads like a story.

### Comments and Documentation

- Include a substantial JSDoc comment at the top of each file. For Python files, use Google-style docstrings.
- Write clear comments for complex logic.
- Document public APIs and functions.
- Use JSDoc comments for functions.
- Keep comments up to date with code changes.
- Document any non-obvious behavior.

## General Rules

- First, think through the problem and read the codebase for relevant files.
- Make every task and code change as simple as possible. Avoid massive or complex changes; every change should impact as little code as possible. Everything is about simplicity.
- Never speculate about code you have not opened. If the user references a specific file, you MUST read the file before answering. Investigate and read relevant files BEFORE answering questions about the codebase. Never make claims about code before investigating unless you are certain of the correct answer; give grounded, hallucination-free answers.

LICENSE

Lines changed: 21 additions & 0 deletions
MIT License

Copyright (c) 2026 atomicmemory

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 150 additions & 0 deletions
# llmwiki

Compile raw sources into an interlinked markdown wiki.

Inspired by Karpathy's [LLM Wiki](https://gist.github.com/karpathy/442a6bf555914893e9891c11519de94f) pattern: instead of re-discovering knowledge at query time, compile it once into a persistent, browsable artifact that compounds over time.

## Who this is for

- **AI researchers and engineers** building persistent knowledge from papers, docs, and notes
- **Technical writers** compiling scattered sources into a structured, interlinked reference
- **Anyone with too many bookmarks** who wants a wiki instead of a graveyard of tabs

## Quick start

```bash
npm install -g llm-wiki-compiler
export ANTHROPIC_API_KEY=sk-...

llmwiki ingest https://some-article.com
llmwiki compile
llmwiki query "what is X?"
```

## Why not just RAG?

RAG retrieves chunks at query time. Every question re-discovers the same relationships from scratch. Nothing accumulates.

llmwiki **compiles** your sources into a wiki. Concepts get their own pages. Pages link to each other. When you ask a question with `--save`, the answer becomes a new page, and future queries use it as context. Your explorations compound.

This is complementary to RAG, not a replacement. RAG is great for ad-hoc retrieval over large corpora. llmwiki gives you a persistent, structured artifact to retrieve from.

```
RAG:     query → search chunks → answer → forget
llmwiki: sources → compile → wiki → query → save → richer wiki → better answers
```
## How it works

```
sources/ → SHA-256 hash check → LLM concept extraction → wiki page generation → [[wikilink]] resolution → index.md
```

**Two-phase pipeline.** Phase 1 extracts all concepts from all sources. Phase 2 generates pages. This eliminates order-dependence, catches failures before writing anything, and merges concepts shared across multiple sources into single pages.

**Incremental.** Only changed sources go through the LLM. Everything else is skipped via hash-based change detection.
**Compounding queries.** `llmwiki query --save` writes the answer as a wiki page and immediately rebuilds the index. Saved answers show up in future queries as context.
### What it produces

A raw source like a Wikipedia article on knowledge compilation becomes a structured wiki page:

```yaml
---
title: Knowledge Compilation
summary: Techniques for converting knowledge representations into forms that support efficient reasoning.
sources:
  - knowledge-compilation.md
createdAt: "2026-04-05T12:00:00Z"
updatedAt: "2026-04-05T12:00:00Z"
---
```

```markdown
Knowledge compilation refers to a family of techniques for pre-processing
a knowledge base into a target language that supports efficient queries.

Related concepts: [[Propositional Logic]], [[Model Counting]]
```

Pages include source attribution in frontmatter. Provenance is page-level today, not claim-level.

## Commands

| Command | What it does |
|---------|-------------|
| `llmwiki ingest <url\|file>` | Fetch a URL or copy a local file into `sources/` |
| `llmwiki compile` | Incremental compile: extract concepts, generate wiki pages |
| `llmwiki query "question"` | Ask questions against your compiled wiki |
| `llmwiki query "question" --save` | Answer and save the result as a wiki page |
| `llmwiki watch` | Auto-recompile when `sources/` changes |

## Output

```
wiki/
  concepts/   one .md file per concept, with YAML frontmatter
  queries/    saved query answers, included in index and retrieval
  index.md    auto-generated table of contents
```

Obsidian-compatible. `[[wikilinks]]` resolve to concept titles.
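Building `index.md` from page frontmatter can be sketched as below. The parsing is deliberately naive (a regex rather than a YAML parser) and the exact index layout is an assumption:

```javascript
// Read each page's frontmatter and emit a table of contents.
// Assumes `title:` and optional `summary:` keys, as in the example above.
function buildIndex(pages) {
  const lines = ["# Index", ""];
  for (const markdown of pages) {
    const fm = markdown.match(/^---\n([\s\S]*?)\n---/);
    if (!fm) continue; // no frontmatter, skip the page
    const title = (fm[1].match(/^title:\s*(.+)$/m) || [])[1];
    const summary = (fm[1].match(/^summary:\s*(.+)$/m) || [])[1];
    if (title) lines.push(`- [[${title}]]${summary ? ": " + summary : ""}`);
  }
  return lines.join("\n") + "\n";
}
```

Because the index entries are `[[wikilinks]]`, the generated table of contents is itself navigable in Obsidian.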
## Demo

Try it on any article or document:

```bash
mkdir my-wiki && cd my-wiki
llmwiki ingest https://en.wikipedia.org/wiki/Knowledge_compilation
llmwiki compile
llmwiki query "how does knowledge compilation work?"
```

See `examples/basic/` in the repo for pre-generated output you can browse without an API key.

## Limitations

Early software. Best for small, high-signal corpora (a few dozen sources). Query routing is index-based. Provenance is page-level, not claim-level. Anthropic-only for now.

**Honest about truncation.** Sources that exceed the character limit are truncated on ingest with `truncated: true` and the original character count recorded in frontmatter, so downstream consumers know they're working with partial content.
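A sketch of that truncation behavior, assuming a hypothetical limit constant and frontmatter key names (`truncated`, `originalChars` are illustrative; only `truncated: true` is named above):

```javascript
const MAX_CHARS = 100000; // assumed limit, not llmwiki's actual constant

// Truncate oversized content on ingest, but record that it happened.
function ingestContent(title, content) {
  const truncated = content.length > MAX_CHARS;
  const body = truncated ? content.slice(0, MAX_CHARS) : content;
  const frontmatter = [
    "---",
    `title: ${title}`,
    ...(truncated ? ["truncated: true", `originalChars: ${content.length}`] : []),
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${body}`;
}
```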
## Karpathy's LLM Wiki pattern vs this compiler

Karpathy describes an abstract pattern for turning raw data into compiled knowledge. Here's how llmwiki maps to it:

| Karpathy's concept | llmwiki | Status |
|---|---|---|
| Data ingest | `llmwiki ingest` | Implemented |
| Compile wiki | `llmwiki compile` | Implemented |
| Q&A | `llmwiki query` | Implemented |
| Output filing (save answers back) | `llmwiki query --save` | Implemented |
| Auto-recompile | `llmwiki watch` | Implemented |
| Linting / health-check pass | | Not yet implemented (`watch` is auto-recompile, not lint) |
| Image support | | Not yet implemented |
| Marp slides | | Not yet implemented |
| Fine-tuning | | Not yet implemented |

## Roadmap

- Multi-provider support (OpenAI, local models)
- Better provenance (claim-level source attribution)
- Larger-corpus query strategy (semantic search, embeddings)
- Deeper Obsidian integration
- Linting pass for wiki quality checks

If you want to contribute, these are the highest-leverage areas right now. Issues and PRs are welcome.

## Requirements

Node.js >= 18 and an Anthropic API key.

## License

MIT

## Disclaimer

No LLMs were harmed in the making of this repo.

examples/basic/README.md

Lines changed: 34 additions & 0 deletions
# Basic Example

Real output from running llmwiki on a single source document about knowledge compilation.

## What's here

```
sources/
  knowledge-compilation.md     ← the input (one markdown file)

wiki/
  concepts/                    ← 7 concept pages extracted by the LLM
    change-detection.md
    compilation-pipeline.md
    concept-extraction.md
    cross-source-concepts.md
    incremental-compilation.md
    knowledge-compilation.md
    wikilinks.md
  index.md                     ← auto-generated table of contents
```

One source in, seven interlinked concept pages out. Browse the `wiki/` directory to see the compiled output, or open it in Obsidian for navigable `[[wikilinks]]`.

## Reproduce it yourself

```bash
# run from the repo root
llmwiki ingest examples/basic/sources/knowledge-compilation.md
llmwiki compile
llmwiki query "what is knowledge compilation?"
```

Output will vary since it's LLM-generated, but the structure will match.

examples/basic/sources/knowledge-compilation.md

Lines changed: 44 additions & 0 deletions

---
title: knowledge compilation
source: examples/basic/sources/knowledge-compilation.md
ingestedAt: "2026-04-06T07:53:26.851Z"
---

# Knowledge Compilation

The idea of "knowledge compilation" is that LLMs can take messy, unstructured information and compile it into clean, structured, interlinked reference material. Think of it like a compiler for knowledge: raw sources in, organized wiki out.

## Why This Matters

Most knowledge lives in scattered documents, articles, notes, and conversations. Finding what you need means searching through all of it. A knowledge compiler processes these sources and produces a wiki where every concept has its own page, linked to related concepts.

## How It Works

The compilation pipeline has several stages:

1. **Ingestion**: Raw sources (URLs, files, documents) are collected into a sources directory.
2. **Change Detection**: SHA-256 hashes identify which sources have changed since the last compile.
3. **Concept Extraction**: An LLM reads each changed source and extracts the key concepts.
4. **Page Generation**: For each concept, the LLM generates a wiki page with proper structure.
5. **Interlink Resolution**: Concept mentions across pages are wrapped in [[wikilinks]].
6. **Index Generation**: A table of contents is built from all concept pages.
## Incremental Compilation

Like a code compiler, only changed sources need reprocessing. This saves both time and API costs. The system tracks source hashes in a state file and skips unchanged sources entirely.

## Cross-Source Concepts

When multiple sources discuss the same concept, the compiler detects this overlap through semantic dependency tracking. Changes to one source trigger recompilation of shared concepts using content from all contributing sources.

## Obsidian Compatibility

The output format uses YAML frontmatter and [[wikilinks]], making it directly compatible with Obsidian and similar tools. Each concept page includes metadata like title, summary, source attribution, and timestamps.
examples/basic/wiki/concepts/change-detection.md

Lines changed: 41 additions & 0 deletions

---
title: Change Detection
summary: Using SHA-256 hashes to identify which sources have been modified since the last compilation run
sources:
  - knowledge-compilation.md
createdAt: "2026-04-06T07:53:56.456Z"
updatedAt: "2026-04-06T07:53:56.456Z"
---

# Change Detection

Change Detection is a crucial component of the [[knowledge compilation]] pipeline that determines which sources need reprocessing during [[Incremental Compilation]]. It uses cryptographic hashing to efficiently identify modified content and avoid unnecessary work.

## Purpose

Change Detection serves as an optimization mechanism for [[knowledge compilation]] systems. Rather than reprocessing all sources on every compilation run, it identifies only the sources that have actually changed since the last compilation. This approach saves both processing time and API costs when working with large knowledge bases.

## How It Works

The system uses **SHA-256 hashes** to create unique fingerprints for each source document. These hashes are stored in a state file that tracks the compilation history. During each compilation run:

1. The system calculates SHA-256 hashes for all current sources
2. These new hashes are compared against the stored hashes from the previous compilation
3. Sources with different hashes are marked as changed and queued for reprocessing
4. Unchanged sources are skipped entirely
## Integration with [[Compilation Pipeline]]

Change Detection operates as the second stage of the [[knowledge compilation]] pipeline, immediately after **Ingestion**. It acts as a filter that determines which sources proceed to the **Concept Extraction** stage, ensuring that only modified content triggers expensive LLM processing.

## State Management

The system maintains compilation state through a dedicated state file that preserves hash information between runs. This persistent tracking enables true [[Incremental Compilation]], where the system can resume work efficiently after interruptions or when processing sources that are updated at different intervals.

## Cross-Source Dependencies

When sources share concepts, Change Detection must account for semantic dependencies. If one source changes and affects a concept that appears in multiple sources, the system may need to recompile pages for that concept using content from all contributing sources, not just the changed one.

## Sources

- knowledge-compilation.md
