Add data import toolkit (ChatGPT, Obsidian, Readwise)#58
Merged
Conversation
Standalone importer scripts that bulk-load existing personal data into the
memory store over the REST API, so a fresh server doesn't start empty. Adapts
the OB1 project's import recipes to this server's single-user REST surface.
- importers/: a retrying REST client (client.py), pure per-source parsers
(chatgpt.py, obsidian.py, readwise.py), and a shared CLI runner (cli.py).
Each parser yields MemoryClient.add kwargs; nothing here ships in the app
image. Imported memories are tagged agent_id=import:<source> with source/
title/path/book/author metadata for later provenance.
- scripts/import_{chatgpt,obsidian,readwise}.py: thin CLIs with --dry-run,
--limit, --source, and $MEM0_URL/$MEM0_API_KEY support.
- tests/test_importers.py: parser cases + client behavior (bearer header,
dry-run, no-retry on 4xx, backoff+retry on 5xx) via respx.
- docs: USER_GUIDE "Importing existing data" section (incl. LLM cost note),
DEVELOPER_GUIDE project-layout entry.
https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue
There was a problem hiding this comment.
Pull request overview
Adds a standalone data import toolkit that seeds a fresh memserv memory store by parsing common exports and POSTing them to the existing POST /api/v1/memories REST endpoint (no server/runtime behavior changes).
Changes:
- Introduces an
importers/package with a retrying REST client, shared CLI wiring, and pure parsers for ChatGPT/Obsidian/Readwise exports. - Adds thin
scripts/import_*.pyentrypoints that run from a repo checkout (no packaging step). - Adds importer-focused unit tests and updates docs to describe the new import workflow and repo layout.
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/test_importers.py | New test suite covering parsers and client retry/dry-run behavior. |
| scripts/import_chatgpt.py | CLI entrypoint for importing ChatGPT conversations.json. |
| scripts/import_obsidian.py | CLI entrypoint for importing an Obsidian vault directory. |
| scripts/import_readwise.py | CLI entrypoint for importing Readwise CSV highlights. |
| importers/init.py | Package docstring and high-level intent for importer tooling. |
| importers/chatgpt.py | Parser/loader for ChatGPT exports producing messages payloads. |
| importers/obsidian.py | Parser for Obsidian vaults (frontmatter stripping, skip dirs). |
| importers/readwise.py | Parser/loader for Readwise CSV highlights (+ optional notes/metadata). |
| importers/client.py | MemoryClient REST client with retry/backoff and dry-run support. |
| importers/cli.py | Shared argparse + import run loop used by the scripts. |
| docs/USER_GUIDE.md | Documents how to run the import scripts and expected behavior/cost. |
| docs/DEVELOPER_GUIDE.md | Adds importers/ + scripts/import_*.py to the project layout map. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…_retries - importers/cli.py: run() now returns a non-zero exit code whenever any record fails, even if others succeeded, so automation can detect partial failures. - importers/client.py: validate max_retries >= 1 in __init__ and raise a clear ValueError, instead of running zero attempts and hitting an AssertionError. - tests: cover both (cli.run exit codes for all-success vs partial-failure, and the max_retries guard). https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Adds a data import toolkit so a fresh memory store doesn't have to start empty. These are standalone REST clients of
POST /api/v1/memories— adapted from the OB1 / Open Brain project's import recipes (backlog issue #49) and reworked for this server's single-user REST surface. Nothing here ships in the app image.Three sources to start:
conversations.json)scripts/import_chatgpt.pymessagespayload per conversation (user/assistant turns, ordered).md)scripts/import_obsidian.pyscripts/import_readwise.pyAll three share the same options:
path,--base-url/--api-key(default$MEM0_URL/$MEM0_API_KEY),--source,--limit(trial runs), and--dry-run(preview without sending). Each imported memory is taggedagent_id=import:<source>and carriessource(+title/path/book/author) in metadata so imported memories stay distinguishable from session-written ones.Structure
importers/package:client.py(retrying REST client — backoff on transport errors and 5xx, no retry on 4xx), pure parserschatgpt.py/obsidian.py/readwise.py(each a generator yieldingMemoryClient.addkwargs), andcli.py(shared argparse + run loop).scripts/import_*.py: thin CLI wrappers that add the repo root tosys.path, so they run from a checkout with no packaging step.Tests
tests/test_importers.py(11 cases): ChatGPT turn ordering / system-message drop / wrapped-key / custom source; Obsidian frontmatter stripping, dotfolder skip, top-only frontmatter; Readwise highlight+note+metadata; client bearer header, dry-run, no-retry on 4xx, backoff+retry on 5xx (via respx).ruff checkclean.--dry-runagainst sample exports and confirmed the missing-credentials guard exits non-zero.Architecture notes
import:<source>provenance tag.--dry-run/--limitguidance) and a project-layout entry in the Developer Guide.Follow-ups
Easy to extend with more sources (Gmail, X/Twitter, Perplexity, Grok) by adding a parser + a thin script. Server-side byte-identical dedup is tracked separately in #48.
Relates to #49.
https://claude.ai/code/session_017835DVrvURaYnbQiPQwzue
Generated by Claude Code