refactor(core): simplify note write flow by phernandez · Pull Request #739 · basicmachines-co/basic-memory

phernandez · 2026-04-13T23:54:17Z

Summary

remove the public fast note-write path and keep create/update/edit internally consistent before returning
split entity write semantics into reusable prepare methods so local Basic Memory and Cloud can share markdown/frontmatter/edit behavior
keep only derived async scheduling for vectors and remove note-write reindex scheduling and legacy fast params from clients/tools

Testing

uv run pytest tests/services/test_entity_service_prepare.py tests/api/v2/test_knowledge_router.py tests/services/test_task_scheduler_semantic.py tests/mcp/clients/test_clients.py

Context

closes BUG: New entities silently skip embedding generation after sqlite-vec load failure #578

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-04-13T23:54:37Z

Claude finished @phernandez's task in 2m 14s

PR Review: refactor(core): simplify note write flow

Code Quality & Standards

Item	Status
Coding conventions (CLAUDE.md)	✅ Generally followed — see note on `getattr` below
Python 3.12+ type annotations / async patterns	✅
SQLAlchemy 2.0 best practices	✅
FastAPI conventions	✅
100-char line length	✅
No commented-out code	✅

Testing & Documentation

Item	Status
Unit tests for new functions	✅ `test_entity_service_prepare.py` covers all three prepare methods
Integration tests for MCP tools	✅ `test_clients.py` updated
Edge case coverage	✅ `check_storage_exists=False` path tested
100% coverage maintained	✅ `# pragma: no cover` usage looks appropriate
Documentation / docstrings	✅ `PreparedEntityWrite` has a detailed docstring with attribute explanations

Basic Memory Architecture

Item	Status
MCP tools follow atomic / composable design	✅
No DB schema changes / no Alembic migration needed	✅
Local-first architecture preserved	✅
Knowledge graph consistency	✅
Markdown integrity	✅

Security & Performance

Item	Status
No hardcoded secrets	✅
Input validation	✅
Error handling	✅
No sensitive data in logs	✅
Windows path compatibility	✅ `.as_posix()` used consistently

Findings

Design — Strong

The PreparedEntityWrite dataclass is a well-thought-out abstraction. Having prepare methods return a sealed, frozen object before any persistence side effects is a clean way to share semantics between local and cloud paths without duplicating logic. The docstring on PreparedEntityWrite (lines 63–85) is thorough and explains why the abstraction exists.
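As a rough illustration of the prepare-then-persist shape described here, the sketch below uses a frozen dataclass. The field names `file_path`, `markdown_content`, and `search_content` appear elsewhere in this thread; `PreparedWriteSketch`, `prepare_create`, and the frontmatter layout are hypothetical and are not taken from the PR.

```python
from dataclasses import FrozenInstanceError, dataclass
from pathlib import Path


@dataclass(frozen=True)
class PreparedWriteSketch:
    """Hypothetical stand-in for PreparedEntityWrite; not the PR's class."""

    file_path: Path
    markdown_content: str  # full text that would be written to disk
    search_content: str  # body text without frontmatter


def prepare_create(title: str, body: str, directory: str = "notes") -> PreparedWriteSketch:
    """Compute everything a write needs without touching disk or a database."""
    file_path = Path(directory) / f"{title}.md"
    markdown = f"---\ntitle: {title}\n---\n\n{body}\n"
    return PreparedWriteSketch(
        file_path=file_path,
        markdown_content=markdown,
        search_content=body,
    )


prepared = prepare_create("Coffee", "Pour-over notes")

# A frozen dataclass rejects mutation after construction.
try:
    prepared.search_content = "mutated"  # type: ignore[misc]
except FrozenInstanceError:
    mutation_blocked = True
else:
    mutation_blocked = False
```

Because the prepared value cannot change after it is built, a local caller and a cloud caller could each persist it in their own way while agreeing on what was accepted.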

Removing the fast=True/False query param eliminates a whole category of "which path am I on?" confusion. Keeping only the fully synchronous path with deferred-only async for semantic vectors is a cleaner contract.

Minor Concern: `getattr` in `_coerce_schema_input`

entity_service.py:276:

content=getattr(schema, "content", None),

CLAUDE.md explicitly says: "Never use getattr(obj, "attr", default) when unsure about attribute names." The comment explains this is intentional: EntityModel doesn't have content, but some legacy callers annotate the instance ad-hoc. The use is defensible, but worth flagging since it's a documented anti-pattern in this repo. Consider documenting the type union more explicitly or using a hasattr guard with an explanatory comment if that pattern ever gets wider use.

Minor Concern: Double existence check in prepare + create pattern

prepare_create_entity_content (default check_storage_exists=True) calls file_service.exists(). But create_entity_with_content internally calls prepare_create_entity_content again — so callers who call prepare first and then create do two existence checks. The test test_prepare_create_entity_content_matches_create_entity_with_content demonstrates this pattern. It's not a bug (both pass since neither writes the file), but could be surprising if the file is created between calls. The check_storage_exists=False escape hatch handles the cloud use case, which is the right call.
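The double check can be reproduced with a toy model. In this sketch, `InMemoryStorage`, `prepare_create`, and `create` are hypothetical stand-ins; only the `check_storage_exists` flag name comes from the review text.

```python
class InMemoryStorage:
    """Hypothetical stand-in for a file service; counts existence checks."""

    def __init__(self) -> None:
        self.files: dict[str, str] = {}
        self.exists_calls = 0

    def exists(self, path: str) -> bool:
        self.exists_calls += 1
        return path in self.files


def prepare_create(
    storage: InMemoryStorage,
    path: str,
    text: str,
    check_storage_exists: bool = True,
) -> tuple[str, str]:
    # Validation only; nothing is written here.
    if check_storage_exists and storage.exists(path):
        raise FileExistsError(path)
    return path, text


def create(storage: InMemoryStorage, path: str, text: str) -> None:
    # Re-runs prepare internally, so a caller that already prepared checks twice.
    prepared_path, prepared_text = prepare_create(storage, path, text)
    storage.files[prepared_path] = prepared_text


storage = InMemoryStorage()
prepare_create(storage, "notes/a.md", "hello")  # first existence check
create(storage, "notes/a.md", "hello")  # second existence check, then write
```

If another writer created the file between the two calls, the first call would succeed and the second would raise, which is the surprise the review describes.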

Removed Telemetry Sub-spans

The old code had nested spans (write_entity, search_index, vector_sync, read_content) inside each router endpoint. These are now removed in favor of a single root span. The tests in test_knowledge_router_telemetry.py and test_upsert_entity_optimization.py reflect this — previously they asserted sub-spans existed; now they assert only the root span is emitted.

This is a reasonable tradeoff for simplicity, but it does reduce observability into per-operation latency on the write path. If there's an existing APM/distributed-tracing integration, the granular spans would have been useful for detecting bottlenecks. If this is cloud-facing, consider adding them back selectively for the sub-operations that matter most (e.g., file write, DB upsert). Not a blocker.
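If selective sub-spans were reintroduced, the shape might look like the sketch below. The `span` helper is a hypothetical recorder, not the project's telemetry API, and the `file_write` and `db_upsert` span names are assumptions chosen to match the examples in the paragraph above.

```python
from collections.abc import Iterator
from contextlib import contextmanager

recorded: list[str] = []


@contextmanager
def span(name: str) -> Iterator[None]:
    """Hypothetical span helper: records the name when the span closes."""
    try:
        yield
    finally:
        recorded.append(name)


def create_entity() -> None:
    # One root span, with child spans only around the two costly steps.
    with span("api.knowledge.create_entity"):
        with span("file_write"):
            pass  # write the markdown file
        with span("db_upsert"):
            pass  # upsert the entity row


create_entity()
```

Child spans close before the root span, so a tracing backend would see the two sub-operations nested inside the single request span.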

`# pyright: ignore` on model `id` columns

knowledge.py adds # pyright: ignore [reportIncompatibleVariableOverride] to all three id columns. These silence a legitimate Pyright complaint about Mapped[int] vs the base class. This is fine — it's a known SQLAlchemy/Pyright interaction — but worth noting for reviewers.

Removed tests: `test_upsert_entity_optimization.py` sub-span assertions

The removed section (test_upsert_update_emits_sub_spans) was testing internal implementation details (specific telemetry span names). Removing it is correct since those spans no longer exist. The behavioral correctness tests in the file are preserved.

Checklist Summary

Overall: Approve. This is a solid simplification. The PreparedEntityWrite abstraction is well-designed, the fast-path removal reduces cognitive overhead, and test coverage is maintained. The two minor concerns noted above are non-blocking.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5171793121

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

chatgpt-codex-connector · 2026-04-14T00:02:04Z

src/basic_memory/api/v2/routers/knowledge_router.py

+        # truly derived work, like semantic vectors, on the async scheduler.
+        write_result = await entity_service.create_entity_with_content(data)
+        entity = write_result.entity
+        await search_service.index_entity(entity, content=write_result.search_content)


Index persisted note text, not pre-format snapshot

This indexes write_result.search_content, which is derived before file_service.write_file runs formatters. When format_on_save is enabled, the on-disk markdown (and checksum) can differ from this in-memory text, so search can return stale snippets/ranking until a later reindex; the same pattern is used in the PUT/PATCH handlers too.
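The divergence can be shown with a toy formatter. Both helpers below are simplified, hypothetical stand-ins: the real `remove_frontmatter` and whatever formatters `format_on_save` runs may behave differently.

```python
def remove_frontmatter_sketch(markdown: str) -> str:
    """Drop a leading '---' ... '---' block; simplified for illustration."""
    if markdown.startswith("---\n"):
        _, _, rest = markdown[4:].partition("\n---\n")
        return rest.lstrip("\n")
    return markdown


def format_on_save_sketch(markdown: str) -> str:
    """Hypothetical formatter: strips trailing spaces and ends with one newline."""
    lines = [line.rstrip() for line in markdown.splitlines()]
    return "\n".join(lines).rstrip("\n") + "\n"


accepted = "---\ntitle: Coffee\n---\n\n# Coffee   \n\nPour-over notes  \n\n\n"

indexed = remove_frontmatter_sketch(accepted)  # derived before the write
persisted = format_on_save_sketch(accepted)  # what would land on disk
reindexed = remove_frontmatter_sketch(persisted)  # derived after the write
```

Here `indexed` still carries the trailing spaces and blank lines that the formatter removed, so the search index and the file disagree until something re-derives the text from the persisted content.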

Signed-off-by: phernandez <paul@basicmachines.co>

github-actions · 2026-04-14T00:13:31Z

Claude finished @phernandez's task in 1m 43s

PR Review: refactor(core): simplify note write flow

Code Quality & Standards

Item	Status
Follows Basic Memory's coding conventions (CLAUDE.md)	✅
Python 3.12+ type annotations and async patterns	✅
SQLAlchemy 2.0 best practices	✅
FastAPI and Typer conventions followed	✅
100-character line length limit maintained	✅
No commented-out code blocks	✅

Testing & Documentation

Item	Status
Unit tests for new functions/methods	✅ `test_entity_service_prepare.py` covers all three prepare methods
Integration tests for MCP clients	✅ `test_clients.py` updated
Edge case coverage	✅ `check_storage_exists=False`, malformed frontmatter, prepend without frontmatter
100% coverage maintained	✅ `# pragma: no cover` usage looks appropriate
Documentation / docstrings	✅ `PreparedEntityWrite` has a detailed docstring

Basic Memory Architecture

Item	Status
MCP tools follow atomic / composable design	✅
No DB schema changes / no Alembic migration needed	✅
Local-first architecture preserved	✅
Knowledge graph consistency	✅
Markdown file handling integrity	✅

Security & Performance

Item	Status
No hardcoded secrets	✅
Input validation	✅
Error handling	✅
No sensitive data in logs	✅
Windows path compatibility	✅ `.as_posix()` used consistently

Compatibility

Item	Status
Windows path compatibility	✅
No emoji/unicode in console output	✅

Findings

Latest Commit: Malformed Frontmatter Fix

The third commit (cc68ceb) adds fail-fast behavior to _prepend_after_frontmatter for malformed YAML frontmatter. This is a good addition — parse_frontmatter now propagates ParseError instead of silently producing a corrupted note. The new tests in test_prepare_edit_entity_content_prepend_fails_for_malformed_frontmatter and test_prepare_edit_entity_content_prepend_preserves_valid_frontmatter cover both the happy path and the failure case cleanly. The section headers and decision-point comments in _prepend_after_frontmatter follow the literate style well.
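A minimal sketch of the fail-fast idea follows. It only checks that an opening `---` has a closing delimiter and does not parse YAML, so it is weaker than the `parse_frontmatter` behavior described above. `FrontmatterError` and `prepend_after_frontmatter` are hypothetical names, not the PR's code.

```python
class FrontmatterError(ValueError):
    """Hypothetical error type; the review text refers to a ParseError instead."""


def prepend_after_frontmatter(markdown: str, addition: str) -> str:
    """Insert `addition` after the frontmatter block, or at the top if none."""
    # No frontmatter: plain prepend.
    if not markdown.startswith("---\n"):
        return f"{addition}\n{markdown}"

    # Frontmatter opened: require a closing delimiter, otherwise fail fast
    # rather than splicing text into a half-parsed header.
    header, delimiter, body = markdown[4:].partition("\n---\n")
    if not delimiter:
        raise FrontmatterError("frontmatter opened with '---' but never closed")

    return f"---\n{header}\n---\n{addition}\n{body}"
```

Raising here means the caller sees the problem before any file is written, instead of discovering a corrupted note later.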

Design Strengths

PreparedEntityWrite is a well-scoped, frozen dataclass. Separating "what we accepted" from "where we stored it" prevents state drift between file write, DB write, and search index. The three prepare_* methods are also good candidates for Cloud to reuse directly, which is the stated goal in the PR description.

Removing the fast=True/False query param eliminates a whole category of "which path am I on?" confusion. The single write path with deferred-only async for vectors is a cleaner contract.

Minor: `getattr` in `_coerce_schema_input` is justified but still worth documenting more explicitly

entity_service.py:275:

content=getattr(schema, "content", None),

CLAUDE.md says "Never use getattr(obj, "attr", default) when unsure about attribute names." The comment at lines 269–271 explains the rationale (ORM entity annotated ad-hoc by legacy callers), so this is an intentional, documented deviation. That said, the rule exists specifically to prevent silent drift. Consider making the constraint comment a bit more explicit about where these ad-hoc callers are to make it easier for future maintainers to know when it's safe to remove the compatibility shim.

Codex's Concern: Search Content Indexing Before `format_on_save`

The Codex review flagged that write_result.search_content (set from remove_frontmatter(markdown_content) inside _build_prepared_write) could diverge from the on-disk content if format_on_save applies transformations in file_service.write_file.

Looking at the code flow:

_build_prepared_write derives search_content = remove_frontmatter(markdown_content) before any file write
file_service.write_file(prepared.file_path, prepared.markdown_content) happens next
search_service.index_entity(entity, content=write_result.search_content) uses the pre-write text

If file_service.write_file normalizes whitespace, trailing newlines, or applies any other format-on-save transforms, the FTS index gets slightly stale content until the next reindex. This is a minor correctness gap rather than a blocking issue since the checksum is derived from the post-write file content, but search snippets could show pre-format text. Worth a quick check against FileService.write_file to confirm whether any formatting is applied during write.

`# pyright: ignore` on model `id` columns

Three # pyright: ignore [reportIncompatibleVariableOverride] comments were added to knowledge.py for id columns. This is a known SQLAlchemy/Pyright interaction with Mapped[int] overriding the base class attribute. Fine as-is.

Removed Telemetry Sub-spans

The write path now has a single root telemetry span instead of nested sub-spans (write_entity, search_index, vector_sync, etc.). This simplifies the code but reduces per-operation observability. Not a blocker, and the PR description scopes this to the local write path where the simplification makes sense.

Checklist Summary

Overall: Approve. The PreparedEntityWrite abstraction is well-designed, the fast-path removal reduces cognitive overhead, the latest malformed-frontmatter fix adds useful fail-fast behavior, and test coverage is thorough. The two minor points (getattr comment clarity and the Codex search-content-indexing concern) are non-blocking.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: cc68cebd0f

chatgpt-codex-connector · 2026-04-14T00:20:34Z

src/basic_memory/api/v2/routers/knowledge_router.py

        result = EntityResponseV2.model_validate(entity)
-        if fast:
-            result = result.model_copy(update={"observations": [], "relations": []})
-
-        with telemetry.scope(
-            "api.knowledge.create_entity.read_content",
-            domain="knowledge",
-            action="create_entity",
-            phase="read_content",
-            source="file" if fast else "memory",
-        ):
-            if fast:
-                content = await file_service.read_file_content(entity.file_path)
-            else:
-                # Non-fast writes already captured the markdown in memory. Reuse it here
-                # instead of re-reading the file; format_on_save is the one config that can
-                # still make the persisted file diverge because write_file only returns a checksum.
-                content = written_content
-        result = result.model_copy(update={"content": content})
+        # The write service already returns the canonical markdown accepted for this request.
+        result = result.model_copy(update={"content": write_result.content})



Return persisted markdown instead of pre-write snapshot

The write endpoints now always copy write_result.content into the API response, but EntityService.create/update/edit_*_with_content build that content before FileService.write_file runs formatters. When format_on_save is enabled, the on-disk markdown can differ from this in-memory string, so clients can receive stale content that does not match the file they just wrote. Please populate content from the persisted file (or return the formatter output from the write path) before returning.
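One way to read the second suggestion (return the formatter output from the write path) is sketched below. The removed comment in the diff notes that `write_file` only returns a checksum; `WriteOutcome`, `write_file_sketch`, and `strip_trailing_spaces` are hypothetical and only illustrate returning the final text alongside that checksum.

```python
from dataclasses import dataclass
from hashlib import sha256


@dataclass(frozen=True)
class WriteOutcome:
    """Hypothetical return value carrying both checksum and final text."""

    checksum: str
    content: str


def strip_trailing_spaces(markdown: str) -> str:
    """Hypothetical stand-in for a format_on_save formatter."""
    return "\n".join(line.rstrip() for line in markdown.split("\n"))


def write_file_sketch(
    store: dict[str, str],
    path: str,
    markdown: str,
    format_on_save: bool,
) -> WriteOutcome:
    # Format first, persist the formatted text, and hand that same text back.
    final = strip_trailing_spaces(markdown) if format_on_save else markdown
    store[path] = final
    checksum = sha256(final.encode("utf-8")).hexdigest()
    return WriteOutcome(checksum=checksum, content=final)


store: dict[str, str] = {}
outcome = write_file_sketch(store, "notes/a.md", "# Title  \nbody \n", format_on_save=True)
```

With this shape, the API response and the search index can both be populated from `outcome.content`, which always matches what was stored.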

phernandez added 2 commits April 13, 2026 16:52

type checking and removing extra spans in api flow

bbe014b

refactor(core): simplify note write flow

5171793

Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector bot reviewed Apr 14, 2026

fix(core): fail prepend on malformed frontmatter

cc68ceb

Signed-off-by: phernandez <paul@basicmachines.co>

chatgpt-codex-connector bot reviewed Apr 14, 2026

phernandez wants to merge 3 commits into main from knowledge-flow-cleanup
