basicmachines-co
diff --git a/.github/workflows/test.yml b/.github/workflows/test.yml
Lines changed: 2 additions & 2 deletions
diff --git a/docs/post-v0.18.0-test-plan.md b/docs/post-v0.18.0-test-plan.md
Lines changed: 1 addition & 1 deletion
diff --git a/docs/semantic-search.md b/docs/semantic-search.md
Lines changed: 26 additions & 21 deletions
diff --git a/justfile b/justfile
Lines changed: 1 addition & 1 deletion
diff --git a/pyproject.toml b/pyproject.toml
Lines changed: 1 addition & 5 deletions
diff --git a/src/basic_memory/alembic/versions/i2c3d4e5f6g7_auto_backfill_semantic_embeddings.py b/src/basic_memory/alembic/versions/i2c3d4e5f6g7_auto_backfill_semantic_embeddings.py
Lines changed: 29 additions & 0 deletions
diff --git a/src/basic_memory/cli/commands/tool.py b/src/basic_memory/cli/commands/tool.py
Lines changed: 2 additions & 2 deletions
diff --git a/src/basic_memory/config.py b/src/basic_memory/config.py
Lines changed: 4 additions & 3 deletions
diff --git a/src/basic_memory/db.py b/src/basic_memory/db.py
Lines changed: 117 additions & 0 deletions
diff --git a/src/basic_memory/mcp/tools/chatgpt_tools.py b/src/basic_memory/mcp/tools/chatgpt_tools.py
Lines changed: 0 additions & 1 deletion
@@ -254,7 +254,7 @@ jobs:
 
       - name: Install dependencies
         run: |
-          uv pip install -e ".[dev,semantic]"
+          uv pip install -e ".[dev]"
 
       - name: Run tests (Semantic)
         run: |
@@ -296,7 +296,7 @@ jobs:
 
       - name: Install dependencies
         run: |
-          uv pip install -e ".[dev,semantic]"
+          uv pip install -e ".[dev]"
 
       - name: Run combined coverage (SQLite + Postgres)
         run: |
 
@@ -79,7 +79,7 @@ These are the most important post-`v0.18.0` feature modules currently under-cove
 ### Acceptance criteria
 
 - `search_type=text|vector|hybrid` returns expected ranked results on canonical semantic corpus.
-- Missing semantic extras fail fast with actionable install guidance.
+- Missing semantic dependencies fail fast with actionable install guidance.
 - Reindex and provider/model changes produce valid vectors without dimension mismatch.
 - SQLite and Postgres produce equivalent behavior for semantic modes on the same dataset.
 - Generated-column migration path is valid on SQLite environments in use.
 
@@ -1,26 +1,26 @@
 # Semantic Search
 
-This guide covers Basic Memory's optional semantic (vector) search feature, which adds meaning-based retrieval alongside the existing full-text search.
+This guide covers Basic Memory's semantic (vector) search feature, which adds meaning-based retrieval alongside the existing full-text search.
 
 ## Overview
 
-Basic Memory's default search uses full-text search (FTS) — keyword matching with boolean operators. Semantic search adds vector embeddings that capture the *meaning* of your content, enabling:
+Basic Memory's search supports both full-text search (FTS) and semantic retrieval. Semantic search adds vector embeddings that capture the *meaning* of your content, enabling:
 
 - **Paraphrase matching**: Find "authentication flow" when searching for "login process"
 - **Conceptual queries**: Search for "ways to improve performance" and find notes about caching, indexing, and optimization
 - **Hybrid retrieval**: Combine the precision of keyword search with the recall of semantic similarity
 
-Semantic search is **opt-in** — existing behavior is completely unchanged unless you enable it. It works on both SQLite (local) and Postgres (cloud) backends.
+Semantic search is enabled by default when semantic dependencies are available at runtime. It works on both SQLite (local) and Postgres (cloud) backends.
 
 ## Installation
 
-Semantic search dependencies (fastembed, sqlite-vec, openai) are **optional extras** — they are not installed with the base `basic-memory` package. Install them with:
+Semantic search dependencies (fastembed, sqlite-vec, openai) are included in the default `basic-memory` install.
 
 ```bash
-pip install 'basic-memory[semantic]'
+pip install basic-memory
 ```
 
-This keeps the base install lightweight and avoids platform-specific issues with ONNX Runtime wheels.
+To override the automatic default, set `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` to `true` or `false`.
 
 ### Platform Compatibility
 
@@ -34,36 +34,40 @@ This keeps the base install lightweight and avoids platform-specific issues with
 
 #### Intel Mac Workaround
 
-The default FastEmbed provider uses ONNX Runtime, which dropped Intel Mac (x86_64) wheels starting in v1.24. Intel Mac users have two options:
+The default install includes FastEmbed, which depends on ONNX Runtime. ONNX Runtime dropped Intel Mac (x86_64) wheels starting in v1.24, so Intel Mac users should pin a compatible ONNX Runtime version in the same install command:
 
-**Option 1: Use OpenAI embeddings (recommended)**
+```bash
+pip install basic-memory 'onnxruntime<1.24'
+```
 
-Install only the OpenAI dependency manually — no ONNX Runtime or FastEmbed needed:
+After installation, Intel Mac users have two runtime options:
+
+**Option 1: Use OpenAI embeddings (recommended)**
 
 ```bash
-pip install openai sqlite-vec
 export BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED=true
 export BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER=openai
 export OPENAI_API_KEY=sk-...
 ```
 
-**Option 2: Pin an older ONNX Runtime**
+**Option 2: Use FastEmbed locally**
 
-FastEmbed's ONNX Runtime dependency is unpinned, so you can constrain it to an older version that still ships Intel Mac wheels by passing both requirements in the same install command:
+Keep the same pinned installation and use FastEmbed (default provider):
 
 ```bash
-pip install 'basic-memory[semantic]' 'onnxruntime<1.24'
+export BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED=true
+export BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER=fastembed
 ```
 
 ## Quick Start
 
-1. Install semantic extras:
+1. Install Basic Memory:
 
 ```bash
-pip install 'basic-memory[semantic]'
+pip install basic-memory
 ```
 
-2. Enable semantic search:
+2. (Optional) Explicitly enable semantic search:
 
 ```bash
 export BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED=true
@@ -84,7 +88,7 @@ search_notes("login process", search_type="vector")
 # Hybrid: combines FTS precision with vector recall (recommended)
 search_notes("login process", search_type="hybrid")
 
-# Traditional full-text search (still the default)
+# Explicit full-text search
 search_notes("login process", search_type="text")
 ```
 
@@ -94,7 +98,7 @@ All settings are fields on `BasicMemoryConfig` and can be set via environment va
 
 | Config Field | Env Var | Default | Description |
 |---|---|---|---|
-| `semantic_search_enabled` | `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` | `false` | Enable semantic search. Required before vector/hybrid modes work. |
+| `semantic_search_enabled` | `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` | Auto (`true` when `fastembed` and `sqlite_vec` are importable) | Enable semantic search. Required before vector/hybrid modes work. |
 | `semantic_embedding_provider` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_PROVIDER` | `"fastembed"` | Embedding provider: `"fastembed"` (local) or `"openai"` (API). |
 | `semantic_embedding_model` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_MODEL` | `"bge-small-en-v1.5"` | Model identifier. Auto-adjusted per provider if left at default. |
 | `semantic_embedding_dimensions` | `BASIC_MEMORY_SEMANTIC_EMBEDDING_DIMENSIONS` | Auto-detected | Vector dimensions. 384 for FastEmbed, 1536 for OpenAI. Override only if using a non-default model. |
@@ -112,8 +116,8 @@ FastEmbed runs entirely locally using ONNX models — no API key, no network cal
 - **Tradeoff**: Smaller model, fast inference, good quality for most use cases
 
 ```bash
-# Install semantic extras and enable
-pip install 'basic-memory[semantic]'
+# Install basic-memory and enable semantic search
+pip install basic-memory
 export BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED=true
 ```
 
@@ -197,7 +201,8 @@ bm reindex -p my-project
 
 ### When You Need to Reindex
 
-- **First enable**: After turning on `semantic_search_enabled` for the first time
+- **Upgrading**: Usually not required. The upgrade migration runs a one-time automatic embedding backfill, provided semantic search is enabled at that point.
+- **Manual enable case**: If you explicitly had `semantic_search_enabled=false` and then turn it on
 - **Provider change**: After switching between `fastembed` and `openai`
 - **Model change**: After changing `semantic_embedding_model`
 - **Dimension change**: After changing `semantic_embedding_dimensions`
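
Each trigger in the reindex list above comes down to one question: were the stored vectors built with the settings now in effect? A minimal sketch of that check follows, assuming a hypothetical `EmbeddingSettings` record persisted per project; this diff does not show how, or whether, Basic Memory records the settings a project was indexed with.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EmbeddingSettings:
    """Hypothetical record of the settings a project's vectors were built with."""

    provider: str  # "fastembed" or "openai"
    model: str  # e.g. "bge-small-en-v1.5"
    dimensions: int  # 384 for FastEmbed, 1536 for OpenAI


def needs_reindex(
    indexed_with: Optional[EmbeddingSettings],
    current: EmbeddingSettings,
    semantic_enabled: bool,
) -> bool:
    """Return True when stored vectors are missing or were built with other settings."""
    if not semantic_enabled:
        # Vector/hybrid modes are off, so there is nothing to rebuild yet.
        return False
    if indexed_with is None:
        # Semantic search was just switched on: no vectors exist for this project.
        return True
    # Frozen dataclasses compare field by field, so any provider, model,
    # or dimension change makes the two records unequal.
    return indexed_with != current
```

Comparing a single immutable record keeps the provider, model, and dimension triggers in one place instead of three separate checks.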
 
@@ -2,7 +2,7 @@
 
 # Install dependencies
 install:
-    uv sync --extra semantic
+    uv sync
     @echo ""
     @echo "💡 Remember to activate the virtual environment by running: source .venv/bin/activate"
 
 
@@ -44,10 +44,6 @@ dependencies = [
     "sniffio>=1.3.1",
     "anyio>=4.10.0",
     "httpx>=0.28.0",
-]
-
-[project.optional-dependencies]
-semantic = [
     "fastembed>=0.7.4",
     "sqlite-vec>=0.1.6",
     "openai>=1.100.2",
@@ -78,7 +74,7 @@ markers = [
     "postgres: Tests that run against Postgres backend (deselect with '-m \"not postgres\"')",
     "windows: Windows-specific tests (deselect with '-m \"not windows\"')",
     "smoke: Fast end-to-end smoke tests for MCP flows",
-    "semantic: Tests requiring [semantic] extras (fastembed, sqlite-vec, openai)",
+    "semantic: Tests requiring semantic dependencies (fastembed, sqlite-vec, openai)",
 ]
 
 [tool.ruff]
 
@@ -0,0 +1,29 @@
+"""Trigger automatic semantic embedding backfill during migration.
+
+Revision ID: i2c3d4e5f6g7
+Revises: h1b2c3d4e5f6
+Create Date: 2026-02-19 00:00:00.000000
+
+"""
+
+from typing import Sequence, Union
+
+# revision identifiers, used by Alembic.
+revision: str = "i2c3d4e5f6g7"
+down_revision: Union[str, None] = "h1b2c3d4e5f6"
+branch_labels: Union[str, Sequence[str], None] = None
+depends_on: Union[str, Sequence[str], None] = None
+
+
+def upgrade() -> None:
+    """No schema change.
+
+    Trigger: this revision is newly applied.
+    Why: db.run_migrations() detects this revision transition and runs the existing
+    sync_entity_vectors() pipeline to backfill semantic embeddings automatically.
+    Outcome: users no longer need to run `bm reindex --embeddings` after upgrading.
+    """
+
+
+def downgrade() -> None:
+    """No-op downgrade."""
@@ -847,8 +847,8 @@ def search_notes(
         if not metadata_filters:
             metadata_filters = None
 
-        # set search type
-        search_type = "text"
+        # set search type (None delegates to MCP tool default selection)
+        search_type: str | None = None
         if permalink:
             search_type = "permalink"
             if query and "*" in query:
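
With `search_type` left as `None`, the CLI defers to the MCP tool's own default. The ChatGPT `search` tool now does the same, because it no longer passes `search_type="text"`. The sketch below shows that delegation pattern. `resolve_search_type` is hypothetical, and falling back to `"hybrid"` when semantic search is enabled is an assumption made for illustration: the actual default selection lives in the MCP search tool and is not part of this diff.

```python
from typing import Optional


def resolve_search_type(requested: Optional[str], semantic_enabled: bool) -> str:
    """Hypothetical resolver: an explicit choice wins, otherwise pick a default.

    Assumption (illustration only): default to "hybrid" when semantic search
    is enabled and to "text" when it is not.
    """
    if requested is not None:
        # Explicit values such as "text", "vector", "hybrid", or "permalink" pass through.
        return requested
    return "hybrid" if semantic_enabled else "text"
```

Keeping the fallback in one function means callers such as the CLI and the ChatGPT adapter cannot drift out of sync with the tool's default.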
 
@@ -40,8 +40,9 @@ class DatabaseBackend(str, Enum):
 
 
 def _default_semantic_search_enabled() -> bool:
-    """Enable semantic search by default when semantic extras are installed."""
-    return importlib.util.find_spec("fastembed") is not None
+    """Enable semantic search by default when required local semantic dependencies exist."""
+    required_modules = ("fastembed", "sqlite_vec")
+    return all(importlib.util.find_spec(module_name) is not None for module_name in required_modules)
 
 
 @dataclass
@@ -145,7 +146,7 @@ class BasicMemoryConfig(BaseSettings):
     # Semantic search configuration
     semantic_search_enabled: bool = Field(
         default_factory=_default_semantic_search_enabled,
-        description="Enable semantic search (vector/hybrid retrieval). Works on both SQLite and Postgres backends. Requires semantic extras.",
+        description="Enable semantic search (vector/hybrid retrieval). Works on both SQLite and Postgres backends. Requires semantic dependencies (included by default).",
     )
     semantic_embedding_provider: str = Field(
         default="fastembed",
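
`_default_semantic_search_enabled()` now requires both `fastembed` and `sqlite_vec` to be locatable. An explicit `BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED` still takes precedence over that default. The stdlib-only sketch below illustrates the precedence. `resolve_semantic_enabled` is a hypothetical helper (in Basic Memory the environment override is applied by `BaseSettings`), and the accepted boolean spellings are a simplified assumption.

```python
import importlib.util
import os
from typing import Mapping, Optional, Sequence

# Modules checked by the new default (from _default_semantic_search_enabled).
REQUIRED_SEMANTIC_MODULES = ("fastembed", "sqlite_vec")
OVERRIDE_ENV_VAR = "BASIC_MEMORY_SEMANTIC_SEARCH_ENABLED"

# Simplified boolean spellings; pydantic accepts a few more variants.
_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def semantic_deps_available(modules: Sequence[str] = REQUIRED_SEMANTIC_MODULES) -> bool:
    """True only if every top-level module can be located (without importing it)."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


def resolve_semantic_enabled(
    env: Optional[Mapping[str, str]] = None,
    modules: Sequence[str] = REQUIRED_SEMANTIC_MODULES,
) -> bool:
    """Explicit env override first, then fall back to dependency detection."""
    environ = os.environ if env is None else env
    raw = environ.get(OVERRIDE_ENV_VAR)
    if raw is not None:
        value = raw.strip().lower()
        if value in _TRUE_VALUES:
            return True
        if value in _FALSE_VALUES:
            return False
        raise ValueError(f"{OVERRIDE_ENV_VAR} must be a boolean, got {raw!r}")
    return semantic_deps_available(modules)
```

Checking `sqlite_vec` alongside `fastembed` avoids auto-enabling semantic search on an install where the local vector store cannot load.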
 
@@ -43,6 +43,99 @@
 _engine: Optional[AsyncEngine] = None
 _session_maker: Optional[async_sessionmaker[AsyncSession]] = None
 
+# Alembic revision that enables one-time automatic embedding backfill.
+SEMANTIC_EMBEDDING_BACKFILL_REVISION = "i2c3d4e5f6g7"
+
+
+async def _load_applied_alembic_revisions(
+    session_maker: async_sessionmaker[AsyncSession],
+) -> set[str]:
+    """Load applied Alembic revisions from alembic_version.
+
+    Returns an empty set when the version table does not exist yet
+    (fresh database before first migration).
+    """
+    try:
+        async with scoped_session(session_maker) as session:
+            result = await session.execute(text("SELECT version_num FROM alembic_version"))
+            return {str(row[0]) for row in result.fetchall() if row[0]}
+    except Exception as exc:
+        error_message = str(exc).lower()
+        if "alembic_version" in error_message and (
+            "no such table" in error_message or "does not exist" in error_message
+        ):
+            return set()
+        raise
+
+
+def _should_run_semantic_embedding_backfill(
+    revisions_before_upgrade: set[str],
+    revisions_after_upgrade: set[str],
+) -> bool:
+    """Check if this migration run newly applied the backfill-trigger revision."""
+    return (
+        SEMANTIC_EMBEDDING_BACKFILL_REVISION in revisions_after_upgrade
+        and SEMANTIC_EMBEDDING_BACKFILL_REVISION not in revisions_before_upgrade
+    )
+
+
+async def _run_semantic_embedding_backfill(
+    app_config: BasicMemoryConfig,
+    session_maker: async_sessionmaker[AsyncSession],
+) -> None:
+    """Backfill semantic embeddings for all active projects/entities."""
+    if not app_config.semantic_search_enabled:
+        logger.info("Skipping automatic semantic embedding backfill: semantic search is disabled.")
+        return
+
+    async with scoped_session(session_maker) as session:
+        project_result = await session.execute(
+            text("SELECT id, name FROM project WHERE is_active = :is_active ORDER BY id"),
+            {"is_active": True},
+        )
+        projects = [(int(row[0]), str(row[1])) for row in project_result.fetchall()]
+
+    if not projects:
+        logger.info("Skipping automatic semantic embedding backfill: no active projects found.")
+        return
+
+    repository_class = (
+        PostgresSearchRepository
+        if app_config.database_backend == DatabaseBackend.POSTGRES
+        else SQLiteSearchRepository
+    )
+
+    total_entities = 0
+    for project_id, project_name in projects:
+        async with scoped_session(session_maker) as session:
+            entity_result = await session.execute(
+                text("SELECT id FROM entity WHERE project_id = :project_id ORDER BY id"),
+                {"project_id": project_id},
+            )
+            entity_ids = [int(row[0]) for row in entity_result.fetchall()]
+
+        if not entity_ids:
+            continue
+
+        total_entities += len(entity_ids)
+        logger.info(
+            "Automatic semantic embedding backfill: "
+            f"project={project_name}, entities={len(entity_ids)}"
+        )
+
+        search_repository = repository_class(
+            session_maker,
+            project_id=project_id,
+            app_config=app_config,
+        )
+        for entity_id in entity_ids:
+            await search_repository.sync_entity_vectors(entity_id)
+
+    logger.info(
+        "Automatic semantic embedding backfill complete: "
+        f"projects={len(projects)}, entities={total_entities}"
+    )
+
 
 class DatabaseType(Enum):
     """Types of supported databases."""
@@ -384,6 +477,23 @@ async def run_migrations(
     """
     logger.info("Running database migrations...")
     try:
+        revisions_before_upgrade: set[str] = set()
+        # Trigger: run_migrations() can be invoked before module-level session maker is set.
+        # Why: we still need reliable before/after revision detection for one-time backfill.
+        # Outcome: create a short-lived session maker when needed, then dispose it immediately.
+        if _session_maker is None:
+            temp_engine, temp_session_maker = _create_engine_and_session(
+                app_config.database_path,
+                database_type,
+                app_config,
+            )
+            try:
+                revisions_before_upgrade = await _load_applied_alembic_revisions(temp_session_maker)
+            finally:
+                await temp_engine.dispose()
+        else:
+            revisions_before_upgrade = await _load_applied_alembic_revisions(_session_maker)
+
         # Get the absolute path to the alembic directory relative to this file
         alembic_dir = Path(__file__).parent / "alembic"
         config = Config()
@@ -422,6 +532,13 @@ async def run_migrations(
             await PostgresSearchRepository(session_maker, 1).init_search_index()
         else:
             await SQLiteSearchRepository(session_maker, 1).init_search_index()
+
+        revisions_after_upgrade = await _load_applied_alembic_revisions(session_maker)
+        if _should_run_semantic_embedding_backfill(
+            revisions_before_upgrade,
+            revisions_after_upgrade,
+        ):
+            await _run_semantic_embedding_backfill(app_config, session_maker)
     except Exception as e:  # pragma: no cover
         logger.error(f"Error running migrations: {e}")
         raise
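
The backfill trigger compares the `alembic_version` contents captured before and after `run_migrations()` upgrades the schema. The sketch below reproduces that comparison with the stdlib `sqlite3` module. `fake_upgrade` and `migrate` are hypothetical stand-ins for Alembic's upgrade step, and only SQLite's "no such table" error is handled; the real helper also matches Postgres's "does not exist".

```python
import sqlite3

# Revision that marks the one-time backfill (taken from the migration in this diff).
BACKFILL_REVISION = "i2c3d4e5f6g7"


def load_applied_revisions(conn: sqlite3.Connection) -> set[str]:
    """Return the revision(s) recorded in alembic_version, or an empty set on a fresh DB."""
    try:
        rows = conn.execute("SELECT version_num FROM alembic_version").fetchall()
    except sqlite3.OperationalError as exc:
        # SQLite raises "no such table: alembic_version" before the first migration.
        if "no such table" in str(exc).lower():
            return set()
        raise
    return {str(row[0]) for row in rows if row[0]}


def should_backfill(before: set[str], after: set[str]) -> bool:
    """True only when this run newly recorded the backfill revision."""
    return BACKFILL_REVISION in after and BACKFILL_REVISION not in before


def fake_upgrade(conn: sqlite3.Connection, head: str) -> None:
    """Hypothetical stand-in for an Alembic upgrade: store the new head revision."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS alembic_version (version_num VARCHAR(32) NOT NULL)"
    )
    conn.execute("DELETE FROM alembic_version")
    conn.execute("INSERT INTO alembic_version (version_num) VALUES (?)", (head,))


def migrate(conn: sqlite3.Connection, head: str) -> bool:
    """Snapshot revisions, upgrade, snapshot again, and decide on backfill."""
    before = load_applied_revisions(conn)
    fake_upgrade(conn, head)
    after = load_applied_revisions(conn)
    return should_backfill(before, after)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    print(migrate(conn, BACKFILL_REVISION))  # True: first run records the revision
    print(migrate(conn, BACKFILL_REVISION))  # False: already present before this run
```

Alembic's `alembic_version` table holds only the current head revision(s), not the full history. As a result, the membership check fires only when `i2c3d4e5f6g7` is the head at the end of the run. An upgrade that goes straight past it to a later revision leaves it absent from both snapshots.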
@@ -120,7 +120,6 @@ async def search(
             project=default_project,  # Use default project for ChatGPT
             page=1,
             page_size=10,  # Reasonable default for ChatGPT consumption
-            search_type="text",  # Default to full-text search
             output_format="json",
             context=context,
         )