
feat: implement pgvector semantic and hybrid rrf search with bgd indexing #10

Open
Prakhar-Sethi012 wants to merge 1 commit into GDGVIT:dev from Prakhar-Sethi012:Changes_Prakhar

@Prakhar-Sethi012
Contributor

Title: feat: Implement Hybrid Search Engine & Background Indexer

Description

This PR introduces the core AI search domain for the WhereTF backend. It sets up the pgvector database infrastructure, adds an asynchronous background indexing service so that indexing does not block API responses, and exposes a search route that uses Reciprocal Rank Fusion (RRF) to combine semantic vector search with full-text keyword matching.
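For reviewers unfamiliar with RRF, the fusion step can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in processing/search.py (which may express the same formula in SQL): it assumes each search mode returns chunk IDs ordered best-first, and it uses k = 60, the constant commonly used for RRF. The function name and the example chunk IDs are hypothetical.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]], k: int = 60
) -> list[tuple[Hashable, float]]:
    """Fuse several best-first result lists with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with ranks starting at 1. k = 60 is an assumed default, not a value
    confirmed by this PR.
    """
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical results from the two retrieval modes, best match first.
vector_hits = ["chunk_3", "chunk_1", "chunk_7"]
keyword_hits = ["chunk_1", "chunk_9", "chunk_3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
```

Because RRF only looks at ranks, it avoids having to normalize cosine distances against full-text rank scores, which live on different scales. A chunk that appears near the top of both lists (chunk_1 here) outranks one that tops only a single list.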

Type of Change

  • New feature (non-breaking change which adds functionality)
  • Infrastructure / DB Migration
  • Bug fix (fixes Docker Compose command targets and Alembic table-name mismatches)

Architecture & Technical Highlights

  • Database Layer: Configured PostgreSQL with the pgvector extension. Created SQLAlchemy models (files and file_content) and generated Alembic migrations to handle vector storage.
  • Background Processing: Implemented app/services/indexer.py as a FastAPI BackgroundTasks job. It opens its own SQLAlchemy session, chunks the incoming text, generates 384-dimensional embeddings, stores them in PostgreSQL, and cleans up the temporary files in the server's /tmp/ directory.
  • Search Engine: Integrated the processing/search.py module, which runs the search queries as raw SQL through SQLAlchemy's db.execute().
  • Routing: Added the POST /search/ endpoint supporting vector, keyword, and hybrid modes.
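The indexer flow from the Background Processing bullet can be sketched in plain Python as below. This is a minimal illustration, not the code in app/services/indexer.py: `chunk_text`, `index_file`, the 500-character window with a 50-character overlap, and the injected `embed`/`store` callables are all hypothetical, chosen so the sketch runs without a database or an embedding model. In the real service, `store` is where the job would open its own SQLAlchemy session, since in recent FastAPI versions a request-scoped session from a `yield` dependency is typically closed before background tasks run.

```python
import os
import tempfile
from typing import Callable, Sequence

EMBEDDING_DIM = 384  # matches the 384-dimensional embeddings described in the PR


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap` chars.

    Hypothetical chunking strategy; the PR does not state its chunk size.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size].strip()
        if chunk:
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window reached the end; later windows would be redundant
    return chunks


def index_file(
    path: str,
    embed: Callable[[Sequence[str]], list[list[float]]],
    store: Callable[[list[tuple[str, list[float]]]], None],
) -> int:
    """Hypothetical background job: read, chunk, embed, store, then delete the temp file."""
    try:
        with open(path, encoding="utf-8") as fh:
            chunks = chunk_text(fh.read())
        vectors = embed(chunks)
        for vec in vectors:
            if len(vec) != EMBEDDING_DIM:
                raise ValueError(f"expected {EMBEDDING_DIM}-dim vector, got {len(vec)}")
        store(list(zip(chunks, vectors)))  # real code: insert rows via a fresh session
        return len(chunks)
    finally:
        # Always remove the temporary upload, even if embedding or storage fails.
        if os.path.exists(path):
            os.remove(path)


# Usage with stand-ins for the embedding model and the database.
stored: list[tuple[str, list[float]]] = []
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("Unit 1: Linear algebra. " * 60)  # dummy syllabus text
    tmp_path = tmp.name

count = index_file(
    tmp_path,
    embed=lambda chunks: [[0.0] * EMBEDDING_DIM for _ in chunks],
    store=stored.extend,
)
```

In a route, the job would be scheduled with `background_tasks.add_task(index_file, path, embed, store)`, so the HTTP response is sent before chunking and embedding start.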

How This Was Tested

  • Verified local Docker Compose orchestration: the backend container waits for the db health check to pass before starting.
  • Fixed module import paths so that Uvicorn starts without import errors.
  • Used a temporary /test-indexer/ route to ingest a dummy syllabus file.
  • Called POST /search/ in hybrid mode from the Swagger UI and received a 200 OK response containing the matching text chunks with correctly fused vector/keyword scores.

Next Steps & Blockers

  • Blocker: The indexer currently relies on a temporary test route.
  • Next Action: Waiting on the POST /files/ upload route, so that real user files from the web client can be passed directly into this background indexing pipeline.
