feat: implement pgvector semantic and hybrid rrf search with bgd indexing by Prakhar-Sethi012 · Pull Request #10 · GDGVIT/WhereTF-backend

Prakhar-Sethi012 · 2026-06-13T22:50:08Z

Title: feat: Implement Hybrid Search Engine & Background Indexer

Description

This PR introduces the core AI search domain for the WhereTF backend. It sets up the pgvector database infrastructure, implements a background indexing service so that ingestion does not block API requests, and exposes a search route that uses Reciprocal Rank Fusion (RRF) to combine semantic vector search with full-text keyword matching.
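For reviewers unfamiliar with RRF, here is a minimal pure-Python sketch of the fusion step. It is illustrative only: the PR performs the fusion in raw SQL, and the function name `rrf_fuse`, the chunk ids, and the constant `k=60` (a commonly used default) are assumptions, not values taken from this codebase.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of ids with Reciprocal Rank Fusion.

    Each id scores sum(1 / (k + rank)) over the lists it appears in,
    where rank is 1-based. k=60 is an assumed, conventional default.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists from the two retrievers.
vector_hits = ["chunk_a", "chunk_b", "chunk_c"]
keyword_hits = ["chunk_b", "chunk_d", "chunk_a"]

fused = rrf_fuse([vector_hits, keyword_hits])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.5f}")
```

Because RRF uses only ranks, the vector distances and keyword scores never need to be normalized against each other; a chunk that appears in both lists outranks one that appears in only one.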

Type of Change

New feature (non-breaking change which adds functionality)
Infrastructure / DB Migration
Bug fix (fixed Docker Compose command targets and Alembic table-name mismatches)

Architecture & Technical Highlights

Database Layer: Configured PostgreSQL with the pgvector extension. Created SQLAlchemy models (files and file_content) and generated Alembic migrations to handle vector storage.
Background Processing: Implemented app/services/indexer.py using FastAPI BackgroundTasks. It opens an isolated SQLAlchemy session, chunks incoming text, generates 384-dimensional embeddings, stores them in PostgreSQL, and cleans up the server's /tmp/ directory.
Search Engine: Integrated the processing/search.py module, which runs the search queries as raw SQL through the SQLAlchemy session (db.execute()).
Routing: Added the POST /search/ endpoint supporting vector, keyword, and hybrid modes.
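The indexing flow described under Background Processing can be sketched roughly as follows. This is not the code in app/services/indexer.py: `chunk_text`, `index_file`, `fake_embed`, the chunk size and overlap, and the in-memory `store` list are all hypothetical stand-ins for the real chunker, embedding model, and SQLAlchemy session.

```python
import os
from typing import Callable, List, Tuple


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> List[str]:
    """Split text into fixed-size character windows that overlap.

    The size and overlap values are assumptions for illustration.
    """
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("size must be positive and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + size]
        if piece.strip():
            chunks.append(piece)
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks


def fake_embed(chunk: str) -> List[float]:
    """Placeholder for the embedding model: returns a 384-dim vector."""
    return [float(len(chunk))] + [0.0] * 383


def index_file(
    path: str,
    embed: Callable[[str], List[float]],
    store: List[Tuple[str, List[float]]],
) -> int:
    """Read a file, chunk it, embed each chunk, and store the results.

    `store` stands in for the database session. The temporary file is
    removed even if indexing fails, mirroring the cleanup step.
    """
    try:
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        chunks = chunk_text(text)
        for chunk in chunks:
            store.append((chunk, embed(chunk)))
        return len(chunks)
    finally:
        if os.path.exists(path):
            os.remove(path)
```

In a FastAPI route, a function shaped like `index_file` would typically be scheduled with `background_tasks.add_task(index_file, path, embed, store)` so the response returns before indexing finishes.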

How This Was Tested

Verified local Docker container orchestration (the backend waits for the db health check to pass before starting).
Resolved module import paths to ensure clean Uvicorn startup.
Triggered a temporary /test-indexer/ route to ingest a dummy syllabus file.
Successfully executed POST /search/ (Hybrid Mode) via Swagger UI, receiving a 200 OK with correctly fused vector/keyword scoring and text chunks.

Next Steps & Blockers

Blocker: The indexer currently relies on a temporary test route.
Next Action: Awaiting the POST /files/ upload route to be completed so we can route real user files from the web client directly into this background indexing pipeline.

feat: implement pgvector semantic and hybrid rrf search with background indexing

1f9aa7e

Prakhar-Sethi012 wants to merge 1 commit into GDGVIT:dev from Prakhar-Sethi012:Changes_Prakhar
