This file is the source of truth for what's done, what's active, and what's next. The coding agent MUST update this file after completing any task. Last updated: 2026-03-03 - Phase 7 COMPLETE! 🎉
- Turborepo + pnpm workspaces
- Shared packages: @smart-commerce/types, @smart-commerce/errors
- Hono + GraphQL Yoga + MCP server
- Prisma integration
- Commit: 161c325f
- FastAPI + Python LangGraph
- classify + shopper + support agents
- Commit: 254d451b
- /api/agent route (SSE to agent-core)
- /api/copilotkit route
- Deleted lib/agents/ + lib/llm/ (moved to agent-core)
- ProductGrid, CartDrawer, ActionConfirm, OrderTimeline
- Registered in chat.tsx with useCopilotAction
- 18 component tests passing
- docker-compose.yml (4 services: postgres, redis, commerce-api, agent-core)
- docker-compose.langfuse.yml (separate — optional)
- Updated Dockerfiles for monorepo build context
- .env.example with all required vars
- Makefile with memory-safe targets
- 9 docker-compose structure tests passing
- Full stack smoke test
- Playwright E2E tests
- pgvector embeddings for recommendations
- checkout-wizard GenUI component
- cx-proactive.ts port + cron triggers
- Proxy route protections
- Secrets management
- TLS
- Health dashboards
- 88 pre-existing integration test failures (require running infrastructure)
- UCP module exists — should be deleted (replaced by Stripe MCP)
- RAG metrics in README showing old 44%/38%
| Date | Decision | Reason |
|---|---|---|
| Feb 21 | Dropped UCP, using Stripe MCP | UCP is custom/unknown, Stripe MCP is real + official |
| Feb 21 | Dropped Qdrant, pgvector only | Redundant infra, pgvector sufficient at portfolio scale |
| Feb 21 | LangGraph active (not disabled) | Agent orchestration is the core of the project |
| Feb 21 | Azure AI Foundry over Ollama | Production-ready, industry-standard |
| Feb 21 | TDD enforced via CLAUDE.md | Stops hallucination, ensures quality |
| Feb 21 | Real infra for all tests | No mocks for DB/Redis/LLM in integration tests |
| Mar 03 | Monorepo with 3 apps | Clean separation: web, commerce-api, agent-core |
Goal: Real working chat → DB → Azure AI Foundry pipeline
- Verify Docker stack runs (`docker ps -a` shows 3 containers healthy)
- ✅ smart-commerce-postgres (pgvector:pg16) - healthy
- ✅ smart-commerce-redis (redis:7-alpine) - healthy
- ✅ smart-commerce-langfuse (langfuse/latest) - running
- Verify Azure AI Foundry responds (test with curl)
- ✅ Model: gpt-oss-120b responding successfully
- Prisma schema v1 (Customer, Product, Order, Cart, CartItem, Ticket)
- ✅ 15 tables created successfully
- Run migrations + seed 20 products
- ✅ Migration: 20260221060801_init
- ✅ 20 realistic products seeded (MacBook, Sony, iPhone, etc.)
- pgvector HNSW index + tsvector trigger (migration.sql)
- ✅ pgvector extension created
- ✅ HNSW index created (manual SQL)
- ✅ GIN index for full-text search created
- Verify pgvector works: `SELECT vector_dims('[1,2,3]'::vector);`
- ✅ Returns: 3 (pgvector working)
- `lib/agents/state.ts`: AgentState type with 14 intent types
- ✅ Message, toolResults, uiComponents reducers
- ✅ Entities, Sentiment, ToolResult, UIHint types
- ✅ Tests: 7/7 passing
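The reducer items above mean that each graph node returns only the fields it changed, and list fields are appended rather than overwritten. Below is a minimal plain-TypeScript sketch of those semantics. The field names follow the checklist, but the `Message`/`ToolResult` shapes and `applyUpdate` are hypothetical and are not the real `lib/agents/state.ts` (LangGraph attaches an equivalent reducer to each state channel).

```typescript
// Hypothetical shapes: field names mirror the checklist, exact types may differ.
interface Message {
  role: "user" | "assistant";
  content: string;
}

interface ToolResult {
  tool: string;
  output: unknown;
}

interface AgentState {
  userId: string;
  messages: Message[];
  toolResults: ToolResult[];
  uiComponents: string[];
}

// A node returns a partial update containing only what it changed.
type StateUpdate = Partial<AgentState>;

// Reducer: list channels append, scalar channels overwrite only when provided.
// Returns a new object; the previous state is never mutated.
function applyUpdate(state: AgentState, update: StateUpdate): AgentState {
  return {
    userId: update.userId ?? state.userId,
    messages: [...state.messages, ...(update.messages ?? [])],
    toolResults: [...state.toolResults, ...(update.toolResults ?? [])],
    uiComponents: [...state.uiComponents, ...(update.uiComponents ?? [])],
  };
}
```

Appending instead of replacing is what lets the "messages accumulated / userId preserved" checks further down hold across several nodes.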
- `lib/agents/nodes/classify.ts`: intent + entity extraction with Azure AI
- ✅ 14 intent types supported
- ✅ Entity extraction (products, prices, orderIds, emails)
- ✅ Sentiment detection (positive, neutral, negative, frustrated)
- ✅ Fallback to keyword classification on error
- ✅ Tests: 11/11 passing
- `lib/agents/supervisor.ts`: graph with routing logic
- ✅ StateGraph assembly with 8 nodes
- ✅ Intent-based routing (product_search → search_node, etc.)
- ✅ State accumulation through workflow
- ✅ Error handling with fallback
- ✅ Tests: 10/10 passing
- Test: classifyIntent("find wireless headphones") → intent="product_search"
- ✅ Verified with mock Azure AI
- Test: graph persists state across workflow
- ✅ Messages accumulated
- ✅ userId preserved
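The classify and supervisor items above combine two ideas: if the Azure AI call fails, classification falls back to keyword rules, and the resulting intent selects the next node. The sketch below shows both. It assumes a hypothetical five-intent subset of the 14 types and invented node names (only `product_search` and `search_node` appear in the checklist). `llmClassify` stands in for the Azure AI call.

```typescript
// Hypothetical subset of the 14 intent types.
type Intent = "product_search" | "cart_action" | "order_status" | "support" | "general";

// Ordered keyword rules, first match wins. Used only when the LLM call fails.
const KEYWORD_RULES: Array<[Intent, RegExp]> = [
  ["order_status", /\b(track|order|shipping|delivery)\b/i],
  ["cart_action", /\b(cart|add|remove|checkout)\b/i],
  ["support", /\b(refund|return|broken|help|complaint)\b/i],
  ["product_search", /\b(find|search|show|looking for|recommend)\b/i],
];

function classifyByKeywords(text: string): Intent {
  for (const [intent, pattern] of KEYWORD_RULES) {
    if (pattern.test(text)) return intent;
  }
  return "general";
}

// Intent → node name for the supervisor's conditional edge (names other than
// search_node are hypothetical).
const ROUTES: Record<Intent, string> = {
  product_search: "search_node",
  cart_action: "cart_node",
  order_status: "order_node",
  support: "support_node",
  general: "respond_node",
};

// Try the model first; on any error fall back to keyword classification.
async function classifyWithFallback(
  text: string,
  llmClassify: (t: string) => Promise<Intent>,
): Promise<Intent> {
  try {
    return await llmClassify(text);
  } catch {
    return classifyByKeywords(text);
  }
}

function routeFor(intent: Intent): string {
  return ROUTES[intent];
}
```

Keeping the fallback deterministic means the router still receives a valid intent when the model endpoint is down, even if that intent is coarser.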
- `lib/mcp/server.ts`: auth wrapper + Langfuse tracing
- ✅ Tool registration system
- ✅ User authentication (userId requirement)
- ✅ Rate limiting interface
- ✅ Zod argument validation
- ✅ Langfuse tracing integration
- ✅ Error handling
- ✅ Execution metadata (timing, userId, traced)
- ✅ Tests: 17/17 passing
- `lib/mcp/tools.ts`: existing tools integrated with server
- Test: catalog.search with userId enforced (no userId → UNAUTHORIZED)
- Test: cart.add_item with real Docker Postgres
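The auth-wrapper and UNAUTHORIZED items above describe a pattern: wrap every tool so that a missing `userId` is rejected before the handler runs, and attach timing metadata to every result. Below is a minimal sketch of that pattern. `withAuth` and the envelope shape are hypothetical, and the Zod validation, rate limiting and Langfuse tracing done by the real `lib/mcp/server.ts` are omitted.

```typescript
// Hypothetical shapes; not the real lib/mcp/server.ts API.
interface ToolContext {
  userId?: string;
}

interface ToolResultEnvelope<T> {
  ok: boolean;
  data?: T;
  error?: "UNAUTHORIZED" | "TOOL_ERROR";
  meta: { tool: string; userId?: string; durationMs: number };
}

type ToolHandler<A, T> = (args: A, ctx: Required<ToolContext>) => Promise<T>;

function withAuth<A, T>(name: string, handler: ToolHandler<A, T>) {
  return async (args: A, ctx: ToolContext): Promise<ToolResultEnvelope<T>> => {
    const started = Date.now();
    // Execution metadata attached to every result, success or failure.
    const meta = () => ({ tool: name, userId: ctx.userId, durationMs: Date.now() - started });
    // Reject before the handler runs, so no tool touches user data anonymously.
    if (!ctx.userId) {
      return { ok: false, error: "UNAUTHORIZED", meta: meta() };
    }
    try {
      const data = await handler(args, { userId: ctx.userId });
      return { ok: true, data, meta: meta() };
    } catch {
      // Handler errors are wrapped instead of thrown to the MCP client.
      return { ok: false, error: "TOOL_ERROR", meta: meta() };
    }
  };
}
```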
Goal: Hybrid FTS + pgvector search with Azure embeddings
- `lib/rag/semantic-chunker.ts`: semantic chunking with similarity merging
- `lib/rag/reranker.ts`: cross-encoder reranking
- `lib/rag/query-transform.ts`: query rewriting + HyDE
- `lib/rag/semantic-cache.ts`: Redis-backed semantic cache
- Integrate semantic chunking into `indexDocument`
- Wire reranker into `ragQuery`
- Hook query transforms into MCP tools
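"Semantic chunking with similarity merging" can be read as a simple rule: walk the sentences in order and keep adding them to the current chunk while each one is similar enough to the previous one. Below is a minimal sketch of that reading. `embed` stands in for the Azure embedding call, and the 0.8 threshold and 8-sentence cap are hypothetical defaults, not values taken from `lib/rag/semantic-chunker.ts`.

```typescript
// Cosine similarity of two equal-length vectors (0 if either is all zeros).
function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Merge adjacent sentences into one chunk while consecutive embeddings stay
// above the threshold; start a new chunk on a topic shift or at the size cap.
function semanticChunks(
  sentences: string[],
  embed: (s: string) => number[],
  threshold = 0.8, // hypothetical
  maxSentences = 8, // hypothetical
): string[] {
  if (sentences.length === 0) return [];
  const vectors = sentences.map(embed);
  const chunks: string[] = [];
  let current: string[] = [sentences[0]];
  for (let i = 1; i < sentences.length; i++) {
    const similar = cosine(vectors[i - 1], vectors[i]) >= threshold;
    if (similar && current.length < maxSentences) {
      current.push(sentences[i]);
    } else {
      chunks.push(current.join(" "));
      current = [sentences[i]];
    }
  }
  chunks.push(current.join(" "));
  return chunks;
}
```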
- `lib/search/embeddings.ts`: Azure text-embedding-3-small
- `lib/search/hybrid.ts`: FTS candidates → pgvector rerank
- Test: hybridSearch("wireless headphones") returns ranked results
- Test: FTS fallback when query returns 0 semantic matches
- Test: filter by maxPrice works at SQL level (not post-filter)
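The hybrid-search items above describe this flow: full-text search produces candidates (with `maxPrice` already applied in the SQL `WHERE` clause), those candidates are reordered by embedding similarity, and the FTS order is used when nothing clears the similarity bar. In the real service the similarity step would run inside Postgres through pgvector. The sketch below does it in memory only to show the ordering and fallback logic. `Candidate`, `hybridRank`, the 0.3 cut-off and the limit are hypothetical.

```typescript
// Hypothetical row shape returned by the FTS query (already price-filtered in SQL).
interface Candidate {
  id: string;
  ftsRank: number; // e.g. ts_rank from the full-text query
  embedding: number[];
}

function cosineSim(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

function hybridRank(
  candidates: Candidate[],
  queryEmbedding: number[],
  minSimilarity = 0.3, // hypothetical cut-off
  limit = 10,
): Candidate[] {
  // Semantic rerank: keep candidates above the cut-off, most similar first.
  const semantic = candidates
    .map((c) => ({ c, sim: cosineSim(c.embedding, queryEmbedding) }))
    .filter((x) => x.sim >= minSimilarity)
    .sort((x, y) => y.sim - x.sim)
    .map((x) => x.c);
  // FTS fallback: zero semantic matches → keep keyword relevance order.
  const ranked = semantic.length > 0 ? semantic : [...candidates].sort((x, y) => y.ftsRank - x.ftsRank);
  return ranked.slice(0, limit);
}
```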
- Semantic cache in Redis (5min TTL)
- Test: second identical search hits Redis cache
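A semantic cache is a cache whose lookup key is a query embedding rather than an exact string. A lookup hits when a stored query is close enough and still inside the 5-minute TTL. Below is an in-memory sketch of that lookup. In Redis the TTL would be set on each key at write time. The 0.95 threshold and the injectable clock are hypothetical, added so the expiry can be tested.

```typescript
interface CacheEntry<T> {
  embedding: number[];
  value: T;
  expiresAt: number; // epoch ms
}

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// In-memory stand-in for the Redis-backed cache.
class SemanticCache<T> {
  private entries: CacheEntry<T>[] = [];
  private readonly ttlMs: number;
  private readonly threshold: number;
  private readonly now: () => number;

  constructor(ttlMs = 5 * 60 * 1000, threshold = 0.95, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.threshold = threshold;
    this.now = now;
  }

  // Return the value of the most similar unexpired entry above the threshold.
  get(embedding: number[]): T | undefined {
    const t = this.now();
    this.entries = this.entries.filter((e) => e.expiresAt > t); // evict expired entries
    let best: CacheEntry<T> | undefined;
    let bestSim = -Infinity;
    for (const e of this.entries) {
      const sim = cosineSimilarity(e.embedding, embedding);
      if (sim >= this.threshold && sim > bestSim) {
        best = e;
        bestSim = sim;
      }
    }
    return best?.value;
  }

  set(embedding: number[], value: T): void {
    this.entries.push({ embedding, value, expiresAt: this.now() + this.ttlMs });
  }
}
```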
Goal: Full cart cycle + Stripe MCP checkout
- cart.get, cart.add_item, cart.update_quantity, cart.remove_item, cart.clear
- Idempotency: adding same product twice updates quantity (not duplicate row)
- Test: cart total recalculated correctly after update
- Test: remove last item → empty cart (not null cart)
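The cart rules listed above are: adding an existing product increments its quantity, the total is recalculated on every change, and removing the last item leaves an empty cart rather than `null`. Below is a hypothetical in-memory model of those rules. Prices are kept in integer cents to avoid floating-point totals. In Postgres the "no duplicate row" rule would typically be backed by a unique constraint on (cart, product) plus an upsert, which is an assumption about the schema.

```typescript
// Hypothetical in-memory model; the real tools persist CartItem rows via Prisma.
interface CartLine {
  productId: string;
  unitPriceCents: number;
  quantity: number;
}

interface Cart {
  items: CartLine[];
  totalCents: number;
}

// Recompute the total from scratch on every change; an empty list yields total 0.
const recalc = (items: CartLine[]): Cart => ({
  items,
  totalCents: items.reduce((sum, l) => sum + l.unitPriceCents * l.quantity, 0),
});

// Idempotent add: same product → quantity increases, no duplicate line.
function addItem(cart: Cart, productId: string, unitPriceCents: number, quantity = 1): Cart {
  const exists = cart.items.some((l) => l.productId === productId);
  const items = exists
    ? cart.items.map((l) => (l.productId === productId ? { ...l, quantity: l.quantity + quantity } : l))
    : [...cart.items, { productId, unitPriceCents, quantity }];
  return recalc(items);
}

// Setting quantity to 0 or less behaves like removing the line.
function updateQuantity(cart: Cart, productId: string, quantity: number): Cart {
  const items =
    quantity <= 0
      ? cart.items.filter((l) => l.productId !== productId)
      : cart.items.map((l) => (l.productId === productId ? { ...l, quantity } : l));
  return recalc(items);
}

function removeItem(cart: Cart, productId: string): Cart {
  return recalc(cart.items.filter((l) => l.productId !== productId));
}
```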
- `lib/payments/stripe-mcp.ts`: toolkit init
- `lib/payments/idempotency.ts`: key generation + Redis storage
- checkout.start MCP tool → Stripe payment intent via toolkit
- Test: idempotency key prevents duplicate payment intents
- Test: Stripe webhook → order.create_from_cart → order in DB
- Add stripe-mcp container to docker-compose.dev.yml
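The idempotency items above call for a key that stays stable across retries of the same checkout, plus a store that lets only the first caller create a payment intent. Below is a minimal sketch. The key format and `IdempotencyStore` are hypothetical, and the `Map` stands in for Redis, where `SET` with `NX` and an expiry gives the same first-writer-wins behaviour. `createIntent` stands in for the Stripe call. Stripe's API also accepts an idempotency key on POST requests, but whether the MCP toolkit forwards one is not confirmed here.

```typescript
// Hypothetical key format: stable for the same user, cart and cart version.
function idempotencyKey(userId: string, cartId: string, cartVersion: number): string {
  return `checkout:${userId}:${cartId}:v${cartVersion}`;
}

// In-memory stand-in for Redis storage of in-flight/completed checkouts.
class IdempotencyStore {
  private readonly results = new Map<string, Promise<string>>();

  // First call for a key runs createIntent; concurrent or repeated calls
  // with the same key share the same promise (same payment intent id).
  run(key: string, createIntent: () => Promise<string>): Promise<string> {
    const existing = this.results.get(key);
    if (existing) return existing;
    const pending = createIntent().catch((err: unknown) => {
      this.results.delete(key); // allow a retry after a failed attempt
      throw err;
    });
    this.results.set(key, pending);
    return pending;
  }
}
```

Storing the promise instead of the resolved id also covers the case where two requests arrive before the first intent has been created.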
Goal: Agent renders React components, not markdown
- ProductCard — existing in app/dashboard/components/genui/
- OrderCard — existing
- TicketStatus — existing
- ProductGrid — grid of ProductCards with add-to-cart
- CartDrawer — slide-in cart with quantity controls
- CheckoutWizard — Stripe Elements embedded
- OrderConfirmation — post-purchase summary
- OrderTracking — status timeline
- useCopilotAction("catalog.search") → renders
- useCopilotAction("cart.add_item") → renders
- useCopilotAction("checkout.start") → renders
- useCopilotReadable: expose cart + visible products to agent
- orders.list, orders.get, orders.track MCP tools
- support.create_ticket, support.get_ticket MCP tools
- LangGraph refund_node → Stripe MCP refunds.create
- Azure Language NER on support tickets (sentiment tagging)
Goal: RAGAS scores ≥ 70% relevancy, ≥ 75% faithfulness
- `lib/observability/rag-trace.ts`: per-span RAG tracing
- `lib/observability/llm-judge.ts`: LLM-as-judge scoring
- Per-span Langfuse tracing: classify → search → rerank → generate
- `scripts/llm_eval.py`: RAGAS metrics (replace current 44%/38% scores)
- Azure Content Safety on LLM outputs
- Target metrics: relevancy ≥ 70%, faithfulness ≥ 75%
- Azure Language NER on search queries (enrich before FTS)
- Azure SignalR for real-time cart updates
- Azure Event Grid for order.placed → async workers
- Azure Functions: price alerts, abandoned cart, inventory
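The evaluation goal (relevancy ≥ 70%, faithfulness ≥ 75%) can be enforced mechanically by a gate that averages per-sample scores and fails when either average falls below its target. Below is a hypothetical sketch of such a gate. The metric field names and input shape are assumptions, since `scripts/llm_eval.py` defines the real output format. Scores are taken to be in the 0..1 range.

```typescript
// Hypothetical per-sample scores in the 0..1 range.
interface EvalScores {
  answerRelevancy: number;
  faithfulness: number;
}

// Targets from the goal line: relevancy ≥ 0.70, faithfulness ≥ 0.75.
const TARGETS: EvalScores = { answerRelevancy: 0.7, faithfulness: 0.75 };

function mean(xs: number[]): number {
  return xs.length === 0 ? 0 : xs.reduce((a, b) => a + b, 0) / xs.length;
}

// Average each metric over all samples and report every metric below target.
function evaluateGate(samples: EvalScores[]): { pass: boolean; failures: string[] } {
  const averaged: EvalScores = {
    answerRelevancy: mean(samples.map((s) => s.answerRelevancy)),
    faithfulness: mean(samples.map((s) => s.faithfulness)),
  };
  const failures = (Object.keys(TARGETS) as (keyof EvalScores)[])
    .filter((k) => averaged[k] < TARGETS[k])
    .map((k) => `${k}: ${averaged[k].toFixed(2)} < ${TARGETS[k]}`);
  return { pass: failures.length === 0, failures };
}
```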
- Removed Supabase client files
- Azure AI Foundry .env configured
- docker-compose.dev.yml created (PostgreSQL + Redis + Langfuse)
- Prisma adapter for local Postgres
- HLD + LLD documented
- Semantic chunking with similarity merging (22 tests passing)
- Cross-encoder reranker (15 tests passing)
- Query transformation (rewriting + HyDE)
- Semantic cache with Redis
- Pydantic schemas for validation
- LangChain guard chains
- DSPy signatures for optimization
- PII, toxicity, jailbreak detection (24 tests passing)
- Cart tools (update_quantity, remove_item, clear, apply_coupon)
- Checkout tool (checkout.create)
- Order tools (create_from_cart, cancel)
- 61 unit tests passing
- 22 integration tests passing
- Test files created for all new modules
- CLAUDE.md — agent instructions
- AGENTS.md — architecture context
- TASKS.md — living task board
- TRANSFORMATION_REPORT.md — gaps analysis
- IMPLEMENTATION_COMPLETE_SUMMARY.md
- App won't start — Supabase middleware blocking (needs removal or mock)
- LangGraph disabled — routes temporarily disabled, need activation
- RAG metrics in README — showing old 44%/38% (needs update after integration)