TASKS.md — Smart Commerce Agent

This file is the source of truth for what's done, what's active, and what's next. The coding agent MUST update this file after completing any task. Last updated: 2026-03-03 - Phase 7 COMPLETE! 🎉


✅ Completed Phases

Phase 1-2: Monorepo Scaffold

  • Turborepo + pnpm workspaces
  • Shared packages: @smart-commerce/types, @smart-commerce/errors

Phase 3: commerce-api

  • Hono + GraphQL Yoga + MCP server
  • Prisma integration
  • Commit: 161c325f

Phase 4: agent-core

  • FastAPI + Python LangGraph
  • classify + shopper + support agents
  • Commit: 254d451b

Phase 5: Web Proxy Layer

  • /api/agent route (SSE to agent-core)
  • /api/copilotkit route
  • Deleted lib/agents/ + lib/llm/ (moved to agent-core)
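The /api/agent proxy has to re-frame agent-core's SSE stream chunk by chunk. Below is a minimal sketch of that parsing step. It assumes agent-core emits standard `event:`/`data:` fields with `\n` or `\r\n` line endings. `parseSseChunk` and `SseEvent` are hypothetical names, not the route's actual code.

```typescript
// Hypothetical SSE frame parser for the /api/agent proxy (illustrative only).
// Basic SSE rules: an event ends at a blank line, "data:" lines are joined
// with "\n", lines starting with ":" are comments, default event is "message".

interface SseEvent {
  event: string;
  data: string;
}

// Parses every complete event in `buffer` and returns the trailing partial
// event as `rest`, which the caller prepends to the next network chunk.
function parseSseChunk(buffer: string): { events: SseEvent[]; rest: string } {
  const blocks = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = blocks.pop() ?? ""; // may be an incomplete event
  const events: SseEvent[] = [];

  for (const block of blocks) {
    let event = "message";
    const dataLines: string[] = [];
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // drop one leading space
      if (field === "event") event = value;
      else if (field === "data") dataLines.push(value);
    }
    // Events without any data field are not dispatched.
    if (dataLines.length > 0) events.push({ event, data: dataLines.join("\n") });
  }
  return { events, rest };
}
```

Each chunk from the upstream reader would be decoded with `TextDecoder` (using `{ stream: true }`) and passed in as `rest + chunk`.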

Phase 6: GenUI Components

  • ProductGrid, CartDrawer, ActionConfirm, OrderTimeline
  • Registered in chat.tsx with useCopilotAction
  • 18 component tests passing

Phase 7: Docker + Makefile + Env

  • docker-compose.yml (4 services: postgres, redis, commerce-api, agent-core)
  • docker-compose.langfuse.yml (separate — optional)
  • Updated Dockerfiles for monorepo build context
  • .env.example with all required vars
  • Makefile with memory-safe targets
  • 9 docker-compose structure tests passing

⏳ Remaining Phases

Phase 8: E2E Verification

  • Full stack smoke test
  • Playwright E2E tests

Phase 9: Taste Vector

  • pgvector embeddings for recommendations
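Phase 9 does not spell out how the taste vector is built. This sketch assumes one common approach: average the embeddings of products a user has interacted with, then rank other products by cosine similarity to that mean. `meanVector`, `recommend`, and the `Candidate` shape are hypothetical. In the real pipeline, the ranking would run inside Postgres rather than in application code.

```typescript
// Hypothetical taste-vector sketch: mean of a user's product embeddings,
// then rank unseen candidates by cosine similarity. Not the project's code.

type Vector = number[];

function meanVector(vectors: Vector[]): Vector {
  if (vectors.length === 0) throw new Error("need at least one embedding");
  const dims = vectors[0].length;
  const sum: Vector = new Array(dims).fill(0);
  for (const v of vectors) {
    if (v.length !== dims) throw new Error("dimension mismatch");
    for (let i = 0; i < dims; i++) sum[i] += v[i];
  }
  return sum.map((x) => x / vectors.length);
}

function cosineSimilarity(a: Vector, b: Vector): number {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Candidate {
  id: string;
  embedding: Vector;
}

// Top-k candidate ids, most similar first, excluding products already in history.
function recommend(history: Candidate[], candidates: Candidate[], k: number): string[] {
  const taste = meanVector(history.map((c) => c.embedding));
  const seen = new Set(history.map((c) => c.id));
  return candidates
    .filter((c) => !seen.has(c.id))
    .map((c) => ({ id: c.id, score: cosineSimilarity(taste, c.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k)
    .map((c) => c.id);
}
```

In pgvector, the equivalent ranking is `ORDER BY embedding <=> $1`. Because `<=>` is cosine distance, ascending order puts the most similar products first.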

Phase 10: Stripe MCP Payment Flow

  • checkout-wizard GenUI component

Phase 11: Proactive Agent

  • cx-proactive.ts port + cron triggers

Phase 12: Rate Limiting + Circuit Breaker

  • Proxy route protections
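Phase 12 names the protections but not their design. Below is a minimal sketch of a classic three-state circuit breaker (closed, open, half-open) that a proxy route could consult before calling agent-core. The class name, thresholds, and injectable clock are hypothetical. Rate limiting would be a separate mechanism.

```typescript
// Hypothetical circuit breaker for the proxy routes (not the project's code).
// closed:    requests pass; consecutive failures are counted.
// open:      requests are rejected until resetTimeoutMs has elapsed.
// half-open: requests pass as trials; a success closes the circuit and a
//            failure reopens it. Concurrent trials are not limited here.

type BreakerState = "closed" | "open" | "half-open";

class CircuitBreaker {
  private state: BreakerState = "closed";
  private failures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, resetTimeoutMs: number, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  // Call before forwarding; false means fail fast (e.g. respond 503).
  canRequest(): boolean {
    if (this.state === "open" && this.now() - this.openedAt >= this.resetTimeoutMs) {
      this.state = "half-open";
    }
    return this.state !== "open";
  }

  recordSuccess(): void {
    this.failures = 0;
    this.state = "closed";
  }

  recordFailure(): void {
    this.failures++;
    if (this.state === "half-open" || this.failures >= this.failureThreshold) {
      this.state = "open";
      this.openedAt = this.now();
    }
  }

  getState(): BreakerState {
    return this.state;
  }
}
```

A route handler would call `canRequest()` first. After the upstream response arrives, it would call `recordSuccess()` or `recordFailure()`.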

Phase 13: Production Hardening

  • Secrets management
  • TLS
  • Health dashboards

🐛 Known Issues

  1. 88 pre-existing integration test failures (require running infrastructure)
  2. UCP module exists — should be deleted (replaced by Stripe MCP)
  3. RAG metrics in README showing old 44%/38%

📝 Architecture Decisions Log

| Date | Decision | Reason |
|---|---|---|
| Feb 21 | Dropped UCP, using Stripe MCP | UCP is custom/unknown, Stripe MCP is real + official |
| Feb 21 | Dropped Qdrant, pgvector only | Redundant infra, pgvector sufficient at portfolio scale |
| Feb 21 | LangGraph active (not disabled) | Agent orchestration is the core of the project |
| Feb 21 | Azure AI Foundry over Ollama | Production-ready, industry-standard |
| Feb 21 | TDD enforced via CLAUDE.md | Stops hallucination, ensures quality |
| Feb 21 | Real infra for all tests | No mocks for DB/Redis/LLM in integration tests |
| Mar 03 | Monorepo with 3 apps | Clean separation: web, commerce-api, agent-core |

Goal: Real working chat → DB → Azure AI Foundry pipeline

Core Infrastructure

  • Verify Docker stack runs (docker ps -a shows 3 containers healthy)
    • ✅ smart-commerce-postgres (pgvector:pg16) - healthy
    • ✅ smart-commerce-redis (redis:7-alpine) - healthy
    • ✅ smart-commerce-langfuse (langfuse/latest) - running
  • Verify Azure AI Foundry responds (test with curl)
    • ✅ Model: gpt-oss-120b responding successfully
  • Prisma schema v1 (Customer, Product, Order, Cart, CartItem, Ticket)
    • ✅ 15 tables created successfully
  • Run migrations + seed 20 products
    • ✅ Migration: 20260221060801_init
    • ✅ 20 realistic products seeded (MacBook, Sony, iPhone, etc.)
  • pgvector HNSW index + tsvector trigger (migration.sql)
    • ✅ pgvector extension created
    • ✅ HNSW index created (manual SQL)
    • ✅ GIN index for full-text search created
  • Verify pgvector works: SELECT vector_dims('[1,2,3]'::vector);
    • ✅ Returns: 3 (pgvector working)

✅ Phase 2: LangGraph Agent - COMPLETE! 🎉

LangGraph Agent (activate — don't disable)

  • lib/agents/state.ts — AgentState type with 14 intent types
    • ✅ Message, toolResults, uiComponents reducers
    • ✅ Entities, Sentiment, ToolResult, UIHint types
    • ✅ Tests: 7/7 passing
  • lib/agents/nodes/classify.ts — intent + entity extraction with Azure AI
    • ✅ 14 intent types supported
    • ✅ Entity extraction (products, prices, orderIds, emails)
    • ✅ Sentiment detection (positive, neutral, negative, frustrated)
    • ✅ Fallback to keyword classification on error
    • ✅ Tests: 11/11 passing
  • lib/agents/supervisor.ts — graph with routing logic
    • ✅ StateGraph assembly with 8 nodes
    • ✅ Intent-based routing (product_search → search_node, etc.)
    • ✅ State accumulation through workflow
    • ✅ Error handling with fallback
    • ✅ Tests: 10/10 passing
  • Test: classifyIntent("find wireless headphones") → intent="product_search"
    • ✅ Verified with mock Azure AI
  • Test: graph persists state across workflow
    • ✅ Messages accumulated
    • ✅ userId preserved
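classify.ts falls back to keyword matching when the Azure AI call fails, and supervisor.ts picks the next node from the intent. Below is a minimal sketch of both pieces. Only four of the 14 intents are shown. The keyword lists and every node name except `search_node` are hypothetical. The logic is the same whether it lives in TypeScript or in agent-core.

```typescript
// Hypothetical keyword fallback and intent routing (illustrative only).

type Intent = "product_search" | "cart_add" | "order_status" | "support" | "general";

// Checked in order; the first intent with a matching keyword wins.
const KEYWORDS: Array<[Intent, string[]]> = [
  ["order_status", ["where is my order", "track", "order status"]],
  ["cart_add", ["add to cart", "add this"]],
  ["support", ["refund", "broken", "complaint"]],
  ["product_search", ["find", "search", "show me", "looking for"]],
];

function keywordClassify(message: string): Intent {
  const text = message.toLowerCase();
  for (const [intent, words] of KEYWORDS) {
    if (words.some((w) => text.includes(w))) return intent;
  }
  return "general";
}

// The LLM classifier is injected; any error falls back to keyword matching.
async function classifyIntent(
  message: string,
  llmClassify: (m: string) => Promise<Intent>,
): Promise<Intent> {
  try {
    return await llmClassify(message);
  } catch {
    return keywordClassify(message);
  }
}

// Intent -> next graph node (the kind of mapping a conditional edge uses).
const ROUTES: Record<Intent, string> = {
  product_search: "search_node",
  cart_add: "cart_node",
  order_status: "order_node",
  support: "support_node",
  general: "respond_node",
};

function routeIntent(intent: Intent): string {
  return ROUTES[intent];
}
```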

✅ Phase 3: MCP Tool Layer - COMPLETE! 🎉

MCP Tool Layer

  • lib/mcp/server.ts — auth wrapper + Langfuse tracing
    • ✅ Tool registration system
    • ✅ User authentication (userId requirement)
    • ✅ Rate limiting interface
    • ✅ Zod argument validation
    • ✅ Langfuse tracing integration
    • ✅ Error handling
    • ✅ Execution metadata (timing, userId, traced)
    • ✅ Tests: 17/17 passing
  • lib/mcp/tools.ts — existing tools integrated with server
  • Test: catalog.search with userId enforced (no userId → UNAUTHORIZED)
  • Test: cart.add_item with real Docker Postgres
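server.ts wraps every tool with a userId check and argument validation. Below is a dependency-free sketch of that wrapper. It uses a hand-written validator in place of Zod so it runs without packages, and it uses synchronous handlers to stay short. `registerTool`, `callTool`, and the result shape are hypothetical.

```typescript
// Hypothetical MCP tool wrapper: auth check, argument validation, timing metadata.
// A plain validator function stands in for the Zod schema used in server.ts.

type ToolResult =
  | { ok: true; data: unknown; meta: { userId: string; durationMs: number } }
  | { ok: false; error: "UNAUTHORIZED" | "INVALID_ARGS" | "UNKNOWN_TOOL" | "TOOL_ERROR"; message: string };

interface ToolDef<A> {
  validate: (raw: unknown) => A | null; // null means the arguments are invalid
  handler: (args: A, userId: string) => unknown;
}

const tools = new Map<string, ToolDef<any>>();

function registerTool<A>(name: string, def: ToolDef<A>): void {
  tools.set(name, def);
}

function callTool(name: string, rawArgs: unknown, userId?: string): ToolResult {
  if (!userId) return { ok: false, error: "UNAUTHORIZED", message: "userId is required" };
  const tool = tools.get(name);
  if (!tool) return { ok: false, error: "UNKNOWN_TOOL", message: name };
  const args = tool.validate(rawArgs);
  if (args === null) return { ok: false, error: "INVALID_ARGS", message: `bad args for ${name}` };
  const start = Date.now();
  try {
    const data = tool.handler(args, userId);
    return { ok: true, data, meta: { userId, durationMs: Date.now() - start } };
  } catch (err) {
    return { ok: false, error: "TOOL_ERROR", message: String(err) };
  }
}

// Example registration: catalog.search requires a non-empty string query.
registerTool<{ query: string }>("catalog.search", {
  validate: (raw) => {
    const q = (raw as { query?: unknown } | null)?.query;
    return typeof q === "string" && q.length > 0 ? { query: q } : null;
  },
  handler: (args) => [`results for "${args.query}"`],
});
```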

🟡 Phase 2: Search Pipeline (Week 1-2)

Goal: Hybrid FTS + pgvector search with Azure embeddings

RAG Enhancements (IMPLEMENTED - needs integration)

  • lib/rag/semantic-chunker.ts — semantic chunking with similarity merging
  • lib/rag/reranker.ts — cross-encoder reranking
  • lib/rag/query-transform.ts — query rewriting + HyDE
  • lib/rag/semantic-cache.ts — Redis-backed semantic cache
  • Integrate semantic chunking into indexDocument
  • Wire reranker into ragQuery
  • Hook query transforms into MCP tools
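semantic-chunker.ts merges adjacent text by similarity. Below is a minimal sketch of that idea. It assumes sentence-level units, a caller-supplied embedding function (real code would call the Azure embedding model), and a fixed cosine threshold. `semanticChunk` and the 0.8 default are hypothetical.

```typescript
// Hypothetical semantic chunker: start a new chunk whenever the next
// sentence's embedding is not similar enough to the previous sentence's.

type Embed = (text: string) => number[];

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function semanticChunk(sentences: string[], embed: Embed, threshold = 0.8): string[] {
  const chunks: string[] = [];
  let current: string[] = [];
  let prev: number[] | null = null;
  for (const s of sentences) {
    const vec = embed(s);
    // A similarity drop marks a topic boundary: close the current chunk.
    if (prev !== null && cosine(prev, vec) < threshold) {
      chunks.push(current.join(" "));
      current = [];
    }
    current.push(s);
    prev = vec;
  }
  if (current.length > 0) chunks.push(current.join(" "));
  return chunks;
}
```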

Search Implementation

  • lib/search/embeddings.ts — Azure text-embedding-3-small
  • lib/search/hybrid.ts — FTS candidates → pgvector rerank
  • Test: hybridSearch("wireless headphones") returns ranked results
  • Test: FTS fallback when query returns 0 semantic matches
  • Test: filter by maxPrice works at SQL level (not post-filter)
  • Semantic cache in Redis (5min TTL)
  • Test: second identical search hits Redis cache
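hybrid.ts takes FTS candidates and reranks them with pgvector. The in-memory sketch below mirrors that flow under stated assumptions:

- A term-count score stands in for Postgres `ts_rank`.
- maxPrice is applied while selecting candidates, the way a SQL WHERE clause would apply it.
- Candidates are reranked by cosine similarity to the query embedding.
- If no candidate clears a similarity bar, the FTS order is kept.

`hybridSearch`, `ftsScore`, and the 0.5 default are hypothetical.

```typescript
// Hypothetical in-memory stand-in for hybrid.ts (real code runs FTS + pgvector in SQL).

interface Product {
  id: string;
  name: string;
  price: number;
  embedding: number[];
}

function cosine(a: number[], b: number[]): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Naive FTS substitute: number of query terms found in the product name.
function ftsScore(query: string, p: Product): number {
  const name = p.name.toLowerCase();
  return query.toLowerCase().split(/\s+/).filter((t) => t.length > 0 && name.includes(t)).length;
}

function hybridSearch(
  query: string,
  queryEmbedding: number[],
  products: Product[],
  opts: { maxPrice?: number; minSimilarity?: number; limit?: number } = {},
): Product[] {
  const { maxPrice, minSimilarity = 0.5, limit = 10 } = opts;
  // 1. Candidates: price filter and FTS match together (like a WHERE clause).
  const candidates = products
    .filter((p) => maxPrice === undefined || p.price <= maxPrice)
    .map((p) => ({ p, fts: ftsScore(query, p) }))
    .filter((c) => c.fts > 0)
    .sort((a, b) => b.fts - a.fts);
  // 2. Semantic rerank of the FTS candidates.
  const reranked = candidates
    .map((c) => ({ p: c.p, sim: cosine(queryEmbedding, c.p.embedding) }))
    .filter((c) => c.sim >= minSimilarity)
    .sort((a, b) => b.sim - a.sim);
  // 3. Fallback: nothing passes the similarity bar, so keep FTS order.
  const ranked = reranked.length > 0 ? reranked.map((c) => c.p) : candidates.map((c) => c.p);
  return ranked.slice(0, limit);
}
```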

🟡 Phase 3: Cart + Checkout (Week 2)

Goal: Full cart cycle + Stripe MCP checkout

Cart MCP Tools

  • cart.get, cart.add_item, cart.update_quantity, cart.remove_item, cart.clear
  • Idempotency: adding the same product twice updates its quantity (no duplicate row)
  • Test: cart total recalculated correctly after update
  • Test: remove last item → empty cart (not null cart)
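The cart rules above (upsert on a repeat add, a recalculated total, an empty cart rather than a null one) can be pinned down in pure functions before they are wired to Prisma. Below is a minimal in-memory sketch. The types and function names are hypothetical. Prices are in integer cents to avoid floating-point drift, which is an assumption.

```typescript
// Hypothetical pure cart logic mirroring the cart MCP tool rules.

interface CartItem {
  productId: string;
  unitPriceCents: number;
  quantity: number;
}

interface Cart {
  items: CartItem[];
  totalCents: number;
}

// Every operation rebuilds the cart through here, so the total is never stale.
function withTotal(items: CartItem[]): Cart {
  const totalCents = items.reduce((sum, i) => sum + i.unitPriceCents * i.quantity, 0);
  return { items, totalCents };
}

// Adding a product already in the cart bumps its quantity instead of adding a row.
function addItem(cart: Cart, productId: string, unitPriceCents: number, quantity = 1): Cart {
  const exists = cart.items.some((i) => i.productId === productId);
  const items = exists
    ? cart.items.map((i) => (i.productId === productId ? { ...i, quantity: i.quantity + quantity } : i))
    : [...cart.items, { productId, unitPriceCents, quantity }];
  return withTotal(items);
}

// A quantity of 0 or less removes the line.
function updateQuantity(cart: Cart, productId: string, quantity: number): Cart {
  const items = cart.items
    .map((i) => (i.productId === productId ? { ...i, quantity } : i))
    .filter((i) => i.quantity > 0);
  return withTotal(items);
}

function removeItem(cart: Cart, productId: string): Cart {
  return withTotal(cart.items.filter((i) => i.productId !== productId));
}

// Removing the last item yields this shape, never null.
const emptyCart = (): Cart => withTotal([]);
```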

Stripe MCP Integration

  • lib/payments/stripe-mcp.ts — toolkit init
  • lib/payments/idempotency.ts — key generation + Redis storage
  • checkout.start MCP tool → Stripe payment intent via toolkit
  • Test: idempotency key prevents duplicate payment intents
  • Test: Stripe webhook → order.create_from_cart → order in DB
  • Add stripe-mcp container to docker-compose.dev.yml
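idempotency.ts generates keys and stores them in Redis so that a retried checkout.start does not create a second payment intent. Below is a minimal sketch of that pattern, with these assumptions:

- A `Map` stands in for Redis.
- `createPaymentIntent` is a hypothetical callback, not the Stripe toolkit API.
- The key is derived from the user, the cart, and the cart contents.

Stripe's API also accepts an idempotency key on create requests, so the same key could be forwarded there.

```typescript
// Hypothetical idempotent checkout sketch. In production the Map would be
// Redis (e.g. SET key value NX EX <ttl>) so concurrent retries are covered too.

interface LineItem {
  productId: string;
  quantity: number;
}

// Deterministic key: same user + cart + contents => same key, regardless of item order.
function idempotencyKey(userId: string, cartId: string, items: LineItem[]): string {
  const contents = [...items]
    .sort((a, b) => a.productId.localeCompare(b.productId))
    .map((i) => `${i.productId}x${i.quantity}`)
    .join(",");
  return `checkout:${userId}:${cartId}:${contents}`;
}

const store = new Map<string, string>(); // key -> payment intent id (Redis stand-in)

function startCheckout(
  key: string,
  createPaymentIntent: () => string,
): { intentId: string; reused: boolean } {
  const existing = store.get(key);
  if (existing !== undefined) return { intentId: existing, reused: true };
  const intentId = createPaymentIntent();
  store.set(key, intentId);
  return { intentId, reused: false };
}
```

Including the cart contents in the key means an edited cart gets a fresh payment intent, while a plain retry reuses the existing one.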

🟡 Phase 4: GenUI + CopilotKit (Week 2-3)

Goal: Agent renders React components, not markdown

shadcn Components

  • ProductCard — existing in app/dashboard/components/genui/
  • OrderCard — existing
  • TicketStatus — existing
  • ProductGrid — grid of ProductCards with add-to-cart
  • CartDrawer — slide-in cart with quantity controls
  • CheckoutWizard — Stripe Elements embedded
  • OrderConfirmation — post-purchase summary
  • OrderTracking — status timeline

CopilotKit Actions

  • useCopilotAction("catalog.search") → renders
  • useCopilotAction("cart.add_item") → renders
  • useCopilotAction("checkout.start") → renders
  • useCopilotReadable: expose cart + visible products to agent

🟡 Phase 5: Orders + Support (Week 3)

  • orders.list, orders.get, orders.track MCP tools
  • support.create_ticket, support.get_ticket MCP tools
  • LangGraph refund_node → Stripe MCP refunds.create
  • Azure Language NER on support tickets (sentiment tagging)

🟡 Phase 6: Observability + Evals (Week 3-4)

Goal: RAGAS scores ≥ 70% relevancy, ≥ 75% faithfulness

Observability

  • lib/observability/rag-trace.ts — per-span RAG tracing
  • lib/observability/llm-judge.ts — LLM-as-judge scoring
  • Per-span Langfuse tracing: classify → search → rerank → generate
  • scripts/llm_eval.py — RAGAS metrics (replace current 44%/38% scores)
  • Azure Content Safety on LLM outputs
  • Target metrics: relevancy ≥ 70%, faithfulness ≥ 75%
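scripts/llm_eval.py computes the RAGAS metrics, and a CI step could then fail whenever they miss the targets. Below is a sketch of such a gate. It assumes scores arrive as fractions in [0, 1] and uses the ≥ thresholds from the phase goal (70% relevancy, 75% faithfulness). `EvalScores` and `checkEvalGate` are hypothetical.

```typescript
// Hypothetical eval gate: list every metric that misses its target.

interface EvalScores {
  relevancy: number; // fraction in [0, 1]
  faithfulness: number; // fraction in [0, 1]
}

const TARGETS: EvalScores = { relevancy: 0.7, faithfulness: 0.75 };

// Returns one message per failing metric; an empty array means the gate passes.
function checkEvalGate(scores: EvalScores): string[] {
  const failures: string[] = [];
  for (const metric of Object.keys(TARGETS) as Array<keyof EvalScores>) {
    if (scores[metric] < TARGETS[metric]) {
      failures.push(`${metric} ${(scores[metric] * 100).toFixed(1)}% < ${Math.round(TARGETS[metric] * 100)}%`);
    }
  }
  return failures;
}
```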

🟡 Phase 7: Azure AI Services (Week 4)

  • Azure Language NER on search queries (enrich before FTS)
  • Azure SignalR for real-time cart updates
  • Azure Event Grid for order.placed → async workers
  • Azure Functions: price alerts, abandoned cart, inventory

✅ Completed

Infrastructure

  • Removed Supabase client files
  • Azure AI Foundry .env configured
  • docker-compose.dev.yml created (PostgreSQL + Redis + Langfuse)
  • Prisma adapter for local Postgres
  • HLD + LLD documented

RAG Enhancements (IMPLEMENTED)

  • Semantic chunking with similarity merging (22 tests passing)
  • Cross-encoder reranker (15 tests passing)
  • Query transformation (rewriting + HyDE)
  • Semantic cache with Redis

Guardrails (IMPLEMENTED)

  • Pydantic schemas for validation
  • LangChain guard chains
  • DSPy signatures for optimization
  • PII, toxicity, jailbreak detection (24 tests passing)
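PII detection appears alongside toxicity and jailbreak checks. A deliberately simple regex redactor shows the shape of such a check. The patterns are assumptions (email, US-style phone, 13 to 16 digit card numbers) and are far less thorough than a production guardrail. `redactPii` is a hypothetical name.

```typescript
// Hypothetical regex PII redactor (illustrative; real guardrails need much more).
// CARD runs before PHONE so long digit runs are labeled as cards first.

const PII_PATTERNS: Array<[string, RegExp]> = [
  ["EMAIL", /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g],
  ["CARD", /\b(?:\d[ -]?){12,15}\d\b/g],
  ["PHONE", /\b\d{3}[-. ]\d{3}[-. ]\d{4}\b/g],
];

// Replaces each match with [LABEL] and records which labels were found.
function redactPii(text: string): { redacted: string; found: string[] } {
  const found: string[] = [];
  let redacted = text;
  for (const [label, pattern] of PII_PATTERNS) {
    redacted = redacted.replace(pattern, () => {
      found.push(label);
      return `[${label}]`;
    });
  }
  return { redacted, found };
}
```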

MCP Tools (IMPLEMENTED)

  • Cart tools (update_quantity, remove_item, clear, apply_coupon)
  • Checkout tool (checkout.create)
  • Order tools (create_from_cart, cancel)

Testing

  • 61 unit tests passing
  • 22 integration tests passing
  • Test files created for all new modules

Documentation

  • CLAUDE.md — agent instructions
  • AGENTS.md — architecture context
  • TASKS.md — living task board
  • TRANSFORMATION_REPORT.md — gaps analysis
  • IMPLEMENTATION_COMPLETE_SUMMARY.md

🐛 Known Issues / Blockers

  1. App won't start — Supabase middleware blocking (needs removal or mock)
  2. LangGraph disabled — routes temporarily disabled, need activation
  3. RAG metrics in README — showing old 44%/38% (needs update after integration)
  4. UCP module exists — should be deleted (replaced by Stripe MCP)
