This repository contains the Minimum Viable Product (MVP) of a B2B lead generation system for agritech companies and agribusinesses. It emphasizes modularity, zero- or minimal-cost tooling, and end-to-end workflows that run from lead discovery through multi-channel outreach to feedback loops.

The system integrates AI, RAG-enhanced outreach, lead scoring, a matchmaking engine, and multi-channel messaging, while remaining reproducible on the free tiers of cloud and API tools.

| Layer | Tools / Frameworks | Notes |
|---|---|---|
| Infrastructure & Hosting | GCP (VMs, Cloud SQL, Cloud Storage, Cloud Scheduler, Cloud Tasks) | Free tier leveraged where possible |
| Backend & API | Python, FastAPI, Docker | RESTful endpoints for modular services |
| Data Enrichment & Processing | Python (Pandas, NumPy, BeautifulSoup, Scrapy) | Web scraping and enrichment scripts |
| AI & RAG Integration | HuggingFace Transformers, Sentence-Transformers, Pinecone | Local embeddings and vector search for context retrieval |
| Analytics & Monitoring | BigQuery, GCP Cloud Logging & Monitoring | Optional, scalable for production |
| CI/CD | GitHub Actions | Automated testing & deployment |
| Authentication | OAuth2 via Firebase / Identity Platform | Secure user login & role management |

| Module | Description | Notes |
|---|---|---|
| Lead Discovery | Scrape and compile leads from LinkedIn, Apollo.io, RSS feeds | Free-tier APIs, Python scripts |
| Lead Qualification | Weighted scoring of leads based on firmographics & tech adoption | Requires normalization & weighting formulas |
| Contextual Data Enrichment | Pull public data for leads | Logs stored in GCS |
| RAG-Enhanced Outreach | Use embeddings and LLMs to generate contextual messages | Pinecone for vector search, HuggingFace Transformers for local LLM inference |
| Multi-Channel Messaging | Email (SMTP), LinkedIn, optional social channels | Engagement tracking |
| Lead Scoring | Weighted score combining engagement, tech fit, compatibility | Pandas-based computation |
| CRM Sync Integration | Sync leads to Google Sheets for MVP | Google Sheets API integration |
| Human-in-the-Loop Review | Manual review interface for lead validation | Google Sheets |
| Automated Follow-ups | Schedule email follow-ups | GCP Cloud Scheduler + Python scripts |
| Feedback Learning Loop | Track engagement & update scoring | Adjust weights dynamically |
| Database Management | PostgreSQL (GCP Cloud SQL) | Includes relational & vector DB |
| Compliance Checking | GDPR & CAN-SPAM adherence | Opt-out enforcement & public data only |
| API Integrations | LinkedIn, Apollo.io, Clearbit | Python requests library |
| A/B Outreach Testing | Track message variants | Pandas for analysis |
| Competitor Mapping | Track competitors & market insights | Scraping scripts |
| Market Trend Analysis | Extract & score trends | Free feeds, Google Alerts |
| Tech Readiness Scoring | Normalize tech adoption metrics | Scaled to 0–1 for compatibility with lead scoring |
| Matchmaking Engine | Rank leads against user profiles | Weighted geometric mean |
| User Quality Matching | Assess lead-user fit | Uses feedback loop to refine weights |
| Scalable Deployment | Dockerized services on GCP VM | Supports future growth |
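
The A/B Outreach Testing module tracks message variants but does not specify how variants are compared. One common choice, swapped in here as an assumption, is a two-proportion z-test on reply rates. The minimal sketch below uses only the Python standard library; `VariantStats`, `compare_variants`, and the sample counts are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class VariantStats:
    """Aggregated outreach results for one message variant (hypothetical structure)."""
    name: str
    sent: int
    replies: int

    @property
    def reply_rate(self) -> float:
        return self.replies / self.sent if self.sent else 0.0


def compare_variants(a: VariantStats, b: VariantStats) -> tuple[float, float]:
    """Two-proportion z-test on reply rates; returns (z, two-sided p-value)."""
    if a.sent == 0 or b.sent == 0:
        raise ValueError("each variant needs at least one sent message")
    # Pooled reply rate under the null hypothesis that both variants perform equally.
    pooled = (a.replies + b.replies) / (a.sent + b.sent)
    se = math.sqrt(pooled * (1 - pooled) * (1 / a.sent + 1 / b.sent))
    if se == 0:
        return 0.0, 1.0  # all-or-nothing outcomes on both sides: no evidence of a difference
    z = (b.reply_rate - a.reply_rate) / se
    # Two-sided p-value from the standard normal distribution, expressed via erfc.
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Hypothetical example: two subject lines, 400 emails each.
z, p = compare_variants(VariantStats("subject_a", sent=400, replies=20),
                        VariantStats("subject_b", sent=400, replies=36))
```

A small p-value (for example below 0.05) suggests the difference in reply rates is unlikely to be noise; with only a few hundred sends per variant, small differences will usually not reach that level.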

Core Databases

- Lead Database: `leads`, `lead_contacts`, `lead_scores`, `lead_enrichment`
- User / Account Database: `users`, `user_profiles`, `user_activity`
- Activity & Engagement Database: `outreach_events`, `engagement_metrics`, `ab_tests`
- Competitor & Market Intelligence Database (optional): `competitors`, `market_trends`
- Vector Database: `lead_embeddings`, `competitor_embeddings`, `doc_embeddings`

Notes:

- Relational DB (PostgreSQL) for structured data.
- Vector DB (Pinecone or FAISS) for embeddings and RAG retrieval.
- Logging DB (optional) for debugging and job tracking.
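
RAG retrieval looks up the stored embeddings closest to a query. The sketch below is a brute-force, in-memory stand-in for that lookup, not the Pinecone or FAISS API: it ranks vectors by cosine similarity. The `top_k` helper and the toy 3-dimensional vectors are hypothetical; real vectors would come from a Sentence-Transformers model and live in the `doc_embeddings` store.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 3) -> list[tuple[str, float]]:
    """Return the k ids in `index` most similar to `query`, best first (brute-force scan)."""
    scored = [(doc_id, cosine(query, vec)) for doc_id, vec in index.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:k]


# Hypothetical toy embeddings standing in for real Sentence-Transformers output.
doc_embeddings = {
    "irrigation_case_study": [0.9, 0.1, 0.0],
    "drone_mapping_brief": [0.1, 0.9, 0.2],
    "soil_sensor_faq": [0.8, 0.2, 0.1],
}
hits = top_k([1.0, 0.0, 0.0], doc_embeddings, k=2)
```

A brute-force scan touches every vector, which is fine for prototyping; Pinecone and FAISS exist to make the same lookup fast at scale.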
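
Lead Score Formula

$$
\text{score} = 0.4 \cdot \text{tech\_readiness} + 0.3 \cdot \text{engagement} + 0.2 \cdot \text{company\_size} + 0.1 \cdot \text{decision\_maker\_role}
$$

Thresholds (on a 0–100 scale; because the weights sum to 1, components normalized to 0–1 give a score in 0–1, which is multiplied by 100 before these cut-offs apply):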
- High: >70
- Medium: 40–70
- Low: <40
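
The formula and thresholds can be sketched in plain Python as below, assuming every component has already been normalized to 0–1 and the weighted sum is multiplied by 100 to match the 0–100 thresholds. `lead_score` and `score_tier` are hypothetical names; the same weighted sum could be computed column-wise in Pandas.

```python
# Weights from the lead score formula; they sum to 1.0.
WEIGHTS = {
    "tech_readiness": 0.4,
    "engagement": 0.3,
    "company_size": 0.2,
    "decision_maker_role": 0.1,
}


def lead_score(components: dict[str, float]) -> float:
    """Weighted lead score on a 0-100 scale; each component must already be in 0-1."""
    for name in WEIGHTS:
        value = components[name]
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be normalized to 0-1, got {value}")
    return 100 * sum(weight * components[name] for name, weight in WEIGHTS.items())


def score_tier(score: float) -> str:
    """Map a 0-100 score to High (>70), Medium (40-70), or Low (<40)."""
    if score > 70:
        return "High"
    if score >= 40:
        return "Medium"
    return "Low"


# Hypothetical lead with pre-normalized components.
score = lead_score({
    "tech_readiness": 0.8,
    "engagement": 0.5,
    "company_size": 0.6,
    "decision_maker_role": 1.0,
})
tier = score_tier(score)
```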

Tech Readiness

- Normalize ordinal and nominal data to the 0–1 range
- Map tech adoption indicators (IoT, AI, automation) to a score
- Impute missing values with the median of the observed values
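
A minimal sketch of the tech readiness steps, under assumptions not stated above: a hypothetical four-level `maturity` field stands in for the ordinal data, the IoT, AI, and automation indicators are treated as yes/no flags, and all features are weighted equally. Missing values are imputed with the median of the observed values for each feature.

```python
import statistics
from typing import Optional

# Hypothetical ordinal scale for a "maturity" field; the real levels are not specified.
MATURITY_LEVELS = ["none", "basic", "intermediate", "advanced"]
# Tech adoption indicators from the README, treated here as yes/no (nominal) flags.
INDICATORS = ["iot", "ai", "automation"]


def normalize_ordinal(level: Optional[str]) -> Optional[float]:
    """Map an ordinal level to 0-1 by its rank; unknown or missing levels return None."""
    if level not in MATURITY_LEVELS:
        return None
    return MATURITY_LEVELS.index(level) / (len(MATURITY_LEVELS) - 1)


def tech_readiness(leads: list[dict]) -> list[float]:
    """Score each lead 0-1 as the mean of its normalized maturity and indicator flags."""
    features = ["maturity"] + INDICATORS
    rows = []
    for lead in leads:
        row = {"maturity": normalize_ordinal(lead.get("maturity"))}
        for ind in INDICATORS:
            flag = lead.get(ind)
            row[ind] = None if flag is None else float(bool(flag))
        rows.append(row)

    # Median of observed values per feature; 0.0 if a feature is never observed.
    medians = {}
    for f in features:
        observed = [r[f] for r in rows if r[f] is not None]
        medians[f] = statistics.median(observed) if observed else 0.0

    return [
        sum(r[f] if r[f] is not None else medians[f] for f in features) / len(features)
        for r in rows
    ]


scores = tech_readiness([
    {"maturity": "advanced", "iot": True, "ai": True, "automation": True},
    {"maturity": "basic", "iot": False, "ai": None, "automation": True},
    {"maturity": None, "iot": True, "ai": False},  # missing maturity and automation
])
```

Equal weighting keeps the sketch simple; per-feature weights could be introduced later, for example driven by the feedback learning loop.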

Matchmaking Engine

- Weighted geometric mean of lead_score and user_profile_score
- Flexible mode for handling missing attributes
- Configurable match threshold
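
The matchmaking combination can be sketched as a weighted geometric mean over 0–1 scores. How the flexible mode treats missing attributes is not specified here; the sketch assumes it drops missing values and renormalizes the remaining weights, while strict mode returns `None`. The function names, the equal weights, and the 0.6 default threshold are placeholders.

```python
import math
from typing import Optional


def weighted_geometric_mean(values: dict[str, Optional[float]],
                            weights: dict[str, float],
                            flexible: bool = True) -> Optional[float]:
    """Weighted geometric mean of scores in 0-1.

    flexible=True drops missing (None) attributes and renormalizes the remaining
    weights; flexible=False returns None if any attribute is missing.
    """
    present = {k: v for k, v in values.items() if v is not None}
    if not present or (not flexible and len(present) < len(values)):
        return None
    if any(v == 0 for v in present.values()):
        return 0.0  # any zero factor makes the geometric mean zero (and log(0) is undefined)
    total_w = sum(weights[k] for k in present)
    log_sum = sum(weights[k] * math.log(v) for k, v in present.items())
    return math.exp(log_sum / total_w)


def is_match(score: Optional[float], threshold: float = 0.6) -> bool:
    """Apply the configurable match threshold (0.6 is a placeholder default)."""
    return score is not None and score >= threshold


weights = {"lead_score": 0.5, "user_profile_score": 0.5}
both = weighted_geometric_mean({"lead_score": 0.81, "user_profile_score": 0.64}, weights)
partial = weighted_geometric_mean({"lead_score": 0.81, "user_profile_score": None}, weights)
strict = weighted_geometric_mean({"lead_score": 0.81, "user_profile_score": None}, weights,
                                 flexible=False)
```

A geometric mean penalizes imbalance more than an arithmetic mean: a strong lead paired with a poorly matched user profile yields a lower combined score, and a zero on either side yields zero.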

Deployment Steps

- Provision a GCP VM and Cloud SQL instance
- Deploy the PostgreSQL schema
- Run the Python scripts for lead discovery, enrichment, and scoring
- Deploy the FastAPI backend via Docker
- Schedule tasks with Cloud Scheduler + Cloud Tasks
- Integrate the Google Sheets API for CRM sync
- Configure OAuth2 authentication
- Test the end-to-end workflow
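
Cloud Scheduler triggers jobs on a cron schedule, but the follow-up logic itself can live in a plain function that is easy to test locally. The sketch below selects leads due for an automated follow-up; the `OutreachRecord` fields, the 3-day wait, and the cap of two follow-ups are hypothetical defaults, not values from this repository.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class OutreachRecord:
    """Hypothetical per-lead outreach state, e.g. read from outreach_events."""
    lead_id: str
    last_contacted: datetime
    follow_ups_sent: int
    replied: bool
    opted_out: bool


def due_for_follow_up(records: list[OutreachRecord],
                      now: datetime,
                      wait: timedelta = timedelta(days=3),
                      max_follow_ups: int = 2) -> list[str]:
    """Return ids of leads that should receive a follow-up email on this run."""
    return [
        r.lead_id
        for r in records
        if not r.replied
        and not r.opted_out  # never follow up with leads who opted out
        and r.follow_ups_sent < max_follow_ups
        and now - r.last_contacted >= wait
    ]


now = datetime(2024, 5, 10, 9, 0)
records = [
    OutreachRecord("a", datetime(2024, 5, 6, 9, 0), 0, False, False),  # waited 4 days
    OutreachRecord("b", datetime(2024, 5, 9, 9, 0), 0, False, False),  # waited 1 day
    OutreachRecord("c", datetime(2024, 5, 1, 9, 0), 2, False, False),  # cap reached
    OutreachRecord("d", datetime(2024, 5, 1, 9, 0), 0, True, False),   # already replied
    OutreachRecord("e", datetime(2024, 5, 1, 9, 0), 0, False, True),   # opted out
]
due = due_for_follow_up(records, now)
```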

Compliance Checklist

- Include opt-out links in all outreach emails
- Use only publicly available data
- Follow GDPR and CAN-SPAM rules
- Track consent and engagement
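
The opt-out items on this checklist can be enforced in code before any message is sent. A minimal sketch, assuming a set of opted-out addresses is available (for example loaded from the database); the function names and the unsubscribe URL are hypothetical. Lowercasing whole addresses is a simplification, since the local part of an email address can technically be case-sensitive.

```python
def normalize_email(address: str) -> str:
    """Trim and lowercase an address so suppression-list lookups are consistent."""
    return address.strip().lower()


def filter_suppressed(recipients: list[str], suppression_list: set[str]) -> list[str]:
    """Drop every recipient who has opted out, preserving the original order."""
    suppressed = {normalize_email(a) for a in suppression_list}
    return [r for r in recipients if normalize_email(r) not in suppressed]


def add_opt_out_footer(body: str, unsubscribe_url: str) -> str:
    """Append an opt-out link to a plain-text email body."""
    return f"{body}\n\n--\nTo stop receiving these emails, unsubscribe here: {unsubscribe_url}"


kept = filter_suppressed(["Ana@Farm.example", "bo@coop.example"], {"ana@farm.example"})
email = add_opt_out_footer("Hi Bo, ...", "https://example.com/unsubscribe?id=123")
```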

Monitoring

- Track email opens, clicks, and replies
- Monitor vector search usage
- Log background job status
- Store metrics in BigQuery or Google Sheets
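
Open, click, and reply tracking ultimately feeds rates like the ones below, which could populate `engagement_metrics` or the engagement component of the lead score. This sketch assumes a hypothetical event shape of `{"lead_id": ..., "type": ...}` and counts unique leads per event type, so repeated opens by one lead count once.

```python
def engagement_rates(events: list[dict]) -> dict[str, float]:
    """Compute open, click, and reply rates from outreach event records.

    Each event is assumed to be {"lead_id": ..., "type": "sent" | "open" | "click" | "reply"}.
    Each rate is unique leads with that event type divided by unique leads emailed.
    """
    leads_by_type: dict[str, set] = {}
    for e in events:
        leads_by_type.setdefault(e["type"], set()).add(e["lead_id"])
    sent = len(leads_by_type.get("sent", set()))
    if sent == 0:
        return {"open_rate": 0.0, "click_rate": 0.0, "reply_rate": 0.0}
    return {
        f"{kind}_rate": len(leads_by_type.get(kind, set())) / sent
        for kind in ("open", "click", "reply")
    }


events = [
    {"lead_id": "a", "type": "sent"}, {"lead_id": "b", "type": "sent"},
    {"lead_id": "c", "type": "sent"}, {"lead_id": "d", "type": "sent"},
    {"lead_id": "a", "type": "open"}, {"lead_id": "a", "type": "open"},  # repeat open
    {"lead_id": "b", "type": "open"},
    {"lead_id": "a", "type": "click"},
    {"lead_id": "b", "type": "reply"},
]
rates = engagement_rates(events)
```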

Design Notes

- Each module is self-contained and independently testable
- Free-tier services allow the MVP to launch without a budget
- Vector search and AI components can be swapped for paid alternatives at scale
- Lead qualification and scoring formulas are fully reproducible in Pandas

Setup Checklist

- Set up a GCP project and enable the required APIs
- Deploy PostgreSQL and the vector DB
- Implement lead discovery and enrichment scripts
- Deploy the FastAPI backend with its endpoints
- Integrate OAuth2 for authentication
- Schedule periodic background tasks
- Configure CRM sync (Google Sheets)
- Implement outreach automation (email/LinkedIn)
- Test the RAG-enhanced outreach and scoring logic
- Review the compliance checklist