ksploitx/DataSanity

DataSanity

DataSanity is an AI-powered web application for dataset cleaning, synthetic data generation, vectorization, and data enrichment using natural language prompts.

Features

  • Dataset cleaning with LLM detection of noisy, missing, or duplicate values
  • Synthetic data generation based on schema or prompt
  • Vectorization for RAG pipelines
  • Data enrichment using web search APIs
  • Natural language prompt-based workflow
  • Support for CSV uploads and downloads
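As a rough illustration of the cleaning feature, the sketch below profiles an uploaded CSV for missing cells and exact duplicate rows, producing a summary that could then be handed to an LLM prompt. It is not DataSanity's actual code: `profile_csv`, `MISSING_TOKENS`, and the sample data are hypothetical names chosen for this example. It uses only the Python standard library so that it runs without the project's dependencies.

```python
import csv
import io
from collections import Counter

# Assumption: these strings are treated as "missing"; the project may use a different rule.
MISSING_TOKENS = {"", "na", "n/a", "null", "none", "nan"}


def profile_csv(text: str) -> dict:
    """Count missing cells per column and exact duplicate rows in CSV text."""
    reader = csv.DictReader(io.StringIO(text))
    columns = reader.fieldnames or []
    missing = Counter()
    seen = set()
    duplicates = 0
    total = 0
    for row in reader:
        total += 1
        # A cell counts as missing if it is empty or matches a known placeholder.
        for col in columns:
            value = (row.get(col) or "").strip().lower()
            if value in MISSING_TOKENS:
                missing[col] += 1
        # A row is a duplicate if every cell matches a row seen earlier.
        key = tuple((row.get(col) or "").strip() for col in columns)
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return {"rows": total, "missing": dict(missing), "duplicate_rows": duplicates}


# Hypothetical sample upload: one repeated row and two missing ages.
SAMPLE = "name,age\nAda,36\nAda,36\nBob,\nEve,N/A\n"
report = profile_csv(SAMPLE)
print(report)
```

A deterministic pre-pass like this only catches the mechanical cases. Judging whether a value is noisy in context is the part the feature list assigns to the LLM.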

Tech Stack

  • Frontend: Next.js with Tailwind CSS
  • Backend: FastAPI (Python)
  • LLM Inference: OpenRouter API
  • Data Processing: pandas, numpy
  • Embedding: sentence-transformers
  • Vector Store: FAISS
  • Web Search: Exa or Serper.dev
  • Storage: SQLite + local filesystem
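To make the embedding and vector store entries more concrete, here is a dependency-free sketch of the retrieval step in a RAG pipeline: rows are serialized to text, embedded, and ranked by inner product against a query. The bag-of-words `embed` function is a toy stand-in for a sentence-transformers model, and the brute-force loop in `search` stands in for an exact inner-product index such as FAISS provides. All function names and sample rows are hypothetical and not taken from the repository.

```python
import math


def build_vocab(texts):
    """Map every whitespace-separated token in the corpus to a vector position."""
    tokens = sorted({tok for text in texts for tok in text.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def embed(text, vocab):
    """Toy embedding: L2-normalized token counts (stand-in for a real model)."""
    vec = [0.0] * len(vocab)
    for tok in text.lower().split():
        if tok in vocab:  # tokens outside the corpus vocabulary are ignored
            vec[vocab[tok]] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


def search(query, texts, vectors, vocab, k=1):
    """Exact nearest-neighbour search by inner product over all stored vectors."""
    q = embed(query, vocab)
    scored = [
        (sum(a * b for a, b in zip(q, vec)), text)
        for text, vec in zip(texts, vectors)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical dataset rows, serialized as "column: value" text.
rows = ["name: Ada, role: engineer", "name: Bob, role: designer"]
vocab = build_vocab(rows)
vectors = [embed(text, vocab) for text in rows]

top = search("role: designer", rows, vectors, vocab)
print(top[0][1])
```

Because the vectors are normalized, the inner product equals cosine similarity. That convention would carry over if the toy pieces were swapped for a real embedding model and index.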
