Symptom-based disease prediction powered by DistilBERT embeddings, Logistic Regression, and a conversational RAG pipeline.
- Top-3 Disease Prediction: Enter symptoms in plain English and get the three most likely conditions with confidence scores
- DistilBERT Embeddings: Semantic understanding of symptoms using a pre-trained transformer model
- RAG Pipeline: Retrieval-Augmented Generation for medical knowledge retrieval
- Intent Classification: Understands user intent before answering
- React Web App: Full-stack React + Express interface for a smooth chat experience
- SQLite Storage: Lightweight local chat history via better-sqlite3
| Symptom Input | Predicted Diseases |
|---|---|
| fever, cough, sore throat, body pain, fatigue | Flu, COVID-19, Asthma |
| frequent urination, increased thirst, fatigue | Diabetes, Anemia, Hypertension |
| severe headache, nausea, vomiting | Migraine, Hypertension, Food Poisoning |
⚠️ Disclaimer: VitaAI is for educational purposes only. It is not a substitute for professional medical advice, diagnosis, or treatment.
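
The "top 3 with confidence scores" output in the example table above can be illustrated with a small sketch. It assumes the classifier produces one raw score per disease and that confidence is the softmax probability of that score, which is what a multinomial Logistic Regression reports (in scikit-learn this would typically come from `predict_proba`). The function name `top_k_from_scores` and the scores are hypothetical and made up for illustration; they are not taken from the project.

```python
import math


def top_k_from_scores(scores: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Turn raw per-class scores into softmax probabilities and keep the k largest."""
    # Subtract the maximum score before exponentiating for numerical stability.
    max_score = max(scores.values())
    exps = {label: math.exp(score - max_score) for label, score in scores.items()}
    total = sum(exps.values())
    probs = {label: value / total for label, value in exps.items()}
    # Rank classes by probability, highest first, and keep the top k.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    return ranked[:k]


# Illustrative scores only (assumption): not real model output.
example_scores = {"Flu": 2.1, "COVID-19": 1.7, "Asthma": 0.4, "Migraine": -1.0}
top3 = top_k_from_scores(example_scores)
for disease, confidence in top3:
    print(f"{disease}: {confidence:.1%}")
```

Because the probabilities are normalised over all classes, the three reported confidences will generally sum to less than 100%.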
| Tool | Purpose |
|---|---|
| React 19 + Vite | Frontend SPA |
| Express | API server |
| LangChain + Google Gemini | RAG & intent classification |
| transformers (DistilBERT) | Sentence embeddings |
| scikit-learn | Logistic Regression classifier |
| better-sqlite3 | Chat history storage |
| Tailwind CSS | Styling |
vitaai/
├── server.ts          # Express + Vite server entry point
├── server/
│   └── routes.ts      # API routes (RAG, chat, intent classification)
├── src/               # React frontend
├── index.html         # HTML entry point
├── FAQ.csv            # Symptom-disease training dataset
├── package.json       # Node dependencies
├── vite.config.ts     # Vite configuration
└── tsconfig.json      # TypeScript config
User enters symptoms
        │
        ▼
DistilBERT encodes symptoms → sentence embedding
        │
        ▼
Logistic Regression classifies embedding
        │
        ▼
Top-3 diseases returned with confidence scores
        │
        ▼
LangChain RAG retrieves medical context
        │
        ▼
Gemini generates a natural-language response
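
The flow above can be sketched as a single function that chains the stages. This is only an outline of the data flow, not the project's implementation: the function `answer_symptoms`, its parameters, and the prompt layout are all hypothetical, and each stage is passed in as a plain callable so that toy stand-ins can replace DistilBERT, the classifier, the retriever, and Gemini.

```python
from typing import Callable, Sequence

Embedding = Sequence[float]
Ranked = list[tuple[str, float]]


def answer_symptoms(
    symptoms: str,
    embed: Callable[[str], Embedding],
    classify: Callable[[Embedding], Ranked],
    retrieve: Callable[[str], str],
    generate: Callable[[str], str],
) -> dict:
    """Run the stages in order and return the intermediate results."""
    embedding = embed(symptoms)                     # encoding stage
    top3 = classify(embedding)[:3]                  # classification stage, keep top 3
    context = [retrieve(name) for name, _ in top3]  # retrieval stage, one lookup per disease
    # Hypothetical prompt layout (assumption): symptoms, candidates, retrieved context.
    prompt = (
        f"Symptoms: {symptoms}\n"
        + "Candidates: " + ", ".join(f"{n} ({p:.0%})" for n, p in top3) + "\n"
        + "Context:\n" + "\n".join(context)
    )
    return {"top3": top3, "prompt": prompt, "reply": generate(prompt)}


# Toy stand-ins (assumptions) so the sketch runs without any model or API key.
result = answer_symptoms(
    "fever, cough, sore throat",
    embed=lambda text: [float(len(text))],
    classify=lambda vec: [("Flu", 0.6), ("COVID-19", 0.3), ("Asthma", 0.07), ("Anemia", 0.03)],
    retrieve=lambda name: f"[notes about {name}]",
    generate=lambda prompt: f"(model reply based on {len(prompt)} prompt characters)",
)
print(result["prompt"])
```

Passing the stages in as callables keeps the ordering explicit and makes each stage easy to swap or test in isolation.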
The model is trained on FAQ.csv, a curated dataset of symptom-disease pairs covering conditions including:
- Flu, COVID-19, Asthma
- Diabetes, Hypertension, Heart Disease
- Migraine, Anemia, Kidney Stone
- Food Poisoning, and more
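
As a rough sketch of how symptom-disease pairs might be read into training inputs, the snippet below parses CSV text into parallel lists of symptom strings and labels. The column names `symptoms` and `disease` are assumptions, since the actual layout of FAQ.csv is not described here, and the in-memory sample reuses the example rows from the table earlier in this README rather than real dataset contents.

```python
import csv
import io
from collections import Counter


def load_pairs(handle, text_col="symptoms", label_col="disease"):
    """Read (symptom text, disease label) pairs, skipping incomplete rows."""
    texts, labels = [], []
    for row in csv.DictReader(handle):
        # Missing fields come back as None, so fall back to an empty string.
        text = (row.get(text_col) or "").strip()
        label = (row.get(label_col) or "").strip()
        if text and label:
            texts.append(text)
            labels.append(label)
    return texts, labels


# In-memory stand-in for the file (assumption: hypothetical column names).
sample = io.StringIO(
    "symptoms,disease\n"
    '"fever, cough, sore throat, body pain, fatigue",Flu\n'
    '"frequent urination, increased thirst, fatigue",Diabetes\n'
    '"severe headache, nausea, vomiting",Migraine\n'
    "nausea only,\n"  # no label, so this row is skipped
)
texts, labels = load_pairs(sample)
print(Counter(labels))
```

To read a real file instead, the same function could be called with `open("FAQ.csv", newline="", encoding="utf-8")` once the column names are adjusted to match.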
Contributions, issues, and feature requests are welcome!
- Fork the repo
- Create a branch: `git checkout -b feature/your-feature`
- Commit: `git commit -m "Add your feature"`
- Push: `git push origin feature/your-feature`
- Open a Pull Request
Harshit Ranbhare
This project is open source and available under the MIT License.