An intelligent study assistant that helps MISM students at CMU learn from their course materials through AI-powered summaries, practice questions, and interactive Q&A using Retrieval-Augmented Generation (RAG).
- Document Processing: Upload and process PDFs, PowerPoint slides, Word documents, and text files
- RAG-Powered Q&A: Ask questions and get accurate answers grounded in your course materials
- Smart Summaries: Generate concise summaries of topics or entire documents
- Practice Questions: Auto-generate diverse question types:
  - Multiple Choice Questions (MCQs)
  - True/False
  - Fill-in-the-Blanks
  - Short Answer
  - Match-the-Following
  - Long Answer/Essay Questions
- Evaluation Framework: Built-in RAGAS and DeepEval metrics for quality assessment
- User Feedback: Collect and analyze user feedback for continuous improvement
AI-study-buddy/
├── backend/ # FastAPI backend server
│ ├── main.py # API endpoints
│ ├── config.py # Configuration management
│ └── requirements.txt
├── frontend/ # Streamlit UI
│ ├── app.py # Main application
│ └── requirements.txt
├── utils/ # Core utilities
│ ├── document_processor.py # Document parsing and chunking
│ ├── vector_store.py # Vector database and RAG pipeline
│ └── content_generator.py # LLM-powered content generation
├── evaluation/ # Evaluation framework
│ └── evaluator.py # RAGAS and DeepEval integration
├── data/ # Data storage
│ ├── raw/ # Uploaded documents
│ ├── processed/ # Processed documents
│ └── chromadb/ # Vector database
├── models/ # Model configurations
└── tests/ # Test suite for debugging
- Python 3.9 or higher
- OpenAI API key (required)
- Git
- Get your OpenAI API key: Visit OpenAI API Keys
- Set up environment variables
cp .env.example .env
# Edit .env and add your OpenAI API key
- Install backend dependencies
cd backend
pip install -r requirements.txt
- Install frontend dependencies
cd ../frontend
pip install -r requirements.txt
- Start the backend server (Terminal 1)
cd backend
python main.py
The API will be available at http://localhost:8000
- Start the frontend (Terminal 2)
cd frontend
streamlit run app.py
The UI will open in your browser at http://localhost:8501
- Navigate to the "Upload Documents" page
- Select a PDF, PPTX, DOCX, or TXT file
- Click "Process Document"
- The system will extract text, create chunks, and generate embeddings
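To make the "create chunks" step concrete, here is a minimal sketch of overlapping chunking. It assumes character-based windows; the actual logic in utils/document_processor.py may split on tokens, sentences, or something else. The function name `chunk_text` is hypothetical, and the defaults simply mirror the CHUNK_SIZE and CHUNK_OVERLAP values from .env.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Assumption: sizes are measured in characters. The project itself may
    measure them differently.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

With the defaults, each chunk repeats the last 200 characters of the previous one, so a sentence that straddles a boundary still appears whole in at least one chunk.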
- Go to the "Ask Questions" page
- Type your question about the course materials
- Get AI-generated answers with supporting context
- Rate the answer to help improve the system
- Visit the "Generate Summary" page
- Choose a topic-based or custom-query summary
- Receive a concise summary of the content
- Download the summary for later reference
- Select "Practice Questions"
- Choose question type (MCQ, True/False, etc.)
- Specify number of questions
- Review questions and answers for self-assessment
Edit .env file to customize:
# OpenAI Settings
OPENAI_API_KEY=your_key_here
OPENAI_MODEL=gpt-4-turbo-preview
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
# RAG Settings
CHUNK_SIZE=1000
CHUNK_OVERLAP=200
TOP_K_RETRIEVAL=5
TEMPERATURE=0.4
# Vector Database
VECTOR_DB_TYPE=chromadb
CHROMADB_PATH=./data/chromadb
The system implements a four-stage RAG pipeline:
- Document Processing
  - Extract text from various file formats
  - Split into manageable chunks with overlap
  - Preserve metadata and context
- Embedding Generation
  - Generate vector embeddings using OpenAI
  - Store in ChromaDB for efficient retrieval
  - Support batch processing for large documents
- Retrieval
  - Convert queries to embeddings
  - Perform similarity search
  - Retrieve the top-k most relevant chunks
- Generation
  - Provide context to the LLM
  - Generate accurate, grounded responses
  - Include source references
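The retrieval and generation stages above can be sketched without any external services. This is an illustration only, not the code in utils/vector_store.py: a toy word-count embedding stands in for the OpenAI embedding model, an in-memory list stands in for ChromaDB, and the names `embed`, `retrieve`, and `build_prompt` are hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: str, chunks: list[str], top_k: int = 5) -> list[tuple[float, str]]:
    """Score every chunk against the query and return the top_k best matches."""
    query_vec = embed(query)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


def build_prompt(query: str, retrieved: list[tuple[float, str]]) -> str:
    """Assemble a grounded prompt with numbered sources for the LLM."""
    context = "\n\n".join(
        f"[{i}] {chunk}" for i, (_, chunk) in enumerate(retrieved, start=1)
    )
    return (
        "Answer using only the context below and cite sources by number.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

Numbering the chunks in the prompt is one simple way to let the model include source references in its answer; the real pipeline may track sources through chunk metadata instead.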
The system evaluates answer quality from three sources:
- RAGAS metrics:
  - Faithfulness: Is the answer factually consistent with the retrieved context?
  - Answer Relevancy: How relevant is the answer to the query?
  - Context Precision: How precise is the retrieved context?
  - Context Recall: How complete is the retrieved context?
- DeepEval metrics:
  - Answer Relevancy: Semantic relevance of responses
  - Faithfulness: Consistency with source material
  - Coherence: Logical flow and clarity
- User feedback:
  - Star ratings (1-5)
  - Qualitative comments
  - Usage analytics
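To build intuition for the context precision and context recall metrics listed above, here is a deliberately simplified sketch. It assumes you already know which chunk IDs are relevant to a question. RAGAS itself relies on LLM judgments and a rank-aware formula, so these set-based versions are an approximation, and the function names are hypothetical.

```python
def context_precision(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of retrieved chunks that are actually relevant."""
    if not retrieved_ids:
        return 0.0
    hits = sum(1 for chunk_id in retrieved_ids if chunk_id in relevant_ids)
    return hits / len(retrieved_ids)


def context_recall(retrieved_ids: list[str], relevant_ids: set[str]) -> float:
    """Fraction of relevant chunks that were retrieved."""
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids) & relevant_ids)
    return hits / len(relevant_ids)
```

For example, if four chunks are retrieved and two of them are among three relevant chunks, precision is 2/4 and recall is 2/3. Raising TOP_K_RETRIEVAL tends to improve recall at the cost of precision.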
- Documents are stored locally
- No data sharing with third parties beyond the OpenAI API calls used for embeddings and generation
- API keys stored securely in environment variables
- User sessions isolated
- Uploaded files can be deleted anytime
Once the backend is running, visit:
- API Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- POST /upload - Upload and process a document
- POST /query - Ask questions about materials
- POST /summary - Generate summaries
- POST /questions - Generate practice questions
- GET /stats - Get knowledge base statistics
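The endpoints above can also be called from a script. The sketch below targets POST /query using only the standard library. The JSON field names `question` and `top_k` are assumptions, as are the helper names; check the interactive API docs at http://localhost:8000/docs for the real request schema before relying on it.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"


def build_query_request(question: str, top_k: int = 5) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for the /query endpoint.

    Assumption: the body fields "question" and "top_k" are illustrative and
    may not match the backend's actual schema.
    """
    payload = json.dumps({"question": question, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/query",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(question: str, top_k: int = 5, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response (backend must be running)."""
    request = build_query_request(question, top_k)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `ask("What is database normalization?")` only works once the backend server is up; `build_query_request` can be inspected offline to see exactly what would be sent.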
- Upload Quality Materials: Clear, well-formatted documents work best
- Specific Questions: More specific queries yield better answers
- Chunk Size: Adjust based on your document structure
- Regular Updates: Keep adding new materials for better coverage
- Provide Feedback: Help improve the system through ratings
Built with ❤️ for MISM students at Carnegie Mellon University