This repository (https://github.com/navchetna/research-assistant) provides a comprehensive framework for building a Retrieval-Augmented Generation (RAG) chatbot designed for researchers. It combines several components into a system that retrieves relevant information from documents and generates accurate, contextually appropriate responses.
Before getting started, ensure your system meets the following requirements:
- RAM: at least 10GB of free RAM (for the Groq setup)
- Storage: required space depends on your configuration:
  - Base setup (using Groq): ~7.7GB
    - 2.8GB for Docker images
    - 1.1GB for the `BAAI/bge-reranker-base` model
    - 256MB for the `BAAI/bge-small-en-v1.5` model
    - 3.4GB for dataprep models
  - vLLM setup: an extra 5-10GB for the `meta-llama/Llama-3.2-1B-Instruct` model
- Recommended serving option:
  - Server deployments: vLLM
  - Personal computers: Groq
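On Linux, the RAM and storage requirements above can be checked with a short script. This is only a sketch: `/proc/meminfo` and the `df` output layout are Linux-specific assumptions, and the thresholds mirror the Groq base setup figures above.

```shell
#!/usr/bin/env sh
# Rough prerequisite check for the Groq base setup (Linux-only sketch).
# Thresholds come from the requirements above: 10GB free RAM, ~7.7GB storage.
avail_ram_gb=$(awk '/MemAvailable/ {printf "%d", $2/1024/1024}' /proc/meminfo)
avail_disk_gb=$(df -k . | awk 'NR==2 {printf "%d", $4/1024/1024}')

[ "$avail_ram_gb" -ge 10 ] && echo "RAM: OK (${avail_ram_gb}GB available)" \
  || echo "RAM: only ${avail_ram_gb}GB available; at least 10GB is recommended"
[ "$avail_disk_gb" -ge 8 ] && echo "Storage: OK (${avail_disk_gb}GB free)" \
  || echo "Storage: only ${avail_disk_gb}GB free; the base setup needs ~7.7GB"
```

The vLLM setup needs roughly 5-10GB more, so raise the storage threshold accordingly if you go that route.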
The repository offers two options for the LLM serving component: Groq and vLLM.
- vLLM is an open-source high-performance inference engine for large language models.
- Groq is a proprietary inference engine designed for ultra-fast response times using specialized hardware.
Groq is recommended for running the application on personal computers: its API-based approach does not require running the LLM locally, making it far less resource-intensive on machines without powerful GPUs. Additionally, using Groq requires only an API key rather than setting up a complex local inference environment.
The vLLM option is more suitable for server deployments with dedicated GPU resources where running models locally might be preferred for reasons like data privacy.
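The choice between the two backends comes down to a compose file. As a hypothetical convenience (the `SERVING_BACKEND` variable is not part of this repository; the file paths are the ones used in the run commands later in this README), the selection could be scripted like this:

```shell
# Hypothetical helper: pick the compose file based on the serving backend.
# SERVING_BACKEND is an invented variable; paths come from this README.
SERVING_BACKEND=${SERVING_BACKEND:-groq}   # use "vllm" for server deployments
if [ "$SERVING_BACKEND" = "vllm" ]; then
  COMPOSE_FILE=install/docker/research-assistant/docker-compose.yaml
else
  COMPOSE_FILE=install/docker/research-assistant/docker-compose-groq.yaml
fi
echo "Using compose file: $COMPOSE_FILE"
```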
RAG combines retrieval systems with large language models to generate more accurate and relevant responses by incorporating external knowledge.
- Dataprep Component: Processes input documents, breaking them into manageable chunks that can be embedded and indexed.
- Embedding Service: Documents are transformed into vector representations using embedding models like `BAAI/bge-base-en-v1.5`, which capture the semantic meaning of text.
- Vector Storage: These embeddings are stored in a Redis vector database, allowing for efficient similarity searches.
- Retriever Component: When a user asks a question, the retriever finds the most relevant document chunks by comparing the query embedding with the stored document embeddings.
- Reranker Component: Retrieved documents are further refined through a reranking process using models like `BAAI/bge-reranker-base` to ensure the most relevant context is provided to the LLM.
- LLM Backend: The LLM (served via vLLM or Groq) generates responses based on the retrieved context and the user's query.
- Audio Component: Transcribes audio input into text, enabling voice interaction with the RAG system. Uses OpenAI's Whisper `base` model.
- Frontend UI: User interface to interact with the chatbot.
This architecture allows real-time information integration and better response quality by combining search capabilities with generative models.
Each service is containerized for modularity and scalability.
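Because every service runs as its own container, a plain `docker ps` shows the whole stack at a glance once it is up. The guard below is only there for machines where Docker is not installed yet:

```shell
# List the running containers with their published ports and status.
if command -v docker >/dev/null 2>&1; then
  docker ps --format 'table {{.Names}}\t{{.Ports}}\t{{.Status}}'
else
  echo "Docker is not installed yet; see the installation steps below."
fi
```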
Before building the application images, you need to have Docker installed on your system.
1. Visit the official Docker installation guide (Install Docker).
2. Choose your operating system:
   - Windows users with WSL: select Ubuntu from the installation options
   - Mac users: select Docker Desktop for Mac
   - Linux users: select your specific distribution (Ubuntu, Debian, Fedora, etc.)
3. Follow the installation steps provided in the documentation for your chosen platform.
4. After completing the installation, restart your terminal or command prompt.
To verify that Docker is installed correctly, open your terminal and run:

```shell
docker
```

Expected output: you should see Docker's help information displayed, similar to this:

```
Usage:  docker [OPTIONS] COMMAND

A self-sufficient runtime for containers

Common Commands:
  run         Create and run a new container from an image
  exec        Execute a command in a running container
  ps          List containers
  build       Build an image from a Dockerfile
  pull        Download an image from a registry
  push        Upload an image to a registry
  images      List images
  login       Log in to a registry
  logout      Log out from a registry
  search      Search Docker Hub for images
  version     Show the Docker version information
  info        Display system-wide information
  ...
```
If you see this output (or similar), Docker is successfully installed! If you get an error like "command not found" or "docker is not recognized", please revisit the installation steps.
Once Docker is successfully installed and verified, you're ready to proceed with building the application images.
Run only the BUILD commands present in each of these README files; run components individually only for testing.
- Build dataprep component
- Build retriever component
- Build backend component
- Build Groq component
- Build UI component
Optional components:
Before running the application, you need to configure your API keys and cache directories:

1. Copy the example environment script:

   ```shell
   cd research-assistant/
   cp env-example.sh .env.sh
   ```

2. Edit `research-assistant/.env.sh` and add your API keys and configure cache directories:
   - Add your `GROQ_API_KEY` or `HUGGINGFACEHUB_API_TOKEN`
   - Set appropriate cache directories for models and data
   - Configure any other required environment variables

3. Make the script executable:

   ```shell
   chmod +x .env.sh
   ```

4. Source the environment variables:

   ```shell
   source .env.sh
   ```
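After sourcing, you can sanity-check that the keys were actually exported. The variable names below are the ones from the step above; which of them you need depends on your serving option, so a missing one is only a note, not an error:

```shell
# Warn about any expected variable that is still empty after `source .env.sh`.
for var in GROQ_API_KEY HUGGINGFACEHUB_API_TOKEN; do
  if [ -z "$(printenv "$var")" ]; then
    echo "Note: $var is not set (only needed for its serving option)"
  fi
done
```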
Once you have built all the required images and configured your environment variables, you can start the application using Docker Compose. This will spin up Groq as the serving engine.
Run in foreground:

```shell
cd research-assistant/  # make sure you're in the right directory
docker compose -f install/docker/research-assistant/docker-compose-groq.yaml up
```

Run in background (detached mode):

```shell
cd research-assistant/
docker compose -f install/docker/research-assistant/docker-compose-groq.yaml up -d
```

To view logs when running in background:

```shell
cd research-assistant/
docker compose -f install/docker/research-assistant/docker-compose-groq.yaml logs -f
```

To stop the application running in background:

```shell
docker compose -f research-assistant/install/docker/research-assistant/docker-compose-groq.yaml down
```

Once all services are running, open your browser and navigate to:

```
http://localhost:5009
```

You should now be able to interact with the RAG chatbot.
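If the page does not load, a quick curl check tells you whether the UI container is at least listening on port 5009 (a status of `000` means the connection itself failed, i.e. the service is not up yet):

```shell
# Print the HTTP status code from the UI endpoint; "000" means no connection.
status=$(curl -s -o /dev/null -w '%{http_code}' --max-time 5 http://localhost:5009 || true)
echo "UI returned HTTP ${status:-000}"
```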
To upload your own PDF documents to the RAG system, use the following curl command:

```shell
curl -X POST "http://localhost:1006/v1/dataprep" \
  -H "Content-Type: multipart/form-data" \
  -F "files=@/path/to/your/document.pdf"
```

Note: the first time you upload a PDF, it may take some time as the system downloads multiple embedding and processing models. Subsequent file uploads will be much faster.
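To ingest a whole folder of papers, the same endpoint can be called in a loop. This is a sketch: `./papers` is a placeholder directory, and the endpoint is the one shown above.

```shell
# Upload every PDF in ./papers one at a time (placeholder directory).
for f in ./papers/*.pdf; do
  [ -e "$f" ] || continue          # skip cleanly when no PDFs match the glob
  echo "Uploading $f ..."
  curl -X POST "http://localhost:1006/v1/dataprep" \
    -H "Content-Type: multipart/form-data" \
    -F "files=@$f"
done
```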
Groq serving configuration:

```shell
export LLM_SERVER_HOST_IP=groq-service
export LLM_SERVER_PORT=8000
export GROQ_MODEL=llama-3.3-70b-versatile
export GROQ_API_KEY=${GROQ_API_KEY}
docker compose -f install/docker/docker-compose-groq.yaml up
```

vLLM serving configuration:

```shell
export SERVER_HOST_IP=vllm-service
export LLM_SERVER_HOST_IP=vllm-service
export LLM_SERVER_PORT=8000
docker compose -f install/docker/research-assistant/docker-compose.yaml up
```



