This is the repository for our summer school project 2024 at Karlsruhe University of Applied Sciences. The goal of this project is to create a retrieval-augmented generation (RAG) chatbot that answers questions about the Point Cloud Library (PCL) by retrieving relevant information from the PCL documentation and generating an answer based on the retrieved passages.
- Web Scraping: Uses BeautifulSoup to parse the HTML content of the documentation and extract the relevant information (a minimal sketch of the scraping-to-CSV flow follows this list).
- Data Processing: Employs pandas for data manipulation and storage.
- Document Analysis: Analyzes different types of documentation elements such as classes, functions, and descriptions.
- CSV Export: Outputs the processed data into a CSV file for easy access and further analysis.
- Streamlit Integration: Provides a user-friendly interface to interact with the processed data.
- Retrieval-Augmented Generation (RAG): Implements a RAG pipeline with Haystack using HyDE (with HyQE available as an alternative) to generate answers to user questions from the processed data.
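A minimal sketch of the scraping-and-export flow described above. The URL, selectors, and output path are illustrative placeholders rather than the project's actual values, and `requests` is assumed as the HTTP client even though it is not part of the dependency list below:

```python
import pandas as pd
import requests
from bs4 import BeautifulSoup

# Example PCL documentation page; the real scraper covers the full documentation.
url = "https://pointclouds.org/documentation/"
soup = BeautifulSoup(requests.get(url, timeout=30).text, "html.parser")

# Collect (title, text) pairs; the actual selectors depend on the docs' HTML layout.
records = [
    {
        "title": heading.get_text(strip=True),
        "text": heading.find_next("p").get_text(strip=True),
    }
    for heading in soup.find_all("h2")
    if heading.find_next("p") is not None
]

# Store the processed elements as CSV for later indexing.
pd.DataFrame(records).to_csv("pcl_docs.csv", index=False)
```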
The project requires Python 3.10 or later and depends on the following Python packages:

- beautifulsoup4
- pandas
- streamlit
- haystack-ai
- qdrant-haystack
- pypdf
- markdown-it-py
- sentence-transformers
- cryptography
- langfuse-haystack
- langdetect
The project also requires the following tools:

- ollama
- docker
While this app can run on any operating system that supports the above dependencies and tools, it has been tested on Ubuntu 22.04, and the instructions below are written for that platform.
Install Poetry via pipx:

```bash
pip3 install pipx
pipx install poetry
```
Install Ollama:

```bash
cd ~
curl -fsSL https://ollama.com/install.sh | sh
```
Clone the repository:

```bash
git clone https://github.com/your-repo/rag-project.git
cd rag-project
```
Set up your virtual environment:

```bash
python3 -m venv .venv
```
Install the dependencies from the repository root:

```bash
source .venv/bin/activate
poetry install
```
Set up the environment variables for tracing via Langfuse:

```bash
echo "export LANGFUSE_SECRET_KEY=<your-secret-key>" >> ~/.bashrc
echo "export LANGFUSE_PUBLIC_KEY=<your-public-key>" >> ~/.bashrc
```
Pull the latest version of llama3.1:

```bash
ollama pull llama3.1
```
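To verify that the model responds, you can send a test prompt to Ollama's local REST API (default port 11434). This sketch uses only the standard library, and the prompt is just an arbitrary example:

```python
import json
import urllib.request

payload = json.dumps({
    "model": "llama3.1",
    "prompt": "What is a point cloud?",  # arbitrary test prompt
    "stream": False,
}).encode("utf-8")

request = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])
```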
Start your local Qdrant instance:

```bash
docker run -p 6333:6333 -p 6334:6334 \
    -v ~/qdrant_storage:/qdrant/storage:z \
    qdrant/qdrant
```
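Once the container is running, you can check that the document store is reachable from Python. This is a sketch assuming the `qdrant-haystack` integration; the collection name and embedding dimension are placeholders, not the app's actual configuration:

```python
from haystack_integrations.document_stores.qdrant import QdrantDocumentStore

# Connect to the local Qdrant instance started above.
store = QdrantDocumentStore(
    url="http://localhost:6333",
    index="pcl_docs",    # placeholder collection name
    embedding_dim=384,   # placeholder; must match the embedding model in use
)
print(store.count_documents())
```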
Activate the virtual environment:

```bash
source .venv/bin/activate
```
From the `src` folder of the repository, run the app:

```bash
cd src
streamlit run main.py
```
An instance of your browser should open and show the app's Streamlit interface.