CyberGuard is a document-processing system: it accepts uploaded documents (PDFs, plain text, articles, emails) and provides the following capabilities:
- Semantic search based on embedding vectors
- Generation of coherent summaries
- Context-based questions and answers
- Similar content recommendations
- Data visualization and analysis
The system uses the following technology stack:
- Frontend: HTML/JavaScript with Tailwind CSS
- Backend API: FastAPI (Python)
- Vector Database: FAISS for fast search
- Embeddings: SentenceTransformers (all-MiniLM-L6-v2)
- LLM: Integration with OpenAI GPT-3.5 for summarization and Q&A
- Upload Center: Upload PDF and TXT files
- Text Extractor: Extract text from documents
- Chunker: Split text into semantic fragments
- Embedding Generator: Transform text into vectors
- Vector Index: Store vectors for fast search
- Semantic Search: Similarity-based search
- Summarizer: Generate coherent summaries
- Q&A Chatbot: Context-based answers to questions
- Dashboard: View statistics and search history
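The Chunker, Embedding Generator, Vector Index, and Semantic Search components above can be sketched end to end using only the standard library. This is a minimal illustration, not the project's implementation: `chunk_text`, `toy_embed`, and `search` are hypothetical names, the word-window chunking is an assumed strategy, and the bag-of-words vector is a toy stand-in that only matches shared words. In the real system, SentenceTransformers would supply the embeddings and FAISS the index.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=50, overlap=10):
    """Split text into overlapping word windows (assumed chunking strategy)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def toy_embed(text):
    """Toy bag-of-words vector; NOT a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, chunks, top_k=3):
    """Rank chunks by similarity to the query with a brute-force scan."""
    query_vec = toy_embed(query)
    scored = [(cosine_similarity(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```

Swapping `toy_embed` for a real embedding model changes the vectors from sparse word counts to dense floats, but the ranking step keeps the same shape: embed the query, compare it against stored vectors, return the closest chunks.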
- Create a Python virtual environment:
python -m venv venv
source venv/bin/activate # Linux/Mac
venv\Scripts\activate # Windows
- Install dependencies:
pip install fastapi uvicorn pydantic faiss-cpu openai sentence-transformers PyPDF2
- Run the server:
cd astramind
python main.py
The server will run at: http://localhost:8000
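To confirm the server is reachable before opening the client, a small standard-library check like the following can help. It is a sketch: `is_server_up` is a hypothetical helper, and it only tests that something answers HTTP at the given address, not that the CyberGuard API itself is healthy.

```python
import urllib.error
import urllib.request


def is_server_up(base_url="http://localhost:8000", timeout=2.0):
    """Return True if an HTTP server answers at base_url, False otherwise."""
    try:
        with urllib.request.urlopen(base_url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server responded, just with an error status (for example 404).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar problems.
        return False
```

Calling `is_server_up()` with no arguments checks the default address given above. If the application keeps FastAPI's defaults, the interactive API documentation should also be available at http://localhost:8000/docs.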
- Open the index.html file from the astramind-client directory in a web browser.
- Upload documents through the drag-and-drop interface
- Use the search bar to query the documents
- Choose between semantic search, summary generation, or questions and answers
- View usage statistics in the dashboard panel
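For the question-and-answer mode, "context-based" usually means that the chunks returned by semantic search are placed into the prompt sent to the LLM. The sketch below shows one way to assemble such a prompt. The function name `build_qa_prompt`, the instruction wording, and the `max_chars` budget are all assumptions for illustration; the actual backend may build its prompts differently.

```python
def build_qa_prompt(question, context_chunks, max_chars=4000):
    """Assemble a context-grounded prompt from retrieved chunks (hypothetical wording)."""
    parts = []
    used = 0
    for i, chunk in enumerate(context_chunks, start=1):
        entry = f"[{i}] {chunk.strip()}"
        if used + len(entry) > max_chars:
            break  # stay within a rough size budget for the model's context
        parts.append(entry)
        used += len(entry)
    context = "\n".join(parts)
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question.strip()}\nAnswer:"
    )
```

Numbering the chunks lets the answer refer back to its sources, and the character budget is a crude guard against exceeding the model's context window.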
- FastAPI: Python web framework for building APIs
- FAISS: Library for efficient vector search
- SentenceTransformers: Models for generating embeddings
- OpenAI API: For generating summaries and answers
- Tailwind CSS: CSS framework for modern design
- Chart.js: Library for data visualization
- Add JWT authentication
- Support more document types (DOCX, HTML, etc.)
- Run local models (Llama 2) to remove the dependency on external APIs
- Improve the interface for mobile devices
- Add export and sharing features