A proof-of-concept application that combines Docling's document parsing with OpenAI's API so you can analyze, summarize, and ask questions about your documents.
This project demonstrates how to:
- Parse Documents - Use Docling to extract structured content from various document formats (PDF, DOCX, PPTX, images)
- Process with AI - Send the parsed content to OpenAI's API along with user-provided prompts
- Get Intelligent Output - Receive AI-generated responses that can analyze, summarize, answer questions, or perform other tasks on your document content
Document Input → Docling Parser → Structured Content → OpenAI API + User Prompt → AI Response
- Document Ingestion: Upload or provide a document (PDF, DOCX, PPTX, or image)
- Content Extraction: Docling processes the document and converts it to structured Markdown/JSON
- Prompt Integration: Combine the parsed content with your custom text prompt
- AI Analysis: Send both to OpenAI's API for intelligent processing
- Response: Receive AI-generated insights, summaries, answers, or analysis
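The ingestion step usually needs to reject unsupported files before they reach Docling. The sketch below shows one way to do that check. The helper name `is_supported` and the exact extension set (in particular the image extensions) are assumptions for illustration, not taken from this project's code.

```python
from pathlib import Path

# Assumed extension set: PDF, DOCX and PPTX come from the steps above;
# the image extensions are a guess at what "image" covers.
SUPPORTED_EXTENSIONS = {".pdf", ".docx", ".pptx", ".png", ".jpg", ".jpeg"}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```

Checking only the extension is cheap but easy to fool, so a real upload handler would likely combine it with a size limit and let the parser's own errors act as the final guard.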
- Multi-format Support: Handle PDFs, Word documents, PowerPoint presentations, and images
- Structured Output: Convert documents to clean, structured Markdown or JSON
- Custom Prompting: Add your own prompts to guide the AI analysis
- OpenAI Integration: Leverage GPT models for powerful document understanding
- Flexible Pipeline: Easy to extend for different use cases (RAG, Q&A, summarization, etc.)
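One way to keep the pipeline easy to extend is to pass the parser and the model call in as plain functions, so either side can be swapped (for example, for a RAG retriever or a different model) without touching the flow. This is a minimal sketch under that assumption; `build_prompt`, `analyze_document`, `parse`, and `complete` are hypothetical names, not this project's actual API.

```python
from typing import Callable


def build_prompt(user_prompt: str, parsed_content: str) -> str:
    """Join the user's instruction and the parsed document into one prompt."""
    return f"{user_prompt}\n\nDocument content:\n{parsed_content}"


def analyze_document(
    path: str,
    user_prompt: str,
    parse: Callable[[str], str],
    complete: Callable[[str], str],
) -> str:
    """Run parse -> build prompt -> model call and return the model's text.

    `parse` and `complete` are hypothetical hooks: in the real app they would
    wrap Docling and the OpenAI client respectively.
    """
    parsed_content = parse(path)
    return complete(build_prompt(user_prompt, parsed_content))
```

Because both hooks are ordinary callables, the flow can be exercised in tests with stand-in functions, without Docling installed or an API key configured.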
- Docling - Advanced document parsing and conversion
- OpenAI API - GPT models for intelligent text processing
- Python - Core application framework
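from docling.document_converter import DocumentConverter
from openai import OpenAI

# 1. Parse the document with Docling and export it as Markdown
result = DocumentConverter().convert("document.pdf")
parsed_content = result.document.export_to_markdown()
# 2. Combine with the user prompt
user_prompt = "Summarize the key findings in this research paper"
full_prompt = f"{user_prompt}\n\nDocument content:\n{parsed_content}"
# 3. Send to OpenAI (the client reads OPENAI_API_KEY from the environment)
client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": full_prompt}],
)
# 4. Get the AI analysis
ai_response = response.choices[0].message.content

- Python 3.8+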
- Node.js 16+
- OpenAI API key
- Clone the repository
    git clone <your-repo-url>
    cd doc-parse-poc
- Set up Python environment
    cd backend
    python -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    pip install -r requirements.txt
- Configure environment variables
    cp .env.example .env
    # Edit .env and add your OpenAI API key
- Run the backend
    python app.py
  The API will be available at http://localhost:5000
- Install dependencies
    cd frontend
    npm install
- Start the development server
    npm start
  The app will open at http://localhost:3000
- Upload a Document: Drag and drop or click to select a document (PDF, DOCX, PPTX, or image)
- Enter a Prompt: Type what you want to know about the document
- Analyze: Click the analyze button to process your document
- View Results: See the AI-generated analysis with document metadata
- "Summarize the key points and main conclusions"
- "What are the most important findings?"
- "Extract all numerical data and statistics"
- "Identify the main themes and topics"
- "What action items or recommendations are mentioned?"
backend/
├── app.py                  # Main Flask application
├── config/
│   └── settings.py         # Configuration management
├── services/
│   ├── document_parser.py  # Docling integration
│   ├── openai_service.py   # OpenAI API client
│   └── file_service.py     # File handling
├── routes/
│   ├── document_routes.py  # Document processing endpoints
│   └── health_routes.py    # Health check endpoints
├── requirements.txt        # Python dependencies
└── .env.example            # Environment template

frontend/
├── src/
│   ├── components/         # React components
│   ├── services/           # API client
│   ├── types/              # TypeScript definitions
│   └── utils/              # Helper functions
├── public/                 # Static assets
└── package.json            # Node.js dependencies
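A minimal sketch of what `config/settings.py` could look like, assuming it reads the backend environment variables this project documents (`OPENAI_API_KEY`, `OPENAI_MODEL`, `FLASK_DEBUG`, `UPLOAD_FOLDER`, `MAX_CONTENT_LENGTH`). The `Settings` class and `load_settings` function are hypothetical names, and the defaults mirror the example values rather than the project's actual code.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    openai_model: str
    flask_debug: bool
    upload_folder: str
    max_content_length: int


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Build Settings from a mapping of environment variables (os.environ by default)."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        # Fail early instead of at the first OpenAI request.
        raise ValueError("OPENAI_API_KEY is not set")
    return Settings(
        openai_api_key=api_key,
        openai_model=env.get("OPENAI_MODEL", "gpt-4"),
        flask_debug=env.get("FLASK_DEBUG", "False").strip().lower() in {"1", "true", "yes"},
        upload_folder=env.get("UPLOAD_FOLDER", "uploads"),
        max_content_length=int(env.get("MAX_CONTENT_LENGTH", str(16 * 1024 * 1024))),
    )
```

Taking the environment as a parameter keeps the function easy to test with a plain dictionary, while calling `load_settings()` with no arguments reads the real process environment.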
OPENAI_API_KEY=your_api_key_here
OPENAI_MODEL=gpt-4
FLASK_DEBUG=False
SECRET_KEY=your_secret_key
UPLOAD_FOLDER=uploads
MAX_CONTENT_LENGTH=16777216 # 16MB

REACT_APP_API_URL=http://localhost:5000/api

# Install production server
pip install gunicorn
# Run with gunicorn
gunicorn --bind 0.0.0.0:5000 wsgi:application

# Build for production
npm run build

Analyze a document with AI
Request:
- file: Document file (multipart/form-data)
- prompt: Analysis prompt (string)
Response:
{
  "success": true,
  "analysis": "AI-generated analysis...",
  "metadata": {
    "document": {
      "title": "document.pdf",
      "file_type": ".pdf",
      "file_size": 1024,
      "page_count": 5
    },
    "usage": {
      "total_tokens": 1500,
      "model_used": "gpt-4"
    }
  }
}

Get supported file formats
Health check endpoint
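A client for the analyze endpoint could look like the sketch below, which uploads `file` and `prompt` as multipart/form-data and reads the response fields shown in the example JSON. The names `AnalysisResult`, `parse_analysis_response`, and `analyze_file` are hypothetical, the endpoint URL is left as a parameter because its path is not assumed here, and treating a missing or false `success` flag as an error is an assumption about the failure shape.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class AnalysisResult:
    analysis: str
    title: str
    page_count: int
    total_tokens: int
    model_used: str


def parse_analysis_response(payload: Dict[str, Any]) -> AnalysisResult:
    """Turn the analyze endpoint's JSON body into a typed result."""
    if not payload.get("success"):
        # Assumption: failures are signalled by a missing or false "success".
        raise ValueError("analysis request was not successful")
    document = payload["metadata"]["document"]
    usage = payload["metadata"]["usage"]
    return AnalysisResult(
        analysis=payload["analysis"],
        title=document["title"],
        page_count=document["page_count"],
        total_tokens=usage["total_tokens"],
        model_used=usage["model_used"],
    )


def analyze_file(endpoint_url: str, path: str, prompt: str) -> AnalysisResult:
    """Upload a file and prompt as multipart/form-data and parse the reply.

    `endpoint_url` is the full URL of the analyze endpoint. Requires the
    third-party `requests` package.
    """
    import requests  # imported lazily so the parser above works without it

    with open(path, "rb") as fh:
        response = requests.post(
            endpoint_url,
            files={"file": fh},
            data={"prompt": prompt},
            timeout=120,
        )
    response.raise_for_status()
    return parse_analysis_response(response.json())
```

Keeping the response parsing separate from the HTTP call means the field handling can be tested against a sample payload without a running backend.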
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
MIT License (inherits core Docling license)
Ready to unlock the intelligence hidden in your documents? This POC shows you how to combine state-of-the-art document parsing with powerful AI analysis!