Repository for http://griesheim-transparent.de - A transparency platform giving citizens access to municipal government documents, decisions, and activities from the city parliament of Griesheim (Hesse, Darmstadt-Dieburg district, Germany).
- parliscope: Django web application for the citizen search interface and document processing
  - frontend: Web interface with search functionality and document viewer
  - models: Database models (based on the OParl standard)
  - healthcheck: Health monitoring for external services
  - parliscope: Main project configuration, settings, and Celery setup
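The models package follows the OParl standard for German municipal data. As a rough illustration of the core OParl entities and their relations (these are not the actual Django models, which use the ORM and carry many more fields; the field names here are assumptions):

```python
from dataclasses import dataclass, field

# Sketch of core OParl entities (hypothetical field selection,
# not the project's real Django models).
@dataclass
class Body:
    """The municipality, e.g. the city of Griesheim."""
    id: str      # OParl object URL
    name: str

@dataclass
class Meeting:
    """A session of the city parliament or one of its committees."""
    id: str
    name: str
    start: str   # ISO 8601 datetime

@dataclass
class Paper:
    """A document under discussion (motion, proposal, report)."""
    id: str
    name: str
    reference: str                      # file number, e.g. "XVIII/123"
    file_ids: list = field(default_factory=list)  # attached Files

paper = Paper(id="p1", name="Antrag", reference="XVIII/123")
print(paper.reference)  # -> XVIII/123
```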
- scraper: Scrapy-based web scraper for SessionNet municipal information systems
  - Extracts documents, meetings, and metadata from SessionNet
  - Stores data in a PostgreSQL database
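The real scraper is built on Scrapy, but the core idea — pulling meeting links out of SessionNet HTML pages — can be sketched with the standard library alone. The URL pattern `si0050` used below is an assumption about SessionNet markup, not verified against the actual site:

```python
from html.parser import HTMLParser

class MeetingLinkParser(HTMLParser):
    """Collects links that look like meeting-detail pages.
    The 'si0050' URL pattern is a hypothetical SessionNet convention."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href", "")
            if "si0050" in href:   # assumed meeting-detail pattern
                self.links.append(href)

html = '<a href="si0050.asp?__ksinr=123">Sitzung</a><a href="index.asp">Home</a>'
parser = MeetingLinkParser()
parser.feed(html)
print(parser.links)  # -> ['si0050.asp?__ksinr=123']
```

In the actual project, a Scrapy spider (`scrapy crawl sessionnet`) handles crawling, throttling, and item pipelines on top of this kind of extraction.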
- solr: Apache Solr search platform with German language configuration
  - Full-text indexing and search capabilities
  - Custom schema for municipal documents
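Queries against Solr go over its HTTP API. A minimal sketch of building a `/select` request URL with the standard library — the base URL, core name (`documents`), and parameters are assumptions, not the project's actual configuration:

```python
from urllib.parse import urlencode

def solr_select_url(base: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL (hypothetical core name and params)."""
    params = {"q": query, "rows": rows, "wt": "json"}
    return f"{base}/select?{urlencode(params)}"

url = solr_select_url("http://localhost:8983/solr/documents", "Bebauungsplan")
print(url)  # -> http://localhost:8983/solr/documents/select?q=Bebauungsplan&rows=10&wt=json
```

In production the German-language analysis chain (stemming, stop words) configured in the custom schema determines how such a query matches document text.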
- deployment: Docker Compose configurations with Celery infrastructure
  - Redis broker for the task queue
  - Celery Beat for scheduled tasks
  - Celery Worker for background processing
- Scraper → Documents/metadata → PostgreSQL
- Django management commands → Document processing → Solr indexing
- Citizens → Django frontend → Solr search results
The platform integrates with several external services for storage and document processing:
- PostgreSQL: Application database
- Redis: Message broker for Celery background task queue
- Apache Tika: PDF text extraction, metadata extraction, and OCR
- PDFAct: Advanced PDF text extraction with structure recognition
- Gotenberg: Document format conversion to PDF
- Preview Service: Document thumbnail generation
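As an illustration of how text extraction via Apache Tika works, the sketch below builds (but does not send) a request to the Tika server's `/tika` endpoint; the port is Tika's default, and the snippet is not the application's actual client code:

```python
import urllib.request

# Placeholder bytes, not a real PDF document.
pdf_bytes = b"%PDF-1.4 ..."

# Tika extracts plain text from a document PUT to /tika.
req = urllib.request.Request(
    "http://localhost:9998/tika",
    data=pdf_bytes,
    method="PUT",
    headers={"Content-Type": "application/pdf", "Accept": "text/plain"},
)
# urllib.request.urlopen(req) would return the extracted text
# (requires a running Tika server, so it is not executed here).
print(req.get_method())  # -> PUT
```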
```shell
# Full development environment with Docker
cd deployment && docker-compose -f dev.yaml up

# Local development (requires uv)
cd parliscope/ && uv sync && uv run python manage.py runserver
cd scraper/ && uv sync && uv run scrapy crawl sessionnet
```

- Background task processing: Celery with Redis broker for scalable document processing
- Document processing pipeline: OCR, text extraction, format conversion, thumbnails
- Full-text search: German-language optimized Solr configuration
- Health monitoring: Service availability checks for all external dependencies
- Responsive design: Mobile-friendly interface for citizen access
- Automated scraping: Cron job regularly extracts data from SessionNet
- Document processing: Celery background tasks process documents through external services
- Search indexing: Processed documents are indexed in Solr for fast search
- Citizen access: Web interface provides search and document viewing capabilities
After deploying the application for the first time:
1. Run database migrations:

   ```shell
   python manage.py migrate
   ```

2. Create a superuser account:

   ```shell
   python manage.py createsuperuser
   ```

3. Configure periodic Solr indexing (choose one method):

   Option A: Management command (recommended)

   ```shell
   python manage.py setup_periodic_tasks --schedule "0 3 * * *"
   ```

   This creates a daily task that updates the Solr search index at 3 AM.

   Option B: Django admin interface
   - Navigate to /admin/django_celery_beat/periodictask/
   - Create a new periodic task:
     - Name: update-solr-index
     - Task: parliscope.tasks.indexing.update_solr_index
     - Crontab schedule: Create/select a schedule (e.g., daily at 3 AM)
     - Arguments (JSON): {"force": false, "allow_ocr": true, "chunk_size": 10}

4. Manual indexing trigger (optional):
   - Superusers can trigger immediate indexing via the /update/ endpoint
   - Uses the Celery task queue for background processing
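The chunk_size argument in the task's JSON arguments controls how many documents each background job handles at a time. The real task dispatches chunks to Celery workers; as a rough stdlib-only illustration of the chunking itself:

```python
def chunked(items, chunk_size):
    """Yield successive slices of at most chunk_size items.
    Sketch only: the actual task hands each chunk to a Celery worker."""
    for i in range(0, len(items), chunk_size):
        yield items[i:i + chunk_size]

sizes = [len(c) for c in chunked(list(range(25)), 10)]
print(sizes)  # -> [10, 10, 5]
```

Smaller chunks spread indexing work more evenly across workers; larger chunks reduce task overhead.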
MIT