griesheim-transparent.de

Repository for http://griesheim-transparent.de, a transparency platform that gives citizens access to the documents, decisions, and activities of the city parliament of Griesheim (Darmstadt-Dieburg district, Hesse, Germany).

Project Structure

Core Components

- parliscope: Django web application for citizen search interface and document processing
  - frontend: Web interface with search functionality and document viewer
  - models: Database models (based on OParl standard)
  - healthcheck: Health monitoring for external services
  - parliscope: Main project configuration, settings, and Celery setup
- scraper: Scrapy-based web scraper for SessionNet municipal information systems
  - Extracts documents, meetings, and metadata from SessionNet
  - Stores data in PostgreSQL database
- solr: Apache Solr search platform with German language configuration
  - Full-text indexing and search capabilities
  - Custom schema for municipal documents
- deployment: Docker Compose configurations with Celery infrastructure
  - Redis broker for task queue
  - Celery Beat for scheduled tasks
  - Celery Worker for background processing

System Architecture

Data Flow

Scraper → Documents/metadata → PostgreSQL
Django management commands → Document processing → Solr indexing
Citizens → Django frontend → Solr search results
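
The last step of the data flow, a search request that ends up at Solr, can be sketched roughly as follows. This is an illustration, not the project's actual code: the base URL, the core name `documents`, and the helper `build_search_url` are assumptions. Only Solr's standard `/select` handler and its `q`, `rows`, and `wt` parameters are taken as given.

```python
from urllib.parse import urlencode

# Assumed values for illustration; the real deployment may differ.
SOLR_BASE_URL = "http://localhost:8983/solr"
SOLR_CORE = "documents"  # hypothetical core name


def build_search_url(query: str, rows: int = 10) -> str:
    """Build a URL for Solr's standard /select request handler."""
    params = urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{SOLR_BASE_URL}/{SOLR_CORE}/select?{params}"


if __name__ == "__main__":
    print(build_search_url("Bebauungsplan"))
```

Building the URL separately from sending the request keeps the query logic easy to test without a running Solr instance.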

External Services

The platform integrates with several microservices for document processing:

PostgreSQL: Application database
Redis: Message broker for Celery background task queue
Apache Tika: PDF text extraction, metadata extraction, and OCR
PDFAct: Advanced PDF text extraction with structure recognition
Gotenberg: Document format conversion to PDF
Preview Service: Document thumbnail generation
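
A basic availability check for these services could look like the sketch below. It is not the code of the healthcheck app: the host names, the port numbers (common defaults for each service), and the helpers `is_reachable` and `check_services` are assumptions, and a plain TCP connect only shows that a port is open, not that the service behind it is healthy.

```python
import socket
from typing import Callable

# Assumed hosts and default ports; adjust to the actual deployment.
SERVICES: dict[str, tuple[str, int]] = {
    "postgresql": ("localhost", 5432),
    "redis": ("localhost", 6379),
    "tika": ("localhost", 9998),
    "gotenberg": ("localhost", 3000),
    "solr": ("localhost", 8983),
}


def is_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host name not resolvable
        return False


def check_services(
    services: dict[str, tuple[str, int]],
    probe: Callable[[str, int], bool] = is_reachable,
) -> dict[str, bool]:
    """Map each service name to the result of probing it."""
    return {name: probe(host, port) for name, (host, port) in services.items()}


if __name__ == "__main__":
    for name, ok in check_services(SERVICES).items():
        print(f"{name}: {'up' if ok else 'down'}")
```

Passing the probe as a parameter makes it possible to swap in an HTTP-level check per service later, or a fake probe in tests.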

Development

Quick Start

# Full development environment with Docker
cd deployment && docker-compose -f dev.yaml up

# Local development (requires uv); run each command from the repository root, in its own terminal
cd parliscope/ && uv sync && uv run python manage.py runserver
cd scraper/ && uv sync && uv run scrapy crawl sessionnet

Key Features

Background task processing: Celery with Redis broker for scalable document processing
Document processing pipeline: OCR, text extraction, format conversion, thumbnails
Full-text search: German-language optimized Solr configuration
Health monitoring: Service availability checks for all external dependencies
Responsive design: Mobile-friendly interface for citizen access

Workflow

Automated scraping: Cron job regularly extracts data from SessionNet
Document processing: Celery background tasks process documents through external services
Search indexing: Processed documents are indexed in Solr for fast search
Citizen access: Web interface provides search and document viewing capabilities
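
One plausible ordering of the document processing step is sketched below. The order of the stages, the OCR fallback rule, and the function `process_document` are assumptions made for illustration; the three callables stand in for the external services (conversion, text extraction, OCR) and do not reflect their real APIs.

```python
from typing import Callable


def process_document(
    mime_type: str,
    data: bytes,
    convert_to_pdf: Callable[[bytes], bytes],
    extract_text: Callable[[bytes], str],
    run_ocr: Callable[[bytes], str],
    allow_ocr: bool = True,
) -> str:
    """Return the text that would be handed to the search index."""
    # Non-PDF inputs are converted first (the role of a conversion service).
    pdf = data if mime_type == "application/pdf" else convert_to_pdf(data)
    # Try regular text extraction before anything more expensive.
    text = extract_text(pdf)
    # Fall back to OCR only if extraction found nothing and OCR is allowed.
    if not text.strip() and allow_ocr:
        text = run_ocr(pdf)
    return text


if __name__ == "__main__":
    # Stand-in callables simulate a scanned PDF without a text layer.
    print(process_document(
        "application/pdf", b"%PDF-", lambda d: d, lambda d: "", lambda d: "scanned text"
    ))
```

Keeping the service calls behind plain callables lets the ordering logic be tested without any of the services running.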

Initial Setup

After deploying the application for the first time:

Run database migrations:
```
python manage.py migrate
```
Create a superuser account:
```
python manage.py createsuperuser
```
Configure periodic Solr indexing (choose one method):

Option A: Management command (recommended)
```
python manage.py setup_periodic_tasks --schedule "0 3 * * *"
```
This creates a daily task to update the Solr search index at 3 AM.

Option B: Django admin interface
- Navigate to /admin/django_celery_beat/periodictask/
- Create new periodic task:
  - Name: update-solr-index
  - Task: parliscope.tasks.indexing.update_solr_index
  - Crontab schedule: Create/select schedule (e.g., daily at 3 AM)
  - Arguments (JSON): {"force": false, "allow_ocr": true, "chunk_size": 10}

Manual indexing trigger (optional):
- Superusers can trigger immediate indexing via the /update/ endpoint
- The request is queued as a Celery task and processed in the background
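
To make the schedule string and the JSON arguments from the steps above easier to read, the following sketch decodes both with the standard library. The helper `parse_cron` is hypothetical and not part of the project. The field names follow the naming Celery uses for crontab schedules, and the field order is that of a standard five-field cron expression.

```python
import json

# Order of the five fields in a standard cron expression.
CRON_FIELDS = ("minute", "hour", "day_of_month", "month_of_year", "day_of_week")


def parse_cron(expression: str) -> dict[str, str]:
    """Split a five-field cron expression into named fields."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected 5 fields, got {len(parts)}: {expression!r}")
    return dict(zip(CRON_FIELDS, parts))


# "0 3 * * *" means minute 0 of hour 3, every day: daily at 3 AM.
schedule = parse_cron("0 3 * * *")

# The JSON arguments become Python keyword arguments for the task.
task_kwargs = json.loads('{"force": false, "allow_ocr": true, "chunk_size": 10}')

if __name__ == "__main__":
    print(schedule)
    print(task_kwargs)
```

Note that JSON `false` and `true` are lowercase, while the decoded Python values are `False` and `True`.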

License

MIT
