On-premise Keboola without Snowflake and without S3
590 tests | 93 API endpoints | 35 gRPC handlers
plan/README.md - Main implementation plan with phases 1-15
| Phase | Status | Description |
|---|---|---|
| 1-11 | DONE | Core API (projects, buckets, tables, snapshots, branches, workspaces) |
| 12 | DONE | Connection Integration (gRPC, S3 API) |
| 13 | DONE | Observability (Prometheus metrics) |
| 14 | PROPOSAL | Backend Plugin Architecture |
| 15 | TODO | Comprehensive E2E Test Suite |
| File | Description |
|---|---|
| api/duckapi.json | OpenAPI specification (93 endpoints) |
| api/keboola.apib | Keboola Storage API Blueprint (reference) |
| api/keboolamanagementapi.apib | Keboola Management API Blueprint (reference) |
adr/ - 15 architectural decisions
| ADR | Decision | Status |
|---|---|---|
| 001 | Python microservice (not PHP FFI) | IMPLEMENTED |
| 002 | 1 project = 1 DuckDB file | SUPERSEDED by 009 |
| 003 | Dev branches = separate files | SUPERSEDED by 007 |
| 004 | Snapshots = Parquet export | IMPLEMENTED |
| 005 | Write queue for serialization | IMPLEMENTED |
| 006 | Storage Files = local FS | IMPLEMENTED |
| 007 | Copy-on-Write branching | APPROVED (post-MVP) |
| 008 | Central metadata.duckdb | IMPLEMENTED |
| 009 | 1 table = 1 DuckDB file | IMPLEMENTED |
| 010 | PG Wire SQL interface | IMPLEMENTED |
| 011 | Apache Arrow integration | DEFERRED |
| 012 | Branch-First API design | IMPLEMENTED |
| 013 | SQL naming conventions | IMPLEMENTED |
| 014 | gRPC driver interface | IMPLEMENTED |
| 015 | Observability & metrics | IMPLEMENTED |
Key ADRs: 009 (per-table files), 012 (branch-first API), 014 (gRPC)
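ADR-005 in the table above names a write queue for serialization. What that could look like is sketched below, assuming a single worker thread that drains a queue of callables. The `WriteQueue` class and its `submit` method are hypothetical names for illustration and are not taken from the service's source.

```python
import queue
import threading
from concurrent.futures import Future
from typing import Any, Callable


class WriteQueue:
    """Hypothetical sketch: run submitted write operations one at a time, in FIFO order."""

    def __init__(self) -> None:
        self._tasks: queue.Queue = queue.Queue()
        # A single daemon worker guarantees that no two writes overlap.
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def submit(self, fn: Callable[[], Any]) -> Future:
        """Enqueue a write and return a Future carrying its result or exception."""
        fut: Future = Future()
        self._tasks.put((fn, fut))
        return fut

    def _run(self) -> None:
        while True:
            fn, fut = self._tasks.get()
            try:
                fut.set_result(fn())
            except Exception as exc:  # hand the failure to the caller, keep the worker alive
                fut.set_exception(exc)
```

With one file per table (ADR-009), one such queue per table file would let writes to different tables proceed in parallel while writes to the same table stay ordered. Whether the service is structured this way is an assumption.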
| File | Description |
|---|---|
| plan/decisions.md | All approved decisions |
| plan/risks.md | Accepted risks for the MVP |
| connection-duckdb-patch.md | Changes in Connection for DuckDB |
| local-connection.md | Guide to setting up Connection locally |
| File | Description |
|---|---|
| zajca.md | Original assignment from Zajca |
| duckdb-technical-research.md | Technical research on DuckDB |
| duckdb-keboola-features.md | Mapping of Keboola features |
| duckdb-api-endpoints.md | Original list of API endpoints |
| bigquery-driver-research.md | Analysis of the BigQuery driver |
┌──────────────────────────────────────────────────────────────────────────────┐
│ ON-PREMISE KEBOOLA │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ Keboola Connection (PHP) │
│ │ │
│ ├── Storage API (/v2/storage/*) │
│ │ │ │
│ │ │ gRPC (Protocol Buffers) │
│ │ ▼ │
│ └── DuckDB Driver ──────────────────────────────┐ │
│ └── DuckdbDriverClient │ │
│ │ HTTP REST/JSON │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ DuckDB API Service │ │
│ │ (Python FastAPI) │ │
│ ├─────────────────────────┤ │
│ │ - REST API (port 8000) │ │
│ │ - gRPC (port 50051) │ │
│ │ - PG Wire (port 5432) │ │
│ │ - S3 API (/s3/*) │ │
│ │ - Prometheus /metrics │ │
│ └───────────┬─────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ LOCAL FILESYSTEM │ │
│ ├─────────────────────────┤ │
│ │ /data/duckdb/ │ │
│ │ ├── project_X/ │ │
│ │ │ └── bucket/ │ │
│ │ │ └── table.duckdb│ │
│ │ └── metadata.duckdb │ │
│ │ │ │
│ │ /data/files/ │ │
│ │ └── project_X/... │ │
│ └─────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────────────┘
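The filesystem layout in the diagram can be expressed as a small path helper. This is a minimal sketch: the `/data/duckdb` root, the `project_X/bucket/table.duckdb` shape, and `metadata.duckdb` come from the diagram, while the function names, the reading of `X` as a project identifier, and the segment validation are assumptions made for illustration.

```python
from pathlib import PurePosixPath

# Root directory as shown in the diagram.
DATA_ROOT = PurePosixPath("/data/duckdb")


def _safe(segment: str) -> str:
    """Reject segments that could escape the data root (hypothetical validation rule)."""
    if not segment or segment in {".", ".."} or "/" in segment or "\\" in segment:
        raise ValueError(f"invalid path segment: {segment!r}")
    return segment


def table_path(project: str, bucket: str, table: str) -> PurePosixPath:
    """Map one table to its own DuckDB file, following the diagram's layout."""
    return DATA_ROOT / f"project_{_safe(project)}" / _safe(bucket) / f"{_safe(table)}.duckdb"


def metadata_path() -> PurePosixPath:
    """Location of the central metadata database shown in the diagram."""
    return DATA_ROOT / "metadata.duckdb"
```

For example, `table_path("42", "sales", "orders")` resolves to `/data/duckdb/project_42/sales/orders.duckdb`.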
Key decisions:
- 1 table = 1 DuckDB file (ADR-009) - parallel writes, easy branching
- Branch-First API (ADR-012) - `/branches/{branch_id}/buckets/...`
- gRPC + HTTP bridge (ADR-014) - flexible integration with Connection
- PG Wire - SQL access via PostgreSQL clients (DBeaver, psql, psycopg2)
- S3-Compatible API - pre-signed URLs pro file upload/download
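To illustrate the branch-first URL shape from ADR-012, here is a minimal sketch of a URL builder. Only the `/branches/{branch_id}/buckets/` prefix is taken from this document. The helper name `branch_buckets_url`, the example branch id `default`, and whatever the caller passes as trailing segments are hypothetical, because the rest of the route is not spelled out here.

```python
from urllib.parse import quote


def branch_buckets_url(base_url: str, branch_id: str, *suffix: str) -> str:
    """Build a branch-scoped buckets URL; every segment is percent-encoded.

    `suffix` holds caller-supplied trailing segments (hypothetical, the
    document elides what follows `buckets/`).
    """
    parts = ["branches", quote(str(branch_id), safe=""), "buckets"]
    parts.extend(quote(segment, safe="") for segment in suffix)
    return base_url.rstrip("/") + "/" + "/".join(parts)


# Port 8000 is the REST port shown in the diagram; "default" is an assumed branch id.
print(branch_buckets_url("http://localhost:8000", "default"))
```

Placing the branch id first in the path means every bucket request names its branch explicitly, which is the point of a branch-first design.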
# Start the DuckDB API Service
cd duckdb-api-service
source .venv/bin/activate
python -m src.unified_server # REST + gRPC + PG Wire
# Run the tests
pytest tests/ -v
# Open the metrics dashboard
open dashboard2.html

See CLAUDE.md, section "Post-MVP TODO & Technical Debt":
- Phase 15: E2E Test Suite
- CoW branching (ADR-007)
- HA / Multi-instance
- Key rotation, RBAC