Overview • Quick Start • API Reference • Operations • Deployment • Development
Dyvine is a production-grade REST API for Douyin (TikTok) content management. It delivers asynchronous downloads of videos, image galleries, livestreams, and user content with persistent operation tracking and optional Cloudflare R2 archival.
Dyvine fronts the third-party f2 Douyin SDK with a FastAPI service that adds:
- Async, fire-and-forget downloads — every long-running task returns an `operation_id` immediately and continues in a tracked background task that survives until the FastAPI lifespan drains it.
- Persistent operation state — a SQLite-backed `OperationStore` writes a WAL-mode journal so progress, terminal status, and download paths survive process restarts.
- API-key gated routers — `X-API-Key` authentication protects every router endpoint by default; production builds refuse to boot with placeholder secrets.
- Path-traversal jail — user-supplied `output_path` values are resolved inside `DOUYIN_DOWNLOAD_ROOT` and rejected if they escape, including symlink redirection.
- Bounded thread-pool executors — R2 uploads, R2 `head_object` fan-out, SQLite writes, and audit-log writes run on dedicated, named pools so a burst in one IO domain cannot starve another.
- Operational endpoints — split `/livez`, `/readyz`, `/startupz` probes plus a Prometheus metrics endpoint at `/metrics`.
| Domain | Purpose |
|---|---|
| Users | Profile lookup and bulk download of a user's posts/likes to R2. |
| Posts | Per-post detail, paginated post listing, bulk download of all user posts. |
| Livestreams | Active-room download by user ID or direct URL with deduplication. |
| System | Liveness/readiness/startup probes, health summary, Prometheus metrics. |
- uv (recommended) or Python 3.12+
- Git
- A valid Douyin session cookie
- Optional: Cloudflare R2 credentials and a bucket for archival uploads
```bash
git clone https://github.com/memenow/dyvine.git
cd dyvine

# Install runtime + dev dependencies
uv sync --all-extras
```

Copy `.env.example` to `.env` and populate the required fields:

```bash
cp .env.example .env
```

Minimum required settings for a non-debug deployment:

```dotenv
API_DEBUG=false
SECURITY_SECRET_KEY=<48+ bytes of entropy>
SECURITY_API_KEY=<48+ bytes of entropy>
DOUYIN_COOKIE=<browser session cookie>
```

Generate strong secrets with:

```bash
python -c "import secrets; print(secrets.token_urlsafe(48))"
```

Optional R2 archival requires every R2 field — `R2_ACCOUNT_ID`, `R2_ACCESS_KEY_ID`, `R2_SECRET_ACCESS_KEY`, `R2_BUCKET_NAME`, and `R2_ENDPOINT`. The readiness probe (`/readyz`) fails until all R2 fields plus the Douyin cookie are set; the storage service auto-disables when any field is empty.
```bash
# Development (auto-reload, debug formatting)
uv run uvicorn src.dyvine.main:app --reload

# Production
uv run uvicorn src.dyvine.main:app --host 0.0.0.0 --port 8000 \
    --timeout-graceful-shutdown 25
```

The API exposes:
- Application: http://localhost:8000
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
- OpenAPI spec: http://localhost:8000/api/v1/openapi.json
- Prometheus metrics: http://localhost:8000/metrics
All router endpoints sit under the configurable prefix `API_PREFIX` (default `/api/v1`).
When `SECURITY_REQUIRE_API_KEY=true` (the default), every router request must carry the configured key in an `X-API-Key` header. Mismatches return `401 Unauthorized`. The check uses `hmac.compare_digest` to keep timing uniform.

```
X-API-Key: <SECURITY_API_KEY>
```

Set `SECURITY_REQUIRE_API_KEY=false` only when fronting the API with mTLS, an authenticated gateway, or a service-mesh policy that already gates access.
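The timing-uniform comparison can be sketched with the standard library (the helper name here is hypothetical; only the `hmac.compare_digest` call mirrors the described behavior):

```python
import hmac


def check_api_key(provided: str, expected: str) -> bool:
    # compare_digest runs in time independent of where the first mismatch
    # occurs, so response timing leaks nothing about the configured key.
    return hmac.compare_digest(provided.encode(), expected.encode())
```

A naive `provided == expected` short-circuits at the first differing byte, which is exactly the timing side channel this avoids.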
```
GET /                # Service metadata and feature list
GET /livez           # Process liveness signal (200)
GET /readyz          # Dependency-aware readiness (503 when any dep is missing)
GET /startupz        # Reports lifespan startup completion
GET /health          # Aggregated runtime metrics (uptime, RSS, CPU); always 200
GET /metrics         # Prometheus exposition format
```

```
GET  /api/v1/users/{user_id}                      # User profile
POST /api/v1/users/{user_id}/content:download     # Bulk content download (posts/likes)
GET  /api/v1/users/operations/{operation_id}      # Poll bulk download status
```

```
GET  /api/v1/posts/{post_id}                      # Single post detail
GET  /api/v1/posts/users/{user_id}/posts          # Paginated user posts
POST /api/v1/posts/users/{user_id}/posts:download # Bulk download all of a user's posts
GET  /api/v1/posts/operations/{operation_id}      # Poll bulk download status
```

The list endpoint accepts an opaque `page_token` query parameter. Echo the response's `next_page_token` back unchanged on the follow-up call; the server treats the token as a base64-encoded Douyin cursor, never as an offset.
```
POST /api/v1/livestreams/users/{user_id}/stream:download # Download by user ID
POST /api/v1/livestreams/stream:download                 # Download by URL or webcast ID
GET  /api/v1/livestreams/operations/{operation_id}       # Poll download status
```

The URL endpoint validates the host against an allowlist of `*.douyin.com` domains to prevent SSRF.
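A minimal version of such a host allowlist check, assuming suffix matching against `douyin.com` (the exact rule set in the schema layer may differ):

```python
from urllib.parse import urlsplit

ALLOWED_SUFFIXES = ("douyin.com",)


def is_allowed(url: str) -> bool:
    host = urlsplit(url).hostname or ""
    # Match the bare domain or any subdomain. A substring test like
    # "douyin.com" in host would wrongly accept douyin.com.evil.example.
    return any(host == s or host.endswith("." + s) for s in ALLOWED_SUFFIXES)
```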
```bash
# Profile lookup
curl -H "X-API-Key: $API_KEY" \
  "http://localhost:8000/api/v1/users/USER_ID"

# Schedule bulk post download (returns 202 with operation_id)
curl -X POST -H "X-API-Key: $API_KEY" \
  "http://localhost:8000/api/v1/posts/users/USER_ID/posts:download"

# Schedule a livestream download by URL
curl -X POST -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://live.douyin.com/123456789"}' \
  "http://localhost:8000/api/v1/livestreams/stream:download"

# Poll an operation
curl -H "X-API-Key: $API_KEY" \
  "http://localhost:8000/api/v1/livestreams/operations/OPERATION_ID"
```

Every async operation returns a standardized `OperationResponse`:
```json
{
  "operation_id": "550e8400-e29b-41d4-a716-446655440000",
  "task_id": "550e8400-e29b-41d4-a716-446655440000",
  "operation_type": "livestream_download",
  "subject_id": "123456789",
  "status": "pending",
  "message": "Livestream download scheduled",
  "progress": 0.0,
  "total_items": null,
  "completed_items": null,
  "downloaded_items": null,
  "download_path": "livestreams/123456789_live.flv",
  "error": null,
  "created_at": "2026-05-10T10:00:00Z",
  "updated_at": "2026-05-10T10:00:00Z"
}
```

Status values are drawn from a single shared `OperationStatus` enum: `pending`, `running`, `completed`, `partial`, `failed`. Bulk-download responses extend this with per-`PostType` counters and a `failed_count` field.
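A client can poll until a terminal status with a helper like this sketch; `fetch_status` stands in for the GET operations call, and the terminal set mirrors the enum above:

```python
import time
from typing import Callable

# completed/partial/failed are terminal; pending/running mean "keep polling".
TERMINAL_STATUSES = {"completed", "partial", "failed"}


def wait_for_operation(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll an OperationResponse until it reaches a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        op = fetch_status()
        if op["status"] in TERMINAL_STATUSES:
            return op
        if time.monotonic() >= deadline:
            raise TimeoutError(f"operation {op['operation_id']} still {op['status']}")
        time.sleep(interval)
```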
`src/dyvine/core/operations.py` owns a SQLite database (default path `data/douyin/state/operations.db`, override via `API_OPERATION_DB_PATH`). The store:

- Runs in WAL journaling mode so concurrent readers and a single writer share the database without serializing every read.
- Maintains per-thread reader connections for parallel polls and one shared writer connection guarded by a lock.
- Sweeps `pending` and `running` rows to `failed` on startup so a crashed deployment cannot leave stale "in-progress" rows behind.
- Dispatches every SQLite call onto a dedicated `sqlite_executor` (4 workers) so progress updates never block the event loop.
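The WAL setup and the startup sweep can be illustrated with plain `sqlite3` (the table schema here is hypothetical, reduced to the two columns the sweep touches):

```python
import sqlite3


def open_operation_store(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    # WAL lets concurrent readers proceed while a single writer appends.
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute(
        "CREATE TABLE IF NOT EXISTS operations ("
        "  operation_id TEXT PRIMARY KEY,"
        "  status TEXT NOT NULL)"
    )
    # Startup sweep: a restarted process cannot resume old background tasks,
    # so any row still marked in-progress is moved to a terminal state.
    conn.execute(
        "UPDATE operations SET status = 'failed' "
        "WHERE status IN ('pending', 'running')"
    )
    conn.commit()
    return conn
```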
Long-running downloads are scheduled through a shared `BackgroundTaskRegistry` (see `src/dyvine/core/background.py`). On SIGTERM the FastAPI lifespan drains in-flight tasks (default 30 s) before reaping the executor pools, so a rolling restart never tears down an active R2 upload mid-flight.
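The registry pattern can be sketched in a few lines of `asyncio` (class and method names here are illustrative, not the actual `BackgroundTaskRegistry` API):

```python
import asyncio


class TaskRegistry:
    """Track fire-and-forget tasks so shutdown can drain them."""

    def __init__(self) -> None:
        self._tasks: set[asyncio.Task] = set()

    def schedule(self, coro) -> asyncio.Task:
        task = asyncio.get_running_loop().create_task(coro)
        self._tasks.add(task)
        # Self-cleaning: completed tasks drop out of the tracked set.
        task.add_done_callback(self._tasks.discard)
        return task

    async def drain(self, timeout: float = 30.0) -> None:
        # Give in-flight work up to `timeout` seconds to finish before the
        # lifespan hook proceeds to tear down the executor pools.
        if self._tasks:
            await asyncio.wait(set(self._tasks), timeout=timeout)
```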
| Endpoint | Purpose |
|---|---|
| `/livez` | Process liveness — always 200 if the event loop is responsive. |
| `/readyz` | Returns 503 when any required dependency is missing: Douyin cookie, service container, operation store, R2. |
| `/startupz` | 200 once the lifespan startup hook completes; 503 while the container is still wiring up. |
| `/health` | Always 200; reports uptime, RSS memory, CPU usage, and an informational dependency snapshot. |
| `/metrics` | Prometheus exposition format. Default counters track HTTP requests/duration and R2 upload metrics. |

The `/health` endpoint is informational only — degraded (e.g. R2 missing) returns 200. Use `/readyz` for dependency-aware gating.
`ServiceContainer` (in `src/dyvine/core/dependencies.py`) provisions four named thread-pool executors so each IO domain has an independent capacity ceiling:

| Pool | Workers | Purpose |
|---|---|---|
| `dyvine-r2` | 16 | R2 PUT/HEAD/DELETE/list operations. |
| `dyvine-r2-head` | 16 | Per-key `head_object` fan-out triggered inside `list_objects` hydration. |
| `dyvine-sqlite` | 4 | `OperationStore` reads and writes. |
| `dyvine-audit` | 2 | Lifecycle audit-log writes. |
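Creating named, bounded pools like these is a one-liner each with `concurrent.futures` (a hypothetical re-creation of the layout above, not the actual `ServiceContainer` wiring):

```python
from concurrent.futures import ThreadPoolExecutor

# One pool per IO domain; thread_name_prefix makes each domain visible in
# thread dumps, and max_workers caps how much of the process a burst can take.
r2_executor = ThreadPoolExecutor(max_workers=16, thread_name_prefix="dyvine-r2")
r2_head_executor = ThreadPoolExecutor(max_workers=16, thread_name_prefix="dyvine-r2-head")
sqlite_executor = ThreadPoolExecutor(max_workers=4, thread_name_prefix="dyvine-sqlite")
audit_executor = ThreadPoolExecutor(max_workers=2, thread_name_prefix="dyvine-audit")

# Async callers then dispatch each blocking call onto its own pool, e.g.:
# await loop.run_in_executor(sqlite_executor, store.update_progress, op_id, 0.5)
```

The point of separate pools is isolation: a flood of R2 `head_object` calls can exhaust `dyvine-r2-head` without delaying a single SQLite progress write.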
```bash
docker build -t dyvine:latest .

docker run -d \
  --name dyvine \
  -p 8000:8000 \
  -v $(pwd)/data:/app/data \
  -v $(pwd)/logs:/app/logs \
  --env-file .env \
  --restart unless-stopped \
  dyvine:latest
```

Notes:

- The image is multi-stage and runs as non-root user `1000`. The runtime layer invokes `python -m uvicorn` directly with `--timeout-graceful-shutdown 25` so the container fits inside the default Kubernetes 30 s grace period.
- The build step strips the f2-injected `black` binary from the production layer to keep the Trivy scan green.
- `apt-get upgrade` runs in both stages on every build to absorb upstream Debian security advisories. CI fails the image scan on HIGH/CRITICAL findings.
Manifests live under k8s/:
```
k8s/
├── base/
│   ├── kustomization.yaml
│   └── core/
│       ├── namespace.yaml
│       ├── service-account.yaml
│       ├── configmap.yaml
│       ├── deployment.yaml
│       ├── service.yaml
│       ├── pvc.yaml             # 1Gi RWO claim for operation state
│       └── networkpolicy.yaml
└── overlays/
    └── production/
        ├── kustomization.yaml   # Patches deployment, hpa, pdb, ingress
        ├── ingress.yaml
        ├── hpa.yaml
        ├── pdb.yaml
        └── patches.yaml
```
Apply the base directly (development) or the production overlay:
```bash
# Development / staging — pin a real image first
kustomize edit set image \
  ghcr.io/memenow/dyvine=ghcr.io/memenow/dyvine:main-<sha>
kubectl apply -k k8s/base

# Production
kubectl apply -k k8s/overlays/production
```

The `dyvine-secrets` Secret is intentionally excluded from `base/kustomization.yaml`; the deploy workflows create it from GitHub secrets before the kustomize apply.
- The default operation store is a pod-local SQLite file backed by a `ReadWriteOnce` PVC (`dyvine-operation-state`, 1 Gi). The base `Deployment` uses the `Recreate` strategy and a single replica because an RWO volume cannot attach to a surge pod, and a shared SQLite writer cannot be safely scaled horizontally.
- To scale beyond a single replica, replace the SQLite-backed `OperationStore` with a shared backend (e.g. PostgreSQL) before bumping `replicas` or enabling the production HPA.
- Place the API behind an ingress, gateway, or service mesh that enforces authentication and rate limiting in addition to (or instead of) the built-in API key.
```bash
uv run pytest                                      # All tests
uv run pytest --cov=src/dyvine                     # With coverage
uv run pytest tests/services/test_post_service.py  # Single module
```

CI gates merges on `pytest --cov-fail-under=80`; the warning filter is set to `error`, so any new `DeprecationWarning` from runtime code fails the run.
The test layout mirrors the source tree:
```
tests/
├── conftest.py                  # sys.path bootstrap + per-test isolation
├── test_main.py                 # App startup, probes, root metadata
├── test_dependencies.py         # ServiceContainer wiring
├── test_utils.py
├── core/
│   ├── test_background.py
│   ├── test_decorators.py
│   ├── test_error_handlers.py
│   ├── test_exceptions.py
│   ├── test_logging.py
│   ├── test_operations.py
│   ├── test_path_safety.py
│   └── test_settings.py
├── routers/
│   ├── test_livestreams_router.py
│   ├── test_posts_router.py
│   └── test_users_router.py
├── schemas/
│   ├── test_schemas_livestreams.py
│   ├── test_schemas_posts.py
│   └── test_schemas_users.py
└── services/
    ├── test_lifecycle_service.py
    ├── test_livestream_service.py
    ├── test_post_service.py
    ├── test_storage_service.py
    ├── test_storage_service_extended.py
    └── test_user_service.py
```
```bash
make format   # black + isort
make lint     # ruff + mypy
make test     # pytest
```

CI runs `black --check`, `isort --check-only`, `ruff check`, `mypy src/`, and the test matrix on Python 3.12 and 3.13. The container scan job (`container-scan`) builds the production image and asserts that `black` is not present in `/app/.venv/bin/`.
Workflows under .github/workflows/:
| Workflow | Trigger | Purpose |
|---|---|---|
| `ci.yml` | Push and PR to `main` | Tests, quality gates, Trivy container scan, Docker build/push. |
| `docker-build.yml` | Reusable from `ci.yml` | Multi-platform image build and push to ghcr.io. |
| `deploy-dev.yml` | Push to `main` | Apply `k8s/overlays/production` against the dev cluster (gated). |
| `deploy-prod.yml` | Tag push | Apply against the production cluster. |
| `release.yml` | Tag push | GitHub release creation. |
| `security.yml` | Weekly schedule + manual | Dependency audit (Bandit, Safety, Trivy). |
Renderable architecture diagrams live under docs/architecture/ as standalone HTML pages with embedded Mermaid diagrams.
Internal notes intended for AI assistants and contributors live as Markdown alongside the source (`AGENTS.md`, `CLAUDE.md`, `.serena/memories/`).
- API key auth is enabled by default; the production composite settings validator refuses to boot with the placeholder `change-me-in-production` sentinel when `API_DEBUG=false`.
- The schema layer rejects URLs whose host is not on the `douyin.com` allowlist before the service issues any outbound HTTP request.
- Output paths are resolved inside the configured download root with both pre- and post-mkdir checks, plus an explicit symlink-segment scan that fires before `Path.resolve()` follows any indirection.
- `/health` and `/livez` never echo dependency state in a way that can be used to probe for missing credentials; only `/readyz` reports each dependency by name.
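A reduced sketch of the path jail (omitting the symlink-segment scan and pre/post-mkdir checks the real implementation adds):

```python
from pathlib import Path


def jail_output_path(root: Path, user_path: str) -> Path:
    """Resolve a user-supplied output path inside `root`, rejecting escapes.

    Simplified illustration: resolve() collapses any ../ segments, and the
    relative-to check rejects anything that lands outside the download root.
    """
    candidate = (root / user_path).resolve()
    resolved_root = root.resolve()
    if not candidate.is_relative_to(resolved_root):
        raise ValueError(f"path escapes download root: {user_path!r}")
    return candidate
```

Note that checking the raw string (e.g. refusing `..`) is not enough on its own, since symlinks can also redirect a path outside the root, which is why the described implementation scans segments for symlinks as well.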
- Fork the repository and create a feature branch from `main`.
- Run `make format && make lint && make test` before pushing.
- Open a pull request — CI will exercise the same gates plus the container scan.
This project is licensed under the Apache License 2.0 — see the LICENSE file for details.