Convert podcast and social-feed episodes into clean text transcripts for downstream BTC analysis and reporting.
bitpod automates this workflow:
- Discover and poll configured feeds.
- Identify new episodes.
- Choose the best available source (RSS audio, captions, or media) based on policy.
- Transcribe and normalize text.
- Export deterministic artifacts for model consumption.
- Jack Mallers Show processing path is implemented and serves as the validation baseline.
- Feed discovery supports YouTube RSS URL extraction from channel inputs.
- Sync supports a mixed feed strategy per show (`youtube` plus optional `rss` list), with RSS prioritized.
- Episode processing is idempotent: successful episodes are skipped on reruns.
- Transcript artifacts are written under `transcripts/<show>/<year>/`, with status tracking in `index/processed.json`.
- Captions are parsed and stitched (de-overlapped) before acceptance; low-quality captions fall back to media transcription.
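The caption handling in the last bullet can be pictured with a minimal sketch. It assumes overlap is detected by matching whole words at cue boundaries and that quality is judged by word count alone; the function names and the `min_words` default are illustrative, not bitpod's actual implementation.

```python
from typing import List, Optional


def stitch_captions(cues: List[str]) -> str:
    """Join caption cues, dropping words repeated at cue boundaries."""
    words: List[str] = []
    for cue in cues:
        new = cue.split()
        # Longest suffix of the text so far that is also a prefix of this cue.
        overlap = 0
        for k in range(min(len(words), len(new)), 0, -1):
            if words[-k:] == new[:k]:
                overlap = k
                break
        words.extend(new[overlap:])
    return " ".join(words)


def accept_captions(cues: List[str], min_words: int = 300) -> Optional[str]:
    """Return stitched text, or None to signal fallback to media transcription.

    min_words mirrors the idea behind --min-caption-words; 300 is an assumed value.
    """
    text = stitch_captions(cues)
    return text if len(text.split()) >= min_words else None
```

Returning `None` rather than raising keeps the fallback decision with the caller, which fits a policy-driven source choice.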
Immediate goal: reliable transcript generation from high-signal feeds. Primary consumer: GPT workflows that currently require clean text transcripts as input.
# from repo root
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
export OPENAI_API_KEY="your_key_here"
Optional: override root path for generated artifacts.
export BITPOD_ROOT="/path/to/bitpod"
# discover configured feed(s) for Jack Mallers Show
python -m bitpod discover --show jack_mallers_show
# preview only (no downloads/transcription/writes)
python -m bitpod sync --show jack_mallers_show --dry-run
# sync/transcribe newest episodes (default max: 3)
python -m bitpod sync --show jack_mallers_show
# choose source behavior
python -m bitpod sync --show jack_mallers_show --source-policy balanced
# restrict which feed families are eligible
python -m bitpod sync --show jack_mallers_show --feed-mode rss_preferred
# fail fast if youtube captions are low quality (no media download fallback)
python -m bitpod sync --show jack_mallers_show --no-youtube-download
# require a stronger caption quality floor
python -m bitpod sync --show jack_mallers_show --min-caption-words 300
# sync/transcribe newest N episodes
python -m bitpod sync --show jack_mallers_show --max-episodes 5
# process only recent episodes
python -m bitpod sync --show jack_mallers_show --since-days 14
# optional transcription model override
python -m bitpod sync --show jack_mallers_show --model gpt-4o-mini-transcribe
Inputs
- Show/feed definitions in `shows.json`.
- Runtime config: `OPENAI_API_KEY` (required), `BITPOD_ROOT` (optional).
Outputs
- Transcript Markdown files in `transcripts/<show_key>/<YYYY>/`.
- Companion plain-text files: `*_plain.txt`.
- Companion structured segments: `*_segments.jsonl`.
- Weekly run status artifacts:
  - `transcripts/jack_mallers_show/jack_mallers_status.json`
  - `transcripts/jack_mallers_show/jack_mallers_status.md`
- Processing status index in `index/processed.json`.
- Discovered/normalized feed metadata in `shows.json`.
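As a rough illustration of the output layout above, the sketch below builds the three per-episode paths. The episode `stem` and the assumption that the Markdown file is named `<stem>.md` are hypothetical; only the directory shape and the `_plain.txt` / `_segments.jsonl` suffixes come from this README.

```python
from pathlib import Path
from typing import Dict


def artifact_paths(root: Path, show_key: str, year: int, stem: str) -> Dict[str, Path]:
    """Return the expected transcript artifact paths for one episode."""
    base = root / "transcripts" / show_key / f"{year:04d}"
    return {
        "markdown": base / f"{stem}.md",  # assumed naming for the Markdown file
        "plain": base / f"{stem}_plain.txt",
        "segments": base / f"{stem}_segments.jsonl",
    }
```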
Artifact tracking policy:
- Runtime/cache outputs are local-only and git-ignored (for example: `cache/`, `.wrangler/`, `artifacts/public/`, `artifacts/private/`, cost logs, and feedback logs).
- Retained per-run GPT QA artifacts are written to `artifacts/private/gpt-qa/`.
- Latest deploy preview pointer is written to `artifacts/private/coordination/latest_deploy_url.txt`.
- Canonical transcript artifacts under `transcripts/` remain tracked unless explicitly changed by project policy.
- Before push, run `make audit` to enforce size guard + unit tests.
Per-show contract (API-like surface):
- Each show has its own stable pointer (`stable_pointer` in `shows.json`).
- Each show has its own status artifacts (`<stable_pointer_stem>_status.json|md`).
- Schedules can differ per show while preserving the same output contract.
- Public permalink publish (semi-paranoid): each show gets
  - `artifacts/public/permalinks/<opaque_id>/index.html`
  - `artifacts/public/permalinks/<opaque_id>/intake.md`
  - `artifacts/public/permalinks/<opaque_id>/transcript.md` (stable latest transcript permalink for GPT)
  - `artifacts/public/permalinks/<opaque_id>/latest.md`
  - `artifacts/public/permalinks/<opaque_id>/status.json`
  - `artifacts/public/permalinks/<opaque_id>/discovery.json`
- `<opaque_id>` is stable per `show_key`/`sector_feed_id`, not per run.
- the stable single-URL show page is `https://permalinks.bitpod.app/<opaque_id>`
- permalinks are published with noindex/nofollow/noarchive + `robots.txt` disallow-all.
- internal mapping remains private in `artifacts/private/public_permalink_manifest.json`.
- feed identity/tags contract reference: `docs/architecture/feed_identity_contract.md`.
- `status.json` now includes machine-readable public bundle health:
  - `public_bundle_complete`
  - `public_bundle_readability`
  - `public_bundle_missing`
- the landing page is the preferred single GPT entry point:
  - it embeds the machine-readable run contract
  - it links the canonical raw artifacts
  - its executive summary and artifact rail should vary honestly for `usable`, `degraded`, `failed`, and `no-new-episode`
  - it must not imply full bundle health unless public HTTP verification has actually completed
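To make the bundle-health fields concrete, here is a minimal local sketch that checks a permalink directory for the six files listed above. The function name and return shape are assumptions, and `public_bundle_readability` is left out because its semantics are not described here. A local file check like this is not a substitute for the public HTTP verification the contract requires.

```python
from pathlib import Path
from typing import Dict, List, Union

# The six bundle files named in the per-show contract.
EXPECTED_FILES = (
    "index.html",
    "intake.md",
    "transcript.md",
    "latest.md",
    "status.json",
    "discovery.json",
)


def bundle_health(bundle_dir: Path) -> Dict[str, Union[bool, List[str]]]:
    """Report which expected files are missing from a local permalink bundle."""
    missing = [name for name in EXPECTED_FILES if not (bundle_dir / name).is_file()]
    return {
        "public_bundle_complete": not missing,
        "public_bundle_missing": missing,
    }
```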
- `jack_mallers_show`: confirmed working reference feed path.
- Additional source types and social feed integrations: planned next.
Primary stable transcript pointer (Jack Mallers):
transcripts/jack_mallers_show/jack_mallers.md
Raw GitHub URL:
https://raw.githubusercontent.com/cjarguello/bitpod/main/transcripts/jack_mallers_show/jack_mallers.md
Note:
- If the repository is private, GPT cannot fetch the raw URL directly.
- Use local upload artifacts instead (`jack_mallers.md` plus `jack_mallers_status.md`).
How it updates:
- Run `sync` successfully for `jack_mallers_show`.
- The latest successful transcript is selected from `index/processed.json`.
- `jack_mallers.md` is overwritten with that transcript content.
- Commit and push changes to `main` so the raw URL serves the update.
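The selection and overwrite steps above could look roughly like the sketch below. The index schema (a per-show list of entries with `status`, `published`, and `transcript_path` keys) is hypothetical; the real layout of `index/processed.json` may differ.

```python
import json
from pathlib import Path
from typing import Optional


def update_stable_pointer(index_path: Path, pointer_path: Path, show_key: str) -> Optional[Path]:
    """Copy the newest successful transcript over the stable pointer file."""
    index = json.loads(index_path.read_text(encoding="utf-8"))
    # Assumed schema: {show_key: [{"status", "published", "transcript_path"}, ...]}
    done = [e for e in index.get(show_key, []) if e.get("status") == "success"]
    if not done:
        return None  # nothing to publish; leave the pointer untouched
    # ISO 8601 timestamps in one consistent format sort correctly as strings.
    latest = max(done, key=lambda e: e["published"])
    source = Path(latest["transcript_path"])
    pointer_path.write_text(source.read_text(encoding="utf-8"), encoding="utf-8")
    return source
```

Leaving the pointer untouched when no episode succeeded keeps the raw URL serving the last known-good transcript.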
Operational commands:
cd "$WORKSPACE/sector-feeds"
source .venv311/bin/activate
python -m bitpod discover --show jack_mallers_show
python -m bitpod sync --show jack_mallers_show --max-episodes 1
Weekly helper scripts:
# Monday run (newest 1 episode, status artifacts always written)
bash scripts/run_mallers_weekly.sh
# Optional: include GPT report generation in weekly run
WEEKLY_GPT_REPORT=1 bash scripts/run_mallers_weekly.sh
# Tuesday verification report (writes artifacts/jack_mallers_show_tuesday_report.md)
# - includes deterministic intake readiness checks for intake.md/latest.md/status.json/discovery.json
bash scripts/report_mallers_tuesday_status.sh
# One-shot legacy Tuesday track (sync + report + contract print)
bash scripts/run_legacy_tuesday_track.sh jack_mallers_show
# Friday verification report (writes artifacts/jack_mallers_show_friday_report.md)
bash scripts/report_mallers_friday_status.sh
# One-shot legacy Friday track (sync + report + contract print)
bash scripts/run_legacy_friday_track.sh jack_mallers_show
# One-shot experimental track (collect + process + contract print)
bash scripts/run_experimental_track.sh jack_mallers_show
# Force experimental track to use RSS only
BITPOD_FEED_MODE=rss_only bash scripts/run_experimental_track.sh jack_mallers_show
# Validate feed identity contract (IDs + canonical catalog path)
make feed-identity-check SHOW_KEY=jack_mallers_show
# Render one-page board for Tuesday/Friday/Experimental status
make track-status-board SHOW_KEY=jack_mallers_show
# Fast PASS/FAIL gate from board JSON (non-zero exit on failure)
make track-status-check SHOW_KEY=jack_mallers_show
# One-command fast operator preflight
make preflight SHOW_KEY=jack_mallers_show
# Release readiness gate (preflight + key tests)
make release-ready SHOW_KEY=jack_mallers_show
# Full daily operator cycle (all tracks + board + gate)
make ops-cycle SHOW_KEY=jack_mallers_show
# Optional DNS helpers (macOS networksetup)
make dns-set-fast
make dns-restore-default
# Full session helper (with DNS set/restore)
make today-run SHOW_KEY=jack_mallers_show
# Full session helper (without DNS changes)
make today-run-no-dns SHOW_KEY=jack_mallers_show
# Full final gate (today-run-no-dns + handoff refresh + live smoke)
make final-check SHOW_KEY=jack_mallers_show
# Refresh concise handoff snapshot
make handoff-refresh SHOW_KEY=jack_mallers_show
# Public smoke test against canonical URLs (HTTP + contract markers)
make smoke-public SHOW_KEY=jack_mallers_show
# Fail if status timestamp is older than threshold (default 9 days)
make stale-check SHOW_KEY=jack_mallers_show
# Quiet wrapper mode (compact output)
QUIET=1 make ops-cycle SHOW_KEY=jack_mallers_show
# Legacy weekly alias (writes artifacts/jack_mallers_show_weekly_report.md)
bash scripts/report_mallers_weekly_status.sh
GitHub automation ownership:
- weekly fetch automation: `.github/workflows/mallers-weekly-fetch.yml`
  - schedule: Monday 9:15 PM CST (`15 3 * * 2` UTC)
  - command target: `bash scripts/run_mallers_weekly.sh`
  - scope: transcript/status/permalink refresh only, no GPT report generation by default
  - local ad hoc behavior matches the workflow unless you explicitly set `WEEKLY_GPT_REPORT=1`
  - stable status surface:
    - `transcripts/jack_mallers_show/jack_mallers_status.json`
    - `transcripts/jack_mallers_show/jack_mallers_status.md`
    - `transcripts/jack_mallers_show/jack_mallers_gpt_review_request.md`
  - unique per-run debug surface:
    - `artifacts/runs/mallers_weekly_fetch/jack_mallers_show/<timestamp>__summary.md`
    - `artifacts/runs/mallers_weekly_fetch/jack_mallers_show/<timestamp>__status.json`
  - interpretation:
    - stable status files answer "what is the current latest known state?"
    - unique run summaries answer "what happened on this specific weekly run?"
- daily intake gate: `.github/workflows/intake-gate-daily.yml`
  - produces or consumes the `bitregime-core` intake artifact before running the v2-default handshake gate
  - requires `BITPOD_CROSS_REPO_TOKEN` only when the workflow must check out the private `bitregime-core` repo itself
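The weekly schedule is easy to misread because the cron day-of-week field says Tuesday while the local run time is Monday. The short check below, using an arbitrarily chosen Tuesday and a fixed UTC-6 offset for CST, shows why `15 3 * * 2` in UTC lands on Monday 9:15 PM CST.

```python
from datetime import datetime, timedelta, timezone

# CST modeled as a fixed UTC-6 offset (no DST handling).
CST = timezone(timedelta(hours=-6), "CST")

# Cron "15 3 * * 2" fires at 03:15 UTC on Tuesdays; 2026-03-03 is one such Tuesday.
run_utc = datetime(2026, 3, 3, 3, 15, tzinfo=timezone.utc)
run_cst = run_utc.astimezone(CST)

print(run_utc.strftime("%A %H:%M UTC"))  # Tuesday 03:15 UTC
print(run_cst.strftime("%A %H:%M %Z"))   # Monday 21:15 CST
```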
When debugging BitPod workspace cutover/path issues on macOS, run:
# explicit root
bash scripts/diagnose_codex_workspace_rebuild.sh "$WORKSPACE"
# or auto-detect (prefers BitPod-App, then falls back to bitpod-app)
bash scripts/diagnose_codex_workspace_rebuild.sh
This script checks:
- root `.codex` presence and `org-workspace.toml` root path,
- allows canonical root path casing (`BitPod-App`) with lowercase fallback (`bitpod-app`),
- repo-local `.codex/config.toml` coverage across the active repo set,
- stale references to rebuild/retired paths,
- whether repo-local verification scripts are being run from the correct repo root,
- origin remotes + current branch for each active repo.
bash scripts/run_show_weekly.sh <show_key>
bash scripts/report_show_weekly_status.sh <show_key> [weekly|tuesday|friday]
bash scripts/check_bitregime_core_intake_handshake.sh \
  ../bitregime-core/artifacts/intake/jack_mallers_show_intake.json \
  deck_weekly_btc
make intake-handshake-check SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc
make intake-gate-daily SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc
make intake-gate-triage SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc
python3 scripts/generate_weekly_critical_bundle.py \
  --report-md ../artifacts/recovery/2026-02-27/weekly_btc_strict_single_pass_7_artifact.md \
  --output-json artifacts/private/weekly_bundles/weekly_critical_bundle.json
bash scripts/run_show_adhoc.sh <show_key>
bash scripts/record_show_gpt_feedback.sh <show_key> <feedback_markdown_path>
bash scripts/verify_show_adhoc.sh <show_key>
bash scripts/bitpod_status.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"]
bash scripts/bitpod_sync.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"] [--min-episode-age-minutes 180] [--trigger-cmd ""]
bash scripts/bitpod_verify.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"] [--gpt-feedback-file ] [--gpt-note ""]
- Verify remains useful as a standalone audit/recheck command or for recording GPT feedback linkage.
python3 scripts/refresh_public_permalinks.py [show_key]
python3 scripts/verify_public_permalink_bundle.py --show jack_mallers_show --base-url https://permalinks.bitpod.app
bash scripts/deploy_public_permalinks_worker.sh [worker_name] [show_key]
- canonical verification may still false-negative from some CI edges even when the public hostname is healthy
- in that case the workflow falls back to preview-host verification and warns instead of failing the deploy
.venv311/bin/python scripts/gpt_report_from_transcript.py \
  --transcript-path transcripts/jack_mallers_show/jack_mallers.md \
  --report-name gpt-bitreport-pods-all-YYYYMMDD-HHMMSS.md \
  --show-key jack_mallers_show
.venv311/bin/python scripts/gpt_report_from_transcript.py \
  --transcript-path transcripts/jack_mallers_show/jack_mallers.md \
  --report-name gpt-bitreport-pods-all-YYYYMMDD-HHMMSS.md \
  --show-key jack_mallers_show \
  --full-text
bash scripts/check_repo_size.sh
make audit
Timeline policy:
- Default local timeline is `America/Managua` (no DST drift).
- `--as-of` is optional and intended for historical debugging/replay.
- For live-heavy YouTube sources, sync applies a default maturity guard (`--min-episode-age-minutes 180`) to avoid unfinished captures.
- GPT bridge cost estimates are logged to:
- local: `artifacts/cost-meter/bridge_cost_estimates.jsonl`
- shared tools source of truth: `$WORKSPACE/tools/artifacts/cost-meter/cost_events.jsonl`
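The maturity guard and `--as-of` replay described in the timeline policy can be sketched as a single age check. The function name and signature are illustrative; only the 180-minute default and the optional as-of time come from this README.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_mature(
    published: datetime,
    as_of: Optional[datetime] = None,
    min_age_minutes: int = 180,
) -> bool:
    """True when the episode is old enough to be treated as a finished capture."""
    # as_of stands in for "now" during historical replay; default is the real clock.
    now = as_of if as_of is not None else datetime.now(timezone.utc)
    return now - published >= timedelta(minutes=min_age_minutes)
```

Both datetimes should be timezone-aware; mixing aware and naive values raises a `TypeError` in Python.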
Shared cost summary (all repos/commands that write to tools meter):
```bash
$WORKSPACE/tools/costs/cost_ctl.py
```
Weekly cost guardrails (automation-friendly):
- `run_show_weekly.sh` executes cost guard checks when caps are set via env.
- Defaults are auto-loaded from `scripts/bitpod_budget.env` (or optional override file `.bitpod_budget.env` at repo root).
- Supported env vars:
  - `COST_SOURCE` (example: `bitpod.gpt_report_from_transcript`)
  - `COST_WINDOW_HOURS` (default from tools CLI is 24)
  - `COST_RUN_WARN`, `COST_RUN_FAIL`
  - `COST_DAILY_WARN`, `COST_DAILY_FAIL`
  - `COST_WARN_EXIT_0=1` to keep warning status non-blocking
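One way to read the warn/fail caps above is as a two-tier threshold check. The sketch below is a guess at that logic, not the actual guard: the exit codes (2 for fail, 1 for a blocking warn) and the `>=` comparison are assumptions, and unset caps are simply skipped.

```python
from typing import Mapping, Optional, Tuple


def _cap(env: Mapping[str, str], name: str) -> Optional[float]:
    """Parse a cap from the environment; unset or empty means no cap."""
    raw = env.get(name, "").strip()
    return float(raw) if raw else None


def cost_guard(run_cost: float, daily_cost: float, env: Mapping[str, str]) -> Tuple[str, int]:
    """Return (status, exit_code) for the run and daily cost totals."""
    checks = ((run_cost, "COST_RUN"), (daily_cost, "COST_DAILY"))

    def exceeded(suffix: str) -> bool:
        for cost, prefix in checks:
            cap = _cap(env, f"{prefix}_{suffix}")
            if cap is not None and cost >= cap:
                return True
        return False

    if exceeded("FAIL"):
        return "fail", 2  # hypothetical exit code
    if exceeded("WARN"):
        # COST_WARN_EXIT_0=1 keeps a warning from blocking automation.
        return "warn", 0 if env.get("COST_WARN_EXIT_0") == "1" else 1
    return "ok", 0
```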
Governance metadata (optional, status artifact only):
- Sync can embed spec-lock/provenance metadata into `<stable_pointer_stem>_status.json|md`.
- Default tuple: `origin_actor=OTHER`, `authority_state=PROPOSAL`.
- Fast setup: `cp .bitpod_runtime.env.example .bitpod_runtime.env` and edit values.
- Weekly scripts auto-load `.bitpod_runtime.env` when present.
- Optional env vars:
  - `BITPOD_ORIGIN_ACTOR` (`CJ|GPT|CODEX|TAYLOR|HUMAN_TEAM|OTHER`)
  - `BITPOD_AUTHORITY_STATE` (`PROPOSAL|CJ_ENDORSED|TEAM_ENDORSED|CJ_OVERRIDE`)
  - `BITPOD_EXPANSION_GATE` (default `BLOCKED`)
  - `BITPOD_SPEC_LOCK_ORIGINAL_ASK`
  - `BITPOD_SPEC_LOCK_SUCCESS_CRITERIA` (comma-separated)
  - `BITPOD_SPEC_LOCK_OUT_OF_SCOPE` (comma-separated)
  - `BITPOD_BASELINE_REFS` (comma-separated)
- Soft override guard inputs (non-blocking visibility):
  - `BITPOD_OVERRIDE_CONFLICT` (`1/true/yes/on`)
  - `BITPOD_OVERRIDE_CONFLICT_NOTE`
  - `BITPOD_OVERRIDE_IMPACTED_DECISION`
  - `BITPOD_OVERRIDE_BROADCAST_NOTE`
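A small sketch of how these variables might be folded into a metadata record, using the documented defaults. Only a subset of the variables is shown, and the output key names other than `origin_actor` and `authority_state` are hypothetical.

```python
from typing import Dict, List, Mapping, Union

TRUTHY = {"1", "true", "yes", "on"}  # accepted values for BITPOD_OVERRIDE_CONFLICT


def _csv(value: str) -> List[str]:
    """Split a comma-separated value, dropping empty items."""
    return [part.strip() for part in value.split(",") if part.strip()]


def governance_metadata(env: Mapping[str, str]) -> Dict[str, Union[str, bool, List[str]]]:
    """Build a governance record from environment-style settings."""
    return {
        "origin_actor": env.get("BITPOD_ORIGIN_ACTOR", "OTHER"),
        "authority_state": env.get("BITPOD_AUTHORITY_STATE", "PROPOSAL"),
        "expansion_gate": env.get("BITPOD_EXPANSION_GATE", "BLOCKED"),
        "success_criteria": _csv(env.get("BITPOD_SPEC_LOCK_SUCCESS_CRITERIA", "")),
        "override_conflict": env.get("BITPOD_OVERRIDE_CONFLICT", "").strip().lower() in TRUTHY,
    }
```

Passing `os.environ` as `env` would read the live process environment; a plain dict works for tests.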
Cadence policy:
- Unknown-cadence shows should be checked frequently (scan model), but processing remains idempotent: if latest is already transcribed and pointer-ready, ad hoc sync skips work.
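The idempotent skip described in the cadence policy amounts to filtering feed entries against the processed index. This is a minimal sketch under an assumed index shape (episode ID mapped to a record with a `status` field); failed or unseen episodes stay eligible for the next run.

```python
from typing import Dict, List


def episodes_to_process(feed_ids: List[str], processed: Dict[str, dict]) -> List[str]:
    """Keep episodes that have no record or whose last attempt did not succeed."""
    return [
        episode_id
        for episode_id in feed_ids
        if processed.get(episode_id, {}).get("status") != "success"
    ]
```

For example, with `processed = {"ep1": {"status": "success"}, "ep2": {"status": "failed"}}` and a feed of `["ep1", "ep2", "ep3"]`, only `ep2` and `ep3` remain.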
- Confirm stable weekly transcript fetch/transcribe behavior for Jack Mallers Show.
- Expand support across multiple feed/source types.
- Standardize transcript cleanliness for downstream BTC scoring/reporting ingestion.
- Add lightweight reporting outputs once transcript reliability is consistently high.
- Weekly runs should process only new episodes and skip known-successful ones.
- Failed episodes should remain visible for retry.
- Paths and output formats should remain stable to protect downstream automations.
- Versioning follows pre-1.0 SemVer (`0.x.y`).
- Change history lives in CHANGELOG.md.
See CONTRIBUTING.md for the local quality gate and pre-push checklist.