BitPod-App/sector-feeds

bitpod

Convert podcast and social-feed episodes into clean text transcripts for downstream BTC analysis and reporting.

What This Repo Does

bitpod automates this workflow:

  1. Discover and poll configured feeds.
  2. Identify new episodes.
  3. Choose the best available source (RSS audio, captions, or media) based on policy.
  4. Transcribe and normalize text.
  5. Export deterministic artifacts for model consumption.
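Steps 1 and 2 reduce to a filter over the polled feed. The sketch below is illustrative only: `Episode` and `select_new_episodes` are hypothetical names, not bitpod's API. It assumes the IDs of successfully processed episodes (from index/processed.json) are loaded into a set, so failed episodes stay eligible for retry, and it mirrors the documented --max-episodes (default 3) and --since-days behavior.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class Episode:
    # Hypothetical minimal episode record; real feed entries carry more fields.
    episode_id: str
    published: datetime  # timezone-aware


def select_new_episodes(
    episodes: Iterable[Episode],
    processed_ids: Set[str],
    max_episodes: int = 3,
    since_days: Optional[int] = None,
    now: Optional[datetime] = None,
) -> List[Episode]:
    """Return up to max_episodes unprocessed episodes, newest first."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - timedelta(days=since_days) if since_days is not None else None
    candidates = [
        ep
        for ep in episodes
        # Idempotency: episodes already recorded as successful are skipped.
        if ep.episode_id not in processed_ids
        # Optional recency window, like --since-days.
        and (cutoff is None or ep.published >= cutoff)
    ]
    candidates.sort(key=lambda ep: ep.published, reverse=True)
    return candidates[:max_episodes]
```

Rerunning with the same feed and index returns the same list, which is what makes the skip-on-rerun behavior deterministic.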

What Works Today

  • Jack Mallers Show processing path is implemented and serves as the validation baseline.
  • Feed discovery supports YouTube RSS URL extraction from channel inputs.
  • Sync supports a mixed feed strategy per show (youtube plus an optional rss list), with RSS prioritized.
  • Episode processing is idempotent: successful episodes are skipped on reruns.
  • Transcript artifacts are written under transcripts/<show>/<year>/ with status tracking in index/processed.json.
  • Captions are parsed and stitched (de-overlap) before acceptance; low-quality captions fall back to media transcription.
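Rolling auto-captions typically repeat the tail of the previous cue at the start of the next one, and de-overlap stitching has to remove that repetition. A minimal sketch of one way to do it, assuming cues arrive as plain strings in time order; `stitch_captions` is a hypothetical helper, not bitpod's actual parser.

```python
from typing import Iterable


def stitch_captions(cues: Iterable[str], max_overlap: int = 50) -> str:
    """Join caption cues, dropping words that repeat the previous tail."""
    words: list = []
    for cue in cues:
        cue_words = cue.split()
        # Find the largest k where the last k stitched words equal the first k cue words.
        limit = min(len(words), len(cue_words), max_overlap)
        overlap = 0
        for k in range(limit, 0, -1):
            if words[-k:] == cue_words[:k]:
                overlap = k
                break
        # Append only the new part of the cue.
        words.extend(cue_words[overlap:])
    return " ".join(words)
```

Matching on whole words keeps the rule simple. The trade-off is that a genuine spoken repetition at a cue boundary (for example "the the") can be collapsed, so stitched captions should still clear a quality gate such as the --min-caption-words floor before they are accepted.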

Why This Exists

Immediate goal: reliable transcript generation from high-signal feeds. Primary consumer: GPT workflows that currently require clean text transcripts as input.

Quickstart

# from repo root
python3 -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -e .
export OPENAI_API_KEY="your_key_here"

Optional: override root path for generated artifacts.

export BITPOD_ROOT="/path/to/bitpod"

Usage

# discover configured feed(s) for Jack Mallers Show
python -m bitpod discover --show jack_mallers_show

# preview only (no downloads/transcription/writes)
python -m bitpod sync --show jack_mallers_show --dry-run

# sync/transcribe newest episodes (default max: 3)
python -m bitpod sync --show jack_mallers_show

# choose source behavior
python -m bitpod sync --show jack_mallers_show --source-policy balanced

# restrict which feed families are eligible
python -m bitpod sync --show jack_mallers_show --feed-mode rss_preferred

# fail fast if youtube captions are low quality (no media download fallback)
python -m bitpod sync --show jack_mallers_show --no-youtube-download

# require a stronger caption quality floor
python -m bitpod sync --show jack_mallers_show --min-caption-words 300

# sync/transcribe newest N episodes
python -m bitpod sync --show jack_mallers_show --max-episodes 5

# process only recent episodes
python -m bitpod sync --show jack_mallers_show --since-days 14

# optional transcription model override
python -m bitpod sync --show jack_mallers_show --model gpt-4o-mini-transcribe
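The source flags above combine into one decision per episode. A hedged sketch, assuming RSS audio wins whenever it is present, captions are accepted only at or above the --min-caption-words floor, and --no-youtube-download turns the media fallback into a failure. `choose_source` and its return labels are hypothetical, and the sketch does not model --source-policy, whose exact semantics this README does not spell out.

```python
from typing import Optional


def choose_source(
    has_rss_audio: bool,
    caption_words: Optional[int],
    min_caption_words: int,
    feed_mode: str = "rss_preferred",
    allow_youtube_download: bool = True,
) -> Optional[str]:
    """Return "rss_audio", "captions", "youtube_media", or None (fail fast)."""
    if has_rss_audio:
        return "rss_audio"  # RSS is prioritized
    if feed_mode == "rss_only":
        return None  # no YouTube-derived source is eligible
    if caption_words is not None and caption_words >= min_caption_words:
        return "captions"
    if allow_youtube_download:
        return "youtube_media"  # low-quality captions fall back to media
    return None  # like --no-youtube-download: no media fallback
```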

Inputs And Outputs

Inputs

  • Show/feed definitions in shows.json.
  • Runtime config: OPENAI_API_KEY (required), BITPOD_ROOT (optional).

Outputs

  • Transcript Markdown files in transcripts/<show_key>/<YYYY>/.
  • Companion plain-text files: *_plain.txt.
  • Companion structured segments: *_segments.jsonl.
  • Weekly run status artifacts:
    • transcripts/jack_mallers_show/jack_mallers_status.json
    • transcripts/jack_mallers_show/jack_mallers_status.md
  • Processing status index in index/processed.json.
  • Discovered/normalized feed metadata in shows.json.
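The *_segments.jsonl companion is JSON Lines: one JSON object per line. The sketch below assumes each segment has start, end, and text fields, which is a guess at the schema rather than bitpod's documented format. It shows how a downstream consumer could round-trip segments and rebuild a *_plain.txt-style string from them.

```python
import json
from pathlib import Path
from typing import Dict, Iterable


def write_segments(path: Path, segments: Iterable[Dict]) -> None:
    # One JSON object per line; the keys used here are assumed, not confirmed.
    with Path(path).open("w", encoding="utf-8") as fh:
        for seg in segments:
            fh.write(json.dumps(seg, ensure_ascii=False) + "\n")


def plain_text_from_segments(path: Path) -> str:
    texts = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            texts.append(json.loads(line)["text"].strip())
    return " ".join(texts)
```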

Artifact tracking policy:

  • Runtime/cache outputs are local-only and git-ignored (for example: cache/, .wrangler/, artifacts/public/, artifacts/private/, cost logs, and feedback logs).
  • Retained per-run GPT QA artifacts are written to artifacts/private/gpt-qa/.
  • Latest deploy preview pointer is written to artifacts/private/coordination/latest_deploy_url.txt.
  • Canonical transcript artifacts under transcripts/ remain tracked unless explicitly changed by project policy.
  • Before pushing, run make audit to enforce the size guard and unit tests.

Per-show contract (API-like surface):

  • Each show has its own stable pointer (stable_pointer in shows.json).
  • Each show has its own status artifacts (<stable_pointer_stem>_status.json|md).
  • Schedules can differ per show while preserving the same output contract.
  • Public permalink publish (semi-paranoid): each show gets
    • artifacts/public/permalinks/<opaque_id>/index.html
    • artifacts/public/permalinks/<opaque_id>/intake.md
    • artifacts/public/permalinks/<opaque_id>/transcript.md (stable latest transcript permalink for GPT)
    • artifacts/public/permalinks/<opaque_id>/latest.md
    • artifacts/public/permalinks/<opaque_id>/status.json
    • artifacts/public/permalinks/<opaque_id>/discovery.json
    • <opaque_id> is stable per show_key / sector_feed_id, not per run
    • the stable single-URL show page is https://permalinks.bitpod.app/<opaque_id>, served with noindex/nofollow/noarchive directives and a disallow-all robots.txt.
    • internal mapping remains private in artifacts/private/public_permalink_manifest.json.
    • feed identity/tags contract reference: docs/architecture/feed_identity_contract.md.
    • status.json now includes machine-readable public bundle health:
      • public_bundle_complete
      • public_bundle_readability
      • public_bundle_missing
    • the landing page is the preferred single GPT entry point:
      • it embeds the machine-readable run contract
      • it links the canonical raw artifacts
      • its executive summary and artifact rail should honestly reflect the run outcome (usable, degraded, failed, or no-new-episode)
      • it must not imply full bundle health unless public HTTP verification has actually completed
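A local pre-check of the per-show bundle can compute the complete/missing fields before anything is deployed. This sketch only checks that the six files listed above exist on disk; it is not public HTTP verification and so must not be reported as full bundle health. `local_bundle_health` is a hypothetical helper, and public_bundle_readability is left out because its semantics are not described here.

```python
from pathlib import Path
from typing import Dict, Union

# File names taken from the per-show permalink list above.
REQUIRED_PERMALINK_FILES = (
    "index.html",
    "intake.md",
    "transcript.md",
    "latest.md",
    "status.json",
    "discovery.json",
)


def local_bundle_health(bundle_dir: Path) -> Dict[str, Union[bool, list]]:
    root = Path(bundle_dir)
    missing = [name for name in REQUIRED_PERMALINK_FILES if not (root / name).is_file()]
    return {"public_bundle_complete": not missing, "public_bundle_missing": missing}
```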

Supported Feeds (Current)

  • jack_mallers_show: confirmed working reference feed path.
  • Additional source types and social feed integrations: planned next.

Stable Pointer And Private GPT Workflow

Primary stable transcript pointer (Jack Mallers):

  • transcripts/jack_mallers_show/jack_mallers.md

Raw GitHub URL:

  • https://raw.githubusercontent.com/cjarguello/bitpod/main/transcripts/jack_mallers_show/jack_mallers.md

Note:

  • If the repository is private, GPT cannot fetch the raw URL directly.
  • Use local upload artifacts instead (jack_mallers.md plus jack_mallers_status.md).

How it updates:

  1. Run sync successfully for jack_mallers_show.
  2. The latest successful transcript is selected from index/processed.json.
  3. jack_mallers.md is overwritten with that transcript content.
  4. Commit and push changes to main so the raw URL serves the update.
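Steps 2 and 3 can be sketched as follows. The index schema here is an assumption: processed.json is treated as a mapping of episode ID to a record with status, published (ISO 8601 UTC), and transcript_path fields. `latest_successful` and `refresh_pointer` are hypothetical names, and step 4 (commit and push) is left to git.

```python
import json
import shutil
from pathlib import Path
from typing import Dict, Optional


def latest_successful(index: Dict[str, Dict]) -> Optional[Dict]:
    """Pick the newest record whose status is "success" (schema assumed)."""
    ok = [rec for rec in index.values() if rec.get("status") == "success"]
    if not ok:
        return None
    # ISO 8601 strings in one consistent UTC format sort chronologically.
    return max(ok, key=lambda rec: rec["published"])


def refresh_pointer(index_path: Path, pointer_path: Path) -> bool:
    """Overwrite the stable pointer with the latest successful transcript."""
    index = json.loads(Path(index_path).read_text(encoding="utf-8"))
    rec = latest_successful(index)
    if rec is None:
        return False  # leave the existing pointer untouched
    shutil.copyfile(rec["transcript_path"], pointer_path)
    return True
```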

Operational commands:

cd "$WORKSPACE/sector-feeds"
source .venv311/bin/activate
python -m bitpod discover --show jack_mallers_show
python -m bitpod sync --show jack_mallers_show --max-episodes 1

Weekly helper scripts:

# Monday run (newest 1 episode, status artifacts always written)
bash scripts/run_mallers_weekly.sh

# Optional: include GPT report generation in weekly run
WEEKLY_GPT_REPORT=1 bash scripts/run_mallers_weekly.sh

# Tuesday verification report (writes artifacts/jack_mallers_show_tuesday_report.md)
# - includes deterministic intake readiness checks for intake.md/latest.md/status.json/discovery.json
bash scripts/report_mallers_tuesday_status.sh

# One-shot legacy Tuesday track (sync + report + contract print)
bash scripts/run_legacy_tuesday_track.sh jack_mallers_show

# Friday verification report (writes artifacts/jack_mallers_show_friday_report.md)
bash scripts/report_mallers_friday_status.sh

# One-shot legacy Friday track (sync + report + contract print)
bash scripts/run_legacy_friday_track.sh jack_mallers_show

# One-shot experimental track (collect + process + contract print)
bash scripts/run_experimental_track.sh jack_mallers_show

# Force experimental track to use RSS only
BITPOD_FEED_MODE=rss_only bash scripts/run_experimental_track.sh jack_mallers_show

# Validate feed identity contract (IDs + canonical catalog path)
make feed-identity-check SHOW_KEY=jack_mallers_show

# Render one-page board for Tuesday/Friday/Experimental status
make track-status-board SHOW_KEY=jack_mallers_show

# Fast PASS/FAIL gate from board JSON (non-zero exit on failure)
make track-status-check SHOW_KEY=jack_mallers_show

# One-command fast operator preflight
make preflight SHOW_KEY=jack_mallers_show

# Release readiness gate (preflight + key tests)
make release-ready SHOW_KEY=jack_mallers_show

# Full daily operator cycle (all tracks + board + gate)
make ops-cycle SHOW_KEY=jack_mallers_show

# Optional DNS helpers (macOS networksetup)
make dns-set-fast
make dns-restore-default

# Full session helper (with DNS set/restore)
make today-run SHOW_KEY=jack_mallers_show

# Full session helper (without DNS changes)
make today-run-no-dns SHOW_KEY=jack_mallers_show

# Full final gate (today-run-no-dns + handoff refresh + live smoke)
make final-check SHOW_KEY=jack_mallers_show

# Refresh concise handoff snapshot
make handoff-refresh SHOW_KEY=jack_mallers_show

# Public smoke test against canonical URLs (HTTP + contract markers)
make smoke-public SHOW_KEY=jack_mallers_show

# Fail if status timestamp is older than threshold (default 9 days)
make stale-check SHOW_KEY=jack_mallers_show

# Quiet wrapper mode (compact output)
QUIET=1 make ops-cycle SHOW_KEY=jack_mallers_show

# Legacy weekly alias (writes artifacts/jack_mallers_show_weekly_report.md)
bash scripts/report_mallers_weekly_status.sh
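The stale-check rule reduces to one comparison. A minimal sketch, assuming the caller has already read an ISO 8601 timestamp with a UTC offset out of the status JSON (the field name is not stated here); `is_stale` is a hypothetical helper, and the 9-day default comes from the make target description.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def is_stale(status_timestamp: str, now: Optional[datetime] = None, max_age_days: float = 9) -> bool:
    # Accept a trailing "Z", which datetime.fromisoformat only parses on Python 3.11+.
    ts = datetime.fromisoformat(status_timestamp.replace("Z", "+00:00"))
    if ts.tzinfo is None:
        raise ValueError("status timestamp must carry a UTC offset")
    now = now or datetime.now(timezone.utc)
    # Strictly older than the threshold counts as stale.
    return now - ts > timedelta(days=max_age_days)
```

A wrapper would then exit non-zero when is_stale returns True, which is the behavior the make target describes.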

GitHub automation ownership:

  • weekly fetch automation: .github/workflows/mallers-weekly-fetch.yml
    • schedule: Monday 9:15 PM CST (15 3 * * 2 UTC)
    • command target: bash scripts/run_mallers_weekly.sh
    • scope: transcript/status/permalink refresh only, no GPT report generation by default
    • local ad hoc behavior matches the workflow unless you explicitly set WEEKLY_GPT_REPORT=1
    • stable status surface:
      • transcripts/jack_mallers_show/jack_mallers_status.json
      • transcripts/jack_mallers_show/jack_mallers_status.md
      • transcripts/jack_mallers_show/jack_mallers_gpt_review_request.md
    • unique per-run debug surface:
      • artifacts/runs/mallers_weekly_fetch/jack_mallers_show/<timestamp>__summary.md
      • artifacts/runs/mallers_weekly_fetch/jack_mallers_show/<timestamp>__status.json
    • interpretation:
      • stable status files answer "what is the current latest known state?"
      • unique run summaries answer "what happened on this specific weekly run?"
  • daily intake gate: .github/workflows/intake-gate-daily.yml
    • produces or consumes the bitregime-core intake artifact before running the v2-default handshake gate
    • requires BITPOD_CROSS_REPO_TOKEN only when the workflow must check out the private bitregime-core repo itself
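GitHub Actions evaluates cron expressions in UTC, so the Monday-evening local time comes from a day rollover. A small sketch for checking such conversions, assuming a fixed UTC-6 offset; `cron_utc_to_cst` is a hypothetical helper, not part of the repo.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC-6 offset (CST, same as America/Managua); deliberately ignores US daylight time.
CST = timezone(timedelta(hours=-6), "CST")


def cron_utc_to_cst(cron_weekday: int, hour: int, minute: int) -> str:
    """Convert a UTC cron weekday/hour/minute into a 'Weekday HH:MM' CST label.

    cron_weekday follows cron numbering: 0 = Sunday ... 6 = Saturday.
    """
    # 2024-01-07 was a Sunday, so adding cron_weekday days lands on that weekday.
    fire_utc = datetime(2024, 1, 7, hour, minute, tzinfo=timezone.utc) + timedelta(days=cron_weekday)
    return fire_utc.astimezone(CST).strftime("%A %H:%M")
```

For the weekly fetch schedule, cron_utc_to_cst(2, 3, 15) gives Monday 21:15. In a DST-observing US Central zone, the same expression fires at 10:15 PM CDT during daylight time.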

Codex Workspace Rebuild Diagnostics

When debugging BitPod workspace cutover/path issues on macOS, run:

# explicit root
bash scripts/diagnose_codex_workspace_rebuild.sh "$WORKSPACE"

# or auto-detect (prefers BitPod-App, then falls back to bitpod-app)
bash scripts/diagnose_codex_workspace_rebuild.sh

This script checks:

  • root .codex presence and org-workspace.toml root path,
  • root path casing, accepting the canonical BitPod-App with a lowercase bitpod-app fallback,
  • repo-local .codex/config.toml coverage across the active repo set,
  • stale references to rebuild/retired paths,
  • whether repo-local verification scripts are being run from the correct repo root,
  • origin remotes + current branch for each active repo.
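The auto-detect order (BitPod-App first, then bitpod-app) and the .codex presence check can be sketched in Python. The parent directory the real script searches is not stated here, so `parent` is an assumption, and `detect_workspace_root` and `has_codex_dir` are hypothetical names. On macOS's default case-insensitive APFS volumes, both spellings resolve to the same directory, so the casing of the returned path reflects the candidate tried, not necessarily the on-disk name.

```python
from pathlib import Path
from typing import Optional, Sequence


def detect_workspace_root(
    parent: Path, candidates: Sequence[str] = ("BitPod-App", "bitpod-app")
) -> Optional[Path]:
    """Return the first existing candidate directory under parent, canonical casing first."""
    for name in candidates:
        root = Path(parent) / name
        if root.is_dir():
            return root
    return None


def has_codex_dir(root: Path) -> bool:
    # Mirrors the "root .codex presence" check listed above.
    return (root / ".codex").is_dir()
```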

Track-specific runbooks/prompts:

- docs/runbooks/legacy_tuesday_report.md

- docs/prompts/legacy_tuesday_report_prompt.md

- docs/runbooks/experimental_weekly_btc_gate.md

- docs/prompts/experimental_weekly_btc_gate_prompt.md

- docs/prompts/legacy_tuesday_single_prompt.md

- docs/prompts/experimental_weekly_gate_single_prompt.md

- docs/runbooks/report_tracks_short_vs_long_term.md

- docs/runbooks/weekly_automation_two_track_contract.md

- docs/runbooks/recovered_local_trash_report_assets_assessment.md

- scripts/experimental_weekly_ctl.sh (transitional decoupled stage wrapper)

Generic, multi-show equivalents:

bash scripts/run_show_weekly.sh <show_key>
bash scripts/report_show_weekly_status.sh <show_key> [weekly|tuesday|friday]

bitregime-core intake handshake check (additive; does not change weekly tracks):

bash scripts/check_bitregime_core_intake_handshake.sh \
  ../bitregime-core/artifacts/intake/jack_mallers_show_intake.json \
  deck_weekly_btc

Make wrapper:

make intake-handshake-check SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc

Runbook:

- docs/runbooks/bitregime_core_intake_handshake.md

Daily v2-default intake gate (machine+human artifacts + retained history):

make intake-gate-daily SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc

One-command operator triage + rollback diagnostic path:

make intake-gate-triage SHOW_KEY=jack_mallers_show DECK_ID=deck_weekly_btc

Deterministic weekly critical bundle (10-metric gate input):

python3 scripts/generate_weekly_critical_bundle.py \
  --report-md ../artifacts/recovery/2026-02-27/weekly_btc_strict_single_pass_7_artifact.md \
  --output-json artifacts/private/weekly_bundles/weekly_critical_bundle.json

Ad hoc mode:

- sync only if latest selected episode is not already processed

bash scripts/run_show_adhoc.sh <show_key>

- record GPT feedback consumption for latest run

bash scripts/record_show_gpt_feedback.sh <show_key> <feedback_markdown_path>

- verify run + GPT consumption + feedback linkage

bash scripts/verify_show_adhoc.sh <show_key>

Unified operator commands (chat-friendly wrappers):

bash scripts/bitpod_status.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"]
bash scripts/bitpod_sync.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"] [--min-episode-age-minutes 180] [--trigger-cmd ""]
bash scripts/bitpod_verify.sh [--show all|<show_key>] [--as-of "YYYY-MM-DD[ HH:MM]"] [--gpt-feedback-file ] [--gpt-note ""]

M-5 intake operations runbook:

- docs/runbooks/intake_gate_daily_ops.md

Operator guidance (MVP):

- Primary flow: Status -> Sync -> Deploy

- Sync now enforces strict parity checks (same gate as Verify) by default.

- Verify remains useful as a standalone audit/recheck command or for recording GPT feedback linkage.

Refresh permalink artifacts from the current local status contract:

python3 scripts/refresh_public_permalinks.py [show_key]

Verify deployed public readability and print machine-readable health:

python3 scripts/verify_public_permalink_bundle.py --show jack_mallers_show --base-url https://permalinks.bitpod.app

Worker deploy path:

- refreshes the local permalink bundle first

- verifies the Worker preview by default

- if PERMALINKS_WORKER_CUSTOM_DOMAIN is set, verifies the canonical custom domain

- writes public bundle health back into status.json only during canonical verification

bash scripts/deploy_public_permalinks_worker.sh [worker_name] [show_key]

Automated permalink deploys:

- canonical Worker workflow:

- .github/workflows/deploy-public-permalinks-worker.yml

- weekly fetch workflow:

- .github/workflows/mallers-weekly-fetch.yml

- required GitHub Actions secrets / vars:

- CLOUDFLARE_WORKERS_API_TOKEN

- CLOUDFLARE_ACCOUNT_ID

- the API token should include Workers deploy permissions for the permalink Worker

- optional: CLOUDFLARE_WORKER_NAME

- optional: PERMALINKS_WORKER_CUSTOM_DOMAIN

- optional during cutover: PERMALINKS_WORKER_PREVIEW_BASE_URL

- lets the refresh step rebuild from the preview Worker status URL if the canonical host is temporarily unavailable

- canonical verification may still report a false negative from some CI edge locations even when the public hostname is healthy

- in that case the workflow falls back to preview-host verification and warns instead of failing the deploy
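The canonical-then-preview fallback can be expressed as a small control-flow sketch. The `verify` callable stands in for whatever HTTP check the workflow runs; `verify_with_fallback` and its return labels are hypothetical, not the workflow's actual code.

```python
import warnings
from typing import Callable, Optional


def verify_with_fallback(
    verify: Callable[[str], bool],
    canonical_url: str,
    preview_url: Optional[str] = None,
) -> str:
    """Return "canonical" or "preview" for the host that passed; raise if neither did."""
    if verify(canonical_url):
        return "canonical"
    if preview_url and verify(preview_url):
        # Warn instead of failing the deploy, as described above.
        warnings.warn(
            f"canonical verification failed for {canonical_url}; "
            f"preview host {preview_url} passed",
            RuntimeWarning,
        )
        return "preview"
    raise RuntimeError("public permalink bundle failed verification on every host")
```

A caller would write public bundle health into status.json only when the result is "canonical", matching the rule that health is written back only during canonical verification.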

Cloudflare permalink Worker architecture and cutover steps:

- see cloudflare/permalinks-worker/

- see docs/runbooks/permalink_worker_cutover.md

Paranoid-public Cloudflare hardening checklist:

- custom domain + AI crawler controls + bot policy

- see CLOUDFLARE_PARANOID_PUBLIC_CHECKLIST.md

Cost-controlled GPT report generation from a transcript:

Default mode sends an excerpt only (not the full transcript):

.venv311/bin/python scripts/gpt_report_from_transcript.py \
  --transcript-path transcripts/jack_mallers_show/jack_mallers.md \
  --report-name gpt-bitreport-pods-all-YYYYMMDD-HHMMSS.md \
  --show-key jack_mallers_show

Full-transcript mode (explicit opt-in only):

.venv311/bin/python scripts/gpt_report_from_transcript.py \
  --transcript-path transcripts/jack_mallers_show/jack_mallers.md \
  --report-name gpt-bitreport-pods-all-YYYYMMDD-HHMMSS.md \
  --show-key jack_mallers_show \
--full-text
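The excerpt-versus-full split is what keeps the default run cheap. A minimal sketch, assuming a head-of-transcript excerpt capped at a word budget; the real script's excerpt strategy and size are not documented here, and `build_report_input` and `max_words` are hypothetical.

```python
def build_report_input(transcript: str, full_text: bool = False, max_words: int = 1500) -> str:
    """Return the text sent to the model: an excerpt by default, everything on opt-in."""
    if full_text:
        return transcript  # explicit opt-in, like --full-text
    words = transcript.split()
    if len(words) <= max_words:
        return transcript
    # Head-of-transcript excerpt; the marker tells the model the text was cut.
    return " ".join(words[:max_words]) + " [...]"
```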

Repo hygiene guard (tracked file size limits):

bash scripts/check_repo_size.sh

Combined local audit:

make audit


Timeline policy:
- Default local timezone is `America/Managua` (fixed UTC-6, no DST drift).
- `--as-of` is optional and intended for historical debugging/replay.
- For live-heavy YouTube sources, sync applies a default maturity guard (`--min-episode-age-minutes 180`) to avoid unfinished captures.
- GPT bridge cost estimates are logged to:
- local: `artifacts/cost-meter/bridge_cost_estimates.jsonl`
- shared tools source of truth: `$WORKSPACE/tools/artifacts/cost-meter/cost_events.jsonl`
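The maturity guard and the `--as-of` replay option combine into one age check. A sketch under stated assumptions: a fixed UTC-6 offset stands in for `America/Managua` (zoneinfo.ZoneInfo("America/Managua") would also work where tzdata is installed), a date-only `--as-of` is taken to mean local midnight, and `parse_as_of` and `is_mature` are hypothetical helpers.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Fixed offset equivalent to America/Managua, which observes no DST.
MANAGUA = timezone(timedelta(hours=-6), "America/Managua")


def parse_as_of(value: str) -> datetime:
    """Parse "YYYY-MM-DD" or "YYYY-MM-DD HH:MM" as Managua local time."""
    fmt = "%Y-%m-%d %H:%M" if " " in value else "%Y-%m-%d"
    return datetime.strptime(value, fmt).replace(tzinfo=MANAGUA)


def is_mature(published: datetime, as_of: Optional[datetime] = None, min_age_minutes: int = 180) -> bool:
    # Episodes younger than the guard may be unfinished live captures.
    as_of = as_of or datetime.now(MANAGUA)
    return as_of - published >= timedelta(minutes=min_age_minutes)
```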

Shared cost summary (all repos/commands that write to tools meter):
```bash
$WORKSPACE/tools/costs/cost_ctl.py
```

Weekly cost guardrails (automation-friendly):

  • run_show_weekly.sh executes cost guard checks when caps are set via env.
  • Defaults are auto-loaded from scripts/bitpod_budget.env (or from an optional override file, .bitpod_budget.env, at the repo root).
  • Supported env vars:
  • COST_SOURCE (example: bitpod.gpt_report_from_transcript)
  • COST_WINDOW_HOURS (the tools CLI default is 24)
  • COST_RUN_WARN, COST_RUN_FAIL
  • COST_DAILY_WARN, COST_DAILY_FAIL
  • COST_WARN_EXIT_0=1 to keep warning status non-blocking
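The warn/fail caps can be read as a small decision table. A hedged sketch, assuming a cap trips when cost reaches it (>=), unset or empty caps are ignored, and exit codes 0/1/2 mean ok/warn/fail; those codes and the helper names are hypothetical, not the tools CLI's documented behavior.

```python
from typing import Dict, Mapping

CAP_KEYS = ("COST_RUN_WARN", "COST_RUN_FAIL", "COST_DAILY_WARN", "COST_DAILY_FAIL")


def caps_from_env(env: Mapping[str, str]) -> Dict[str, float]:
    # Only caps that are set to a non-empty value take part in the check.
    return {key: float(env[key]) for key in CAP_KEYS if env.get(key)}


def cost_guard_exit_code(run_cost: float, daily_cost: float, caps: Dict[str, float], warn_exit_0: bool = False) -> int:
    def over(value: float, key: str) -> bool:
        return key in caps and value >= caps[key]

    if over(run_cost, "COST_RUN_FAIL") or over(daily_cost, "COST_DAILY_FAIL"):
        return 2  # fail always blocks
    if over(run_cost, "COST_RUN_WARN") or over(daily_cost, "COST_DAILY_WARN"):
        return 0 if warn_exit_0 else 1  # COST_WARN_EXIT_0=1 keeps warnings non-blocking
    return 0
```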

Governance metadata (optional, status artifact only):

  • Sync can embed spec-lock/provenance metadata into <stable_pointer_stem>_status.json|md.
  • Default tuple: origin_actor=OTHER, authority_state=PROPOSAL.
  • Fast setup: cp .bitpod_runtime.env.example .bitpod_runtime.env and edit values.
  • Weekly scripts auto-load .bitpod_runtime.env when present.
  • Optional env vars:
  • BITPOD_ORIGIN_ACTOR (CJ|GPT|CODEX|TAYLOR|HUMAN_TEAM|OTHER)
  • BITPOD_AUTHORITY_STATE (PROPOSAL|CJ_ENDORSED|TEAM_ENDORSED|CJ_OVERRIDE)
  • BITPOD_EXPANSION_GATE (default BLOCKED)
  • BITPOD_SPEC_LOCK_ORIGINAL_ASK
  • BITPOD_SPEC_LOCK_SUCCESS_CRITERIA (comma-separated)
  • BITPOD_SPEC_LOCK_OUT_OF_SCOPE (comma-separated)
  • BITPOD_BASELINE_REFS (comma-separated)
  • Soft override guard inputs (non-blocking visibility):
  • BITPOD_OVERRIDE_CONFLICT (1/true/yes/on)
  • BITPOD_OVERRIDE_CONFLICT_NOTE
  • BITPOD_OVERRIDE_IMPACTED_DECISION
  • BITPOD_OVERRIDE_BROADCAST_NOTE
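The governance variables map onto a small record. A minimal sketch, assuming invalid enum values are rejected (bitpod may instead fall back or warn), comma-separated lists drop empty items, and the output keys are hypothetical; defaults and allowed values come from the list above.

```python
import os
from typing import Dict, List, Mapping, Optional

ORIGIN_ACTORS = {"CJ", "GPT", "CODEX", "TAYLOR", "HUMAN_TEAM", "OTHER"}
AUTHORITY_STATES = {"PROPOSAL", "CJ_ENDORSED", "TEAM_ENDORSED", "CJ_OVERRIDE"}
TRUTHY = {"1", "true", "yes", "on"}


def _csv(value: Optional[str]) -> List[str]:
    return [item.strip() for item in (value or "").split(",") if item.strip()]


def governance_from_env(env: Optional[Mapping[str, str]] = None) -> Dict:
    env = os.environ if env is None else env
    actor = env.get("BITPOD_ORIGIN_ACTOR", "OTHER")
    state = env.get("BITPOD_AUTHORITY_STATE", "PROPOSAL")
    if actor not in ORIGIN_ACTORS:
        raise ValueError(f"unknown BITPOD_ORIGIN_ACTOR: {actor}")
    if state not in AUTHORITY_STATES:
        raise ValueError(f"unknown BITPOD_AUTHORITY_STATE: {state}")
    return {
        "origin_actor": actor,
        "authority_state": state,
        "expansion_gate": env.get("BITPOD_EXPANSION_GATE", "BLOCKED"),
        "success_criteria": _csv(env.get("BITPOD_SPEC_LOCK_SUCCESS_CRITERIA")),
        "out_of_scope": _csv(env.get("BITPOD_SPEC_LOCK_OUT_OF_SCOPE")),
        "baseline_refs": _csv(env.get("BITPOD_BASELINE_REFS")),
        # Soft guard: recorded for visibility, never blocks the run.
        "override_conflict": env.get("BITPOD_OVERRIDE_CONFLICT", "").strip().lower() in TRUTHY,
    }
```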

Cadence policy:

  • Unknown-cadence shows should be checked frequently (scan model), but processing remains idempotent: if the latest episode is already transcribed and pointer-ready, ad hoc sync skips the work.

Roadmap (Near Term)

  1. Confirm stable weekly transcript fetch/transcribe behavior for Jack Mallers Show.
  2. Expand support across multiple feed/source types.
  3. Standardize transcript cleanliness for downstream BTC scoring/reporting ingestion.
  4. Add lightweight reporting outputs once transcript reliability is consistently high.

Operations

  • Weekly runs should process only new episodes and skip known-successful ones.
  • Failed episodes should remain visible for retry.
  • Paths and output formats should remain stable to protect downstream automations.

Versioning And Changes

  • Versioning follows pre-1.0 SemVer (0.x.y).
  • Change history lives in CHANGELOG.md.

Contributing

See CONTRIBUTING.md for the local quality gate and pre-push checklist.

About

Bitcoin sector feeds ingestion and transcript pipeline.
