Crawl SEC filings, structure them, and prepare them for AI enrichment.
- Download and split 10-K filings into structured sections.
- Ingest structured data into PostgreSQL for querying.
- Enrich filings with LLM-generated summaries and insights.
- Optional Next.js frontend for exploration.
- download_and_split_10k: fetch SEC submissions, download filings, split into sections.
- db_ingestion: validate JSON and load into Postgres.
- db_enrichment: LLM enrichment pipeline and schemas.
- frontend: Next.js UI (optional).
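
The splitting step can be pictured with a minimal sketch. This is not the code in download_and_split_10k; it assumes the filing has already been converted to plain text and that sections start with headings of the form "Item 1." or "Item 1A.". The names `ITEM_HEADING` and `split_into_sections` are hypothetical.

```python
import re

# Matches headings such as "Item 1.", "ITEM 1A." or "Item 7:" at the start of a line.
# Assumption: the filing is plain text; real filings are HTML and need cleaning first.
ITEM_HEADING = re.compile(
    r"^\s*item\s+(\d{1,2}[a-c]?)\s*[.:]",
    re.IGNORECASE | re.MULTILINE,
)


def split_into_sections(text: str) -> dict[str, str]:
    """Split plain-text 10-K content into {item_id: section_text}."""
    matches = list(ITEM_HEADING.finditer(text))
    sections: dict[str, str] = {}
    for i, match in enumerate(matches):
        # A section runs from the end of its heading to the start of the next one.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        item_id = match.group(1).upper()
        body = text[match.end():end].strip()
        # A table of contents repeats the headings; keep the longest body per item.
        if len(body) > len(sections.get(item_id, "")):
            sections[item_id] = body
    return sections
```

Keeping the longest body per item is a simple heuristic for skipping table-of-contents entries; a production splitter would likely need stricter rules.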
- Create a virtual environment and install dependencies:

      python -m venv .venv
      source .venv/bin/activate
      pip install -e .

- Configure environment variables:

      cp example.env .env
      # edit .env

- Download and split recent 10-Ks:

      poe download_and_split_10k

- Ingest into Postgres:

      poe ingest_to_db

- Run enrichment:

      poe enrich_db_with_ai

The following environment variables in .env control the pipeline:

- SEC rate limiting: `REQS_PER_SECOND`, `MAX_WORKERS_COMPANIES`, `MAX_WORKERS_FILINGS`.
- SEC headers: `SEC_COOKIE` (recommended for compliant access).
- Database: `DATABASE_URL` or `PGHOST`, `PGPORT`, `PGUSER`, `PGPASSWORD`, `PGDATABASE`.
- Enrichment: `PG_DSN`, `LLM_PROVIDER`, `LLM_MODEL`, `OPENAI_API_KEY` / `NEBIUS_API_KEY` / `GEMINI_API_KEY`.
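
As an illustration of how the two database options could be reconciled, here is a minimal sketch. It assumes `DATABASE_URL` takes precedence over the individual `PG*` variables; that precedence, the fallback values, and the function name `resolve_database_url` are assumptions, not behavior confirmed by the project.

```python
import os
from urllib.parse import quote


def resolve_database_url(env=None) -> str:
    """Return DATABASE_URL if set, otherwise build a URL from the PG* variables."""
    env = os.environ if env is None else env
    url = env.get("DATABASE_URL")
    if url:
        return url
    # Fallback values below are assumptions for local development,
    # not defaults taken from the project.
    host = env.get("PGHOST", "localhost")
    port = env.get("PGPORT", "5432")
    # Percent-encode credentials so characters like "@" or ":" stay unambiguous.
    user = quote(env.get("PGUSER", "postgres"), safe="")
    password = quote(env.get("PGPASSWORD", ""), safe="")
    database = env.get("PGDATABASE", "postgres")
    auth = f"{user}:{password}" if password else user
    return f"postgresql://{auth}@{host}:{port}/{database}"
```

Passing a plain dictionary instead of `os.environ` makes the function easy to check without touching the real environment.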
To run the optional frontend:

    cd frontend
    pnpm install
    pnpm dev

- Downloaded filings can be large. Keep the data directory out of version control.
- Respect SEC rate limits; the downloader defaults to a conservative throttle.
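
For readers curious what such a throttle might look like, here is a minimal sketch of a thread-safe limiter that spaces requests evenly. It is not the downloader's actual implementation; the class name `RateLimiter` is hypothetical, and the only link to the project is that its argument plays the role of `REQS_PER_SECOND`. SEC's fair-access guidance caps traffic at 10 requests per second, so a conservative setting stays below that.

```python
import threading
import time


class RateLimiter:
    """Space calls at least 1 / reqs_per_second apart, across threads."""

    def __init__(self, reqs_per_second: float):
        if reqs_per_second <= 0:
            raise ValueError("reqs_per_second must be positive")
        self._interval = 1.0 / reqs_per_second
        self._lock = threading.Lock()
        self._next_allowed = time.monotonic()

    def wait(self) -> None:
        """Block until the caller may send its next request."""
        with self._lock:
            now = time.monotonic()
            # Reserve the next free slot while holding the lock.
            scheduled = max(now, self._next_allowed)
            self._next_allowed = scheduled + self._interval
        # Sleep outside the lock so other threads can reserve later slots.
        delay = scheduled - now
        if delay > 0:
            time.sleep(delay)
```

A single limiter instance would be shared by all worker threads, with each worker calling `wait()` immediately before sending a request.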
See CONTRIBUTING.md and CODE_OF_CONDUCT.md.
See SECURITY.md for reporting guidelines.
GNU AGPLv3. See LICENSE.