ConfBot crawls conference paper metadata, optionally generates keywords with an LLM, and visualizes the collected data in a Streamlit dashboard.
The project now uses uv for dependency management and Playwright for browser automation.
- Crawl paper titles, authors, and abstracts from conference program pages
- Save normalized metadata into a CSV file
- Generate keywords with an LLM-based pipeline
- Explore results in a Streamlit analysis dashboard
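The CSV step listed above could look roughly like the following. This is a minimal sketch, not ConfBot's actual code: the column names (`title`, `authors`, `abstract`) mirror the fields the crawler is said to collect, while `normalize_paper`, `save_papers`, and the `"; "` author separator are hypothetical choices made for illustration.

```python
from __future__ import annotations

import csv
from pathlib import Path

# Assumed column names; the real CSV schema is not documented in this README.
FIELDNAMES = ["title", "authors", "abstract"]


def normalize_paper(raw: dict) -> dict:
    """Collapse runs of whitespace and join author names into one cell."""
    authors = raw.get("authors") or []
    if isinstance(authors, str):
        authors = [authors]
    return {
        "title": " ".join(str(raw.get("title", "")).split()),
        "authors": "; ".join(" ".join(a.split()) for a in authors if a.strip()),
        "abstract": " ".join(str(raw.get("abstract", "")).split()),
    }


def save_papers(papers: list[dict], path: str | Path) -> int:
    """Write normalized rows to a CSV file and return the number of rows."""
    rows = [normalize_paper(p) for p in papers]
    # newline="" lets the csv module control line endings on every platform.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Keeping each paper on a single row with one cell per field makes the file easy to load into a dataframe for the dashboard.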
Requirements:
- Python 3.11+
- uv

Install the dependencies and the Chromium browser used by Playwright:

    uv sync
    uv run playwright install chromium

If keyword generation depends on environment variables, create a .env file before running the keyword pipeline.
Example variables:
- API_KEY
- BASE_URL
- MODEL
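
As an illustration of how these three variables might be read and validated, here is a standard-library sketch. The variable names come from the list above; everything else (`LLMSettings`, `parse_env_file`, `load_llm_settings`, and the simple `KEY=VALUE` file format) is a hypothetical stand-in, since this README does not say how genkw.py actually loads its configuration.

```python
from __future__ import annotations

import os
from dataclasses import dataclass
from typing import Mapping

# Variable names taken from the README; how the project reads them is assumed.
REQUIRED = ("API_KEY", "BASE_URL", "MODEL")


@dataclass(frozen=True)
class LLMSettings:
    api_key: str
    base_url: str
    model: str


def parse_env_file(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_llm_settings(env: Mapping[str, str] | None = None) -> LLMSettings:
    """Build settings from a mapping (os.environ by default); fail fast if any are missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return LLMSettings(env["API_KEY"], env["BASE_URL"], env["MODEL"])
```

Checking for missing values up front means a misconfigured .env file is reported before any crawling or LLM calls begin.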
Run the crawler and keyword pipeline:

    uv run python main.py

Run only the crawler for a specific track:

    uv run python main.py --urls "https://conf.researchr.org/track/fse-2025/fse-2025-research-papers" --no-keyword

Launch the dashboard:

    uv run streamlit run app.py

Compile the main entrypoints:

    uv run python -m py_compile crawler.py test_playwright.py main.py

Run the Playwright smoke test:

    uv run python test_playwright.py

If Playwright reports that the browser executable is missing, run:

    uv run playwright install chromium

Project files:
- crawler.py: Playwright-based crawler
- main.py: CLI entrypoint for crawling and keyword generation
- genkw.py: keyword generation logic
- analysis.py: analysis helpers
- app.py: Streamlit dashboard
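
To illustrate how a CLI entrypoint such as main.py might handle the `--urls` and `--no-keyword` flags used in the commands above, here is a small `argparse` sketch. Only the two flag names come from this README. The helper names `build_parser` and `plan_steps`, the choice to let `--urls` accept several values, and the behavior when `--urls` is omitted are all assumptions; the real main.py may differ.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags shown in the usage commands."""
    parser = argparse.ArgumentParser(
        description="Crawl conference pages and optionally generate keywords."
    )
    # Assumption: --urls takes one or more track URLs.
    parser.add_argument("--urls", nargs="+", default=[], help="track page URLs to crawl")
    # store_true makes the flag default to False, so keywords run unless disabled.
    parser.add_argument("--no-keyword", action="store_true", help="skip keyword generation")
    return parser


def plan_steps(argv: list) -> tuple:
    """Return the requested URLs and the ordered pipeline steps to run."""
    args = build_parser().parse_args(argv)
    steps = ["crawl"]
    if not args.no_keyword:
        steps.append("keywords")
    return args.urls, steps
```

Separating argument parsing from the pipeline steps keeps the crawl-only path easy to exercise without touching the LLM.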
This project is intended strictly for academic research and educational use.
You are responsible for complying with the target website's Terms of Service and robots.txt policy.
The developer is not liable for any misuse or legal issues caused by using this code.