pppppkun/ConfBot

ConfBot

ConfBot crawls conference paper metadata, optionally generates keywords with an LLM, and visualizes the collected data in a Streamlit dashboard.

The project now uses uv for dependency management and Playwright for browser automation.

Features

  • Crawl paper titles, authors, and abstracts from conference program pages
  • Save normalized metadata into a CSV file
  • Generate keywords with an LLM-based pipeline
  • Explore results in a Streamlit analysis dashboard
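The "save normalized metadata into a CSV file" step can be sketched roughly as follows. This is a minimal illustration, not ConfBot's actual code: the column names in `FIELDS` and the helper names `normalize` and `save_papers` are assumptions made for the example, and the real schema may differ.

```python
import csv
from pathlib import Path

# Hypothetical column names; the real CSV schema used by ConfBot may differ.
FIELDS = ["title", "authors", "abstract"]


def normalize(record: dict) -> dict:
    """Collapse runs of whitespace and join an author list into one string."""
    authors = record.get("authors", [])
    if isinstance(authors, str):
        authors = [authors]
    return {
        "title": " ".join(str(record.get("title", "")).split()),
        "authors": "; ".join(" ".join(str(a).split()) for a in authors),
        "abstract": " ".join(str(record.get("abstract", "")).split()),
    }


def save_papers(records: list[dict], path: str) -> int:
    """Write normalized records to a CSV file and return the row count."""
    rows = [normalize(r) for r in records]
    with Path(path).open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)
```

Normalizing before writing keeps multi-line titles and abstracts on a single CSV row, which makes the file easier to load in a dashboard later.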

Requirements

  • Python 3.11+
  • uv

Setup

uv sync
uv run playwright install chromium

If your keyword generation setup reads its LLM settings from environment variables, create a .env file before running the keyword pipeline.

Example variables:

  • API_KEY
  • BASE_URL
  • MODEL
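For illustration, the three variables above could be read from a .env file along these lines. This is a standard-library sketch under assumptions: ConfBot may load its settings differently (for example through a dotenv library), and the helper names `load_env_file` and `llm_settings` are hypothetical.

```python
import os
from pathlib import Path

# Variable names taken from the list above.
SETTING_NAMES = ("API_KEY", "BASE_URL", "MODEL")


def load_env_file(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for line in env_path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def llm_settings(path: str = ".env") -> dict[str, str]:
    """Merge .env values with the process environment (the environment wins)."""
    merged = load_env_file(path)
    for name in SETTING_NAMES:
        if name in os.environ:
            merged[name] = os.environ[name]
    missing = [name for name in SETTING_NAMES if not merged.get(name)]
    if missing:
        raise RuntimeError(f"Missing settings: {', '.join(missing)}")
    return {name: merged[name] for name in SETTING_NAMES}
```

Failing early with a clear message when a setting is missing avoids a confusing error partway through a long keyword run.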

Usage

Run the crawler and keyword pipeline:

uv run python main.py

Run only the crawler for a specific track:

uv run python main.py --urls "https://conf.researchr.org/track/fse-2025/fse-2025-research-papers" --no-keyword
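The two flags used above, `--urls` and `--no-keyword`, could be wired up with `argparse` roughly like this. It is a sketch only: how main.py actually parses its arguments is not shown in this README, accepting several URLs after `--urls` is an assumption, and `build_parser` and `plan_steps` are hypothetical names.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two flags shown in the usage examples."""
    parser = argparse.ArgumentParser(
        prog="main.py",
        description="Crawl conference tracks and optionally generate keywords.",
    )
    # Assumption: one or more track URLs may follow --urls.
    parser.add_argument("--urls", nargs="+", default=[],
                        help="conference track URLs to crawl")
    parser.add_argument("--no-keyword", action="store_true",
                        help="skip the LLM keyword generation step")
    return parser


def plan_steps(argv: list[str]) -> list[str]:
    """Return the pipeline steps that would run for the given arguments."""
    args = build_parser().parse_args(argv)
    steps = ["crawl"]
    if not args.no_keyword:
        steps.append("keywords")
    return steps


if __name__ == "__main__":
    import sys
    print(plan_steps(sys.argv[1:]))
```

With this layout, passing `--no-keyword` leaves only the crawl step, matching the "run only the crawler" command above.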

Launch the dashboard:

uv run streamlit run app.py

Validation

Compile the main entrypoints:

uv run python -m py_compile crawler.py test_playwright.py main.py
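The same syntax check can also be run from Python with the standard `py_compile` module, which is handy when you want a per-file summary. This is an optional sketch; the helper name `check_syntax` is hypothetical and not part of the project.

```python
import py_compile


def check_syntax(paths: list[str]) -> dict[str, str]:
    """Return a mapping of path to error message for files that fail to compile."""
    failures: dict[str, str] = {}
    for path in paths:
        try:
            # doraise=True raises PyCompileError instead of printing the error.
            py_compile.compile(path, doraise=True)
        except py_compile.PyCompileError as exc:
            failures[path] = exc.msg
        except OSError as exc:
            # Covers missing or unreadable files.
            failures[path] = str(exc)
    return failures


if __name__ == "__main__":
    problems = check_syntax(["crawler.py", "test_playwright.py", "main.py"])
    for path, message in problems.items():
        print(f"{path}: {message}")
```

An empty result means every listed file compiled. Like the command-line form, this checks syntax only and does not import or run the modules.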

Run the Playwright smoke test:

uv run python test_playwright.py

If Playwright reports that the browser executable is missing, run:

uv run playwright install chromium

Project Files

  • crawler.py - Playwright-based crawler
  • main.py - CLI entrypoint for crawling and keyword generation
  • genkw.py - keyword generation logic
  • analysis.py - analysis helpers
  • app.py - Streamlit dashboard
  • test_playwright.py - Playwright smoke test

Disclaimer

This project is intended strictly for academic research and educational use. You are responsible for complying with the target website's Terms of Service and robots.txt policy. The developer is not liable for any misuse or legal issues caused by using this code.

About

ConfBot uses web crawlers to gather paper titles, abstracts, and authors from top software engineering (SE) conferences, then uses large language models to tag each paper with keywords. The tags help readers quickly grasp current research trends, directions, and shifts in the research community.
