Skip to content

Rheyhan/BloxScrape

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

12 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

BloxScrape

Lightweight scraper + ETL for Roblox accessories, storing results in SQLite and exposing a small FastAPI read-only API.

Contents

  • UTILS.py - core scraping and DB helpers (init, scrape_new_items, insert_rows, get_most_recent_link, send_email)
  • ETL.py - one-shot ETL runner that scrapes new items and inserts into the DB
  • fastAPI.py - small read-only API to query the scraped data
  • creds.json - (not checked in) credentials and local chrome/driver paths
  • example.db - SQLite databases example (created at runtime)

Overview

This repo scrapes the Roblox accessories catalog periodically (ETL job) and persists each item to a local SQLite database. The API serves the stored rows for simple queries, pagination and basic stats.

Database schema (sqlite):

  • Table: roblox_accessories
    • id INTEGER PRIMARY KEY AUTOINCREMENT
    • Name TEXT
    • category TEXT
    • price TEXT -- kept as TEXT to support values like Free / unavailable
    • Creator TEXT
    • IsVerified INTEGER (0/1)
    • IsLimited INTEGER (0/1)
    • Link TEXT (unique index)
    • ImageURL TEXT
    • timeCollected TEXT (ISO datetime string)

Prerequisites

  • Python 3.10+ (project tested on CPython 3.10+)
  • Chrome and compatible chromedriver installed locally
  • creds.json with at least the following keys (example):
{
	"chrome_executable_path": "Path to your chrome.exe",
	"driver_executable_path": "Path to your chromedriver.exe",
	"email": "you@example.com",
	"password": "app-password-or-smtp-pass",
	"send_to_email": "notify@example.com"
}

Install dependencies

python -m pip install -r requirements.txt

Run the ETL (one-shot)

python ETL.py

ETL behavior

  • ETL.py calls get_most_recent_link() to find the last stored Link and scrapes newer items until it reaches that link (stop is exclusive). New rows are then inserted with INSERT OR IGNORE to avoid duplicates.
  • The scraper stores price as TEXT so it supports Free, unavailable, or numeric strings.

Run the API (development)

uvicorn fastAPI:app --reload --port 8000

API highlights

  • GET /items — list items with limit, offset, and filters: creator, category, verified, limited
  • GET /items/{item_id} — retrieve single item by id
  • GET /stats — database statistics
  • GET /recent — most recent items by timeCollected

Notes and troubleshooting

  • Make sure the paths in creds.json are correct and Chrome/Chromedriver versions are compatible.
  • If scraping fails frequently, enable a visible browser (set headless=False) to debug page loads and XPaths.
  • The repository uses undetected_chromedriver to reduce bot detection, but behavior may still change if Roblox updates their markup.

Security

  • Do not commit creds.json or any secrets to source control.

About

Roblox Marketplace Accessory ETL and API

Topics

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages