
JayroMartinez/movielens-recsys


MovieLens-XGB Recommender

A minimal, end‑to‑end movie‑recommendation system that

  • trains a baseline XGBoost model on the public MovieLens‑100K dataset,
  • logs and registers the model with MLflow,
  • exposes real‑time recommendations via FastAPI, and
  • runs everything reproducibly with Docker Compose.

Project layout

├── api/                     # FastAPI service (inference)
│   └── app.py
├── recsys/                  # Local Python package
│   ├── bootstrap_once.py    # Train + register baseline if needed
│   ├── download_data.py     # Download MovieLens‑100K manually (optional)
│   ├── preprocess.py        # Generate feature files from raw data
│   ├── train_xgb.py         # Train better models manually
│   └── __init__.py
├── data/
│   ├── raw/                 # MovieLens archive (downloaded on first run)
│   └── processed/           # Parquet files generated by bootstrap / training
├── mlruns/                  # MLflow tracking & model registry
├── Dockerfile               # Single image for bootstrap + api
├── docker-compose.yml       # Orchestration (mlflow, bootstrap, api)
└── README.md

Quick start (development)


# clone and enter the repo
$ git clone https://github.com/<your‑user>/movielens-recsys.git
$ cd movielens-recsys

# build and launch all services
$ docker compose up -d --build

# wait ~20 s (the first run downloads the data and trains the baseline)

# health check
$ curl http://localhost:8000/health
{"status":"ok"}

# get top‑10 recommendations (default) for user 42
$ curl "http://localhost:8000/recommend/42"

# request exactly 5 recommendations for the same user
$ curl "http://localhost:8000/recommend/42?k=5"

To shut everything down:

$ docker compose down
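The same endpoints can also be called from Python. Below is a minimal sketch that uses only the standard library. It assumes the API is reachable on localhost:8000 as in the curl examples and that it returns a JSON body whose exact fields are not documented here. The helper names recommend_url and fetch_recommendations are hypothetical and are not part of this repo.

```python
import json
import urllib.parse
import urllib.request
from typing import Any, Optional

# Assumption: port 8000 is published by docker compose, as in the curl examples.
BASE_URL = "http://localhost:8000"


def recommend_url(user_id: int, k: Optional[int] = None, base_url: str = BASE_URL) -> str:
    """Build the /recommend/{user_id} URL, adding ?k= only when k is given."""
    url = f"{base_url}/recommend/{user_id}"
    if k is not None:
        # Omitting k lets the server fall back to its default (top-10).
        url += "?" + urllib.parse.urlencode({"k": k})
    return url


def fetch_recommendations(user_id: int, k: Optional[int] = None, timeout: float = 5.0) -> Any:
    """GET the endpoint and decode the JSON body (shape assumed, not documented)."""
    with urllib.request.urlopen(recommend_url(user_id, k), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the stack running, fetch_recommendations(42, k=5) should return the same payload as the second curl call.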

How it works

1. MLflow

  • Runs in its own container on port 5000.
  • Stores runs and model registry inside the mlruns/ host folder.

2. Bootstrap container

  • On every docker compose up, it checks whether the alias prod already resolves to a valid model artifact.

  • If missing, it:

    1. downloads MovieLens‑100K,
    2. creates simple statistical features,
    3. trains a tiny XGBoost classifier,
    4. registers the model and sets alias prod.
  • Exits with code 0 → Compose starts the API.
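The bootstrap decision can be sketched roughly as follows. This is a hypothetical outline, not the code in recsys/bootstrap_once.py. The registry lookup, training, and alias steps are passed in as callables, which keeps the control flow readable and testable without MLflow. With MLflow, resolve_alias would typically wrap MlflowClient().get_model_version_by_alias(name, alias) and return None when that call raises MlflowException. set_alias could simply be MlflowClient().set_registered_model_alias.

```python
import sys
from typing import Callable, Optional

MODEL_NAME = "MovieLensXGB"
ALIAS = "prod"

# Hypothetical signatures for the injected steps.
ResolveAlias = Callable[[str, str], Optional[str]]  # (name, alias) -> version or None
TrainAndRegister = Callable[[str], str]             # name -> new registered version
SetAlias = Callable[[str, str, str], None]          # (name, alias, version)


def ensure_prod_model(resolve_alias: ResolveAlias,
                      train_and_register: TrainAndRegister,
                      set_alias: SetAlias) -> str:
    """Return the version behind MODEL_NAME@ALIAS, training a baseline only if missing."""
    version = resolve_alias(MODEL_NAME, ALIAS)
    if version is not None:
        return version  # alias already resolves: skip training entirely
    # Steps 1-4: download data, build features, train, register a new version.
    new_version = train_and_register(MODEL_NAME)
    set_alias(MODEL_NAME, ALIAS, new_version)
    return new_version


def main(resolve_alias: ResolveAlias,
         train_and_register: TrainAndRegister,
         set_alias: SetAlias) -> int:
    """Exit code for the container: 0 lets Compose start the API, non-zero blocks it."""
    try:
        ensure_prod_model(resolve_alias, train_and_register, set_alias)
    except Exception as exc:  # any failure should stop the API from starting
        print(f"bootstrap failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

Making the check idempotent is what allows the container to run on every docker compose up. Later runs find the alias and exit immediately.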

3. Inference API

  • Loads the model via models:/MovieLensXGB@prod.
  • Computes features on the fly and returns top‑k items with scores.
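Below is a rough sketch of computing features on the fly and ranking the top-k items. The README only mentions "simple statistical features", so this sketch assumes they are per-user and per-item rating means and counts. It also assumes that items the user has already rated are excluded. The score callable stands in for the registered model. In the service, it would presumably return something like the model's positive-class probability. All names here are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

# (user_id, item_id, rating), the first three columns of MovieLens-100K u.data
Rating = Tuple[int, int, float]


def mean_and_count(ratings: Iterable[Rating], key_index: int) -> Dict[int, Tuple[float, int]]:
    """Mean rating and rating count per user (key_index=0) or per item (key_index=1)."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, count]
    for row in ratings:
        acc = totals[row[key_index]]
        acc[0] += row[2]
        acc[1] += 1
    return {key: (s / n, n) for key, (s, n) in totals.items()}


def recommend(user_id: int,
              ratings: Sequence[Rating],
              score: Callable[[List[float]], float],
              k: int = 10) -> List[Tuple[int, float]]:
    """Score every item the user has not rated and return the k best (item, score) pairs."""
    user_stats = mean_and_count(ratings, 0)
    item_stats = mean_and_count(ratings, 1)
    seen = {item for u, item, _ in ratings if u == user_id}
    # Hypothetical cold-start choice: unknown users get zeroed user features.
    u_mean, u_count = user_stats.get(user_id, (0.0, 0))
    scored = []
    for item, (i_mean, i_count) in item_stats.items():
        if item in seen:
            continue
        features = [u_mean, float(u_count), i_mean, float(i_count)]
        scored.append((item, score(features)))
    # Highest score first; ties broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]
```

This version scores items one at a time for clarity. A real service would more likely build one feature matrix for all candidates and call the model once.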

Training a better model

# create and activate a local Python environment (example with venv)
$ python -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt

# point MLflow to the same tracking folder
$ export MLFLOW_TRACKING_URI=file://$(pwd)/mlruns

# train
$ python -m recsys.train_xgb
# → creates a new model version (e.g. v2)

# promote v2 to prod
$ python -c "from mlflow import MlflowClient; MlflowClient().set_registered_model_alias('MovieLensXGB', 'prod', '2')"

# reload the API
$ docker compose restart api
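You may prefer to promote from Python and confirm the switch before restarting the API. A small helper along these lines should work with MLflow 2.3 or later, where MlflowClient provides set_registered_model_alias and get_model_version_by_alias. The promote function and the RegistryClient protocol are hypothetical and are not part of this repo.

```python
from typing import Any, Protocol


class RegistryClient(Protocol):
    """The two MlflowClient methods this helper relies on (MLflow >= 2.3)."""

    def set_registered_model_alias(self, name: str, alias: str, version: str) -> None: ...

    def get_model_version_by_alias(self, name: str, alias: str) -> Any: ...


def promote(client: RegistryClient, name: str, version: int, alias: str = "prod") -> str:
    """Point name@alias at version, then read the alias back to confirm the switch."""
    client.set_registered_model_alias(name, alias, str(version))
    resolved = client.get_model_version_by_alias(name, alias)
    if str(resolved.version) != str(version):
        raise RuntimeError(f"{name}@{alias} resolves to v{resolved.version}, expected v{version}")
    # The URI the API loads from.
    return f"models:/{name}@{alias}"
```

For example, with MLFLOW_TRACKING_URI exported as above, run from mlflow import MlflowClient and then promote(MlflowClient(), "MovieLensXGB", 2). After that, run docker compose restart api. Typing the client as a protocol keeps the helper testable with a stand-in object.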

Customising

Area                How
Features / model    Edit recsys/train_xgb.py and retrain
Dataset             Replace the data under data/raw and adapt the preprocessing
Default k           Edit K_DEFAULT in api/app.py (k is the number of items returned when the query parameter is omitted)

About

Mini end-to-end recommender using MovieLens-100K, XGBoost, FastAPI, and MLflow
