MovieLens-XGB Recommender

A minimal, end‑to‑end movie‑recommendation system that:

trains a baseline XGBoost model on the public MovieLens‑100K dataset,
logs and registers the model with MLflow,
exposes real‑time recommendations via FastAPI, and
runs everything reproducibly with Docker Compose.

Project layout

├── api/                     # FastAPI service (inference)
│   └── app.py
├── recsys/                  # Local Python package
│   ├── bootstrap_once.py    # Train + register baseline if needed
│   ├── download_data.py     # Download MovieLens‑100K manually (optional)
│   ├── preprocess.py        # Generate feature files from raw data
│   ├── train_xgb.py         # Train better models manually
│   └── __init__.py
├── data/
│   ├── raw/                 # MovieLens archive (downloaded on first run)
│   └── processed/           # Parquet files generated by bootstrap / training
├── mlruns/                  # MLflow tracking & model registry
├── Dockerfile               # Single image for bootstrap + api
├── docker-compose.yml       # Orchestration (mlflow, bootstrap, api)
└── README.md

Quick start (development)

# clone and enter the repo
$ git clone https://github.com/<your‑user>/movielens-recsys.git
$ cd movielens-recsys

# build and launch all services
$ docker compose up -d --build

# wait ~20 s (first run downloads data and trains baseline)

# health check
$ curl http://localhost:8000/health
{"status":"ok"}

# get top‑10 recommendations (default) for user 42
$ curl "http://localhost:8000/recommend/42"

# request exactly 5 recommendations for the same user
$ curl "http://localhost:8000/recommend/42?k=5"
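
The same requests can be made from Python. The sketch below uses only the standard library and assumes the service is reachable at http://localhost:8000, as in the curl examples. The helper names `recommend_url` and `get_recommendations` are illustrative, and the response is returned as parsed JSON without assuming anything about its shape.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # same host/port as the curl examples


def recommend_url(user_id: int, k: Optional[int] = None, base_url: str = BASE_URL) -> str:
    """Build the /recommend URL; leave k unset to use the server's default."""
    url = f"{base_url}/recommend/{user_id}"
    if k is not None:
        url += "?" + urlencode({"k": k})
    return url


def get_recommendations(user_id: int, k: Optional[int] = None, timeout: float = 5.0):
    """Call the API and return the decoded JSON body as-is."""
    with urlopen(recommend_url(user_id, k), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Calling `get_recommendations(42, k=5)` corresponds to the last curl command above and requires the stack to be running.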

To shut everything down:

$ docker compose down

How it works

1. MLflow

Runs in its own container on port 5000.
Stores runs and the model registry in the mlruns/ folder on the host.

2. Bootstrap container

On every docker compose up, it checks whether the alias prod already resolves to a valid model artifact.
If the alias is missing or does not resolve, it:
1. downloads MovieLens‑100K,
2. creates simple statistical features,
3. trains a tiny XGBoost classifier,
4. registers the model and sets alias prod.
It then exits with code 0, after which Compose starts the API.
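
The check-then-train flow above can be sketched as follows. This is a minimal illustration, not the contents of `recsys/bootstrap_once.py`: the names `bootstrap_once`, `alias_resolves` and `train_and_register` are hypothetical, and the registry lookup is passed in as a callable so the control flow can be run without an MLflow server.

```python
import sys
from typing import Callable

MODEL_NAME = "MovieLensXGB"
ALIAS = "prod"


def bootstrap_once(
    alias_resolves: Callable[[str, str], bool],
    train_and_register: Callable[[str, str], None],
) -> bool:
    """Train and register a baseline only when the alias does not resolve.

    Returns True if training ran, False if it was skipped.
    """
    if alias_resolves(MODEL_NAME, ALIAS):
        return False  # a usable model is already registered; nothing to do
    # Hypothetical callable covering the four steps listed above:
    # download, build features, train, register and set the alias.
    train_and_register(MODEL_NAME, ALIAS)
    return True


def main(
    alias_resolves: Callable[[str, str], bool],
    train_and_register: Callable[[str, str], None],
) -> int:
    """Return 0 on success so that a dependent service can start afterwards."""
    try:
        bootstrap_once(alias_resolves, train_and_register)
    except Exception as exc:  # any failure surfaces as a non-zero exit code
        print(f"bootstrap failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

With MLflow, `alias_resolves` could wrap `MlflowClient().get_model_version_by_alias(name, alias)` and treat a raised `MlflowException` as "missing". That wiring is an assumption about how the check might be implemented.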

3. Inference API

Loads the model via models:/MovieLensXGB@prod.
Computes features on the fly and returns top‑k items with scores.
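
The scoring step can be illustrated with a small, dependency-free sketch. Everything in it is hypothetical: the feature set (mean rating and rating count per user and per item), the names `build_features` and `recommend`, and the `score` callable, which stands in for the loaded model's prediction function. It shows features being assembled per (user, item) pair at request time and the k highest-scoring items being returned.

```python
import heapq
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

K_DEFAULT = 10  # number of items returned when k is not given

Stats = Dict[int, Tuple[float, int]]  # id -> (mean rating, rating count)


def build_features(
    user_id: int, item_id: int, user_stats: Stats, item_stats: Stats
) -> List[float]:
    """Assemble one feature row; unseen ids fall back to (0.0, 0)."""
    u_mean, u_cnt = user_stats.get(user_id, (0.0, 0))
    i_mean, i_cnt = item_stats.get(item_id, (0.0, 0))
    return [u_mean, float(u_cnt), i_mean, float(i_cnt)]


def recommend(
    user_id: int,
    candidates: Iterable[int],
    user_stats: Stats,
    item_stats: Stats,
    score: Callable[[Sequence[float]], float],
    k: int = K_DEFAULT,
) -> List[Tuple[int, float]]:
    """Score every candidate item and return the top-k (item, score) pairs."""
    scored = (
        (item, score(build_features(user_id, item, user_stats, item_stats)))
        for item in candidates
    )
    # nlargest returns the pairs sorted from highest to lowest score
    return heapq.nlargest(k, scored, key=lambda pair: pair[1])
```

In the real service the `score` argument would be replaced by a call into the model loaded from the registry, and the candidate set would typically exclude items the user has already rated.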

Training a better model

# activate local Python (example with venv)
$ python -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt

# point MLflow to the same tracking folder
$ export MLFLOW_TRACKING_URI=file://$(pwd)/mlruns

# train
$ python -m recsys.train_xgb
# → creates a new model version (e.g. v2)

# promote v2 to prod
$ mlflow models alias set -m MovieLensXGB -v 2 -a prod

# reload the API
$ docker compose restart api
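
The promotion step can also be done from Python through the MLflow client, which may help where the CLI form shown above is not available in the installed MLflow version. `set_registered_model_alias` is a method of `MlflowClient` in recent MLflow 2.x releases. The wrapper names `promote` and `promote_with_mlflow` are illustrative, and the client is passed in so the helper can be exercised without a registry.

```python
def promote(client, name: str, version: int, alias: str = "prod") -> None:
    """Point `alias` at `version` of the registered model `name`."""
    client.set_registered_model_alias(name, alias, str(version))


def promote_with_mlflow(version: int, name: str = "MovieLensXGB") -> None:
    """Convenience wrapper; picks up MLFLOW_TRACKING_URI from the environment."""
    from mlflow.tracking import MlflowClient  # lazy import; requires mlflow

    promote(MlflowClient(), name, version)
```

Running `promote_with_mlflow(2)` with `MLFLOW_TRACKING_URI` exported as above would move prod to version 2; the API still needs a restart to pick up the new model.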

Customising

| Area | How |
| --- | --- |
| Features / model | Edit `recsys/train_xgb.py` and retrain |
| Dataset | Replace data under `data/raw` and adapt preprocessing |
| Default k (number of items returned when the query param is omitted) | Edit `K_DEFAULT` in `api/app.py` |
