A minimal, end‑to‑end movie‑recommendation system that
- trains a baseline XGBoost model on the public MovieLens‑100K dataset,
- logs and registers the model with MLflow,
- exposes real‑time recommendations via FastAPI, and
- runs everything reproducibly with Docker Compose.
├── api/                      # FastAPI service (inference)
│   └── app.py
├── recsys/                   # Local Python package
│   ├── bootstrap_once.py     # Train + register baseline if needed
│   ├── download_data.py      # Download MovieLens‑100K manually (optional)
│   ├── preprocess.py         # Generate feature files from raw data
│   ├── train_xgb.py          # Train better models manually
│   └── __init__.py
├── data/
│   ├── raw/                  # MovieLens archive (downloaded on first run)
│   └── processed/            # Parquet files generated by bootstrap / training
├── mlruns/                   # MLflow tracking & model registry
├── Dockerfile                # Single image for bootstrap + api
├── docker-compose.yml        # Orchestration (mlflow, bootstrap, api)
└── README.md
Quick start (development)
# clone and enter the repo
$ git clone https://github.com/<your‑user>/movielens-recsys.git
$ cd movielens-recsys
# build and launch all services
$ docker compose up -d --build
# wait ~20 s (first run downloads data and trains baseline)
# health check
$ curl http://localhost:8000/health
{"status":"ok"}
# get top‑10 recommendations (default) for user 42
$ curl "http://localhost:8000/recommend/42"
# request exactly 5 recommendations for the same user
$ curl "http://localhost:8000/recommend/42?k=5"

To shut everything down:
$ docker compose down

- The MLflow server runs in its own container on port 5000.
- It stores runs and the model registry inside the `mlruns/` host folder.
- On every `docker compose up`, the bootstrap service checks whether the alias `prod` already resolves to a valid artifact.
- If the alias is missing, it:
  - downloads MovieLens‑100K,
  - creates simple statistical features,
  - trains a tiny XGBoost classifier,
  - registers the model and sets the alias `prod`.
- It then exits with code 0, after which Compose starts the API.
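
The bootstrap behaviour described above is an idempotent "train only if needed" check. The following is a minimal control-flow sketch, not the contents of `recsys/bootstrap_once.py` (which are not shown here). The function name and the two injected callables, `alias_resolves` and `train_and_register`, are hypothetical stand-ins for the MLflow lookup and the download/train/register pipeline.

```python
from typing import Callable


def bootstrap_once(
    alias_resolves: Callable[[], bool],
    train_and_register: Callable[[], None],
) -> int:
    """Train and register a baseline only when the `prod` alias is missing.

    Both callables are hypothetical placeholders:
    - alias_resolves: returns True if alias `prod` points at a valid artifact.
    - train_and_register: downloads data, builds features, trains the model,
      registers it and sets alias `prod`.
    Returns the process exit code (0 on success).
    """
    if alias_resolves():
        # Nothing to do: a usable model is already registered.
        return 0
    train_and_register()
    return 0
```

Because the check runs before any work is done, repeated `docker compose up` calls would only pay the training cost once under this design.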
- The API service loads the model via `models:/MovieLensXGB@prod`.
- It computes features on the fly and returns the top‑k items with scores.
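
To illustrate the "top‑k items with scores" step, here is a minimal ranking sketch. It assumes the model has already produced one score per candidate item. The `top_k` helper and the value `K_DEFAULT = 10` are assumptions for illustration (the value mirrors the top‑10 default mentioned in the curl example); the actual code in `api/app.py` may differ.

```python
import heapq
from typing import Dict, List, Optional, Tuple

K_DEFAULT = 10  # assumed value, mirroring the "top-10 (default)" example


def top_k(scores: Dict[int, float], k: Optional[int] = None) -> List[Tuple[int, float]]:
    """Return the k highest-scoring (item_id, score) pairs, best first.

    Falls back to K_DEFAULT when k is omitted.
    """
    if k is None:
        k = K_DEFAULT
    if k <= 0:
        return []
    # nlargest avoids sorting the full candidate set when k is small.
    return heapq.nlargest(k, scores.items(), key=lambda pair: pair[1])


if __name__ == "__main__":
    example_scores = {1: 0.2, 2: 0.9, 3: 0.5}
    print(top_k(example_scores, k=2))
```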
# activate local Python (example with venv)
$ python -m venv .venv && source .venv/bin/activate
$ pip install -r requirements.txt
# point MLflow to the same tracking folder
$ export MLFLOW_TRACKING_URI=file://$(pwd)/mlruns
# train
$ python -m recsys.train_xgb
# → creates a new model version (e.g. v2)
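# promote v2 to prod (via the MLflow Python client; requires an MLflow version with alias support)
$ python -c "from mlflow import MlflowClient; MlflowClient().set_registered_model_alias('MovieLensXGB', 'prod', '2')"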
# reload the API
$ docker compose restart api

| Area | How |
|---|---|
| Features / model | Edit recsys/train_xgb.py and retrain |
| Dataset | Replace data under data/raw and adapt preprocessing |
| Default k (number of items returned when the query param is omitted) | Edit K_DEFAULT in api/app.py |
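
After changing `K_DEFAULT` and restarting the API, one way to check the effect is to call the endpoint with and without the `k` query parameter. The client below is a small standard-library sketch. The base URL and paths come from the curl examples in this README; the helper names are hypothetical, and the shape of the JSON response is not assumed.

```python
import json
from typing import Any, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # same host/port as the curl examples


def recommend_url(user_id: int, k: Optional[int] = None, base_url: str = BASE_URL) -> str:
    """Build the /recommend URL; omit `k` to let the server use its default."""
    url = f"{base_url}/recommend/{user_id}"
    if k is not None:
        url += "?" + urlencode({"k": k})
    return url


def fetch_recommendations(user_id: int, k: Optional[int] = None, timeout: float = 5.0) -> Any:
    """GET the recommendations and return the decoded JSON body as-is."""
    with urlopen(recommend_url(user_id, k), timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Requires the stack to be running (docker compose up).
    print(fetch_recommendations(42))       # server-side default k
    print(fetch_recommendations(42, k=5))  # explicit k
```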