Ticket Priority ML Service

An end-to-end applied machine learning project for support-ticket triage. The system predicts both the operational queue and the business priority from ticket text, tracks training runs in MLflow, and serves fixed demo models through a FastAPI service and a Streamlit UI.

Tech stack: Python, scikit-learn, MLflow, FastAPI, Streamlit, Docker, GitHub Actions

What This Project Does

trains two text classifiers on multilingual support-ticket data: one for queue, one for priority
evaluates both tasks with shared cross-validation and logs metrics plus artifacts to MLflow
ships fixed serving assets so the public demo stays runnable and stable
exposes predictions through a FastAPI service and a Streamlit frontend
includes tests for preprocessing, evaluation, training/tracking smoke paths, and serving behavior

Results

Task	Macro F1 (mean +/- std)	Accuracy (mean +/- std)
Queue	0.6854 +/- 0.0041	0.6892 +/- 0.0029
Priority	0.7108 +/- 0.0081	0.7204 +/- 0.0074
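
To make the table easier to read, here is a minimal sketch of how a macro F1 score and a "mean +/- std" summary over cross-validation folds can be computed. It is not the repository's evaluation code. The label names are illustrative, and the use of the population standard deviation is an assumption, since the README does not say which variant is reported.

```python
from statistics import mean, pstdev


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes seen in either list."""
    classes = sorted(set(y_true) | set(y_pred))
    per_class = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        # A class with no true or predicted examples contributes 0.0.
        per_class.append(2 * tp / denom if denom else 0.0)
    return mean(per_class)


def summarize(fold_scores):
    """Return (mean, std) over folds. ASSUMPTION: population std (ddof=0)."""
    return mean(fold_scores), pstdev(fold_scores)


# Illustrative labels only; not taken from the dataset.
y_true = ["high", "high", "low", "low", "medium", "medium"]
y_pred = ["high", "low", "low", "low", "medium", "high"]
print(f"macro F1: {macro_f1(y_true, y_pred):.4f}")
```

Because macro F1 weights every class equally, rare classes pull the score down as much as frequent ones, which is why it can sit below accuracy in the table.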

Performance is markedly stronger on English tickets than on German tickets, with a macro F1 gap of roughly 0.25 for queue and 0.20 for priority:

Queue macro F1: English 0.7841, German 0.5341
Priority macro F1: English 0.7951, German 0.5960

Run The Demo

The demo uses the fixed promoted models that are already checked in under serving_assets/.

Fastest Path: Docker

If you already have Docker installed, this is the quickest way to run the full demo.

git lfs install
git clone https://github.com/feboe/ticket-priority-ml-service.git
cd ticket-priority-ml-service
git lfs pull
docker build -t ticket-triage-demo .
docker run --rm -p 8000:8000 -p 8501:8501 ticket-triage-demo

Open:

Streamlit UI: http://127.0.0.1:8501
FastAPI docs: http://127.0.0.1:8000/docs

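Alternative: Run Locally Without Docker

The commands below use Windows PowerShell syntax. On macOS or Linux, activate the environment with source .venv/bin/activate and call python, uvicorn, and streamlit directly.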

git lfs install
git clone https://github.com/feboe/ticket-priority-ml-service.git
cd ticket-priority-ml-service
git lfs pull
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements-app.txt
.\.venv\Scripts\python -m nltk.downloader stopwords

Start the API:

.\.venv\Scripts\uvicorn app.api:app --host 127.0.0.1 --port 8000

Start the UI in a second terminal:

$env:API_BASE_URL='http://127.0.0.1:8000'
.\.venv\Scripts\streamlit run app/ui.py

Open:

Streamlit UI: http://127.0.0.1:8501
FastAPI docs: http://127.0.0.1:8000/docs
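
Once the API is running, it can also be called from a script. The sketch below uses only the Python standard library. The /predict route and the {"text": ...} payload are assumptions made for illustration; the actual route and request schema are listed in the FastAPI docs page above.

```python
import json
import os
import urllib.request

# Same variable that is set before starting the UI; falls back to the local default.
BASE_URL = os.environ.get("API_BASE_URL", "http://127.0.0.1:8000")


def build_predict_request(text, base_url=BASE_URL):
    """Build a JSON POST request for a single ticket text."""
    # ASSUMPTION: "/predict" and the "text" field are hypothetical names.
    # Check http://127.0.0.1:8000/docs for the real route and schema.
    body = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, base_url=BASE_URL, timeout=10):
    """Send the request and decode the JSON response body."""
    request = build_predict_request(text, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API started as described, predict("My invoice shows the wrong amount") would return whatever JSON the service responds with, provided the assumed route and payload match the real ones.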

Training And Reproducibility

The public repo is demo-reproducible out of the box because serving uses fixed checked-in model artifacts.

For full retraining, download the public Kaggle dataset Multilingual Customer Support Tickets and place the default training file at:

data/aa_dataset-tickets-multi-lang-5-2-50-version.csv

The Kaggle bundle contains multiple CSV files; this repository uses the file above by default. Install the training dependencies, then start a training run:

pip install -r requirements.txt
.\.venv\Scripts\python train.py --algorithm linear_svc --run-group algo-benchmark-v1

To train on a different file from the bundle, pass its path with --data:

.\.venv\Scripts\python train.py --data data/<filename>.csv

To verify the repository locally, run:

.\.venv\Scripts\python -m unittest discover -s tests -v

The full experiment story, model-selection rationale, and promoted-model confusion matrices are documented in docs/experiments.md. The LinearSVC models were selected from shared cross-validation sweeps because they gave the best balance of macro F1, accuracy, and feature-space size, while keeping the serving pipeline simple and consistently TF-IDF-based.
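
For readers unfamiliar with this kind of setup, the following is a minimal scikit-learn sketch of a TF-IDF plus LinearSVC pipeline scored with cross-validation on macro F1 and accuracy. It is not the repository's train.py: the toy texts, the two queue names, the fold count, and the default hyperparameters are all assumptions chosen so the example runs on its own.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy stand-in data (hypothetical); the real project trains on the Kaggle CSV.
texts = [
    "invoice shows the wrong amount",
    "refund has not arrived on my card",
    "charged twice for the same order",
    "need a copy of last month's invoice",
    "payment failed but money was deducted",
    "update the billing address on my account",
    "application crashes when I log in",
    "server returns an error after the update",
    "cannot connect to the VPN from home",
    "printer driver fails to install",
    "password reset email never arrives",
    "screen freezes while loading the dashboard",
]
labels = ["billing"] * 6 + ["technical"] * 6

# Default hyperparameters, not the promoted ones from the repository.
pipeline = make_pipeline(TfidfVectorizer(), LinearSVC())

# Fixing the splitter is one way to score several candidates on identical folds.
cv = StratifiedKFold(n_splits=3, shuffle=True, random_state=0)
scores = cross_validate(
    pipeline, texts, labels, cv=cv, scoring=["f1_macro", "accuracy"]
)

for name in ("test_f1_macro", "test_accuracy"):
    print(f"{name}: {scores[name].mean():.4f} +/- {scores[name].std():.4f}")
```

Reusing one splitter object with a fixed random_state across candidate models keeps their fold-level scores directly comparable.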

The promoted serving models, their task-specific hyperparameters, and their headline metrics are summarized in serving_assets/promoted_models.json. The demo intentionally does not serve "latest run wins" artifacts.
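
To inspect the promoted-model summary from a script, a small loader such as the one below is enough. It assumes nothing about the file's schema beyond it being valid JSON; the function names are hypothetical and not part of the repository.

```python
import json
from pathlib import Path


def load_promoted_models(path="serving_assets/promoted_models.json"):
    """Load the pinned model summary. No particular schema is assumed."""
    with Path(path).open(encoding="utf-8") as handle:
        return json.load(handle)


def top_level_summary(manifest):
    """Return sorted top-level keys for an object, or the length for a list."""
    if isinstance(manifest, dict):
        return sorted(manifest)
    return len(manifest)
```

Running top_level_summary(load_promoted_models()) from the repository root shows how the file is organized before reading individual entries.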

Limitations

English performance is substantially better than German performance.
The system uses TF-IDF features and linear classifiers, so semantic understanding is limited compared with transformer-based approaches.
Some queue classes remain systematically confusable where business meanings overlap.
The public repo does not include the full training CSV, so full retraining requires downloading the dataset from Kaggle.

License And Data

Source code license: MIT, see LICENSE
Upstream dataset license: CC BY-NC 4.0
Derived dataset/model reuse should be reviewed against the upstream dataset terms
