mlquantify is a Python library for quantification, also known as supervised prevalence estimation, designed to estimate the distribution of classes within datasets. It offers a range of tools for various quantification methods, model selection tailored for quantification tasks, evaluation metrics, and protocols to assess quantification performance. Additionally, mlquantify includes calibration tools, confidence region estimation, pluggable solvers and representations, and visualization utilities to help analyze and interpret results.
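
To make "prevalence estimation" concrete, here is a minimal NumPy sketch of the underlying idea. It does not use mlquantify's API: the `prevalence` helper and the toy arrays are hypothetical, chosen only for illustration. It compares the true class distribution with the naive estimate obtained by counting a classifier's hard predictions (the "classify and count" idea), and scores the estimate with a mean absolute error over classes.

```python
import numpy as np


def prevalence(labels, classes):
    """Return the fraction of samples in each class, in the order of `classes`."""
    labels = np.asarray(labels)
    return np.array([np.mean(labels == c) for c in classes])


# Hypothetical toy data: true labels and hard predictions from some classifier.
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
classes = [0, 1]

# True class distribution of the sample.
true_prev = prevalence(y_true, classes)

# Naive estimate: count the predicted labels instead of the true ones.
cc_prev = prevalence(y_pred, classes)

# Mean absolute error between the two prevalence vectors, averaged over classes.
mae = np.mean(np.abs(true_prev - cc_prev))

print("true prevalence:     ", true_prev)
print("classify-and-count:  ", cc_prev)
print("mean absolute error: ", mae)
```

Counting predictions directly inherits the classifier's errors, which is why dedicated quantification methods try to correct or replace this naive estimate.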
Website: https://luizfernandolj.github.io/mlquantify/
To install mlquantify, run the following command:
pip install mlquantify

If you only want to update an existing installation, run:
pip install --upgrade mlquantify

| Section | Description |
|---|---|
| 33 Quantification Methods | Counting (CC, PCC, ACC, TAC, TX, TMAX, T50, MS, MS2, FM, GACC, GPACC), Matching (DyS, HDy, HDx, SORD, SMM, MMD_RKHS, KDEyML, KDEyHD, KDEyCS, GHDy, GHDx, GKDEyML, EDy, EDx), Likelihood (EMQ, CDE, MLPE), Neighbors (PWK), Meta (EnsembleQ, AggregativeBootstrap, QuaDapt). |
| Dynamic class management | All methods handle both binary and multiclass problems; for binary-only quantifiers, One-Vs-All (OVA) is applied automatically to cover the multiclass case. |
| Solvers | Modular optimization backends: BinarySolver, LeastSquaresSolver, SimplexSolver. |
| Representations | Pluggable feature representations: HistogramRepresentation, KDERepresentation, DistanceRepresentation, KernelMeanRepresentation, PredictionRepresentation. |
| Losses | Composable loss functions (distance-based and likelihood-based) shared across quantifier families. |
| Calibration | ClassifierCalibrator and QuantifierCalibrator for post-hoc calibration of classifiers and quantifiers. |
| Confidence Regions | ConfidenceInterval, ConfidenceEllipseSimplex, ConfidenceEllipseCLR for uncertainty estimation on prevalence predictions. |
| Model Selection | GridSearchQ and evaluation protocols (APP, NPP, UPP, PPP) tailored for quantification tasks. |
| Evaluation Metrics | Metrics for quantification performance: AE, MAE, NAE, SE, MSE, KLD, RAE, NRAE, NKLD, NMD, RNOD, VSE, CvM_L1. |
| Visualization | scikit-learn-style Display classes for both single- and multiple-sample results: DiagonalDisplay, BiasDisplay, ErrorByShiftDisplay, PrevalenceDisplay, ConfidenceRegionDisplay. |
| Comprehensive Documentation | Full API reference and user guide covering all modules and methods. |
The example below loads the breast cancer dataset from scikit-learn and splits it into training and test sets. It then fits the Expectation Maximisation Quantifier (EMQ), backed by a random forest classifier, and predicts the class prevalence of the test set. Finally, it computes and prints the mean absolute error (MAE) and the normalized relative absolute error (NRAE) between the real and predicted prevalences.
from mlquantify.likelihood import EMQ
from mlquantify.metrics import MAE, NRAE
from mlquantify.utils import get_prev_from_labels
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
# Load the dataset from sklearn
features, target = load_breast_cancer(return_X_y=True)
# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(features, target, test_size=0.3)
# Create the quantifier: EMQ wrapping a classifier
model = EMQ(RandomForestClassifier())
model.fit(X_train, y_train)
# Predict the class prevalence for X_test
pred_prevalence = model.predict(X_test)
real_prevalence = get_prev_from_labels(y_test)
# Compute the errors of the prediction
mae = MAE(real_prevalence, pred_prevalence)
nrae = NRAE(real_prevalence, pred_prevalence)
print(f"Mean Absolute Error -> {mae}")
print(f"Normalized Relative Absolute Error -> {nrae}")

- If you need help, refer to the User Guide.
- Explore the API documentation for detailed developer information.
- See also the library's page on PyPI (mlquantify).
- Check the CHANGELOG to see what is currently being developed.

mlquantify depends on the following packages:

- scikit-learn
- numpy
- scipy
- pandas
- joblib
- tqdm
- matplotlib
- xlrd
- abstention