Classification algorithms from first principles through production deployment — logistic regression, tree ensembles, SVMs, and evaluation frameworks for imbalanced, high-stakes domains including financial risk and anomaly detection.
Production LLM and agentic systems rely on classification at every layer:
- Intent routing — which agent handles this query?
- Output validation — is this response safe / on-topic?
- Anomaly detection — is this transaction / data point an outlier?
- Embedding clustering — which semantic category does this document belong to?
Understanding the mathematics and failure modes of classification algorithms makes you a better AI engineer, not just a better data scientist.
def sigmoid(z: np.ndarray) -> np.ndarray:
"""
Maps any real value to (0, 1) — interpretable as P(y=1 | x).
Key properties:
σ(0) = 0.5 (decision boundary)
σ(+∞) → 1.0
σ(-∞) → 0.0
σ'(z) = σ(z)(1 - σ(z)) ← clean derivative, enables backprop
Numerical stability: clip z to [-500, 500] to prevent overflow.
"""
return 1.0 / (1.0 + np.exp(-np.clip(z, -500, 500)))MSE on classification → non-convex loss surface → gradient descent gets stuck in local minima
BCE → convex loss surface → guaranteed convergence to global minimum
BCE(y, ŷ) = -(1/m) Σ [ yᵢ log(ŷᵢ) + (1-yᵢ) log(1-ŷᵢ) ]
Intuition:
When y=1 and ŷ≈0 → loss → ∞ (heavily penalises confident wrong predictions)
When y=1 and ŷ≈1 → loss → 0 (correctly confident → near-zero loss)
def logistic_gradient_descent(
X: np.ndarray, y: np.ndarray,
alpha: float = 0.01, iterations: int = 1000,
lambda_reg: float = 0.01, # L2 regularisation
) -> tuple[np.ndarray, float, list[float]]:
m, n = X.shape
w, b = np.zeros(n), 0.0
costs = []
for i in range(iterations):
z = X @ w + b
y_hat = sigmoid(z)
error = y_hat - y
# Vectorised gradients
dw = (X.T @ error) / m + (lambda_reg / m) * w # L2 penalty on weights
db = np.mean(error)
w -= alpha * dw
b -= alpha * db
if i % 100 == 0:
cost = -np.mean(y * np.log(y_hat + 1e-8) + (1-y) * np.log(1-y_hat + 1e-8))
costs.append(cost)
return w, b, costsCritical for fraud detection, anomaly detection, and medical/compliance classification — all domains where the minority class is the one that matters most.
# Anti-pattern: optimising for accuracy on imbalanced data
# 99% "no fraud" → predict all negative → 99% accuracy → useless model
# Correct approach: evaluation suite for imbalanced classification
def evaluate_classifier(model, X_test, y_test, threshold: float = 0.5):
y_prob = model.predict_proba(X_test)[:, 1]
y_pred = (y_prob >= threshold).astype(int)
return {
# Never report just accuracy on imbalanced data
"accuracy": accuracy_score(y_test, y_pred),
"precision": precision_score(y_test, y_pred), # Of predicted positives, how many correct?
"recall": recall_score(y_test, y_pred), # Of actual positives, how many found?
"f1": f1_score(y_test, y_pred), # Harmonic mean — use when FP and FN equally costly
"auc_roc": roc_auc_score(y_test, y_prob), # Threshold-independent performance
"avg_precision": average_precision_score(y_test, y_prob), # Better than AUC for severe imbalance
}| Strategy | When to Use | Mechanism |
|---|---|---|
class_weight='balanced' |
Mild imbalance (5:1 to 10:1) | Upweights minority class in loss |
| SMOTE oversampling | Moderate imbalance | Synthetic minority samples |
| Undersampling majority | Large datasets, severe imbalance | Random removal of majority |
| Threshold tuning | Post-training | Shift decision boundary from 0.5 |
| Ensemble (BalancedBaggingClassifier) | Severe imbalance + high stakes | Bootstrap with balanced subsamples |
Systematic benchmark across classifiers on a financial risk dataset:
| Algorithm | AUC-ROC | F1 (minority) | Training Time | Interpretable |
|---|---|---|---|---|
| Logistic Regression | 0.847 | 0.71 | <1s | ✅ Yes |
| Decision Tree | 0.791 | 0.68 | 1s | ✅ Yes |
| Random Forest | 0.912 | 0.81 | 12s | Partial |
| XGBoost | 0.934 | 0.84 | 8s | Partial |
| SVM (RBF kernel) | 0.889 | 0.79 | 45s | ❌ No |
| Naive Bayes | 0.821 | 0.66 | <1s | ✅ Yes |
Key insight: XGBoost wins on tabular financial data. Logistic Regression wins when you need regulatory explainability (OSFI model risk requirements). Never choose a model without benchmarking both.
# Feature importance — understand what your model actually learned
importances = pd.Series(
model.feature_importances_,
index=feature_names
).sort_values(ascending=False)
# Top features reveal data quality issues and business insights:
# If "data_entry_timestamp" ranks #1 → data leakage from future information
# If "account_age_days" ranks #1 → sensible business signalclass ProductionClassifier:
"""
Production-safe classifier wrapper:
- Threshold calibrated on validation set (not default 0.5)
- Probability calibration with Platt scaling / isotonic regression
- Prediction with uncertainty quantification
- Audit log for regulated environments
"""
def predict_with_confidence(
self, X: np.ndarray, min_confidence: float = 0.7
) -> dict:
proba = self.model.predict_proba(X)
confidence = proba.max(axis=1)
prediction = (proba[:, 1] >= self.optimal_threshold).astype(int)
return {
"prediction": prediction,
"probability": proba[:, 1],
"confidence": confidence,
"requires_human_review": confidence < min_confidence,
}- ml-neural-network-projects — extends to deep learning classifiers (LSTM, MLP)
- data-prep-utility — preprocessing pipeline used upstream
- production-rag-pipeline — classification applied to retrieval routing and output validation
Garry Singh — Principal AI & Data Engineer · MSc Oxford