A computer vision pipeline for automated detection and classification of plant leaf diseases from images, using classical image processing techniques combined with machine learning (SVM classifier).
Built with Python, OpenCV, scikit-learn, and PyWavelets.
For a visual overview of the system architecture, methodology, and results, see the project poster.
This project implements an end-to-end image processing pipeline designed to detect and classify plant leaf diseases from photographs. The system does not rely on deep learning; instead, it uses a combination of:
- Retinex-based glare and shadow correction for robust preprocessing
- K-Means clustering in LAB color space for leaf and disease segmentation
- Texture (GLCM, Wavelet), color, and shape feature extraction for representation
- SVM with RBF kernel for multi-class classification
The result is a lightweight, interpretable, and trainable system that works well even on challenging images with lighting artifacts.
Dataset: Tomato Leaves Dataset, 25,851 images across 5 classes. Training set: 9,000 images. Test set: 6,276 images.
Before any segmentation or classification can happen, raw leaf images need to be cleaned up. Photographs taken in field conditions often contain strong specular glare (sun reflections on the leaf surface), deep shadows, and low contrast, all of which interfere with color-based segmentation. This module corrects all of these issues in a fixed sequence.
1. Bilateral filter + unsharp masking
The image is first passed through a bilateral filter, which reduces high-frequency noise while keeping edge boundaries sharp. Unlike a standard Gaussian blur, bilateral filtering weighs pixels by both spatial proximity and color similarity, so it preserves the boundaries between lesions and healthy tissue. After filtering, an unsharp mask is computed as the difference between the original and blurred image, and added back scaled by a factor k=2.8. This amplifies fine structural details (veins, lesion borders) that would otherwise be washed out by subsequent processing.
2. Multi-scale Retinex on the L channel
Retinex is a model of human color constancy: the idea that perceived color is the difference (in the log domain) between what the eye receives and the estimated illumination. Mathematically, a single-scale Retinex is computed as the log difference between the image and a Gaussian-blurred version of it at a given sigma. By averaging across 11 sigma values (from 10 to 300), the multi-scale version captures illumination variation at both fine and coarse scales. Applying this only to the L channel in LAB space ensures that the hue and saturation information (a, b channels) is not distorted: the correction is purely luminance-based. The result is a much more uniform brightness across the leaf surface regardless of lighting direction.
3. Polynomial contrast enhancement
After Retinex normalization, two cascaded cubic polynomial LUT (Look-Up Table) functions are applied to the L channel to further stretch its contrast. Each LUT maps input intensity values 0 to 255 to a new output using a third-degree polynomial curve, which compresses very dark and very bright values while expanding the mid-range. Applying two of these in sequence creates a stronger, more targeted contrast boost that makes lesion regions more visually distinct before segmentation.
4. Glare reconstruction
Overexposed white regions (glare) are detected by thresholding the L channel (L > 210) combined with checking that the a and b channels are near neutral (close to 128), which is characteristic of white/grey pixels with no color information. The detected mask is dilated slightly to cover glare edges. For the masked region, the a and b channels are reconstructed using biharmonic inpainting, a technique that fills missing values by solving a biharmonic PDE, producing a smooth, physically plausible interpolation from the surrounding unmasked pixels. The inpainted color is then further adjusted toward the mean LAB color of the surrounding leaf to avoid color drift. Finally, an HSV saturation correction ensures the reconstructed areas blend naturally with the rest of the leaf.
5. Shadow reconstruction
Deep shadow areas (L < 20 within the leaf region) undergo a similar reconstruction. The L channel is brightened by a factor of 2, and the a/b channels are again inpainted from the surrounding pixels. Saturation is normalized to the leaf mean. This prevents shadow regions from being misclassified as dark lesions in the segmentation stage.
This module contains all the low-level image restoration primitives used by preprocesare.py.
Retinex implementation
Single-scale Retinex is implemented as log(I) - log(GaussianBlur(I, sigma)), operating in float space to avoid quantization artifacts. The multi-scale version simply averages multiple single-scale results across a list of sigma values, giving a balanced illumination estimate across spatial frequencies.
Polynomial LUT functions
Two cubic polynomial mappings are defined as precomputed 256-entry lookup tables, applied via cv2.LUT for efficiency. The first is more aggressive (higher leading coefficient), the second more conservative. Cascading them produces a combined tone curve that is applied only to the L channel of the LAB image.
Glare and shadow inpainting
Both reconstruct_glare and reconstruct_shadow follow the same pattern: detect a binary mask of the problematic region, normalize the a and b channels to the range [-1, 1], call skimage.restoration.inpaint_biharmonic on the stacked a/b array treating the mask as missing data, then rescale back and merge with the corrected L channel. The key insight is that biharmonic inpainting produces much smoother, more natural-looking results than simple neighbor averaging because it enforces both continuity and smoothness at the boundary.
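The shared inpainting pattern can be sketched as below. This is not the project's `reconstruct_glare` or `reconstruct_shadow`: the helper name is hypothetical, the L channel is left untouched, and each chroma channel is inpainted separately (rather than as a stacked array) so the call does not depend on a particular scikit-image version's multichannel argument.

```python
import numpy as np
from skimage.restoration import inpaint_biharmonic

def inpaint_ab(lab, mask):
    """Rebuild the a and b channels inside `mask` from surrounding pixels."""
    out = lab.copy()
    missing = mask.astype(bool)
    for ch in (1, 2):  # a and b channels of an 8-bit LAB image
        # Normalize 0..255 to [-1, 1] before inpainting.
        norm = lab[..., ch].astype(np.float64) / 127.5 - 1.0
        filled = inpaint_biharmonic(norm, missing)
        # Rescale back to 0..255 and write only the masked pixels.
        restored = np.clip(np.rint((filled + 1.0) * 127.5), 0, 255).astype(np.uint8)
        out[missing, ch] = restored[missing]
    return out
```

Pixels outside the mask are copied through unchanged, so the result can be merged directly with a separately corrected L channel.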
This module takes the preprocessed image and produces two masks: one for the leaf area (excluding background), and one for the diseased tissue within it.
Leaf isolation
The leaf is isolated from the background using LAB color thresholds: the a channel (green-red axis) is constrained to the green range, and the b channel (blue-yellow) is constrained to exclude very blue backgrounds. The resulting mask is combined with the shadow mask from preprocessing to exclude areas that are too dark to reliably classify. Morphological closing with a large kernel (9×9, 30 iterations) fills holes inside the leaf, and opening removes small isolated regions from the background, producing a clean leaf silhouette.
Disease region detection: K-Means on LAB
Once the leaf region is isolated, K-Means (K=2) is run on the LAB pixel values of the segmented region. Before clustering, the L channel is replaced with a constant value (128) so that the clustering depends only on color (a, b) and not brightness. This prevents dark shadows or bright reflections from splitting what should be a single color cluster. The cluster with the highest b* value (more yellow/brown, characteristic of diseased tissue) is selected as the disease foreground mask.
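A sketch of the brightness-neutral K=2 clustering, under two stated substitutions: it uses scikit-learn's `KMeans` instead of OpenCV's K-Means for brevity, and the function name and `seed` parameter are hypothetical. Because the L column is constant, it contributes nothing to the distances, so the clustering is driven by a and b only.

```python
import numpy as np
from sklearn.cluster import KMeans

def disease_cluster_mask(lab, leaf_mask, seed=0):
    """Cluster leaf pixels into 2 color groups; return the more yellow one."""
    inside = leaf_mask.astype(bool)
    pixels = lab[inside].astype(np.float32)  # shape (N, 3)
    pixels[:, 0] = 128.0                     # neutralize brightness
    km = KMeans(n_clusters=2, n_init=10, random_state=seed).fit(pixels)
    # The cluster whose center has the highest b value is treated as diseased.
    disease_label = int(np.argmax(km.cluster_centers_[:, 2]))
    out = np.zeros(leaf_mask.shape, dtype=np.uint8)
    out[inside] = np.where(km.labels_ == disease_label, 255, 0)
    return out
```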
Healthy tissue isolation: anchored K-Means
A second K-Means pass (K=3) is run specifically to find the healthy green tissue. Rather than random initialization, the cluster centroids are seeded with fixed LAB anchor points corresponding to healthy green, mild yellowing, and heavy yellowing. This makes the clustering deterministic and ensures the green cluster is always identifiable. The cluster closest to the healthy green anchor (a=40, b=128) is returned as the healthy mask. The disease mask is then computed as the leaf mask minus the healthy mask.
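The anchored pass might be sketched like this, again with scikit-learn's `KMeans` standing in for OpenCV's. Only the healthy-green anchor (a=40, b=128) comes from the text; the two yellowing anchors are hypothetical placeholders, and clustering on (a, b) alone is an assumption of this sketch.

```python
import numpy as np
from sklearn.cluster import KMeans

# (a, b) anchors. The first is from the text; the other two are
# hypothetical stand-ins for mild and heavy yellowing.
ANCHORS_AB = np.array([(40.0, 128.0), (120.0, 170.0), (140.0, 200.0)])

def healthy_and_disease_masks(lab, leaf_mask):
    """Seeded K=3 clustering; returns (healthy_mask, disease_mask)."""
    inside = leaf_mask.astype(bool)
    ab = lab[inside][:, 1:3].astype(np.float64)
    # Fixed initial centroids make the result deterministic.
    km = KMeans(n_clusters=3, init=ANCHORS_AB, n_init=1).fit(ab)
    # Pick the final cluster whose center lies closest to the green anchor.
    dist = np.linalg.norm(km.cluster_centers_ - ANCHORS_AB[0], axis=1)
    healthy_label = int(np.argmin(dist))
    healthy = np.zeros(leaf_mask.shape, dtype=np.uint8)
    healthy[inside] = np.where(km.labels_ == healthy_label, 255, 0)
    # Disease mask = leaf mask minus healthy mask.
    disease = np.where(inside & (healthy == 0), 255, 0).astype(np.uint8)
    return healthy, disease
```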
Brown and black lesion refinement
Some disease types (blight, mold) produce distinctly brown or black spots that can be captured more reliably with direct HSV thresholding than K-Means. Brown is defined as hue in the range 10 to 30° with non-extreme saturation and value; black as value < 50. These are combined with the K-Means disease mask via bitwise OR to produce the final disease mask. Shadow regions (V < 50 in the original image) are explicitly excluded from the disease mask to avoid false positives.
Once the disease region is segmented, a 58-dimensional feature vector is computed to describe it. The features span four complementary representations: texture, color statistics, color ratios, and shape.
GLCM: Gray-Level Co-occurrence Matrix
The GLCM captures texture by counting how often pairs of pixel intensities appear at a given spatial offset. The image is quantized to 8 gray levels (to reduce noise sensitivity) and the disease mask is applied so only diseased pixels contribute. The matrix is computed at 4 angles (0°, 45°, 90°, 135°) with distance 1, capturing texture directionality. Five standard Haralick properties are then extracted: contrast (intensity variation between neighbors), dissimilarity (similar to contrast but linear), homogeneity (closeness of distribution to the diagonal, high for uniform textures), energy (textural uniformity), and correlation (linear dependency between neighboring pixel intensities). This produces 20 features.
Wavelet: Daubechies db4, 2 levels
A 2-level 2D discrete wavelet decomposition is applied to the grayscale image using the Daubechies db4 wavelet. Each decomposition level produces three detail sub-bands: LH (horizontal edges), HL (vertical edges), and HH (diagonal edges). Level 1 captures fine-scale structure; level 2 captures coarser structure. For each of the 6 sub-bands, three statistics are computed: mean (average presence of that frequency/orientation), standard deviation (variability), and energy (sum of squared coefficients, related to the power in that sub-band). The LL (approximation) sub-band is not used, as it mostly duplicates the original image information. This produces 18 features.
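The wavelet feature computation can be sketched with PyWavelets as below. The function name is hypothetical, and the mean is taken over the signed coefficients, which is one reading of the description. `pywt.wavedec2` returns the approximation band first, followed by one tuple of (horizontal, vertical, diagonal) detail bands per level, coarsest level first.

```python
import numpy as np
import pywt

def wavelet_features(gray, wavelet="db4", level=2):
    """18 features: mean, std, energy for each of the 6 detail sub-bands."""
    coeffs = pywt.wavedec2(gray.astype(np.float64), wavelet, level=level)
    feats = []
    for details in coeffs[1:]:  # skip the approximation (LL) band
        for band in details:    # horizontal, vertical, diagonal details
            feats.extend([band.mean(), band.std(), np.sum(band ** 2)])
    return np.asarray(feats)
```

A perfectly flat image yields all-zero detail coefficients, so every feature is zero; textured lesions raise the standard deviation and energy terms.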
LAB color statistics
Over the set of pixels belonging to the disease mask, the mean, standard deviation, 25th percentile, and 75th percentile are computed for each of the three LAB channels. The L channel describes brightness (useful for distinguishing black lesions from brown), the a channel encodes green-red (important for detecting necrosis vs. healthy tissue), and the b channel encodes blue-yellow (key for yellowing diseases). This produces 12 features.
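As an illustration, the 12 color statistics can be computed in a few lines of NumPy. The function name, the output ordering (four statistics per channel, L then a then b), and the all-zero return for an empty mask are assumptions of this sketch.

```python
import numpy as np

def lab_statistics(lab, mask):
    """Mean, std, 25th and 75th percentile per LAB channel (12 values)."""
    pixels = lab[mask.astype(bool)].astype(np.float64)  # shape (N, 3)
    if pixels.size == 0:
        return np.zeros(12)  # assumed fallback for an empty disease mask
    stats = [
        pixels.mean(axis=0),
        pixels.std(axis=0),
        np.percentile(pixels, 25, axis=0),
        np.percentile(pixels, 75, axis=0),
    ]
    # Rows become channels, columns become statistics.
    return np.stack(stats, axis=1).ravel()
```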
Color ratio features
Three percentage-based features are derived from the a and b channels of the disease region: the proportion of pixels with b > 140 (yellowing), with both a > 140 and b > 140 (brown coloring characteristic of blight/mold), and with b < 120 (dark/black coloring). These ratios give a fast, robust summary of the dominant color signature of the disease. This produces 3 features.
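The three ratios translate directly into boolean masks over the disease pixels. In this sketch the thresholds are the ones quoted above, while the function name and the zero vector returned for an empty mask are assumptions.

```python
import numpy as np

def color_ratio_features(lab, mask):
    """Percent of disease pixels that are yellow, brown, or dark."""
    inside = mask.astype(bool)
    if not inside.any():
        return np.zeros(3)  # assumed fallback for an empty disease mask
    a = lab[..., 1][inside]
    b = lab[..., 2][inside]
    yellow = np.mean(b > 140)
    brown = np.mean((a > 140) & (b > 140))
    dark = np.mean(b < 120)
    return np.array([yellow, brown, dark]) * 100.0
```

The yellow and brown ratios overlap by construction, since every pixel counted as brown also satisfies b > 140.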
Shape features
External contours are extracted from the disease mask. The number of contours, mean contour area, standard deviation of contour areas, mean contour perimeter, and total diseased area are computed. Together these describe whether the disease manifests as a few large blotches (blight), many small spots (mold), or a diffuse spread (powdery mildew, yellow leaf curl). This produces 5 features.
This module provides pixel-level segmentation metrics and visualization helpers used during development and evaluation.
Segmentation metrics: IoU (Intersection over Union) and Dice coefficient measure how well the predicted disease mask overlaps with a ground truth mask. Pixel-level precision, recall, and accuracy are also implemented. All metrics handle the edge case of empty masks (returning 1.0, since two empty masks agree perfectly).
Visualization: A pie chart of healthy vs. affected pixel percentages can be generated from two binary masks. LAB histogram and 3D cluster scatter plot functions exist but are currently disabled (commented out), as they were used during development.
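For reference, IoU and Dice with the empty-mask convention described above can be written as follows. The function names are illustrative and may not match the module's own.

```python
import numpy as np

def iou(pred, truth):
    """Intersection over Union of two binary masks."""
    p, t = pred.astype(bool), truth.astype(bool)
    union = np.logical_or(p, t).sum()
    if union == 0:
        return 1.0  # two empty masks agree perfectly
    return float(np.logical_and(p, t).sum() / union)

def dice(pred, truth):
    """Dice coefficient: 2|A ∩ B| / (|A| + |B|)."""
    p, t = pred.astype(bool), truth.astype(bool)
    total = p.sum() + t.sum()
    if total == 0:
        return 1.0  # two empty masks agree perfectly
    return float(2.0 * np.logical_and(p, t).sum() / total)
```

For a prediction `[1, 1, 0, 0]` against a ground truth `[0, 1, 1, 0]`, the intersection is 1 pixel and the union is 3, giving an IoU of 1/3 and a Dice of 0.5.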
| Group | Method | # Features |
|---|---|---|
| Texture | GLCM: 5 properties × 4 angles | 20 |
| Texture | Wavelet db4: 3 statistics × 6 sub-bands | 18 |
| Color | LAB statistics: 4 stats × 3 channels | 12 |
| Color | Color ratio: yellow / brown / black | 3 |
| Shape | Contour-based shape descriptors | 5 |
| Total | | 58 |
| Index | Class | Description |
|---|---|---|
| 0 | blight | Bacterial or fungal blight: large necrotic lesions |
| 1 | healthy | No disease: fully green leaf |
| 2 | mold | Mold / late blight: dark brown irregular spots |
| 3 | powdery_mildew | Powdery mildew: white/grey powdery coating |
| 4 | yellow_leaf_curl | Yellow leaf curl virus: yellowing and curling |
Evaluated on a test set of 6,276 images from the Tomato Leaves Dataset.
| Metric | Value |
|---|---|
| Accuracy | 92.96% |
| Precision | 93.19% |
| Recall | 92.96% |
| F1-score | 92.93% |
| Class | Precision | Recall | F1-score | Test samples |
|---|---|---|---|---|
| Blight | 0.93 | 0.86 | 0.90 | 1,266 |
| Healthy | 0.99 | 0.90 | 0.94 | 1,374 |
| Mold | 0.89 | 0.98 | 0.93 | 1,260 |
| Powdery mildew | 0.93 | 0.96 | 0.94 | 1,188 |
| Yellow leaf curl | 0.92 | 0.95 | 0.94 | 1,188 |
The system showed particularly strong performance on the healthy vs. diseased distinction, with only 16 false positives classified as healthy out of the entire test set of 6,276 images. The lowest per-class F1 (0.90) was observed for blight, where visual similarity with mold causes occasional confusion. Mold achieved the highest recall (0.98), meaning almost all mold cases were correctly identified.
| Component | Configuration |
|---|---|
| SVM kernel | RBF |
| C | 10 |
| gamma | 0.05 |
| PCA variance retained | 95% |
| Scaler | StandardScaler |
| Train / Val / Test split | 80% / 10% / 10% stratified |
| Feature vector size | 58 |
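The configuration in the table could be assembled as a scikit-learn pipeline along these lines. The step order (scaling, then PCA, then SVM) is an assumption, and the data below is synthetic, generated only so the example runs; it is not the Tomato Leaves Dataset and says nothing about the reported accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def build_classifier():
    """StandardScaler -> PCA (95% variance) -> RBF SVM (C=10, gamma=0.05)."""
    return make_pipeline(
        StandardScaler(),
        PCA(n_components=0.95),  # keep enough components for 95% variance
        SVC(kernel="rbf", C=10, gamma=0.05),
    )

# Synthetic stand-in for the 58-dimensional feature vectors and 5 classes.
X, y = make_classification(n_samples=300, n_features=58, n_informative=20,
                           n_classes=5, random_state=0)
clf = build_classifier().fit(X, y)
pred = clf.predict(X)
```

Wrapping the three steps in one pipeline means the scaler and PCA are fitted on training data only when the pipeline is passed to `GridSearchCV` or cross-validation helpers.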
| Library | Purpose |
|---|---|
| `opencv-python` | Image I/O, filtering, morphology, K-Means |
| `numpy` | Numerical operations |
| `scikit-learn` | SVM, PCA, StandardScaler, GridSearchCV, metrics |
| `scikit-image` | Biharmonic inpainting, GLCM |
| `PyWavelets` | 2D discrete wavelet decomposition |
| `matplotlib` | Visualization: pie charts, confusion matrix |
| `joblib` | Model serialization |
| `scipy` | Spatial distance computations (cdist) |