This repository presents a controlled comparative study of three convolutional neural network (CNN) architectures—ResNet-18, MobileNetV3-Large, and EfficientNet-B0—applied to binary pneumonia classification from chest X-ray images. The study is designed as a minimal proof-of-concept (PoC) emphasizing reproducibility and architectural comparison under identical training conditions, rather than clinical applicability.
Experiments are conducted on the publicly available Chest X-Ray Pneumonia dataset from Kaggle, consisting of frontal chest radiographs labeled as NORMAL or PNEUMONIA. The original train/test split provided by the dataset is used without modification. No additional external validation data is introduced.
Three ImageNet-pretrained CNN backbones from torchvision are evaluated:
- ResNet-18, a residual network using standard convolutions
- MobileNetV3-Large, a lightweight architecture based on depthwise separable convolutions
- EfficientNet-B0, an efficiency-oriented architecture using compound scaling
All models are used with their canonical ImageNet-pretrained variants and identical training hyperparameters. No architecture-specific tuning is performed, which may disadvantage architectures such as EfficientNet-B0 that typically benefit from longer training schedules.
To ensure comparability, all models are trained using the same experimental protocol:
- Input resolution: 224 × 224
- Optimizer: Adam
- Learning rate: 1 × 10⁻³
- Batch size: 32
- Number of epochs: 3
- Random seed: 1337
- Hardware: Apple Silicon (MPS backend)
No architecture-specific hyperparameter tuning, learning rate scheduling, or extended training is applied.
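The protocol above can be sketched as a single shared training loop; the function name and structure are illustrative, not the repository's actual code. Default values mirror the listed hyperparameters.

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=3, lr=1e-3, seed=1337, device=None):
    """Train with the shared protocol: Adam, lr 1e-3, fixed seed, no scheduler."""
    torch.manual_seed(seed)
    if device is None:
        # Prefer the Apple Silicon MPS backend when available, else CPU.
        device = "mps" if torch.backends.mps.is_available() else "cpu"
    model = model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for inputs, targets in loader:
            inputs, targets = inputs.to(device), targets.to(device)
            optimizer.zero_grad()
            loss = criterion(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```

The same call is made for all three models; only the backbone passed in differs.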
| Architecture | Accuracy | F1-score (macro) | Failure cases |
|---|---|---|---|
| MobileNetV3-Large | 0.817 | 0.776 | 114 |
| ResNet-18 | 0.938 | 0.932 | 39 |
| EfficientNet-B0 | 0.710 | 0.593 | 181 |
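The table's metrics can be reproduced from per-sample predictions. The sketch below uses scikit-learn and assumes predictions have already been collected as integer label arrays; the `evaluate` helper is illustrative.

```python
from sklearn.metrics import accuracy_score, f1_score

def evaluate(y_true, y_pred):
    """Return accuracy, macro F1-score, and the number of failure cases."""
    acc = accuracy_score(y_true, y_pred)
    macro_f1 = f1_score(y_true, y_pred, average="macro")
    # Failure cases = misclassified test samples.
    failures = sum(t != p for t, p in zip(y_true, y_pred))
    return acc, macro_f1, failures
```

Macro averaging weights both classes equally, which matters here given the class imbalance of the Kaggle test split.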
ResNet-18 achieves the highest overall performance, with both accuracy and macro F1-score indicating strong and balanced classification across classes. MobileNetV3-Large exhibits moderate performance at a markedly smaller parameter count. EfficientNet-B0 underperforms in this experimental regime.
Gradient-weighted Class Activation Mapping (Grad-CAM) is applied to selected test samples for each architecture. Qualitative inspection suggests that ResNet-18 generally produces more spatially localized and anatomically relevant activations within lung regions, whereas MobileNetV3-Large and EfficientNet-B0 exhibit more diffuse or occasionally misaligned attention patterns. Misclassified samples are systematically recorded for further qualitative analysis.
The observed performance differences highlight the influence of architectural robustness under constrained training regimes. While EfficientNet architectures are designed for parameter efficiency, they are known to be sensitive to hyperparameter choices and training duration. The limited number of epochs and absence of tuning likely prevented EfficientNet-B0 from reaching its expected performance. In contrast, ResNet-18 demonstrates strong robustness and convergence stability in low-tuning settings.
This study is limited by the use of a single dataset, a fixed train/test split, and a short training schedule. No cross-validation or external validation is performed. Consequently, the results should not be interpreted as indicative of clinical performance or generalization.
Within a minimal and reproducible experimental framework, ResNet-18 demonstrates superior performance and robustness for pneumonia detection from chest X-rays. These findings emphasize that, in practical low-tuning or rapid prototyping scenarios, architectural robustness may outweigh theoretical efficiency advantages.