Classifying natural scenes into 6 categories using Transfer Learning on top of a pre-trained InceptionV3 architecture.
- Overview
- Dataset
- Project Structure
- Model Architecture
- Training Strategy
- Results
- Future Improvements
- Technologies Used
- How to Run
## Overview

This project tackles a multi-class image classification problem using the Intel Image Classification dataset. Rather than training a CNN from scratch (which would require far more data and time), I chose to leverage Transfer Learning with InceptionV3 pretrained on ImageNet.
The motivation was simple: the dataset (~24K images) is decent but not massive. InceptionV3 brings powerful, battle-tested feature extraction that allows the model to generalize well even with limited fine-tuning.
## Dataset

The dataset contains approximately 24,335 images across 3 splits:
| Split | Structure | Purpose |
|---|---|---|
| `seg_train` | Organized by class subfolders | Training |
| `seg_test` | Organized by class subfolders | Validation during training |
| `seg_pred` | Mixed images (no subfolders) | Final real-world prediction |
- 0 → buildings
- 1 → forest
- 2 → glacier
- 3 → mountain
- 4 → sea
- 5 → street
Note: The `seg_pred` folder contains mixed, unlabeled images, simulating a real deployment scenario where we don't know the ground truth.
## Project Structure

```
Intel-Image-Classification/
├── intel_image_classification.ipynb   ← Main notebook
├── README.md
└── Data/
    ├── seg_train/
    │   ├── buildings/
    │   ├── forest/
    │   ├── glacier/
    │   ├── mountain/
    │   ├── sea/
    │   └── street/
    ├── seg_test/
    │   └── (same structure as seg_train)
    └── seg_pred/
        └── (flat folder, mixed images)
```
## Model Architecture

The model is built with Keras / TensorFlow using a clean Sequential pipeline:

```
Input (299×299×3)
        ↓
InceptionV3 (pretrained on ImageNet, frozen)
        ↓
GlobalAveragePooling2D (built into InceptionV3 via pooling="avg")
        ↓
Dense(6, activation="softmax")
        ↓
Output: probability distribution over 6 classes
```
I kept `model.layers[0].trainable = False` intentionally. Since the dataset is not huge, unfreezing InceptionV3 at this stage would risk catastrophic forgetting, where the pretrained weights get overwritten with noise. The frozen base acts as a powerful fixed feature extractor, while only the classification head is trained from scratch.
In a future iteration, fine-tuning (gradually unfreezing the last N layers) can squeeze out extra performance.
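As a sketch, the frozen-base pipeline above can be assembled like this (variable names are mine; the notebook's actual code may differ):

```python
import tensorflow as tf
from tensorflow.keras.applications import InceptionV3

# Pretrained base; pooling="avg" bakes GlobalAveragePooling2D into the output
base = InceptionV3(
    include_top=False,
    weights="imagenet",
    input_shape=(299, 299, 3),
    pooling="avg",
)
base.trainable = False  # frozen: a fixed feature extractor, no catastrophic forgetting

# Only this classification head is trained from scratch
model = tf.keras.Sequential([
    base,
    tf.keras.layers.Dense(6, activation="softmax"),
])

print(model.output_shape)  # (None, 6)
```

For the future fine-tuning step, one would flip `base.trainable = True`, re-freeze all but the last N layers, and recompile with a lower learning rate.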
## Training Strategy

```python
ImageDataGenerator(preprocessing_function=preprocess_input)
```

Used InceptionV3's built-in `preprocess_input` (scales pixels to [-1, 1]), the exact preprocessing the model was originally trained with. This is critical for Transfer Learning to work correctly.
I initially experimented with augmentation (`zoom_range`, `shear_range`, `brightness_range`, `horizontal_flip`), but found it wasn't necessary to achieve strong results with the frozen base. It remains commented out for easy re-enabling.
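A quick sanity check of the preprocessing claim (the dummy batch and the augmentation values in the comment are mine, not the notebook's exact settings):

```python
import numpy as np
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# The generator used for both train and validation flows
datagen = ImageDataGenerator(preprocessing_function=preprocess_input)

# Augmentation stays commented out; re-enable by passing e.g.:
#   zoom_range=0.2, shear_range=0.2,
#   brightness_range=(0.8, 1.2), horizontal_flip=True

# preprocess_input maps raw [0, 255] pixels into [-1, 1]
batch = np.random.uniform(0, 255, size=(4, 299, 299, 3)).astype("float32")
scaled = preprocess_input(batch.copy())
assert scaled.min() >= -1.0 and scaled.max() <= 1.0
```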
| Parameter | Value | Reason |
|---|---|---|
| Image Size | 299×299 | Required by InceptionV3 |
| Batch Size (Train) | 100 | Balance between speed and stability |
| Batch Size (Validation) | 16 | Lighter memory footprint |
| Optimizer | Adam | Adaptive LR, works well out of the box |
| Loss | categorical_crossentropy | Multi-class classification standard |
| Epochs | 10 (max) | EarlyStopping handles the rest |
| Patience | 3 | Stop if val_loss doesn't improve for 3 epochs |
```python
EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)
```

The `restore_best_weights=True` flag ensures we always keep the best checkpoint automatically, with no manual saving needed.
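To show the callback's behavior end to end, here is a runnable sketch on a tiny stand-in model with synthetic data (in the notebook, `model` is the InceptionV3 pipeline and the generators supply real images):

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras.callbacks import EarlyStopping

# Tiny stand-in model, compiled with the same settings as the real pipeline
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(6, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

early_stop = EarlyStopping(
    monitor="val_loss",
    patience=3,
    restore_best_weights=True,  # best epoch's weights are kept automatically
)

# Synthetic stand-in for the image generators
x = np.random.rand(200, 8).astype("float32")
y = tf.keras.utils.to_categorical(np.random.randint(0, 6, 200), num_classes=6)

history = model.fit(
    x, y,
    validation_split=0.2,
    epochs=10,           # upper bound; EarlyStopping may cut training short
    callbacks=[early_stop],
    verbose=0,
)
print(len(history.history["val_loss"]))  # number of epochs actually run, at most 10
```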
## Results

| Metric | Train | Validation |
|---|---|---|
| Accuracy | ~94% | ~92.5% |
| Loss | ~0.15 | ~0.20 |
The training loss steadily decreased from ~0.42 to ~0.15, while validation loss stabilized around 0.20. A small but consistent gap appeared after epoch 4.
Both accuracy curves climbed quickly in the first 2 epochs and then plateaued: train at ~94%, validation at ~92.5%.
There is mild overfitting present:
- The training accuracy kept improving after epoch 4, while validation accuracy plateaued.
- The gap (~1.5%) is not alarming, but it's a signal.
The model is still generalizing well: 92.5% validation accuracy on a 6-class problem is solid. But this is something to address in future iterations.
## Future Improvements

- Add Dropout before the final Dense layer to reduce overfitting
- Enable Data Augmentation (already scaffolded in the code, just uncomment)
- Experiment with other architectures: EfficientNetV2, VGG, ConvNeXt
- Use ReduceLROnPlateau callback alongside EarlyStopping
- Use a ModelCheckpoint callback to persist the best model to disk
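A sketch of how the Dropout, ReduceLROnPlateau, and ModelCheckpoint items could look (the dropout rate, LR factor, and filename are my assumptions, not tuned values):

```python
import tensorflow as tf
from tensorflow.keras.callbacks import (
    EarlyStopping,
    ModelCheckpoint,
    ReduceLROnPlateau,
)

# Dropout before the final Dense layer; 2048 is InceptionV3's pooled feature size
head = tf.keras.Sequential([
    tf.keras.Input(shape=(2048,)),
    tf.keras.layers.Dropout(0.3),  # rate 0.3 is a starting point, tune as needed
    tf.keras.layers.Dense(6, activation="softmax"),
])

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    # Halve the learning rate when val_loss stalls for 2 epochs
    ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2, min_lr=1e-6),
    # Persist the best model to disk
    ModelCheckpoint("best_model.keras", monitor="val_loss", save_best_only=True),
]

print(head.output_shape)  # (None, 6)
```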
## Technologies Used

| Tool | Purpose |
|---|---|
| Python 3 | Core language |
| TensorFlow / Keras | Model building & training |
| InceptionV3 | Pretrained base model |
| NumPy | Array operations |
| Pandas | Prediction dataframe |
| Matplotlib | Visualization |
## How to Run

- Clone the repo and set up your data:
```
Data/
├── seg_train/
├── seg_test/
└── seg_pred/
```
- Install dependencies:
```
pip install tensorflow numpy pandas matplotlib
```

- Run the notebook:

```
jupyter notebook intel_image_classification.ipynb
```

- The `seg_pred` folder is handled differently from train/test: it uses `flow_from_dataframe` instead of `flow_from_directory` since it has no class subfolders.
- Labels are automatically inferred from the `seg_train` folder structure using `os.listdir`.
- Predictions include a confidence score (the max softmax probability) displayed alongside each image.
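The `seg_pred` handling described above can be sketched like this; a dummy folder with one synthetic image stands in for `Data/seg_pred` so the snippet runs anywhere (the dataframe/generator pattern is the point, folder and column names are mine):

```python
import os
import numpy as np
import pandas as pd
from PIL import Image
from tensorflow.keras.applications.inception_v3 import preprocess_input
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Stand-in for Data/seg_pred: a flat folder with one synthetic image
pred_dir = "seg_pred_demo"
os.makedirs(pred_dir, exist_ok=True)
Image.fromarray(
    np.random.randint(0, 255, (299, 299, 3), dtype=np.uint8)
).save(os.path.join(pred_dir, "img_0.jpg"))

# No class subfolders, so build a dataframe of filenames instead
pred_df = pd.DataFrame({"filename": sorted(os.listdir(pred_dir))})

pred_gen = ImageDataGenerator(
    preprocessing_function=preprocess_input
).flow_from_dataframe(
    pred_df,
    directory=pred_dir,
    x_col="filename",
    class_mode=None,        # unlabeled: the generator yields images only
    target_size=(299, 299),
    batch_size=16,
    shuffle=False,          # keep order aligned with pred_df
)

batch = next(pred_gen)
print(batch.shape)  # (1, 299, 299, 3): one image in this demo folder

# With the trained model, confidence is the max softmax probability:
#   probs = model.predict(pred_gen)
#   pred_df["confidence"] = probs.max(axis=1)
```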