A comparative study of how L2 regularization and Dropout, individually and combined, affect neural network generalization, the bias-variance tradeoff, and overfitting prevention.
This project implements a 3-layer neural network and compares four different regularization approaches:
| Model | Train Accuracy | Test Accuracy | Observation |
|---|---|---|---|
| No Regularization / No Dropout | 94.8% | 91.5% | Overfitting |
| Only L2 Regularization | 93.8% | 93.0% | Balanced |
| Only Dropout | 92.9% | 95.0% | Underfitting |
| Both Regularization & Dropout | 98.0% | 95.0% | Best Generalization |
Bias-variance behavior of each approach:
| Approach | Bias | Variance | Status |
|---|---|---|---|
| No Regularization / No Dropout | Low | High | Overfitting |
| Only L2 Regularization | Moderate | Moderate | Balanced |
| Only Dropout | High | Low | Underfitting |
| Both Methods | Low | Low | Improved Generalization |
Training log excerpt (final epochs):
| Epoch | Loss | Train Accuracy |
|---|---|---|
| 96 | 0.1067 | 96.00% |
| 97 | 0.0906 | 97.27% |
| 98 | 0.0915 | 96.79% |
| 99 | 0.0938 | 97.09% |
| 100 | 0.1018 | 96.36% |
L2 Regularization:
- Adds a penalty term to the cost function, `λ/(2m) * Σ||W||²`, which prevents excessive reliance on specific features
- The gradient update includes the regularization term `dW = 1/m * np.dot(dZ, A_prev.T) + (lambd * W)/m` (see the sketch below)
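A minimal NumPy sketch of the penalty and gradient above, assuming the network's weights are held in a list of per-layer matrices; the names `l2_penalty` and `dW_with_l2` are illustrative, not functions from `reg_utils.py`:

```python
import numpy as np

def l2_penalty(weights, lambd, m):
    """L2 cost term: lambda/(2m) * sum of squared entries of every weight matrix."""
    return (lambd / (2 * m)) * sum(np.sum(np.square(W)) for W in weights)

def dW_with_l2(dZ, A_prev, W, lambd, m):
    """Weight gradient with the L2 term added, matching the update rule above."""
    return (1.0 / m) * np.dot(dZ, A_prev.T) + (lambd * W) / m

# Example: penalty for two random layers over m = 5 training examples
rng = np.random.default_rng(0)
Ws = [rng.standard_normal((4, 3)), rng.standard_normal((1, 4))]
print(l2_penalty(Ws, lambd=0.7, m=5))
```

Because the penalty grows with the squared weights, gradient descent is pushed toward smaller weight values, which is what keeps any single feature from dominating.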
Dropout:
- Randomly shuts down each neuron during training with probability `1 - keep_prob`, forcing the network to learn redundant representations
- Scales the surviving activations as `A = A / keep_prob` to maintain their expected values (see the sketch below)
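A sketch of the inverted-dropout forward step, assuming the mask `D` is cached for reuse in backpropagation; `dropout_forward` is an illustrative name, not the notebook's actual code:

```python
import numpy as np

def dropout_forward(A, keep_prob, rng):
    """Inverted dropout: keep each unit with probability keep_prob,
    then rescale by 1/keep_prob so the expected activation is unchanged."""
    D = rng.random(A.shape) < keep_prob  # True where the unit survives
    A = (A * D) / keep_prob              # zero dropped units, rescale the rest
    return A, D                          # D masks the gradients in backprop

rng = np.random.default_rng(1)
A, D = dropout_forward(np.ones((3, 4)), keep_prob=0.8, rng=rng)
print(A)  # surviving entries become 1/0.8 = 1.25, dropped entries become 0
```

During the backward pass the same mask is applied (`dA = dA * D / keep_prob`), and at test time dropout is disabled entirely.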
Combined Approach:
- Uses L2 regularization to constrain weight magnitudes
- Uses Dropout to prevent co-adaptation of neurons
- Achieves the best generalization performance (a combined backward step is sketched below)
Overall Test Accuracy: 95.0%
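To illustrate how the two methods meet in a single backward step: the weight gradient carries the L2 term, while the gradient flowing to the previous layer is masked by that layer's dropout mask. The function name and toy shapes below are assumptions for illustration, not the notebook's actual code:

```python
import numpy as np

def backward_step_l2_dropout(dZ, A_prev, W, D_prev, lambd, keep_prob, m):
    """One hidden-layer backward step using both L2 and inverted dropout."""
    dW = (1.0 / m) * np.dot(dZ, A_prev.T) + (lambd * W) / m  # L2-regularized gradient
    db = (1.0 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)
    dA_prev = (dA_prev * D_prev) / keep_prob  # same mask and scaling as the forward pass
    return dW, db, dA_prev

# Shape check with toy arrays (m = 5 examples, toy hyperparameters)
m = 5
dZ, A_prev = np.ones((4, m)), np.ones((3, m))
W, D_prev = np.ones((4, 3)), np.ones((3, m), dtype=bool)
dW, db, dA_prev = backward_step_l2_dropout(dZ, A_prev, W, D_prev, 0.7, 0.86, m)
print(dW.shape, db.shape, dA_prev.shape)  # (4, 3) (4, 1) (3, 5)
```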
To run the project:
- Clone the repository
- Install the dependencies: `pip install numpy matplotlib scikit-learn`
- Ensure `reg_utils.py` and `testCases.py` are in the same directory
- Run the Jupyter notebook `A3_DL_Tayyabah Rehman_017.ipynb`
- The notebook contains implementations of forward/backward propagation, L2 regularization, and Dropout, plus a comparison of all four approaches
Key findings:
- Combining L2 regularization and Dropout significantly improves generalization
- The combined model achieved 98.0% train accuracy and 95.0% test accuracy
- L2 regularization alone provides a balanced bias-variance tradeoff
- Dropout alone underfits slightly but generalizes robustly
- No regularization leads to clear overfitting