Commit 56c3db7

Browse files
committed
πŸ“ added readme for regularization explanation.
File: Regularization/README.md

# Regularization 🦺
### Why Regularization?
In linear regression, the goal is to find the best-fitting line (or hyperplane in higher dimensions) through the data. However, if the model is too complex (e.g., using too many features or having excessively large weights), it may fit the training data very well but perform poorly on new data due to overfitting. Regularization addresses this by constraining or penalizing the model's complexity: it **_increases bias_** a bit in exchange for **_decreasing variance_**, which keeps the model from overfitting and improves its performance on unseen data.
> Example of **_increasing bias_** a bit for **_decreasing variance_**
![comparison](./assets/compare.png)
<br/>
## Need to know for this section πŸ‘¨πŸ½β€πŸ’»
### Types of Regularization
1. <mark>**Ridge Regression (L2 Regularization):**</mark>
**_Definition_**
Ridge regression shrinks the coefficients towards zero but does not necessarily set them to zero. This leads to a model where all features are included, but their effects are reduced, making the model simpler and more robust.
**_Penalty Term_**
The penalty term Ξ» βˆ‘ w_j^2 is the sum of the squares of the model coefficients, scaled by Ξ». This term discourages large coefficients by adding their squared values to the loss function.
**_Loss function Form_**:
27+
28+
```math
Loss = MSE + Ξ» * βˆ‘ w_j^2
```
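As a concrete check, the ridge loss above can be computed directly with NumPy. This is a minimal sketch; the data, weights, and the `ridge_loss` name are illustrative, not from the original:

```python
import numpy as np

def ridge_loss(X, y, w, lam):
    # MSE plus the L2 penalty Ξ» * βˆ‘ w_j^2.
    residuals = X @ w - y
    mse = np.mean(residuals ** 2)
    penalty = lam * np.sum(w ** 2)
    return mse + penalty

# Tiny example: 3 samples, 2 features.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, 0.1])

# With Ξ» = 0 the loss is plain MSE; a larger Ξ» adds the squared-weight penalty.
print(ridge_loss(X, y, w, 0.0))
print(ridge_loss(X, y, w, 1.0))
```

Note how the penalty depends only on the weights, not on the data, so it always pushes the optimizer toward smaller coefficients.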
**_Machine Learning Components for Linear Regression_**
![ridge components](./assets/ridge.png)
<br/>
![gradient ridge](./assets/gradientRidge.png)
<br/>
![normal ridge](./assets/normalRidge.png)
<br/>
> We have to add Ξ» times the identity matrix to the normal equation, as the last figure shows.
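That normal-equation form can be sketched in NumPy as follows. Names are illustrative, and the intercept is assumed to be handled separately (in practice it is usually not penalized):

```python
import numpy as np

def ridge_normal_equation(X, y, lam):
    # Closed form: w = (X^T X + Ξ» I)^(-1) X^T y,
    # where I is the identity matrix mentioned above.
    I = np.eye(X.shape[1])
    return np.linalg.solve(X.T @ X + lam * I, X.T @ y)

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
y = X @ np.array([2.0, -1.0, 0.5])  # noise-free synthetic targets

w_ols   = ridge_normal_equation(X, y, 0.0)    # plain least squares
w_ridge = ridge_normal_equation(X, y, 10.0)   # coefficients shrunk toward zero
print(w_ols, w_ridge)
```

Adding Ξ»I also makes the matrix being inverted better conditioned, which is why ridge helps with multicollinearity.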
2. <mark>**Lasso Regression (L1 Regularization):**</mark>
**_Definition_**
Lasso regression encourages sparsity in the model by driving some coefficients exactly to zero, effectively selecting a subset of features. This can simplify the model and aid in feature selection.
**_Penalty Term_**
The penalty term Ξ» βˆ‘ |w_j| is the sum of the absolute values of the model coefficients, scaled by Ξ». This term discourages large coefficients by adding their absolute values to the loss function.
**_Loss function Form:_**
```math
Loss = MSE + Ξ» * βˆ‘ |w_j|
```
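The reason Lasso can drive weights exactly to zero is the soft-thresholding (shrink-and-cut) step its solvers apply to each coefficient. A minimal sketch (function name and values are illustrative):

```python
import numpy as np

def soft_threshold(w, lam):
    # Shrink each weight toward zero by Ξ»; any weight with |w_j| <= Ξ»
    # lands exactly at zero, which is how Lasso performs feature selection.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

w = np.array([3.0, -0.4, 0.05, -2.0])
print(soft_threshold(w, 0.5))  # the two small weights become exactly 0
```

Contrast this with ridge, which multiplies weights by a factor less than one and so shrinks them without ever reaching exactly zero.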
3. <mark>**Elastic Net Regression:**</mark>
**_Definition_**
Elastic Net regression combines both Ridge and Lasso regression. It provides a balance between Lasso and Ridge regression by including both sparsity (feature selection) and coefficient shrinkage. This can be particularly useful when dealing with highly correlated features.
**_Penalty Term_**
The penalty term Ξ»_1 βˆ‘ |w_j| + Ξ»_2 βˆ‘ w_j^2 is a combination of the L1 and L2 penalties. This term balances between encouraging sparsity (L1) and shrinking coefficients (L2).
**_Loss function Form:_**
```math
Loss = MSE + Ξ»_1 * βˆ‘ |w_j| + Ξ»_2 * βˆ‘ w_j^2
```
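A quick end-to-end sketch with scikit-learn, assuming it is installed; the synthetic data is illustrative. Note that scikit-learn folds Ξ»_1 and Ξ»_2 into an overall strength `alpha` and a mixing ratio `l1_ratio`:

```python
import numpy as np
from sklearn.linear_model import ElasticNet

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 5))
# Only the first two features actually matter; the rest are noise.
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# alpha sets the overall penalty strength, l1_ratio the L1/L2 mix.
model = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)
print(model.coef_)  # large weights shrunk, irrelevant ones near (or at) zero
```

With `l1_ratio=1.0` this reduces to Lasso and with `l1_ratio=0.0` to a ridge-style penalty, so Elastic Net interpolates between the two.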
### How It Works?
- **_Loss Function:_** Regularization modifies the loss function (e.g., Mean Squared Error) by adding a penalty term. This term discourages overly large weights or a large number of non-zero weights.
- **_Regularization Parameter (Ξ»):_** Controls the strength of the penalty. A higher Ξ» increases the penalty, leading to more regularization; a lower Ξ» decreases it, making the model behave closer to ordinary linear regression.
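The effect of Ξ» can be seen by sweeping it and watching the coefficient norm shrink. A sketch using scikit-learn's `Ridge`, where Ξ» is called `alpha` and the data is synthetic:

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
y = X @ np.array([4.0, -3.0, 2.0, -1.0]) + 0.1 * rng.normal(size=50)

# Larger alpha (Ξ») => stronger penalty => smaller coefficients.
norms = []
for alpha in [0.01, 1.0, 100.0]:
    w = Ridge(alpha=alpha).fit(X, y).coef_
    norms.append(np.linalg.norm(w))
print(norms)
```

In practice Ξ» is chosen by cross-validation rather than by hand, e.g. with `RidgeCV` or `GridSearchCV`.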
### Benefits of Regularization
1. Reduces Overfitting: By penalizing large coefficients, regularization helps the model generalize better to new data.

2. Feature Selection: L1 regularization can help in feature selection by shrinking some coefficients exactly to zero.

3. Improves Stability: Helps in making the model more stable and robust, especially when dealing with multicollinearity (highly correlated features).
### Summary πŸ’Ό
In summary, regularization helps control model complexity, improves generalization, and enhances the performance of linear regression models by adding a penalty for large coefficients.