Kariusdi
diff --git a/Generalization/README.md b/Generalization/README.md
Lines changed: 54 additions & 0 deletions
diff --git a/Generalization/assets/genaralize.png b/Generalization/assets/genaralize.png
506 KB
diff --git a/Performance-Estimation/Nested-CrossValidation/README.md b/Performance-Estimation/Nested-CrossValidation/README.md
Lines changed: 34 additions & 1 deletion
diff --git a/Performance-Estimation/Nested-CrossValidation/assets/nested.png b/Performance-Estimation/Nested-CrossValidation/assets/nested.png
334 KB
diff --git a/README.md b/README.md
Lines changed: 1 addition & 1 deletion
diff --git a/Regularization/README.md b/Regularization/README.md
Lines changed: 5 additions & 0 deletions
diff --git a/Regularization/usingLibs/Ridge&Lasso.ipynb b/Regularization/playground/Ridge&Lasso.ipynb
Regularization/usingLibs/Ridge&Lasso.ipynb renamed to Regularization/playground/Ridge&Lasso.ipynb
diff --git a/Regularization/usingLibs/Ridge_ass1.py b/Regularization/playground/Ridge_ass1.py
Regularization/usingLibs/Ridge_ass1.py renamed to Regularization/playground/Ridge_ass1.py
diff --git a/Regularization/usingLibs/model-cross-ridge.ipynb b/Regularization/playground/model-cross-ridge.ipynb
Regularization/usingLibs/model-cross-ridge.ipynb renamed to Regularization/playground/model-cross-ridge.ipynb
diff --git a/Regularization/scratching/lamda.py b/Regularization/scratching/5.2.py
Regularization/scratching/lamda.py renamed to Regularization/scratching/5.2.py
Lines changed: 1 addition & 1 deletion
@@ -0,0 +1,54 @@
+# Generalization 👻
+
+Generalization refers to a model's ability to perform well on new, unseen data. In linear regression, achieving good generalization means that the model captures the underlying trend in the data without overfitting or underfitting.
+
+## Need to know for this section 👨🏽‍💻
+
+### Bias and Variance
+
+- <mark>**_Bias_**:</mark>
+
+  Bias is the error introduced by approximating a real-world problem (which may be complex) by a simplified model. In linear regression, high bias often occurs when the model is too simple to capture the underlying patterns in the data.
+
+  > **_High bias_** can lead to underfitting, where the model fails to capture the underlying trend and performs poorly on both training and test data.
+
+- <mark>**_Variance_**:</mark>
+
+  Variance is the error introduced by the model’s sensitivity to fluctuations in the training data. High variance occurs when the model is too complex and learns not only the underlying patterns but also the noise in the training data.
+
+  > **_High variance_** can lead to overfitting, where the model performs very well on training data but poorly on test data due to its excessive complexity.
+
+<br>
+
+![generalization](./assets/genaralize.png)
+<br>
+
+> The left image shows high bias but low variance (underfitting). The right image shows low bias but high variance (overfitting). Both hurt generalization.
+
+### Overfitting vs. Underfitting
+
+- Overfitting:
+
+  Occurs when a model learns the noise in the training data rather than the actual signal. This results in excellent performance on training data but poor performance on new, unseen data.
+
+  <mark>Low bias, high variance, low training error, high test error.</mark>
+
+- Underfitting:
+
+  Occurs when a model is too simplistic to capture the underlying patterns in the data. This results in poor performance on both training and test data.
+
+  <mark>High bias, high training error, high test error.</mark>
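
The bias and variance described above can be estimated numerically. The sketch below is a minimal illustration, not part of this repository's code: it assumes a noise-free target `sin(pi * x)` on `[-1, 1]`, training sets of only two points, and two hypothetical helper functions, `fit_constant` (simple model) and `fit_line` (more flexible model). It averages over many random training sets to estimate the squared bias and the variance of each model.

```python
import math
import random


def target(x):
    # Assumed ground-truth function for this illustration.
    return math.sin(math.pi * x)


def fit_constant(p1, p2):
    # Least-squares constant for two points; returned as (slope, intercept).
    return 0.0, (p1[1] + p2[1]) / 2


def fit_line(p1, p2):
    # Straight line passing exactly through both points.
    slope = (p2[1] - p1[1]) / (p2[0] - p1[0])
    return slope, p1[1] - slope * p1[0]


def bias_variance(fit, n_datasets=2000, n_grid=101, seed=0):
    rng = random.Random(seed)
    grid = [-1 + 2 * i / (n_grid - 1) for i in range(n_grid)]

    # Fit one model per random two-point training set.
    models = []
    while len(models) < n_datasets:
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        if x1 == x2:
            continue  # a line through two identical x values is undefined
        models.append(fit((x1, target(x1)), (x2, target(x2))))

    # Average hypothesis: both models are linear in (slope, intercept),
    # so averaging the parameters averages the predictions.
    mean_a = sum(a for a, _ in models) / n_datasets
    mean_b = sum(b for _, b in models) / n_datasets

    # Squared bias: how far the average hypothesis is from the target.
    bias = sum((mean_a * x + mean_b - target(x)) ** 2 for x in grid) / n_grid
    # Variance: how much individual fits scatter around the average hypothesis.
    variance = sum(
        (a * x + b - (mean_a * x + mean_b)) ** 2 for x in grid for a, b in models
    ) / (n_grid * n_datasets)
    return bias, variance


bias_const, var_const = bias_variance(fit_constant)
bias_line, var_line = bias_variance(fit_line)
print(f"constant model: bias^2={bias_const:.2f}, variance={var_const:.2f}")
print(f"linear model:   bias^2={bias_line:.2f}, variance={var_line:.2f}")
```

With so little training data, the constant model should show the higher bias and the linear model the higher variance, which is the trade-off this section describes.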
+
+### Goal Representation
+
+Bias vs. Variance Trade-off:
+
+- **_High Bias (Underfitting)_** ⟶ Simple Model ⟶ High Training Error, High Test Error
+- **_Low Bias, Low Variance_** ⟶ Optimal Model ⟶ Low Training Error, Low Test Error
+- **_High Variance (Overfitting)_** ⟶ Complex Model ⟶ Low Training Error, High Test Error
+
+> Training error is the in-sample error $E_{in}$, and test error is the out-of-sample error $E_{out}$.
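
The three regimes above can be reproduced on synthetic data. The sketch below is only an illustration under stated assumptions: it swaps linear regression for a k-nearest-neighbour regressor (not used elsewhere in this repository) because its complexity is controlled by a single number `k`. A small `k` gives a complex model and a large `k` gives a simple one. The data generator `make_data`, the noise level, and the sample sizes are all made up for the example.

```python
import math
import random


def make_data(n, rng, noise=0.3):
    # Noisy samples of an assumed target sin(pi * x) on [-1, 1].
    data = []
    for _ in range(n):
        x = rng.uniform(-1, 1)
        data.append((x, math.sin(math.pi * x) + rng.gauss(0, noise)))
    return data


def knn_predict(train, x, k):
    # Average the targets of the k training points closest to x.
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def mse(train, data, k):
    # Mean squared error of the k-NN model (fit on `train`) over `data`.
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)


rng = random.Random(0)
train = make_data(60, rng)
test = make_data(500, rng)

errors = {}
for k in (1, 5, len(train)):
    e_in = mse(train, train, k)   # training error, E_in
    e_out = mse(train, test, k)   # test error, E_out
    errors[k] = (e_in, e_out)
    print(f"k={k:2d}  E_in={e_in:.3f}  E_out={e_out:.3f}")
```

Here `k=1` memorizes the training set (zero training error), while `k=len(train)` predicts the same mean value everywhere and fits both sets poorly. An intermediate `k` is the kind of setting where both errors can be low.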
+
+### Note 🚨
+
+We can visualize overfitting and underfitting by plotting a learning curve. See https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/LearningCurve.
@@ -1 +1,34 @@
-Use jupyternotebook
+# Nested Cross-Validation 🧵
+
+Nested cross-validation is a technique for tuning a model's hyperparameters and estimating its performance at the same time. It uses two nested cross-validation loops so that the data used to select hyperparameters is never the data used to score the model, which keeps hyperparameter tuning from biasing the performance estimate.
+
+![nested](./assets/nested.png)
+<br>
+
+> We use Jupyter Notebook for this section
+
+## Need to know for this section 👨🏽‍💻
+
+### How It Works
+
+1. <mark>**_Outer Cross-Validation Loop:_**</mark>
+
+   - Purpose: Estimates the generalization performance of the model.
+
+   - Process: The dataset is divided into several folds (e.g., 5 or 10). For each iteration, one fold is held out as the test set, while the remaining folds are used for training and hyperparameter tuning.
+
+2. <mark>**_Inner Cross-Validation Loop:_**</mark>
+
+   - Purpose: Selects the best hyperparameters for the model.
+
+   - Process: Within each training set from the outer loop, the data is further split into training and validation sets. The model is trained on the training set with different hyperparameters and evaluated on the validation set to find the optimal hyperparameters.
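
The two loops above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the notebook used in this section: the model is a one-feature ridge regression without an intercept, the only hyperparameter is the penalty `lam`, and the helper names (`ridge_fit`, `k_folds`, `cv_score`, `nested_cv`) and the synthetic data are hypothetical.

```python
import random


def ridge_fit(xs, ys, lam):
    # Closed-form 1-D ridge weight (no intercept): w = sum(x*y) / (sum(x^2) + lam)
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def mse(w, xs, ys):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def k_folds(indices, k):
    # Split a list of indices into k nearly equal folds.
    return [indices[i::k] for i in range(k)]


def cv_score(xs, ys, indices, lam, k):
    # Inner loop: average validation MSE of one lambda over k folds.
    folds = k_folds(indices, k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        w = ridge_fit([xs[j] for j in train_idx], [ys[j] for j in train_idx], lam)
        scores.append(mse(w, [xs[j] for j in val_idx], [ys[j] for j in val_idx]))
    return sum(scores) / len(scores)


def nested_cv(xs, ys, lambdas, outer_k=5, inner_k=3, seed=0):
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    outer_folds = k_folds(indices, outer_k)
    outer_scores, chosen = [], []
    for i, test_idx in enumerate(outer_folds):
        train_idx = [j for f, fold in enumerate(outer_folds) if f != i for j in fold]
        # Hyperparameter selection sees only the outer training data.
        best_lam = min(lambdas, key=lambda lam: cv_score(xs, ys, train_idx, lam, inner_k))
        # Refit on the full outer training set, then score on the held-out fold.
        w = ridge_fit([xs[j] for j in train_idx], [ys[j] for j in train_idx], best_lam)
        outer_scores.append(mse(w, [xs[j] for j in test_idx], [ys[j] for j in test_idx]))
        chosen.append(best_lam)
    return outer_scores, chosen


# Synthetic data (assumed for the example): y = 2x + Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]

outer_scores, chosen = nested_cv(xs, ys, lambdas)
print("lambda chosen per outer fold:", chosen)
print("estimated generalization MSE:", sum(outer_scores) / len(outer_scores))
```

The mean of `outer_scores` is the performance estimate. The per-fold `chosen` values may differ, which is expected: nested cross-validation evaluates the whole tuning procedure rather than one fixed hyperparameter value.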
+
+### Benefits
+
+- **Unbiased Evaluation:** Gives a nearly unbiased estimate of the model's performance, because the outer test fold is never used for hyperparameter tuning.
+
+- **Robustness:** Reduces the risk of overfitting the hyperparameters to a single validation split when selecting the best model.
+
+### Example Use
+
+Nested cross-validation is particularly useful for models with many hyperparameters or when working with small datasets, where a single train/validation/test split tends to give a noisy or overly optimistic performance estimate.
@@ -7,7 +7,7 @@ This is for ML class as a senior year, 2567.
 - [Linear Regression](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Linear-Regression)
 - [Performance Estimation](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Performance-Estimation)
   - [Experiments with Python](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Performance-Estimation/Experiments-python)
-  - [Nested Cross Validation](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Performance-Estimation/Nested_CV)
+  - [Nested Cross Validation](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Performance-Estimation/Nested-CrossValidation)
 - [Generalization](https://github.com/Kariusdi/Machine-Learning-Class67/tree/main/Generalization)
   - [Constant Model](https://github.com/Kariusdi/Machine-Learning-Class/tree/main/Generalization/ConstantModel)
   - [Linear Model](https://github.com/Kariusdi/Machine-Learning-Class/tree/main/Generalization/LinearModel)
 
@@ -90,3 +90,8 @@ In linear regression, the goal is to find the best-fitting line (or hyperplane i
 ### Summary 💼
 
 In summary, regularization helps control model complexity, improves generalization, and enhances the performance of linear regression models by adding a penalty for large coefficients.
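
As a small illustration of the penalty's effect, the sketch below uses the closed-form ridge solution for a single feature without an intercept, `w = sum(x*y) / (sum(x^2) + lam)`. The function name `ridge_weight` and the toy data are assumptions made for this example only.

```python
def ridge_weight(xs, ys, lam):
    # Minimizes sum((w*x - y)^2) + lam * w^2 for one feature, no intercept.
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


# Toy data lying exactly on y = 2x.
xs = [1, 2, 3, 4]
ys = [2, 4, 6, 8]

weights = [ridge_weight(xs, ys, lam) for lam in (0, 1, 10, 100)]
for lam, w in zip((0, 1, 10, 100), weights):
    print(f"lambda={lam:3d}  w={w:.3f}")
```

With `lam = 0` the weight is the ordinary least-squares solution, and it shrinks toward zero as `lam` grows.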
+
+> Please watch these videos. They are very helpful.
+
+https://youtu.be/Xm2C_gTAl8c?si=kVIxoW-fcGWpWEi3
+https://www.youtube.com/watch?v=Q81RR3yKn30
@@ -42,7 +42,7 @@ def plot_rmse_vs_alpha(alphas, train_rmse, test_rmse):
     plt.tight_layout()
     plt.show()
 
-X_train, Y_train, X_test, Y_test = import_csv("Regularization/dataset/HeightWeight.csv")
+X_train, Y_train, X_test, Y_test = import_csv("../dataset/HeightWeight.csv")
 #X_train, Y_train, X_test, Y_test = generate_sin()
 
 alphas = np.arange(1, 100000, 100)