What is the optimal number of predictors we should use for our final model?
## Bias-Variance Trade-off
Another way to describe the underfitting/overfitting phenomenon is via the **Bias-Variance Trade-off**. It breaks down the testing error of a single model as follows:
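The standard form of this decomposition, for squared-error loss at a test point $x_0$ with irreducible noise $\epsilon$ (notation here may differ slightly from the course's), is:

$$
E\left[\left(y_0 - \hat{f}(x_0)\right)^2\right] = \mathrm{Var}\left(\hat{f}(x_0)\right) + \left[\mathrm{Bias}\left(\hat{f}(x_0)\right)\right]^2 + \mathrm{Var}(\epsilon)
$$

Flexible models tend to have low bias but high variance; rigid models the reverse. The testing error is minimized at the flexibility level that balances the two.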
This is a Piecewise Cubic Regression; an example can be seen in the top panel of the figure.
Here, we end up using 8 predictors for our model. We see something that looks off immediately: our model is not continuous at the cutoff point! To fix the problem, we can constrain our model to be continuous: we require that the piecewise polynomials, along with their first and second derivatives, be continuous at the cutoff point. This fix is shown in the bottom panel, and the result is called **Cubic Spline Regression**. We can increase the number of cutoff points as we like in a piecewise or spline model. A cubic spline model uses $K + 4$ predictors, where $K$ is the number of cutoff points used.
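To see where the $K + 4$ count comes from, here is a minimal hand-built truncated-power basis for a cubic spline (the function name and knot values below are illustrative, not taken from the course code):

```python
import numpy as np

def cubic_spline_basis(x, knots):
    """Truncated-power basis for a cubic spline:
    1, x, x^2, x^3, plus one (x - k)^3_+ term per knot."""
    cols = [np.ones_like(x), x, x**2, x**3]
    cols += [np.clip(x - k, 0.0, None) ** 3 for k in knots]
    return np.column_stack(cols)

x = np.linspace(0, 10, 50)
B = cubic_spline_basis(x, knots=[3.0, 5.0, 7.0])  # K = 3 cutoff points
print(B.shape)  # 4 polynomial columns + 3 knot columns = K + 4 = 7
```

Each truncated-power term $(x - \xi_k)_+^3$ contributes one column per knot, on top of the four global cubic terms, which is exactly the $K + 4$ predictor count.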
To pick the number of cutoff points, we can also perform cross validation.

For 10 cutoff points, here is the cross validation result:
```{python}
y, X = model_matrix("MeanBloodPressure ~ BMI + cs(BMI, df=10)", nhanes_tiny)