<p>The linear regression model assumes a straight-line (linear) relationship between the predictors and the response. The relationship doesn’t have to be perfectly linear; rather, on average, the cloud of points should have a linear shape. If that is not the case, our predictions will be less accurate.</p>
<p>We can check this assumption by seeing whether each predictor has a linear relationship with the response, but this is cumbersome with multiple predictors. Instead, we typically calculate the <strong>residual</strong>, which is the difference between the observed response value and the predicted response value (similar to the model performance metrics we examined last week). Then, we can make a <strong>residual plot</strong> of the predicted response vs. the residual. Ideally, this residual plot should have no pattern: some residuals above 0, some below 0, but no strong trend.</p>
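As a minimal sketch, the residual computation can be carried out with NumPy on simulated data (the variable names and values below are illustrative, not from the lecture’s dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 10, 300)
y = 1.5 * x + rng.normal(0, 1, 300)      # hypothetical linear data

# Fit a simple linear model by ordinary least squares
X = np.column_stack([np.ones_like(x), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ beta

# Residual = observed response - predicted response.
# Plot pred vs. resid (e.g. with matplotlib) and look for no pattern:
# points scattered above and below 0 with no trend.
resid = y - pred
```

Note that with an intercept in the model, ordinary least squares forces the residuals to average to zero, so the diagnostic is about the *shape* of the scatter, not its level.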
<p>If there’s a trend in the residual plot, that means there are non-linear associations between some of the predictors and the response.</p>
<p>We see there’s a slight curve in our residual plot. We will look at ways to deal with this later in this lecture.</p>
<p>In a model with more predictors, we can dig into more details by making a residual plot of a predictor vs. residual. This is often used to figure out which predictor is contributing to the shape of the predicted response vs. residual plot.</p>
<li><p>When there is a collinear relationship among three or more predictors, pairwise methods will fail to detect it. We may use the Variance Inflation Factor (VIF) to detect such multicollinearity, but it doesn’t necessarily tell us which variables to remove.</p></li>
</ul>
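A hedged sketch of how the VIF can be computed from scratch with NumPy: for each predictor, regress it on the remaining predictors and convert the resulting <span class="math inline">\(R^2\)</span> into <span class="math inline">\(1/(1-R^2)\)</span>. The data here are simulated for illustration.

```python
import numpy as np

def vif(X):
    """Variance Inflation Factor for each column of X.
    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on the remaining columns (plus an intercept)."""
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Example: x3 is (almost) the sum of x1 and x2 -> severe collinearity
rng = np.random.default_rng(1)
x1 = rng.normal(size=500)
x2 = rng.normal(size=500)
x3 = x1 + x2 + rng.normal(scale=0.01, size=500)
x4 = rng.normal(size=500)                # independent of the others
vifs = vif(np.column_stack([x1, x2, x3, x4]))
```

All of <code>x1</code>, <code>x2</code>, <code>x3</code> get very large VIFs here, while the independent <code>x4</code> stays near 1 — which also illustrates why the VIF flags the collinear group but can’t single out which member to drop.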
<p>Suppose that we consider the following model using predictors from our training set:</p>
<p><span class="math display">\[
MeanBloodPressure = \beta_0 + \beta_1 \cdot Age + \beta_2 \cdot Age^2
\]</span></p>
<p>This is <em>still</em> a linear model – we have added a new predictor that gives us a quadratic shape. We use the <a href="https://matthewwardrop.github.io/formulaic/latest/guides/splines/#poly"><code>poly()</code> function</a> to generate our polynomial predictor.</p>
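To see why adding <span class="math inline">\(Age^2\)</span> keeps the model linear, here is a sketch in plain NumPy (rather than <code>poly()</code>): the squared term is just another column of the design matrix, so ordinary least squares still applies. All data below are simulated, with hypothetical coefficient values.

```python
import numpy as np

rng = np.random.default_rng(0)
age = rng.uniform(20, 70, 200)
# Hypothetical curved relationship between Age and MeanBloodPressure
bp = 70 + 2.0 * age - 0.02 * age**2 + rng.normal(0, 2, 200)

# Adding Age^2 as a column keeps the model linear in the coefficients
X_lin = np.column_stack([np.ones_like(age), age])
X_quad = np.column_stack([np.ones_like(age), age, age**2])
beta_lin, *_ = np.linalg.lstsq(X_lin, bp, rcond=None)
beta_quad, *_ = np.linalg.lstsq(X_quad, bp, rcond=None)

# The quadratic model should leave smaller, pattern-free residuals
resid_lin = bp - X_lin @ beta_lin
resid_quad = bp - X_quad @ beta_quad
```

Comparing the spread of <code>resid_lin</code> and <code>resid_quad</code> shows the quadratic term absorbing the curvature that a straight line cannot fit.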
<h2 data-number="3.4" class="anchored" data-anchor-id="interactions"><span class="header-section-number">3.4</span> Interactions</h2>
<p>Here is another way to extend the Linear Model:</p>
<p>Suppose we think that <span class="math inline">\(BMI\)</span> and <span class="math inline">\(Gender\)</span> may be good predictors of <span class="math inline">\(MeanBloodPressure\)</span>:</p>
<p>Let’s explore the relationship between <span class="math inline">\(MeanBloodPressure\)</span> and <span class="math inline">\(BMI\)</span> separately for values of <span class="math inline">\(Gender\)</span>.</p>
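One way to let the <span class="math inline">\(BMI\)</span> slope differ by <span class="math inline">\(Gender\)</span> is to add a product (interaction) column to the design matrix. The sketch below uses simulated data with a hypothetical 0/1 gender indicator; it is an illustration of the interaction idea, not the lecture’s actual dataset.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
bmi = rng.uniform(18, 35, n)
gender = rng.integers(0, 2, n)           # hypothetical 0/1 indicator
# Simulated data in which the BMI slope differs by gender
bp = 80 + 1.0 * bmi + 5 * gender + 0.5 * bmi * gender + rng.normal(0, 2, n)

# Design matrix: intercept, BMI, Gender, and the BMI x Gender interaction
X = np.column_stack([np.ones(n), bmi, gender, bmi * gender])
beta, *_ = np.linalg.lstsq(X, bp, rcond=None)
# Slope for gender == 0 is beta[1]; for gender == 1 it is beta[1] + beta[3]
```

The interaction coefficient <code>beta[3]</code> is the *difference* in BMI slopes between the two groups, which is exactly what fitting the relationship "separately for values of Gender" uncovers.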
<p><span class="math inline">\(\beta_0\)</span> is a parameter describing the intercept of the line, and <span class="math inline">\(\beta_1\)</span> is a parameter describing the slope of the line.</p>
<p>Suppose that, from fitting the model on the Training Set, <span class="math inline">\(\beta_1=2\)</span>. That means increasing <span class="math inline">\(BMI\)</span> by 1 leads to an increase of 2 in the predicted <span class="math inline">\(BloodPressure\)</span>. The coefficient measures the strength of association between a variable and the outcome.</p>
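This interpretation can be verified with a few lines of arithmetic (the intercept value here is hypothetical; only <span class="math inline">\(\beta_1 = 2\)</span> comes from the text):

```python
# Interpreting a slope coefficient: with beta_1 = 2, raising BMI by
# 1 unit raises the predicted blood pressure by 2
beta_0, beta_1 = 70.0, 2.0               # beta_0 is a hypothetical intercept

def predict(bmi):
    return beta_0 + beta_1 * bmi

change = predict(26.0) - predict(25.0)   # effect of a 1-unit BMI increase
```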