
Commit dbd070e

Update 03-Classification.qmd
1 parent 18ba6e1 commit dbd070e

1 file changed

Lines changed: 2 additions & 3 deletions

File tree

03-Classification.qmd

@@ -108,7 +108,7 @@ plt.show()

This shows that the logistic model was able to model some of the relationship between $BMI$ and $P(Hypertension)$, but the model predicts much higher $P(Hypertension)$ at high values of $BMI$.

- It is hard to figure out visually when the data can fall on a logistic S-curve - one can imagine a red line stretched out more, etc. If we move the equation around so that the right hand side is linear:
+ Showing goodness of fit via this plot is rather difficult, because it is hard to judge visually whether the data fall on a logistic S-curve; one can imagine the red curve stretched out more, and so on. If we rearrange the equation so that the right-hand side is linear:

$$\log\left(\frac{P(Hypertension)}{1 - P(Hypertension)}\right) = \beta_0 + \beta_1 \cdot BMI$$

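That linear form gives a more workable way to eyeball goodness of fit: bin $BMI$, compute the observed proportion of Hypertension in each bin, and check whether the empirical log-odds look roughly linear in $BMI$. Below is a minimal sketch of that check, assuming the data live in a pandas DataFrame named `df` with a numeric `BMI` column and a 0/1 `Hypertension` column (those names are assumptions, not necessarily the variables used earlier in the lesson):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Bin BMI and compute the observed proportion of Hypertension in each bin
bins = np.arange(15, 61, 5)
df["bmi_bin"] = pd.cut(df["BMI"], bins=bins)
prop = df.groupby("bmi_bin", observed=True)["Hypertension"].mean()

# Convert the proportions to empirical log-odds, clipping to avoid log(0)
p = prop.clip(0.01, 0.99)
log_odds = np.log(p / (1 - p))

# If the logistic model fits, these points should fall near a straight line
midpoints = [interval.mid for interval in log_odds.index]
plt.scatter(midpoints, log_odds)
plt.xlabel("BMI (bin midpoint)")
plt.ylabel("Empirical log-odds of Hypertension")
plt.show()
```

Points scattered around a straight line support the linear-in-$BMI$ assumption; systematic curvature would suggest the S-curve is a poor description of the data.
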
@@ -146,7 +146,7 @@ Okay, that's a starting point!

However, we need to be mindful of the class imbalance we saw in the dataset at the beginning of the lesson. Recall that roughly 88% of our data is No Hypertension. A classifier that *always* predicts No Hypertension would therefore achieve an 88% accuracy rate, yet it is completely trivial, which raises the question of whether our model's 76% accuracy is actually telling us much.

- Well, break down the accuracy by the Hypertension events and No Hypertension events:
+ Well, let's break down the accuracy by the Hypertension events and the No Hypertension events:

Our **Sensitivity** (accuracy on Hypertension events) is defined as $\frac{TruePositives}{TruePositives+FalseNegatives}$, which is 15/(15+325) ≈ 4%

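The same per-class breakdown can be computed directly from a confusion matrix. Here is a small sketch using scikit-learn, assuming 0/1 label arrays `y_true` and `y_pred` in which 1 means Hypertension (the array names are assumptions about the earlier code):

```python
from sklearn.metrics import confusion_matrix

# For 0/1 labels, ravel() unpacks the matrix as (TN, FP, FN, TP)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

sensitivity = tp / (tp + fn)  # accuracy on the Hypertension events
specificity = tn / (tn + fp)  # accuracy on the No Hypertension events
print(f"Sensitivity: {sensitivity:.1%}  Specificity: {specificity:.1%}")
```
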
@@ -165,7 +165,6 @@ plt.show()

The top left corner is the number of True Negatives (1128), the top right corner is the number of False Positives (24), the bottom left corner is the number of False Negatives (325), and the bottom right corner is the number of True Positives (15).

-
What happened exactly? Let's look back at the Training Data: it seems from the plots that we only predict Hypertension for a BMI of 50 or more. However, there are so few people with such a high BMI that, even if most of those folks have Hypertension, the model still misses most of the folks with Hypertension in the 20-40 BMI range. Their predicted probabilities weren't high enough to cross our decision boundary of 50%, so we missed most of our Hypertension cases.

What can we do? There are lots of things we could change about the model, but let's tinker with the decision boundary for a moment. We can lower the decision boundary, which will improve our sensitivity at the expense of our specificity, or raise it for the opposite trade-off. What if we set the new decision boundary to 0.2?

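A lowered boundary is easy to prototype by thresholding the predicted probabilities ourselves. Here is a minimal sketch, assuming a fitted scikit-learn `LogisticRegression` stored in `model` and a feature matrix `X_train` (both names, and the use of scikit-learn itself, are assumptions about the lesson's earlier code):

```python
# Predicted probability of the positive class (Hypertension) for each row
probs = model.predict_proba(X_train)[:, 1]

# Default decision boundary: predict Hypertension when P(Hypertension) >= 0.5
preds_default = (probs >= 0.5).astype(int)

# Lowered decision boundary: predict Hypertension when P(Hypertension) >= 0.2,
# trading some specificity for additional sensitivity
preds_lowered = (probs >= 0.2).astype(int)
```

Re-running the confusion matrix on `preds_lowered` would show how sensitivity and specificity shift under the new boundary.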