This forces the right-hand side of the equation to lie between 0 and 1, which is the scale of a probability. The relationship between X and Y is therefore not a straight line, but a non-linear, "S-shaped" curve. Let's fit this model and look at it visually to understand.
```{python}
plt.ylabel('Proportion of people with Hypertension')
plt.ylim(0, 1)
plt.legend()
plt.show()
```
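The curve behind that S shape is the logistic (sigmoid) function, $\frac{1}{1+e^{-z}}$, which squashes any real-valued input into the interval (0, 1). A minimal numerical sketch (numpy only; the names here are illustrative, not from the lesson's code):

```python
import numpy as np

def sigmoid(z):
    # maps any real number into the open interval (0, 1)
    return 1 / (1 + np.exp(-z))

z = np.linspace(-6, 6, 5)   # a few points across the input range
p = sigmoid(z)
# every output is strictly between 0 and 1, and sigmoid(0) is exactly 0.5
```

This is why the model's predictions can always be read as probabilities, no matter how extreme the linear predictor gets.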
Okay, that's a starting point!
However, we need to be mindful of the class imbalance we saw in the dataset at the beginning of the lesson. Recall that roughly 88% of our data is No Hypertension. A classifier that *always* predicted No Hypertension would therefore achieve 88% accuracy, so a trivial model beats ours, and it calls into question whether our model's 76% accuracy is telling us anything useful.
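This majority-class baseline is easy to verify directly; a minimal sketch using a hypothetical label array with the lesson's rough 88/12 split:

```python
import numpy as np

# hypothetical labels with roughly the 88% / 12% split from the lesson
# (0 = No Hypertension, 1 = Hypertension)
y = np.array([0] * 88 + [1] * 12)

# a classifier that always predicts the majority class (0) is right
# exactly as often as class 0 appears
baseline_acc = (y == 0).mean()
print(baseline_acc)  # → 0.88
```

Any model we build has to clear this bar before its accuracy means anything.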
Let's break down the accuracy by the Hypertension events and the No Hypertension events:
Our **Sensitivity** (accuracy on Hypertension events) is defined as $\frac{TruePositives}{TruePositives+FalseNegatives}$, which is 15/(15+325) ≈ 4%.
Our **Specificity** (accuracy on No Hypertension events) is defined as $\frac{TrueNegatives}{TrueNegatives+FalsePositives}$, which is 1128/(1128+24) ≈ 98%.
Therefore, we do a pretty terrible job of predicting the Hypertension cases!
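Both rates can be checked directly from the four counts; a quick sketch using the numbers reported above:

```python
# counts taken from the confusion matrix reported in the lesson
tp, fn = 15, 325   # Hypertension cases caught / missed
tn, fp = 1128, 24  # No Hypertension cases caught / missed

sensitivity = tp / (tp + fn)  # fraction of true Hypertension cases we catch
specificity = tn / (tn + fp)  # fraction of true No Hypertension cases we catch
print(round(sensitivity, 3), round(specificity, 3))  # → 0.044 0.979
```

Note that overall accuracy, (tp + tn) / total, is dominated by the huge No Hypertension group, which is exactly how it hides the 4% sensitivity.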
We can describe the detailed numbers via a table called the **Confusion Matrix**:
```{python}
cm = confusion_matrix(y_test, logit_model.predict(X_test))
ConfusionMatrixDisplay(cm).plot()
plt.show()
```
The top-left corner is the number of True Negatives (1128), the top-right corner is the number of False Positives (24), the bottom-left corner is the number of False Negatives (325), and the bottom-right corner is the number of True Positives (15).
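That layout (row 0 = actual negatives, row 1 = actual positives) is scikit-learn's convention, which you can confirm on a tiny made-up example (the labels below are illustrative, not from the lesson's data):

```python
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([0, 0, 1, 1, 1])  # actual labels
y_pred = np.array([0, 1, 0, 1, 1])  # predicted labels

# sklearn flattens the 2x2 matrix row by row as TN, FP, FN, TP
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # → 1 1 1 2
```

Unpacking with `.ravel()` like this is a handy way to pull the four counts out for sensitivity and specificity calculations.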
What happened exactly? Let's look back at the Training Data: from the plots, it seems we only predict Hypertension for a BMI of 50 or more. But there are so few people with such a high BMI that, even if most of them have Hypertension, the model misses the many people with Hypertension in the 20-40 BMI range. Their predicted probabilities never reached our decision boundary of 50%, so we missed most of the Hypertension cases.
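One common response (though not the only one) is to lower the 0.5 decision threshold so that more borderline cases get flagged as positive. Since thresholding happens outside the fitted model, this needs `predict_proba` rather than `predict`; a hedged sketch on synthetic imbalanced data (all names and numbers below are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
# hypothetical imbalanced data: one feature loosely separates the classes,
# with roughly 10-15% positives
X = rng.normal(size=(1000, 1))
y = (X[:, 0] + rng.normal(scale=2.0, size=1000) > 2.5).astype(int)

model = LogisticRegression().fit(X, y)
probs = model.predict_proba(X)[:, 1]          # predicted P(positive)

default_preds = (probs >= 0.5).astype(int)    # the usual cutoff
lower_preds = (probs >= 0.2).astype(int)      # a more permissive cutoff

# lowering the threshold can only flag more (or equally many) positives,
# trading false positives for fewer false negatives
print(default_preds.sum(), "vs", lower_preds.sum())
```

This trades specificity for sensitivity: we accept more false alarms in exchange for missing fewer true Hypertension cases, which is often the right trade for a screening problem.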