|
143 | 143 | "id": "JYLCmVag2YxO" |
144 | 144 | }, |
145 | 145 | "source": [ |
146 | | - "### Loading the data\n", |
| 146 | + "### Load the data\n", |
147 | 147 | "\n", |
148 | 148 | "We'll query data using the Data Commons API, storing it in a Pandas DataFrame. To demonstrate the concepts with a more reasonably sized dataset, we'll first analyze the 500 largest cities by population. To do that:\n", |
149 | 149 | "* We use the statistical variable `Count_Person`.\n", |
|
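The "500 largest cities" selection step described above can be sketched with pandas. The toy DataFrame below stands in for the real Data Commons response (only the column name `Count_Person` comes from the notebook; the cities and counts are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the DataFrame built from the Data Commons query;
# each row is a city and Count_Person is its population.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "Count_Person": [8_000_000, 500_000, 2_700_000, 40_000],
})

# Keep the N most-populous cities (the notebook uses N = 500).
top = df.nlargest(2, "Count_Person")
print(top["city"].tolist())  # ['A', 'C']
```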
1719 | 1719 | "id": "bFJOOjyOVje8" |
1720 | 1720 | }, |
1721 | 1721 | "source": [ |
1722 | | - "**1A)** Which model do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
| 1722 | + "**1.1A)** Which model do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
1723 | 1723 | "\n", |
1724 | | - "**1B)** Classifier 2 has a higher accuracy than Classifier 1, but has a more complicated decision boundary. Which do you think would generalize best to new data?\n" |
| 1724 | + "**1.1B)** Classifier 2 has a higher accuracy than Classifier 1, but has a more complicated decision boundary. Which do you think would generalize best to new data?\n" |
1725 | 1725 | ] |
1726 | 1726 | }, |
1727 | 1727 | { |
|
1912 | 1912 | "id": "51H3zw11xMnF" |
1913 | 1913 | }, |
1914 | 1914 | "source": [ |
1915 | | - "**2A)** In light of all the new data points, now which classifier do you think is better, Classifer 1 or Classifier 2? Explain your reasoning.\n", |
| 1915 | + "**1.2A)** In light of all the new data points, now which classifier do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
1916 | 1916 | "\n", |
1917 | | - "**2B)** In question 1, Classifier 1 had a *lower* accuracy than Classifier 2. After adding more data points, we now see the reverse, with Classifier 1 having a *higher* accuracy than Classifier 2. What happened? Give an explanation (or at least your best guess) for why this is." |
| 1917 | + "**1.2B)** In question 1.1, Classifier 1 had a *lower* accuracy than Classifier 2. After adding more data points, we now see the reverse, with Classifier 1 having a *higher* accuracy than Classifier 2. What happened? Give an explanation (or at least your best guess) for why this is." |
1918 | 1918 | ] |
1919 | 1919 | }, |
1920 | 1920 | { |
|
2425 | 2425 | "\n", |
2426 | 2426 | "> There are two classes, A and B. We have 100 data points in our dataset. Of these 100 data points, 99 points are labeled class A, while only 1 of the data points is labeled class B.\n", |
2427 | 2427 | "\n", |
2428 | | - "**2.1A)** Consider a model that always predicts class A. What is the accuracy of this always-A model?\n", |
| 2428 | + "**2.1.1A)** Consider a model that always predicts class A. What is the accuracy of this always-A model?\n", |
2429 | 2429 | "\n", |
2430 | | - "**2.1B)** How well do you expect the always-A model to perform on new, previously unseen data? Assume the new data follows the same distribution as the original 100 data points.\n", |
| 2430 | + "**2.1.1B)** How well do you expect the always-A model to perform on new, previously unseen data? Assume the new data follows the same distribution as the original 100 data points.\n", |
2431 | 2431 | "\n", |
2432 | | - "**2.1C)** Run the following code block to calculate the classification accuracy of our large model. Is the accuracy higher or lower than you expected?" |
| 2432 | + "**2.1.1C)** Run the following code block to calculate the classification accuracy of the always-A model. Is the accuracy higher or lower than you expected?" |
2433 | 2433 | ] |
2434 | 2434 | }, |
2435 | 2435 | { |
|
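The always-A setup above can be checked directly. A minimal sketch (the labels and the constant prediction are constructed here to match the 99-to-1 split in the text, not taken from the notebook's dataset):

```python
# 100 labels: 99 of class "A", 1 of class "B".
labels = ["A"] * 99 + ["B"]

# A model that always predicts class "A".
predictions = ["A"] * len(labels)

# Classification accuracy = fraction of predictions that match the labels.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # 0.99
```

The always-A model is right on 99 of 100 points, so its accuracy is 0.99 despite never detecting class B, which is why accuracy alone can mislead on imbalanced data.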
2659 | 2659 | "id": "Sw8y60E1N7AV" |
2660 | 2660 | }, |
2661 | 2661 | "source": [ |
2662 | | - "#### 2.4.1) What about regression? -- Mean Squared Error\n", |
2663 | | - "Different models and different problems often use different accuracy metrics. You may have noticed that classification accuracy doesn't make much sense for regression problems, where instead of predicting a label, the model predicts a numeric value. In regression, a common accuracy metric is the Mean Squared Error, or MSE.\n", |
| 2662 | + "#### 2.4.1) What about regression? -- mean squared error\n", |
| 2663 | + "Different models and different problems often use different accuracy metrics. You may have noticed that classification accuracy doesn't make much sense for regression problems, where instead of predicting a label, the model predicts a numeric value. In regression, a common accuracy metric is the mean squared error, or MSE.\n", |
2664 | 2664 | "\n", |
2665 | 2665 | "$ MSE = \\frac{1}{\\text{\\# total data points}}\\sum_{\\text{all data points}}(\\text{predicted value} - \\text{actual value})^2$\n", |
2666 | 2666 | "\n", |
|
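The MSE formula above translates directly into code. A small sketch with made-up predicted and actual values (none of these numbers come from the notebook):

```python
# Made-up regression outputs: predicted vs. actual values.
predicted = [2.5, 0.0, 2.0, 8.0]
actual = [3.0, -0.5, 2.0, 7.0]

# MSE = mean of the squared differences, per the formula above.
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
print(mse)  # 0.375
```

Squaring the differences keeps errors in either direction from canceling and penalizes large misses more heavily than small ones.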