|
143 | 143 | "id": "JYLCmVag2YxO" |
144 | 144 | }, |
145 | 145 | "source": [ |
146 | | - "### Loading the data\n", |
| 146 | + "### Load the data\n", |
147 | 147 | "\n", |
148 | 148 | "We'll query data using the Data Commons API, storing it in a Pandas DataFrame. To demonstrate the concepts with a more reasonably sized dataset, we'll first analyze the 500 largest cities by population. To do that:\n", |
149 | 149 | "* We use the statistical variable `Count_Person`.\n", |
|
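The "500 largest cities" selection step described above can be sketched with pandas. The toy DataFrame below stands in for the real Data Commons response (only the column name `Count_Person` comes from the notebook; the cities and counts are made up for illustration):

```python
import pandas as pd

# Toy stand-in for the DataFrame built from the Data Commons query;
# each row is a city and Count_Person is its population.
df = pd.DataFrame({
    "city": ["A", "B", "C", "D"],
    "Count_Person": [8_000_000, 500_000, 2_700_000, 40_000],
})

# Keep the N most-populous cities (the notebook uses N = 500).
top = df.nlargest(2, "Count_Person")
print(top["city"].tolist())  # ['A', 'C']
```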
1719 | 1719 | "id": "bFJOOjyOVje8" |
1720 | 1720 | }, |
1721 | 1721 | "source": [ |
1722 | | - "**1A)** Which model do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
| 1722 | + "**1.1A)** Which model do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
1723 | 1723 | "\n", |
1724 | | - "**1B)** Classifier 2 has a higher accuracy than Classifier 1, but has a more complicated decision boundary. Which do you think would generalize best to new data?\n" |
| 1724 | + "**1.1B)** Classifier 2 has a higher accuracy than Classifier 1, but has a more complicated decision boundary. Which do you think would generalize best to new data?\n" |
1725 | 1725 | ] |
1726 | 1726 | }, |
1727 | 1727 | { |
|
1912 | 1912 | "id": "51H3zw11xMnF" |
1913 | 1913 | }, |
1914 | 1914 | "source": [ |
1915 | | - "**2A)** In light of all the new data points, now which classifier do you think is better, Classifer 1 or Classifier 2? Explain your reasoning.\n", |
| 1915 | + "**1.2A)** In light of all the new data points, now which classifier do you think is better, Classifier 1 or Classifier 2? Explain your reasoning.\n", |
1916 | 1916 | "\n", |
1917 | | - "**2B)** In question 1, Classifier 1 had a *lower* accuracy than Classifier 2. After adding more data points, we now see the reverse, with Classifier 1 having a *higher* accuracy than Classifier 2. What happened? Give an explanation (or at least your best guess) for why this is." |
| 1917 | + "**1.2B)** In question 1.1, Classifier 1 had a *lower* accuracy than Classifier 2. After adding more data points, we now see the reverse, with Classifier 1 having a *higher* accuracy than Classifier 2. What happened? Give an explanation (or at least your best guess) for why this is." |
1918 | 1918 | ] |
1919 | 1919 | }, |
1920 | 1920 | { |
|
2425 | 2425 | "\n", |
2426 | 2426 | "> There are two classes, A and B. We have 100 data points in our dataset. Of these 100 data points, 99 points are labeled class A, while only 1 of the data points is labeled class B.\n", |
2427 | 2427 | "\n", |
2428 | | - "**2.1A)** Consider a model that always predicts class A. What is the accuracy of this always-A model?\n", |
| 2428 | + "**2.1.1A)** Consider a model that always predicts class A. What is the accuracy of this always-A model?\n", |
2429 | 2429 | "\n", |
2430 | | - "**2.1B)** How well do you expect the always-A model to perform on new, previously unseen data? Assume the new data follows the same distribution as the original 100 data points.\n", |
| 2430 | + "**2.1.1B)** How well do you expect the always-A model to perform on new, previously unseen data? Assume the new data follows the same distribution as the original 100 data points.\n", |
2431 | 2431 | "\n", |
2432 | | - "**2.1C)** Run the following code block to calculate the classification accuracy of our large model. Is the accuracy higher or lower than you expected?" |
| 2432 | + "**2.1.1C)** Run the following code block to calculate the classification accuracy of the always-A model. Is the accuracy higher or lower than you expected?" |
2433 | 2433 | ] |
2434 | 2434 | }, |
2435 | 2435 | { |
|
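The always-A setup above can be checked directly. A minimal sketch (the labels and the constant prediction are constructed here to match the 99-to-1 split in the text, not taken from the notebook's dataset):

```python
# 100 labels: 99 of class "A", 1 of class "B".
labels = ["A"] * 99 + ["B"]

# A model that always predicts class "A".
predictions = ["A"] * len(labels)

# Classification accuracy = fraction of predictions that match the labels.
correct = sum(p == y for p, y in zip(predictions, labels))
accuracy = correct / len(labels)
print(accuracy)  # 0.99
```

The always-A model is right on 99 of 100 points, so its accuracy is 0.99 despite never detecting class B, which is why accuracy alone can mislead on imbalanced data.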
2659 | 2659 | "id": "Sw8y60E1N7AV" |
2660 | 2660 | }, |
2661 | 2661 | "source": [ |
2662 | | - "#### 2.4.1) What about regression? -- Mean Squared Error\n", |
2663 | | - "Different models and different problems often use different accuracy metrics. You may have noticed that classification accuracy doesn't make much sense for regression problems, where instead of predicting a label, the model predicts a numeric value. In regression, a common accuracy metric is the Mean Squared Error, or MSE.\n", |
| 2662 | + "#### 2.4.1) What about regression? -- mean squared error\n", |
| 2663 | + "Different models and different problems often use different accuracy metrics. You may have noticed that classification accuracy doesn't make much sense for regression problems, where instead of predicting a label, the model predicts a numeric value. In regression, a common accuracy metric is the mean squared error, or MSE.\n", |
2664 | 2664 | "\n", |
2665 | 2665 | "$ MSE = \\frac{1}{\\text{\\# total data points}}\\sum_{\\text{all data points}}(\\text{predicted value} - \\text{actual value})^2$\n", |
2666 | 2666 | "\n", |
|
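The MSE formula above translates directly into code. A small sketch with made-up predicted and actual values (none of these numbers come from the notebook):

```python
# Made-up regression outputs: predicted vs. actual values.
predicted = [2.5, 0.0, 2.0, 8.0]
actual = [3.0, -0.5, 2.0, 7.0]

# MSE = mean of the squared differences, per the formula above.
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual)
print(mse)  # 0.375
```

Squaring the differences keeps errors in either direction from canceling and penalizes large misses more heavily than small ones.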