|
1 | 1 | { |
2 | 2 | "metadata": { |
3 | 3 | "name": "", |
4 | | - "signature": "sha256:0d06f7c0dcbe740f33bc07e6b2f3b75bd99464160080eeb0dc89c4b381ba494a" |
| 4 | + "signature": "sha256:945f626408fb0a4ebaefe616111b44b5fbd70d5fef9361265f942a7392cac81e" |
5 | 5 | }, |
6 | 6 | "nbformat": 3, |
7 | 7 | "nbformat_minor": 0, |
|
70 | 70 | " <td> 0</td>\n", |
71 | 71 | " </tr>\n", |
72 | 72 | " <tr>\n", |
73 | | - " <th>ham</th>\n", |
| 73 | + " <th>spam</th>\n", |
74 | 74 | " <td> 0</td>\n", |
75 | 75 | " <td> 1</td>\n", |
76 | 76 | " <td> 1</td>\n", |
77 | 77 | " <td> 0</td>\n", |
78 | 78 | " <td> 0</td>\n", |
79 | 79 | " </tr>\n", |
80 | 80 | " <tr>\n", |
81 | | - " <th>ham</th>\n", |
| 81 | + " <th>spam</th>\n", |
82 | 82 | " <td> 0</td>\n", |
83 | 83 | " <td> 0</td>\n", |
84 | 84 | " <td> 0</td>\n", |
|
332 | 332 | " </tbody>\n", |
333 | 333 | "</table>\n", |
334 | 334 | "\n", |
335 | | - "What is the probability of $P(y=\\textrm{Walt Whitman}|x = [7, 4, 0, 0, 0, 12, 6, 8, 3, 0])$? And what is the probability of $P(y=\\textrm{J.K. Rowling}|x = [7, 4, 0, 0, 0, 12, 6, 8, 3, 0])$?" |
| 335 | + "What is the probability of $P(y=\\textrm{Walt Whitman}|x = [12, 10, 1, 8, 0, 4, 0, 0, 0, 4])$? And what is the probability of $P(y=\\textrm{J.K. Rowling}|x = [12, 10, 1, 8, 0, 4, 0, 0, 0, 4])$?" |
336 | 336 | ] |
337 | 337 | }, |
338 | 338 | { |
|
473 | 473 | "cell_type": "markdown", |
474 | 474 | "metadata": {}, |
475 | 475 | "source": [ |
476 | | - "**d)** Now that we know how to compute the posterior probability, it is time to implement our naive bayes learner. We will start with implementing the `fit` method. The `fit` method has two core jobs:\n", |
| 476 | + "**d)** Now that we know how to compute the posterior probability, it is time to implement our naive bayes learner. We will start with implementing the `fit` method. The `fit` method has three core jobs:\n", |
477 | 477 | "\n", |
478 | 478 | "1. extract all the counts of each feature given each class;\n", |
479 | | - "2. count how often each class label occurs in the training data.\n", |
| 479 | + "2. count how often each class label occurs in the training data;\n", |
| 480 | + "3. count the number of unique features.\n", |
480 | 481 | "\n", |
481 | 482 | "The `NaiveBayesLearner` below provides the skeleton of our class. Implement the `fit` method." |
482 | 483 | ] |
|
502 | 503 | " self.N = defaultdict(Counter) # insert your code here (feature counts per class)\n", |
503 | 504 | " for x, y_x in zip(X, y):\n", |
504 | 505 | " self.N[y_x] += Counter(x)\n", |
| 506 | + " self.V = len(set(x for y_x in self.N for x in self.N[y_x])) # number of unique features\n", |
505 | 507 | " \n", |
506 | 508 | " def predict(self, x):\n", |
507 | 509 | " \"\"\"Predict the outcome for example x. Choose the most\n", |
|
572 | 574 | " self.N = defaultdict(Counter)\n", |
573 | 575 | " for x, y_x in zip(X, y):\n", |
574 | 576 | " self.N[y_x] += Counter(x)\n", |
| 577 | + " self.V = len(set(x for y_x in self.N for x in self.N[y_x]))\n", |
575 | 578 | "\n", |
576 | 579 | " def prior(self, y):\n", |
577 | 580 | " \"\"\"Return the prior probability of class y.\"\"\"\n", |
|
583 | 586 | " \"\"\"Apply Laplace Smoothing to give a probability\n", |
584 | 587 | " estimate of feature x given y.\"\"\"\n", |
585 | 588 | " # insert your code here\n", |
586 | | - " return (self.N[y][x] + 1.0) / (sum(self.N[y].values()) + len(self.N))\n", |
| 589 | + " return (self.N[y][x] + 1.0) / (sum(self.N[y].values()) + self.V)\n", |
587 | 590 | "\n", |
588 | 591 | "# these tests should return True if your code is correct\n", |
589 | 592 | "nb = NaiveBayesLearner()\n", |
|
706 | 709 | " self.N = defaultdict(Counter)\n", |
707 | 710 | " for x, y_x in zip(X, y):\n", |
708 | 711 | " self.N[y_x] += Counter(x)\n", |
| 712 | + " self.V = len(set(x for y_x in self.N for x in self.N[y_x]))\n", |
709 | 713 | "\n", |
710 | 714 | " def prior(self, y):\n", |
711 | 715 | " \"\"\"Return the prior probability of class y.\"\"\"\n", |
|
714 | 718 | " def probability(self, x, y):\n", |
715 | 719 | " \"\"\"Apply Laplace Smoothing to give a probability\n", |
716 | 720 | " estimate of feature x given y.\"\"\"\n", |
717 | | - " return (self.N[y][x] + 1.0) / (sum(self.N[y].values()) + len(self.N))\n", |
| 721 | + " return (self.N[y][x] + 1.0) / (sum(self.N[y].values()) + self.V)\n", |
718 | 722 | "\n", |
719 | 723 | " def predict(self, x):\n", |
720 | 724 | " \"\"\"Predict the outcome for example x. Choose the most\n", |
|
1659 | 1663 | "input": [ |
1660 | 1664 | "from IPython.core.display import HTML\n", |
1661 | 1665 | "def css_styling():\n", |
1662 | | - " styles = open(\"styles/custom.css\", \"r\").read()\n", |
| 1666 | + " styles = open(\"../styles/custom.css\", \"r\").read()\n", |
1663 | 1667 | " return HTML(styles)\n", |
1664 | 1668 | "css_styling()" |
1665 | 1669 | ], |
|
1723 | 1727 | ], |
1724 | 1728 | "metadata": {}, |
1725 | 1729 | "output_type": "pyout", |
1726 | | - "prompt_number": 370, |
| 1730 | + "prompt_number": 10, |
1727 | 1731 | "text": [ |
1728 | | - "<IPython.core.display.HTML at 0x117f224e0>" |
| 1732 | + "<IPython.core.display.HTML at 0x103fc0ac8>" |
1729 | 1733 | ] |
1730 | 1734 | } |
1731 | 1735 | ], |
1732 | | - "prompt_number": 370 |
| 1736 | + "prompt_number": 10 |
1733 | 1737 | }, |
1734 | 1738 | { |
1735 | 1739 | "cell_type": "markdown", |
|
0 commit comments