fhdsl
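Note on this diff: apart from the regenerated cell `id`s, the only visible output change is in three displayed values. Each value such as `152.36993639963026` is now shown as `np.float64(152.36993639963026)`. This is consistent with the notebook being re-executed under NumPy 2.x, where NEP 51 changed the `repr` of NumPy scalars to include the type. The last-digit changes (e.g. `152.4417920640486` to `152.44179206404883`) look like floating-point rounding differences between library versions rather than a change in the model. The sketch below assumes NumPy 2.x, and its array is made up for illustration. It shows how `repr` (what Jupyter/Quarto display for a cell's last expression) differs from `print(...)` and `float(...)`:

```python
import numpy as np

# np.mean over a float array returns a NumPy scalar (np.float64), not a Python float.
value = np.mean(np.array([1.0, 2.0, 4.0]))

# Under NumPy >= 2.0 the repr includes the type, e.g. np.float64(...);
# under NumPy 1.x it was just the digits. Notebooks display this repr.
print(repr(value))

# str()/print() show only the digits in both NumPy 1.x and 2.x.
print(value)

# Converting to a Python float gives the plain repr again.
print(repr(float(value)))
```

If the plain output is preferred in the rendered page, wrapping each cell's final expression in `float(...)` or `print(...)` should restore it. This is a suggestion; the commit itself does not do this.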
diff --git a/docs/01-Problem-Setup.html b/docs/01-Problem-Setup.html
Lines changed: 15 additions & 15 deletions
diff --git a/docs/01-Problem-Setup_files/figure-html/cell-12-output-1.png b/docs/01-Problem-Setup_files/figure-html/cell-12-output-1.png
0 Bytes
diff --git a/docs/01-Problem-Setup_files/figure-html/cell-2-output-1.png b/docs/01-Problem-Setup_files/figure-html/cell-2-output-1.png
0 Bytes
diff --git a/docs/01-Problem-Setup_files/figure-html/cell-3-output-2.png b/docs/01-Problem-Setup_files/figure-html/cell-3-output-2.png
0 Bytes
diff --git a/docs/01-Problem-Setup_files/figure-html/cell-4-output-1.png b/docs/01-Problem-Setup_files/figure-html/cell-4-output-1.png
0 Bytes
diff --git a/docs/01-Problem-Setup_files/figure-html/cell-8-output-1.png b/docs/01-Problem-Setup_files/figure-html/cell-8-output-1.png
9 Bytes
@@ -281,7 +281,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 </ol>
 <p>Let’s start with the easiest case for just <span class="math inline">\(Hypertension = f(Age)\)</span>, a single predictor.</p>
 <p>Before we fit models, we often visualize the data to get a sense whether our setup makes sense.</p>
-<div id="00a807f6" class="cell" data-execution_count="1">
+<div id="411bb769" class="cell" data-execution_count="1">
 <div class="sourceCode cell-code" id="cb1"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb1-2"><a href="#cb1-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
 <span id="cb1-3"><a href="#cb1-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
@@ -308,7 +308,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 </div>
 <p>Okay, great, it looks like when someone’s BMI is higher, then it is more likely that the person has Hypertension.</p>
 <p>Now, let’s build the model <span class="math inline">\(Hypertension = f(BMI)\)</span> to make a prediction of <span class="math inline">\(Hyptertension\)</span> given <span class="math inline">\(BMI\)</span>.</p>
-<div id="f0d0e807" class="cell" data-execution_count="2">
+<div id="ecc79aea" class="cell" data-execution_count="2">
 <div class="sourceCode cell-code" id="cb2"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb2-2"><a href="#cb2-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
 <span id="cb2-3"><a href="#cb2-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
@@ -346,7 +346,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 <p>Instead of boxplots, we plotted the data just using points, with “Hypertension” having a probability of 1 and “No Hypertension” having a probability of 0. We see that we have a fitted line in blue for every value of BMI, which represents our machine learning model <span class="math inline">\(f(BMI)\)</span>. This model is called <strong>Logistic Regression</strong>.</p>
 <p>The first thing we want to investigate about this model is how well it performs in terms of Classification. Just using <span class="math inline">\(BMI\)</span> as a variable, what is the Accuracy of <span class="math inline">\(f(BMI)\)</span> classifying whether a person has <span class="math inline">\(Hypertension\)</span>? Notice that first <span class="math inline">\(f(BMI)\)</span> gives us continuous probability values, such as given a BMI of 30, there is a 20% chance the person has Hypertension. We need a discrete cutoff of this model to decide whether the person has Hypertension.</p>
 <p>A reasonable cutoff to start is 50%: if the probability of having Hypertension is &gt;=50%, then classify that person having Hypertension. Same for &lt; 50%. This is called the <strong>Decision Boundary</strong>.</p>
-<div id="632919dc" class="cell" data-execution_count="3">
+<div id="81aa2b9d" class="cell" data-execution_count="3">
 <div class="sourceCode cell-code" id="cb4"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1" aria-hidden="true" tabindex="-1"></a>plt.clf()</span>
 <span id="cb4-2"><a href="#cb4-2" aria-hidden="true" tabindex="-1"></a>plt.scatter(X.BMI, logit_model.predict(), color<span class="op">=</span><span class="st">"blue"</span>, label<span class="op">=</span><span class="st">"Fitted Line"</span>)</span>
 <span id="cb4-3"><a href="#cb4-3" aria-hidden="true" tabindex="-1"></a>plt.scatter(X.BMI, y, alpha<span class="op">=</span><span class="fl">.3</span>, color<span class="op">=</span><span class="st">"brown"</span>, label<span class="op">=</span><span class="st">"Data"</span>)</span>
@@ -364,7 +364,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 </div>
 </div>
 <p>Given this decision boundary, what is the accuracy?</p>
-<div id="0f3a7d84" class="cell" data-execution_count="4">
+<div id="415f34f5" class="cell" data-execution_count="4">
 <div class="sourceCode cell-code" id="cb5"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb5-1"><a href="#cb5-1" aria-hidden="true" tabindex="-1"></a><span class="im">from</span> sklearn.metrics <span class="im">import</span> (confusion_matrix, accuracy_score)</span>
 <span id="cb5-2"><a href="#cb5-2" aria-hidden="true" tabindex="-1"></a></span>
 <span id="cb5-3"><a href="#cb5-3" aria-hidden="true" tabindex="-1"></a>prediction_cut <span class="op">=</span> [<span class="dv">1</span> <span class="cf">if</span> x <span class="op">&gt;=</span> <span class="fl">.5</span> <span class="cf">else</span> <span class="dv">0</span> <span class="cf">for</span> x <span class="kw">in</span> logit_model.predict()]</span>
@@ -375,7 +375,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 </div>
 <p>Okay, that’s a starting point!</p>
 <p>We can break down classification accuracy to four additional results:</p>
-<div id="18903073" class="cell" data-execution_count="5">
+<div id="81c5fed6" class="cell" data-execution_count="5">
 <div class="sourceCode cell-code" id="cb7"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb7-1"><a href="#cb7-1" aria-hidden="true" tabindex="-1"></a>tn, fp, fn, tp <span class="op">=</span> confusion_matrix(y, prediction_cut).ravel().tolist()</span>
 <span id="cb7-2"><a href="#cb7-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">"True Positive:"</span>, tp, <span class="st">"</span><span class="ch">\n</span><span class="st">False Positive: "</span>, fp, <span class="st">"</span><span class="ch">\n</span><span class="st">True Negative: "</span>, tn, <span class="st">"</span><span class="ch">\n</span><span class="st">False Negative:"</span>, fn)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
@@ -387,7 +387,7 @@ <h2 data-number="2.1" class="anchored" data-anchor-id="classification-model-exam
 </div>
 <p>define tp, fp, tn, fn</p>
 <p>define confusion matrix</p>
-<div id="42a4e264" class="cell" data-execution_count="6">
+<div id="28862d4d" class="cell" data-execution_count="6">
 <div class="sourceCode cell-code" id="cb9"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb9-1"><a href="#cb9-1" aria-hidden="true" tabindex="-1"></a>cm <span class="op">=</span> confusion_matrix(y, prediction_cut) </span>
 <span id="cb9-2"><a href="#cb9-2" aria-hidden="true" tabindex="-1"></a><span class="bu">print</span>(<span class="st">"Confusion Matrix : </span><span class="ch">\n</span><span class="st">"</span>, cm) </span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-stdout">
@@ -485,7 +485,7 @@ <h2 data-number="2.4" class="anchored" data-anchor-id="how-to-evaluate-and-pick-
 <section id="prediction" class="level3" data-number="2.4.1">
 <h3 data-number="2.4.1" class="anchored" data-anchor-id="prediction"><span class="header-section-number">2.4.1</span> Prediction</h3>
 <p>Suppose we try to use the single variable <span class="math inline">\(BMI\)</span> to predict <span class="math inline">\(BloodPressure\)</span> using a linear model.</p>
-<div id="7ca969f2" class="cell" data-execution_count="7">
+<div id="248c4bee" class="cell" data-execution_count="7">
 <div class="sourceCode cell-code" id="cb11"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb11-1"><a href="#cb11-1" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> pandas <span class="im">as</span> pd</span>
 <span id="cb11-2"><a href="#cb11-2" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> seaborn <span class="im">as</span> sns</span>
 <span id="cb11-3"><a href="#cb11-3" aria-hidden="true" tabindex="-1"></a><span class="im">import</span> numpy <span class="im">as</span> np</span>
@@ -517,27 +517,27 @@ <h3 data-number="2.4.1" class="anchored" data-anchor-id="prediction"><span class
 </div>
 <p>We examine how well our model performs in terms of prediction by seeing how close our model’s predicted <span class="math inline">\(BloodPressure\)</span> is to the Training Set’s true <span class="math inline">\(BloodPressure\)</span>: the <strong>Training Error</strong>. We also take the model to the Testing Set to predict <span class="math inline">\(BloodPressure\)</span> using predictors from the Test Set and compare to the true <span class="math inline">\(BloodPressure\)</span> in the Test Set: the <strong>Testing Error.</strong> We want the model’s Training Error to be adequately small on the Training Set, but what we really care about is the Testing Error, because it is a true test of how the model performs on unseen, new data, and allows us to see how generalizeable the model is.</p>
 <p>Okay, let’s how it does on the Training Set:</p>
-<div id="5d4b4820" class="cell" data-execution_count="8">
+<div id="4907bc81" class="cell" data-execution_count="8">
 <div class="sourceCode cell-code" id="cb12"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb12-1"><a href="#cb12-1" aria-hidden="true" tabindex="-1"></a>np.mean((results.fittedvalues <span class="op">-</span> y_train.BloodPressure) <span class="op">**</span> <span class="dv">2</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="8">
-<pre><code>152.36993639963026</code></pre>
+<pre><code>np.float64(152.36993639963026)</code></pre>
 </div>
 </div>
-<div id="287be5fc" class="cell" data-execution_count="9">
+<div id="e32d79f4" class="cell" data-execution_count="9">
 <div class="sourceCode cell-code" id="cb14"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb14-1"><a href="#cb14-1" aria-hidden="true" tabindex="-1"></a>results.mse_resid</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="9">
-<pre><code>152.4417920640486</code></pre>
+<pre><code>np.float64(152.44179206404883)</code></pre>
 </div>
 </div>
 <p>[graph here]</p>
 <p>And then on the Test Set:</p>
-<div id="983fa0c6" class="cell" data-execution_count="10">
+<div id="7bf725d9" class="cell" data-execution_count="10">
 <div class="sourceCode cell-code" id="cb16"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb16-1"><a href="#cb16-1" aria-hidden="true" tabindex="-1"></a>np.mean((results.get_prediction(X_test).predicted_mean <span class="op">-</span> y_test.BloodPressure) <span class="op">**</span> <span class="dv">2</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
 <div class="cell-output cell-output-display" data-execution_count="10">
-<pre><code>155.83019738256442</code></pre>
+<pre><code>np.float64(155.83019738256448)</code></pre>
 </div>
 </div>
-<div id="af1edf69" class="cell" data-execution_count="11">
+<div id="93eea0df" class="cell" data-execution_count="11">
 <div class="sourceCode cell-code" id="cb18"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb18-1"><a href="#cb18-1" aria-hidden="true" tabindex="-1"></a>plt.plot(X_test.BMI, results.get_prediction(X_test).predicted_mean, label<span class="op">=</span><span class="st">"fitted line"</span>)</span>
 <span id="cb18-2"><a href="#cb18-2" aria-hidden="true" tabindex="-1"></a>plt.scatter(X_test.BMI, y_test, alpha<span class="op">=</span><span class="fl">.3</span>, color<span class="op">=</span><span class="st">"black"</span>, label<span class="op">=</span><span class="st">"test set"</span>)</span>
 <span id="cb18-3"><a href="#cb18-3" aria-hidden="true" tabindex="-1"></a>plt.legend()<span class="op">;</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
@@ -551,7 +551,7 @@ <h3 data-number="2.4.1" class="anchored" data-anchor-id="prediction"><span class
 </div>
 <p>We see that the Training Error is fairly high, and the Testing Error is even higher. This is an example of <strong>Underfitting</strong>, where our model failed to capture the complexity of the data in both the Training and Testing Set.</p>
 <p>Let’s return to the drawing board and fit a new type of model that has more flexibility around complicated patterns of data. Let’s see how it does on the Training Set:</p>
-<div id="312d472c" class="cell" data-execution_count="12">
+<div id="ce8cbf11" class="cell" data-execution_count="12">
 <div class="sourceCode cell-code" id="cb19"><pre class="sourceCode python code-with-copy"><code class="sourceCode python"><span id="cb19-1"><a href="#cb19-1" aria-hidden="true" tabindex="-1"></a><span class="co">#y, X = model_matrix("BloodPressure ~ poly(BMI, degree=5)", nhanes)</span></span>
 <span id="cb19-2"><a href="#cb19-2" aria-hidden="true" tabindex="-1"></a></span>
 <span id="cb19-3"><a href="#cb19-3" aria-hidden="true" tabindex="-1"></a><span class="co">#X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=42)</span></span>
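The two training-error values in the `@@ -517,27` hunk differ slightly: `np.mean(...)` gives 152.37 and `results.mse_resid` gives 152.44. The reason is that `np.mean` divides the residual sum of squares by n, whereas statsmodels' `mse_resid` divides it by the residual degrees of freedom (n minus the number of fitted parameters). The sketch below uses made-up synthetic data and hypothetical variable names, not the NHANES data from the page. It uses plain NumPy least squares so that it runs without statsmodels:

```python
import numpy as np

# Hypothetical synthetic data standing in for BMI -> BloodPressure (not NHANES).
rng = np.random.default_rng(0)
n = 200
bmi = rng.normal(28, 5, size=n)
bp = 90 + 0.8 * bmi + rng.normal(0, 12, size=n)

# Ordinary least squares fit of a straight line: design matrix [1, BMI].
X = np.column_stack([np.ones(n), bmi])
coef, *_ = np.linalg.lstsq(X, bp, rcond=None)
fitted = X @ coef

ssr = np.sum((fitted - bp) ** 2)  # residual sum of squares
p = X.shape[1]                    # fitted parameters: intercept + slope

mean_sq = np.mean((fitted - bp) ** 2)  # divides by n, like the np.mean cell
mse_resid = ssr / (n - p)              # divides by n - p, the formula behind mse_resid

print(mean_sq, mse_resid, mse_resid / mean_sq)  # the ratio equals n / (n - p)
```

In the page's own output the ratio is about 1.0005 (152.44 / 152.37). That is what n / (n - 2) gives for a training set of a few thousand rows, assuming the model is just an intercept plus a BMI slope.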