OpenIntroStat
diff --git a/.nojekyll b/.nojekyll
Lines changed: 1 addition & 1 deletion
diff --git a/_redirects b/_redirects
Lines changed: 1 addition & 0 deletions
diff --git a/data-design.html b/data-design.html
Lines changed: 2 additions & 2 deletions
diff --git a/foundations-randomization.html b/foundations-randomization.html
Lines changed: 1 addition & 1 deletion
diff --git a/inf-model-logistic.html b/inf-model-logistic.html
Lines changed: 1 addition & 1 deletion
diff --git a/inf-model-mlr.html b/inf-model-mlr.html
Lines changed: 4 additions & 4 deletions
@@ -1 +1 @@
-13a3e4c6
+c4e819b6
@@ -26,3 +26,4 @@
 /25-inf-model-mlr					      /inf-model-mlr
 /26-inf-model-logistic				  /inf-model-logistic
 /27-inf-model-applications			/inf-model-applications
+/                               https://openintrostat.github.io/ims/
@@ -836,7 +836,7 @@ <h1 class="title"><span id="sec-data-design" class="quarto-section-identifier"><
 <p>No! Some previous research tells us that using sunscreen actually reduces skin cancer risk, so maybe there is another variable that can explain this hypothetical association between sunscreen usage and skin cancer, as shown in <a href="#fig-sun-causes-cancer" class="quarto-xref">Figure&nbsp;<span>2.7</span></a>. One important piece of information that is absent is sun exposure. If someone is out in the sun all day, they are more likely to use sunscreen <em>and</em> more likely to get skin cancer. Exposure to the sun is unaccounted for in the simple observational investigation.</p>
 <div class="cell" data-layout-align="center">
 <div class="cell-output-display">
-<div id="fig-sun-causes-cancer" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-align="center" alt="Three boxes are shown in a triangle arrangement representing: sun exposure, using sunscreen, and skin cancer. A solid arrow connects sun exposure as a causal mechanism to using sunscreen; a solid arrow also connects sun exposure as a causal mechanism to skin cancer. A questioning arrow indicates that the causal effect of using sunscreen on skin cancer is unknown. ">
+<div id="fig-sun-causes-cancer" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Three boxes are shown in a triangle arrangement representing: sun exposure, using sunscreen, and skin cancer. A solid arrow connects sun exposure as a causal mechanism to using sunscreen; a solid arrow also connects sun exposure as a causal mechanism to skin cancer. A questioning arrow indicates that the causal effect of using sunscreen on skin cancer is unknown. " data-fig-align="center">
 <figure class="quarto-float quarto-float-fig figure"><div aria-describedby="fig-sun-causes-cancer-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
 <a href="data-design_files/figure-html/fig-sun-causes-cancer-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-7" title="Figure&nbsp;2.7: Sun exposure may be the root cause of both sunscreen use and skin cancer."><img src="data-design_files/figure-html/fig-sun-causes-cancer-1.png" class="img-fluid quarto-figure quarto-figure-center figure-img" style="width:60.0%" alt="Three boxes are shown in a triangle arrangement representing: sun exposure, using sunscreen, and skin cancer. A solid arrow connects sun exposure as a causal mechanism to using sunscreen; a solid arrow also connects sun exposure as a causal mechanism to skin cancer. A questioning arrow indicates that the causal effect of using sunscreen on skin cancer is unknown. "></a>
 </div>
@@ -858,7 +858,7 @@ <h1 class="title"><span id="sec-data-design" class="quarto-section-identifier"><
 <p>A proficient analyst will have a good sense of the types of data they are working with and how to visualize the data in order to gain a complete understanding of the variables. Equally important, however, is the data source. In this chapter, we have discussed randomized experiments and taking good, random, representative samples from a population. When we discuss inferential methods (starting in <a href="foundations-randomization.html" class="quarto-xref"><span>Chapter 11</span></a>), the conclusions that can be drawn will be dependent on how the data were collected. <a href="#fig-randsampValloc" class="quarto-xref">Figure&nbsp;<span>2.8</span></a> summarizes how sampling and assignment methods relate to the scope of inference.<a href="#fn10" class="footnote-ref" id="fnref10" role="doc-noteref"><sup>10</sup></a> Regularly revisiting <a href="#fig-randsampValloc" class="quarto-xref">Figure&nbsp;<span>2.8</span></a> will be important when making conclusions from a given data analysis.</p>
 <div class="cell">
 <div class="cell-output-display">
-<div id="fig-randsampValloc" class="quarto-float quarto-figure quarto-figure-center anchored" alt="A two by two table describing the scenarios of random sample or not and random allocation or not. Selecting randomly from a population allows for generalization back to the population. Randomly allocating in an experiment allows for establishing causation. " data-fig-pos="H">
+<div id="fig-randsampValloc" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-pos="H" alt="A two by two table describing the scenarios of random sample or not and random allocation or not. Selecting randomly from a population allows for generalization back to the population. Randomly allocating in an experiment allows for establishing causation. ">
 <figure class="quarto-float quarto-float-fig figure"><div aria-describedby="fig-randsampValloc-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
 <a href="images/randsampValloc.png" class="lightbox" data-gallery="quarto-lightbox-gallery-8" title="Figure&nbsp;2.8: Analysis conclusions should be made carefully according to how the data were collected. Very few datasets come from the top left box because usually ethics require that random assignment of treatments can only be given to volunteers. Both representative (ideally random) sampling and experiments (random assignment of treatments) are important for how statistical conclusions can be made on populations."><img src="images/randsampValloc.png" class="img-fluid figure-img" style="width:96.0%" data-fig-pos="H" alt="A two by two table describing the scenarios of random sample or not and random allocation or not. Selecting randomly from a population allows for generalization back to the population. Randomly allocating in an experiment allows for establishing causation. "></a>
 </div>
 
@@ -982,7 +982,7 @@ <h1 class="title"><span id="sec-foundations-randomization" class="quarto-section
 <p>It might be a little easier to review the results using a visualization. <a href="#fig-opportunity-cost-obs-bar" class="quarto-xref">Figure&nbsp;<span>11.5</span></a> shows that a higher proportion of students in the treatment group chose not to buy the video compared to those in the control group.</p>
 <div class="cell">
 <div class="cell-output-display">
-<div id="fig-opportunity-cost-obs-bar" class="quarto-float quarto-figure quarto-figure-center anchored" alt="Stacked bar plot with groups of control and treatment and filled using the proportion who did and did not buy the video. 74% of the control group bought the video as compared with a little over 50% of the treatment group who bought the video. " data-fig-pos="H">
+<div id="fig-opportunity-cost-obs-bar" class="quarto-float quarto-figure quarto-figure-center anchored" data-fig-pos="H" alt="Stacked bar plot with groups of control and treatment and filled using the proportion who did and did not buy the video. 74% of the control group bought the video as compared with a little over 50% of the treatment group who bought the video. ">
 <figure class="quarto-float quarto-float-fig figure"><div aria-describedby="fig-opportunity-cost-obs-bar-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
 <a href="foundations-randomization_files/figure-html/fig-opportunity-cost-obs-bar-1.png" class="lightbox" data-gallery="quarto-lightbox-gallery-5" title="Figure&nbsp;11.5: Stacked bar plot of results of the opportunity cost study."><img src="foundations-randomization_files/figure-html/fig-opportunity-cost-obs-bar-1.png" class="img-fluid figure-img" style="width:90.0%" data-fig-pos="H" alt="Stacked bar plot with groups of control and treatment and filled using the proportion who did and did not buy the video. 74% of the control group bought the video as compared with a little over 50% of the treatment group who bought the video. "></a>
 </div>
 
@@ -712,7 +712,7 @@ <h1 class="title"><span id="sec-inf-model-logistic" class="quarto-section-identi
 </div>
 <section id="model-diagnostics" class="level2" data-number="26.1"><h2 data-number="26.1" class="anchored" data-anchor-id="model-diagnostics">
 <span class="header-section-number">26.1</span> Model diagnostics</h2>
-<p>Before looking at the hypothesis tests associated with the coefficients (turns out they are very similar to those in linear regression!), it is valuable to understand the technical conditions that underlie the inference applied to the logistic regression model. Generally, as you’ve seen in the logistic regression modeling examples, it is imperative that the response variable is binary. Additionally, the key technical condition for logistic regression has to do with the relationship between the predictor variables <span class="math inline">\((x_i\)</span> values) and the probability the outcome will be a success. It turns out, the relationship is a specific functional form called a logit function, where <span class="math inline">\({\rm logit}(p) = \log_e(\frac{p}{1-p}).\)</span> The function may feel complicated, and memorizing the formula of the logit is not necessary for understanding logistic regression. What you do need to remember is that the probability of the outcome being a success is a function of a linear combination of the explanatory variables.</p>
+<p>Before looking at the hypothesis tests associated with the coefficients (turns out they are very similar to those in linear regression!), it is valuable to understand the technical conditions that underlie the inference applied to the logistic regression model. Generally, as you’ve seen in the logistic regression modeling examples, it is imperative that the response variable is binary. Additionally, the key technical condition for logistic regression has to do with the relationship between the predictor variables (<span class="math inline">\(x_i\)</span> values) and the probability the outcome will be a success. It turns out, the relationship is a specific functional form called a logit function, where <span class="math inline">\({\rm logit}(p) = \log_e(\frac{p}{1-p}).\)</span> The function may feel complicated, and memorizing the formula of the logit is not necessary for understanding logistic regression. What you do need to remember is that the probability of the outcome being a success is a function of a linear combination of the explanatory variables.</p>
 <div class="important">
 <p><strong>Logistic regression conditions.</strong></p>
 <p></p>
 
@@ -639,7 +639,7 @@ <h1 class="title"><span id="sec-inf-model-mlr" class="quarto-section-identifier"
 <span class="header-section-number">25.1</span> Multiple regression output from software</h2>
 <p>Recall the <code>loans</code> data from <a href="model-mlr.html" class="quarto-xref">Chapter&nbsp;<span>8</span></a>.</p>
 <div class="data">
-<p>The <a href="http://openintrostat.github.io/openintro/reference/loans_full_schema.html"><code>loans_full_schema</code></a> data can be found in the <a href="http://openintrostat.github.io/openintro"><strong>openintro</strong></a> R package. Based on the data in this dataset we have created two new variables: <code>credit_util</code> which is calculated as the total credit utilized divided by the total credit limit and <code>bankruptcy</code> which turns the number of bankruptcies to an indicator variable (0 for no bankruptcies and 1 for at least 1 bankruptcies). We will refer to this modified dataset as <code>loans</code>.</p>
+<p>The <a href="http://openintrostat.github.io/openintro/reference/loans_full_schema.html"><code>loans_full_schema</code></a> data can be found in the <a href="http://openintrostat.github.io/openintro"><strong>openintro</strong></a> R package. Based on the data in this dataset we have created two new variables: <code>credit_util</code> which is calculated as the total credit utilized divided by the total credit limit and <code>bankruptcy</code> which turns the number of bankruptcies to an indicator variable (0 for no bankruptcies and 1 for at least 1 bankruptcy). We will refer to this modified dataset as <code>loans</code>.</p>
 </div>
 <p>Now, our goal is to create a model where <code>interest_rate</code> can be predicted using the variables <code>debt_to_income</code>, <code>term</code>, and <code>credit_checks</code>. As you learned in <a href="model-mlr.html" class="quarto-xref"><span>Chapter 8</span></a>, least squares can be used to find the coefficient estimates for the linear model. The unknown population model can be written as:</p>
 <p><span class="math display">\[
@@ -767,7 +767,7 @@ <h1 class="title"><span id="sec-inf-model-mlr" class="quarto-section-identifier"
 </div>
 </div>
 <figcaption class="quarto-float-caption-bottom quarto-float-caption quarto-float-fig" id="fig-coinfig-caption-0ceaefa1-69ba-4598-a22c-09a6ac19f8ca">
-Figure&nbsp;25.2: Two plots describing the total amount of money (USD) as a function of the total number of coins or low coins. As you might expect, the total amount of money is more highly postively correlated with the total number of coins than with the number of low coins.
+Figure&nbsp;25.2: Two plots describing the total amount of money (USD) as a function of the total number of coins or low coins. As you might expect, the total amount of money is more highly positively correlated with the total number of coins than with the number of low coins.
 </figcaption></figure>
 </div>
 <p>Using the total <code>number_of_coins</code> as the predictor variable, <a href="#tbl-coinhigh" class="quarto-xref">Table&nbsp;<span>25.2</span></a> shows that the least squares estimate of the coefficient is 0.13. For every additional coin in the dish, we would predict that the student had US$0.13 more. The <span class="math inline">\(b_1 = 0.13\)</span> coefficient has a small p-value associated with it, suggesting we would not have seen data like this if <code>number_of_coins</code> and <code>total_amount</code> of money were not linearly related.</p>
@@ -959,7 +959,7 @@ <h1 class="title"><span id="sec-inf-model-mlr" class="quarto-section-identifier"
 </div>
 </div>
 <div class="data">
-<p>The <a href="https://allisonhorst.github.io/palmerpenguins/articles/intro.html"><code>penguins</code></a> data can be found in the <a href="https://github.com/allisonhorst/palmerpenguins"><strong>palmerpenguings</strong></a> R package.</p>
+<p>The <a href="https://allisonhorst.github.io/palmerpenguins/articles/intro.html"><code>penguins</code></a> data can be found in the <a href="https://github.com/allisonhorst/palmerpenguins"><strong>palmerpenguins</strong></a> R package.</p>
 </div>
 <p>Our goal in this section is to compare two different regression models which both seek to predict the mass of an individual penguin in grams. The observations of three different penguin species include measurements on body size and sex. The data were collected by <a href="https://www.uaf.edu/cfos/people/faculty/detail/kristen-gorman.php">Dr.&nbsp;Kristen Gorman</a> and the <a href="https://pal.lternet.edu/">Palmer Station, Antarctica LTER</a> as part of the <a href="https://lternet.edu/">Long Term Ecological Research Network</a>. <span class="citation" data-cites="Gorman:2014">(<a href="references.html#ref-Gorman:2014" role="doc-biblioref">Gorman, Williams, and Fraser 2014</a>)</span> Although not exactly aligned with this research project, you might be able to imagine a setting where the dimensions of the penguin are known (through, for example, aerial photographs) but the mass is not known. The first model predicts <code>body_mass_g</code> by using only the <code>bill_length_mm</code>, a variable denoting the length of a penguin’s bill, in mm. The second model predicts <code>body_mass_g</code> by using <code>bill_length_mm</code>, <code>bill_depth_mm</code>, <code>flipper_length_mm</code>, <code>sex</code>, and <code>species</code>.</p>
 <div class="important">
@@ -971,7 +971,7 @@ <h1 class="title"><span id="sec-inf-model-mlr" class="quarto-section-identifier"
 <section id="comparing-two-models-to-predict-body-mass-in-penguins" class="level3" data-number="25.3.1"><h3 data-number="25.3.1" class="anchored" data-anchor-id="comparing-two-models-to-predict-body-mass-in-penguins">
 <span class="header-section-number">25.3.1</span> Comparing two models to predict body mass in penguins</h3>
 <p>The question we will seek to answer is whether the predictions of <code>body_mass_g</code> are substantially better when <code>bill_length_mm</code>, <code>bill_depth_mm</code>, <code>flipper_length_mm</code>, <code>sex</code>, and <code>species</code> are used in the model, as compared with a model on <code>bill_length_mm</code> only.</p>
-<p>We refer to the model given with only <code>bill_lengh_mm</code> as the <strong>smaller</strong> model. It is seen in <a href="#tbl-peng-lm-bill" class="quarto-xref">Table&nbsp;<span>25.5</span></a> with coefficient estimates of the parameters as well as standard errors and p-values. We refer to the model given with <code>bill_lengh_mm</code>, <code>bill_depth_mm</code>, <code>flipper_length_mm</code>, <code>sex</code>, and <code>species</code> as the <strong>larger</strong> model. It is seen in <a href="#tbl-peng-lm-all" class="quarto-xref">Table&nbsp;<span>25.6</span></a> with coefficient estimates of the parameters as well as standard errors and p-values. Given what we know about high correlations between body measurements, it is somewhat unsurprising that all of the variables have low p-values, suggesting that each variable is a statistically discernible predictor of <code>body_mass_g</code>, given all other variables in the model. However, in this section, we will go beyond the use of p-values to consider independent predictions of <code>body_mass_g</code> as a way to compare the smaller and larger models.</p>
+<p>We refer to the model given with only <code>bill_length_mm</code> as the <strong>smaller</strong> model. It is seen in <a href="#tbl-peng-lm-bill" class="quarto-xref">Table&nbsp;<span>25.5</span></a> with coefficient estimates of the parameters as well as standard errors and p-values. We refer to the model given with <code>bill_length_mm</code>, <code>bill_depth_mm</code>, <code>flipper_length_mm</code>, <code>sex</code>, and <code>species</code> as the <strong>larger</strong> model. It is seen in <a href="#tbl-peng-lm-all" class="quarto-xref">Table&nbsp;<span>25.6</span></a> with coefficient estimates of the parameters as well as standard errors and p-values. Given what we know about high correlations between body measurements, it is somewhat unsurprising that all of the variables have low p-values, suggesting that each variable is a statistically discernible predictor of <code>body_mass_g</code>, given all other variables in the model. However, in this section, we will go beyond the use of p-values to consider independent predictions of <code>body_mass_g</code> as a way to compare the smaller and larger models.</p>
 <p><strong>The smaller model:</strong></p>
 <p><span class="math display">\[
 \begin{aligned}