Commit 60bda39

Update 01-Problem-Setup.qmd
1 parent 348b6fc commit 60bda39

1 file changed

Lines changed: 5 additions & 5 deletions

01-Problem-Setup.qmd

@@ -7,7 +7,7 @@ Suppose that we are given the [**N**ational **H**ealth **A**nd **N**utrition **E
Using algebraic expressions, we formulate the following:

$$
- BloodPressure=f(Age, BMI, Income, ...)+\epsilon
+ BloodPressure=f(Age, BMI, Income)+\epsilon
$$

Where $f(Age, BMI, Income, ...)$ is a machine learning model that takes in the clinical and demographic variables and makes a prediction of $BloodPressure$. The model's predictions are not perfect, so there is an "error term" $\epsilon$ (Greek letter epsilon) that captures the model's imperfection.
@@ -16,11 +16,11 @@ A machine learning model, such as the one described above, has *two main uses:*

1. **Prediction:** How accurately can we predict outcomes?

- - Given a new person's $Age, BMI, Income, …$ , predict the person's $BloodPressure$ and compare it to the true value.
+ - Given a new person's $Age, BMI, Income$ , predict the person's $BloodPressure$ and compare it to the true value.

2. **Inference:** Which predictors are associated with the response, and how strong is the association?

- - Suppose the model is described as $BloodPressure = f(Age,BMI,Income,…)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$. Each variable has a relationship to the outcome: an increase of $Age$ by 1 will lead to an increase of $BloodPressure$ by 3. This measures the strength of association between a variable and the outcome.
+ - Suppose the model is described as $BloodPressure = f(Age,BMI,Income)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$. Each variable has a relationship to the outcome: an increase of $Age$ by 1 will lead to an increase of $BloodPressure$ by 3. This measures the strength of association between a variable and the outcome.
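The inference reading of the example model can be checked by evaluating it directly. The sketch below is only an illustration: the function name `f` mirrors the text, the coefficients are the made-up ones from the example, and the person's values (age 40, BMI 25, income 50000) are hypothetical inputs chosen for the demonstration.

```python
def f(age, bmi, income):
    """Illustrative linear model from the text; the coefficients are made up."""
    return 20 + 3 * age - 0.2 * bmi + 0.00015 * income

# Hypothetical person; these values are assumptions, not data.
base = f(age=40, bmi=25, income=50000)

# Same person, one year older, everything else held fixed.
older = f(age=41, bmi=25, income=50000)

print("predicted BloodPressure:", base)
print("change from +1 Age:", older - base)
```

Holding $BMI$ and $Income$ fixed, the difference between the two predictions equals the $Age$ coefficient (up to floating-point rounding), which is what "an increase of $Age$ by 1 leads to an increase of $BloodPressure$ by 3" means.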

## Population and Sample

@@ -33,7 +33,7 @@ The way we formulate machine learning model is based on some fundamental concept
In Machine Learning problems, we often like to take two, non-overlapping samples from the population: the **Training Set**, and the **Test Set**. We **train** our model using the Training Set, which gives us a function $f()$ that relates the predictors to the outcome. Then, for our main use cases:

1. **Prediction:** We use the trained model to predict the outcome using predictors from the Test Set and compare to the true value in the Test Set.
- 2. **Inference**: We examine the function $f()$'s trained values, which are called **parameters**. For instance, $f(Age,BMI,Income,…)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$, the values $20$, $3$, $-.2$, and $.00015$ are the parameters. Because these parameters are derived from the Training Set, they are an *estimated* quantity from a sample, similar to other summary statistics like the mean of a sample. Therefore, to say anything about the true population, we have to use statistical tools such as p-values and confidence intervals.
+ 2. **Inference**: We examine the function $f()$'s trained values, which are called **parameters**. For instance, $f(Age,BMI,Income)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$, the values $20$, $3$, $-.2$, and $.00015$ are the parameters. Because these parameters are derived from the Training Set, they are an *estimated* quantity from a sample, similar to other summary statistics like the mean of a sample. Therefore, to say anything about the true population, we have to use statistical tools such as p-values and confidence intervals.
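The Training Set / Test Set workflow described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the course's own code: the data are synthetic (a made-up "population" where $BloodPressure = 90 + 1.2 \cdot BMI$ plus noise, not NHANES values), only one predictor is used for brevity, and the helper `fit_line` is a hypothetical name for a hand-written least-squares fit.

```python
import random

random.seed(0)  # reproducible synthetic data

# Synthetic stand-in for a population; these numbers are invented, not NHANES.
TRUE_INTERCEPT, TRUE_SLOPE = 90.0, 1.2
bmi = [random.uniform(18, 40) for _ in range(500)]
bp = [TRUE_INTERCEPT + TRUE_SLOPE * x + random.gauss(0, 5) for x in bmi]

# Two non-overlapping samples: shuffle the indices once, then cut.
idx = list(range(len(bmi)))
random.shuffle(idx)
train_idx, test_idx = idx[:400], idx[400:]


def fit_line(xs, ys):
    """Ordinary least squares for one predictor; returns (intercept, slope)."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope


# Train: estimate the parameters from the Training Set only.
intercept, slope = fit_line([bmi[i] for i in train_idx],
                            [bp[i] for i in train_idx])

# Prediction: compare predictions with the true values in the Test Set.
errors = [bp[i] - (intercept + slope * bmi[i]) for i in test_idx]
test_rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5

print(f"estimated intercept={intercept:.2f}, slope={slope:.2f}")
print(f"test RMSE={test_rmse:.2f}")
```

The estimated intercept and slope land near, but not exactly on, the values used to generate the data. That gap is why the parameters are treated as *estimates* from a sample, and why tools such as confidence intervals are needed before making claims about the population.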

If the concepts of population, sample, estimation, p-value, and confidence interval are new to you, we recommend doing a bit of reading here \[todo\].

@@ -43,7 +43,7 @@ The little example model we showcased above is an example of a **linear model**,

### Prediction

- Suppose we try to use the variable $BMI$ to predict $BloodPressure$ using a linear model.
+ Suppose we try to use the single variable $BMI$ to predict $BloodPressure$ using a linear model.

```{python}
import pandas as pd