01-Problem-Setup.qmd
Lines changed: 5 additions & 5 deletions
@@ -7,7 +7,7 @@ Suppose that we are given the [**N**ational **H**ealth **A**nd **N**utrition **E
 Using algebraic expressions, we formulate the following:
 
 $$
-BloodPressure=f(Age, BMI, Income, ...)+\epsilon
+BloodPressure=f(Age, BMI, Income)+\epsilon
 $$
 
 Where $f(Age, BMI, Income, ...)$ is a machine learning model that takes in the clinical and demographic variables and make a prediction on the $BloodPressure$. This model is not perfect to give the correct prediction, so there is an "error term" $\epsilon$ (Greek letter epsilon) that captures the imperfectness of the model.
@@ -16,11 +16,11 @@ A machine learning model, such as the one described above, has *two main uses:*
 
 1. **Prediction:** How accurately can we predict outcomes?
 
-- Given a new person's $Age, BMI, Income, …$ , predict the person's $BloodPressure$ and compare it to the true value.
+- Given a new person's $Age, BMI, Income$ , predict the person's $BloodPressure$ and compare it to the true value.
 
 2. **Inference:** Which predictors are associated with the response, and how strong is the association?
 
-- Suppose the model is described as $BloodPressure = f(Age,BMI,Income,…)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$. Each variable has a relationship to the outcome: an increase of $Age$ by 1 will lead to an increase of $BloodPressure$ by 3. This measures the strength of association between a variable and the outcome.
+- Suppose the model is described as $BloodPressure = f(Age,BMI,Income)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$. Each variable has a relationship to the outcome: an increase of $Age$ by 1 will lead to an increase of $BloodPressure$ by 3. This measures the strength of association between a variable and the outcome.
 
 ## Population and Sample
 
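Reviewer's sketch (not part of the commit): the two uses described in this hunk can be checked numerically with the example coefficients $20$, $3$, $-.2$, and $.00015$. The person and the "true" blood pressure below are hypothetical values made up for illustration; only the coefficients come from the text.

```python
# Coefficients copied from the example model in the hunk above:
# BloodPressure = 20 + 3*Age - 0.2*BMI + 0.00015*Income
INTERCEPT = 20.0
COEF_AGE = 3.0
COEF_BMI = -0.2
COEF_INCOME = 0.00015


def f(age, bmi, income):
    """Evaluate the example linear model for one person."""
    return INTERCEPT + COEF_AGE * age + COEF_BMI * bmi + COEF_INCOME * income


# Hypothetical person (assumed values, not from any dataset).
age, bmi, income = 40, 25.0, 50_000

# Use 1, prediction: compare the model output to a known true value.
predicted_bp = f(age, bmi, income)
true_bp = 130.0  # hypothetical measurement
prediction_error = true_bp - predicted_bp

# Use 2, inference: raise Age by 1, hold the other predictors fixed,
# and see how much the prediction moves.
age_effect = f(age + 1, bmi, income) - predicted_bp

print(f"predicted: {predicted_bp:.2f}, error: {prediction_error:.2f}")
print(f"change in prediction per extra year of Age: {age_effect:.2f}")
```

The difference printed on the last line equals the $Age$ coefficient regardless of the person chosen, which is what makes a linear model's parameters directly readable as strengths of association.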
@@ -33,7 +33,7 @@ The way we formulate machine learning model is based on some fundamental concept
 In Machine Learning problems, we often like to take two, non-overlapping samples from the population: the **Training Set**, and the **Test Set**. We **train** our model using the Training Set, which gives us a function $f()$ that relates the predictors to the outcome. Then, for our main use cases:
 
 1. **Prediction:** We use the trained model to predict the outcome using predictors from the Test Set and compare to the true value in the Test Set.
-2. **Inference**: We examine the function $f()$'s trained values, which are called **parameters**. For instance, $f(Age,BMI,Income,…)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$, the values $20$, $3$, $-.2$, and $.00015$ are the parameters. Because these parameters are derived from the Training Set, they are an *estimated* quantity from a sample, similar to other summary statistics like the mean of a sample. Therefore, to say anything about the true population, we have to use statistical tools such as p-values and confidence intervals.
+2. **Inference**: We examine the function $f()$'s trained values, which are called **parameters**. For instance, $f(Age,BMI,Income)=20 + 3 \cdot Age - .2 \cdot BMI + .00015 \cdot Income$, the values $20$, $3$, $-.2$, and $.00015$ are the parameters. Because these parameters are derived from the Training Set, they are an *estimated* quantity from a sample, similar to other summary statistics like the mean of a sample. Therefore, to say anything about the true population, we have to use statistical tools such as p-values and confidence intervals.
 
 If the concepts of population, sample, estimation, p-value, and confidence interval is new to you, we recommend do a bit of reading here \[todo\].
 
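Reviewer's sketch (not part of the commit): a minimal illustration of the Training Set / Test Set workflow described in this hunk. It assumes a single predictor, BMI, to keep the arithmetic short, and it uses synthetic data generated from a made-up "population" relationship (`TRUE_INTERCEPT`, `TRUE_SLOPE`, and the noise level are all assumptions, not NHANES values). The fit is ordinary least squares in closed form.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Made-up population relationship, for illustration only.
TRUE_INTERCEPT = 90.0
TRUE_SLOPE = 1.2
NOISE_SD = 5.0


def draw_person():
    """Draw one synthetic (BMI, BloodPressure) pair."""
    bmi = rng.uniform(18, 40)
    bp = TRUE_INTERCEPT + TRUE_SLOPE * bmi + rng.gauss(0, NOISE_SD)
    return bmi, bp


# Two non-overlapping samples from the same synthetic population.
people = [draw_person() for _ in range(200)]
train, test = people[:150], people[150:]


def fit_line(pairs):
    """Closed-form least squares for y = intercept + slope * x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Training: the parameters are estimated from the Training Set only.
intercept, slope = fit_line(train)

# Prediction: mean squared error on the held-out Test Set.
test_mse = sum((y - (intercept + slope * x)) ** 2 for x, y in test) / len(test)

print(f"estimated intercept: {intercept:.2f}, estimated slope: {slope:.2f}")
print(f"test-set mean squared error: {test_mse:.2f}")
```

The estimated slope will generally differ somewhat from `TRUE_SLOPE` because it is computed from a sample. That gap is the reason the text reaches for p-values and confidence intervals before making claims about the population.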
@@ -43,7 +43,7 @@ The little example model we showcased above is an example of a **linear model**,
 
 ### Prediction
 
-Suppose we try to use the variable $BMI$ to predict $BloodPressure$ using a linear model.
+Suppose we try to use the single variable $BMI$ to predict $BloodPressure$ using a linear model.