Lectures: CSE206_Fa24-15.pdf Lab/Tutorial: none
- This week continues hypothesis testing, focusing on Type I and Type II errors and the trade-off between them.
- For normal models with known variance, explicit formulas for Type I and Type II error probabilities are derived using standard normal quantiles.
- The power function of a test is introduced: the probability that the test correctly rejects the null hypothesis when the alternative is true.
- Tests are called unbiased when their power under any alternative parameter is at least as large as the significance level.
- Tests are called consistent if, for any fixed alternative, the power tends to 1 as sample size grows.
- The lecture uses the normal model as a main example, but the definitions and ideas are general and apply to other models as well.
- Hypotheses.
- The parameter space $\Theta$ is split into two disjoint parts: $$ H_0:\theta\in\Theta_0,\quad H_1:\theta\in\Theta_1, $$ where $H_0$ is the null hypothesis and $H_1$ is the alternative.
- Statistic and decision rule.
- A test statistic $t(\xi_1,\dots,\xi_n)$ is used to decide.
- The real line is partitioned into:
- acceptance region $R_0$: if $t\in R_0$, do not reject $H_0$;
- critical (rejection) region $R_1$: if $t\in R_1$, reject $H_0$.
- Type I and Type II errors.
- Type I error: reject $H_0$ when $H_0$ is actually true ($\theta\in\Theta_0$).
- Type II error: fail to reject $H_0$ when $H_1$ is true ($\theta\in\Theta_1$).
- Significance level.
- The test is designed to keep the probability of Type I error at or below a prescribed level $\alpha$: $$ \alpha \ge \sup_{\theta\in\Theta_0}P_\theta(t\in R_1). $$
- In many standard tests, equality is achieved (the “worst-case” Type I error probability is exactly $\alpha$).
- Model (Example 1.1).
- $\xi\sim N(\theta,\sigma^2)$, $\sigma^2$ known. Sample $\xi_1,\dots,\xi_n$.
- Test statistic: sample mean $$ t(\xi_1,\dots,\xi_n) = \bar{\xi}_n = \frac{1}{n}\sum_{j=1}^n \xi_j. $$
- Hypotheses: $H_0:\theta=\theta_0$; alternative $H_1:\theta=\theta_1$ with $\theta_1>\theta_0$ (simple alternative).
- Critical region of the form $R_1 = [t_\alpha,\infty)$.
- Type I error probability.
- Under $H_0$: $\bar{\xi}_n\sim N(\theta_0,\sigma^2/n)$.
- Standardize: $$ Z=\frac{\sqrt{n}(\bar{\xi}_n-\theta_0)}{\sigma}\sim N(0,1). $$
- Type I error probability is $$ P_{\theta_0}(\bar{\xi}_n\ge t_\alpha) = P\left(Z\ge \frac{\sqrt{n}(t_\alpha-\theta_0)}{\sigma}\right). $$
- The critical value is chosen so this equals $\alpha$, giving $$ t_\alpha = \theta_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}}, $$ where $z_{1-\alpha}$ is the standard normal quantile with right-tail area $\alpha$.
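The critical-value formula above can be checked numerically. The following is a minimal sketch using only the Python standard library (`statistics.NormalDist`). The function name `critical_value` and the input values are illustrative assumptions, not taken from the lecture.

```python
from math import sqrt
from statistics import NormalDist


def critical_value(theta0: float, sigma: float, n: int, alpha: float) -> float:
    """Return t_alpha = theta0 + z_{1-alpha} * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal (1 - alpha)-quantile
    return theta0 + z * sigma / sqrt(n)


# Illustrative (made-up) inputs: theta0 = 0, sigma = 1, n = 25, alpha = 0.05.
theta0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
t_alpha = critical_value(theta0, sigma, n, alpha)

# Sanity check: under H0 the sample mean is N(theta0, sigma^2 / n),
# so the right-tail probability beyond t_alpha should recover alpha.
type_i = 1.0 - NormalDist(mu=theta0, sigma=sigma / sqrt(n)).cdf(t_alpha)
print(f"t_alpha = {t_alpha:.4f}, Type I error = {type_i:.4f}")
```

The sanity check recomputes the tail probability directly from the distribution of $\bar{\xi}_n$ under $H_0$, so it does not rely on the standardization step.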
- Type II error probability (for a simple alternative $\theta_1$).
- Under $H_1$: $\bar{\xi}_n\sim N(\theta_1,\sigma^2/n)$.
- Type II error is $$ \beta(\theta_1) = P_{\theta_1}(\bar{\xi}_n < t_\alpha) = P\left(\frac{\sqrt{n}(\bar{\xi}_n-\theta_1)}{\sigma} < \frac{\sqrt{n}(t_\alpha-\theta_1)}{\sigma}\right). $$
- In terms of $Z\sim N(0,1)$: $$ \beta(\theta_1) = P\left(Z < \frac{\sqrt{n}(t_\alpha-\theta_1)}{\sigma}\right). $$
- Plug in $t_\alpha=\theta_0 + z_{1-\alpha}\sigma/\sqrt{n}$: $$ \frac{\sqrt{n}(t_\alpha-\theta_1)}{\sigma} = z_{1-\alpha} - \frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma}, $$ so $$ \beta(\theta_1) = P\left(Z < z_{1-\alpha} - \frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma}\right). $$
- Intuition / mental model.
- $t_\alpha$ is chosen to control the Type I error probability at $\alpha$; this simultaneously determines the Type II error probabilities for specific alternative values.
- Larger separation $\theta_1-\theta_0$ or larger sample size $n$ reduces $\beta(\theta_1)$ (the chance of missing a real difference).
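To see how separation and sample size drive $\beta(\theta_1)$, the closed form can be evaluated for a few settings. This is a sketch under assumed inputs: the helper name `type_ii_error` and all numeric values are hypothetical and chosen only for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def type_ii_error(theta0, theta1, sigma, n, alpha):
    """beta(theta1) = Phi(z_{1-alpha} - sqrt(n) * (theta1 - theta0) / sigma)."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    shift = sqrt(n) * (theta1 - theta0) / sigma
    return STD_NORMAL.cdf(z - shift)


# Made-up inputs: theta0 = 0, sigma = 1, alpha = 0.05.
beta_base = type_ii_error(0.0, 0.5, 1.0, 25, 0.05)        # baseline
beta_more_data = type_ii_error(0.0, 0.5, 1.0, 100, 0.05)  # four times the data
beta_far_alt = type_ii_error(0.0, 1.0, 1.0, 25, 0.05)     # twice the separation

print(f"baseline:          {beta_base:.4f}")
print(f"larger n:          {beta_more_data:.4f}")
print(f"larger separation: {beta_far_alt:.4f}")
```

In this sketch, quadrupling $n$ and doubling $\theta_1-\theta_0$ give the same value, because $\beta(\theta_1)$ depends on the data only through $\sqrt{n}(\theta_1-\theta_0)/\sigma$.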
- Plain-language definition.
- The power function of a test, $W(\theta)$, tells you how likely the test is to reject the null hypothesis for each possible parameter value.
- For parameters in $\Theta_1$ (where the alternative holds), higher power means a better test (less likely to miss a real effect).
- Formal definition.
- For a given critical region $R_1$: $$ \beta(\theta) = P_\theta(t\in R_0) = P_\theta(\text{Type II error}),\quad \theta\in\Theta_1, $$ and $$ W(\theta) = 1-\beta(\theta) = P_\theta(t\in R_1),\quad \theta\in\Theta_1. $$
- $W(\theta)$ is the power function.
- Example (normal, composite alternative).
- With $H_0:\theta=\theta_0$ and composite alternative $H_1:\theta>\theta_0$, critical region $R_1 = [t_\alpha,\infty)$, the power function is $$ W(\theta) = P_\theta(\bar{\xi}_n\ge t_\alpha) = P\left(Z\ge \frac{\sqrt{n}(t_\alpha-\theta)}{\sigma}\right) = \Phi\left(\frac{\sqrt{n}(\theta-\theta_0)}{\sigma} - z_{1-\alpha}\right), $$ where $\Phi$ is the standard normal cdf.
- Intuition / mental model.
- $W(\theta)$ is smallest near the boundary between null and alternative (close to $\alpha$ in the normal example) and increases as $\theta$ moves farther into the alternative region, ideally approaching 1.
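The shape described above can be tabulated from the closed-form power function of the normal example. The sketch below assumes made-up values for $\theta_0$, $\sigma$, $n$ and $\alpha$; the function name `power` and the grid are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def power(theta, theta0, sigma, n, alpha):
    """W(theta) = Phi(sqrt(n) * (theta - theta0) / sigma - z_{1-alpha})."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(sqrt(n) * (theta - theta0) / sigma - z)


theta0, sigma, n, alpha = 0.0, 1.0, 25, 0.05  # illustrative values only
grid = [theta0 + 0.1 * k for k in range(11)]  # theta0, theta0 + 0.1, ..., theta0 + 1.0
curve = [power(theta, theta0, sigma, n, alpha) for theta in grid]

for theta, w in zip(grid, curve):
    print(f"theta = {theta:.1f}  W(theta) = {w:.4f}")
```

The first grid point is $\theta_0$ itself, where the formula returns $\alpha$; the remaining values rise monotonically toward 1.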
- Unbiased test.
- A test is called unbiased if its power under any alternative is at least as large as its significance level.
- Formal condition: if $$ W(\theta)\ge \alpha\quad \text{for all }\theta\in\Theta_1, $$ then the test is unbiased.
- Equivalent statements (from lecture):
- The probability of not committing Type II error is at least $\alpha$ for all $\theta\in\Theta_1$.
- The probability of falling in the critical region $R_1$ is at least $\alpha$ whenever $H_1$ is true.
- The probability of correct rejection (when $H_1$ is true) is at least as large as the probability of wrong rejection (when $H_0$ is true).
- Consistent test.
- A test is consistent if, for any fixed alternative $\theta\in\Theta_1$, the power tends to 1 as the sample size $n\to\infty$: $$ W(\theta)\to 1\quad (n\to\infty). $$
- Intuition / mental model.
- Unbiasedness ensures the test is not systematically “weak” under the alternative relative to its false-alarm rate.
- Consistency ensures that with enough data, the test almost certainly detects any fixed difference from the null.
- Given $\xi_1,\dots,\xi_n\sim N(\theta,\sigma^2)$, testing $H_0:\theta=\theta_0$ vs $H_1:\theta>\theta_0$:
- Test statistic: $\bar{\xi}_n$.
- Standardization under $H_0$: $$ Z = \frac{\sqrt{n}(\bar{\xi}_n-\theta_0)}{\sigma}\sim N(0,1). $$
- Choose critical value $$ t_\alpha = \theta_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}} $$ so that $$ P_{\theta_0}(\bar{\xi}_n\ge t_\alpha) = \alpha. $$
- When to use it.
- In one-sided normal-approximation tests for means with known variance, to set the decision threshold corresponding to a desired significance level.
- For alternative $\theta_1>\theta_0$:
- Type II error probability: $$ \beta(\theta_1) = P_{\theta_1}(\bar{\xi}_n< t_\alpha) = P\left(Z < z_{1-\alpha} - \frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma}\right), $$ with $Z\sim N(0,1)$.
- Power: $$ W(\theta_1) = 1-\beta(\theta_1) = \Phi\left(\frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma} - z_{1-\alpha}\right). $$
- When to use it.
- To calculate the probability of correctly rejecting the null hypothesis under specific alternative values and to understand how sample size and effect size affect power.
- Setup.
- $\xi_1,\dots,\xi_n\sim N(\theta,\sigma^2)$, $\sigma^2$ known.
- Test $H_0:\theta=\theta_0$ vs $H_1:\theta=\theta_1$ with $\theta_1>\theta_0$.
- Test statistic: $\bar{\xi}_n$. Critical region: $R_1=[t_\alpha,\infty)$.
- Step 1: choose $t_\alpha$ via Type I error condition.
- Under $H_0$: $$ \bar{\xi}_n\sim N(\theta_0,\sigma^2/n),\quad Z=\frac{\sqrt{n}(\bar{\xi}_n-\theta_0)}{\sigma}\sim N(0,1). $$
- Require $P_{\theta_0}(\bar{\xi}_n\ge t_\alpha)=\alpha$, so $$ \alpha = P\left(Z\ge \frac{\sqrt{n}(t_\alpha-\theta_0)}{\sigma}\right) = P(Z\ge z_{1-\alpha}), $$ leading to $$ t_\alpha = \theta_0 + z_{1-\alpha}\frac{\sigma}{\sqrt{n}}. $$
- Step 2: compute $\beta(\theta_1)$ for the simple alternative.
- Under $H_1$: $\bar{\xi}_n\sim N(\theta_1,\sigma^2/n)$.
- Type II error probability: $$ \beta(\theta_1) = P_{\theta_1}(\bar{\xi}_n<t_\alpha) = P\left(\frac{\sqrt{n}(\bar{\xi}_n-\theta_1)}{\sigma} < \frac{\sqrt{n}(t_\alpha-\theta_1)}{\sigma}\right). $$
- Since the standardized variable is $N(0,1)$ and $t_\alpha = \theta_0 + z_{1-\alpha}\sigma/\sqrt{n}$, $$ \beta(\theta_1) = P\left(Z< z_{1-\alpha} - \frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma}\right), $$ and $W(\theta_1)=1-\beta(\theta_1)$.
- Check your intuition.
- For fixed $\theta_1-\theta_0>0$, as $n$ increases, $\frac{\sqrt{n}(\theta_1-\theta_0)}{\sigma}$ grows, $\beta(\theta_1)$ shrinks, and $W(\theta_1)$ approaches 1: with enough data, the test detects the difference with probability close to 1.
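The two steps of this worked example can be cross-checked by simulation. The sketch below is a Monte Carlo check, not part of the lecture: it draws normal samples, applies the rule "reject if $\bar{\xi}_n\ge t_\alpha$", and compares the rejection frequencies with $\alpha$ and with the closed-form power. The function name `rejection_rate`, the seeds, the number of trials and all parameter values are assumptions made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean


def rejection_rate(theta, theta0, sigma, n, alpha, trials, seed):
    """Fraction of simulated samples whose mean lands in R_1 = [t_alpha, inf)."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    t_alpha = theta0 + NormalDist().inv_cdf(1.0 - alpha) * sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        sample_mean = fmean(rng.gauss(theta, sigma) for _ in range(n))
        if sample_mean >= t_alpha:
            rejections += 1
    return rejections / trials


# Made-up setting: theta0 = 0, theta1 = 0.5, sigma = 1, n = 25, alpha = 0.05.
theta0, theta1, sigma, n, alpha = 0.0, 0.5, 1.0, 25, 0.05

sim_alpha = rejection_rate(theta0, theta0, sigma, n, alpha, trials=20_000, seed=1)
sim_power = rejection_rate(theta1, theta0, sigma, n, alpha, trials=20_000, seed=2)

# Closed-form power W(theta1) = Phi(sqrt(n) * (theta1 - theta0) / sigma - z_{1-alpha}).
z = NormalDist().inv_cdf(1.0 - alpha)
exact_power = NormalDist().cdf(sqrt(n) * (theta1 - theta0) / sigma - z)

print(f"simulated Type I error: {sim_alpha:.4f} (target {alpha})")
print(f"simulated power:        {sim_power:.4f} (formula {exact_power:.4f})")
```

The simulated frequencies should agree with the formulas only up to Monte Carlo noise, which shrinks as the number of trials grows.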
- Setup.
- Same as above but with composite alternative $H_1:\theta>\theta_0$.
- Critical region: $R_1=[t_\alpha,\infty)$ with $t_\alpha = \theta_0 + z_{1-\alpha}\sigma/\sqrt{n}$.
- Step 1: write power for general $\theta>\theta_0$.
- For any $\theta\in \Theta_1=(\theta_0,\infty)$: $$ W(\theta) = P_\theta(\bar{\xi}_n\ge t_\alpha) = P\left(\frac{\sqrt{n}(\bar{\xi}_n-\theta)}{\sigma}\ge \frac{\sqrt{n}(t_\alpha-\theta)}{\sigma}\right). $$
- With $Z\sim N(0,1)$: $$ W(\theta) = P\left(Z\ge z_{1-\alpha} - \frac{\sqrt{n}(\theta-\theta_0)}{\sigma}\right) = \Phi\left(\frac{\sqrt{n}(\theta-\theta_0)}{\sigma} - z_{1-\alpha}\right). $$
- Step 2: discuss unbiasedness and consistency.
- For $\theta=\theta_0$, the power equals $\alpha$ by construction.
- For $\theta>\theta_0$, the argument of $\Phi$ is larger than $-z_{1-\alpha}$; since $\Phi$ is increasing and $\Phi(-z_{1-\alpha})=\alpha$, it follows that $W(\theta)\ge\alpha$, confirming the test is unbiased.
- For any fixed $\theta>\theta_0$, as $n\to\infty$, $\frac{\sqrt{n}(\theta-\theta_0)}{\sigma}\to\infty$, so $W(\theta)\to 1$: the test is consistent.
- Check your intuition.
- An unbiased test gives at least as much probability of rejecting $H_0$ under any alternative as under $H_0$; a consistent test will almost certainly reject $H_0$ for any fixed true alternative parameter when enough data are available.
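Both properties can be checked numerically from the closed-form power function of the one-sided normal test. This is a sketch with assumed inputs: the grid of alternatives, the fixed alternative $\theta=0.1$, the sample sizes and the function name `power` are all illustrative choices. A finite grid can illustrate unbiasedness but does not prove it.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # N(0, 1)


def power(theta, theta0, sigma, n, alpha):
    """W(theta) = Phi(sqrt(n) * (theta - theta0) / sigma - z_{1-alpha})."""
    z = STD_NORMAL.inv_cdf(1.0 - alpha)
    return STD_NORMAL.cdf(sqrt(n) * (theta - theta0) / sigma - z)


theta0, sigma, alpha = 0.0, 1.0, 0.05  # illustrative values only

# Unbiasedness on a grid: W(theta) >= alpha for alternatives theta0 + 0.01, ..., theta0 + 3.
alternatives = [theta0 + 0.01 * k for k in range(1, 301)]
unbiased_on_grid = all(power(th, theta0, sigma, 10, alpha) >= alpha for th in alternatives)

# Consistency: for the fixed alternative theta = 0.1, power should climb toward 1 with n.
sample_sizes = [10, 100, 1_000, 10_000]
powers = [power(0.1, theta0, sigma, n, alpha) for n in sample_sizes]

print(f"W(theta) >= alpha on the grid: {unbiased_on_grid}")
for n, w in zip(sample_sizes, powers):
    print(f"n = {n:>6}  W(0.1) = {w:.4f}")
```

Even for an alternative as close to $\theta_0$ as $0.1$, the power is low at small $n$ but approaches 1 once $\sqrt{n}(\theta-\theta_0)/\sigma$ is well above $z_{1-\alpha}$.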
- Hypothesis testing compares a null $H_0$ and alternative $H_1$ using a test statistic and a critical region; decisions lead to either correct outcomes or Type I/II errors.
- The significance level $\alpha$ is an upper bound on the Type I error probability; in many tests, critical values are chosen so this bound is achieved exactly.
- For normal models with known variance, Type I and Type II error probabilities can be computed explicitly via standard normal quantiles.
- The power function $W(\theta) = P_\theta(\text{reject }H_0)$ quantifies how effective a test is at detecting deviations from $H_0$ across the alternative parameter space.
- A test is unbiased if its power is at least $\alpha$ for every parameter value in the alternative; this means it is never systematically weaker under the alternative than under the null.
- A test is consistent if, for any fixed alternative parameter, the power approaches 1 as the sample size grows, so that with large samples a true difference is detected with probability close to 1.
- The week 15 lecture uses the normal mean test to derive these ideas concretely, but the definitions and concepts generalize to a wide range of statistical testing problems.