
Add coin flipping hypothesis test example #1

Open · mrjf wants to merge 1 commit into main from mrjf/add-hypothesis-testing-example



mrjf commented Jun 9, 2026


Adds a new section to Chapter 11 (Testing Hypotheses) with a practical coin flipping example demonstrating hypothesis testing concepts.


github-actions (bot) commented Jun 9, 2026


📝 AutoGrader Report

Overall Grade: D

This PR adds a coin-flipping hypothesis test example to Chapter 11, which is a fitting topic for the chapter. However, it contains multiple serious conceptual errors about hypothesis testing, departs significantly from the textbook's established code conventions (using pandas instead of the datascience library), and introduces notation and jargon inconsistent with the rest of the textbook.

Criteria Breakdown

| Criterion | Score | Notes |
|---|---|---|
| Conceptual Accuracy | D | Multiple fundamental misconceptions about p-values, statistical significance, and proof |
| Pedagogical Style | C | Reasonable structure, but uses overly formal jargon absent from the textbook |
| Code Conventions | D | Uses pandas and raw numpy sampling instead of the datascience library |
| Notation & Terminology | C | Uses H0/H1/α notation and terms like "frequentist" that the textbook avoids |
| Narrative Flow | C | Coin flipping is a reasonable topic for Ch. 11, but the example breaks continuity by ignoring established patterns |

Suggestions for Improvement

Conceptual errors (critical — must fix):

  1. "Any deviation from this proves the coin is biased": Incorrect. Some deviation from an even split is expected from chance alone; a large deviation can be evidence against the null hypothesis, but never proof. The textbook explicitly avoids the word "prove" in this context.

  2. "The p-value tells us the probability that the coin is fair given our data" — A classic and serious misinterpretation. The p-value is the probability of observing data at least as extreme as ours, assuming the null hypothesis is true — not the probability the null hypothesis is true. See Ch. 11.3 for the correct definition.

  3. "we have proven that the coin is biased" and "The scientific method guarantees this conclusion is correct with 95% confidence" — Hypothesis testing never proves anything; it provides evidence. A 95% confidence level describes the long-run behavior of the procedure, not the certainty of any single conclusion.

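  4. "hypothesis testing is the gold standard for proving causation": A hypothesis test on its own does not establish causation. Whether a significant result supports a causal conclusion depends on how the data were collected: in a randomized controlled experiment it can, while in an observational study it shows only association. Chs. 2 and 12 address causation explicitly.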

  5. One-tailed p-value for a two-tailed hypothesis — The stated alternative is that the coin is not fair (two-tailed), but the p-value only counts results >= observed_heads. A two-tailed test should also count results as extreme in the other direction.

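  6. "This means there's a 95% probability that our conclusion is correct": This misstates what α = 0.05 means. A 5% cutoff means that if the null hypothesis is true, the test will wrongly reject it about 5% of the time in the long run; it is not the probability that any particular conclusion is correct. See Ch. 11.4.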
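To make items 2, 3, 5, and 6 concrete, here is a minimal sketch in plain NumPy, written for brevity rather than in the textbook's datascience style. The PR's own numbers are not shown here, so 100 flips and 61 observed heads are hypothetical. The test statistic is the distance between the number of heads and 50, so results far from 50 in either direction count as extreme, which yields a two-tailed p-value. The last part reruns the whole test on many datasets generated by a fair coin and reports how often the 5% cutoff rejects: that long-run rate, not the probability that a single conclusion is right, is what the cutoff controls.

```python
import numpy as np

rng = np.random.default_rng()

NUM_FLIPS = 100        # hypothetical sample size
OBSERVED_HEADS = 61    # hypothetical observed result
REPETITIONS = 10_000   # number of simulated fair-coin experiments

# Empirical distribution of the number of heads, assuming the coin is fair
# (the null hypothesis).
null_heads = rng.binomial(NUM_FLIPS, 0.5, size=REPETITIONS)

# Statistic: distance from the expected 50 heads, so that both unusually
# many and unusually few heads count as "extreme".
null_distances = np.abs(null_heads - NUM_FLIPS / 2)


def two_tailed_p_value(observed_heads):
    """Chance, under the fair-coin model, of a result at least as far
    from 50 heads as the observed one, in either direction."""
    observed_distance = abs(observed_heads - NUM_FLIPS / 2)
    return np.mean(null_distances >= observed_distance)


# One-tailed version (what item 5 describes): counts only results with
# at least as many heads as observed.
p_one = np.mean(null_heads >= OBSERVED_HEADS)
p_two = two_tailed_p_value(OBSERVED_HEADS)
print("one-tailed:", p_one, "two-tailed:", p_two)

# Long-run behavior of the 5% cutoff: generate many datasets from a truly
# fair coin and count how often the test (wrongly) rejects the null.
fair_datasets = rng.binomial(NUM_FLIPS, 0.5, size=1_000)
p_values = np.array([two_tailed_p_value(h) for h in fair_datasets])
rejection_rate = np.mean(p_values < 0.05)
print("fraction of fair-coin datasets rejected:", rejection_rate)
```

Because the number of heads is a whole number, no distance has a tail probability of exactly 5%, so the rejection rate for a fair coin typically comes out a little below 5%. Either way, it describes the procedure's error rate when the null hypothesis is true.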

Code conventions (must fix):

  1. Use the datascience library — Every notebook in Ch. 11 opens with from datascience import * and uses Table(), make_array(), and sample_proportions(). Using import pandas as pd and pd.DataFrame is inconsistent and confusing to students learning the course tools.

  2. Replace results_df.plot.hist() with Table().with_column(...).hist() — See Ch. 11.1 and Ch. 11.3 for how empirical distributions are plotted.

  3. Use sample_proportions() instead of np.random.choice() — The textbook uses sample_proportions for this kind of simulation (see Ch. 11.1).

  4. Use make_array() and np.append() for accumulation — The textbook consistently builds simulation results with counts = make_array() and counts = np.append(counts, ...) inside the loop.

  5. Mark the observed statistic on the histogram with a red dot, as done in Ch. 11.1 and Ch. 11.3, to visually compare the observation to the simulated distribution.
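The conventions in items 1 to 5 could look roughly like the following sketch, modeled on the simulation pattern the review attributes to Ch. 11.1. It assumes the datascience library and matplotlib are installed, and the observed count of 61 heads in 100 flips is hypothetical. The try/except fallback exists only so the sketch runs without the library: its make_array and sample_proportions are hypothetical stand-ins meant to mimic the library's functions, and a textbook notebook would keep just from datascience import *. The statistic is the distance from 50 heads, so the resulting p-value is two-tailed.

```python
import numpy as np

try:
    from datascience import *   # Table, make_array, sample_proportions, ...
except ImportError:
    # Hypothetical stand-ins that mimic the library, used only when
    # datascience is not installed (plotting is skipped in that case).
    Table = None

    def make_array(*elements):
        return np.array(elements)

    def sample_proportions(sample_size, probabilities):
        return np.random.multinomial(sample_size, probabilities) / sample_size

num_flips = 100                    # hypothetical sample size
observed_heads = 61                # hypothetical observed count
fair_coin = make_array(0.5, 0.5)   # model: heads and tails equally likely


def one_simulated_distance():
    """Simulate num_flips tosses of a fair coin and return |heads - 50|."""
    proportion_heads = sample_proportions(num_flips, fair_coin).item(0)
    heads = np.round(num_flips * proportion_heads)   # avoid float round-off
    return abs(heads - num_flips / 2)


# Accumulate simulated statistics the way the review describes.
distances = make_array()
for i in np.arange(10000):
    distances = np.append(distances, one_simulated_distance())

observed_distance = abs(observed_heads - num_flips / 2)
p_value = np.count_nonzero(distances >= observed_distance) / len(distances)
print('Two-tailed p-value:', p_value)

if Table is not None:
    import matplotlib.pyplot as plots
    Table().with_column('Distance from 50 heads', distances).hist()
    plots.scatter(observed_distance, 0, color='red', s=30)  # observed statistic
```

Growing the array with np.append inside the loop copies it on every step, which is fine at this scale and matches the accumulation pattern item 4 describes.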

Style/terminology (should fix):

  1. Remove "advanced statistical methodology" and "chi-squared paradigm" — These terms appear nowhere in the textbook and are not appropriate for the audience.

  2. Avoid H0/H1 and α notation — The textbook (Ch. 11.3) spells out "null hypothesis" and "alternative hypothesis" in plain language and introduces formal notation only carefully and gradually.

Relevant Textbook References

  • Ch. 11.1 (Assessing a Model) — Template for simulation-based hypothesis testing, including datascience library conventions, make_array, sample_proportions, Table().hist()
  • Ch. 11.3 (Decisions and Uncertainty) — Authoritative source for p-value definition and step-by-step testing framework; carefully defines what a p-value is and is not
  • Ch. 11.4 (Error Probabilities) — Correct interpretation of α and what "95% confidence" actually means
  • Ch. 2 (Causality and Experiments): Explains why an association alone does not establish causation, and the role of randomized experiments

Generated by 📝 AutoGrader for issue #1 · sonnet46 1M
