This PR adds a coin-flipping hypothesis test example to Chapter 11, which is a fitting topic for the chapter. However, it contains multiple serious conceptual errors about hypothesis testing, departs significantly from the textbook's established code conventions (using pandas instead of the datascience library), and introduces notation and jargon inconsistent with the rest of the textbook.
Criteria Breakdown
| Criterion | Score | Notes |
| --- | --- | --- |
| Conceptual Accuracy | D | Multiple fundamental misconceptions about p-values, statistical significance, and proof |
| Pedagogical Style | C | Reasonable structure but uses overly formal jargon absent from the textbook |
| Code Conventions | D | Uses pandas/numpy directly instead of the datascience library |
| Notation & Terminology | C | Uses H0/H1/α notation and terms like "frequentist" that the textbook avoids |
| Narrative Flow | C | Coin-flipping is a reasonable topic for Ch. 11, but breaks continuity by ignoring established patterns |
Suggestions for Improvement
Conceptual errors (critical — must fix):
"Any deviation from this proves the coin is biased" — Incorrect. Random variation is always expected; a deviation is only evidence against the null hypothesis, never proof. The textbook explicitly avoids the word "prove" in this context.
"The p-value tells us the probability that the coin is fair given our data" — A classic and serious misinterpretation. The p-value is the probability of observing data at least as extreme as ours, assuming the null hypothesis is true — not the probability the null hypothesis is true. See Ch. 11.3 for the correct definition.
"we have proven that the coin is biased" and "The scientific method guarantees this conclusion is correct with 95% confidence" — Hypothesis testing never proves anything; it provides evidence. A 95% confidence level describes the long-run behavior of the procedure, not the certainty of any single conclusion.
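"hypothesis testing is the gold standard for proving causation" — A hypothesis test assesses whether the data are consistent with the null hypothesis; on its own it shows at most an association. Whether a causal conclusion is justified depends on the study design, such as a randomized controlled experiment. Chs. 2 and 12 address causation explicitly.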
One-tailed p-value for a two-tailed hypothesis — The stated alternative is that the coin is not fair (two-tailed), but the p-value only counts results >= observed_heads. A two-tailed test should also count results that are at least as far from the expected number of heads in the other direction.
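"This means there's a 95% probability that our conclusion is correct" — Incorrect. A 5% cutoff (α = 0.05) means that when the null hypothesis is true, the test wrongly rejects it about 5% of the time; it says nothing about the probability that any single conclusion is correct. See Ch. 11.4.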
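To make the one-tailed versus two-tailed point concrete, here is a minimal sketch that computes both quantities exactly for a fair coin. It uses only the Python standard library rather than simulation, so it is a cross-check and not the textbook's method. The function names and the numbers (100 flips, 60 heads) are hypothetical and are not taken from the PR.

```python
from math import comb

def upper_tail_p_value(num_flips, observed_heads, p_heads=0.5):
    """Chance, under the null, of at least observed_heads heads (one direction only)."""
    return sum(
        comb(num_flips, k) * p_heads**k * (1 - p_heads)**(num_flips - k)
        for k in range(observed_heads, num_flips + 1)
    )

def two_sided_p_value(num_flips, observed_heads, p_heads=0.5):
    """Chance, under the null, of a head count at least as far from the
    expected count as the observed one, in either direction."""
    expected = num_flips * p_heads
    observed_distance = abs(observed_heads - expected)
    return sum(
        comb(num_flips, k) * p_heads**k * (1 - p_heads)**(num_flips - k)
        for k in range(num_flips + 1)
        if abs(k - expected) >= observed_distance
    )

# Hypothetical data: 60 heads in 100 flips.
one_sided = upper_tail_p_value(100, 60)
two_sided = two_sided_p_value(100, 60)
print("one-sided:", one_sided)
print("two-sided:", two_sided)
```

For a fair coin the distribution of head counts is symmetric, so the two-sided value is twice the one-sided value. Counting only results >= observed_heads therefore understates the p-value for a "not fair" alternative by half.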
Code conventions (must fix):
Use the datascience library — Every notebook in Ch. 11 opens with from datascience import * and uses Table(), make_array(), and sample_proportions(). Using import pandas as pd and pd.DataFrame is inconsistent and confusing to students learning the course tools.
Replace results_df.plot.hist() with Table().with_column(...).hist() — See Ch. 11.1 and Ch. 11.3 for how empirical distributions are plotted.
Use sample_proportions() instead of np.random.choice() — The textbook uses sample_proportions for this kind of simulation (see Ch. 11.1).
Use make_array() and np.append() for accumulation — The textbook consistently builds simulation results with counts = make_array() and counts = np.append(counts, ...) inside the loop.
Mark the observed statistic on the histogram with a red dot, as done in Ch. 11.1 and Ch. 11.3, to visually compare the observation to the simulated distribution.
Style/terminology (should fix):
Remove "advanced statistical methodology" and "chi-squared paradigm" — These terms appear nowhere in the textbook and are not appropriate for the audience.
Avoid H0/H1 and α notation — The textbook (Ch. 11.3) spells out "null hypothesis" and "alternative hypothesis" in plain language and introduces formal notation only carefully and gradually.
Relevant Textbook References
Ch. 11.1 (Assessing a Model) — Template for simulation-based hypothesis testing, including datascience library conventions, make_array, sample_proportions, Table().hist()
Ch. 11.3 (Decisions and Uncertainty) — Authoritative source for p-value definition and step-by-step testing framework; carefully defines what a p-value is and is not
Ch. 11.4 (Error Probabilities) — Correct interpretation of α and what "95% confidence" actually means
Ch. 2 (Causality and Experiments) — Explains why hypothesis testing does not establish causation
Adds a new section to Chapter 11 (Testing Hypotheses) with a practical coin flipping example demonstrating hypothesis testing concepts.