From d3f8003eec2909cedb12e8f0a0f9939ce1a556c5 Mon Sep 17 00:00:00 2001
From: Lukas Wallrich
Date: Sun, 28 Jun 2026 17:06:16 +0100
Subject: [PATCH 1/3] Discuss irreducible disagreement about replication success (#4)

Extend the Hidden Moderator Account in the Discussion chapter with
Sarisoy's (2025) argument: because experimental control is limited and
possible moderators are vast, experts often cannot tell a genuine
replication failure from a violated ceteris paribus assumption (a form
of the experimenter's regress). Frame such disagreement as legitimate
normative judgement rather than poor practice, and note that
transparency about a replication's intended epistemic function
(reliability / validity / generalisation) makes it more tractable.

Add verified references.bib entry for Sarisoy (2025).
---
 discussion.qmd | 20 ++++++++++++++++++++
 references.bib | 11 +++++++++++
 2 files changed, 31 insertions(+)

diff --git a/discussion.qmd b/discussion.qmd
index 619070d..0650003 100644
--- a/discussion.qmd
+++ b/discussion.qmd
@@ -172,6 +172,26 @@ Whether that generalises to the setting of the original study needs to be
 considered in light of theory, and might be a legitimate matter of
 contention.
 
+These difficulties point to a partly irreducible source of disagreement
+about replication success. Because experimental control is limited and
+the space of possible moderators is vast, experts often cannot decide
+unambiguously whether divergent results reflect a genuine failure to
+replicate or a violation of the *ceteris paribus* assumption between the
+original study and the replication — a version of Collins' experimenter's
+regress that Sarisoy [-@Sarisoy2025] examines in detail. On this account,
+sustained disagreement about whether a replication succeeded can reflect
+legitimate normative judgements that researchers make when the evidence
+underdetermines the conclusion, rather than poor research practice.
+Sarisoy argues that such disagreements become more tractable once
+researchers are transparent about a replication's intended *epistemic
+function* — whether it is designed to test the reliability (stability) of
+an effect, to probe a specific validity threat, or to assess
+generalisation to a new context — because each function carries different
+standards for what would count as success. Declaring this purpose, in
+addition to pre-specifying which effects are of primary interest (see
+@sec-success-criteria), helps to recast debates as disagreements about
+what a replication was meant to show.
+
 ## The Role of Differences for the Interpretation of Findings {#sec-differences-and-interpretation}
 
 Each replication outcome should be evaluated in the light of its
diff --git a/references.bib b/references.bib
index cecd4e5..4265670 100644
--- a/references.bib
+++ b/references.bib
@@ -1359,6 +1359,17 @@ @article{RosenbergFinn2022
   doi = {10.1038/s41593-022-01110-9}
 }
 
+@article{Sarisoy2025,
+  author = {Sarisoy, J.},
+  title = {Why we disagree about the success of replications},
+  journal = {Journal for General Philosophy of Science},
+  volume = {56},
+  number = {3},
+  pages = {307-324},
+  year = {2025},
+  doi = {10.1007/s10838-024-09709-1}
+}
+
 @article{SchauerHedges2021,
   author = {Schauer, J. M. and Hedges, L. V.},
   title = {Reconsidering statistical methods for assessing replication},

From 52be9e16363585a0c132171b55497bab001134ea Mon Sep 17 00:00:00 2001
From: Lukas Wallrich
Date: Mon, 29 Jun 2026 17:43:23 +0200
Subject: [PATCH 2/3] Address codex review: soften disagreement claim (#4)

---
 discussion.qmd | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/discussion.qmd b/discussion.qmd
index 0650003..b3f74cc 100644
--- a/discussion.qmd
+++ b/discussion.qmd
@@ -181,7 +181,7 @@ original study and the replication — a version of Collins' experimenter's
 regress that Sarisoy [-@Sarisoy2025] examines in detail. On this account,
 sustained disagreement about whether a replication succeeded can reflect
 legitimate normative judgements that researchers make when the evidence
-underdetermines the conclusion, rather than poor research practice.
+underdetermines the conclusion, rather than necessarily indicating poor research practice.
 Sarisoy argues that such disagreements become more tractable once
 researchers are transparent about a replication's intended *epistemic
 function* — whether it is designed to test the reliability (stability) of

From 10b7bfa852982508defa75437efd708c9217425d Mon Sep 17 00:00:00 2001
From: Lukas Wallrich
Date: Mon, 29 Jun 2026 17:51:58 +0200
Subject: [PATCH 3/3] Sharpen 'failure to replicate'; commas for dashes (#4)

---
 discussion.qmd | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/discussion.qmd b/discussion.qmd
index b3f74cc..84268fa 100644
--- a/discussion.qmd
+++ b/discussion.qmd
@@ -176,17 +176,18 @@ These difficulties point to a partly irreducible source of disagreement
 about replication success. Because experimental control is limited and
 the space of possible moderators is vast, experts often cannot decide
 unambiguously whether divergent results reflect a genuine failure to
-replicate or a violation of the *ceteris paribus* assumption between the
-original study and the replication — a version of Collins' experimenter's
+replicate the original effect or a violation of the *ceteris paribus*
+assumption between the original study and the replication, a version of
+Collins' experimenter's
 regress that Sarisoy [-@Sarisoy2025] examines in detail. On this account,
 sustained disagreement about whether a replication succeeded can reflect
 legitimate normative judgements that researchers make when the evidence
 underdetermines the conclusion, rather than necessarily indicating poor research practice.
 Sarisoy argues that such disagreements become more tractable once
 researchers are transparent about a replication's intended *epistemic
-function* — whether it is designed to test the reliability (stability) of
+function*, whether it is designed to test the reliability (stability) of
 an effect, to probe a specific validity threat, or to assess
-generalisation to a new context — because each function carries different
+generalisation to a new context, because each function carries different
 standards for what would count as success. Declaring this purpose, in
 addition to pre-specifying which effects are of primary interest (see
 @sec-success-criteria), helps to recast debates as disagreements about