From 796a352e601c36132fa97c51623fffb1f2781b59 Mon Sep 17 00:00:00 2001
From: Lukas Wallrich <lukas.wallrich@gmail.com>
Date: Mon, 29 Jun 2026 00:12:29 +0200
Subject: [PATCH 1/2] Split robustness reproductions; suggest shuffled-data
 protocol (#18)

planning.qmd (Reproduction before Replication, renders as 4.2): give robustness reproductions their own paragraph, regrouping the existing numerical/recoding/preregistration-adherence sentences into the preceding paragraph. Content and citations unchanged.

execution_reproductions.qmd (Preregistration): add a recommendation to develop and register the analysis protocol on a masked dataset in which the outcome/condition is shuffled across cases, preserving marginal distributions and pipeline structure while withholding directional results. No citation added (no clearly matching Crossref-verified source).
---
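Illustration of the shuffled-data masking recommended in this patch. This is a minimal sketch and not part of the changed files. The helper name `mask_outcome`, the column names, and the toy records are hypothetical. It assumes independent cases, so a plain permutation of the outcome column is reasonable.

```python
import random
from collections import Counter


def mask_outcome(rows, outcome, seed=None):
    """Return a copy of `rows` with the `outcome` column permuted across cases.

    Hypothetical helper for illustration; assumes cases are independent.
    """
    rng = random.Random(seed)
    values = [row[outcome] for row in rows]
    rng.shuffle(values)  # breaks the link between each record and its result
    # Build new dicts so the intact data are never modified in place.
    return [{**row, outcome: value} for row, value in zip(rows, values)]


# Toy data (invented for this sketch); None stands in for a missing outcome.
rows = [
    {"id": 1, "age": 23, "score": 4.0},
    {"id": 2, "age": 31, "score": 2.5},
    {"id": 3, "age": 27, "score": None},
    {"id": 4, "age": 45, "score": 3.0},
    {"id": 5, "age": 38, "score": 5.0},
]

masked = mask_outcome(rows, "score", seed=2026)

# The marginal distribution of the outcome (including the number of
# missing values) is unchanged; only its pairing with cases is broken.
assert Counter(r["score"] for r in masked) == Counter(r["score"] for r in rows)
```

The pipeline can then be developed and registered against `masked`, and run on `rows` only once the protocol is final. Keeping the seed out of the analysis scripts is one way to avoid accidentally reconstructing the original pairing.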
 execution_reproductions.qmd | 12 ++++++++++++
 planning.qmd                | 24 +++++++++++++-----------
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/execution_reproductions.qmd b/execution_reproductions.qmd
index 6d590bb..f0c5ee8 100644
--- a/execution_reproductions.qmd
+++ b/execution_reproductions.qmd
@@ -104,6 +104,18 @@ planned analyses.
 
 While preregistration of a reproduction may seem paradoxical when data are already accessible, it remains valuable as a (personal) commitment device: specifying the analysis plan in advance keeps researchers accountable and helps produce robust reproductions. If the data could already have been accessed, some readers may discount the registration, yet we would recommend to still start with this.
 
+To develop and register the analysis protocol without being steered by the
+results, researchers can work on a masked version of the data in which the
+outcome (or condition) is randomly shuffled across cases, breaking the
+pairing between each record and its result. Such a masked dataset preserves
+the marginal distribution of each variable and the structural information
+needed to build and debug the analysis pipeline, for example variable types,
+value ranges, missing-data patterns, and the code paths that each step
+exercises, while withholding the directional relationships between predictors
+and outcomes. Finalising the protocol on this masked version, and only then
+applying it to the intact data, reduces the risk that analytic choices are
+consciously or unconsciously adjusted to produce a particular result.
+
 ## Deviations
 
 Reproductions may aim to test whether the precise same approach yields the
diff --git a/planning.qmd b/planning.qmd
index 55272e8..cc82974 100644
--- a/planning.qmd
+++ b/planning.qmd
@@ -83,20 +83,22 @@ special attention should be paid to processing steps such as exclusion
 of outliers, transformation of variables, and handling of missing data.
 However, in many research areas information on these steps is often
 incomplete [@FieldEtAl2019]; older research tends to be especially
-limited in terms of the methodological details they provide. In
-addition, we recommend testing the robustness of the original finding by
-making small alterations to the data processing and analyses procedure
-(*robustness reproductions*). For example, if the analyses were run for
-a subset of the data (e.g., participants aged 21 to 30 or without
-outliers ± 3 standard deviations), this subset can be changed (e.g.,
-participants aged 18 to 30 or without outliers ± 2 standard deviations).
-Here, the initial focus should be on choices that are not determined by
-the *theory* that is presented, though this can also be used to explore
-the generalisability of some aspects of theory. Finally, if the original
+limited in terms of the methodological details they provide. If the original
 study was preregistered and the original code is available, reproduction
-researchers can check whether the original analyses adhere to the
+researchers can also check whether the original analyses adhere to the
 preregistered analysis plan.
 
+Beyond reproducing the original analyses as reported, we recommend testing
+the robustness of the original finding by making small alterations to the
+data processing and analyses procedure (*robustness reproductions*). For
+example, if the analyses were run for a subset of the data (e.g.,
+participants aged 21 to 30 or without outliers ± 3 standard deviations),
+this subset can be changed (e.g., participants aged 18 to 30 or without
+outliers ± 2 standard deviations). Here, the initial focus should be on
+choices that are not determined by the *theory* that is presented, though
+this can also be used to explore the generalisability of some aspects of
+theory.
+
 If neither code nor data are available (or shared by the authors), no
 reproduction is possible. Researchers can still use automated tools to
 compare reported *p*-values with those that can be computed from test

From d91cf31ba701a84590e2883ac3f57240d34b694b Mon Sep 17 00:00:00 2001
From: Lukas Wallrich <lukas.wallrich@gmail.com>
Date: Mon, 29 Jun 2026 17:44:13 +0200
Subject: [PATCH 2/2] Address codex review: narrow masked-data protocol claims
 (#18)

---
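Illustration of shuffling that respects a grouping structure, as the narrowed wording asks for. This is a sketch under stated assumptions and not part of the changed files. The helper name `mask_outcome_within`, the column names, and the toy records are hypothetical. Permuting within groups is only one possible form of design-aware masking: it leaves between-group differences in the outcome visible, so whether it masks enough depends on the design and the hypothesis.

```python
import random
from collections import Counter, defaultdict


def mask_outcome_within(rows, outcome, group, seed=None):
    """Permute `outcome` across cases separately within each level of `group`.

    Hypothetical helper for illustration; group membership is left untouched.
    """
    rng = random.Random(seed)
    indices_by_group = defaultdict(list)
    for i, row in enumerate(rows):
        indices_by_group[row[group]].append(i)

    masked = [dict(row) for row in rows]  # copies, so the intact data stay as-is
    for indices in indices_by_group.values():
        values = [rows[i][outcome] for i in indices]
        rng.shuffle(values)  # values only move between cases of the same group
        for i, value in zip(indices, values):
            masked[i][outcome] = value
    return masked


# Toy clustered data (invented for this sketch): pupils nested in schools.
rows = [
    {"id": 1, "school": "A", "score": 4.0},
    {"id": 2, "school": "A", "score": 2.5},
    {"id": 3, "school": "A", "score": 3.5},
    {"id": 4, "school": "B", "score": 1.0},
    {"id": 5, "school": "B", "score": 5.0},
    {"id": 6, "school": "B", "score": None},
]

masked = mask_outcome_within(rows, "score", "school", seed=18)

# Within each school, the set of outcome values (and missingness) is unchanged.
for school in ("A", "B"):
    before = Counter(r["score"] for r in rows if r["school"] == school)
    after = Counter(r["score"] for r in masked if r["school"] == school)
    assert before == after
```

For paired or longitudinal data, other units of permutation (for example whole clusters or whole trajectories) may be more appropriate than the within-group shuffle shown here.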
 execution_reproductions.qmd | 22 +++++++++++++---------
 1 file changed, 13 insertions(+), 9 deletions(-)

diff --git a/execution_reproductions.qmd b/execution_reproductions.qmd
index f0c5ee8..00bb7e5 100644
--- a/execution_reproductions.qmd
+++ b/execution_reproductions.qmd
@@ -106,15 +106,19 @@ While preregistration of a reproduction may seem paradoxical when data are alrea
 
 To develop and register the analysis protocol without being steered by the
 results, researchers can work on a masked version of the data in which the
-outcome (or condition) is randomly shuffled across cases, breaking the
-pairing between each record and its result. Such a masked dataset preserves
-the marginal distribution of each variable and the structural information
-needed to build and debug the analysis pipeline, for example variable types,
-value ranges, missing-data patterns, and the code paths that each step
-exercises, while withholding the directional relationships between predictors
-and outcomes. Finalising the protocol on this masked version, and only then
-applying it to the intact data, reduces the risk that analytic choices are
-consciously or unconsciously adjusted to produce a particular result.
+outcome column is randomly shuffled across cases, breaking the link between
+each record and its result. Shuffling an experimental condition or treatment
+label is only appropriate with design-aware masking. Such a masked dataset
+preserves the marginal distributions of the shuffled and unchanged variables
+and often retains enough structural information to build and debug much of
+the analysis pipeline, for example variable types, value ranges, missing-data
+patterns, and the code paths that each step exercises, while withholding the
+directional relationships between predictors and outcomes. For clustered,
+paired, longitudinal, blocked, or stratified designs, any shuffling should
+respect the relevant design structure. Finalising the protocol on this masked
+version, and only then applying it to the intact data, can reduce the risk
+that analytic choices are consciously or unconsciously adjusted to produce a
+particular result.
 
 ## Deviations