forrtproject · LukasWallrich · Jun 28, 2026 · Jun 29, 2026
diff --git a/execution_reproductions.qmd b/execution_reproductions.qmd
@@ -104,6 +104,22 @@ planned analyses.
 
 While preregistration of a reproduction may seem paradoxical when data are already accessible, it remains valuable as a (personal) commitment device: specifying the analysis plan in advance keeps researchers accountable and helps produce robust reproductions. If the data could already have been accessed, some readers may discount the registration, yet we would recommend to still start with this.
 
+To develop and register the analysis protocol without being steered by the
+results, researchers can work on a masked version of the data in which the
+outcome column is randomly shuffled across cases, breaking the link between
+each record and its result. Shuffling an experimental condition or treatment
+label is only appropriate with design-aware masking. Such a masked dataset
+preserves the marginal distribution of every variable, shuffled or not, and
+usually retains enough structural information to build and debug much of the
+analysis pipeline: variable types, value ranges, per-variable missingness,
+and the code paths that each step exercises. What it withholds are the
+associations between the shuffled outcome and the predictors. For clustered,
+paired, longitudinal, blocked, or stratified designs, any shuffling should
+respect the relevant design structure. Finalising the protocol on this masked
+version, and only then applying it to the intact data, can reduce the risk
+that analytic choices are consciously or unconsciously adjusted to produce a
+particular result.
+
 ## Deviations
 
 Reproductions may aim to test whether the precise same approach yields the

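A minimal sketch of the masking step described in the paragraph added to `execution_reproductions.qmd`, in Python with only the standard library. The function name `mask_outcome`, its parameters, and the toy columns (`id`, `site`, `age`, `score`) are hypothetical and not part of the PR. Permuting within a grouping variable is only one way to respect design structure; for paired or longitudinal data, permuting whole units may be the better choice.

```python
import random
from collections import defaultdict


def mask_outcome(rows, outcome, within=None, seed=None):
    """Return a copy of `rows` with the `outcome` values permuted across cases.

    rows    : list of dicts, one per case (hypothetical data layout)
    outcome : key of the column to shuffle
    within  : optional key; if given, values are permuted only among cases
              sharing the same value of this key (e.g. a block or stratum)
    seed    : optional seed so the masked dataset can be regenerated
    """
    rng = random.Random(seed)

    # Collect case indices per group; without `within`, all cases form one group.
    groups = defaultdict(list)
    for i, row in enumerate(rows):
        groups[row[within] if within is not None else None].append(i)

    # Work on copies so the intact data are never modified.
    masked = [dict(row) for row in rows]
    for indices in groups.values():
        values = [rows[i][outcome] for i in indices]
        rng.shuffle(values)
        for i, value in zip(indices, values):
            masked[i][outcome] = value
    return masked


# Toy data, invented purely for illustration.
data = [
    {"id": 1, "site": "A", "age": 23, "score": 4.0},
    {"id": 2, "site": "A", "age": 31, "score": 2.5},
    {"id": 3, "site": "A", "age": 27, "score": 3.5},
    {"id": 4, "site": "B", "age": 22, "score": 1.0},
    {"id": 5, "site": "B", "age": 29, "score": 5.0},
    {"id": 6, "site": "B", "age": 35, "score": 4.5},
]

masked = mask_outcome(data, "score", within="site", seed=2026)
```

Only the `score` column moves between cases; every other column and the set of scores within each site stay as they were, so the analysis code can be written and debugged against `masked` before it is run on `data`.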
diff --git a/planning.qmd b/planning.qmd
@@ -83,20 +83,22 @@ special attention should be paid to processing steps such as exclusion
 of outliers, transformation of variables, and handling of missing data.
 However, in many research areas information on these steps is often
 incomplete [@FieldEtAl2019]; older research tends to be especially
-limited in terms of the methodological details they provide. In
-addition, we recommend testing the robustness of the original finding by
-making small alterations to the data processing and analyses procedure
-(*robustness reproductions*). For example, if the analyses were run for
-a subset of the data (e.g., participants aged 21 to 30 or without
-outliers ± 3 standard deviations), this subset can be changed (e.g.,
-participants aged 18 to 30 or without outliers ± 2 standard deviations).
-Here, the initial focus should be on choices that are not determined by
-the *theory* that is presented, though this can also be used to explore
-the generalisability of some aspects of theory. Finally, if the original
+limited in terms of the methodological details they provide. If the original
 study was preregistered and the original code is available, reproduction
-researchers can check whether the original analyses adhere to the
+researchers can also check whether the original analyses adhere to the
 preregistered analysis plan.
 
+Beyond reproducing the original analyses as reported, we recommend testing
+the robustness of the original finding by making small alterations to the
+data processing and analysis procedures (*robustness reproductions*). For
+example, if the analyses were run for a subset of the data (e.g.,
+participants aged 21 to 30 or excluding outliers beyond ± 3 standard
+deviations), this subset can be changed (e.g., participants aged 18 to 30 or
+excluding outliers beyond ± 2 standard deviations). Here, the initial focus should be on
+choices that are not determined by the *theory* that is presented, though
+this can also be used to explore the generalisability of some aspects of
+theory.
+
 If neither code nor data are available (or shared by the authors), no
 reproduction is possible. Researchers can still use automated tools to
 compare reported *p*-values with those that can be computed from test