From 1971dfba3b01690c9919afc72e588db847fbbf1e Mon Sep 17 00:00:00 2001
From: Lukas Wallrich <lukas.wallrich@gmail.com>
Date: Mon, 29 Jun 2026 00:28:14 +0200
Subject: [PATCH 1/2] Make checklist items concrete and actionable (#23)

Reworks the appendix checklist (R1) from ten high-level recommendations
into 47 concrete, checkable items. Each original recommendation is kept
as a subsection heading, verbatim apart from two light edits (the slash
in "reproduction/replication" is spelled out and one parenthetical is
dropped), preserving the existing topical groupings and order. Beneath
each heading are specific past-tense items that a researcher can tick
off. Items are grounded in the handbook's own chapters; cross-references
use plain-text chapter/appendix names so the single-file render resolves
cleanly. The #sec-checklist label (referenced from the conclusion) is
preserved.
---
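Reviewer note (not part of the commit): the items under "Reproduce before
you replicate" on setting a seed and recording software versions could be
illustrated roughly as follows. This is a minimal Python sketch, not text
from the handbook. The function names `seeded_draws` and
`environment_record` are hypothetical, and the sketch uses only the
standard library rather than the `session_info` package named in the
checklist.

```python
import platform
import random
import sys
from importlib import metadata


def seeded_draws(seed, n=3):
    """Return n pseudo-random draws from a generator fixed by `seed`."""
    # A local generator keeps the global random state untouched.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def environment_record(packages=()):
    """Collect interpreter, platform, and package versions for a report."""
    record = {
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            record["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record the gap instead of failing, so the log stays complete.
            record["packages"][name] = "not installed"
    return record
```

Calling `seeded_draws` twice with the same seed returns identical lists,
which is the property a numerical reproduction relies on, and
`environment_record(["numpy"])` yields a dictionary that can be saved
alongside the results.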
 appendix_checklist.qmd | 161 ++++++++++++++++++++++++++++++++++++++---
 1 file changed, 149 insertions(+), 12 deletions(-)

diff --git a/appendix_checklist.qmd b/appendix_checklist.qmd
index ce2a90f..8d03ef2 100644
--- a/appendix_checklist.qmd
+++ b/appendix_checklist.qmd
@@ -2,18 +2,155 @@
 title: "Reproduction and Replication Checklist"
 ---
 
-The checklist below summarises the handbook's core recommendations for
-planning, conducting, and reporting reproductions and replications.
+The checklist below operationalises the handbook's core recommendations
+into concrete steps for planning, conducting, and reporting reproductions
+and replications. The groupings follow the stages of the reproduction and
+replication process, and each item is phrased so that it can be ticked off
+once it has been addressed. Not every item applies to every project (for
+example, reproductions and replications have different requirements), so
+use the list as a prompt rather than a set of universal requirements.
 
 ## Checklist {#sec-checklist}
 
-- [ ] Justify choice of target study and claims
-- [ ] Choose a reproduction/replication type that aligns with your aims
-- [ ] Gather and review all relevant materials
-- [ ] Reproduce before you replicate, where possible
-- [ ] Discuss all updates, changes, and extensions of the original materials (as close as possible, as updated as necessary)
-- [ ] Preregister your study and analysis plan
-- [ ] Predetermine conditions for success and failure
-- [ ] Use balanced language when describing the outcomes
-- [ ] Carefully evaluate outcomes and potential reasons for divergences
-- [ ] Report your research comprehensively and openly accessible
+### Justify choice of target study and claims
+
+- [ ] Stated the goal of the project (e.g., assessing a finding's
+  reliability, building on it, or resolving doubts) and whether the target
+  was selected top-down (representing a field) or bottom-up (driven by a
+  specific study)
+- [ ] Identified the specific claim(s) and effect(s) that the study will
+  target, and justified why they matter
+- [ ] Documented the target study's value (e.g., citations, theoretical
+  relevance, societal or practical implications)
+- [ ] Assessed the uncertainty around the original claim (e.g., strength of
+  evidence, sample size, number and significance pattern of prior studies)
+- [ ] Searched for existing reproductions and replications (e.g., the FORRT
+  Replication Database, ReplicationWiki, the Institute for Replication
+  papers, the CODECHECK register)
+- [ ] Reviewed post-publication discussions of the target study (e.g.,
+  published comments, PubPeer, Altmetric, blog posts)
+- [ ] Considered potential researcher biases and disclosed any conflicts of
+  interest relevant to the choice of target
+- [ ] Confirmed feasibility before committing (data availability, achievable
+  sample size, resources, equipment, and expertise)
+
+### Choose a reproduction or replication type that aligns with your aims
+
+- [ ] Clarified whether the aim is to verify a method or analysis
+  (reproduction) or to test a finding or theory anew (replication)
+- [ ] Selected a specific type that matches the aim (e.g., computational,
+  recoding, or robustness reproduction, multiverse analysis, internal
+  replication, close replication, close replication with extension, or
+  conceptual replication), drawing on the types described in the
+  Understanding chapter
+- [ ] Assembled a team with the expertise and resources the chosen type
+  requires
+
+### Gather and review all relevant materials
+
+- [ ] Obtained the original report and any supplementary materials
+- [ ] Located the original data and analysis code, or requested a
+  replication package from the original authors or the journal's data
+  editor (templates for contacting authors are in the Templates appendix)
+- [ ] Respected the licences attached to any reused data and materials, and
+  sought approval where a licence does not permit reuse or alteration
+- [ ] Checked whether shared data meet the FAIR criteria (findable,
+  accessible, interoperable, reusable)
+- [ ] Reviewed the original study protocol for the detail needed to
+  reproduce or replicate it, and noted what is missing
+
+### Reproduce before you replicate, where possible
+
+- [ ] Attempted a numerical reproduction with the original data and code,
+  setting a seed where analyses rely on random numbers
+- [ ] Where no code is available, reconstructed the analyses from the report
+  (recoding reproduction), paying attention to exclusions, transformations,
+  and the handling of missing data
+- [ ] Ran robustness reproductions by varying analytical choices that are
+  not dictated by the theory (e.g., outlier thresholds, age ranges,
+  subsetting)
+- [ ] Where neither data nor code is available, screened the original for
+  statistical inconsistencies using automated tools (e.g., statcheck,
+  papercheck)
+- [ ] Where the original was preregistered, checked whether the reported
+  analyses followed the preregistered plan
+- [ ] Recorded the software and package versions used (e.g., R's
+  `sessionInfo()` or Python's `session_info.show()`) and reported
+  reproducibility indicators comparing original and reproduction results
+
+### Discuss all updates, changes, and extensions of the original materials
+
+- [ ] Stayed as close as possible to the original study, deviating only
+  where necessary
+- [ ] Documented and justified every deviation (e.g., reconstructing
+  unspecified materials, updating deprecated stimuli, translating
+  materials, changing the sample, or updating methods and apparatus)
+- [ ] Where materials were translated or effect sizes will be compared
+  across samples, tested measurement invariance rather than assuming it
+- [ ] Added controls, manipulation checks, or attention checks where they
+  help to interpret the results
+- [ ] Piloted new materials or procedures to check that instructions are
+  clear and that all data are recorded, without using pilots to estimate
+  effect sizes
+- [ ] Where reporting was incomplete, consulted the original authors on the
+  protocol before collecting data
+
+### Preregister your study and analysis plan
+
+- [ ] Preregistered the hypotheses, design, and a full analysis plan
+  (ideally with analysis code tested on simulated or pilot data) before
+  collecting or accessing the data
+- [ ] Justified the sample size rather than simply matching the original,
+  using an approach suited to replication (e.g., the small telescopes
+  approach, equivalence testing against a smallest effect size of interest,
+  a Bayesian design method, or meta-analytic estimates)
+- [ ] Considered publishing as a Registered Report to guard against
+  publication bias
+- [ ] Planned how amendments and deviations from the preregistration will be
+  documented, with version history preserved
+
+### Predetermine conditions for success and failure
+
+- [ ] Specified which effects are of primary interest and how results will
+  be aggregated, noting that requiring several effects to agree reduces
+  statistical power
+- [ ] Defined the criterion that will distinguish replication success from
+  failure (e.g., effect-size comparison, significance, or equivalence), as
+  discussed in the Discussion chapter
+- [ ] Specified any sequential or gating conditions (e.g., a manipulation
+  check that must pass before replicability is evaluated)
+
+### Use balanced language when describing the outcomes
+
+- [ ] Used descriptive and impersonal language, avoiding overstated claims
+  of "success" or "failure"
+- [ ] Took the historical context of the original study into account when
+  commenting on its reporting, data sharing, or brevity
+- [ ] Where results diverged, invited a comment from the original authors (a
+  template is in the Templates appendix)
+
+### Carefully evaluate outcomes and potential reasons for divergences
+
+- [ ] Compared original and replication (or reproduction) results using the
+  predefined success criteria
+- [ ] Examined the raw data for distributional anomalies and careless
+  responding, and reported the results of control or attention checks
+- [ ] Where results diverged, discussed threats to statistical conclusion,
+  internal, construct, and external validity in both the original and the
+  new study
+- [ ] Evaluated each outcome in light of the closeness between the studies
+  and the relevant theory (inductive and deductive interpretations)
+- [ ] Moved beyond a general appeal to hidden moderators when interpreting a
+  failure to replicate
+
+### Report your research comprehensively and openly accessible
+
+- [ ] Reported the methods and results comprehensively, following relevant
+  standards (e.g., the TOP guidelines and, in psychology, the JARS
+  reporting standards)
+- [ ] Shared the preregistration, analysis plan, analysis code, materials,
+  and data (within ethical and legal limits) under an open licence
+- [ ] Published the report so that it is openly accessible (e.g., as a
+  preprint) and citable
+- [ ] Made the findings discoverable for others (e.g., an entry in the FORRT
+  Replication Database or a comment on the original study via PubPeer)

From b97730a52550d20f6aea879befd211003cf1356d Mon Sep 17 00:00:00 2001
From: Lukas Wallrich <lukas.wallrich@gmail.com>
Date: Mon, 29 Jun 2026 17:45:25 +0200
Subject: [PATCH 2/2] Address codex review: refine checklist items (#23)

---
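Reviewer note (not part of the commit): the revised wording on
conjunctive criteria can be made concrete with a short calculation. The
sketch below assumes the tests are statistically independent, which is an
illustrative simplification (positively correlated outcomes would lose
less power). Under that assumption, the probability that every effect
meets its threshold is the product of the individual powers. The function
name `joint_power` is hypothetical.

```python
import math


def joint_power(powers):
    """Probability that all tests succeed, assuming independent tests."""
    if not all(0.0 <= p <= 1.0 for p in powers):
        raise ValueError("each power must lie in [0, 1]")
    # Independence lets the individual success probabilities multiply.
    return math.prod(powers)
```

For example, requiring three effects that each have 80% power gives
`joint_power([0.8, 0.8, 0.8])`, which is 0.8 cubed, or about 0.51.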
 appendix_checklist.qmd | 18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/appendix_checklist.qmd b/appendix_checklist.qmd
index 8d03ef2..86e17d4 100644
--- a/appendix_checklist.qmd
+++ b/appendix_checklist.qmd
@@ -36,8 +36,9 @@ use the list as a prompt rather than a set of universal requirements.
 
 ### Choose a reproduction or replication type that aligns with your aims
 
-- [ ] Clarified whether the aim is to verify a method or analysis
-  (reproduction) or to test a finding or theory anew (replication)
+- [ ] Clarified whether the aim is to reproduce results using the original
+  data, code, materials, or analysis specification (reproduction) or to test
+  a finding or theory anew (replication)
 - [ ] Selected a specific type that matches the aim (e.g., computational,
   recoding, or robustness reproduction, multiverse analysis, internal
   replication, close replication, close replication with extension, or
@@ -85,10 +86,11 @@ use the list as a prompt rather than a set of universal requirements.
 - [ ] Documented and justified every deviation (e.g., reconstructing
   unspecified materials, updating deprecated stimuli, translating
   materials, changing the sample, or updating methods and apparatus)
-- [ ] Where materials were translated or effect sizes will be compared
-  across samples, tested measurement invariance rather than assuming it
+- [ ] Where translated multi-item measures or comparable latent constructs
+  are used, tested measurement invariance where feasible
 - [ ] Added controls, manipulation checks, or attention checks where they
-  help to interpret the results
+  help to interpret the results and do not materially change the target
+  procedure
 - [ ] Piloted new materials or procedures to check that instructions are
   clear and that all data are recorded, without using pilots to estimate
   effect sizes
@@ -99,7 +101,7 @@ use the list as a prompt rather than a set of universal requirements.
 
 - [ ] Preregistered the hypotheses, design, and a full analysis plan
   (ideally with analysis code tested on simulated or pilot data) before
-  collecting or accessing the data
+  collecting new data or conducting outcome-relevant confirmatory analyses
 - [ ] Justified the sample size rather than simply matching the original,
   using an approach suited to replication (e.g., the small telescopes
   approach, equivalence testing against a smallest effect size of interest,
@@ -112,8 +114,8 @@ use the list as a prompt rather than a set of universal requirements.
 ### Predetermine conditions for success and failure
 
 - [ ] Specified which effects are of primary interest and how results will
-  be aggregated, noting that requiring several effects to agree reduces
-  statistical power
+  be aggregated, noting that conjunctive criteria requiring several
+  effects all to meet a threshold can reduce statistical power
 - [ ] Defined the criterion that will distinguish replication success from
   failure (e.g., effect-size comparison, significance, or equivalence), as
   discussed in the Discussion chapter