Zombie Trials in Clinical Research — Prevalence, Detection, and Impact on Evidence-Based Medicine

May 27, 2026

Reviewed 27 May 2026

An analysis of individual participant data from RCTs submitted to a single journal found that 44% contained false data and 26% were so flawed that they were classified as “zombie” trials. Such trials persist in the literature. In one field, removing them from meta-analyses reduced pooled treatment effect sizes by a median of 58%, and they drive clinical guideline recommendations that are rarely corrected, even after retraction.

TL;DR Zombie trials are randomized controlled trials with fabricated or fatally flawed data that remain in the scientific record. Prevalence estimates range from 26% of submissions with individual participant data at one journal to an estimated hundreds of thousands across all of medicine. In spinal pain research, removing untrustworthy trials from meta-analyses reduced pooled treatment effects by a median of 58% and turned 30% of statistically significant results non-significant; such trials also drive inappropriate clinical guidelines. Traditional risk-of-bias tools assume the data are genuine and so cannot detect them. New frameworks (RIGID, TRACT, INSPECT-SR) are being developed to screen for data authenticity, but adoption remains limited. A 2026 JAMA Network Open study found that just six “superretractors” coauthored 22% of 1,330 retracted RCTs, and that their trials took a median of nearly 14 years to be retracted.

This post is derived from a verified BioSkepsis research thread.

What Are Zombie Trials and Why Do They Matter for Biomedical Evidence?

The term “zombie trial” was introduced by John Carlisle, editor of Anaesthesia, to describe RCTs whose data are so problematic that the study should have been retracted but was not. These trials look alive on PubMed, carry DOIs, and get cited in Cochrane reviews, yet their data are fabricated, falsified, or so deeply flawed that no reliable conclusion can be drawn from them (PMID:33040331).

The distinction matters because zombie trials occupy the top of the evidence hierarchy. Randomized controlled trials are the gold standard for causal inference in medicine. When they are fabricated, the distortion propagates upward through meta-analyses, systematic reviews, and clinical practice guidelines—ultimately reaching the patient.

Ioannidis estimated that hundreds of thousands of zombie randomized trials circulate in the literature, arguing that the problem extends far beyond the anesthesiology field where Carlisle’s original analysis was conducted (PMID:33124075).

Prevalence of Fabricated and False Data in Randomized Controlled Trials

The most granular prevalence data come from Carlisle’s analysis of 526 RCTs submitted to Anaesthesia between February 2017 and March 2020. Among the 153 trials where individual participant data (IPD) were available, 44% contained false data and 26% met the threshold for “zombie” classification—meaning the flaws were so severe that the trial was deemed irrecoverable. Among the 373 trials without IPD access, only 2% were flagged as false and 1% as zombie, strongly suggesting that aggregated summary data conceal fabrication (PMID:33040331).

The geographic distribution was uneven. Among IPD-available trials, 90% of those from Egypt, 62% from India, 48% from China, 32% from South Korea, and 18% from Japan contained false data (PMID:33040331).

Self-reported survey data offer a complementary view, though likely only a lower bound. Fanelli’s meta-analysis of 18 surveys found that 1.97% of scientists admitted to fabricating, falsifying, or modifying data at least once, while 14.12% reported personal knowledge of a colleague committing such misconduct. Medical and pharmacological researchers reported misconduct more often than scientists in other fields (PMID:19478950).

Single-author-group prevalence—86% implausible baselines

An investigation of 35 RCTs from a single author group found that approximately 86% reported baseline characteristics that were mathematically unlikely to result from proper randomization. This was detected using the Carlisle method of testing whether the distribution of baseline variables across treatment arms is consistent with genuine random allocation (PMID:39628711).

Cochrane prenatal nutrition—25% of included RCTs removed

Formal trustworthiness assessments of Cochrane reviews for prenatal nutritional interventions led to the removal of 25% of included RCTs due to integrity concerns. The trials had passed conventional risk-of-bias appraisal before the integrity screen was applied (PMID:39628711).

Prevalence of false data by IPD availability (Carlisle, Anaesthesia 2017–2020)
Metric	With IPD (n=153)	Without IPD (n=373)
False data rate	44%	2%
Zombie classification rate	26%	1%
Odds ratio of detection, with vs without IPD (false data)	47 (95% CrI 17–144)
Odds ratio of detection, with vs without IPD (zombie)	79 (95% CrI 19–384)

How Zombie Trials Distort Meta-Analyses and Systematic Reviews

The downstream consequences of zombie data in evidence synthesis are severe and quantifiable. O’Connell et al. systematically traced a cohort of untrustworthy spinal-pain trials through 32 unique systematic reviews and 10 clinical practice guidelines. Across 55 meta-analytic comparisons, removing the untrustworthy “index trials” reduced treatment effect sizes by a median of 58% (IQR 40–74). In 85% of comparisons, the impact was classified as “high” (PMID:37453533).

The statistical significance of pooled estimates is also affected. In 12 of 40 comparisons (30%) that were originally statistically significant (p < 0.05), the effect became non-significant once the zombie trials were removed. Nine out of 10 reviews conducting narrative synthesis had drawn positive conclusions about the intervention; nine out of 10 CPGs had made positive treatment recommendations (PMID:37453533).

Ivermectin for COVID-19—fabricated studies drove mortality benefit

Several systematic reviews initially concluded that ivermectin significantly reduced COVID-19 mortality. The Cochrane ivermectin review team developed the Research Integrity Assessment (RIA) tool to screen included studies and found that more than 40% of eligible trials were excluded due to integrity concerns, including spreadsheets containing repeating blocks of duplicated participant data. The reported mortality benefit could not be sustained once these studies were removed (PMID:36054583).

Osteoporosis—15 years of guideline influence from fabricated fracture trials

Twelve osteoporosis RCTs from a single Japanese author group dominated fracture-prevention literature for over 15 years. A meta-analysis of vitamin K for hip fracture prevention initially reported a significant reduction (OR 0.23; 95% CI 0.12–0.47). A sensitivity analysis excluding the three affected trials shifted the result to a non-significant estimate with wide confidence intervals (OR 0.30; 95% CI 0.05–1.74). The trials were eventually retracted for fabrication and plagiarism, but not before influencing guidelines in the US, Scotland, and Japan (PMID:23346882).
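Why does the second interval count as non-significant? A 95% CI for an odds ratio is usually built on the log scale as ln(OR) ± 1.96·SE, so the standard error and a z-test can be recovered from the reported bounds. The sketch below assumes that construction (the source does not say how the intervals were computed) and uses only the figures quoted above; the function name is illustrative.

```python
from math import log
from statistics import NormalDist

def p_from_or_ci(odds_ratio, lower, upper, z_crit=1.96):
    """Recover SE, z and a two-sided p-value from an OR and its 95% CI.

    Assumes the CI was computed on the log scale as ln(OR) +/- z_crit * SE.
    """
    se = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(odds_ratio) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return se, z, p

# Vitamin K and hip fracture, as quoted in the text
for label, (or_, lo, hi) in {
    "all trials": (0.23, 0.12, 0.47),
    "affected trials excluded": (0.30, 0.05, 1.74),
}.items():
    se, z, p = p_from_or_ci(or_, lo, hi)
    print(f"{label}: SE={se:.2f}, z={z:.2f}, p={p:.3g}")
```

On these figures the full analysis gives p well below 0.001, while the exclusion analysis gives p of roughly 0.18. The point estimate barely moves (0.23 to 0.30); what changes is precision, with the recovered SE rising from about 0.35 to about 0.91, so the interval now spans 1.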

The Self-Correction Failure: Why Zombie Data Persists in Clinical Guidelines

The biomedical literature’s capacity for self-correction is, in practice, minimal. Kataoka et al. conducted a meta-epidemiological study of 587 articles (525 systematic reviews and 62 CPGs) that cited retracted RCTs. Among the 127 articles published after retraction that still included the retracted trial in their evidence synthesis, none corrected themselves. Among the 239 articles published before the cited RCT was retracted, only 5% of systematic reviews and 5% of CPGs later corrected or retracted their results (PMID:35779825).

The citation pattern compounds the problem. Approximately 90% of systematic reviews published after an RCT had been retracted cited the trial without caution or acknowledgment of its retracted status. The retracted data continues to function as if it were valid, feeding into new syntheses and guideline updates (PMID:35779825).

Investigation timelines are part of the problem. Misconduct investigations in fields such as women’s health have been shown to take a median of over 11 years. During this interval, the suspect data continues to accumulate citations and influence downstream evidence products (PMID:39628711).

Self-correction rates after RCT retraction (Kataoka et al., 2022)
Document type	Articles citing retracted RCTs	Corrected or retracted, among articles published before the RCT’s retraction
Systematic reviews	525	9 (5%)
Clinical practice guidelines	62	2 (5%)
SRs published after retraction that cited the trial without caution	~90%

Superretractors: Concentrated Sources of Fabricated RCT Data

A 2026 JAMA Network Open cohort study linked an openly available dataset of retracted RCTs (VITALITY) to lists of authors with the most retractions. Among 1,330 retracted RCTs, just 6 superretractors—concentrated in anesthesiology and endocrinology/metabolism—coauthored 290 (22%) of all retracted trials. Expanding the list to 18 career-long top-cited scientists with 10 or more retractions captured 327 trials (25%), with 84% of these overlapping with the superretractor set (PMID:41984475).

The lag between publication and retraction was striking. Superretractor-authored RCTs had a median publication-to-retraction interval of 5,111 days—approximately 14 years. During that window, these papers accrued a median of 21 citations (IQR 12–42), compared with 5 citations (IQR 1–19) for non-superretractor retractions. In multivariable regression, time to retraction was the only variable significantly and positively associated with total citations (PMID:41984475).
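The headline figures in this section can be re-derived from the reported counts. A quick arithmetic check, assuming 365.25 days per year for the day-to-year conversion:

```python
# Counts as reported for the VITALITY-linked cohort (PMID:41984475)
retracted_rcts = 1330
superretractor_rcts = 290
top_cited_rcts = 327
median_lag_days = 5111

print(f"Superretractor share: {superretractor_rcts / retracted_rcts:.1%}")
print(f"Top-cited share: {top_cited_rcts / retracted_rcts:.1%}")
print(f"Median lag: {median_lag_days / 365.25:.1f} years")  # 365.25 days/year assumed
```

This gives about 21.8%, 24.6%, and 14.0 years, consistent with the rounded 22%, 25%, and “approximately 14 years” in the text.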

The implication is that a small number of prolific fabricators can poison an entire evidence ecosystem. Because their trials stay in the record for years before retraction, they keep accumulating citations and feeding into the systematic reviews and guidelines that shape clinical practice.

Statistical Methods for Detecting Fabricated Data in Clinical Trials

Detection of zombie trials depends on a hierarchy of statistical forensics, from baseline plausibility checks through machine-learning anomaly detection to full individual-participant-data audit.

The Carlisle method. The most widely cited approach tests whether the distribution of baseline characteristics across treatment groups is mathematically consistent with genuine randomization. Key indicators include perfectly balanced groups across many variables, distributions that are too narrow or too wide, and means/SDs that are inconsistent with the reported sample size. In one cohort, this method flagged 86% of 35 RCTs from a single author group as having implausible baselines (PMID:39628711).
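A heavily simplified sketch of the idea, assuming continuous baseline variables reported as mean, SD, and n per arm. It computes a normal-approximation p-value for each baseline comparison and combines them with Stouffer’s method, a standard way of pooling p-values. This is an illustration only and may differ from Carlisle’s published procedure in detail, for example in how rounding and categorical variables are handled. Under genuine randomization the per-variable p-values should be roughly uniform, so a strongly positive combined score suggests arms that are more alike than chance allows, and a strongly negative one arms that are too different. Function names and the example table are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, SD 1

def baseline_p_value(m1, s1, n1, m2, s2, n2):
    """Two-sided p-value for a between-arm difference in one baseline mean.

    Uses a normal (z) approximation, reasonable only for moderately large arms.
    """
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z = (m1 - m2) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))

def combined_similarity_z(p_values, eps=1e-12):
    """Stouffer-style combination of per-variable p-values.

    Each p is mapped to inv_cdf(p): p near 1 (arms nearly identical) gives a
    large positive value, p near 0 (arms very different) a large negative one.
    Assumes independent variables, which real baseline variables rarely are.
    """
    clipped = [min(max(p, eps), 1 - eps) for p in p_values]  # inv_cdf needs 0 < p < 1
    return sum(STD_NORMAL.inv_cdf(p) for p in clipped) / sqrt(len(clipped))

# Hypothetical baseline table: (mean, SD, n) for arm A and arm B
baseline = {
    "age":    ((54.1, 9.8, 60), (54.2, 9.9, 60)),
    "weight": ((71.3, 12.0, 60), (71.4, 11.8, 60)),
    "sbp":    ((128.4, 15.2, 60), (128.5, 15.1, 60)),
    "dbp":    ((79.2, 10.1, 60), (79.3, 10.0, 60)),
    "bmi":    ((26.1, 4.2, 60), (26.2, 4.3, 60)),
    "hba1c":  ((6.9, 1.1, 60), (7.0, 1.0, 60)),
}

p_values = [baseline_p_value(*a, *b) for a, b in baseline.values()]
print(f"combined similarity Z = {combined_similarity_z(p_values):.2f}")
```

A single table with a handful of variables rarely proves anything; screening of this kind is most informative when applied across many trials from the same author group, where a consistent excess of near-identical arms stands out.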

Central statistical monitoring (CSM). CSM applies thousands of statistical tests across multiple variables to calculate a “data inconsistency score” for each clinical site in a multicentre trial. Pogue et al. developed and validated risk scores that discriminated between centres with and without fabricated data, achieving area-under-the-curve values of 0.90–0.95 (PMID:23283577).
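One ingredient often used in central statistical monitoring is a terminal-digit check: in a large sample of device-recorded integer measurements, each final digit 0 to 9 should appear roughly equally often, whereas invented numbers tend to show digit preference. The sketch below scores each site with a chi-square goodness-of-fit statistic against a uniform digit distribution. It illustrates a single check under that uniformity assumption and is not the Pogue et al. risk score; site names and readings are hypothetical. Real CSM systems typically compare each site against the other sites rather than against an ideal, because genuine hand-recorded data can also show digit preference.

```python
from collections import Counter

CHI2_CRIT_DF9_P05 = 16.919  # chi-square critical value, 9 df, alpha = 0.05

def terminal_digit_chi2(values):
    """Chi-square statistic for last digits (of the integer part) vs. uniform 0-9."""
    digits = Counter(abs(int(v)) % 10 for v in values)
    expected = len(values) / 10
    return sum((digits.get(d, 0) - expected) ** 2 / expected for d in range(10))

def flag_sites(site_values, crit=CHI2_CRIT_DF9_P05):
    """Return {site: (chi2, flagged)}; flagged means digit use departs from uniform."""
    results = {}
    for site, values in site_values.items():
        chi2 = terminal_digit_chi2(values)
        results[site] = (round(chi2, 2), chi2 > crit)
    return results

# Hypothetical systolic readings (mmHg) from two sites
sites = {
    "site_A": list(range(110, 160)),                            # each last digit 5 times
    "site_B": [120, 130, 140, 150, 125, 135] * 8 + [128, 131],  # heavy 0/5 preference
}
results = flag_sites(sites)
print(results)
```

With many sites and many variables, some flags will arise by chance, so a flagged site is a prompt for on-site verification rather than evidence of fabrication.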

IPD forensics. When raw datasets are available, investigators look for repeating blocks of data (identical participant rows duplicated), calculation errors where figures do not reconcile with tables, and excessive similarity or difference between groups that would not arise naturally. The Cochrane ivermectin review discovered fabricated spreadsheets through exactly this kind of row-level inspection (PMID:36054583).
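A minimal sketch of one such row-level check, assuming the dataset is a list of rows whose first column is a participant ID. It slides a fixed-size window down the rows and reports every run of consecutive rows (IDs ignored) that appears more than once. The function name, window size, and example data are hypothetical; the source does not describe the exact procedure the Cochrane team used.

```python
from collections import defaultdict

def find_repeated_blocks(rows, block_size=3, id_col=0):
    """Return {block: [start indices]} for every run of `block_size` consecutive
    rows (ignoring the participant-ID column) that occurs more than once."""
    stripped = [tuple(v for i, v in enumerate(r) if i != id_col) for r in rows]
    seen = defaultdict(list)
    for start in range(len(stripped) - block_size + 1):
        block = tuple(stripped[start:start + block_size])
        seen[block].append(start)
    return {block: starts for block, starts in seen.items() if len(starts) > 1}

# Hypothetical IPD: (participant_id, age, sex, weight_kg)
rows = [
    (1, 54, "F", 71.2),
    (2, 61, "M", 80.5),
    (3, 47, "F", 65.0),
    (4, 58, "M", 77.3),
    (5, 50, "F", 69.8),
    (6, 54, "F", 71.2),  # rows 6-8 repeat rows 1-3 under new IDs
    (7, 61, "M", 80.5),
    (8, 47, "F", 65.0),
]

result = find_repeated_blocks(rows, block_size=3)
for block, starts in result.items():
    print(f"block starting at rows {starts} repeats: {block}")
```

A longer copied stretch produces several overlapping hits, one per window position. In small datasets with few, coarse variables, identical rows can also occur by chance, so hits are leads for investigation, not proof.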

Funnel-plot and sensitivity analysis. At the meta-analytic level, zombie trials often appear as extreme outliers reporting implausibly large effects. Integrity-based sensitivity analysis—removing the suspect trials and measuring the shift in the pooled estimate—quantifies the degree to which conclusions depend on untrustworthy sources (PMID:37453533).
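The integrity-based sensitivity analysis can be sketched with inverse-variance pooling. For simplicity this assumes a fixed-effect model (real reviews often use random-effects models) and expresses the shift as the percentage reduction in the pooled estimate, which is one reasonable metric but not necessarily the one O’Connell et al. used. All trial names and effect sizes are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pool_fixed_effect(studies):
    """Inverse-variance fixed-effect pooling of (effect, SE) pairs."""
    weights = [1 / se**2 for _, se in studies]
    estimate = sum(w * y for w, (y, _) in zip(weights, studies)) / sum(weights)
    se = sqrt(1 / sum(weights))
    p = 2 * (1 - NormalDist().cdf(abs(estimate / se)))
    return estimate, se, p

def integrity_sensitivity(studies, suspect_ids):
    """Compare the pooled estimate with and without the suspect trials.

    Assumes the full pooled estimate is non-zero (it is the denominator).
    """
    full = pool_fixed_effect(list(studies.values()))
    kept = pool_fixed_effect([s for k, s in studies.items() if k not in suspect_ids])
    reduction_pct = 100 * (full[0] - kept[0]) / full[0]
    return full, kept, reduction_pct

# Hypothetical standardized mean differences: (effect, SE)
studies = {
    "trial_1": (0.20, 0.20),
    "trial_2": (0.10, 0.20),
    "index_trial": (0.90, 0.20),  # suspect trial with an implausibly large effect
}
full, kept, reduction = integrity_sensitivity(studies, {"index_trial"})
print(f"all trials:      {full[0]:.2f} (p={full[2]:.3f})")
print(f"suspect removed: {kept[0]:.2f} (p={kept[2]:.3f})")
print(f"reduction in pooled effect: {reduction:.1f}%")
```

In this made-up example the pooled effect falls from 0.40 to 0.15, a 62.5% reduction, and the result loses statistical significance, the same pattern the spinal-pain analysis documented.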

Emerging Integrity Frameworks: RIGID, TRACT, and INSPECT-SR

The recognition that conventional risk-of-bias tools assume data authenticity has driven the development of dedicated integrity screening frameworks.

RIGID (Research Integrity in Guidelines and evIDence synthesis) is a six-step framework co-developed with 80 international experts. Its steps are: (1) standard systematic-review processes, (2) exclusion of retracted studies, (3) integrity assessment using the TRACT checklist, (4) committee discussion and voting on risk ratings, (5) formal author engagement including IPD requests, and (6) re-assessment and pathway allocation—included, awaiting classification, or not included. In its pilot deployment for the international PCOS guideline (endorsed by 39 societies across six continents), 45 of 101 originally identified studies (45%) were excluded due to integrity concerns (PMID:39628711).

TRACT (Trustworthiness in RAndomised Clinical Trials) is a 19-item checklist organized into seven domains: governance, author group, plausibility of intervention usage, timeframe, drop-out rates, baseline characteristics, and outcomes. Each item is rated as no concerns, some concerns, or major concerns. When a majority of items are rated as major concern, the framework triggers a more thorough investigation including IPD assessment (PMID:37337220).
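The escalation step can be sketched as a simple tally. This assumes each of the 19 items has already been rated by a reviewer and encodes the majority-of-major-concerns trigger exactly as summarized in this section; the TRACT publication (PMID:37337220) remains the authority on item wording, domain assignment, and the escalation rule. The class and function names are hypothetical.

```python
from collections import Counter
from enum import Enum

class Concern(Enum):
    NONE = "no concerns"
    SOME = "some concerns"
    MAJOR = "major concerns"

N_TRACT_ITEMS = 19

def needs_escalation(ratings):
    """Return (escalate, counts); escalate when most of the 19 items are MAJOR."""
    if len(ratings) != N_TRACT_ITEMS:
        raise ValueError(f"expected {N_TRACT_ITEMS} ratings, got {len(ratings)}")
    counts = Counter(ratings)
    return counts[Concern.MAJOR] > N_TRACT_ITEMS / 2, counts

# Hypothetical ratings for one trial
ratings = [Concern.MAJOR] * 10 + [Concern.SOME] * 5 + [Concern.NONE] * 4
escalate, counts = needs_escalation(ratings)
print(escalate, {c.value: n for c, n in counts.items()})
```

Keeping the ratings as an enum rather than free text makes the tally reproducible across reviewers and lets a committee log exactly which items drove an escalation.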

INSPECT-SR (INveStigating ProblEmatic Clinical Trials in Systematic Reviews) is a Cochrane-led project developing a tool specifically for use within systematic reviews. It combines expert consensus from a Delphi survey with empirical evidence, and includes a planned extension (INSPECT-IPD) for forensic checks of underlying datasets (PMID:38471680).

RIA (Research Integrity Assessment), developed by the Cochrane COVID-19 ivermectin review team, assesses six criteria: retraction status, prospective registration, ethics approval, author group, plausibility of methods, and plausibility of results. It was used at the eligibility-check stage, resulting in the exclusion of more than 40% of studies in the first update of the ivermectin review (PMID:36054583).

Comparison of integrity screening tools for RCTs
Tool	Scope	Key innovation	IPD step?
RIGID	Guideline development & evidence synthesis	Integrity committee with multi-expert voting; author engagement pathway	Yes (Steps 5–6)
TRACT	Any published or submitted RCT	19-item checklist across 7 domains; escalation trigger	Triggered by major concerns
INSPECT-SR	Systematic reviews (Cochrane-led)	Delphi consensus + empirical validation; planned INSPECT-IPD extension	In development
RIA	Cochrane reviews (eligibility check)	Six-criterion screen at study inclusion stage	No (summary-level)

Who Needs to Act on the Zombie Trial Problem?

Systematic reviewers and meta-analysts

Every meta-analysis should include integrity screening as a standard step. BioSkepsis provides citation-grounded literature synthesis with evidence tiering, enabling reviewers to trace each included trial’s provenance and flag inconsistencies before pooling data.

Clinical guideline developers

Guideline panels should adopt frameworks like RIGID or INSPECT-SR. BioSkepsis can accelerate the literature surveillance step by synthesizing large trial corpora with automated relevance screening and PMID-grounded claim verification—catching problems that manual screening misses.

Journal editors and peer reviewers

Pre-publication IPD screening, as Carlisle has advocated, should become standard for RCT submissions. BioSkepsis supports the evidence-checking layer by enabling rapid comparison of a submitted trial’s claims against the existing verified literature.

Clinicians and pharmacologists

Practitioners who rely on guideline recommendations need to understand that the underlying evidence may be contaminated. BioSkepsis provides a way to independently verify the strength and integrity of the evidence behind any clinical claim, directly from the primary literature.

Frequently Asked Questions About Zombie Trials

What is a zombie trial in clinical research?

A zombie trial is a published or unpublished RCT with serious questions about the trustworthiness of its data or findings, regardless of whether it has been formally retracted. The term was coined by John Carlisle, editor of Anaesthesia, to describe studies that have the semblance of real research but are hollow shells masquerading as reliable information (PMID:33040331).

How prevalent are zombie trials in the biomedical literature?

An analysis of 153 RCTs with IPD submitted to Anaesthesia found that 44% contained false data and 26% were so flawed that they were classified as zombie trials (PMID:33040331). Ioannidis has estimated that hundreds of thousands of zombie RCTs circulate in the literature (PMID:33124075). In surveys, 1.97% of scientists admit to having fabricated, falsified, or modified data at least once, but 14% report knowing of a colleague who committed such misconduct (PMID:19478950).

How do zombie trials affect meta-analyses and systematic reviews?

Zombie trials inflate treatment effect sizes. In spinal pain research, removing untrustworthy trials reduced pooled effect sizes by a median of 58% (IQR 40–74). In 12 of the 40 comparisons that were originally statistically significant (30%), the effect became non-significant once the suspect trials were excluded. Nine out of 10 CPGs that drew on these trials made positive treatment recommendations (PMID:37453533).

What is the Carlisle method for detecting fabricated trial data?

The Carlisle method tests whether the means and standard deviations of baseline variables across treatment groups are mathematically plausible under genuine randomization. Key flags include groups that are too perfectly balanced across many variables, distributions that are too narrow or too wide, and means or SDs that are inconsistent with the reported sample size. In one investigation, this method identified 86% of 35 trials from a single author group as having implausible baseline distributions (PMID:39628711).

What is the RIGID framework for research integrity?

RIGID is a six-step framework for assessing study integrity during evidence synthesis. It uses TRACT for initial screening, an integrity committee for voting on risk ratings, formal author engagement for data requests, and a structured re-assessment algorithm. In its pilot for the international PCOS guideline, 45% of originally identified studies were excluded due to integrity concerns (PMID:39628711).

Why can’t traditional risk-of-bias tools detect zombie trials?

Tools like Cochrane’s RoB 2 assess methodological features—blinding, allocation concealment, attrition—but assume the reported data are genuine. A fabricated trial can describe perfect methods and receive a low risk-of-bias rating while its underlying data are entirely false. Dedicated integrity tools like INSPECT-SR and TRACT are needed to assess data authenticity separately (PMID:38471680; PMID:37337220).

What role do superretractors play in the zombie trial problem?

A 2026 JAMA Network Open study found that 6 superretractors coauthored 22% of 1,330 retracted RCTs. These authors, concentrated in anesthesiology and endocrinology, had a median publication-to-retraction lag of 5,111 days—nearly 14 years during which their fabricated data influenced meta-analyses and guidelines (PMID:41984475).

Verify the Evidence Behind Any Biomedical Claim

BioSkepsis synthesizes primary literature with PMID-grounded citations and evidence tiering—so you can trace every claim to its source and catch zombie data before it reaches your review.


Sources & Further Reading

Carlisle JB. False individual patient data and zombie randomised controlled trials submitted to Anaesthesia. Anaesthesia. 2021;76(4):472-479. PMID:33040331
Ioannidis JPA. Hundreds of thousands of zombie randomised trials circulate among us. Anaesthesia. 2021;76(4):444-447. PMID:33124075
Fanelli D. How many scientists fabricate and falsify research? A systematic review and meta-analysis of survey data. PLoS One. 2009;4(5):e5738. PMID:19478950
Mousa A, Flanagan M, Tay CT, et al. Research Integrity in Guidelines and evIDence synthesis (RIGID): a framework for assessing research integrity in guideline development and evidence synthesis. EClinicalMedicine. 2024;74:102717. PMID:39628711
O'Connell N, Moore RA, Stewart G, et al. Trials We Cannot Trust: Investigating Their Impact on Systematic Reviews and Clinical Guidelines in Spinal Pain. J Pain. 2023;24(12):2103-2130. PMID:37453533
Kataoka Y, Banno M, Tsujimoto Y, et al. Retracted randomized controlled trials were cited and not corrected in systematic reviews and clinical practice guidelines. J Clin Epidemiol. 2022;150:90-97. PMID:35779825
Weibel S, Popp M, Reis S, et al. Identifying and managing problematic trials: A research integrity assessment tool for randomized controlled trials in evidence synthesis. Res Synth Methods. 2023;14(3):357-369. PMID:36054583
Mol BW, Lai S, Rahim A, et al. Checklist to assess Trustworthiness in RAndomised Controlled Trials (TRACT checklist): concept proposal and pilot. Res Integr Peer Rev. 2023;8(1):6. PMID:37337220
Wilkinson J, Heal C, Antoniou GA, et al. Protocol for the development of a tool (INSPECT-SR) to identify problematic randomised controlled trials in systematic reviews of health interventions. BMJ Open. 2024;14(3):e084164. PMID:38471680
Lyu C, Matbouriahi M, Naudet F, Ioannidis JPA, Cristea IA. Retracted Randomized Clinical Trials From Superretractors and Top-Cited Scientists With Multiple Retractions. JAMA Netw Open. 2026;9(4):e267424. PMID:41984475
Pogue JM, Devereaux PJ, Thorlund K, Yusuf S. Central statistical monitoring: detecting fraud in clinical trials. Clin Trials. 2013;10(2):225-235. PMID:23283577
Iwamoto J, Sato Y. Menatetrenone for the treatment of osteoporosis [RETRACTED]. Expert Opin Pharmacother. 2013;14(4):449-458. PMID:23346882