Half of social science studies fail replication tests

By · 2026-06-12

Across 3,900 social science papers examined over seven years, 49% failed to reproduce the original result when tested, a replication rate of 51%, according to findings published by the SCORE project (Systematizing Confidence in Open Research and Evidence) [1]. That places the field at the statistical threshold between systematic knowledge production and coin-flip reliability.

SCORE employed two distinct verification methods, reanalysis of existing data and full replication from scratch, and processed more than 100 papers through both pathways [1][3]. Reanalysis checks whether the original data supports the published conclusion when processed independently. Replication runs the study again with new subjects and new data collection. Both methods converged on approximately the same failure rate, which means the problem is not confined to statistical errors or selective reporting; it extends to the experimental design and the phenomena being measured.

The replication rate was not uniform. Newer papers replicated more reliably than older ones, and papers published in journals requiring extensive data sharing also showed higher replication rates [1][4]. This creates a selection gradient: the subset of research produced under transparency requirements performs better under verification. Whether that reflects higher-quality methodology or simply greater compliance with documentation standards remains an open question. The gradient does not necessarily separate true findings from false ones; it may instead separate auditable findings from opaque ones.

A White House executive order issued in May 2025 named the "reproducibility crisis" as a priority concern for federal science policy [1][5]. The timing positions SCORE's findings within a broader institutional reckoning. The project's data and materials now sit in a living repository on the Open Science Framework, making the verification system itself transparent and available for further analysis [1][6].

A 51% replication rate means social science operates at the boundary where a field generates enough reliable findings to justify continued institutional support while producing enough unreliable findings to require permanent verification infrastructure [2]. The open question is not whether the field is broken. It is whether coin-flip reliability represents a stable equilibrium for disciplines studying high-variance human systems, and what it means when policy, education, and clinical practice are built on research that replicates at the rate of a fair coin toss. Half-right is not random when institutions treat it as authoritative.

The implications extend beyond academic credibility. When interventions are designed, funding allocated, and curricula revised based on findings that fail to replicate nearly as often as they succeed, the cost is not merely intellectual but material: wasted resources, ineffective programs, and eroded public trust in empirical inquiry. SCORE does not resolve the crisis, but it quantifies it with enough precision that ignoring the problem now requires willful institutional blindness. What remains to be seen is whether the research community will treat 51% as a baseline to improve upon or a threshold to defend.