The empirical scientist conducts controlled experiments and keeps accurate, unbiased records of all observable conditions at the time the experiment is conducted. If a researcher has discovered a genuinely new or previously unobserved natural phenomenon, other researchers, with access to his or her notes and some apparatus of their own devising, should be able to reproduce or confirm the discovery. If sufficient corroboration is forthcoming, the scientific community eventually acknowledges that the phenomenon is real and adapts existing theory to accommodate the new observations.
The validation of scientific truth requires replication or reproduction. Replicability most commonly refers to obtaining an experiment's result in an independent study, by a different investigator with different data, while reproducibility refers to different investigators using the same data, methods, and computer code to reach the same conclusions.
Yet today the scientific process of replication and reproduction has ceased to function properly. A vast proportion of the scientific claims in the published literature have never been replicated or reproduced. Estimates suggest that a majority of these unreplicated published claims are in fact false.
An extraordinary number of scientific and social-scientific disciplines no longer reliably produce true results, a state of affairs commonly referred to as the Irreproducibility Crisis. In a recent survey by Nature magazine, a substantial majority of 1,500 active scientists called the situation a "crisis." The scientific world's perverse professional incentives bear much of the blame for this catastrophic failure.
Politicians and bureaucrats commonly act to maximize their self-interest rather than acting as disinterested servants of the public good. The same is true of scientists, peer reviewers, and government experts. The different participants in the scientific research system all serve their own interests as they follow the system's incentives.
Well-published university researchers earn tenure, promotion, lateral moves to more prestigious universities, salary increases, grants, professional reputation, and public esteem, above all from publishing exciting new positive results. The same incentives affect journal editors, who win acclaim for their journals and personal awards by publishing what appears to be exciting new research, even when that research has not been thoroughly vetted.
Grantors want to fund exciting research, and government funders have the added incentive that exciting research with positive results supports the expansion of their organization's mission. American university administrations want to host grant-winning research, from which they profit by collecting overhead costs, frequently the majority of the grant amount. Having experienced and viewed this firsthand, I can attest that readers would be astonished at how large a portion of most research grants goes to the university as overhead rather than to actual research costs.
All these incentives reward published research with new positive claims, but not necessarily reproducible research. Researchers, editors, grantors, bureaucrats, university administrations: each has an incentive to seek out what appears to be exciting new research that draws money, status, and power, and few if any incentives to double-check that work. Above all, none of them has an incentive to reproduce the research, to check whether the exciting claim actually holds up, because if it does not, money, status, and prestige are all at risk.
The scientific world's incentives for new findings rather than reproducible studies drastically affect what gets submitted for publication. Scientists who try to build their careers on checking old findings or publishing negative results are unlikely to achieve professional success. The result is that scientists do not submit negative results for publication. Some negative results go to the file drawer. Others somehow turn into positive results as researchers consciously or unconsciously massage their data and their analyses. (As a scientific modeler, I know this as "tuning," a technical word for cheating.) Neither do scientists perform or publish many replication studies, since the scientific world's incentives do not reward those activities either.
The concept of statistical significance has been tortured to the point that hundreds, if not thousands, of useless papers claiming significance appear throughout the literature.
Researchers try to determine whether the relationships they study differ from what chance alone could explain by gathering data and applying hypothesis tests, also called tests of statistical significance. Most commonly they start by testing the "null hypothesis," the hypothesis that there is no actual relationship between the two variables. How well the data support the null hypothesis is summarized by a statistic called a p-value. If the p-value is less than 5% (0.05), researchers conventionally take the result as evidence that there may be a relationship between the variables being studied, and only then go on to other hypotheses.
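The logic of a significance test can be made concrete with a small sketch. The code below runs a simple permutation test, one standard way of computing a p-value against the null hypothesis that two groups come from the same distribution; the sample numbers are invented for illustration, not taken from any study.

```python
import random

def permutation_p_value(a, b, n_perm=10_000, seed=0):
    """Permutation test of the null hypothesis that groups a and b
    come from the same distribution. Returns the fraction of random
    relabelings whose mean difference is at least as large as the
    observed one: the p-value."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # relabel the data at random
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            extreme += 1
    return extreme / n_perm

# Invented samples with a clear difference in means:
treated = [5.1, 4.9, 6.2, 5.8, 6.0, 5.5]
control = [4.2, 4.0, 4.8, 4.4, 4.6, 4.1]
p = permutation_p_value(treated, control)
# A p below the conventional 0.05 cutoff would lead researchers to
# reject the null hypothesis of "no relationship."
```

Note what the p-value is and is not: it measures how surprising the data would be if the null hypothesis were true, not the probability that the claimed relationship is real.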
The government's central role in science, both in funding scientific research and in using scientific research to justify regulation, adds tremendously to the growth of flimsy statistical significance throughout the academic world. Within a generation, statistical significance went from a useful shorthand, one that agricultural and industrial researchers used to judge whether to continue their current procedures or switch to something new, to a prerequisite for regulation, government grants, tenure, publication, and every other form of scientific prestige.
Many more scientists engage, with more or less culpable carelessness, in a variety of improper statistical practices, including:
- improper statistical methodology
- biased data manipulation that produces desired results
- selecting only measures that produce statistical significance and ignoring any that do not
- using illegitimate manipulations of research techniques
Still others run statistical analyses until they find a statistically significant result and publish only that one result, a practice called "p-hacking." Far too many researchers report their methods unclearly and let the uninformed reader assume they actually followed a rigorous scientific process.
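Why p-hacking works is simple arithmetic: run enough tests on pure noise and some will clear the 0.05 bar by chance. The simulation below is a minimal sketch of that arithmetic, using the standard fact that under a true null hypothesis p-values are uniformly distributed; the trial counts are arbitrary choices of mine.

```python
import random

rng = random.Random(42)
n_papers = 10_000        # simulated "papers," each studying a nonexistent effect
tests_per_paper = 20     # analyses the p-hacker is willing to try
false_positives = 0

for _ in range(n_papers):
    # Under the null hypothesis, each test's p-value is uniform on [0, 1].
    p_values = [rng.random() for _ in range(tests_per_paper)]
    if min(p_values) < 0.05:  # the p-hacker publishes the best result
        false_positives += 1

frac = false_positives / n_papers
# Analytically, the chance of at least one "significant" result is
# 1 - 0.95**20, roughly 0.64: about two thirds of these pure-noise
# papers can report a statistically significant finding.
```

The honest alternative is to correct for the number of tests performed (for example, a Bonferroni adjustment divides the 0.05 threshold by the number of tests), which is exactly what unclear methods sections let authors avoid disclosing.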
The most insidious of all scientific cheating is HARKing, Hypothesizing After the Results are Known: a scientist collects the data first and then chooses whichever hypothesis the data happen to support. A more obvious word for it is cheating. Irreproducible research hypotheses produced by HARKing send whole disciplines chasing down rabbit holes.
Publication bias and HARKing collectively have degraded scientific research as a whole. In addition, decades of surveys show that researchers are unlikely to publish any negative results their studies uncover.
A false research claim can become the foundation for an entire body of literature that is uniformly false and yet becomes an established truth. We cannot tell exactly which pieces of research have been affected by these errors until scientists replicate every piece of published research. Yet we do possess sophisticated statistical strategies that allow us to diagnose specific claims that support government regulation. One such method, an acid test for statistical skullduggery, is p-value plotting, described in detail in the National Association of Scholars handbook Shifting Sands, a brief paperback I cannot recommend too strongly.
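As I understand the technique, a p-value plot graphs the p-values reported across a body of literature, sorted, against their rank: if there is no real effect, the points hug a straight diagonal line, while a sharp bend suggests a mixture of real effects, noise, or selective reporting. The sketch below is my own minimal illustration of the diagonal case, using simulated rather than real p-values; consult Shifting Sands for the method as its authors actually apply it.

```python
import random

def p_value_plot_points(p_values):
    """Sorted p-values paired with their rank: the raw material of a
    p-value plot. The k-th smallest of n uniform p-values has
    expectation k / (n + 1), so under a true null effect the points
    fall near a straight line from about 0 up to about 1."""
    return list(enumerate(sorted(p_values), start=1))

# Hypothetical literature: 40 studies of a nonexistent effect, so the
# p-values are draws from a uniform distribution.
rng = random.Random(7)
null_literature = [rng.random() for _ in range(40)]
points = p_value_plot_points(null_literature)
# Plotting rank on the x-axis and p-value on the y-axis should show
# the telltale diagonal of "no effect."
```

A real diagnosis would plot p-values harvested from published studies of one claimed association and inspect the shape by eye.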
Note: Portions of this essay were excerpted from the book Shifting Sands with permission of the National Association of Scholars (NAS) and its authors Peter Wood, Stanley Young, Warren Kindzierski, and David Randall.