The frequent insignificance of a “significant” p‐value

David C. McGiffin; Geoff Cumming; Paul S. Myles

doi:10.1111/jocs.15960

REVIEW

Source

Journal of Cardiac Surgery, volume 36, issue 11, pages 4322-4331

Abstract

Null hypothesis significance testing (NHST) and p‐values are widespread in the cardiac surgical literature but are frequently misunderstood and misused. The purpose of the review is to discuss major disadvantages of p‐values and suggest alternatives. We describe diagnostic tests, the prosecutor's fallacy in the courtroom, and NHST, which involve inter‐related conditional probabilities, to help clarify the meaning of p‐values, and discuss the enormous sampling variability, or unreliability, of p‐values. Finally, we use a cardiac surgical database and simulations to explore further issues involving p‐values. In clinical studies, p‐values provide a poor summary of the observed treatment effect, whereas the three‐number summary provided by effect estimates and confidence intervals is more informative and minimizes over‐interpretation of a “significant” result. p‐values are an unreliable measure of the strength of evidence; if used at all they give only, at best, a very rough guide to decision making. Researchers should adopt Open Science practices to improve the trustworthiness of research and, where possible, use estimation (three‐number summaries) or other better techniques.
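The sampling variability of p‐values and the three‐number summary mentioned above can be illustrated with a small simulation. The sketch below is not the authors' simulation and does not use their cardiac surgical database. It assumes two groups of 32 patients, a true difference of 0.5 standard deviations, and a known standard deviation, so a z‐test stands in for the usual t‐test. The function names `replicate` and `simulate` are hypothetical.

```python
import random
from statistics import NormalDist, mean

STD_NORMAL = NormalDist()  # standard normal, used for the p-value and the CI


def replicate(rng, n=32, true_diff=0.5, sigma=1.0):
    """Run one simulated two-group study; return (diff, ci_low, ci_high, p)."""
    control = [rng.gauss(0.0, sigma) for _ in range(n)]
    treated = [rng.gauss(true_diff, sigma) for _ in range(n)]
    diff = mean(treated) - mean(control)       # point estimate of the effect
    se = sigma * (2.0 / n) ** 0.5              # standard error, sigma assumed known
    z = diff / se
    p = 2.0 * (1.0 - STD_NORMAL.cdf(abs(z)))   # two-sided p-value from a z-test
    half_width = STD_NORMAL.inv_cdf(0.975) * se
    return diff, diff - half_width, diff + half_width, p


def simulate(n_reps=25, seed=1):
    """Repeat the same study n_reps times with fresh random samples."""
    rng = random.Random(seed)
    return [replicate(rng) for _ in range(n_reps)]


if __name__ == "__main__":
    for diff, low, high, p in simulate():
        print(f"diff={diff:+.2f}  95% CI [{low:+.2f}, {high:+.2f}]  p={p:.3f}")
```

Under these assumptions the study has roughly 50% power, so identical replications of the same experiment typically produce p‐values ranging from well below 0.01 to well above 0.2, while the point estimate with its confidence limits (the three numbers) conveys both the size of the effect and its uncertainty in every replication.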