Abstract

Jager and Leek have tried to estimate the false-discovery rate (FDR) from abstracts of articles published in five medical journals during 2000-2010. Their approach is flawed in its sampling, calculations, and conclusions. It uses a tiny portion of selected papers in highly selected journals. Randomized controlled trials and systematic reviews (the designs with the lowest anticipated false-positive rates) account for 52% of the analyzed papers, whereas these designs make up only 4% of PubMed articles in the same period. The FDR calculations treat the entire published literature as equivalent to a single genomic experiment in which all performed analyses are reported without selection or distortion. However, the data used are the P-values reported in the abstracts of published papers, and these P-values are a highly distorted, highly selected sample. Besides selective reporting biases, all other biases, in particular confounding in observational studies, are also ignored, even though these are often the main drivers of high false-positive rates in the biomedical literature. A reproducibility check of the raw data shows that much of the data Jager and Leek used are either wrong or nonsensical: most of the usable data were missed by their script, 94% of the abstracts that reported two or more P-values had high correlation/overlap between the reported outcomes, and only a minority of P-values corresponded to relevant primary outcomes. The Jager and Leek paper exemplifies the dreadful combination of automated scripts, wrong methods, and unreliable data. Sadly, this combination is common in the medical literature.
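For orientation, the sketch below illustrates the general idea behind pooled-P-value FDR estimation that this critique targets: a collection of reported P-values is treated as the output of a single experiment, and the fraction of false discoveries among "significant" results is inferred from the shape of the P-value distribution. This is a minimal Storey-style estimator written purely for illustration, not Jager and Leek's actual model (their paper fits a mixture model to truncated reported P-values); the names and toy data here are assumptions made for the example.

```python
import numpy as np

def storey_fdr_estimate(pvalues, alpha=0.05, lam=0.5):
    """Rough FDR estimate for P-values declared significant at `alpha`.

    Uses Storey's lambda estimator for the null proportion pi0: under the
    null hypothesis P-values are uniform, so the mass above `lam` estimates
    how much of the mixture is null. Illustrative sketch only, not Jager
    and Leek's model.
    """
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    pi0 = min(1.0, np.mean(p > lam) / (1.0 - lam))  # estimated null proportion
    n_sig = np.count_nonzero(p <= alpha)            # "discoveries" at threshold alpha
    if n_sig == 0:
        return float("nan")
    # Expected false positives among discoveries / observed discoveries.
    return min(1.0, pi0 * m * alpha / n_sig)

# Toy data (hypothetical): 80% nulls (uniform P-values) plus 20% true
# effects (small P-values drawn from a Beta distribution).
rng = np.random.default_rng(0)
pvals = np.concatenate([rng.uniform(size=8000), rng.beta(0.5, 25.0, size=2000)])
print(f"Estimated FDR at alpha=0.05: {storey_fdr_estimate(pvals):.3f}")
```

In this toy setting the P-values really are an unselected sample from one experiment, so such an estimator behaves sensibly. The critique's point is that P-values harvested from journal abstracts do not satisfy that assumption: selective reporting and confounding distort the distribution before any estimator is applied.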

  • Publication date: 2014-01

Full text