Reports of clinical trials often contain a wealth of data comparing treatments. This can lead to problems in interpretation, particularly when significance testing is used extensively. We examined 45 reports of comparative trials published in the British Medical Journal, the Lancet, or the New England Journal of Medicine to illustrate these statistical problems. The issues we considered included the analysis of multiple end points, the analysis of repeated measurements over time, subgroup analyses, trials of multiple treatments, and the overall number of significance tests in a trial report. Interpretation of large amounts of data is complicated by the common failure to specify in advance the intended size of a trial or statistical stopping rules for interim analyses. In addition, summaries or abstracts of trials tend to emphasize the more statistically significant end points. Overall, the reporting of clinical trials appears to be biased toward an exaggeration of treatment differences. Trials should have a clearer predefined policy for data analysis and reporting. In particular, a limited number of primary treatment comparisons should be specified in advance. The overuse of arbitrary significance levels (for example, P < 0.05) is detrimental to good scientific reporting, and more emphasis should be given to the magnitude of treatment differences and to estimation methods such as confidence intervals.
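Two points from the summary above can be made concrete with a little arithmetic: why many significance tests invite spurious "findings," and what an estimate with a confidence interval conveys that a bare P value does not. The sketch below is illustrative only. It assumes the tests are independent (real end points, time points, and subgroups are usually correlated, so the exact figures differ in practice), it uses the simple Wald normal-approximation interval for a difference in two proportions (one of several possible interval methods), and the function names and trial counts are hypothetical, not taken from the 45 reports examined.

```python
import math


def familywise_error_rate(num_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one result with P < alpha among num_tests
    independent tests when no true treatment difference exists.
    Assumes independence, which real trial end points rarely satisfy."""
    return 1.0 - (1.0 - alpha) ** num_tests


def diff_in_proportions_ci(events_a: int, n_a: int,
                           events_b: int, n_b: int,
                           z: float = 1.96) -> tuple[float, float, float]:
    """Wald (normal-approximation) confidence interval for the difference
    in event proportions p_a - p_b; z = 1.96 gives an approximate 95% CI.
    Returns (difference, lower limit, upper limit)."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    # Chance of at least one spurious "significant" result as tests accumulate.
    for k in (1, 5, 10, 20):
        print(f"{k:2d} tests: P(at least one P < 0.05) = "
              f"{familywise_error_rate(k):.2f}")

    # Hypothetical trial: 30/200 events on treatment A, 45/200 on treatment B.
    diff, lo, hi = diff_in_proportions_ci(30, 200, 45, 200)
    print(f"difference = {diff:.3f}, 95% CI {lo:.3f} to {hi:.3f}")
```

With 20 independent tests the chance of at least one P < 0.05 under no true effect is roughly 64%, which is why a small number of primary comparisons should be fixed in advance. In the hypothetical trial the interval only just includes zero, so a report of "not significant" would hide the fact that a reduction in event rate of up to about 15 percentage points remains compatible with the data.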