To determine why many diagnostic tests have proved to be valueless after optimistic introduction into medical practice, we reviewed a series of investigations and identified two major problems that can cause erroneous statistical results for the "sensitivity" and "specificity" indexes of diagnostic efficacy. Unless an appropriately broad spectrum is chosen for the diseased and nondiseased patients who comprise the study population, the diagnostic test may receive falsely high values for its "rule-in" and "rule-out" performances. Unless the interpretation of the test and the establishment of the true diagnosis are done independently, bias may falsely elevate the test's efficacy. Avoidance of these problems might have prevented the early optimism and subsequent disillusionment with the diagnostic value of two selected examples: the carcinoembryonic antigen and nitro-blue tetrazolium tests.
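As a minimal sketch of the spectrum problem described above, the following Python snippet computes sensitivity and specificity from 2x2 counts for two study populations. All counts, variable names, and population descriptions are hypothetical and chosen only for illustration; they are not data from the reviewed investigations.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Proportion of diseased patients whose test result is positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Proportion of nondiseased patients whose test result is negative."""
    return true_neg / (true_neg + false_pos)


# Hypothetical narrow spectrum: advanced disease compared with healthy volunteers.
narrow = {"tp": 90, "fn": 10, "tn": 95, "fp": 5}

# Hypothetical broad spectrum: early disease is included among the diseased,
# and patients with look-alike conditions are included among the nondiseased.
broad = {"tp": 60, "fn": 40, "tn": 75, "fp": 25}

for label, counts in (("narrow", narrow), ("broad", broad)):
    sens = sensitivity(counts["tp"], counts["fn"])
    spec = specificity(counts["tn"], counts["fp"])
    print(f"{label} spectrum: sensitivity={sens:.2f}, specificity={spec:.2f}")
```

With these assumed counts, the same test looks considerably better in the narrow-spectrum population than in the broad one, which is the pattern of early optimism followed by disillusionment that the abstract describes.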