Background Despite the proved value of mammography in screening for breast cancer, its efficacy depends on radiologists' interpretations. The variability in such interpretations is not well understood.
Methods Using a technique of stratified random sampling, we selected 150 mammograms obtained in 1987: 27 from women with histopathologically confirmed breast cancer and 123 from women with no evidence of breast cancer after three years of follow-up examinations. Ten radiologists, who were unaware of the diagnoses and research hypothesis, each interpreted the 150 mammograms. Disagreement was analyzed within pairs of the 10 radiologists, as well as for the group of 150 women as a whole.
Results The diagnostic consistency between pairs of radiologists was moderate, with a median weighted percentage of agreement of 78 percent (weighted kappa, 0.47). The frequency of the radiologists' recommendations for an immediate workup ranged from 74 to 96 percent for mammograms from the women with cancer and from 11 to 65 percent for films from the women without cancer. A substantial disagreement in management recommendations -- in which one radiologist recommended routine follow-up and another recommended a biopsy for the same patient -- occurred in 3 percent of the pairwise comparisons but in 25 percent of the comparisons for the group of women as a whole. When two or more radiologists recommended a biopsy for the same patient, a disagreement in the stated location (right or left breast) occurred in 2 percent of the pairwise comparisons among the radiologists but in 9 percent of comparisons for the group of women as a whole. Because some disagreement was likely, given that 10 radiologists read each film, the pairwise comparison is a more conservative estimate of disagreement.
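The two ways of counting disagreement reported above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the ratings are invented (they are not the study's data), the three-level scale (0 = routine follow-up, 1 = additional workup, 2 = biopsy) is hypothetical, linear weights are assumed because the abstract does not say which weighting scheme was used, and "group as a whole" is read as "at least one reader recommended routine follow-up while at least one other recommended biopsy for the same woman."

```python
from itertools import combinations
from statistics import median


def weighted_kappa(a, b, n_cat):
    """Cohen's weighted kappa for two readers, using linear agreement weights.

    a, b  : equal-length lists of ordinal ratings in range(n_cat)
    n_cat : number of rating categories
    Linear weighting is an assumption, not something stated in the abstract.
    """
    n = len(a)
    # Agreement weight: 1 on the diagonal, falling linearly with distance.
    w = [[1 - abs(i - j) / (n_cat - 1) for j in range(n_cat)] for i in range(n_cat)]
    # Observed joint proportions of (reader a rating, reader b rating).
    obs = [[0.0] * n_cat for _ in range(n_cat)]
    for x, y in zip(a, b):
        obs[x][y] += 1 / n
    row = [sum(obs[i]) for i in range(n_cat)]
    col = [sum(obs[i][j] for i in range(n_cat)) for j in range(n_cat)]
    cells = [(i, j) for i in range(n_cat) for j in range(n_cat)]
    p_obs = sum(w[i][j] * obs[i][j] for i, j in cells)
    p_exp = sum(w[i][j] * row[i] * col[j] for i, j in cells)
    # Undefined (division by zero) if both readers always use one same category.
    return (p_obs - p_exp) / (1 - p_exp)


def substantial_disagreement_rates(ratings, low=0, high=2):
    """Return (pairwise rate, per-case rate) of low-versus-high disagreement."""
    readers = list(ratings)
    n_cases = len(ratings[readers[0]])
    pairs = list(combinations(readers, 2))
    # Pairwise: each (reader pair, case) comparison is counted separately.
    pair_hits = sum(
        {ratings[r1][c], ratings[r2][c]} == {low, high}
        for r1, r2 in pairs
        for c in range(n_cases)
    )
    # Per case: a case counts once if any two readers disagree this way.
    case_hits = 0
    for c in range(n_cases):
        seen = {ratings[r][c] for r in readers}
        if low in seen and high in seen:
            case_hits += 1
    return pair_hits / (len(pairs) * n_cases), case_hits / n_cases


# Invented ratings: three hypothetical readers, six hypothetical cases.
ratings = {
    "A": [0, 0, 1, 2, 0, 2],
    "B": [0, 1, 1, 2, 0, 0],
    "C": [0, 0, 2, 2, 0, 2],
}

median_kappa = median(
    [weighted_kappa(ratings[r1], ratings[r2], 3) for r1, r2 in combinations(ratings, 2)]
)
pairwise_rate, per_case_rate = substantial_disagreement_rates(ratings)
print(f"median weighted kappa: {median_kappa:.2f}")
print(f"pairwise rate: {pairwise_rate:.3f}, per-case rate: {per_case_rate:.3f}")
```

In this toy data the per-case rate exceeds the pairwise rate, because a case counts as discordant as soon as any single pair of readers disagrees. Adding readers can only keep that per-case rate the same or raise it, which is consistent with the abstract's remark that the pairwise comparison is the more conservative estimate.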
Conclusions Although mammography is of value in screening women for breast cancer, radiologists can differ, sometimes substantially, in their interpretations of mammograms and in their recommendations for management. Efforts to improve accuracy and reduce variability in interpretation may increase the effectiveness of mammography in detecting early breast cancers.
Source Information
From the Departments of Internal Medicine (J.G.E., C.K.W., D.H.H., A.R.F.), Diagnostic Radiology (C.H.L.), and Epidemiology and Public Health (A.R.F.), Yale University, New Haven, Conn. Presented in part at the meeting of the American Association of Physicians, Washington, D.C., April 30-May 3, 1993.
Address reprint requests to Dr. Elmore at the Primary Care Center, Yale University School of Medicine, 20 York St., New Haven, CT 06504.