Background Computer-aided detection identifies suspicious findings on mammograms to assist radiologists. Since the Food and Drug Administration approved the technology in 1998, it has been disseminated into practice, but its effect on the accuracy of interpretation is unclear.
Methods We determined the association between the use of computer-aided detection at mammography facilities and the performance of screening mammography from 1998 through 2002 at 43 facilities in three states. We had complete data for 222,135 women (a total of 429,345 mammograms), including 2351 women who received a diagnosis of breast cancer within 1 year after screening. We calculated the specificity, sensitivity, and positive predictive value of screening mammography with and without computer-aided detection, as well as the rates of biopsy and breast-cancer detection and the overall accuracy, measured as the area under the receiver-operating-characteristic (ROC) curve.
Results Seven facilities (16%) implemented computer-aided detection during the study period. Specificity decreased from 90.2% before implementation to 87.2% after implementation (P<0.001), the positive predictive value decreased from 4.1% to 3.2% (P=0.01), and the rate of biopsy increased by 19.7% (P<0.001). The increase in sensitivity from 80.4% before implementation of computer-aided detection to 84.0% after implementation was not significant (P=0.32). The change in the cancer-detection rate (including invasive breast cancers and ductal carcinomas in situ) was not significant (4.15 cases per 1000 screening mammograms before implementation and 4.20 cases after implementation, P=0.90). Analyses of data from all 43 facilities showed that the use of computer-aided detection was associated with significantly lower overall accuracy than was nonuse (area under the ROC curve, 0.871 vs. 0.919; P=0.005).
Conclusions The use of computer-aided detection is associated with reduced accuracy of interpretation of screening mammograms. The increased rate of biopsy with the use of computer-aided detection is not clearly associated with improved detection of invasive breast cancer.
Studies of the use of computer-aided detection in actual practice are limited by small numbers of patients or facilities, inability to control for confounding covariates associated with patients or radiologists, and lack of longitudinal follow-up to ascertain cancer outcomes, which precludes the estimation of sensitivity, specificity, and overall accuracy.9,10,11,12,13,14 Using data for a large, geographically diverse group of patients, we assessed the effect of computer-aided detection on the performance of screening mammography in community-based settings. We evaluated the sensitivity, specificity, positive predictive value, cancer-detection rate, biopsy rate, and overall accuracy of screening mammography with and without the use of computer-aided detection. By combining data for patients with independent survey data from radiologists and facilities, we could adjust for characteristics of all three groups in our analyses.
Methods
Study Design
We linked data from surveys that were mailed to mammography facilities and affiliated radiologists to data on mammograms and cancer outcomes for women screened between 1998 and 2002 at Breast Cancer Surveillance Consortium facilities. The federally funded consortium facilitates research by linking mammogram registries to population-based cancer registries.15 Three consortium registries participated in our study: the Group Health Cooperative Breast Cancer Surveillance System, a Washington State health plan with more than 100,000 female enrollees over the age of 40 years; the New Hampshire Mammography Network, which captures data for more than 85% of screening mammograms in New Hampshire; and the Colorado Mammography Program, which captures data for approximately half the screening mammograms in regional Denver. Study procedures were approved by institutional review boards at the University of Washington and the Group Health Cooperative in Seattle, Dartmouth College in New Hampshire, and the Cooper Institute in Colorado.
Study Data
The methods used to survey the facilities and radiologists have been described previously.6,16,17,18 In brief, the surveys measured factors that may affect the interpretation of mammograms (e.g., procedures used in reading the images, use of computer-aided detection, years of experience of radiologists in mammography, and number of mammograms interpreted by radiologists in the previous year). Surveys and informed-consent materials were mailed in early 2002.
The consortium developed the methods for collecting and assessing the quality of mammographic and patient data.15 We included bilateral mammograms designated by radiologists as obtained for "routine screening" of women 40 years of age or older who did not have a history of breast cancer. Mammographic data included assessments of the Breast Imaging Reporting and Data System (BI-RADS), recommendations by radiologists for further evaluation, ages of patients, breast density, time since most recent mammography, and the incidence of biopsy after screening (collected by two of the three registries). BI-RADS assessments were coded as follows: 0, additional imaging evaluation needed; 1, negative; 2, benign abnormality; 3, abnormality that is probably benign; 4, suspicious abnormality; or 5, abnormality highly suggestive of cancer.19 We ascertained newly diagnosed invasive breast cancers and ductal carcinomas in situ through December 31, 2003, by linkage with regional Surveillance, Epidemiology, and End Results registries or with local or statewide tumor registries.
Performance Measures and Data Classification
We calculated specificity, sensitivity, positive predictive value, and overall accuracy. We defined mammograms with BI-RADS assessment scores of 0, 4, or 5 as positive and mammograms with BI-RADS assessment scores of 1 or 2 as negative. Mammograms with a BI-RADS assessment score of 3 were defined as positive if the radiologist also recommended immediate evaluation and as negative otherwise.20 Specificity was defined as the percentage of screening mammograms that were negative among patients who did not receive a diagnosis of breast cancer within 1 year after screening. Sensitivity was defined as the percentage of screening mammograms that were positive among patients who received a diagnosis of breast cancer within 1 year after screening. The positive predictive value was defined as the probability of a breast-cancer diagnosis within 1 year after a positive screening mammogram.19 Overall accuracy was assessed with the use of a receiver-operating-characteristic (ROC) curve, which plots the true positive rate (sensitivity) against the false positive rate (1 − specificity). The area under the ROC curve (AUC) estimates the probability that two hypothetical mammograms, one showing cancer and one not, will be classified correctly as positive and negative, respectively.21 We also measured the recall rate (the percentage of screening mammograms that were positive) and the rates of biopsy and cancer detection (per 1000 screening mammograms).
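The classification rule and the performance definitions above can be written as a small function. This is a minimal sketch. The `Counts` container, the function names, and the example counts are hypothetical. It also assumes that "cancer detection" means cancers diagnosed within 1 year after a positive screen (true positives). That is a common convention, but the text does not state it.

```python
from dataclasses import dataclass


def is_positive(birads: int, immediate_eval_recommended: bool = False) -> bool:
    """Screening result under the rule described in the text:
    BI-RADS 0, 4, 5 -> positive; 1, 2 -> negative;
    3 -> positive only if immediate evaluation was also recommended."""
    if birads in (0, 4, 5):
        return True
    if birads in (1, 2):
        return False
    if birads == 3:
        return immediate_eval_recommended
    raise ValueError(f"unexpected BI-RADS assessment: {birads}")


@dataclass
class Counts:
    tp: int  # positive mammogram, cancer diagnosed within 1 year
    fp: int  # positive mammogram, no cancer within 1 year
    tn: int  # negative mammogram, no cancer within 1 year
    fn: int  # negative mammogram, cancer diagnosed within 1 year


def performance(c: Counts) -> dict:
    """Performance measures as defined in the Methods (fractions, or per 1000)."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),
        "specificity": c.tn / (c.tn + c.fp),
        "ppv": c.tp / (c.tp + c.fp),
        "recall_rate": (c.tp + c.fp) / total,
        # assumed convention: screen-detected (true positive) cancers per 1000
        "cancer_detection_per_1000": 1000 * c.tp / total,
    }


# Hypothetical counts for illustration only; these are not study data.
example = Counts(tp=40, fp=960, tn=8990, fn=10)
for name, value in performance(example).items():
    print(f"{name}: {value:.4f}")
```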
After the initial survey, each registry provided additional information in 2005 regarding the use of computer-aided detection at affiliated facilities from 1998 through 2002. For facilities that used computer-aided detection, registry staff ascertained the date of implementation, the brand of computer-aided detection software used, and the estimated percentage of screening mammograms that were interpreted with the use of computer-aided detection after it was implemented. Among facilities that implemented computer-aided detection, all but one reported using computer-aided detection for 100% of screening mammograms after implementation. Thus, we represented the use of computer-aided detection as a binary variable, indicating whether or not it was used at facilities during each study month.
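The facility-month exposure variable described above can be sketched as follows. The function name is hypothetical. The rule for a month in which computer-aided detection was introduced partway through is also an assumption: the text says only that use was coded as a binary variable for each study month.

```python
from datetime import date
from typing import Optional


def cad_in_use(implementation: Optional[date], year: int, month: int) -> int:
    """Return 1 if the facility is coded as using CAD in the given study month, else 0.

    `implementation` is the facility's CAD start date, or None if the facility
    never implemented CAD during the study period. Hypothetical convention: a
    month is coded 1 only if CAD was in place on the first day of that month."""
    if implementation is None:
        return 0
    return int(implementation <= date(year, month, 1))


# Example: a facility (hypothetical) that implemented CAD on June 15, 2000.
start = date(2000, 6, 15)
print([cad_in_use(start, 2000, m) for m in range(5, 9)])
```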
Women were classified on the basis of demographic and clinical covariates known to be associated with the accuracy of mammography,17,22,23 including age (in 5-year categories), breast density, and months since the most recent mammography. Facilities were classified according to academic affiliation, relative frequencies of screening and diagnostic imaging, interpretation of screening mammograms in batches of 10 or more, availability of interventional services (e.g., core biopsy), number of radiologists who specialized in breast imaging, interpretation of screening mammograms by more than one radiologist (i.e., double-reading), and the frequency of feedback about performance and the method used to review it. Radiologists were classified on the basis of years of experience with mammography and the annual number of mammograms interpreted, which was self-reported rather than collected from registry data because radiologists may interpret mammograms at nonconsortium facilities. Other characteristics of radiologists have not been associated with performance, so they were not included in the analyses.17,18
Statistical Analysis
We performed descriptive analyses to characterize facilities that did and those that did not implement computer-aided detection, as well as the patients and radiologists at these facilities. We used chi-square tests to compare unadjusted performance measures for screening mammography at facilities that adopted computer-aided detection with those that did not. Among the facilities that implemented computer-aided detection, we compared the performance of screening mammography before and after implementation. We examined the overall cancer-detection rates (per 1000 screening mammograms) as well as the rates of detection for invasive cancers and ductal carcinomas in situ.
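The unadjusted comparisons are chi-square tests on 2×2 tables, for example recalled versus not recalled at two groups of facilities. A minimal sketch follows. It assumes a Pearson test without continuity correction, which the text does not specify. The counts in the example are hypothetical.

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square test (no continuity correction) for the table

                 outcome+  outcome-
        group 1     a         b
        group 2     c         d

    Returns (statistic, P value) with 1 degree of freedom."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("a row or column total is zero")
    stat = n * (a * d - b * c) ** 2 / denom
    # With 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p = math.erfc(math.sqrt(stat / 2))
    return stat, p


# Hypothetical recall counts (not study data): 900 of 10,000 vs. 1,100 of 10,000.
stat, p = chi_square_2x2(900, 9_100, 1_100, 8_900)
print(f"chi-square = {stat:.2f}, P = {p:.2g}")
```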
To adjust for covariates associated with patients, facilities, or radiologists, we used mixed-effects logistic-regression analysis to model specificity, sensitivity, and positive predictive value as functions of the use of computer-aided detection, mammography registry, characteristics of patients (age, breast density, and time since most recent mammography), characteristics of radiologists (years of experience interpreting mammograms and number of mammograms interpreted annually), and four characteristics of facilities that were individually associated with specificity, sensitivity, or positive predictive value in separate analyses (P<0.10). For specificity, we modeled the odds of a true negative screening mammogram. For sensitivity, we modeled the odds of a true positive screening mammogram. For positive predictive value, we modeled the odds of a cancer diagnosis within 1 year after a positive screening mammogram. Models included a random effect at the facility level to account for correlation of mammography outcomes within each facility. We reran each model with an interaction term between the use of computer-aided detection and the study month to assess whether the effect of computer-aided detection on performance changed over time.
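A sketch of the model structure described above, in our own notation, which does not appear in the source. Index i denotes a mammogram and j a facility. Y_ij is one of three outcomes, each fitted as a separate model:

- a true negative result, among women without cancer within 1 year (specificity);
- a true positive result, among women with cancer within 1 year (sensitivity);
- a cancer diagnosis within 1 year, among positive mammograms (positive predictive value).

The vector x_ij holds the registry, patient, radiologist, and facility covariates, and u_j is the facility random effect. Under this form, exp(β₁) is the adjusted odds ratio for the use of computer-aided detection. The second equation shows the time-interaction check. It assumes that a main effect of study month t_ij is included alongside the interaction, which the text does not state.

```latex
\operatorname{logit}\Pr(Y_{ij}=1) = \beta_0 + \beta_1\,\mathrm{CAD}_{ij}
  + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_j,
\qquad u_j \sim \mathcal{N}(0,\sigma_u^2)

% time-interaction check (assumes a main effect of study month t_{ij})
\operatorname{logit}\Pr(Y_{ij}=1) = \beta_0 + \beta_1\,\mathrm{CAD}_{ij} + \beta_2\,t_{ij}
  + \beta_3\,\mathrm{CAD}_{ij}\,t_{ij} + \mathbf{x}_{ij}^{\top}\boldsymbol{\gamma} + u_j
```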
We used mixed-effects ordinal-regression analysis to fit an ROC model that included covariates associated with patients, radiologists, facilities, and registries as fixed effects and two random effects for the facility-level "threshold" (the likelihood that a mammogram would be interpreted as positive) and "accuracy" (the ability to discriminate cancer from noncancer).24 We tested for a significant difference between the AUCs with and those without computer-aided detection, using a likelihood-ratio test. Hypothesis tests were two-sided, with an alpha level of 0.05.
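The Methods do not give the equations of the ROC model. One widely used ordinal-regression formulation, assumed here purely for illustration, is a binormal location-scale (probit) model. In that model, latent suspicion scores are N(0, 1) for women without cancer and N(α, e^{2β}) for women with cancer, and ordered thresholds θ_k map scores to ratings. A mixed version would add facility random effects to the thresholds ("threshold") and to α ("accuracy"); those are not coded here. The sketch shows how such a model implies an AUC, and how a 1-degree-of-freedom likelihood-ratio P value is formed. The function names are hypothetical. If the comparison constrains more than one parameter, the test needs correspondingly more degrees of freedom.

```python
import math
from statistics import NormalDist


def binormal_auc(alpha: float, beta: float) -> float:
    """AUC implied by a binormal location-scale ROC model.

    Latent scores: no cancer ~ N(0, 1); cancer ~ N(alpha, exp(beta)**2).
    AUC = P(score_cancer > score_no_cancer) = Phi(alpha / sqrt(1 + exp(2 * beta)))."""
    return NormalDist().cdf(alpha / math.sqrt(1.0 + math.exp(2.0 * beta)))


def lr_test_1df(loglik_full: float, loglik_reduced: float) -> float:
    """P value for a likelihood-ratio statistic referred to chi-square with 1 df."""
    stat = 2.0 * (loglik_full - loglik_reduced)
    return math.erfc(math.sqrt(max(stat, 0.0) / 2.0))


# Hypothetical parameter values: equal variances (beta = 0), two separations.
print(round(binormal_auc(1.5, 0.0), 3), round(binormal_auc(2.0, 0.0), 3))
```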
Results
Study Data
Of 51 facilities that contributed mammographic data to registries in the period from 1998 through 2002, 43 (84%) responded to the survey. Within these 43 facilities, there were 159 radiologists who interpreted mammograms, of whom 122 (77%) provided complete responses and written informed consent for linkage to mammography and facility data. Radiologists who did and those who did not respond had similar performance measures for screening mammography.17 Complete mammographic data were available for 222,135 women (a total of 429,345 screening mammograms), including 2351 women who received a diagnosis of breast cancer within 1 year after screening (Table 1). As in previous studies,22,23 age, breast density, and time since most recent mammography for patients were associated with specificity, sensitivity, and positive predictive value.
Women screened at the 36 facilities that did not implement computer-aided detection during the study period were older, had denser breasts, and were less likely to have undergone mammography within the previous 9 to 20 months than those screened at the 7 facilities that implemented computer-aided detection (Table 2), implying a higher overall risk of breast cancer among the women screened at the nonimplementing facilities.25 On average, radiologists at facilities that did not implement computer-aided detection had more years of experience with mammography than did radiologists at facilities that implemented computer-aided detection. Characteristics of patients and radiologists at the facilities that adopted computer-aided detection were similar before and after its implementation. Facilities that did and those that did not implement computer-aided detection were similar across a range of characteristics, including the presence or absence of radiologists who specialized in breast imaging (Table 3).
Differences in characteristics of patients and radiologists (Table 2) would predict lower specificity, higher recall rates, and higher sensitivity at facilities that never implemented computer-aided detection than at facilities that did.17,22,26 Indeed, at the 36 facilities that never implemented computer-aided detection, specificity was significantly lower (P<0.001) and recall rates were significantly higher (P<0.001) than at the 7 facilities that adopted computer-aided detection but had not yet implemented it (Table 4). After these 7 facilities implemented computer-aided detection, the opposite was true: the specificity was significantly lower and the recall rate was significantly higher than at the 36 facilities that never implemented computer-aided detection (P<0.001 for both comparisons), even though characteristics of patients and radiologists remained stable (Table 2).
Before the adoption of computer-aided detection at the 7 facilities, the biopsy rate was similar to that at the 36 facilities that never implemented computer-aided detection. After the seven facilities implemented computer-aided detection, the biopsy rate increased by 20% (from 14.7 biopsies per 1000 screening mammograms before implementation to 17.6 biopsies after implementation, P<0.001). As anticipated from risk factors of patients (i.e., older age, denser breasts, and less recent mammography) at the 36 facilities that never implemented computer-aided detection,22,25 the cancer-detection rate was significantly higher at these facilities than at the 7 facilities that adopted computer-aided detection but had not yet implemented it (P=0.03). Before and after the implementation of computer-aided detection at the seven facilities, the cancer-detection rate was similar (4.15 and 4.20 cases per 1000 screening mammograms, respectively; P=0.90), but the proportions of detected invasive cancers and ductal carcinomas in situ changed: the rate of detection of invasive breast cancer decreased by 12% (from 2.98 cases per 1000 screening mammograms before implementation to 2.63 cases after implementation, P=0.32), whereas the rate of detection of ductal carcinomas in situ increased by 34% (from 1.17 to 1.57 cases per 1000 screening mammograms, P=0.09). The percentage of detected cases of cancer that were ductal carcinomas in situ increased significantly after implementation of computer-aided detection as compared with before implementation (37.4% vs. 28.1%, P=0.049).
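The relative changes and the shares of ductal carcinoma in situ in this paragraph follow directly from the published rates, as the short check below shows. It uses only the rounded rates reported in the text, so a result can differ from the published figure in the last digit. For example, it gives 28.2% rather than the reported 28.1% for the share of ductal carcinoma in situ before implementation, presumably because the authors worked from unrounded counts.

```python
# Rates per 1000 screening mammograms at the 7 adopting facilities, as reported in the text.
before = {"invasive": 2.98, "dcis": 1.17, "biopsy": 14.7}
after = {"invasive": 2.63, "dcis": 1.57, "biopsy": 17.6}


def relative_change(old: float, new: float) -> float:
    """Relative change (new - old) / old, as a fraction."""
    return (new - old) / old


def dcis_share(rates: dict) -> float:
    """Fraction of detected cancers that were ductal carcinoma in situ."""
    return rates["dcis"] / (rates["invasive"] + rates["dcis"])


for key in ("invasive", "dcis", "biopsy"):
    print(f"{key}: {relative_change(before[key], after[key]):+.1%}")
# Overall detection = invasive + DCIS (4.15 before, 4.20 after)
print(f"DCIS share: {dcis_share(before):.1%} -> {dcis_share(after):.1%}")
```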
Adjusted Performance
Of the 429,345 mammograms in this analysis, 332,869 (78%) were interpreted by participating radiologists and thus could be included in analyses that adjusted simultaneously for characteristics of patients, facilities, and radiologists. After adjustment, the use of computer-aided detection as compared with nonuse remained associated with significantly lower specificity and positive predictive value, as well as nonsignificantly greater sensitivity (Table 5). Because mammograms are usually correctly interpreted, odds ratios do not accurately estimate percent changes in specificity, sensitivity, or positive predictive value. For example, among women with breast cancer, the 46% greater adjusted odds of a positive mammogram with the use of computer-aided detection as compared with nonuse (odds ratio, 1.46) is consistent with the absolute increase in sensitivity of 3.6 percentage points associated with the implementation of computer-aided detection in unadjusted analyses (from 80.4% before implementation to 84.0% after implementation).
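The point about odds ratios can be made concrete. Multiply the baseline odds by the odds ratio and convert back to a probability. The sketch below applies the adjusted odds ratio of 1.46 to the unadjusted pre-implementation sensitivity of 80.4%. The adjusted model's baseline differs from that crude figure, so this is only a rough plausibility check, not a reproduction of Table 5. The second call applies the same odds ratio, hypothetically, to a low baseline probability (the 4.1% pre-implementation positive predictive value). It shows that an odds ratio approximates a relative change only when the outcome is rare.

```python
def apply_odds_ratio(p0: float, odds_ratio: float) -> float:
    """Probability obtained by multiplying the baseline odds p0 / (1 - p0) by odds_ratio."""
    odds = p0 / (1.0 - p0) * odds_ratio
    return odds / (1.0 + odds)


# High baseline: unadjusted sensitivity before implementation (from the text).
print(f"{apply_odds_ratio(0.804, 1.46):.3f}")  # 0.857
# Low baseline: same odds ratio applied hypothetically to a 4.1% probability.
print(f"{apply_odds_ratio(0.041, 1.46):.3f}")  # 0.059
```

At the high baseline, a 46% increase in odds moves the probability by only about 5 percentage points (a relative increase of under 7%). At the low baseline, the relative increase is about 43%, close to the odds ratio itself.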
As shown in Figure 1, the modeled AUC was 0.919 without the use of computer-aided detection but was 0.871 with its use (P=0.005). Because accuracy increases as the AUC approaches 1.0, the use of computer-aided detection was associated with significantly lower overall accuracy than was nonuse.
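The probabilistic reading of the AUC given in the Methods corresponds to the Mann-Whitney form of the empirical AUC, sketched below. This nonparametric estimate is for illustration only. The AUCs reported above come from the mixed-effects ordinal ROC model, not from this calculation. The example scores are hypothetical ordinal suspicion levels.

```python
from typing import Sequence


def empirical_auc(cancer_scores: Sequence[float], noncancer_scores: Sequence[float]) -> float:
    """Probability that a randomly chosen cancer case is rated more suspicious than a
    randomly chosen non-cancer case (ties count one half): the Mann-Whitney AUC."""
    wins = 0.0
    for x in cancer_scores:
        for y in noncancer_scores:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cancer_scores) * len(noncancer_scores))


# Hypothetical suspicion levels (higher = more suspicious).
print(empirical_auc([3, 4, 5, 2], [1, 1, 2, 3, 1]))  # 0.9
```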
Discussion
The use of computer-aided detection in clinical practice has increased since the FDA approved the technology and Medicare began reimbursing for its use. In our observational study of large numbers of community-based mammography facilities and patients, the use of computer-aided detection was associated with increases in potential harms of screening mammography, including higher recall and biopsy rates, and was of uncertain clinical benefit.
As others have reported,9,11,12,13,14 we found that the use of computer-aided detection was associated with higher recall rates than nonuse, implying that rates of false positive results were also higher with use, since most recalls do not result in a diagnosis of cancer. Increased recall rates and rates of false positive results may be logical consequences of the design of computer-aided detection software. With the goal of alerting radiologists to overlooked suspicious areas, computer-aided detection programs insert up to four marks on the average screening mammogram.13,27,28 Thus, for every true positive mark resulting from computer-aided detection that is associated with an underlying cancer, radiologists encounter nearly 2000 false positive marks.29
Increased recall rates could be a necessary cost of improved cancer detection. The use of computer-aided detection was associated with a nonsignificant trend toward increased sensitivity but with no substantive change in the overall detection of cancer. Use of the technology was, however, more strongly associated with the detection of ductal carcinoma in situ than with the detection of invasive breast cancer, a finding that may stem from the propensity of computer-aided detection software to mark calcifications.5,30,31,32 To the extent that ductal carcinoma in situ is a precursor to invasive cancer,33 the greater proportion of detected cancers that were ductal carcinomas in situ after the implementation of computer-aided detection may be viewed optimistically as a shift toward detecting breast cancer at an earlier stage. On the other hand, the natural history of ductal carcinoma in situ is often more indolent than that of invasive cancer,34 and the effect of computer-aided detection on mortality from breast cancer may be limited if it chiefly promotes the identification of ductal carcinoma in situ rather than invasive cancer.35
No single measure is sufficient to judge the effect of computer-aided detection on interpretive performance.36 Rather, the benefits of true positive results must be weighed against the consequences of false positive results, including associated economic costs. Our results suggest that approximately 157 women would be recalled (and 15 women would undergo biopsy) owing to the use of computer-aided detection in order to detect one additional case of cancer, possibly a ductal carcinoma in situ (see the Supplementary Appendix, available with the full text of this article at www.nejm.org). After accounting for the additional fees for the use of computer-aided detection37 and the costs of diagnostic evaluations after recalls resulting from the use of computer-aided detection,38 we calculated that system-wide use of computer-aided detection in the United States could increase the annual national costs of screening mammography by approximately 18% ($550 million) (see the Supplementary Appendix).
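The estimate of about 15 biopsies per additional cancer can be roughly checked against the rates reported in the Results. The authors' own calculation is in the Supplementary Appendix and may use different inputs. This sketch rests on two stated assumptions. First, additional cancers per mammogram are taken as the change in sensitivity multiplied by the fraction of screening mammograms followed by a cancer diagnosis within 1 year. Second, that fraction is approximated by the study-wide ratio of women with cancer to screening mammograms. The variable names are hypothetical.

```python
# ASSUMPTION: extra cancers detected per mammogram ~= change in sensitivity x
# fraction of screening mammograms followed by a cancer diagnosis within 1 year.
cancer_fraction = 2351 / 429_345       # women with cancer / screening mammograms (study-wide)
delta_sensitivity = 0.840 - 0.804      # unadjusted change at the 7 adopting facilities
extra_cancers_per_1000 = 1000 * delta_sensitivity * cancer_fraction

delta_biopsies_per_1000 = 17.6 - 14.7  # biopsy rates after vs. before implementation

number_needed_to_biopsy = delta_biopsies_per_1000 / extra_cancers_per_1000
print(f"{extra_cancers_per_1000:.3f} additional cancers per 1000 mammograms")
print(f"about {number_needed_to_biopsy:.0f} additional biopsies per additional cancer")
```

Under these assumptions the ratio comes to roughly 15, in line with the figure quoted above.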
Facilities that adopted computer-aided detection had performance measures before its implementation that differed from those at facilities that never adopted computer-aided detection; these differences were consistent with differences in the characteristics of patients and radiologists at the two groups of facilities. The use of computer-aided detection may have caused a regression toward mean levels of performance among radiologists whose interpretations of mammograms tended to differ from those of most radiologists, but its implementation was associated with changes in specificity and recall rates that overshot levels at facilities that never implemented computer-aided detection. Moreover, the use of computer-aided detection remained significantly associated with decreased specificity, decreased positive predictive value, and decreased overall accuracy in analyses that adjusted for differences in characteristics of patients, radiologists, and facilities. Nevertheless, the association between the use of computer-aided detection and the observed changes in performance could be explained by factors we did not measure.
Although six of the seven facilities that adopted computer-aided detection reported using it for 100% of mammograms after implementation, we did not measure the use of computer-aided detection at the level of the individual mammogram. In this respect, our estimates of the effects of computer-aided detection on performance may be conservative. All facilities that implemented computer-aided detection used the same commercial product. The manufacturer has updated its detection software since 2002, but we are unaware of any community-based studies that have found improved detection of breast cancer with recent versions of the software.
Even in our large study, only 156 cases of cancer developed during the study among women screened at facilities using computer-aided detection, resulting in wide confidence intervals around estimates of sensitivity and cancer-detection rates after implementation. Because of the rarity of breast cancer in community samples, very large samples would be needed to study the effect of the use of computer-aided detection on sensitivity with high statistical power (approximately 750,000 mammograms interpreted in total, half with the use of computer-aided detection and half without).
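The dependence of the required sample size on the rarity of cancer can be illustrated with a textbook two-proportion formula. The paper does not state the power, test, or cancer fraction behind its figure of approximately 750,000 mammograms, so this sketch does not attempt to reproduce it. It assumes 80% power, a two-sided α of 0.05, sensitivities of 80.4% and 84.0%, and the study-wide cancer fraction. Under those assumptions it gives roughly 650,000 mammograms, the same order of magnitude. Higher power or a lower cancer fraction would push the total higher. The function name is hypothetical.

```python
from statistics import NormalDist


def cancers_per_group(p1: float, p2: float, alpha: float = 0.05, power: float = 0.80) -> float:
    """Cancer cases needed per group to detect a difference between sensitivities p1 and p2
    with a two-sided test (normal-approximation formula, no continuity correction)."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return (z_a + z_b) ** 2 * variance / (p1 - p2) ** 2


cancer_fraction = 2351 / 429_345  # assumed cancers per screening mammogram
n_cancers = cancers_per_group(0.804, 0.840)
n_mammograms = 2 * n_cancers / cancer_fraction  # half with and half without CAD
print(f"{n_cancers:.0f} cancers per group, about {n_mammograms:,.0f} mammograms in total")
```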
In conclusion, we found that, among large numbers of diverse facilities and radiologists, the use of computer software designed to improve the interpretation of mammograms was associated with significantly higher false positive rates, recall rates, and biopsy rates and with significantly lower overall accuracy in screening mammography than was nonuse. The nonsignificant trend toward greater sensitivity with the use of computer-aided detection as compared with nonuse may be largely explained by increased detection of ductal carcinoma in situ. As an FDA-approved technology whose use can be reimbursed by Medicare, computer-aided detection has been incorporated quickly into mammography practices, despite tentative evidence of clinical benefits. Now that computer-aided detection is used in the screening of millions of healthy women, larger studies are needed to judge more precisely whether benefits of routine use of computer-aided detection outweigh its harms.
Supported by grants from the National Cancer Institute (NCI) (U01 CA69976, to Dr. Taplin; U01CA86082, to Dr. Carney; U01CA63736, to Dr. Cutter; and K05 CA104699, to Dr. Elmore), the Agency for Healthcare Research and Quality and NCI (R01 CA 107623, to Dr. Elmore), and the American Cancer Society (MRSGT-05-214-01-CPPB, to Dr. Fenton). Dr. D'Orsi is a Georgia Cancer Coalition scholar.
Presented at the annual meeting of the Breast Cancer Surveillance Consortium, Chapel Hill, NC, April 24, 2006.
Dr. Taplin was the principal investigator on NCI grant CA69976 when the study began, but he is now affiliated with the NCI. Dr. D'Orsi reports serving as a paid clinical adviser for, and having equity ownership in, R2 Technologies. Dr. Hendrick reports having equity ownership in Koning and Biolucent and receiving lecture fees from GE Medical Systems. No other potential conflict of interest relevant to this article was reported.
All opinions expressed in this article are those of the authors and should not be construed to imply the opinion or endorsement of the federal government or the NCI.
Source Information
From the University of California, Davis, Sacramento (J.J.F.); the National Cancer Institute, Bethesda, MD (S.H.T.); Oregon Health and Science University, Portland (P.A.C.); Group Health Cooperative, Seattle (L.A.); the University of California, San Francisco, San Francisco (E.A.S.); the Emory Clinic, Atlanta (C.D.); Northwestern University, Chicago (E.A.B., R.E.H.); the University of Alabama at Birmingham, Birmingham (G.C.); Cancer Research and Biostatistics, Seattle (W.E.B.); and the University of Washington, Seattle (J.G.E.).
Address reprint requests to Dr. Fenton at the Department of Family and Community Medicine, UC Davis Health System, 4860 Y St., Ste. 2300, Sacramento, CA 95817, or at joshua.fenton{at}ucdmc.ucdavis.edu.
References
The New England Journal of Medicine is owned, published, and copyrighted © 2008 Massachusetts Medical Society. All rights reserved.