The Reproducibility of a Method to Identify the Overuse and Underuse of Medical Procedures
Paul G. Shekelle, M.D., Ph.D., James P. Kahan, Ph.D., Steven J. Bernstein, M.D., M.P.H., Lucian L. Leape, M.D., Caren J. Kamberg, M.P.H., and R.E. Park, Ph.D.
Background To assess the overuse and underuse of medical procedures, various methods have been developed, but their reproducibility has not been evaluated. This study estimates the reproducibility of one commonly used method.
Methods We performed a parallel, three-way replication of the RAND–University of California at Los Angeles appropriateness method as applied to two medical procedures, coronary revascularization and hysterectomy. Three nine-member multidisciplinary panels of experts were composed for each procedure by stratified random sampling from a list of experts nominated by the relevant specialty societies. Each panel independently rated the same set of clinical scenarios in terms of the appropriateness of the relevant procedure on a risk–benefit scale ranging from 1 to 9. Final ratings were used to classify the procedure in each scenario as necessary or not necessary (to evaluate underuse) and inappropriate or not inappropriate (to evaluate overuse). Reproducibility was measured by overall agreement and by the kappa statistic. The criteria for underuse and overuse derived from these ratings were then applied to real populations of patients who had undergone coronary revascularization or hysterectomy.
Results The rates of agreement among the three coronary-revascularization panels were 95, 94, and 96 percent for inappropriate-use scenarios and 93, 92, and 92 percent for necessary-use scenarios. Agreement among the three hysterectomy panels was 88, 70, and 74 percent for inappropriate-use scenarios. Scenarios involving necessary use of hysterectomy were not assessed. The three-way kappa statistic to detect overuse was 0.52 for coronary revascularization and 0.51 for hysterectomy. The three-way kappa statistic to detect underuse of coronary revascularization was 0.83. Application of individual panels' criteria to real populations of patients resulted in a 100 percent variation in the proportion of cases classified as inappropriate and a 20 percent variation in the proportion of cases classified as necessary.
Conclusions The appropriateness method is far from perfect. Appropriateness criteria may be useful in comparing levels of appropriate procedures among populations but should not by themselves be used to direct care for individual patients.
The appropriateness of health procedures has commanded considerable attention recently.[1,2,3] Escalating health care costs and identification of inappropriate care have led to the critical examination of possible overuse and underuse of many medical and surgical procedures and questions as to when or whether they are needed. Central to this examination is the determination of what constitutes appropriate indications for any given procedure. Ideally, this determination would be derived solely from rigorously conducted research that established conclusively the clinical circumstances under which patients benefit from the procedure. Unfortunately, satisfactory data on efficacy and effectiveness are unusual.[4] In fact, several studies estimate that only 15 to 20 percent of medical practices can be justified on the basis of rigorous scientific data establishing their effectiveness.[5,6] For most conditions, something other than rigorous data on efficacy or effectiveness must be used to determine criteria of appropriateness.
One frequently used method that combines expert opinion, the type of information most commonly employed, with available scientific evidence is the RAND–University of California at Los Angeles appropriateness method, which was developed in 1984 by the Health Services Utilization Study.[7] This method has been used to evaluate the appropriateness of a variety of medical and surgical interventions.[1,2,3,8,9] It combines a systematic review of the scientific literature with expert opinion and yields specific criteria of appropriateness that can be used as the basis for review criteria, practice guidelines, or both. In general, it quantitatively assesses the expert judgment of a multidisciplinary group of clinicians concerning a comprehensive series of clinical indications on a risk–benefit scale ranging from 1 to 9. It is iterative, with two rounds of anonymous ratings and a face-to-face group discussion between rounds. Each panelist has equal weight in determining the final result: an explicit appropriateness rating for clinically detailed patient scenarios.
A central criticism of the appropriateness method is the potential sensitivity of the results to the selection of particular experts, leading to concern about the results' validity.[10,11] To address this concern, we conducted a rigorous test of the reproducibility of the appropriateness method as used to identify the overuse and underuse of medical procedures.
Methods
We performed a parallel, three-way replication of the appropriateness panel process for two medical procedures, coronary revascularization and hysterectomy. We chose these procedures because they are commonly performed and they differ in the amount of available scientific evidence concerning efficacy. We examined all indications for coronary revascularization (948 clinical scenarios) and nonemergency, nononcologic indications for hysterectomy (1718 clinical scenarios). Table 1 presents examples of indications that were rated.
Table 1. Examples of the Indications for Coronary Revascularization and Hysterectomy Rated by Expert Panels.
Selection of Panelists
We solicited nominations for the coronary-revascularization and hysterectomy panels from a variety of relevant, respected medical and surgical societies and organizations. From all sources, 69 cardiologists, 30 primary care physicians, and 81 cardiovascular surgeons were nominated for the coronary-revascularization panel, and 57 obstetrician–gynecologists and 30 primary care physicians were nominated for the hysterectomy panel.
We requested a current curriculum vitae from each nominee. Physicians who had previously served as expert panelists for assessments of the appropriateness of coronary revascularization or hysterectomy were excluded. Each panelist was classified according to specialty, location of practice, type of practice (academic or private), and sex. Drawing from the pool of qualified nominees by stratified random sampling, we made assignments to four panels for each procedure. We sent the panelists who were selected a letter inviting them to participate. Those who declined were replaced with new physicians from the appropriate strata until four panels for each procedure had been composed. Our interaction with one of these panels was only by mail. We report here the results from the three panels that followed the conventional appropriateness method, which includes a face-to-face panel discussion.
Synthesis of the Literature and Selection of Moderators
For each procedure, a synthesis of the scientific literature was prepared and peer-reviewed by external experts for completeness and accuracy. Three experienced moderators were selected, one for each panel. Moderators were aware only of the names of their own panelists and their own results; they were unaware of the names of other panelists and of the actions and results of the other panels.
Operation of the Panels
Each panel was conducted in identical fashion, with panelists receiving the same literature synthesis, set of clinical scenarios, and instructions. The panelists first independently rated the appropriateness of using the relevant procedure in each scenario and returned their rating forms by mail. The ratings were then tabulated before the face-to-face panel meeting. Each coronary-revascularization panel had a 2-day face-to-face meeting (all three of which took place over a 10-day span in October 1994). Likewise, the three hysterectomy panels met independently for two days each in November 1994. All panel meetings occurred in the same room at the RAND office in Washington, D.C. In the only departure from usual practice, we did not allow panelists to alter clinical scenarios, because we wanted an identical set of scenarios in order to compare results among panels. To minimize the potential effect of this change, we extensively tested our scenarios with nonpanelists for clinical sensibility before we used them.
After obtaining the final-round appropriateness ratings, we had the coronary-revascularization panelists rate again each scenario that they had judged appropriate for use of the relevant procedure, this time according to necessity criteria. The concept of necessity goes beyond that of appropriateness, in that withholding a procedure that was deemed necessary for a person's clinical situation would constitute wrongful underuse of the procedure.[12] Because our study was restricted to the use of hysterectomy for nonemergency, nononcologic indications, we did not ask the hysterectomy panel for necessity ratings.
Statistical Analysis
With final ratings from each panel, we assigned an appropriateness category to each clinical indication. Disagreement was considered to have occurred when at least three panelists rated an indication in the top third of the risk–benefit scale (7, 8, or 9) and at least three panelists rated the same indication in the bottom third (1, 2, or 3). A median panel rating of 7, 8, or 9 without disagreement defined an indication as appropriate. A median panel rating of 1, 2, or 3 without disagreement defined an indication as inappropriate. Indications with a median rating of 4, 5, or 6, and all indications with disagreement, were classified as uncertain. Indications judged appropriate with a median panel rating of 7, 8, or 9 on the necessity scale without disagreement were considered evidence of a procedure's necessity.
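The classification rule just described can be illustrated in code. What follows is a minimal sketch, not the software used in the study: the function names `has_disagreement` and `classify_indication` are hypothetical, and the treatment of a non-integer median (which can arise only with an even number of raters, not with a nine-member panel) is an assumption made for the example.

```python
from statistics import median


def has_disagreement(ratings):
    """Disagreement: at least three ratings in 7-9 and at least three in 1-3."""
    high = sum(1 for r in ratings if r >= 7)
    low = sum(1 for r in ratings if r <= 3)
    return high >= 3 and low >= 3


def classify_indication(ratings):
    """Classify one clinical indication from a panel's final-round ratings (1 to 9)."""
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 1-to-9 scale")
    # Any indication with disagreement is uncertain, whatever its median.
    if has_disagreement(ratings):
        return "uncertain"
    m = median(ratings)
    if m >= 7:
        return "appropriate"
    if m <= 3:
        return "inappropriate"
    # Medians of 4, 5, or 6 (and, by assumption, fractional medians
    # between 3 and 7) fall in the uncertain category.
    return "uncertain"


if __name__ == "__main__":
    # Hypothetical ratings from a nine-member panel.
    print(classify_indication([7, 8, 9, 8, 7, 9, 8, 7, 8]))  # appropriate
    print(classify_indication([8, 8, 8, 8, 8, 8, 1, 2, 3]))  # uncertain (disagreement)
```

The second call shows why disagreement is checked first: the median is 8, yet three panelists rated the indication in the bottom third, so it is classified as uncertain.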
We analyzed the final-round appropriateness ratings using the pairwise percentage of agreement between panels, the kappa statistic (a measure of agreement that takes into account the agreement due to chance), and the three-way kappa statistic among panels. We used terminology suggested by Landis and Koch[13] to assign descriptive terms to numerical values of kappa. To identify overuse, we used the ratings to classify each procedure as "inappropriate" or "not inappropriate." To identify underuse of coronary revascularization, we used the classification of "necessary" or "not necessary." These classifications are the same as those used in previous studies of overuse and underuse. For each calculation, the indication was weighted by the frequency with which it occurs in practice. For the weights for overuse of coronary revascularization, we used data from 2532 persons (randomly selected from 15 hospitals in New York State) who had undergone coronary revascularization. For the weights for underuse of coronary revascularization, we used data from 1294 persons (randomly selected from 15 New York hospitals) who had undergone coronary angiography. For hysterectomy, we used data from 636 women (randomly selected from seven managed-care organizations) who had undergone hysterectomy for nonemergency, nononcologic indications. The methods used for collecting data and assigning appropriateness criteria based on medical records have been previously reported.[1,2,3] In brief, clinical data were collected from the medical records in sufficient detail to allow each case to be matched with one of the clinical scenarios rated by the panels for appropriateness.
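As an illustration of the pairwise measures, the sketch below computes frequency-weighted agreement and Cohen's kappa for two panels' two-category classifications, and maps a kappa value to the Landis and Koch descriptive terms. It is a sketch under stated assumptions: the function names and the toy data are hypothetical, the weighting scheme (each indication counted as many times as it occurs in the patient sample) is our reading of the text, and the three-way kappa is not covered.

```python
def weighted_agreement_and_kappa(panel_a, panel_b, weights):
    """Frequency-weighted agreement and Cohen's kappa for two binary classifications.

    panel_a, panel_b: lists of booleans, one per indication
                      (for example, True = "inappropriate").
    weights:          number of cases observed for each indication.
    """
    if not (len(panel_a) == len(panel_b) == len(weights)):
        raise ValueError("inputs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("total weight must be positive")

    # Observed agreement: weighted share of indications classified alike.
    p_o = sum(w for a, b, w in zip(panel_a, panel_b, weights) if a == b) / total
    # Marginal proportions of the positive category for each panel.
    p_a = sum(w for a, w in zip(panel_a, weights) if a) / total
    p_b = sum(w for b, w in zip(panel_b, weights) if b) / total
    # Agreement expected by chance from the marginals alone.
    p_e = p_a * p_b + (1 - p_a) * (1 - p_b)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return p_o, (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Descriptive term for a kappa value, following Landis and Koch (1977)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Toy data: four indications with hypothetical case counts.
    a = [True, True, False, False]
    b = [True, False, False, False]
    counts = [10, 1, 5, 4]
    agreement, kappa = weighted_agreement_and_kappa(a, b, counts)
    print(agreement, kappa, landis_koch_label(kappa))
```

Applied to the three-way values reported in this study, `landis_koch_label(0.52)` gives "moderate" and `landis_koch_label(0.83)` gives "almost perfect", matching the terms used in the Results.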
Stata software (version 5.0, Stata, College Station, Tex.) was used for calculations. Confidence intervals were calculated by the bias-corrected bootstrap method.
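For readers unfamiliar with the bias-corrected bootstrap, the sketch below shows the general idea in Python rather than Stata. It is an illustration only: it implements the simple bias-corrected percentile interval (without an acceleration term), the function name `bc_bootstrap_ci`, the number of replicates, and the toy data are assumptions, and the details of the Stata routine used in the study may differ.

```python
import random
from statistics import NormalDist, mean


def bc_bootstrap_ci(data, stat, n_boot=2000, alpha=0.05, seed=0):
    """Bias-corrected percentile bootstrap interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    theta_hat = stat(data)

    # Resample the observations with replacement and recompute the statistic.
    boots = sorted(
        stat([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )

    # Bias-correction constant z0: how far the bootstrap distribution's
    # median sits from the original estimate, on the normal scale.
    prop_below = sum(b < theta_hat for b in boots) / n_boot
    # Clamp away from 0 and 1 so the inverse normal CDF is defined.
    prop_below = min(max(prop_below, 1 / (n_boot + 1)), n_boot / (n_boot + 1))
    nd = NormalDist()
    z0 = nd.inv_cdf(prop_below)

    # Shift the nominal percentiles by 2 * z0.
    lo_p = nd.cdf(2 * z0 + nd.inv_cdf(alpha / 2))
    hi_p = nd.cdf(2 * z0 + nd.inv_cdf(1 - alpha / 2))

    def quantile(p):
        idx = min(max(int(p * n_boot), 0), n_boot - 1)
        return boots[idx]

    return quantile(lo_p), quantile(hi_p)


if __name__ == "__main__":
    # Toy data: 1 = two panels agree on a case, 0 = they disagree.
    agreement_flags = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 5
    print(bc_bootstrap_ci(agreement_flags, mean))
```

The same function can wrap any statistic computed from the resampled cases, such as a kappa value, by passing a different `stat` callable.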
Results
Participation rates were extremely high among those invited to serve as panel members. Of the cardiovascular panelists invited, 98 percent agreed to participate, and of the hysterectomy panelists, 91 percent agreed to participate. The three panels for each procedure were well matched with regard to all measured characteristics (Table 2).
The respective final-round ratings of Panels A, B, and C showed disagreement on 1, 4, and 4 percent of the coronary-revascularization scenarios and 9, 6, and 2 percent of the hysterectomy scenarios.
The degree of agreement on appropriateness among the panels was mixed. Table 3 shows the pairwise agreement, pairwise kappa statistic, and three-way kappa statistic for overuse and underuse. For coronary revascularization, there were high levels of agreement among panels, with moderate agreement beyond chance with regard to overuse and almost perfect agreement beyond chance with regard to underuse. For hysterectomy, Panels A and B had a very high level of agreement, and substantial agreement beyond chance, with regard to overuse. Panel C had a lower level of overall agreement with the other two panels. For both procedures, the three-way agreement beyond chance with regard to overuse was moderate, and for coronary revascularization, the three-way agreement beyond chance with regard to underuse was almost perfect.
Table 3. Comparisons of Panel Ratings of Overuse and Underuse.
Figure 1 shows the effect of using the appropriateness ratings of the three coronary-revascularization panels to classify the 2532 cases of coronary revascularization in New York. Had Panel A's ratings alone been used to classify care, 160 procedures would have been labeled as inappropriate. Of these, none would have been rated as necessary by either of the other two panels, and 18 would have been rated as appropriate by one of the other panels. Similarly, if Panel B's ratings alone had been used to classify care, 186 procedures would have been labeled as inappropriate, and none of these would have been rated as necessary or appropriate by either of the other two panels. Finally, if Panel C's ratings alone had been used to classify care, 97 procedures would have been labeled as inappropriate; none of these would have been rated as necessary, but 2 would have been rated as appropriate by one of the other panels. In no instance was a case rated as necessary by one panel and inappropriate by another.
Figure 1. Effect of the Three Panels' Appropriateness Ratings on the Determination of Overuse of Coronary Revascularization.
Figure 2 provides similar data about the underuse of coronary revascularization. Of 1294 uses of angiography, 498, 464, and 402 would have been rated as necessary by Panels A, B, and C, respectively. No use of angiography judged necessary by one panel was rated as inappropriate by either of the other two panels; some were rated as uncertain by at least one other panel (24, 31, and 4 by Panels A, B, and C, respectively).
Figure 2. Effect of the Three Panels' Appropriateness Ratings on the Determination of Underuse of Coronary Revascularization.
Finally, Figure 3 shows the effect of using the appropriateness ratings of the three hysterectomy panels to classify 636 cases of hysterectomy. Using Panel A's ratings or Panel B's ratings alone would have labeled 200 or 153 hysterectomies, respectively, as inappropriate, with 7 of them for each panel rated as appropriate by one of the other two panels. Using Panel C's ratings alone would have labeled 331 hysterectomies as inappropriate, with 92 of them rated as appropriate by one of the other two panels.
Figure 3. Effect of the Three Panels' Appropriateness Ratings on the Determination of Overuse of Hysterectomy.
We examined the indications for which results were discordant among panels and found none in which conclusive evidence from randomized, clinical trials supported a given action. For overuse of revascularization, three indications involved discordant ratings (in a total of 20 cases). Sixteen cases were accounted for by one indication (patients with chronic stable angina, mild or moderate angina, and single-vessel disease who had less than strongly positive results on an exercise stress test or in whom the stress test was not done). For underuse of revascularization, 13 indications involved discordant ratings (in a total of 46 cases). Four indications accounted for 35 cases, including three that involved patients presenting within 21 days after an acute myocardial infarction and one that involved asymptomatic patients with three-vessel disease. The 92 cases of hysterectomy with discordant results were spread over 28 indications, of which 26 (93 percent, involving 90 [98 percent] of the cases) involved uterine bleeding (or pelvic discomfort) with "major impairment" of the patient, which was defined as follows: "during the last 3 months the patient had had a significant worsening in level of activity (e.g., 2 or more days per month) due to her bleeding or pain, or the bleeding or pain is continuing to have a significant negative effect on her functional ability."
Discussion
Our results show that the appropriateness method of identifying overuse is far from perfect. The degree of agreement among panels about care identified as inappropriate was only moderate. Furthermore, the number of cases categorized as inappropriate varied by a factor of about two for both procedures. However, our results for identifying underuse are more reassuring. Agreement among panels was nearly perfect, and the number of cases classified as necessary varied by only 20 percent among panels.
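The variation described here can be checked against the case counts reported in the Results. The short calculation below expresses the variation as the largest panel count divided by the smallest; the exact formula is not stated in the text, so this ratio is an assumption, and the dictionary keys are labels chosen for the example.

```python
# Cases each panel's criteria would classify in the named category,
# in the order Panel A, Panel B, Panel C (counts taken from the Results).
counts = {
    "revascularization, inappropriate": [160, 186, 97],   # of 2532 cases
    "hysterectomy, inappropriate": [200, 153, 331],       # of 636 cases
    "revascularization, necessary": [498, 464, 402],      # of 1294 cases
}


def variation_ratio(panel_counts):
    """Largest panel count divided by the smallest."""
    return max(panel_counts) / min(panel_counts)


ratios = {name: round(variation_ratio(c), 2) for name, c in counts.items()}

for name, ratio in ratios.items():
    print(f"{name}: {ratio}")
```

On this reading, the inappropriate counts differ by ratios of 1.92 and 2.16 (roughly a factor of two), and the necessary counts by a ratio of 1.24, in line with the approximate figures quoted above.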
The literature is sparse on studies evaluating the reproducibility and reliability of alternative methods for determining appropriateness. We do know, however, that alternative methods are certain to be less than perfect. The reliability of individual surgeons' decisions to recommend hysterectomy has been estimated to have a kappa of 0.23.[14] Although imperfect, the reproducibility of the appropriateness method is markedly better. Three-to-fivefold variations in the rate of use of hysterectomy have been reported[15,16,17] and have been attributed to variability among physicians.[18] Although imperfect, the appropriateness method is less variable. A recent report on coronary angiography after myocardial infarction reported a 2.5-fold variation in the rate of use among 16 Kaiser Permanente hospitals.[19] For cases in which coronary angiography was judged necessary (by a process identical to that described here), there was a 1.6-fold variation. Again, although imperfect, the results of the appropriateness method for coronary revascularization are less variable.
Although systematic data are lacking, the results of other methods, such as meta-analysis, decision analysis, and cost-effectiveness analysis, have also been variable. For example, meta-analyses on the same topics have reached different conclusions,[20] and meta-analyses do not always agree with subsequent clinical trials.[21,22] A recent systematic evaluation of the agreement between meta-analyses and subsequent large clinical trials reported a kappa of 0.3.[23] Likewise, three independent decision analyses on the use of isoniazid prophylaxis for patients with positive results on tuberculin skin tests came to three different conclusions.[24] The estimates of the cost effectiveness of autologous blood donation have also varied greatly, even for the same surgical procedure.[25,26,27,28,29,30] Whether any of these methods is more or less reliable than the appropriateness method remains to be studied systematically.
The area of medicine with the largest amount of rigorous data on reliability is diagnostic testing. Although not a diagnostic test, the appropriateness method shares many characteristics with diagnostic tests, in that both involve classifying patients into two or more categories and both therefore have a reproducibility, false positive, and false negative rate. In ischemic cardiac disease and in women's health, the reliability of thallium scintigraphy for the diagnosis of ischemic cardiac disease has been estimated to have a kappa of 0.45[31] and a kappa of 0.66,[32] the reliability of coronary angiography in determining the presence or absence of stenosis has been estimated to have a kappa of 0.53,[33] the reliability of screening mammography has been estimated to have a kappa of 0.47,[34] and the reliability of the classification of cervical smears with grade III histologic features has been estimated to have a kappa of 0.50[35] and a kappa of 0.58.[36] Given these values, the reproducibility of the appropriateness method is about the same as that of several well-accepted diagnostic tests.
However, the variability we observed in the appropriateness method does have important implications for clinical use. When the method is used to measure rates in a single population, the fact that the classification of inappropriate use varies by a factor of two means that precise estimates are not possible. At best, in a single population, the appropriateness method can estimate whether the proportion of cases with overuse is small or large. The appropriateness method will perform more acceptably as a way to assess the relative proportions of overuse and underuse among populations. Bias due to misclassification will be present in all comparison groups. Although the absolute measure of overuse and underuse may be biased because of misclassification, the relative difference among groups is less likely to be biased.
In making decisions for individual patients, however, the situation is different. Like diagnostic tests, the appropriateness method does not have sufficient reproducibility to justify its use as a gold standard of appropriateness. Clinicians and patients may wish to use results of the appropriateness method as a starting point for discussions about the expected net outcome of a medical procedure. Purchasers, however, should consider the appropriateness method as no more than a screening test to identify care that may be inappropriately under- or overdelivered. Care that is so identified should then be examined at the next level, which must involve direct contact with the provider, and possibly the patient as well, to ascertain additional details about the care delivered. Under no circumstances should the care of individual patients be guided solely by the results of the appropriateness method without additional clinical information.
Our data certainly make it clear that the reproducibility of the appropriateness method could be improved. Although our results for coronary revascularization may be acceptable, we need to know whether the difference between groups of experts considering other procedures is likely to be of a magnitude similar to that seen for hysterectomy between Panel C and the other two panels. The variability in the effect of "major impairment" of function on the appropriateness ratings reflects the different way that Panel C interpreted the trade-off between risk and benefit for these patients; the symptom of major impairment was not judged sufficient to outweigh the risk of the procedure. This finding underscores the variability of physicians' interpretations of the importance of patients' symptoms (as opposed, for example, to mortality or the probability of a myocardial infarction). It also highlights the need for clinical trials of hysterectomy that directly measure symptoms as a primary outcome and the need to involve patients in quality-of-life decisions.
Further research is needed to identify which procedures are likely to be associated with reliable appropriateness-method results. We can conjecture that the more firm evidentiary basis underlying the indications for revascularization resulted in a more reliable extrapolation beyond the evidence on the part of the experts. For hysterectomy, where the evidence was scant and the judgments were dependent on individual values, reliability was reduced. This hypothesis can be further explored by examining in detail the panel discussions or analyzing the results of different panels for different procedures. Multiple determinations have also been suggested as a way to improve the reliability of some diagnostic tests, such as mammography and coronary angiography.[33,34]
The use of stratified random sampling, the high participation rate achieved, the similarity of the panelists in many features, and the identical nature of the process in each panel all strengthen this study as a fair and rigorous test of the reproducibility of the expert-panel component of the appropriateness method. However, our study has several limitations. Additional components not tested include the development of the systematic review and the construction of the clinical scenarios, each of which may contribute to variability. Also, we studied only two procedures. Although this was a deliberate choice designed to identify likely upper and lower boundaries of reproducibility (with coronary revascularization and hysterectomy, respectively), values for other procedures may be below the values reported here for hysterectomy.
Future studies of the reproducibility of methods identifying overuse and underuse of health procedures should be conducted as rigorously as the study reported here. Only then can we inform with empirical evidence what has thus far been a debate based largely on theory and opinion about how best to determine what care is appropriate.
Supported by a grant (HSO7185-02) from the Agency for Health Care Policy and Research. Dr. Shekelle is the recipient of a Senior Research Associate Career Development Award from the Department of Veterans Affairs.
We are indebted to Mark Chassin, M.D., for helpful comments and to the physicians who served as panelists.
Source Information
From the West Los Angeles Veterans Affairs Medical Center, Los Angeles (P.G.S.); RAND, Santa Monica, Calif. (P.G.S., J.P.K., C.J.K., R.E.P.); the Ann Arbor Veterans Affairs Medical Center and the Departments of Internal Medicine and Health Management and Policy, University of Michigan, Ann Arbor (S.J.B.); and the Harvard School of Public Health, Boston (L.L.L.).
Address reprint requests to Dr. Shekelle at RAND, 1700 Main St., P.O. Box 2138, Santa Monica, CA 90407-2138.
References
1. Leape LL, Hilborne LH, Park RE, et al. The appropriateness of use of coronary artery bypass graft surgery in New York State. JAMA 1993;269:753-760.
2. Bernstein SJ, Hilborne LH, Leape LL, et al. The appropriateness of use of coronary angiography in New York State. JAMA 1993;269:766-769.
3. Bernstein SJ, McGlynn EA, Siu AL, et al. The appropriateness of hysterectomy: a comparison of care in seven health plans. JAMA 1993;269:2398-2402.
4. Fink A, Brook RH, Kosecoff J, Chassin MR, Solomon DH. Sufficiency of clinical literature on the appropriate uses of six medical and surgical procedures. West J Med 1987;147:609-614.
5. Institute of Medicine. Assessing medical technologies. Washington, D.C.: National Academy Press, 1985.
6. Dubinsky M, Ferguson JH. Analysis of the National Institutes of Health Medicare coverage assessment. Int J Technol Assess Health Care 1990;6:480-488.
7. Brook RH, Chassin MR, Fink A, Solomon DH, Kosecoff J, Park RE. A method for the detailed assessment of the appropriateness of medical technologies. Int J Technol Assess Health Care 1986;2:53-63.
8. Gray D, Hampton JR, Bernstein SJ, Kosecoff J, Brook RH. Audit of coronary angiography and bypass surgery. Lancet 1990;335:1317-1320.
9. Bengtson A, Herlitz J, Karlsson T, Brandrup-Wognsen G, Hjalmarson A. The appropriateness of performing coronary angiography and coronary artery revascularization in a Swedish population. JAMA 1994;271:1260-1265.
10. Phelps CE. The methodologic foundations of studies of the appropriateness of medical care. N Engl J Med 1993;329:1241-1245.
11. Hicks NR. Some observations on attempts to measure appropriateness of care. BMJ 1994;309:730-733.
12. Kahan JP, Bernstein SJ, Leape LL, et al. Measuring the necessity of medical procedures. Med Care 1994;32:357-365.
13. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174.
14. Rutkow IM, Gittelsohn AM, Zuidema GD. Surgical decision making: the reliability of clinical judgment. Ann Surg 1979;190:409-419.
15. Roos NP. Hysterectomy: variations in rates across small areas and across physicians' practices. Am J Public Health 1984;74:327-335.
16. Hysterectomies in New York State: a statistical profile. Albany: New York State Department of Health, 1988:1-13.
17. Haas S, Acker D, Donahue C, Katz ME. Variation in hysterectomy rates across small geographic areas of Massachusetts. Am J Obstet Gynecol 1993;169:150-154.
18. Carlson KJ, Nichols DH, Schiff I. Indications for hysterectomy. N Engl J Med 1993;328:856-860.
19. Selby JV, Fireman BH, Lundstrom RJ, et al. Variation among hospitals in coronary-angiography practices and outcomes after myocardial infarction in a large health maintenance organization. N Engl J Med 1996;335:1888-1896.
20. Chalmers TC, Berrier J, Sacks HS, Levin H, Reitman D, Nagalingam R. Meta-analysis of clinical trials as a scientific discipline. II. Replicate variability and comparison of studies that agree and disagree. Stat Med 1987;6:733-744.
21. Borzak S, Ridker PM. Discordance between meta-analyses and large-scale randomized, controlled trials: examples from the management of acute myocardial infarction. Ann Intern Med 1995;123:873-877.
22. Cappelleri JC, Ioannidis JP, Schmid CH, et al. Large trials vs. meta-analysis of smaller trials: how do their results compare? JAMA 1996;276:1332-1338.
23. LeLorier J, Grégoire G, Benhaddad A, Lapierre J, Derderian F. Discrepancies between meta-analyses and subsequent large randomized, controlled trials. N Engl J Med 1997;337:536-542.
24. Colice GL. Decision analysis, public health policy, and isoniazid chemoprophylaxis for young adult tuberculin skin reactors. Arch Intern Med 1990;150:2517-2522.
25. Birkmeyer JD, AuBuchon JP, Littenberg B, et al. Cost-effectiveness of preoperative autologous donation in coronary artery bypass grafting. Ann Thorac Surg 1994;57:161-168.
26. Birkmeyer JD, Goodnough LT, AuBuchon JP, Noordsij PG, Littenberg B. The cost-effectiveness of preoperative autologous blood donation for total hip and knee replacement. Transfusion 1993;33:544-551.
27. Goodnough LT, Grishaber JE, Birkmeyer JD, Monk TG, Catalona WJ. Efficacy and cost-effectiveness of autologous blood predeposit in patients undergoing radical prostatectomy procedures. Urology 1994;44:226-231.
28. Kattan MW, Eastham JA, Yawn DH, Scardino PT. A decision analysis of the cost effectiveness of preoperative autologous blood donation prior to radical prostatectomy for clinically localized prostate cancer. Med Decis Making 1995;15:429. abstract.
29. Etchason J, Petz L, Keeler E, et al. The cost effectiveness of preoperative autologous blood donations. N Engl J Med 1995;332:719-724.
30. Sonnenberg FA, Nizam RA, Yomtovian RA, et al. Cost-effectiveness of autologous blood donation revisited: the impact of increased risk of bacterial infection following allogeneic transfusion. Med Decis Making 1995;15:428. abstract.
31. Wackers FJ, Bodenheimer M, Fleiss JL, Brown M. Factors affecting uniformity in interpretation of planar thallium-201 imaging in a multicenter trial. J Am Coll Cardiol 1993;21:1064-1074.
32. Atwood JE, Jensen D, Froelicher V, et al. Agreement in human interpretation of analog thallium myocardial perfusion images. Circulation 1981;64:601-609.
33. DeRouen TA, Murray JA, Owen W. Variability in the analysis of coronary arteriograms. Circulation 1977;55:324-328.
34. Elmore JG, Wells CK, Lee CH, Howard DH, Feinstein AR. Variability in radiologists' interpretations of mammograms. N Engl J Med 1994;331:1493-1499.
35. Kato K, Santamaria M, De Ruiz PA, et al. Inter-observer variation in cytological and histological diagnoses of cervical neoplasia and its epidemiologic implication. J Clin Epidemiol 1995;48:1167-1174.
36. Ismail SM, Colclough AB, Dinnen JS, et al. Observer variation in histopathological diagnosis and grading of cervical intraepithelial neoplasia. BMJ 1989;298:707-710.