Methods
In 1998, we measured the quality of care in a stratified, random sample of 60 primary care practices in six geographic areas of England. These practices were nationally representative in terms of size, whether the practice was approved for residency training, and the sociodemographic characteristics of their populations.9 We followed up 42 of these practices in 2003 and 2005. The reduction in the number of practices was due partly to attrition and partly to the retirement of solo physicians and the closing of other practices. The 42 practices for which data were available for the longitudinal analysis were still nationally representative in terms of socioeconomic status, but solo practitioners were underrepresented.4 However, these 42 practices have values close to the national averages for socioeconomic status, population density, and type of housing of the patient population, and their performance was also typical of English family practices during the first year of the pay-for-performance program.2 The analysis was restricted to the 42 practices for which data were available for all three time points (1998, 2003, and 2005). The research protocol was approved by the ethics committee of the multicenter Manchester National Health Service.
Data Collection
Trained research staff extracted the data to assess the quality of clinical care for the categories of coronary heart disease (15 clinical indicators), asthma (12 clinical indicators), and type 2 diabetes (21 clinical indicators). Data were collected from both computerized and handwritten medical records with the use of evidence-based review criteria10,11 developed with the RAND–UCLA appropriateness method.12 Patients with these three conditions were randomly selected from lists of those receiving the relevant drugs (see the Supplementary Appendix, available with the full text of this article at www.nejm.org) according to repeat prescriptions within the previous 6 months, and separate samples of patients treated in 1998, 2003, and 2005 were selected. In 1998, for two practices, there were no eligible patients who had coronary heart disease because of the young age of the patient population, so for that time point, the results for this condition are based on only 40 practices. Data were collected for up to 20 patients for each of the three conditions in each practice in 1998 (some small practices did not have 20 patients for each of the conditions) and for up to 12 patients for each condition in each practice in 2003 and 2005. Data were collected for a total of 2300 patients in 1998, for 1495 patients in 2003, and for 1482 patients in 2005. These data are presented as a pooled analysis across practices.
Although the study did not include conditions that were not rewarded with financial incentives in the pay-for-performance program, there were clinical indicators for coronary heart disease, asthma, and type 2 diabetes for which financial incentives were not provided in 2004. We compared 30 indicators for which financial incentives were provided with 17 indicators for which financial incentives were not provided. In this analysis, we excluded one clinical indicator for which this distinction was unclear (that is, it was not clear whether financial incentives were provided for the indicator at all three time points).
Statistical Analysis
An overall score for the quality of care was computed for each patient included for 1998, 2003, and 2005. For each patient with asthma, coronary heart disease, or type 2 diabetes, the score was computed as a ratio: the number of clinical indicators for which appropriate care was provided, divided by the number of indicators relevant to that patient. Expressed as a percentage, this score represents the percentage of "necessary care"10 provided to each patient, within a range from 0 to 100. We adopted this measure for consistency with our previous investigation of this sample.4 Scores for the quality of care at the practice level were computed as the simple average of the scores for individual patients within each practice.
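The scoring rule described above can be illustrated with a short sketch. It is only an illustration under assumptions: the function names, the use of `None` to mark an indicator that is not relevant to a patient, and the sample data are all hypothetical and are not taken from the study.

```python
from statistics import mean

def patient_score(indicators):
    """Percentage of relevant indicators for which appropriate care was given.

    `indicators` holds True (care provided), False (care not provided),
    or None (indicator not relevant to this patient; an assumed encoding).
    """
    relevant = [met for met in indicators if met is not None]
    if not relevant:
        return None  # no relevant indicators, so no score can be computed
    return 100.0 * sum(relevant) / len(relevant)

def practice_score(patients):
    """Simple average of the patient-level scores within one practice."""
    scores = [patient_score(p) for p in patients]
    scores = [s for s in scores if s is not None]
    return mean(scores) if scores else None

# Hypothetical practice with three patients and four indicators each.
practice = [
    [True, True, False, None],   # 2 of 3 relevant indicators met
    [True, None, None, True],    # 2 of 2 relevant indicators met
    [False, False, True, True],  # 2 of 4 relevant indicators met
]
print(round(practice_score(practice), 1))
```

Because the practice score is a simple average of patient scores, each patient carries equal weight regardless of how many indicators were relevant to that patient.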
We also performed analyses for individual clinical indicators. Data were available for at least five patients in a practice for each indicator analyzed and for geographic areas where the number of practices meeting this criterion was at least 10. In 1998, data were collected for indicators requiring an activity to be undertaken annually on the basis of data recorded during the previous 14 months, whereas in 2003 and 2005, data were collected for indicators requiring an activity to be undertaken annually on the basis of data recorded during the previous 15 months, in line with the pay-for-performance contract. For consistency, we generated 15-month versions of the clinical indicators for 1998 from our original data for use in the present analysis.
We compared scores for observed quality in 2005 with the scores predicted on the basis of the trend between 1998 and 2003. Practice scores for individual indicators, computed as the percentage of patients receiving appropriate care according to the indicator, were subject to a ceiling effect of 100%. In calculating the expected scores for 2005, it was inappropriate to use a simple linear model, because such a model would fail to account for ceiling effects (i.e., some predicted scores would have exceeded 100% if the previous linear trend had been extrapolated from 2003 to 2005). Probit and logit models are most commonly used to model binary data. We adopted the logit model a priori for the current analysis, calculating expected values for 2005 on the basis of the logit curve that the scores from 1998 to 2003 followed and extrapolating this curve to 2005. After performing the analysis, we computed probit predictions for comparative purposes and found these all to be within 1 percentage point of their logit equivalent. For quality-of-care scores between 20 and 80%, the logit and probit curves are essentially linear.
For each practice, we therefore computed a predicted score for 2005 using a logit projection from 1998 and 2003. We computed predicted values for overall scores and for individual clinical indicators. The predicted scores were then compared with the actual scores for the practices in 2005. However, because of floor and ceiling effects, the differences between actual and predicted scores are not equivalent across the scale: the difference between an observed score of 54 and a predicted score of 50 does not have the same import as the difference between a score of 99 and a score of 95. To adjust for this difference, observed and predicted scores were converted into their logit equivalents before the analysis. Under the transformation, a proportion, P, is transformed into a log odds, as Logit(P)=ln[P÷(1–P)]. Since Logit(P) cannot be computed where the value of P is 0 or 1, we used the empirical logit, Logit(P)=ln[(P+0.5÷n)÷(1–P+0.5÷n)], in these cases, where n is the number of observations over which P is calculated.13 The logit transformation maps the scale of 0 to 100% to a scale of minus infinity to plus infinity.13 The transformation therefore "stretches" the scores at the extremes, which increases the effect on the results of the analysis of practices for which scores are close to the floor or the ceiling.
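A minimal sketch of the logit transformation and of a two-point logit projection follows. The two formulas are those given in the text. The projection itself is an assumption made for illustration: it treats the logit of the score as changing linearly with calendar year between 1998 and 2003 and extends that line to 2005, which may differ from the fitting procedure actually used. The function names and example values are hypothetical.

```python
import math

def logit(p, n):
    """Log odds of proportion p, with the empirical logit when p is 0 or 1.

    n is the number of observations over which p was calculated.
    """
    if p in (0.0, 1.0):
        return math.log((p + 0.5 / n) / (1.0 - p + 0.5 / n))
    return math.log(p / (1.0 - p))

def inv_logit(x):
    """Map a log odds back to a proportion between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-x))

def project_2005(p_1998, p_2003, n):
    """Extrapolate the 1998-2003 trend to 2005 on the logit scale.

    Assumption: the logit of the score is linear in calendar year.
    """
    l_1998, l_2003 = logit(p_1998, n), logit(p_2003, n)
    slope_per_year = (l_2003 - l_1998) / (2003 - 1998)
    return inv_logit(l_2003 + slope_per_year * (2005 - 2003))

# Hypothetical practice: 90% in 1998 and 98% in 2003, 12 patients sampled.
# A straight-line extrapolation on the raw scale would exceed 100%;
# the logit projection cannot.
print(project_2005(0.90, 0.98, 12))
```

The same `logit` function can be applied to observed and predicted 2005 scores before they are compared, so that a gap near the ceiling counts for more than the same gap in the middle of the scale.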
The transformed observed and predicted scores for 2005 were then compared by means of a matched-pairs two-tailed t-test. In view of the relatively small number of practices included in the analysis and the distributional assumptions made by this test, a bootstrap procedure based on 1000 bootstrap samples was used to confirm the significance of the tests.
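The comparison described above can be sketched as follows. The text does not specify the form of the bootstrap, so this example assumes one common choice: resampling practices with replacement and taking a percentile interval for the mean paired difference. The function names, the seed, and the data are hypothetical, and the t statistic would still need to be referred to a t distribution with n - 1 degrees of freedom to obtain a P value.

```python
import math
import random
from statistics import mean, stdev

def paired_t(observed, predicted):
    """Matched-pairs t statistic for observed minus predicted scores."""
    diffs = [o - p for o, p in zip(observed, predicted)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def bootstrap_ci(observed, predicted, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference.

    Assumption: practices are resampled with replacement; the original
    analysis may have used a different bootstrap scheme.
    """
    rng = random.Random(seed)
    diffs = [o - p for o, p in zip(observed, predicted)]
    boot_means = sorted(
        mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lower = boot_means[int(n_boot * alpha / 2)]
    upper = boot_means[int(n_boot * (1 - alpha / 2)) - 1]
    return lower, upper

# Hypothetical logit-transformed scores for six practices.
observed = [1.2, 0.9, 1.5, 1.1, 0.8, 1.4]
predicted = [1.0, 0.8, 1.1, 1.0, 0.9, 1.0]
print(paired_t(observed, predicted))
print(bootstrap_ci(observed, predicted))
```

An interval that excludes zero points the same way as a significant two-tailed t-test, without relying on the normality assumption that a small sample of practices makes hard to check.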
Although the logit model is appropriate to an analysis of individual binary indicators, the overall scores for quality used here may not conform to the logit curve as the scores approach the ceiling. Therefore we repeated the analysis, using a linear model applied to the untransformed scores. The linear model makes no adjustment for ceiling effects, except that we did not allow predictions greater than 100%. Hence the results of the sensitivity analysis are likely to be conservative.
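The linear sensitivity analysis can be sketched in a few lines. This assumes a straight line through the 1998 and 2003 scores, extended to 2005 and capped at 100%, which is one reading of the description above; the function name and example values are hypothetical.

```python
def linear_projection_2005(score_1998, score_2003):
    """Extend the 1998-2003 straight-line trend to 2005, capped at 100%."""
    slope_per_year = (score_2003 - score_1998) / (2003 - 1998)
    return min(100.0, score_2003 + slope_per_year * (2005 - 2003))

print(linear_projection_2005(70, 85))  # 91.0
print(linear_projection_2005(90, 98))  # 100.0 (uncapped value would be 101.2)
```

For scores that are rising toward the ceiling, a straight line tends to predict a higher 2005 value than a logit curve through the same two points, so any excess of observed over predicted scores is harder to detect. That is one way to see why the linear analysis is the more conservative of the two.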
For comparisons of clinical indicators for which financial incentives were provided with those for which they were not provided, we derived observed and predicted scores for clinical quality at the practice level separately for the groups of indicators in the management of each condition for which financial incentives were provided and for those indicators for which they were not provided. The logit transformation was then applied to these scores. A matched-pairs t-test was used to compare the difference between the observed scores for indicators for which financial incentives were provided and those for which financial incentives were not provided with the predicted difference. The significance of the test was confirmed with the use of the bootstrap procedure.
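The comparison above amounts to asking whether the gap between the two groups of indicators in 2005 differs from the gap that the earlier trend would have predicted. A minimal sketch of that quantity follows, assuming practice-level proportions strictly between 0 and 1; the dictionary keys, function names, and data are hypothetical.

```python
import math
from statistics import mean

def logit(p):
    """Log odds of a proportion strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def incentive_gap(incentivized, unincentivized):
    """Gap between the two indicator groups on the logit scale."""
    return logit(incentivized) - logit(unincentivized)

def excess_gaps(practices):
    """Observed gap minus predicted gap, one value per practice."""
    return [
        incentive_gap(p["obs_inc"], p["obs_non"])
        - incentive_gap(p["pred_inc"], p["pred_non"])
        for p in practices
    ]

# Hypothetical practice-level scores for 2005 (as proportions).
practices = [
    {"obs_inc": 0.90, "obs_non": 0.70, "pred_inc": 0.85, "pred_non": 0.68},
    {"obs_inc": 0.82, "obs_non": 0.75, "pred_inc": 0.80, "pred_non": 0.74},
    {"obs_inc": 0.95, "obs_non": 0.60, "pred_inc": 0.93, "pred_non": 0.62},
]
print(mean(excess_gaps(practices)))
```

A matched-pairs test of whether these per-practice values differ from zero is equivalent to comparing the observed gap with the predicted gap.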
Results
The quality of care in the categories of coronary heart disease, asthma, and type 2 diabetes improved between 2003 and 2005, continuing the earlier trend (Figure 1). However, the increase in the rate of improvement between 2003 and 2005 was significant for asthma (P<0.001) and diabetes (P=0.002) (Table 2). Scores for coronary heart disease also increased, but the change in the rate of improvement was not significant (P=0.07). Similarly, the sensitivity analysis performed with the use of the more conservative linear model showed significant increases in the rate of improvement for asthma and diabetes but not for coronary heart disease.
Observed scores for clinical quality for 1998, 2003, and 2005 and the predicted score for 2005 for each indicator in each condition are shown in Table 3. The table includes examples of changes in individual clinical indicators that are likely to be particularly important for improving patient outcomes, such as control of cholesterol and blood pressure in the management of coronary heart disease.14
Discussion
Although the quality of care in the categories of asthma, coronary heart disease, and type 2 diabetes was improving before the introduction of the 2004 contract, our results suggest that the introduction of pay for performance was associated with a modest acceleration in improvement for two of these three conditions: diabetes and asthma. In most of the 42 practices for which data were available, the annual improvement for both was accelerated. The results are based on care reported in the medical records but not necessarily on care provided, and it is a common criticism of pay-for-performance programs that their main effect is to promote better recording of care rather than better care. However, the panels used to develop the indicators judged that to provide good care, it was necessary both to provide the care and to record the processes and intermediate outcomes assessed in terms of the indicators we used.10
Because some details of the new contract were publicized before the contract was introduced in 2004, it is possible that some practices were preparing for the incentives during our second round of data collection in 2003. If this is true, we might have overestimated improvements in quality made before pay for performance was introduced,4 so that the results reported here could represent a conservative estimate of the actual improvement resulting from pay for performance.
The key question is whether the increased rate of improvement in quality of care after the new contract was introduced can be attributed to pay for performance or to other factors. The pay-for-performance program was the only major national policy implemented in primary care in England in 2004 that targeted the types of care processes evaluated in this study. However, since practices were observed at only two time points before the introduction of pay for performance, we were unable to determine whether the rate of improvement had already accelerated as a result of earlier but still ongoing initiatives. No control group could be recruited, because financial incentives were applied simultaneously across the whole of the United Kingdom. A final concern is that the patients included in the study were selected on the basis of the presence or absence of treatment with relevant drugs, and those who were untreated or who did not comply with treatment were excluded from the analysis. This selection bias could have resulted in overestimation of the quality of care at all three time points, although the trends in quality should have been unaffected.
The study focuses on three chronic conditions — asthma, coronary heart disease, and type 2 diabetes — for which financial incentives were provided under the pay-for-performance program and which had also been subject to considerable quality-improvement activity in the United Kingdom as part of a national quality-improvement strategy. The finding of a significant increase in the rate of improvement for asthma and diabetes but not for coronary heart disease may reflect the fact that in 2003 scores for quality for coronary heart disease were already higher than those for the other two conditions. Coronary heart disease had been a particular target of earlier quality-improvement initiatives, with 98% of the Primary Care Trusts reporting coronary heart disease initiatives in 2001 and 2002.15
The finding of no significant difference in the rate of improvement between clinical indicators for which financial incentives were provided and those for which they were not provided suggests that the pay-for-performance program may not necessarily have been responsible for the acceleration in improvement that we found between 2003 and 2005. However, the study was not designed or powered for this analysis, and the broad confidence limits for many of the clinical indicators shown in Table 3 reflect the uncertainty associated with the small sample available for the analysis. In addition, there may have been a "halo effect," as a result of which some indicators for which financial incentives were not provided may have been indirectly rewarded. For example, the clinical indicator "control of total serum cholesterol in coronary heart disease to 190 mg per deciliter (5 mmol per liter) or less," which in the 2004 contract became an indicator for which a financial incentive was provided, is likely to have influenced performance on "evidence of action being taken if cholesterol was raised," a clinical indicator for which a financial incentive was not specifically provided. Improvements may therefore have spilled over onto other aspects of care that were not subject to performance monitoring. This effect has previously been noted in the Department of Veterans Affairs quality-improvement programs.16 The study was not able to assess what is perhaps a more important question, namely, what the effect of financial incentives was on care for conditions for which no financial incentives were provided at all.
The introduction of the pay-for-performance program has been associated with a general trend in the National Health Service away from placing implicit trust in health care professionals and toward more active monitoring of their performance than before the program was introduced.17 Financial incentives are most likely to be an effective means of influencing professional behavior when performance targets and rewards are aligned to the values of the staff being rewarded.18,19 Professional motivation alone may not be sufficient to improve the quality of care, especially when physicians have to make financial investments in their practices — for example, by employing more staff to achieve gains in quality. Sustained improvement in quality of care, which involves a range of health care providers (e.g., physicians, nurses, and administrative staff), requires a combination of other factors, including clear goals, good teamwork, and effective leadership.20
Owing to the inherent limitations of our study design and data, it was not possible to determine whether improvements in quality of care resulted only from the pay-for-performance program. Nevertheless, our findings are consistent with previous work, which suggested that financial incentives can change professional behavior21,22 and that patients receive higher-quality care in geographic areas where performance measures and monitoring have been established.16 However, such schemes may also have unintended consequences.1,23 These include the possible neglect of geographic areas where financial incentives for improvements in care are not provided, "myopia" (the pursuit of short-term targets at the expense of legitimate long-term objectives), and "misrepresentation" (deliberate manipulation of data so that reported behavior differs from actual behavior).24 In addition, external incentives may crowd out internal motivation (the desire to do a task well for its own sake).25,26 In the United Kingdom, family practitioners have predicted that among the adverse consequences of financial incentives may be a reduction in the continuity of care, fragmentation of care as a result of specialization within practices, and neglect of conditions for which financial incentives are not provided.27 Despite these concerns, overall job satisfaction among family physicians was higher in 2004 than in 2001.28 Moreover, a recent report from the United States suggests that targeted quality-improvement programs have not resulted in a deterioration in the quality of care in untargeted disease areas.29 Our results generally support the view of the Institute of Medicine that pay-for-performance programs can make a useful contribution to improving quality,30 particularly when such programs are part of a comprehensive quality-improvement program.31
The size of the gains in quality in relation to the costs of pay for performance remains a political issue in the United Kingdom,32 and the government now accepts that it paid more than it had expected to pay for the improvements in performance.33 The proportion of practice income taken as profit by general practitioners appears to have increased after the new contract was introduced, suggesting that gains in quality could have been achieved at a lower cost. For the years 2006 through 2007, the pay-for-performance framework has been amended to introduce higher payment thresholds, new targets, and new disease areas34 without increasing physicians' maximum available income from incentive payments. Physicians in the United Kingdom may now need to work harder or employ more staff to earn the same rewards that they had received before 2006.
Supported by the U.K. Department of Health. The views presented here are those of the authors and not necessarily of the U.K. Department of Health.
Dr. Roland reports serving as an academic advisor to the government and the British Medical Association negotiating teams during the development of the United Kingdom pay-for-performance scheme during 2001 and 2002. No other potential conflict of interest relevant to this article was reported.
Source Information
From the National Primary Care Research and Development Centre, University of Manchester, Manchester, United Kingdom.
Address reprint requests to Dr. Campbell at the National Primary Care Research and Development Centre, University of Manchester, Oxford Rd., Manchester M13 9PL, United Kingdom.
The New England Journal of Medicine is owned, published, and copyrighted © 2009 Massachusetts Medical Society. All rights reserved.