To the Editor: Wang et al. (Nov. 22 issue)1 provide a well-reasoned assessment of the statistical issues related to subgroup analyses. However, one important point that should be made is that significance testing during subgroup analyses is seldom appropriate. The majority of subgroup analyses are exploratory in nature, and no significance testing should be performed unless an alpha level needed to achieve significance is attributed to the comparison of interest in advance. Therefore, although a P value may appropriately be calculated to assess the degree of imbalance during such exploratory analyses, no subsequent significance testing should be allowed. On the basis of the degree of imbalance observed and consideration of the previous probability of a given outcome, we might reasonably decide that a result is not due to chance. However, when we do this, we are on our own, and we are no longer working within the confines of the frequentist statistical model. We should not pretend otherwise.
Scott Proestel, M.D. Food and Drug Administration Silver Spring, MD 20993 scott.proestel{at}fda.hhs.gov
The views expressed in this letter are those of the author and do not represent those of the Food and Drug Administration.
References
Wang R, Lagakos SW, Ware JH, Hunter DJ, Drazen JM. Statistics in medicine -- reporting of subgroup analyses in clinical trials. N Engl J Med 2007;357:2189-2194.
To the Editor: Wang et al. remind us that subgroup analyses are often underpowered to detect true differences in treatment effects and, conversely, may yield spurious false positive results from multiple comparisons. An additional fundamental limitation of subgroup analyses is that they typically consider patient characteristics one variable at a time, whereas patients have multiple characteristics simultaneously. By considering each variable separately, subgroup analyses sequentially divide patients into two groups that are more similar than dissimilar, frequently giving the (misleading) impression of a consistent treatment effect across patients.
Research has shown that important subgroups with extreme differences in the risk of the primary outcome, differing across many variables simultaneously, are often concealed within these analyses, sometimes obscuring subgroups of patients who are harmed by treatment.1,2,3,4 Since the heterogeneity in the risk of the primary outcome is ubiquitous, is typically large, can make overall trial results difficult to interpret, frequently gives rise to important differences in risk–benefit trade-offs, and can most often be adequately captured by simple risk models, multivariate risk stratification of results, with tests of interaction between treatment effect and risk strata, should become routine.1,2,3,4 Journals should strongly consider requiring such analyses.
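The multiple-comparisons concern can be made concrete with a small calculation. The sketch below assumes that the subgroup tests are independent, that each is run at the same alpha level, and that no subgroup has a true effect. Independence is a simplifying assumption (subgroup tests within one trial are usually correlated), and the function name is illustrative only.

```python
def familywise_error_rate(alpha: float, n_tests: int) -> float:
    """Probability of at least one false positive among n_tests
    independent tests, each at level alpha, when every null is true."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie between 0 and 1")
    if n_tests < 0:
        raise ValueError("n_tests must be non-negative")
    # P(no false positive in any test) = (1 - alpha) ** n_tests
    return 1.0 - (1.0 - alpha) ** n_tests


# Illustrative numbers of subgroup tests, each at alpha = 0.05
for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 3))
```

Under these assumptions, ten unadjusted subgroup tests at the 0.05 level carry roughly a 40% chance of at least one spurious "significant" result.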
David Kent, M.D. Tufts–New England Medical Center Boston, MA 02111 dkent1{at}tufts-nemc.org
Rodney Hayward, M.D. Veterans Affairs Ann Arbor Healthcare System Ann Arbor, MI 48105
Dr. Kent reports receiving research funding from Pfizer. No other potential conflict of interest relevant to this letter was reported.
References
Ioannidis JP, Lau J. Heterogeneity of the baseline risk within patient populations of clinical trials: a proposed evaluation algorithm. Am J Epidemiol 1998;148:1117-1126.
Hayward RA, Kent DM, Vijan S, Hofer TP. Multivariable risk prediction can greatly enhance the statistical power of clinical trial subgroup analysis. BMC Med Res Methodol 2006;6:18-18.
Rothwell PM, Mehta Z, Howard SC, Gutnikov SA, Warlow CP. From subgroups to individuals: general principles and the example of carotid endarterectomy. Lancet 2005;365:256-265.
Kent DM, Hayward RA. Limitations of applying summary results of clinical trials to individual patients: the need for risk stratification. JAMA 2007;298:1209-1212.
The authors reply: We agree with Proestel's cautions about testing for the equality of treatment groups within the individual levels (say, males and females) of a baseline factor. Claims of heterogeneity of the treatment effect across the levels of a baseline factor should not be based on such tests. Furthermore, we in general do not recommend such testing after an interaction test, regardless of whether the latter is significant or not. One exception is if there was a prespecified reason to assess the treatment effect within a specific subgroup of the patients. A good way to present information about plausible treatment effects within the levels of a baseline factor is by means of a forest plot.1,2 The confidence intervals in such plots should not be used to indirectly assess "statistical significance" based on whether they exclude a null effect (say, a relative risk of 1), since doing so creates the same problems noted by Proestel for significance tests.
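To illustrate the distinction between testing within each level and testing for interaction, the sketch below compares the log relative risks of two subgroups directly with a normal-approximation z test. This is one common way to test for interaction, not necessarily the method used in any trial discussed here. The event counts and function names are hypothetical, and the large-sample standard error assumes at least one event in every arm.

```python
import math


def log_relative_risk(events_trt, n_trt, events_ctl, n_ctl):
    """Return (log relative risk, large-sample standard error)
    for one subgroup. Requires at least one event in each arm."""
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    se = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    return math.log(rr), se


def interaction_test(subgroup_a, subgroup_b):
    """Two-sided z test of the hypothesis that the relative risk
    is the same in both subgroups. Returns (z, p_value)."""
    log_rr_a, se_a = log_relative_risk(*subgroup_a)
    log_rr_b, se_b = log_relative_risk(*subgroup_b)
    z = (log_rr_a - log_rr_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical counts: (events treated, n treated, events control, n control)
men = (30, 500, 45, 500)
women = (28, 400, 30, 400)
z, p_value = interaction_test(men, women)
print(f"z = {z:.2f}, P = {p_value:.2f}")
```

The test addresses a single question, whether the treatment effect differs between the levels, rather than producing a separate verdict for each level.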
One important consideration for subgroup analysis is how subgroups are formed. Kent and Hayward recommend a specific way of forming subgroups on the basis of multiple, rather than individual, baseline characteristics. In their approach, patients are divided into separate groups according to their risks of a disease outcome, which are calculated from a prespecified, externally validated formula involving multiple baseline characteristics. The purpose of such subgroup analyses is to assess whether the treatment effect is homogeneous across patients with different risks. We agree that such an approach can provide valuable information to guide individualized patient care. Moreover, when a specific risk-score algorithm is unavailable, it still could be appropriate to assess the heterogeneity of the treatment effect with the use of a prespecified clinically meaningful categorization based on multiple baseline characteristics. In other settings, interest in the heterogeneity of treatment effects may be motivated by metabolic, physiological, anatomical, genetic, or other independently identifiable features of the patients or their disease, not by their risk of the disease outcome under study. These considerations should be the main determinants of how subgroups are formed.
Finally, we do not believe that journals should dictate the scientific questions that investigators address, including whether and how they undertake subgroup analyses of any specific type. Rather, investigators should use a well-reasoned and fully described approach to subgroup analyses and report them in accordance with the guidelines offered in our article.
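The risk-stratified approach described above can be sketched in a few lines. The example assumes a logistic risk formula with made-up coefficients and arbitrary risk cutpoints; in practice both would come from a prespecified, externally validated model. It reports only the absolute risk difference in each stratum and omits a formal test of interaction across strata. All names and numbers are hypothetical.

```python
import math
from collections import defaultdict


def predicted_risk(age, diabetic):
    """HYPOTHETICAL logistic risk formula; the coefficients are invented
    for illustration and do not come from any validated model."""
    linear = -6.0 + 0.06 * age + 0.8 * diabetic
    return 1.0 / (1.0 + math.exp(-linear))


def stratum_of(risk, cutpoints=(0.05, 0.15)):
    """Index of the risk stratum (0 = lowest); cutpoints are arbitrary."""
    return sum(risk >= c for c in cutpoints)


def risk_difference_by_stratum(patients):
    """patients: iterable of (age, diabetic, treated, had_event).
    Returns {stratum: control event rate minus treated event rate}."""
    # counts[stratum][arm] = [events, n], with arm 0 = control, 1 = treated
    counts = defaultdict(lambda: [[0, 0], [0, 0]])
    for age, diabetic, treated, had_event in patients:
        cell = counts[stratum_of(predicted_risk(age, diabetic))][int(treated)]
        cell[0] += int(had_event)
        cell[1] += 1
    result = {}
    for stratum, (ctl, trt) in sorted(counts.items()):
        if ctl[1] and trt[1]:  # skip strata that lack one of the arms
            result[stratum] = ctl[0] / ctl[1] - trt[0] / trt[1]
    return result


# Toy data, not from any trial: (age, diabetic, treated, had_event)
toy_patients = [
    (40, 0, 0, 0), (40, 0, 0, 0), (40, 0, 1, 0), (40, 0, 1, 0),
    (60, 0, 0, 1), (60, 0, 0, 0), (60, 0, 1, 0), (60, 0, 1, 0),
    (75, 1, 0, 1), (75, 1, 0, 1), (75, 1, 1, 1), (75, 1, 1, 0),
]
print(risk_difference_by_stratum(toy_patients))
```

Because each patient's stratum is determined by several baseline characteristics at once, the strata can differ far more in outcome risk than subgroups defined by one variable at a time.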
Rui Wang, M.S. Stephen W. Lagakos, Ph.D. Harvard University Boston, MA 02115
References
Wactawski-Wende J, Kotchen JM, Anderson GL, et al. Calcium plus vitamin D supplementation and the risk of colorectal cancer. N Engl J Med 2006;354:684-696.