Background Current staging methods are inadequate for predicting the outcome of treatment of non-small-cell lung cancer (NSCLC). We developed a five-gene signature that is closely associated with survival of patients with NSCLC.
Methods We used computer-generated random numbers to assign 185 frozen specimens for microarray analysis, real-time reverse-transcriptase polymerase chain reaction (RT-PCR) analysis, or both. We studied gene expression in frozen specimens of lung-cancer tissue from 125 randomly selected patients who had undergone surgical resection of NSCLC and evaluated the association between the level of expression and survival. We used risk scores and decision-tree analysis to develop a gene-expression model for the prediction of the outcome of treatment of NSCLC. For validation, we used randomly assigned specimens from 60 other patients.
Results Sixteen genes that correlated with survival among patients with NSCLC were identified by analyzing microarray data and risk scores. We selected five genes (DUSP6, MMD, STAT1, ERBB3, and LCK) for RT-PCR and decision-tree analysis. The five-gene signature was an independent predictor of relapse-free and overall survival. We validated the model with data from an independent cohort of 60 patients with NSCLC and with a set of published microarray data from 86 patients with NSCLC.
Conclusions Our five-gene signature is closely associated with relapse-free and overall survival among patients with NSCLC.
Gene-expression profiling (see Glossary) by means of microarrays3,4 and reverse-transcriptase polymerase chain reaction (RT-PCR)5,6 is useful for classifying tumors and formulating a prognosis for patients with various types of cancer,7,8,9 including lung cancer.10,11,12,13,14,15,16 The use of microarrays in clinical practice is limited, however, by the large number of genes used in gene profiling,17 the need for complicated methods, and the lack of both reproducibility and independent validation. The genes selected for profiling in studies of lung cancer have varied considerably; only a few genes have been consistently included.10,11,12,13 Moreover, gene-expression profiles can vary according to the microarray platform and the analytic strategy used.6
The RT-PCR method can be applied to paraffin-embedded pathological specimens and is reproducible and applicable in clinical practice. However, RT-PCR can be used to analyze only a small number of genes.17 In a previous study, our group performed microarray analysis of cell lines derived from specimens of invasive NSCLC and identified 672 genes associated with invasive activity.18 We also identified genes (CRMP-1 and HLJ1) that are associated with clinical outcome of patients with NSCLC.19,20 A recent study showed that the results of RT-PCR analysis of eight genes correlated with the outcomes of patients with adenocarcinoma of the lung.5
In the current study, we examined gene expression in 125 surgical specimens of NSCLC, using microarrays and real-time RT-PCR in order to identify a gene signature that is correlated with the clinical outcome.
Methods
Patients and Tissue Specimens
We used computer-generated random numbers to assign specimens from 185 consecutive patients for microarray analysis. We studied frozen specimens of lung-cancer tissue from 125 randomly selected patients who underwent surgical resection of NSCLC at the Taichung Veterans General Hospital between December 1999 and December 2003. Of these 125 specimens, 60 were adenocarcinomas, 52 were squamous-cell carcinomas, and 13 were other types of cancer. We validated the five-gene risk-prediction model using an independent cohort of 60 randomly selected patients who underwent surgical resection of NSCLC at the Taichung Veterans General Hospital between November 1999 and December 2003. The patients had not received adjuvant chemotherapy. The study was approved by the institutional review board of the hospital. Written informed consent was obtained from all patients.
Microarray Analysis of Complementary DNA
The 672 genes associated with invasive activity, identified in a previous study by our group,18 were rearrayed in duplicate on a nylon membrane. We isolated 4 µg of total RNA from each specimen, amplified it using an amplification kit (Ambion), and labeled it with digoxigenin during reverse transcription.21 The details of target preparation, hybridization, color development, image analysis, and spot quantification have been described previously.18,21,22
RT-PCR Analysis
To validate the levels of expression of genes found on microarray analysis, RT-PCR was performed on 16 genes and a control gene for TATA-box-binding protein (TBP), with the use of specific TaqMan probes and primer sets; the transcripts were amplified with reagent (TaqMan One-Step RT-PCR Master Mix Reagent, Applied Biosystems) and a sequence detection system (ABI Prism 7900HT, Applied Biosystems). Gene expression was quantified in relation to the expression of TBP with the use of sequence detector software and the relative quantification method (Applied Biosystems) (for details, see the Methods section of the Supplementary Appendix, available with the full text of this article at www.nejm.org). We chose TBP as the internal control for real-time RT-PCR because it is invariant in clinical cancer specimens.23
Statistical Analysis
The 125 specimens were randomly assigned to either the training set or the testing set (see Table 1 of the Supplementary Appendix). The average intensity for each gene in the microarray was assessed. To reduce variation among microarrays, the intensity values for samples in each microarray were rescaled by means of a quantile normalization method.24 To reduce background noise, background intensity values of less than 3000 were assigned the value of 3000.22 Each intensity value was then log-transformed to a base-2 scale. Genes with coefficients of variation of less than 3% were excluded from further analyses. Finally, the gene-expression intensity values were transformed to ordinal coding values, according to the ranking of the level of gene expression among the 485 genes in 125 patients (60,625 observations). The intensity value was coded as 1 for expression levels ranked as at or below the 25th percentile of the total gene expression, 2 for levels above the 25th and at or below the 50th percentiles, 3 for levels above the 50th and at or below the 75th percentiles, and 4 for levels above the 75th percentile.
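The preprocessing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names are hypothetical, quantile normalization is left out for brevity, and two details that the text does not specify are assumptions (the coefficient of variation is computed on the log-transformed values with the population standard deviation, and the percentile cut points use the nearest-rank rule).

```python
import math
from statistics import mean, pstdev

FLOOR = 3000.0     # background floor stated in the text
CV_CUTOFF = 0.03   # genes with a coefficient of variation below 3% are excluded


def log2_floor(values, floor=FLOOR):
    """Raise values below the floor to the floor, then take log base 2."""
    return [math.log2(max(v, floor)) for v in values]


def coefficient_of_variation(values):
    """Standard deviation divided by the mean (population SD is an assumption)."""
    return pstdev(values) / mean(values)


def quartile_codes(matrix):
    """Code each value 1 to 4 by its quartile among all values in the matrix."""
    pooled = sorted(v for row in matrix for v in row)
    n = len(pooled)
    # Cut points at the 25th, 50th, and 75th percentiles (nearest-rank rule).
    cuts = [pooled[math.ceil(q * n) - 1] for q in (0.25, 0.50, 0.75)]
    # A value at or below a cut point does not count as exceeding it.
    return [[1 + sum(v > c for c in cuts) for v in row] for row in matrix]
```

Pooling every gene and every patient before ranking mirrors the description of coding "among the 485 genes in 125 patients"; a per-gene ranking would be a different design and would give different codes.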
Hazard ratios from univariate Cox regression analysis were used to determine which genes were associated with death from any cause or recurrence of cancer. Protective genes were defined as those associated with a hazard ratio for death of less than 1; risk genes were defined as those associated with a hazard ratio for death of more than 1. We used univariate Cox proportional-hazards regression analysis to evaluate the association between survival and the level of expression of each gene from microarray analysis.25 For genes that were significantly correlated with survival, we used a linear combination of the gene-expression coding values weighted by the regression coefficients to calculate a risk score for each patient.6,10
16-Gene Signature
Risk scores were calculated for 16 genes. A patient's risk score was calculated as the sum of the levels of expression of each gene, as measured by microarray analysis, multiplied by the corresponding regression coefficients (see the Methods section of the Supplementary Appendix). Patients were classified as having a high-risk gene signature or a low-risk gene signature, with the 50th percentile (median) of the risk score as the threshold value (median, 4.9; range, 1.3 to 21.9). The median risk score was chosen as the threshold value to reflect the fact that almost half of patients with early-stage NSCLC relapse within 5 years after potentially curative surgery2 and also in order to eliminate the effect of extreme values in the training cohort by ensuring that there were equal numbers of patients in the high-risk and low-risk groups. The risk scores and the threshold value derived from the training cohort were not reestimated but were applied directly to the testing cohort.
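As a rough illustration of the risk-score calculation, the sketch below forms the weighted sum of coded expression values and applies a median threshold taken from a training cohort. The gene names, coefficients, and patient values are invented for the example, and assigning scores equal to the threshold to the low-risk group is an assumption; the source does not say how ties were handled.

```python
from statistics import median


def risk_score(codes, coefficients):
    """Sum of coded expression values weighted by Cox regression coefficients."""
    return sum(coefficients[gene] * codes[gene] for gene in coefficients)


def classify(score, threshold):
    """Label a patient high-risk when the score exceeds the threshold."""
    return "high" if score > threshold else "low"


# Hypothetical coefficients: positive for a risk gene, negative for a protective gene.
coefficients = {"risk_gene": 0.5, "protective_gene": -0.25}

# Hypothetical training cohort with ordinal codes from 1 to 4.
training = [
    {"risk_gene": 4, "protective_gene": 2},
    {"risk_gene": 1, "protective_gene": 4},
    {"risk_gene": 3, "protective_gene": 1},
    {"risk_gene": 2, "protective_gene": 3},
]
training_scores = [risk_score(p, coefficients) for p in training]

# The threshold is fixed on the training cohort and reused unchanged for new patients.
threshold = median(training_scores)
training_labels = [classify(s, threshold) for s in training_scores]
new_patient_label = classify(
    risk_score({"risk_gene": 4, "protective_gene": 4}, coefficients), threshold
)
```

Keeping the coefficients and threshold frozen after training is what makes the testing cohort a genuine check rather than a second fit.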
Five-Gene Signature
The levels of expression of the 16 genes were confirmed by RT-PCR and indexed by Spearman's rank-correlation test.26 From these 16 genes, we further identified five genes that were significantly associated with survival. The levels of expression of these five genes, as measured by RT-PCR, were used to construct the recursive-partitioning decision tree.27,28 Avadis software29,30 (Strand Genomic) was then used to classify patients as having a high-risk gene signature or a low-risk gene signature on the basis of the decision tree.
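For readers unfamiliar with Spearman's rank correlation, which is used here to compare microarray and RT-PCR measurements, a minimal sketch follows. It ranks both series (tied values receive the average rank) and computes the Pearson correlation of the ranks. The function names are hypothetical, and the sketch returns only the coefficient, not the significance test.

```python
import math
from statistics import mean


def average_ranks(values):
    """Rank values from 1 upward, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation of the ranks of x and y (undefined if either is constant)."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)
```

Because only ranks are used, the coefficient is unaffected by the different measurement scales of the two platforms.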
Our rationale for using a decision tree based on RT-PCR rather than on microarray analysis was practicality. RT-PCR uses a small number of genes to capture the relevant covariate structure, especially the complex interaction and nonlinearity of levels of gene expression.28 In our univariate-splitting tree, only one of the five genes was used to make a splitting decision at each intermediate node. To avoid overfitting, we used a pruning method called minimum error (see the Methods section and Figure 1 of the Supplementary Appendix).
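The idea of a univariate split, in which a single gene and a single threshold are chosen at each node, can be illustrated as follows. This is a sketch under stated assumptions and does not reproduce the Avadis implementation: the Gini impurity criterion, the midpoint thresholds, and the function names are all choices made for the example, and pruning is not shown.

```python
def gini(labels):
    """Gini impurity of a list of binary labels (0 = low risk, 1 = high risk)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_univariate_split(samples, labels, genes):
    """Return (gene, threshold, impurity) for the best single-gene split.

    samples is a list of dicts mapping gene name to expression level.
    Candidate thresholds are midpoints between consecutive distinct values.
    """
    best = None
    n = len(samples)
    for gene in genes:
        values = sorted({s[gene] for s in samples})
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [y for s, y in zip(samples, labels) if s[gene] <= threshold]
            right = [y for s, y in zip(samples, labels) if s[gene] > threshold]
            impurity = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or impurity < best[2]:
                best = (gene, threshold, impurity)
    return best
```

A full recursive-partitioning tree would apply this search again to the samples on each side of the chosen split until a stopping or pruning rule ends the recursion.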
The Kaplan-Meier method was used to estimate overall survival and relapse-free survival. Differences in survival between the high-risk group and the low-risk group were analyzed with the log-rank test. Multivariate Cox proportional-hazards regression analysis with stepwise selection was used to evaluate independent prognostic factors associated with survival, and the five-gene signature, age, sex, tumor stage, and histologic characteristics were used as covariates. A P value of less than 0.05 was considered to indicate statistical significance, and all tests were two-tailed.
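The product-limit estimate of survival can be written compactly. The sketch below is a generic illustration with a hypothetical function name and invented follow-up data; it is not the software used in the study, and it omits confidence intervals and the log-rank comparison.

```python
def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.

    times: follow-up time for each patient.
    events: 1 if the event (death or relapse) was observed, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every patient whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        # Both deaths and censored patients leave the risk set after time t.
        at_risk -= removed
    return curve


# Invented follow-up times in months; the second patient at 8 months is censored.
curve = kaplan_meier([5, 8, 8, 12, 20], [1, 1, 0, 1, 0])
```

Censored patients contribute to the denominator up to their last follow-up but never lower the curve themselves, which is why the estimate drops only at observed event times.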
We also studied an independent cohort of 60 patients who underwent surgical resection of NSCLC at the Taichung Veterans General Hospital between November 1999 and December 2003. This cohort was used to validate our five-gene risk-prediction model.
To further validate our model, we applied it to microarray data from 86 patients with NSCLC, reported by Beer et al.10 (available at http://dot.ped.med.umich.edu:2000/ourimage/pub/Lung/index.html). The five genes (and their corresponding Affymetrix probe sets) were DUSP6 (X93920_at), MMD (X85750_at), STAT1 (M97936_at), ERBB3 (S61953_at), and LCK (M26692_s_at); the control gene was TBP (X54993_s_at). To make the levels of gene expression from the microarrays and from RT-PCR comparable, we log-transformed the microarray data to a base-2 scale after assigning a value of 1.1 to intensity values of less than 1.1. After log transformation, the levels of expression of the five genes were divided by the level of expression of the control gene TBP in order to calculate the relative level of expression. We applied the decision-tree model to these relative levels of expression, using the data from 86 patients with NSCLC.10 Because the maximum follow-up time for the survival analysis in our study was 62 months, we used the 5-year survival data for the 86 patients.
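The transformation applied to the published microarray data can be illustrated with a short function. It follows the steps as stated in the text (floor at 1.1, base-2 logarithm, division by the log-transformed TBP value); the function name and the example intensities are hypothetical.

```python
import math


def relative_expression(gene_intensity, tbp_intensity, floor=1.1):
    """Log2 expression of a gene divided by log2 expression of the TBP control.

    Intensities below the floor are raised to the floor before the logarithm,
    which keeps both logarithms defined and positive.
    """
    gene_log = math.log2(max(gene_intensity, floor))
    control_log = math.log2(max(tbp_intensity, floor))
    return gene_log / control_log


# Hypothetical intensities: log2(1024) = 10 and log2(32) = 5, so the ratio is 2.0.
example = relative_expression(1024, 32)
```

Flooring at a value above 1 matters here because a control intensity of exactly 1 would give a logarithm of zero and make the ratio undefined.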
Results
The 16-Gene Signature and Survival
On microarray analysis of tumors from the 125 patients, 485 of 672 genes had a coefficient of variation greater than 3% and were thus included in the analyses. Hazard ratios from the univariate Cox regression analysis showed that the levels of expression of 16 genes correlated with death from any cause: 4 were protective genes (associated with a hazard ratio of less than 1) and 12 were risk genes (associated with a hazard ratio of more than 1) (Table 1).
The Five-Gene Signature and Survival
There was a significant correlation between the results of microarray and RT-PCR analyses for the gene-expression data for 5 of the 16 genes in 101 of the 125 tumor specimens (Table 1). These five genes were for dual-specificity phosphatase 6 (DUSP6), monocyte-to-macrophage differentiation-associated protein (MMD), signal transducer and activator of transcription 1 (STAT1), v-erb-b2 avian erythroblastic leukemia viral oncogene homolog 3 (ERBB3), and lymphocyte-specific protein tyrosine kinase (LCK).
We identified 59 patients with high-risk gene signatures and 42 with low-risk gene signatures, according to gene expression as measured with RT-PCR and decision-tree analysis (see Figure 1 of the Supplementary Appendix). The structure of the decision tree was based on the threshold of expression of each of the five genes, as automatically determined according to a recursive-partition algorithm. The use of this algorithm resulted in the most accurate separation of patients with a high-risk signature from those with a low-risk signature. Table 2 summarizes the clinical characteristics of the 101 patients, hereafter called the original cohort, according to their five-gene signatures. The five-gene signature was strongly associated with overall survival (sensitivity, 98%; specificity, 93%; positive predictive value, 95%; negative predictive value, 98%; and overall accuracy, 96%).
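The five performance measures quoted above all derive from a two-by-two table of predicted signature against observed outcome. The sketch below shows the standard definitions; the counts in the example are invented and are not the study's data, and the source does not state exactly how outcomes were dichotomized for this calculation.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard measures from a 2x2 table of prediction against outcome.

    tp: high-risk signature and adverse outcome   fp: high-risk, no adverse outcome
    tn: low-risk signature, no adverse outcome    fn: low-risk and adverse outcome
    """
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_predictive_value": tp / (tp + fp),
        "negative_predictive_value": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
    }


# Invented counts for illustration only.
metrics = diagnostic_metrics(tp=8, fp=2, tn=6, fn=4)
```

Sensitivity and specificity condition on the observed outcome, whereas the predictive values condition on the signature, so the latter depend on how common the adverse outcome is in the cohort.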
The clinical characteristics of the 60 patients in the validation cohort are listed in Table 2. The median duration of follow-up was 17 months. Patients with a high-risk gene signature had a shorter median overall survival than those with a low-risk gene signature (21 months vs. not reached) (Figure 2E). According to Cox multivariate regression analysis, the five-gene signature was significantly associated with overall survival (Table 3).
We analyzed the five-gene signatures in tumor specimens obtained from patients in the validation cohort with stage I or stage II disease both together and separately. Among patients with stage I or II disease combined, those with a high-risk gene signature had a shorter overall survival than those with a low-risk gene signature (Figure 2F). Among patients with stage I disease, low-risk gene signatures were associated with a longer overall survival than were high-risk gene signatures (P=0.02 by the log-rank test). Among patients with stage II disease, overall survival did not differ significantly between those with high-risk and those with low-risk gene signatures, probably owing to the small number of patients.
We also validated the five-gene signature in an independent set of microarray data from 86 patients from a Western population with NSCLC.10 Table 2 of the Supplementary Appendix lists the clinical characteristics of these 86 patients according to their five-gene signatures. The patients with high-risk gene signatures had a shorter overall survival than did those with low-risk gene signatures (Figure 2G) (P=0.06 by the log-rank test). According to Cox multivariate regression analysis, the high-risk five-gene signature and tumor stage III were significantly associated with death from any cause (Table 3).
Discussion
NSCLC is a heterogeneous disease. Even in patients with similar clinical and pathological features, the outcome varies: some are cured, whereas in others, the cancer recurs. Staging systems for lung cancer that are based on clinical and pathological findings may have reached their limit of usefulness for predicting outcomes, but molecular methods add value. Gene-expression profiling with the use of microarrays3,4 or PCR5,6 has been shown to estimate the prognosis for patients with lung cancer accurately.10,11,12,13,14,15,16 However, the use of microarrays in clinical practice is limited by the large number of genes in the analysis,17 complicated methods, lack of reproducibility and independent validation of the results, and the need for fresh-frozen tissue.17 RT-PCR involving a small number of genes may be a more clinically useful method. It allows for accurate and reproducible quantification of results for RNA obtained from small amounts of paraffin-embedded specimens.17,31 The results of RT-PCR performed on 8 genes, selected from a total of 45, have recently been shown to correlate with the outcomes of lung adenocarcinoma.5
We identified an RT-PCR-based five-gene signature (including DUSP6, MMD, STAT1, ERBB3, and LCK) using risk scores based on microarray and decision-tree analyses of 125 frozen tumor specimens from patients with NSCLC. The specimens were randomly divided into a training set (63 specimens) and a testing set (62 specimens). The presence of a high-risk five-gene signature in the NSCLC tumors was associated with an increased risk of recurrence and decreased overall survival.
Our selection of genes in the microarray training set was validated in the microarray testing set, and the patterns of gene expression found on microarray analysis were validated by RT-PCR. Our results were also validated in an independent cohort of 60 patients who were treated at the Taichung Veterans General Hospital. These results in our Chinese patients were also validated with the use of a set of published NSCLC microarray data from patients from a Western population with NSCLC. Thus, we believe that the data we obtained using the five-gene signature are reliable.
The identification of five genes that are closely associated with the outcomes in patients with NSCLC has clinical implications. Cisplatin-based adjuvant chemotherapy is effective in some patients with NSCLC.32 We propose that patients who have tumors with a high-risk gene signature could benefit from this type of adjuvant therapy, whereas those with a low-risk gene signature could be spared what may be unnecessary treatment. Prospective, large-scale, multicenter studies are necessary to test this idea.
The identification of five genes that can predict the clinical outcome in patients with NSCLC may reveal targets for the development of therapy for lung cancer. STAT1 causes arrested growth and apoptosis in many types of cancer cells by inducing the expression of p21WAF1 and caspase.33,34 MMD is preferentially expressed in mature macrophages.35 Our group has shown that macrophage activation promotes cancer metastasis,22 although the function of the MMD protein is unknown. DUSP6 inactivates extracellular signal-regulated kinase 2 (ERK2) (also known as mitogen-activated protein kinase 1 [MAPK1]), resulting in tumor suppression and apoptosis.36 ERBB3, a member of the epidermal growth factor receptor family of tyrosine kinases, can shorten cell survival.37 LCK, a member of the Src family of protein tyrosine kinases, is expressed mainly in T cells and is one of the first signaling molecules downstream of the T-cell receptor. It plays a key role not only in the differentiation and activation of T cells but also in the induction of apoptosis.38 In addition, LCK is expressed in many cancers and regulates the mobility of cancer cells.39,40
In conclusion, the five-gene expression signature we identified is closely associated with the clinical outcome in patients with surgically resected NSCLC. This signature could be useful in stratifying patients according to risk in trials of adjuvant treatment of the disease.
Glossary
Decision tree: A statistical tool for predicting which patient belongs to which specific class (e.g., good or poor clinical outcome) on the basis of feature information (gene-expression levels), with the use of a recursive-partitioning process and tree-based classification rules.
Gene-expression profiling: Determination of the level of expression of thousands of genes simultaneously by DNA microarray or real-time RT-PCR.
High-risk gene signature: Aberrant expression of a panel of genes in tissue that signifies a high risk of an adverse outcome (relapse or death in patients with cancer).
Independent cohort: An independent group of patients having clinical characteristics similar to those of an original group of patients in a study. The independent cohort is used to confirm the findings of the original study.
Risk gene: A gene for which altered expression in the tissue of interest is associated with an increased risk of an adverse clinical outcome (relapse or death in patients with cancer).
Risk score: A score that predicts the likelihood of an individual patient's survival on the basis of statistical analysis of risk factors (the expression levels of risk genes) associated with survival.
Supported by grants from the National Research Program for Genomic Medicine of the National Science Council of the Republic of China (NSC94-3112-B002-013-Y) and from Advpharma.
Dr. Terng reports being an employee of Advpharma. No other potential conflict of interest relevant to this article was reported.
Source Information
From National Taiwan University College of Public Health (H.-Y.C., W.J.C.), National Taiwan University College of Medicine (H.-Y.C., S.-L.Y., C.-L.C., C.-H.W., S.-F.K., H.-N.L., S.S., W.J.C., J.J.W.C., P.-C.Y.), Academia Sinica (C.-H.C., P.-C.Y.), National Taiwan University Hospital (A.Y., W.-K.C., P.-C.Y.), and Advpharma (H.-J.T.), all in Taipei, Taiwan; and Taichung Veterans General Hospital (G.-C.C., C.-Y.C.) and National Chung-Hsing University (G.-C.C., C.-C.L., J.J.W.C.), both in Taichung, Taiwan.
Drs. W.J. Chen, J.J.W. Chen, and P.C. Yang contributed equally to this article.
Address reprint requests to Dr. Yang at the Department of Internal Medicine, National Taiwan University Hospital, No. 7, Chung-Shan S. Rd., Taipei, Taiwan 100, or at pcyang{at}ha.mc.ntu.edu.tw.
Related Letters:
Five-Gene Signature in Non-Small-Cell Lung Cancer
Michiels S., Hill C., Raz D. J., Jablons D. M., Dobbin K. K., Gounaris I., Quintás-Cardama A., Gibbons D. L., Chen H.-Y., Chen W. J., Yang P.-C.
N Engl J Med 2007;356:1581-1583, Apr 12, 2007.
Correspondence
The New England Journal of Medicine is owned, published, and copyrighted © 2008 Massachusetts Medical Society. All rights reserved.