Published at www.nejm.org July 29, 2007 (10.1056/NEJMoa073493)
Background Multiple sclerosis has a clinically significant heritable component. We conducted a genomewide association study to identify alleles associated with the risk of multiple sclerosis.
Methods We used DNA microarray technology to identify common DNA sequence variants in 931 family trios (consisting of an affected child and both parents) and tested them for association. For replication, we genotyped another 609 family trios, 2322 case subjects, and 789 control subjects and used genotyping data from two external control data sets. A joint analysis of data from 12,360 subjects was performed to estimate the overall significance and effect size of associations between alleles and the risk of multiple sclerosis.
Results A transmission disequilibrium test of 334,923 single-nucleotide polymorphisms (SNPs) in 931 family trios revealed 49 SNPs having an association with multiple sclerosis (P<1×10⁻⁴); of these SNPs, 38 were selected for the second-stage analysis. A comparison between the 931 case subjects from the family trios and 2431 control subjects identified an additional nonoverlapping 32 SNPs (P<0.001). An additional 40 SNPs with less stringent P values (<0.01) were also selected, for a total of 110 SNPs for the second-stage analysis. Of these SNPs, two within the interleukin-2 receptor α gene (IL2RA) were strongly associated with multiple sclerosis (P=2.96×10⁻⁸), as were a nonsynonymous SNP in the interleukin-7 receptor α gene (IL7RA) (P=2.94×10⁻⁷) and multiple SNPs in the HLA-DRA locus (P=8.94×10⁻⁸¹).
Conclusions Alleles of IL2RA and IL7RA and those in the HLA locus are identified as heritable risk factors for multiple sclerosis.
Studies of twins and sibling pairs suggest that genetic factors influence susceptibility to multiple sclerosis; the evidence indicates that multiple genes, each exerting only modest effects, probably play a part.3,8 Candidate-gene studies have validated associations between multiple sclerosis and polymorphic variants within the major histocompatibility complex (MHC), but no other loci with a definitive association with the disease have been found. Early efforts to screen the genome for linkage with the use of low-density maps of microsatellites were unsuccessful.9,10,11 On the assumption that this method lacked the statistical power to identify genetic variants with associations that are not easily detected, we used a more powerful linkage scan in which we analyzed 4506 single-nucleotide polymorphisms (SNPs) in 2692 samples from 730 multiplex families with multiple sclerosis. This analysis revealed linkage with genomewide significance in the MHC region (maximum logarithm of the odds [LOD] score, 11.66), but no other region having a significant linkage with multiple sclerosis was identified.12 These results indicate that in multiple sclerosis, linkage studies lack the statistical power to detect susceptibility loci that may reside outside the MHC region.
Association studies have greater statistical power than linkage studies to detect common genetic variants that confer a modest risk of a disease.13 Genomewide association analyses (see Glossary), which are unbiased, "hypothesis-free" scans of the genome, have identified susceptibility loci outside the MHC region in type 2 diabetes,14,15,16,17 inflammatory bowel disease,18,19,20 rheumatoid arthritis,21 systemic lupus erythematosus,22 and type 1 diabetes.23,24 Here we present the results of a large-scale genomewide association scan aimed at identifying alleles associated with multiple sclerosis.
To maximize genotyping efficiency, we used a staged approach (Figure 1). First, we used a DNA microarray (GeneChip Human Mapping 500K Array Set, Affymetrix) to examine most of the common genetic variants in 1003 family trios, consisting of a patient with multiple sclerosis and both parents.25 After removing SNPs and DNA samples with low genotyping rates, excessive mendelian errors, or low frequencies of minor alleles from further analysis, there remained a set of 334,923 SNPs that were genotyped in 931 family trios. These markers capture more than 2.2 million common SNPs (minor allele frequency, ≥0.05) observed in the HapMap26 CEPH (Centre d'Etude du Polymorphisme Humain) subjects, consisting of Utah residents with Northern and Western European ancestry (CEU, release 21), with an average pairwise coefficient of determination (r²) of 0.77 (62% with r² ≥0.8).
Methods
Patients and Controls
Table 1 lists the demographic features of the case subjects, who all received the diagnosis of multiple sclerosis on the basis of reliable clinical criteria.8,27,28 Subjects with clinically isolated syndromes or neuromyelitis optica29 were excluded from samples in the United Kingdom, whereas 4% of subjects in the United States had a clinically isolated syndrome at the time of enrollment. Healthy control subjects from Brigham and Women's Hospital in Boston and the University of California at San Francisco consisted of unrelated people who reported themselves as being non-Hispanic whites and free of chronic inflammatory disease. (For details, see the Supplementary Appendix, available with the full text of this article at www.nejm.org.)
We used methods similar to those described in a recently performed genomewide association scan.15 A minimum of 1 µg of genomic DNA (diluted in 1x TE buffer at 50 ng per microliter) from case subjects and controls was arrayed on 96-well master plates at the project's centralized DNA bank. Before scanning, DNA concentrations were determined by fluorescence measurement with molecular probes (PicoGreen, Molecular Probes). As a genetic fingerprint, a panel of 24 SNPs, including a sex-confirmation assay, was genotyped with the use of the Sequenom platform. Twenty-three of these SNPs are included on both of the chips in the Affymetrix GeneChip Human Mapping 500K arrays and served as a cross-platform sample verification.
HLA Typing
Medium-resolution typing of HLA-DRB1 and HLA-DQB1 (two to four digits) was performed on the 931 family trios in the first screening step.30 Of the 931 case subjects in these families, 531 (57.0%) carried at least one copy of HLA-DRB1*1501. Because complete HLA-DRB1 genotype data for all members of the replication sets were not available, all the consortium subjects (trio family members, case subjects, and control subjects) were genotyped for the rs3135388 (A/G) SNP to identify the DRB1*1501 allele associated with multiple sclerosis, since the presence of this allele and the rs3135388A SNP are highly correlated.31 Both HLA-DRB1*1501 and rs3135388 genotyping results were available for 2757 of 2793 subjects (98.7%) in the 931 family trios. Data from 2730 of these 2757 subjects showed complete concordance between the rs3135388A SNP and the DRB1*1501 genotype (>99% for tagging the correct number of DRB1*1501 alleles).
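The percentages in this paragraph can be rechecked from the counts it reports. The short sketch below does only that arithmetic; the variable names are illustrative, and the one assumption is that each of the 931 trios contributes three subjects.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


trios = 931
subjects_in_trios = 3 * trios  # one affected child plus two parents per trio

# Case subjects carrying at least one copy of HLA-DRB1*1501
carrier_pct = percent(531, trios)

# Subjects with both HLA-DRB1*1501 and rs3135388 genotypes available
both_typed_pct = percent(2757, subjects_in_trios)

# Subjects whose rs3135388 A allele count matched the DRB1*1501 allele count
concordance_pct = percent(2730, 2757)
```

The three values come out at 57.0, 98.7, and 99.0, in line with the figures quoted in the text.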
Methods of genomewide scanning, DNA fingerprinting concordance, quality control, SNP exclusion criteria, additional control data, statistical analysis, technical validation, and replication genotyping are described in the Supplementary Appendix.
Results
Screening Phase
Figure 1 shows an outline of the experiments. Figure 2 shows results from the transmission disequilibrium testing of the 334,923 SNPs typed in the 931 family trios that were included in the screening stage. A plot of the association results for the initial genome scan is shown in Figure 3; the P values for all SNPs are plotted as a function of P values from the expected (uniform) null distribution. After exclusion of the SNPs across the extended MHC region, the observed distribution closely matches the expected (null) distribution (genomic inflation factor, 1.05) with an excess in the tail at P<0.001 (more associated SNPs observed than expected under the null hypothesis).
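For readers unfamiliar with the genomic inflation factor, one common way to compute it is the ratio of the median observed 1-df chi-square statistic to the median expected under the null hypothesis (about 0.455). The sketch below uses that definition and only the Python standard library. It assumes the input is a list of two-sided P values in the open interval (0, 1), and it is not necessarily the exact procedure used in the study, which is described in the Supplementary Appendix.

```python
from statistics import NormalDist, median

_STD_NORMAL = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455):
# if Z is standard normal, P(Z**2 <= m) = 0.5 when sqrt(m) is the 75th percentile.
CHI2_1DF_MEDIAN = _STD_NORMAL.inv_cdf(0.75) ** 2


def chi2_from_p(p: float) -> float:
    """1-df chi-square statistic corresponding to a two-sided P value in (0, 1)."""
    z = _STD_NORMAL.inv_cdf(1.0 - p / 2.0)
    return z * z


def genomic_inflation_factor(p_values) -> float:
    """Median observed statistic divided by the median expected under the null."""
    observed = [chi2_from_p(p) for p in p_values]
    return median(observed) / CHI2_1DF_MEDIAN
```

With perfectly uniform P values the function returns 1.0; a value such as the 1.05 reported here indicates that the bulk of the test statistics are only slightly larger than expected by chance.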
Replication Phase
The Sequenom genotyping platform was used to genotype these 174 SNPs in a second set of 609 family trios, 2322 case subjects, and 789 control subjects from the International Multiple Sclerosis Genetics Consortium. These data were supplemented by an independent set of 1475 control subjects from the WTCCC and 723 from the NIMH, for a total of 2987 controls. Of the 174 SNPs, 22 failed assay design or were redundant, 10 failed our replication assay quality control, and 20 failed to meet our original scan quality control; also, 12 that were not included on the Affymetrix array were added for scientific interest. Table 1 of the Supplementary Appendix shows results from the remaining 110 SNPs for this second-stage analysis alone and in combination with data from the screening phase in the form of an extension analysis. Table 2 shows results for the top 16 non-MHC SNPs (a number that was chosen arbitrarily) and the HLA-DRB1 surrogate rs3135388, showing evidence for an association with multiple sclerosis in both stages of the study.
Combined Analysis
A combined analysis including all 1540 family trios, 2322 case subjects, and 5418 control subjects (a total of 12,360 subjects) gave final estimates of effect size. The program UNPHASED, a software application for performing genetic association analysis in nuclear families and unrelated subjects, implements maximum-likelihood inference on haplotype and genotype effects while allowing for missing data, such as uncertain phase and missing genotypes.32 Table 2 shows the results of the combined analysis.
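The subject totals quoted for the combined analysis can be rebuilt from the counts given elsewhere in the article. The tally below is a bookkeeping sketch only. It assumes that the 5418 controls are the 2431 controls from the screening comparison plus the 2987 replication controls, which matches the arithmetic but is an inference rather than something the text states outright.

```python
# Family trios: screening stage plus replication stage
trios_total = 931 + 609

# Unrelated case subjects genotyped in the replication stage
unrelated_cases = 2322

# Replication controls: consortium + WTCCC + NIMH
replication_controls = 789 + 1475 + 723

# Assumption: the screening-stage controls are also part of the combined set
screening_controls = 2431
controls_total = screening_controls + replication_controls

# Each trio contributes three subjects (affected child and both parents)
subjects_total = 3 * trios_total + unrelated_cases + controls_total
```

Under that assumption the totals are 1540 trios, 5418 controls, and 12,360 subjects, as reported.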
A number of allelic variants had a significant association with multiple sclerosis. Of these, two SNPs in intron 1 of the IL2RA gene encoding the alpha chain of the interleukin-2 receptor (also called CD25, located at chromosome 10p15) are notable: rs12722489 (P=2.96×10⁻⁸; odds ratio, 1.25; 95% confidence interval [CI], 1.16 to 1.36) and rs2104286 (P=2.16×10⁻⁷; odds ratio, 1.19; 95% CI, 1.11 to 1.26) (Figure 4). These SNPs are in strong linkage disequilibrium with each other (coefficient of determination, 0.62 from HapMap CEU26). A nonsynonymous coding SNP (rs6897932) in exon 6 of IL7RA, a gene located on chromosome 5p13 that encodes a transmembrane domain of the IL7Rα chain of the interleukin-7 receptor (CD127), also showed highly significant evidence of association with multiple sclerosis (P=2.94×10⁻⁷; odds ratio, 1.18; 95% CI, 1.11 to 1.26) (Figure 4).
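As a rough consistency check, an approximate P value can be recovered from a reported odds ratio and its 95% confidence interval with a Wald test on the log scale. This is a sketch under that assumption and not the method used in the study, which relied on likelihood-based inference in UNPHASED. Because the published odds ratios and interval limits are rounded to two decimals, the result should be expected to agree with the reported P values only to within roughly an order of magnitude.

```python
import math


def wald_p_from_ci(odds_ratio: float, ci_low: float, ci_high: float,
                   z_crit: float = 1.959964):
    """Approximate z score and two-sided P value from an odds ratio and its CI.

    Assumes the interval is symmetric on the log-odds scale (Wald interval)
    and that z_crit is the normal quantile for the interval's coverage
    (1.959964 for 95%).
    """
    log_or = math.log(odds_ratio)
    standard_error = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = log_or / standard_error
    # Two-sided tail probability of a standard normal variable
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value


# Values as reported in the text
z_il2ra, p_il2ra = wald_p_from_ci(1.25, 1.16, 1.36)   # rs12722489
z_il7ra, p_il7ra = wald_p_from_ci(1.18, 1.11, 1.26)   # rs6897932
```

Both calls give z scores above 5, which is in keeping with the strength of evidence reported for these SNPs.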
Analysis of the 925 SNPs from the MHC region (positions between 29 and 34 Mb on chromosome 6) conditional on HLA-DRB1*1501 revealed a highly significant residual association signal peaking at rs9270986 (P=1.83×10⁻¹⁷; odds ratio, 5.80; 95% CI, 3.53 to 9.53), which lies close to DRB1. A portion of this residual signal is probably related to allelic heterogeneity at DRB1.30
Discussion
We report on a genomewide association study of multiple sclerosis that examined a significant fraction of common variations in the human genome. Using this technique, we identified a set of SNPs located outside the MHC region that are associated with multiple sclerosis. Among the most significant associations are SNPs in genes encoding the IL2Rα and IL7Rα chains. The IL2Rα chain has been implicated in the pathogenesis of type 1 diabetes24 and Graves' disease.34 These results add to the evidence from pathological and immunologic studies that multiple sclerosis is an autoimmune inflammatory disorder.35
The evidence that certain alleles of the genes encoding IL2Rα and IL7Rα are associated with multiple sclerosis supports the idea that polymorphisms within genes related to the regulation of the immune response are important factors in multiple sclerosis.36 In particular, regulatory T cells expressing CD4 and CD2537 show a loss of function in a number of autoimmune disorders.7,38,39,40,41 Moreover, the dominant effect of disrupting the function of the interleukin-2 gene in mice is an autoimmune disease characterized by dysfunction of CD4+CD25high regulatory T cells.42,43 This evidence suggests a link between a susceptibility variant in IL2RA and the pathogenic events that result in multiple sclerosis. It is important to acknowledge, however, that CD25 — the protein encoded by IL2RA — is not a specific marker of regulatory T cells. A SNP in the IL2RA gene that has recently been implicated in multiple sclerosis44 (P=0.04) appears to be incorrectly identified as monomorphic on the HapMap26 and may represent a chance observation that is unrelated to multiple sclerosis. The IL7Rα chain, a component of the receptor for interleukin-7, has also been implicated in multiple sclerosis by measurements of messenger RNA expression and by candidate-gene approaches.45,46,47 (Of the 12,360 case subjects and control subjects that we analyzed in this study, 6717 were also analyzed by Gregory et al.47) Interleukin-7 is important for homeostasis of the memory T-cell pool48 and may also be important for the generation of autoreactive T cells in patients with multiple sclerosis.49 Moreover, the interleukin-7 receptor is critical for the development of gamma and delta T cells,50 which are among the earliest T cells observed in the inflammatory lesions of patients with multiple sclerosis.51
Although we chose the majority of SNPs for replication on the basis of the P values identified in the screening phase, it is notable that the SNPs at IL2RA and IL7RA, which ultimately had the most significant associations with multiple sclerosis, originally gave modest P values of 0.0013 and 0.0058, respectively, in the transmission disequilibrium testing. Moreover, a recent study showed an association between the same SNP in the IL7RA gene and the risk of multiple sclerosis (P=2.9×10⁻⁷).47 A combined analysis of these two studies (with the use of the nonoverlapping union of the data sets) gives a P value of 1.92×10⁻¹⁰ (odds ratio, 1.20; 95% CI, 1.14 to 1.27) for the total of 2027 family trios, 2842 case subjects, and 6717 control subjects. Although this P value has not been adjusted for multiple hypothesis testing, it is clear that the same allelic variant in the interleukin-7 receptor has been identified in several studies. Of note, this SNP introduces a coding change (T244I) that alters the ratio of soluble to membrane-bound interleukin-7 receptor47 and has also shown a strong association with type 1 diabetes.52 Whether the allelic variants found in this study have a primary role in initiating multiple sclerosis or influence susceptibility to multiple sclerosis is unknown.
Given the strong heritability of some autoimmune diseases, we speculate that there are common and unique allelic variants that contribute to the particular autoimmune disease phenotype. Besides the association between MHC variants and autoimmune diseases, PTPN22 encoding lymphoid protein tyrosine phosphatase, a suppressor of T-cell activation and development,53 has emerged as an example of a gene harboring a susceptibility variant in many autoimmune diseases. The 620Trp variant (rs2476601) of PTPN22 is associated with type 1 diabetes,54,55 rheumatoid arthritis,21,56 Graves' disease,55,57 and systemic lupus erythematosus56,58 but does not contribute to susceptibility to multiple sclerosis.56,59,60 In contrast, allelic variation at IL2RA occurs in type 1 diabetes,24 Graves' disease,34 and multiple sclerosis but current results in rheumatoid arthritis do not show this association (Gregersen PK, Klareskog L: personal communication). Fine-mapping studies in large DNA collections for these diseases will shed light on the possibility of allelic heterogeneity at this locus.
Because of their modest risk ratios, each of the alleles of IL2RA and IL7RA that were identified in this genomewide scan explains a small proportion (less than 0.2%) of the variance in the risk of the development of multiple sclerosis. For each locus, our initial screen (931 family trios) had a power of only about 6% to detect these loci at P<1×10⁻⁴ and a power of less than 50% to reach P<0.01. It is highly likely that other loci with similar low risk ratios exist. Nevertheless, associations of the magnitude we found are undetectable in linkage studies; each locus confers a sibling-recurrence risk ratio of less than 1.01 and would require the scanning of hundreds of thousands of sibling pairs before a meaningful effect on regional LOD scores would be expected.
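To illustrate why effects of this size are nearly invisible to linkage analysis, the locus-specific sibling-recurrence risk ratio can be sketched under a standard single-locus multiplicative model, in which each copy of the risk allele multiplies disease risk by a factor γ. Several inputs here are assumptions for illustration only: the model itself, the use of the reported odds ratio as a stand-in for γ, and the risk-allele frequency, which this article does not report.

```python
def sibling_recurrence_risk_ratio(risk_allele_freq: float,
                                  genotype_relative_risk: float) -> float:
    """Locus-specific sibling-recurrence risk ratio for a biallelic locus.

    Multiplicative model: penetrance is proportional to gamma**k for k copies
    of the risk allele. With w = p*q*(gamma - 1)**2 / (p*gamma + q)**2,
    the ratio is (1 + w/2)**2.
    """
    p = risk_allele_freq
    q = 1.0 - p
    gamma = genotype_relative_risk
    w = p * q * (gamma - 1.0) ** 2 / (p * gamma + q) ** 2
    return (1.0 + 0.5 * w) ** 2


# Hypothetical risk-allele frequency (not taken from the article),
# with the reported odds ratio of 1.25 used in place of gamma.
example_ratio = sibling_recurrence_risk_ratio(0.85, 1.25)
```

With these illustrative inputs the ratio stays below 1.01. The result depends on the assumed allele frequency, so it should be read as an order-of-magnitude illustration rather than a reproduction of the authors' calculation.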
The effect sizes of the allelic variants we identified in this scan are similar to those associated with polygenic autoimmune disorders and other complex traits.61 These variants are not rare mutations of the type that occur in diseases caused by a defect in a single gene, such as muscular dystrophy or sickle cell anemia. Rather, they are polymorphic variants that also occur in normal populations. However, each is more common in patients with multiple sclerosis than in control subjects, and each has a small effect on the risk of the disease. In considering the complex genetic architecture of multiple sclerosis, we recognize that our approach has little if any statistical power to detect rare variants that could contribute to susceptibility — even those conferring a relatively large genetic risk. We anticipate that multiple sclerosis will show some degree of genetic heterogeneity and that with increasing sample sizes and better statistical power, alternative genetic mechanisms will be revealed for certain subgroups of patients with the disease. However, for most patients, we expect that the variants identified in our study and those that may emerge in follow-up studies could account for a substantial part of the heritability of multiple sclerosis in the general population.
With the identification of a larger set of genetic variants, a systems biology approach will be needed to characterize common pathways amenable to therapeutic intervention. As for our identification of a variant of IL2RA as a susceptibility element in multiple sclerosis, it is intriguing that clinical efficacy has been observed in phase 2 studies assessing a monoclonal antibody targeting the IL2Rα chain.62,63
Glossary
Genetic association testing: The genotyping of a genetic variant in a population for which information on phenotypes, such as disease occurrence or a range of various trait values, is available. Allele frequencies of that variant, for example, in case subjects and control subjects, are compared. If a significant difference is observed, there is said to be an association between the variant (genotype) and the disease or trait (phenotype).
Genomewide association study: A comprehensive search of the human genome for genetic risk factors with the use of association testing, typically involving hundreds of thousands to millions of genotypes (e.g., testing of SNPs) per sample.
Genomic inflation factor: A comparison of unassociated genetic markers with those of control subjects for potential differences in allele frequency related to imperfect matching between case subjects and control subjects (also referred to as population substructure or stratification). The expectation is that there should be no difference (or, technically, inflation of the test statistic) over the majority of markers tested. If inflation is observed, the observed test statistic can be adjusted accordingly. These values do not control for multiple testing.
Genotyping call rate: Percentage of nonmissing genotype calls in a set of DNA samples (the number of nonmissing genotypes divided by the number of all genotypes, multiplied by 100).
HapMap: A public resource created by the International HapMap Project (www.hapmap.org), a catalogue of genetic variants (SNPs) that are common in human populations.
Mendelian error: A situation in which a child's genotype is incompatible with the observed genotypes of the biologic parents, usually caused by an experimental genotyping error or by erroneous identification of the subjects as related.
Minor allele frequency: The allele frequency of the less frequently occurring allele of a SNP.
Nonsynonymous SNP: A SNP that leads to a change in the amino acid sequence of the gene's resulting protein and that may therefore affect the three-dimensional structure and its function.
PLINK: A free, open-source statistical tool for genomewide association analyses (pngu.mgh.harvard.edu/~purcell/plink/).
Transmission disequilibrium test: A family-based test of genetic association that measures the overtransmission of an allele from heterozygous parents to affected offspring.
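The transmission disequilibrium test defined above is commonly computed as a McNemar-type statistic on transmissions from heterozygous parents. The sketch below shows that basic form. The counts in the example are hypothetical, and the study's own analysis (described in the Supplementary Appendix) may differ in detail.

```python
import math


def transmission_disequilibrium_test(transmitted: int, untransmitted: int):
    """Basic TDT for one allele.

    transmitted / untransmitted: number of times heterozygous parents did or
    did not pass the allele to an affected child. Returns the chi-square
    statistic (1 degree of freedom) and its P value.
    """
    informative = transmitted + untransmitted
    if informative == 0:
        raise ValueError("no informative (heterozygous-parent) transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / informative
    # Upper-tail probability of a 1-df chi-square variable
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: allele transmitted 60 times, not transmitted 40 times
chi2_example, p_example = transmission_disequilibrium_test(60, 40)
```

Under the null hypothesis of no association, each heterozygous parent transmits either allele with probability 0.5, so the two counts are expected to be roughly equal and the statistic stays near zero.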
Supported by grants (AP-3758-A16 and RG-2899), a Collaborative Research Award (CA-1001-A14), and a postdoctoral fellowship (FG-1718-A1, to Dr. McCauley) from the National Multiple Sclerosis Society; grants (NS049477, NS032830, and NS26799) from the National Institute of Neurological Disorders and Stroke; grants (AI067152 and P01-AI039671) from the National Institute of Allergy and Infectious Diseases; a grant (076113) from the Wellcome Trust; a grant (U54-RR020278-1) from the National Center for Research Resources; the Penates Foundation; the Nancy Davis Center Without Walls; and a Jacob Javits Merit Award (NS2427, to Dr. Hafler) from the National Institute of Neurological Disorders and Stroke.
No potential conflict of interest relevant to this article was reported.
We thank the Wellcome Trust Case Control Consortium and the investigators who contributed to the generation of the data (listed at www.wtccc.org.uk); the National Institute of Mental Health for generously allowing the use of their genotype data; the patients and their families for participating in this study; Susan Pobywjlo (Brigham and Women's Hospital) for organizing patient collections; Kathryn Irenze (Broad Institute) for technical assistance; Brendan Blumenstiel, Matt DeFelice, Melissa Parkin, and the Affymetrix production team; Marcia Nizzari, George Grant, Pei Lin, and the Broad Institute Genetic Analysis Platform Informatics team; Stacey Donnelly (Broad Institute) and Andrew J.P. Lowe (Harvard Center for Neurodegeneration and Repair) for administrative support; Sandra West (Duke University and University of Miami) and Robin Lincoln (University of California, San Francisco) for assistance with sample management; Justin Giles, David Sexton, and Yuki Bradford (Vanderbilt University) and James Jaworski (University of Miami) for assistance with analysis; and a small number of key private donors whose early vision, partnership, and contributions ultimately made this project possible.
Source Information
The writing group (David A. Hafler, M.D., Alastair Compston, F.Med.Sci., Ph.D., Stephen Sawcer, M.B., Ch.B., Ph.D., Eric S. Lander, Ph.D., Mark J. Daly, Ph.D., Philip L. De Jager, M.D., Ph.D., Paul I.W. de Bakker, Ph.D., Stacey B. Gabriel, Ph.D., Daniel B. Mirel, Ph.D., Adrian J. Ivinson, Ph.D., Margaret A. Pericak-Vance, Ph.D., Simon G. Gregory, Ph.D., John D. Rioux, Ph.D., Jacob L. McCauley, Ph.D., Jonathan L. Haines, Ph.D., Lisa F. Barcellos, Ph.D., Bruce Cree, M.D., Ph.D., Jorge R. Oksenberg, Ph.D., and Stephen L. Hauser, M.D.) assume responsibility for the overall content and integrity of the article.
This article (10.1056/NEJMoa073493) was published at www.nejm.org on July 29, 2007. It will appear in the August 30 issue of the Journal.
References
chain (IL7Rα) with multiple sclerosis. Nat Genet (in press).
The writing group's affiliations are as follows: the Division of Molecular Immunology, Center for Neurologic Diseases, Department of Neurology, Brigham and Women's Hospital, and Harvard Medical School, Boston (D.A.H., P.L.D.J.); Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, MA (D.A.H., E.S.L., M.J.D., P.L.D.J., P.I.W.B., S.B.G., D.B.M., J.D.R.); Department of Clinical Neurosciences, Addenbrooke's Hospital, University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom (A.C., S.S.); Massachusetts General Hospital, Harvard Medical School, Boston (M.J.D., P.I.W.B.); Harvard Partners Center for Genetics and Genomics, Boston (P.L.D.J., P.I.W.B.); Harvard Center for Neurodegeneration and Repair, Harvard Medical School, Boston (A.J.I.); Duke University Medical Center, Durham, NC (M.A.P.-V., S.G.G.); University of Miami School of Medicine, Miami (M.A.P.-V.); Université de Montréal, Montreal Heart Institute, Montreal (J.D.R.); Center for Human Genetics Research, Vanderbilt University Medical Center, Nashville (J.L.M., J.L.H.); University of California at Berkeley, Berkeley (L.F.B.); and University of California at San Francisco, San Francisco (L.F.B., B.C., J.R.O., S.L.H.).
The following groups participated in this study: Clinical and Sample Collection Groups (in order of the number of samples collected): University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom — S. Sawcer (project coleader), M. Ban, A. Compston; University of California at San Francisco, San Francisco — J.R. Oksenberg (project coleader), B. Cree, S.L. Hauser; Brigham and Women's Hospital, Boston — P.L. De Jager (project coleader), H.L. Weiner, D.A. Hafler. Project Management and Genotyping Centers: Harvard Center for Neurodegeneration and Repair, Boston — A.J. Ivinson (project leader); Brigham and Women's Hospital, Boston — D.A. Hafler; Broad Institute of Harvard University and Massachusetts Institute of Technology, Cambridge, MA — S.B. Gabriel, D.B. Mirel; Duke University Medical Center, Durham, NC — S.G. Gregory, M.A. Pericak-Vance. Analysis Group: Massachusetts General Hospital, Boston — M.J. Daly (project coleader), P.I.W. de Bakker; Brigham and Women's Hospital, Boston — P.L. De Jager, L.M. Maier; University of California at Berkeley, Berkeley — L.F. Barcellos, J.R. Oksenberg; University of Cambridge School of Clinical Medicine, Cambridge, United Kingdom — S. Sawcer; University of Miami School of Medicine, Miami — M.A. Pericak-Vance; and Vanderbilt University Medical Center, Nashville — J.L. McCauley, J.L. Haines (project leader).