Although associations with common single-nucleotide polymorphisms (SNPs) identified in genomewide association studies have proven robust and reproducible (see diagram), nearly all these SNPs are associated with relative risks of 1.5 per copy or less. In aggregate, the SNPs discovered to date account for a small fraction of the overall inherited risk of each disease. The mechanisms whereby DNA variation in most of these regions influences disease are not obvious from our previous understanding of pathophysiology, the genes in the regions, or the nature of the DNA changes observed.
Do the small effects of multiple genes, the modest fraction of heritability explained, and the lack of overlap with our previous biologic understanding suggest an underlying weakness in the genomewide approach? We believe not. Rather, these features illuminate the limits of current knowledge at the interface of three historically distinct approaches to understanding disease causality — genetic mapping, epidemiology, and studies of pathophysiological mechanisms.
Genetic mapping turns hypothesis-driven research on its head. Rather than starting with a functional hypothesis, it is based on the theory that systematic genomewide study of DNA variation in relation to disease can lead to the localization of causal genes. Like linkage mapping, such studies can implicate only a region of the genome; to conclusively identify causal genes and mutations, each such region must be sequenced in cases and controls, and functional studies performed. In diseases that follow mendelian patterns of inheritance, this process typically reveals many different causal mutations within each disease-related gene.
Theoretical considerations and empirical data demonstrate that very large sample sizes are required when genomewide association studies are used to pinpoint novel disease-causing genes. The reasons for this are that many different causal genes may influence each disease and that the common SNPs studied are often not themselves the causal variants in each such gene. Moreover, a very stringent level of statistical significance is required to compensate for the statistical fluctuations encountered in a genome's worth of data.
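The need for a stringent threshold can be illustrated with a simple multiple-testing correction. The sketch below assumes a Bonferroni-style adjustment and a scan of 500,000 SNPs; both are illustrative assumptions rather than details given in this article, and the function names are hypothetical.

```python
# Illustrative sketch only: the Bonferroni correction and the number of
# SNPs tested are assumptions, not values taken from the article.
NOMINAL_ALPHA = 0.05   # conventional single-test significance level
N_TESTS = 500_000      # assumed number of SNPs tested in one scan


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level that keeps the family-wise error near alpha."""
    return alpha / n_tests


def expected_false_positives(alpha: float, n_tests: int) -> float:
    """Expected number of null SNPs passing an uncorrected threshold."""
    return alpha * n_tests


threshold = bonferroni_threshold(NOMINAL_ALPHA, N_TESTS)
false_hits = expected_false_positives(NOMINAL_ALPHA, N_TESTS)

print(f"Per-SNP threshold: {threshold:.1e}")
print(f"Expected chance findings at P<0.05: {false_hits:.0f}")
```

Under these assumptions the per-SNP threshold is on the order of 10⁻⁷, whereas an uncorrected threshold of 0.05 would be expected to produce tens of thousands of chance findings.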
Even with the new technologies used, the statistical laws of study size and power still hold: if the effects of common SNPs are small, then the samples necessary to detect them will be large. For example, in a typical genomewide scan of 1500 patients with a disease and 1500 controls, the power to achieve genomewide significance (typically, P<10⁻⁷) for a variant with 20% frequency is only 13% if the risk per allele is 1.3 and 1% if the risk is 1.2. The most important implication of these power calculations is that most such studies to date have been underpowered to identify many regions harboring disease-causing genes. This hypothesis has been validated by investigators who pooled genomewide association data from 5000 to 20,000 subjects for phenotypes such as diabetes,² Crohn's disease, and height³ and discovered associations with multiple loci that were not statistically significant in the individual studies. Thus, the opportunities for genomewide association studies to identify new genomic risk loci (which may harbor rare mutations of larger effect) will not be exhausted until large samples are assembled for each disease and trait of clinical importance.
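The power figures quoted above can be approximated with a short calculation. The sketch below is not the authors' method: it assumes a two-sided allelic test (a two-proportion z-test on allele counts), treats the per-allele risk as an odds ratio under a multiplicative model, and takes the 20% frequency to be the allele frequency in controls. All function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def case_allele_frequency(control_freq: float, odds_ratio: float) -> float:
    """Allele frequency in cases, assuming a multiplicative per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases: int, n_controls: int, control_freq: float,
                       odds_ratio: float, alpha: float) -> float:
    """Approximate power of a two-sided z-test comparing allele frequencies."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each person contributes two alleles, hence 2 * n in each group.
    std_err = sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                   + p0 * (1.0 - p0) / (2 * n_controls))
    expected_z = (p1 - p0) / std_err
    z_crit = -STD_NORMAL.inv_cdf(alpha / 2.0)  # two-sided critical value
    return (STD_NORMAL.cdf(expected_z - z_crit)
            + STD_NORMAL.cdf(-expected_z - z_crit))


ALPHA = 1e-7  # genomewide significance level quoted in the text
power_or_13 = allelic_test_power(1500, 1500, 0.20, 1.3, ALPHA)
power_or_12 = allelic_test_power(1500, 1500, 0.20, 1.2, ALPHA)
power_pooled = allelic_test_power(10_000, 10_000, 0.20, 1.2, ALPHA)

print(f"OR 1.3, 1500 vs 1500: {power_or_13:.1%}")
print(f"OR 1.2, 1500 vs 1500: {power_or_12:.1%}")
print(f"OR 1.2, 10,000 vs 10,000: {power_pooled:.1%}")
```

Under these assumptions the approximation gives roughly 13% and 1% power for the two scenarios in the text. The third call, with a hypothetical 10,000 cases and 10,000 controls, shows how sharply power rises when samples are pooled.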
From the perspective of epidemiology, current data indicate that the underlying genetic architecture for most diseases probably includes dozens, and potentially hundreds, of risk alleles for each disease — some common and of small effect, others rare and of larger effect. Whereas genomewide association studies offer a method for finding the former, the latter require sequencing of DNA from large numbers of individual patients — in genes implicated by genomewide studies, in biologic candidate genes, and ultimately throughout the genome. The interpretation of such complex and rapidly evolving information is unfamiliar ground for physicians who have been educated to consider a relatively small and stable set of disease-specific risk factors. At present, patients should be wary of companies that seek to sell such information through direct-to-consumer marketing; with much further elaboration and validation, however, the use of such information may eventually be commonplace in clinical medicine.
Acknowledging the potentially important contributions of genomewide association studies to risk stratification, we suggest that the greatest ultimate impact of these discoveries will be on our understanding of the biology and pathophysiology of human diseases and phenotypes. In a few cases, the gene or genes identified have known functions — for instance, the discoveries that complement factor H and other complement factors are associated with risk of age-related macular degeneration and that two genes involved in autophagy are risk factors for Crohn's disease — but were not known to be involved in the disease. However, in most cases the genes and regions identified had not previously been identified by functional studies, model systems, or mendelian genetics, proving that genomewide association studies can help to fill critical gaps in our current knowledge of biology.
From this perspective, common SNPs are canaries in the coal mine, signaling the relationship to a disease of a biologically important gene or gene regulatory mechanism in humans whose ultimate importance cannot be estimated until the full set of mutations is found, the biologic pathways understood, and clinical utility demonstrated. For example, the 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase reaction, the rate-limiting step in cholesterol synthesis, was biochemically identified decades ago as a target for pharmacologic inhibition, and statins were developed to reduce levels of low-density lipoprotein (LDL) cholesterol and cardiovascular risk. Recently, genomewide association studies have demonstrated that common, noncoding SNPs in the gene encoding HMG-CoA reductase are significantly associated with LDL cholesterol levels but that the effect sizes are extremely small (a 5% difference in LDL level).⁴ Why is the effect of statins larger than that of the common SNPs that can be used to identify a genetic effect? Presumably because common SNPs have a small effect on enzyme activity (perhaps limited by natural selection), whereas drugs aimed at this mechanism are able to inhibit the enzyme to a much greater degree. Thus, the small relative risks observed for SNP variants in genomewide association studies do not imply that the biologic impact of their discovery is necessarily also small.
If these arguments are correct, they highlight one of the great translational research challenges of our time: localizing human disease genes, sequencing to identify causal mutations, and using this information to develop mechanistic understanding, clinically useful risk prediction, and therapeutic proof of concept. Many complementary approaches must be pursued in parallel: resequencing of genes in many patients to search for causative variants of large effect; manipulation of each disease gene in cell and animal models to study molecular mechanisms and identify phenotypes for study in patients; careful phenotypic study of persons carrying a defined genotype, including primary cells derived from them; and application of mendelian randomization⁵ in large data sets to determine whether variants associated with endophenotypes are also associated with clinical outcomes.
The opportunities for physician scientists are exciting and substantial. Rather than seeking a new twist on a long-studied pathway or asking whether discoveries in model organisms are relevant to humans, researchers can explore the bounty of genes proven by genomewide association studies to have relevance to human health and disease. Our challenge will be to develop research methods that take us from genetic localization to medically useful application, as well as to support investigators who want to seize this opportunity and translate it into greater understanding of disease and better care for patients.
Dr. Altshuler reports receiving consulting fees from Medical Portfolio Management, Eisai, and Merck; holding equity in Medical Portfolio Management; and receiving grant support from Novartis. No other potential conflict of interest relevant to this article was reported.
Source Information
Dr. Hunter is a professor in the Departments of Epidemiology and Nutrition at the Harvard School of Public Health, Boston, and a statistical consultant for the Journal. Dr. Altshuler is a professor in the Departments of Genetics and Medicine, Harvard Medical School and Massachusetts General Hospital, Boston. Dr. Rader is a professor of medicine and pharmacology and associate director of the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia.
References
The New England Journal of Medicine is owned, published, and copyrighted © 2009 Massachusetts Medical Society. All rights reserved.