The New England Journal of Medicine
 
Perspective
FOCUS ON RESEARCH

Volume 358:2760-2763 June 26, 2008 Number 26

From Darwin's Finches to Canaries in the Coal Mine — Mining the Genome for New Biology
David J. Hunter, M.B., B.S., Sc.D., David Altshuler, M.D., Ph.D., and Daniel J. Rader, M.D.

 

The observations of finches that Charles Darwin made while in the Galapagos contributed to his theory of the origins of interspecies differences, ultimately leading to our understanding of mutation and natural selection as drivers of phenotypic variation. Now, more than 150 years later, genomewide association studies have identified more than 100 new chromosomal regions at which DNA variation influences risk of common human diseases and clinical phenotypes.¹ Since previous approaches to identifying genetic causes of common diseases have met with very limited success, this moment constitutes a watershed in the history of genetics in medicine.

Although associations with common single-nucleotide polymorphisms (SNPs) identified in genomewide association studies have proven robust and reproducible (see diagram), nearly all these SNPs are associated with relative risks of 1.5 per copy or less. In aggregate, the SNPs discovered to date account for a small fraction of the overall inherited risk of each disease. The mechanisms whereby DNA variation in most of these regions influences disease are not obvious from our previous understanding of pathophysiology, the genes in the regions, or the nature of the DNA changes observed.

Figure 1. Two-Stage Genomewide Association Study.

In the most common design, a genomewide set of single-nucleotide polymorphisms (SNPs) is tested in a discovery case–control study, and the most statistically significant SNPs are retested in one or more replication studies. If multiple discovery studies have been performed for the same disease, pooling these studies increases the statistical power to detect disease-associated SNPs.
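The two-stage design described in this caption can be sketched as a small simulation. This is only an illustration under stated assumptions, not the procedure used in any particular study: each SNP is reduced to a single normally distributed test statistic, and the number of SNPs, the five "causal" SNPs, their expected effect, and both significance cutoffs are hypothetical values chosen for the example.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def two_sided_p(z):
    """Two-sided P value for a standard-normal test statistic."""
    return 2.0 * STD_NORMAL.cdf(-abs(z))


def simulate_z(expected_z, rng):
    """Draw one observed statistic per SNP around its expected value."""
    return [rng.gauss(mu, 1.0) for mu in expected_z]


def two_stage(discovery_z, replication_z, stage1_alpha=1e-3, family_alpha=0.05):
    """Return (SNPs carried forward from discovery, SNPs that replicate).

    stage1_alpha and family_alpha are illustrative cutoffs (assumptions).
    """
    # Stage 1: keep only the most statistically significant SNPs.
    selected = [i for i, z in enumerate(discovery_z)
                if two_sided_p(z) < stage1_alpha]
    if not selected:
        return selected, []
    # Stage 2: retest only the selected SNPs in independent samples,
    # correcting for the (much smaller) number of tests and requiring
    # the effect to point in the same direction as in discovery.
    stage2_alpha = family_alpha / len(selected)
    replicated = [i for i in selected
                  if two_sided_p(replication_z[i]) < stage2_alpha
                  and discovery_z[i] * replication_z[i] > 0]
    return selected, replicated


rng = random.Random(2008)             # fixed seed so the run is repeatable
n_snps = 10_000                       # hypothetical panel size
causal = {0, 1, 2, 3, 4}              # hypothetical disease-associated SNPs
expected_z = [5.0 if i in causal else 0.0 for i in range(n_snps)]

discovery_z = simulate_z(expected_z, rng)    # discovery case-control study
replication_z = simulate_z(expected_z, rng)  # independent replication study

selected, replicated = two_stage(discovery_z, replication_z)
print("carried forward from discovery:", len(selected))
print("replicated:", sorted(replicated))
```

In a run like this, the discovery stage usually carries forward a handful of null SNPs by chance alongside the true signals, and the replication stage is what filters most of them out.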

 
For example, in this issue of the Journal, Pharoah et al. (pages 2796–2803) discuss six common markers of risk for breast cancer that have been discovered through genomewide association studies. Each marker has a modest influence on a woman's risk of disease; none act through well-understood mechanisms. Pharoah et al. consider the potential usefulness of these markers in targeting patients who would benefit from screening for early detection of disease and argue that as more associated loci are identified, risk-prediction algorithms will need to be based on the best available risk estimates. The authors conclude that stable algorithms may eventually be useful in identifying groups of women with clinically meaningful differences in risk.

Do the small effects of multiple genes, the modest fraction of heritability explained, and the lack of overlap with our previous biologic understanding suggest an underlying weakness in the genomewide approach? We believe not. Rather, these features illuminate the limits of current knowledge at the interface of three historically distinct approaches to understanding disease causality — genetic mapping, epidemiology, and studies of pathophysiological mechanisms.

Genetic mapping turns hypothesis-driven research on its head. Rather than starting with a functional hypothesis, it is based on the theory that systematic genomewide study of DNA variation in relation to disease can lead to the localization of causal genes. Like linkage mapping, such studies can implicate only a region of the genome; to conclusively identify causal genes and mutations, each such region must be sequenced in cases and controls, and functional studies performed. In diseases that follow mendelian patterns of inheritance, this process typically reveals many different causal mutations within each disease-related gene.

Theoretical considerations and empirical data demonstrate that very large sample sizes are required when genomewide association studies are used to pinpoint novel disease-causing genes. The reasons for this are that many different causal genes may influence each disease and that the common SNPs studied are often not themselves the causal variants in each such gene. Moreover, a very stringent level of statistical significance is required to compensate for the statistical fluctuations encountered in a genome's worth of data.
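The need for a very stringent significance level can be illustrated with simple arithmetic. The sketch below assumes a Bonferroni-style correction and a hypothetical panel of 500,000 independent SNP tests; neither the correction method nor the panel size is specified in the text, and real SNPs are correlated, so this is a rough guide only.

```python
import math


def bonferroni_threshold(family_alpha, n_tests):
    """Per-test significance level that keeps the chance of any false
    positive across n_tests independent tests at about family_alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return family_alpha / n_tests


N_TESTS = 500_000      # hypothetical number of SNPs tested (assumption)
FAMILY_ALPHA = 0.05    # conventional overall false positive rate

per_snp_alpha = bonferroni_threshold(FAMILY_ALPHA, N_TESTS)

# If every SNP were instead tested at the nominal 0.05 level and none were
# truly associated, this many would be expected to look "significant".
expected_false_positives = FAMILY_ALPHA * N_TESTS

print(f"per-SNP threshold: {per_snp_alpha:.1e}")
print(f"expected chance findings at P<0.05: {expected_false_positives:.0f}")
```

Under these assumptions the per-SNP threshold is on the order of 10⁻⁷, while testing at the nominal 0.05 level would be expected to yield tens of thousands of chance findings.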

Even with the new technologies used, the statistical laws of study size and power still hold: if the effects of common SNPs are small, then the samples necessary to detect them will be large. For example, in a typical genomewide scan of 1500 patients with a disease and 1500 controls, the power to achieve genomewide significance (typically, P<10⁻⁷) for a variant with 20% frequency is only 13% if the risk per allele is 1.3 and 1% if the risk is 1.2. The most important implication of these power calculations is that most such studies to date have been underpowered to identify many regions harboring disease-causing genes. This hypothesis has been validated by investigators who pooled genomewide association data from 5000 to 20,000 subjects for phenotypes such as diabetes,² Crohn's disease, and height³ and discovered associations with multiple loci that were not statistically significant in the individual studies. Thus, the opportunities for genomewide association studies to identify new genomic risk loci (which may harbor rare mutations of larger effect) will not be exhausted until large samples are assembled for each disease and trait of clinical importance.
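A power calculation of this kind can be approximated in a few lines. The sketch below is one possible approach, not necessarily the method behind the figures quoted above. It assumes a two-sided comparison of allele frequencies between cases and controls with a normal approximation, treats the per-allele risk as an odds ratio under a multiplicative model, and takes 20% as the allele frequency in controls. The pooled sample size in the last call is a hypothetical value.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def case_allele_frequency(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by a per-allele odds ratio."""
    odds = odds_ratio * control_freq / (1.0 - control_freq)
    return odds / (1.0 + odds)


def allelic_test_power(n_cases, n_controls, control_freq, odds_ratio,
                       alpha=1e-7):
    """Approximate power of a two-sided allele-frequency comparison."""
    case_freq = case_allele_frequency(control_freq, odds_ratio)
    # Each person contributes two alleles to the comparison.
    variance = (case_freq * (1.0 - case_freq) / (2 * n_cases)
                + control_freq * (1.0 - control_freq) / (2 * n_controls))
    expected_z = (case_freq - control_freq) / sqrt(variance)
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    # Probability that the observed statistic exceeds the critical value
    # in either direction.
    return (STD_NORMAL.cdf(expected_z - z_crit)
            + STD_NORMAL.cdf(-expected_z - z_crit))


power_13 = allelic_test_power(1500, 1500, 0.20, 1.3)
power_12 = allelic_test_power(1500, 1500, 0.20, 1.2)
# Hypothetical pooled analysis: 10,000 cases and 10,000 controls.
power_12_pooled = allelic_test_power(10_000, 10_000, 0.20, 1.2)

print(f"risk 1.3, 1500 vs 1500:     {power_13:.3f}")
print(f"risk 1.2, 1500 vs 1500:     {power_12:.3f}")
print(f"risk 1.2, 10,000 vs 10,000: {power_12_pooled:.3f}")
```

Under these assumptions the approximation should land close to the 13% and 1% quoted above, and the pooled call shows how sharply power rises once several thousand cases and controls are combined.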

From the perspective of epidemiology, current data indicate that the underlying genetic architecture for most diseases probably includes dozens, and potentially hundreds, of risk alleles for each disease — some common and of small effect, others rare and of larger effect. Whereas genomewide association studies offer a method for finding the former, the latter require sequencing of DNA from large numbers of individual patients — in genes implicated by genomewide studies, in biologic candidate genes, and ultimately throughout the genome. The interpretation of such complex and rapidly evolving information is unfamiliar ground for physicians who have been educated to consider a relatively small and stable set of disease-specific risk factors. At present, patients should be wary of companies that seek to sell such information through direct-to-consumer marketing; with much further elaboration and validation, however, the use of such information may eventually be commonplace in clinical medicine.

Acknowledging the potentially important contributions of genomewide association studies to risk stratification, we suggest that the greatest ultimate impact of these discoveries will be on our understanding of the biology and pathophysiology of human diseases and phenotypes. In a few cases, the gene or genes identified have known functions — for instance, the discoveries that complement factor H and other complement factors are associated with risk of age-related macular degeneration and that two genes involved in autophagy are risk factors for Crohn's disease — but were not known to be involved in the disease. However, in most cases the genes and regions identified had not previously been identified by functional studies, model systems, or mendelian genetics, proving that genomewide association studies can help to fill critical gaps in our current knowledge of biology.

From this perspective, common SNPs are canaries in the coal mine, signaling the relationship to a disease of a biologically important gene or gene regulatory mechanism in humans whose ultimate importance cannot be estimated until the full set of mutations is found, the biologic pathways understood, and clinical utility demonstrated. For example, the 3-hydroxy-3-methylglutaryl coenzyme A (HMG-CoA) reductase reaction, the rate-limiting step in cholesterol synthesis, was biochemically identified decades ago as a target for pharmacologic inhibition, and statins were developed to reduce levels of low-density lipoprotein (LDL) cholesterol and cardiovascular risk. Recently, genomewide association studies have demonstrated that common, noncoding SNPs in HMG-CoA reductase are significantly associated with LDL cholesterol levels but that the effect sizes are extremely small: a 5% difference in LDL level.⁴ Why is the effect of statins larger than that of the common SNPs that can be used to identify a genetic effect? Presumably because common SNPs have a small effect on enzyme activity (perhaps limited by natural selection), whereas drugs aimed at this mechanism are able to inhibit the enzyme to a much greater degree. Thus, the small relative risks observed for SNP variants in genomewide association studies do not imply that the biologic impact of their discovery is necessarily also small.

If these arguments are correct, they highlight one of the great translational research challenges of our time: localizing human disease genes, sequencing to identify causal mutations, and using this information to develop mechanistic understanding, clinically useful risk prediction, and therapeutic proof of concept. Many complementary approaches must be pursued in parallel: resequencing of genes in many patients to search for causative variants of large effect; manipulation of each disease gene in cell and animal models to study molecular mechanisms and identify phenotypes for study in patients; careful phenotypic study of persons carrying a defined genotype, including primary cells derived from them; and application of mendelian randomization⁵ in large data sets to determine whether variants associated with endophenotypes are also associated with clinical outcomes.

The opportunities for physician scientists are exciting and substantial. Rather than seeking a new twist on a long-studied pathway or asking whether discoveries in model organisms are relevant to humans, researchers can explore the bounty of genes proven by genomewide association studies to have relevance to human health and disease. Our challenge will be to develop research methods that take us from genetic localization to medically useful application, as well as to support investigators who want to seize this opportunity and translate it into greater understanding of disease and better care for patients.

Dr. Altshuler reports receiving consulting fees from Medical Portfolio Management, Eisai, and Merck; holding equity in Medical Portfolio Management; and receiving grant support from Novartis. No other potential conflict of interest relevant to this article was reported.


Source Information

Dr. Hunter is a professor in the Departments of Epidemiology and Nutrition at the Harvard School of Public Health, Boston, and a statistical consultant for the Journal. Dr. Altshuler is a professor in the Departments of Genetics and Medicine, Harvard Medical School and Massachusetts General Hospital, Boston. Dr. Rader is a professor of medicine and pharmacology and associate director of the Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia.

References

  1. Manolio TA, Brooks LD, Collins FS. A HapMap harvest of insights into the genetics of common disease. J Clin Invest 2008;118:1590-1605.
  2. Zeggini E, Scott LJ, Saxena R, et al. Meta-analysis of genome-wide association data and large-scale replication identifies additional susceptibility loci for type 2 diabetes. Nat Genet 2008;40:638-645.
  3. Lettre G, Jackson AU, Gieger C, et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nat Genet 2008;40:584-591.
  4. Kathiresan S, Melander O, Guiducci C, et al. Six new loci associated with blood low-density lipoprotein cholesterol, high-density lipoprotein cholesterol or triglycerides in humans. Nat Genet 2008;40:189-197.
  5. Katan MB. Mendelian randomization, 18 years on. Int J Epidemiol 2004;33:10-11.

 


The New England Journal of Medicine is owned, published, and copyrighted © 2009 Massachusetts Medical Society. All rights reserved.