The New England Journal of Medicine
Original Article
Volume 345:1388-1393 November 8, 2001 Number 19

Sperm Morphology, Motility, and Concentration in Fertile and Infertile Men
David S. Guzick, M.D., Ph.D., James W. Overstreet, M.D., Ph.D., Pam Factor-Litvak, Ph.D., Charlene K. Brazil, B.S., Steven T. Nakajima, M.D., Christos Coutifaris, M.D., Ph.D., Sandra Ann Carson, M.D., Pauline Cisneros, Ph.D., Michael P. Steinkampf, M.D., Joseph A. Hill, M.D., Dong Xu, M.Phil., Donna L. Vogel, M.D., Ph.D., for the National Cooperative Reproductive Medicine Network

 

ABSTRACT

Background Although semen analysis is routinely used to evaluate the male partner in infertile couples, sperm measurements that discriminate between fertile and infertile men are not well defined.

Methods We evaluated two semen specimens from each of the male partners in 765 infertile couples and 696 fertile couples at nine sites. The female partners in the infertile couples had normal results on fertility evaluation. The sperm concentration and motility were determined at the sites; semen smears were stained at the sites and shipped to a central laboratory for an assessment of morphologic features of sperm with the use of strict criteria. We used classification-and-regression-tree analysis to estimate threshold values for subfertility and fertility with respect to the sperm concentration, motility, and morphology. We also used an analysis of receiver-operating-characteristic curves to assess the relative value of these sperm measurements in discriminating between fertile and infertile men.

Results The subfertile ranges were a sperm concentration of less than 13.5×10^6 per milliliter, less than 32 percent of sperm with motility, and less than 9 percent with normal morphologic features. The fertile ranges were a concentration of more than 48.0×10^6 per milliliter, greater than 63 percent motility, and greater than 12 percent normal morphologic features. Values between these ranges indicated indeterminate fertility. There was extensive overlap between the fertile and the infertile men within both the subfertile and the fertile ranges for all three measurements. Although each of the sperm measurements helped to distinguish between fertile and infertile men, none was a powerful discriminator. The percentage of sperm with normal morphologic features had the greatest discriminatory power.

Conclusions Threshold values for sperm concentration, motility, and morphology can be used to classify men as subfertile, of indeterminate fertility, or fertile. None of the measures, however, are diagnostic of infertility.


Semen analysis is routinely used to evaluate the male partner in infertile couples [1,2] and to assess the reproductive toxicity of environmental or therapeutic agents [3]. Although widely used thresholds for normal semen measurements have been published by the World Health Organization (WHO) [4,5,6,7], the available norms for sperm concentration, motility, and morphology fail to meet rigorous clinical, technical, and statistical standards. In recognition of these limitations, the nomenclature in the most recent WHO manual [7] for semen evaluation was changed from "normal" to "reference" values. Two recent prospective studies of semen quality and fertility concluded that the current WHO reference values should be reconsidered [8,9].

In this study, we sought to determine values for semen measurements that best discriminate between fertile and infertile men and to evaluate the relative value of standard semen measurements in distinguishing between fertile and infertile men.

Methods

Study Population

As part of a randomized clinical trial of intrauterine insemination and superovulation in the treatment of infertility at nine centers in the United States, we recruited infertile couples in which the female partners had normal results on fertility evaluation [10]. All of these couples had been unable to conceive for at least 12 months; the mean duration of infertility was 43 months. The women were required to have regular menstrual cycles, a normal hysterosalpingogram, normal results on laparoscopy, and a luteal-phase endometrial-biopsy specimen that was histologically consistent with menstrual dating. The men were required to have some motile sperm in ejaculated semen specimens [10].

Fertile men (controls) were recruited from prenatal classes at the same hospitals in which the infertile couples were recruited, as well as through local advertising. The partners of fertile men had to be pregnant or to have delivered a child within the previous two years. Fertile men were excluded only if they had a history of infertility (inability to conceive during 12 months of attempts), vasovasostomy, or varicocelectomy.

All the men were required to be between the ages of 20 and 55 years at the time of enrollment, and their partners were required to be between the ages of 20 and 40 years. The fertile couples were frequency-matched to the infertile couples according to the five-year age groups of both partners. Matching was performed within each clinical site, except in the case of one combination of age groups for which it was difficult to recruit participants — a male partner 20 to 25 years of age with a female partner 25 to 29 years of age. The matching of couples in this category was performed without regard to clinical site. In all, we studied 765 men from infertile couples and 696 men from fertile couples.

Semen Collection and Laboratory Evaluation

Written informed consent was obtained from all participants after recruitment. Semen samples were collected by masturbation at the clinical site, after the men had been asked to abstain from ejaculation for at least 48 hours before semen was collected. All semen analyses were performed manually within one hour after the sample was collected and included measurements of the volume of the ejaculate and determinations of the sperm concentration and the percentage of sperm with any evidence of flagellar movement (percentage motility). Details of these procedures have been published previously [11].

Two semen specimens were obtained from each of the fertile men a mean of 16 days apart; 27 fertile men submitted samples more than 30 days apart. From infertile men, up to six semen samples were obtained — two before randomization and one for each of up to four treatment cycles [10]. Of these specimens, we used the two obtained closest together in time; the mean number of days between the specimen collections was 41.5. The mean values for the sperm concentrations, the percentages of motile sperm, and the percentages of sperm with normal morphologic features in the two samples were used in the analysis.

Technicians from the nine clinical sites attended training sessions in semen analysis at the central laboratory at the University of California, Davis. The proficiency of the 26 technicians was tested at the clinical sites approximately twice each year with the use of blindly coded sperm suspensions and videotapes distributed by the central laboratory.

Semen smears were stained at the clinical sites by the Papanicolaou method and shipped to the central laboratory for assessment of sperm morphology by a single technician. Sperm were classified as having normal or abnormal morphologic features according to strict criteria [7]. Either 200 or 300 sperm were analyzed per slide. Initially, 100 sperm from each of two different locations on the slide were analyzed. If the difference between the percentage of normal sperm in the two areas was 5 percentage points or less, the mean value was calculated. If the difference was more than 5 percentage points, an additional 100 sperm were evaluated from a third location, and the median of the three values was used.
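The per-slide scoring rule described above can be written out as a short Python sketch. The function name, argument names, and example percentages are illustrative assumptions and do not come from the study; only the 5-percentage-point rule and the mean-or-median choice are taken from the text.

```python
from statistics import mean, median

def slide_percent_normal(first, second, third=None, tolerance=5.0):
    """Combine per-location percentages of normal sperm for one slide.

    first, second: percent normal among 100 sperm at two locations.
    third: percent normal at a third location, needed only when the
    first two differ by more than `tolerance` percentage points.
    """
    if abs(first - second) <= tolerance:
        # Locations agree closely enough: report their mean.
        return mean([first, second])
    if third is None:
        raise ValueError("locations disagree; a third location is required")
    # Locations disagree: report the median of the three readings.
    return median([first, second, third])

# Hypothetical readings, for illustration only.
print(slide_percent_normal(10, 13))     # within 5 points, so the mean is used
print(slide_percent_normal(8, 15, 11))  # more than 5 points apart, so the median is used
```

Using the median in the three-location case limits the influence of a single atypical field on the slide value.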

The technician who assessed sperm morphology attended a training session conducted by Dr. Thinus Kruger, who developed the criteria that are used for strict assessments of morphology [7]. A set of 65 slides from patients in the study, scored by Dr. Kruger, was used as the standard for purposes of quality control. The percentage of sperm with normal morphologic features was known to the technician for 10 of these slides but was unknown for the other 55 standard slides. Each day, the technician scored two of the slides whose morphologic values she knew and compared her results with the standard value; she then scored two unknown slides. Approximately every two months, the mean percentage of sperm with normal morphologic features for these 55 slides was compared with the mean percentage as scored by Dr. Kruger. The two mean values never differed by more than 1 percentage point, and the Spearman rank-correlation coefficient for the two data sets was always at least 0.92.
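A periodic comparison of this kind could be sketched in Python as follows. The text reports the observed agreement (means within 1 percentage point, Spearman coefficient of at least 0.92) but does not state formal acceptance criteria, so treating those figures as pass limits is an assumption here, as are the function names and the example scores.

```python
from statistics import mean

def average_ranks(xs):
    """Rank values from 1, giving tied values the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        shared = (i + j) / 2.0 + 1.0  # mean of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def spearman(xs, ys):
    """Spearman rank correlation, computed as Pearson correlation of ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def passes_quality_check(technician, reference, max_mean_gap=1.0, min_rho=0.92):
    """Assumed pass rule: close means and high rank agreement."""
    gap = abs(mean(technician) - mean(reference))
    return gap <= max_mean_gap and spearman(technician, reference) >= min_rho

# Hypothetical percent-normal scores for five slides.
reference = [10, 12, 8, 15, 9]
technician = [11, 12, 7, 14, 10]
print(passes_quality_check(technician, reference))
```

Checking both quantities matters because a reader who scores every slide a few points too high keeps a perfect rank correlation while drifting away from the standard mean.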

Statistical Analysis

Data were sent to the data coordinating center at Columbia University, where computerized checks for out-of-range values and errors in logic were performed. Data that did not meet the standards were sent back to the clinical sites for verification. Analyses were performed with the use of the SAS (version 6.12, SAS Institute, Cary, N.C.) and S-Plus (StatSci, Seattle) statistical packages.

Classification-and-regression-tree (CART) analysis [12] was used to estimate thresholds for each sperm measurement that would discriminate between fertile men and infertile men. The CART algorithm uses an exhaustive search of all possible divisions of participants according to one of the continuous predictor variables to identify the division that results in the greatest improvement in the goodness of fit. In the present application, two thresholds were estimated for each sperm measurement; one became the threshold between the subfertile and indeterminate ranges, and the other became the threshold between the indeterminate and fertile ranges. We used the thresholds for discrimination defined by the CART analysis to create categorical variables for the measurements. We used logistic regression to estimate the association of fertility status (using 1 to indicate infertility and 0 to indicate fertility) with the various semen measurements.
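The exhaustive search for a single cut point on one continuous predictor can be illustrated with a minimal Python sketch. It uses Gini impurity as the goodness-of-fit criterion, which is an assumption: the text does not say which criterion or software settings were used. The data below are invented for illustration, and the sketch finds only one threshold; how the second threshold was obtained in the study is not described here.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_split(values, labels):
    """Return (threshold, impurity_decrease) for the best single cut.

    Every midpoint between adjacent distinct values is tried; the cut
    that most reduces the size-weighted Gini impurity is kept.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    parent = gini([y for _, y in pairs])
    best_threshold, best_gain = None, 0.0
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut can separate identical values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        child = (len(left) * gini(left) + len(right) * gini(right)) / n
        gain = parent - child
        if gain > best_gain:
            best_threshold, best_gain = (lo + hi) / 2.0, gain
    return best_threshold, best_gain

# Hypothetical concentrations (millions per milliliter); 1 = infertile, 0 = fertile.
values = [5, 8, 12, 20, 30, 45, 60, 80, 100]
labels = [1, 1, 1, 0, 1, 0, 0, 0, 0]
threshold, gain = best_split(values, labels)
print(threshold, round(gain, 3))
```

In a full tree, the same search would be repeated within each resulting group, which is one plausible route to a second threshold on the same measurement.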

We then calculated the sensitivity and specificity of the CART-defined thresholds for classifying infertility. The analysis of receiver-operating-characteristic curves [13,14], which defines tradeoffs between sensitivity and specificity along the spectrum of possible thresholds, was used to test whether each semen measurement discriminated between fertile and infertile men and to assess the relative performance of the three semen measurements in making this discrimination.
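As a rough illustration of the tradeoff, the following Python sketch computes sensitivity and specificity when a value below a chosen threshold is treated as a positive (infertile) call. The function name and the eight example values are hypothetical; only the two cut points, 13.5 and 48.0, echo the concentration thresholds reported in this article.

```python
def sensitivity_specificity(values, infertile, threshold):
    """Treat value < threshold as a positive (infertile) call.

    Returns (sensitivity, specificity): the fraction of infertile men
    called positive and the fraction of fertile men called negative.
    """
    tp = fn = tn = fp = 0
    for v, is_infertile in zip(values, infertile):
        positive = v < threshold
        if is_infertile:
            tp += positive
            fn += not positive
        else:
            fp += positive
            tn += not positive
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical concentrations (millions per milliliter); 1 = infertile, 0 = fertile.
values = [5, 10, 20, 40, 15, 50, 70, 90]
infertile = [1, 1, 1, 1, 0, 0, 0, 0]
print(sensitivity_specificity(values, infertile, 13.5))  # (0.5, 1.0)
print(sensitivity_specificity(values, infertile, 48.0))  # (1.0, 0.75)
```

In this toy data set the lower cut point misses half of the infertile men but mislabels no fertile man, whereas the higher cut point catches every infertile man at the cost of one false positive.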

Results

The demographic characteristics of the study population are shown in Table 1. Partners in fertile couples had higher educational levels than partners in infertile couples. Infertile couples were more likely to be white, to smoke, and to consume alcohol.

 
Table 1. Characteristics of Infertile and Fertile Couples from Nine Reproductive-Medicine Centers.

 
There was considerable overlap between the sperm measurements for the fertile men and those for the infertile men. The mean (±SD) sperm concentration was 67±50×10^6 per milliliter in fertile men (median, 56×10^6 per milliliter) and 52±42×10^6 per milliliter in infertile men (median, 42×10^6 per milliliter). The mean percentage of sperm with motility was 54±13 percent in fertile men (median, 55 percent) and 49±15 percent in infertile men (median, 55 percent). The mean percentage of sperm with normal morphologic features was 14±5 percent in fertile men (median, 14 percent) and 11±6 percent in infertile men (median, 10 percent).

The results of the CART analysis for each sperm measurement are shown in Table 2. The values that best defined infertility were a concentration of less than 13.5×10^6 per milliliter, less than 32 percent motility, and less than 9 percent normal morphologic features. The ranges associated with indeterminate fertility were concentrations of 13.5 to 48.0×10^6 per milliliter, 32 to 63 percent motility, and 9 to 12 percent normal morphologic features. The likelihood of infertility increased with decreasing sperm concentration, percentage with motility, or percentage with normal morphologic features (Table 2). For example, relative to a sperm concentration in the fertile range, the odds ratio for infertility was 1.5 (95 percent confidence interval, 1.2 to 1.8) for a sperm concentration in the indeterminate range and 5.3 (95 percent confidence interval, 3.3 to 8.3) for a sperm concentration in the subfertile range.
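The three ranges can be expressed as a small lookup in Python. The threshold values are those reported in the text; the dictionary layout, function names, and the example patient values are illustrative assumptions, and the sketch is not a diagnostic tool.

```python
# Thresholds reported in the text: subfertile below `low`, fertile above `high`,
# indeterminate for values from `low` to `high` inclusive.
RANGES = {
    "concentration": (13.5, 48.0),  # millions of sperm per milliliter
    "motility": (32.0, 63.0),       # percent motile
    "morphology": (9.0, 12.0),      # percent with normal morphologic features
}

def classify(measure, value):
    """Return 'subfertile', 'indeterminate', or 'fertile' for one measurement."""
    low, high = RANGES[measure]
    if value < low:
        return "subfertile"
    if value > high:
        return "fertile"
    return "indeterminate"

def count_subfertile(sample):
    """Number of measurements in `sample` that fall in the subfertile range."""
    return sum(classify(name, value) == "subfertile" for name, value in sample.items())

# Hypothetical mean values from two specimens of one man.
sample = {"concentration": 10.0, "motility": 45.0, "morphology": 14.0}
for name, value in sample.items():
    print(name, classify(name, value))
print(count_subfertile(sample))
```

Values exactly on a threshold fall in the indeterminate range, following the "less than" and "more than" wording of the reported ranges.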

 
Table 2. Fertile, Indeterminate, and Subfertile Ranges for Sperm Measurements from Classification-and-Regression-Tree Analysis and Corresponding Odds Ratios for Infertility.

 
The odds of infertility increased with an increasing number of sperm measurements in the subfertile range (Table 3). In comparison with an increase by a factor of 2 to 3 in the likelihood of infertility when one sperm measurement was in the subfertile range, there was an increase by a factor of 5 to 7 in the risk of infertility when two sperm measurements were subfertile, and an increase by a factor of 16 when all three measurements were subfertile. For example, when the percentage of sperm with normal morphologic features was in the fertile range but the percentage of motile sperm and the concentration were both in the subfertile range, the odds ratio for infertility was 5.5.

 
Table 3. Odds Ratios for Infertility for Combinations of Sperm Measurements.

 
The frequency distributions of fertile and infertile men with regard to sperm concentration, motility, and morphology, divided on the basis of the thresholds determined by the CART analysis, are shown in Figure 1. There was a marked excess of infertile men with values in the subfertile ranges of these semen measurements, and a corresponding excess of fertile men with values in the fertile ranges. Approximately equal proportions of fertile and infertile men had values in the indeterminate ranges of these variables.


 
Figure 1. Percentage of Men from Infertile and Fertile Couples with Values in the Subfertile, Indeterminate, and Fertile Ranges for Sperm Concentration (Panel A), Percentage of Motile Sperm (Panel B), and Percentage of Sperm with Normal Morphologic Features (Panel C), as Defined by Classification-and-Regression-Tree Analysis.

Arrows indicate the thresholds between subfertile and indeterminate ranges (left) and indeterminate and fertile ranges (right).

 
On the basis of the area under the receiver-operating-characteristic curve, each of the three measurements — the sperm concentration, percentage of motile sperm, and percentage of sperm with normal morphologic features — provided information that was helpful in discriminating between fertile and infertile men. The area under the curve for the percentage with normal morphologic features (0.66) was significantly greater than that for sperm concentration (0.60; P<0.001) and motility (0.59; P<0.001), whereas the areas under the curves for sperm concentration and motility were similar. The sensitivity and specificity of these measurements for identifying infertile men at various thresholds, including those defined by our CART analysis, are shown in Table 4. Lowering the threshold for indeterminate fertility reduces the sensitivity of the measures (the likelihood of correctly identifying infertile men) but increases their specificity (the likelihood of correctly identifying fertile men).
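The area under the receiver-operating-characteristic curve has a simple pairwise interpretation, sketched below in Python: it equals the probability that a randomly chosen infertile man has a lower value than a randomly chosen fertile man, with ties counted as one half. The function name and the example values are hypothetical, and this direct pairwise count is offered only as an illustration, not as the procedure used in the study.

```python
def auc_lower_is_positive(infertile_values, fertile_values):
    """AUC when lower values point toward infertility.

    Counts, over all infertile/fertile pairs, how often the infertile
    man's value is lower; tied pairs count as one half.
    """
    wins = 0.0
    for a in infertile_values:
        for b in fertile_values:
            if a < b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(infertile_values) * len(fertile_values))

# Hypothetical percentages of sperm with normal morphologic features.
infertile_values = [5, 10, 20, 40]
fertile_values = [15, 50, 70, 90]
print(auc_lower_is_positive(infertile_values, fertile_values))  # 0.875
```

An area of 0.5 corresponds to no discrimination and 1.0 to perfect separation, which puts the reported areas of 0.59 to 0.66 in context.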

 
Table 4. Sensitivity and Specificity of Sperm Measurements for Identifying Infertile Men at Various Thresholds.

 
Discussion

The results of this study confirm that measurements of sperm concentration, motility, and morphology all provide useful information for diagnosing male infertility. Sperm morphology, as measured according to strict criteria, appears to be the most informative semen measurement for discriminating between fertile and infertile men. However, none of the measures, alone or in combination, can be considered diagnostic of infertility.

Several different approaches have been used to identify standards for normal semen measurements. Some focus on infertile couples, comparing those who conceive with those who do not [15,16]. Others have compared fertile men with infertile men [17,18,19] or have followed couples after they discontinued the use of contraception [8,9].

Studies focusing on infertile couples undergoing treatment — i.e., those comparing couples who conceive with those who do not conceive [15,16] — are limited by the inclusion of infertile couples only; in order to define the fertile ranges of semen measurements, fertile men must also be evaluated. Other reports have involved follow-up of couples who have discontinued their use of contraception [8,9]. Although this approach has the advantage of prospectively defining couples as fertile and infertile, it requires large samples, since only 8 to 9 percent of couples are infertile [20]. Moreover, in an unknown proportion of infertile couples, the woman is infertile. Two recent studies that used this approach concluded that a reevaluation of the existing standards for normal semen was needed [8,9], but neither study derived new standards.

A comparison of semen measurements between fertile and infertile men, which was our approach, was used in the 1950s by MacLeod and Gold [17,21,22]. In these earlier studies, however, modern methods of semen evaluation were not used, and data were obtained from male partners in infertile couples regardless of the fertility status of the female partners. Nevertheless, the minimal standards for sperm concentration (20×10^6 per milliliter) and motility (40 percent motile cells) reported by MacLeod and Gold are near the values we derived. The standard for morphology (60 percent with normal morphologic features) cannot be compared with our results because a different scoring system was used.

In our study, sperm morphology was assessed by a single person with extensive training and substantial experience, and reliability was monitored on an ongoing basis. The application of our results in clinical laboratories would require the training of technicians and the implementation of tools for continuous calibration. The subfertile and fertile ranges for morphology (less than 9 percent and more than 12 percent with normal morphologic features, respectively) might appear to be so close that they would be hard to distinguish. With the system of training for technicians and the calibration methods we used, however, there were only 2 of the 65 quality-assurance slides for which the assessment of the percentage of normal sperm spanned the range from less than 9 percent to more than 12 percent in 14 readings during the course of the study. Although we assessed morphology according to strict criteria, our results do not necessarily imply that this method is superior to other approaches. Nevertheless, our results do provide a reference value for sperm morphology that is missing from the current WHO manual for semen evaluation [7].

Instead of a single value for each semen measurement that presumably distinguishes between "normal" and "abnormal," we estimated the best two values that allow for the delineation of three groups — fertile, indeterminate, and subfertile. We believe that this classification system is clinically meaningful [23] and is appropriate to what is, biologically, a continuous function.

Our data suggest that caution must be used in interpreting the significance of any given subfertile or indeterminate semen measurement. Although low values for each measurement increase the likelihood that a male factor contributes to infertility, there was substantial overlap in the frequency distributions in our study. Thus, values for sperm concentration, motility, or morphology that are in the subfertile range do not exclude the possibility of normal fertility.

To facilitate recruitment, we defined fertility as pregnancy within the previous two years rather than current pregnancy. Even though we attempted to match the fertile and infertile couples according to age and geographic location, there were demographic differences between the two groups. There were also differences between the two groups in the interval between the collection of the two semen specimens. In addition, despite normal results on fertility evaluation of the female partner, there may have been unrecognized subclinical female factors contributing to the infertility of the infertile couples. Finally, confirmation of the validity of these thresholds for semen measurements in an independent sample of fertile and infertile men is needed.

Notwithstanding these limitations, our data from a large group of couples with well-documented fertility or infertility provide clinical standards for semen measurements that may be useful for diagnosing male-factor infertility and for distinguishing between subfertile, indeterminate, and fertile ranges. These thresholds can be applied in clinical practice and research, provided that there is strict quality control.

Supported by cooperative agreements (U10 HD26975, U10 HD26981, U01 HD27006, U10 HD27009, U10 HD27001, U10 HD27049, U10 HD33172, and U10 HD33173) with the National Institute of Child Health and Human Development.

We are indebted to Thinus F. Kruger, M.D., for scoring the set of sperm-morphology slides that were used for laboratory quality control, and to Catherine Treece, C.L.A., for analyzing the slides.

* Other members of the National Cooperative Reproductive Medicine Network are listed in the Appendix.


Source Information

From the University of Rochester, Rochester, N.Y. (D.S.G.); the University of California, Davis (J.W.O., C.K.B., S.T.N.); Columbia University, New York (P.F.-L., D.X.); the University of Pennsylvania Medical Center, Philadelphia (C.C.); Baylor College of Medicine, Houston (S.A.C., P.C.); the University of Alabama, Birmingham (M.P.S.); Brigham and Women's Hospital, Boston (J.A.H.); and the National Institutes of Health, Bethesda, Md. (D.L.V.).

Address reprint requests to Dr. Guzick at the Department of Obstetrics and Gynecology, University of Rochester School of Medicine and Dentistry, Box 668, 601 Elmwood Ave., Rochester, NY 14642, or at david_guzick{at}urmc.rochester.edu.

References

  1. Rowe PH, Comhaire FH, Hargreave TB, Mellows HJ. WHO manual for the standardized investigation and diagnosis of the infertile couple. Cambridge, England: Cambridge University Press, 1993. 
  2. Investigation of the infertile couple. Birmingham, Ala.: American Fertility Society, 1994.
  3. Environmental Protection Agency. Guidelines for reproductive toxicity assessment. Fed Regist 1996;61:56274-56274. 
  4. World Health Organization. WHO laboratory manual for the examination of human semen and semen-cervical mucus interaction. Singapore: Press Concern, 1980.
  5. World Health Organization. WHO laboratory manual for the examination of human semen and semen-cervical mucus interaction. 2nd ed. Cambridge, England: Cambridge University Press, 1987.
  6. World Health Organization. WHO laboratory manual for the examination of human semen and sperm-cervical mucus interaction. 3rd ed. Cambridge, England: Cambridge University Press, 1992.
  7. World Health Organization. WHO laboratory manual for the examination of human semen and sperm-cervical mucus interaction. 4th ed. Cambridge, England: Cambridge University Press, 1999.
  8. Bonde JP, Ernst E, Jensen TK, et al. Relation between semen quality and fertility: a population-based study of 430 first-pregnancy planners. Lancet 1998;352:1172-1177.
  9. Zinaman MJ, Brown CC, Selevan SG, Clegg ED. Semen quality and human fertility: a prospective study with healthy couples. J Androl 2000;21:145-153.
  10. Guzick DS, Carson SA, Coutifaris C, et al. Efficacy of superovulation and intrauterine insemination in the treatment of infertility. N Engl J Med 1999;340:177-183.
  11. Overstreet JW, Brazil CK. Semen analysis. In: Lipshultz LI, Howards SS, eds. Infertility in the male. 3rd ed. St. Louis: Mosby–Year Book, 1997:487-90.
  12. Breiman L, Friedman JH, Olshen RA, Stone CJ. Classification and regression trees. Belmont, Calif.: Wadsworth, 1984.
  13. Metz CE. Basic principles of ROC analysis. Semin Nucl Med 1978;8:283-298.
  14. Hanley JA, McNeil BJ. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982;143:29-36.
  15. Dunphy BC, Neal LM, Cooke ID. The clinical value of conventional semen analysis. Fertil Steril 1989;51:324-329.
  16. Polansky FF, Lamb EJ. Do the results of semen analysis predict future fertility? A survival analysis study. Fertil Steril 1988;49:1059-1065.
  17. MacLeod J, Gold RZ. The male factor in fertility and infertility. II. Spermatozoon counts in 1000 men of known fertility and in 1000 cases of infertile marriage. J Urol 1951;66:436-449.
  18. Zukerman Z, Rodriguez-Rigau IJ, Smith KD, Steinberger E. Frequency distribution of sperm counts in fertile and infertile males. Fertil Steril 1977;28:1310-1313.
  19. David G, Jouannet P, Martin-Boyce A, Spira A, Schwartz D. Sperm counts in fertile and infertile men. Fertil Steril 1979;31:453-455.
  20. Stephen EH. Projections of impaired fecundity among women in the United States: 1995 to 2020. Fertil Steril 1996;66:205-209.
  21. MacLeod J, Gold RZ. The male factor in fertility and infertility. III. An analysis of motile activity in the spermatozoa of 1000 fertile men and 1000 men in infertile marriage. Fertil Steril 1951;2:187-204.
  22. MacLeod J, Gold RZ. The male factor in fertility and infertility. IV. Sperm morphology in fertile and infertile marriage. Fertil Steril 1951;2:394-414.
  23. Overstreet JW, Davis RO. Methods and interpretation of semen analysis. In: Key WR, Chang RJ, Rebar RW, Soules M, eds. Infertility: evaluation and treatment. New York: W.B. Saunders, 1995:580-91.
Appendix

In addition to the authors, other investigators in the National Cooperative Reproductive Medicine Network were as follows: Baylor College of Medicine, Houston — P. Casson, S. Lindsey; Brigham and Women's Hospital, Boston — K. Walsh, M. Rein; Columbia University, New York — R. Canfield, R. Coslit, P. Kringas, B. Levin, M.C. Paik, S. Schoenholtz; University of Alabama, Birmingham — R. Blackwell, E. Knochenhauer, K. Hammond, V. Willis; University of California, Davis — S. Boyers, J. Chang, R. Covell, K. Sweeney, L. Wisner; Kaiser Permanente, Santa Clara, Calif. — M. Colombo, J. D'Amico; University of Pennsylvania, Philadelphia — K. Timbers, J. Stansberry, L. Blasco, K. Walsh; University of Pittsburgh, Pittsburgh — J. Albert, S. Berga, M. Everson; University of Rochester, Rochester, N.Y. — G. Centola, W. Phipps, G. Santoriello; and the Data Safety and Monitoring Committee — J. Schreiber, S. Fowler, G. Colditz, T.L. Bush.


 


The New England Journal of Medicine is owned, published, and copyrighted © 2008 Massachusetts Medical Society. All rights reserved.