Objective
We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB), with the aim of enhancing investigators’ ability to identify common mechanisms and underlying genetic factors that are responsible for SPTB.
Study Design
We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases were singleton SPTBs at ≤34.0 weeks’ gestation. Each woman was assessed for the presence of underlying SPTB causes. Hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software.
Results
One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB.
Conclusion
We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB.
Spontaneous preterm birth (SPTB) remains the leading cause of morbidity and death in nonanomalous newborn infants, yet our understanding of the causes of SPTB is limited. This is, in part, because SPTB is a multifactorial condition that likely results from specific interactions between the environment and genetic factors. A genetic component to SPTB is suggested by racial disparities that persist despite controlling for multiple risk factors. In addition, women with a personal history of SPTB in a previous pregnancy have a strong risk of recurrence, and a clear familial predisposition has been demonstrated. Finally, twin studies support the role of genetic risk factors in preterm birth by estimating the heritability at 20-40%.
Efforts to identify the genetic causes of SPTB have produced disappointing results overall. A recent large genome-wide association study of SPTB identified specific single nucleotide polymorphisms (SNPs) that were associated with SPTB, but these associations subsequently could not be validated. One attempt to summarize the genetic contribution to SPTB concluded that no robustly validated genetic variants that contribute to this complex disease process have been identified. This lack of success is likely due, at least in part, to inadequate phenotyping of SPTB cases, the heterogeneity of the disease process, differences among patient populations, or a combination of these factors.
The Genomic and Proteomic Network for Preterm Birth Research (GPN) was established by the Eunice Kennedy Shriver National Institute of Child Health and Human Development to study the genetic and environmental causes of SPTB, with the goal of deciphering its underlying mechanisms. Accurate and precise phenotypes were needed to accomplish this goal. We previously created a unique phenotyping tool that uses clinical features that are present at the time of delivery to define 9 phenotypes that are suggestive of underlying causes of SPTB. We applied the phenotype tool to >1000 women with SPTB, were able to classify >95% of women into ≥1 phenotype categories, and demonstrated that most cases of SPTB have evidence of ≥2 phenotypes and that phenotypes vary by gestational age at delivery and by race. Assigning a phenotype that suggests a similar underlying etiology for SPTB among a group of women will likely enhance the ability to identify genes or pathways that are associated with that phenotype.
We hypothesized that associations between SPTB phenotypes would highlight common mechanisms responsible for SPTB and thereby enhance our ability to identify the underlying genetic factors that are responsible for this complication. We further hypothesized that cluster analysis with the use of subcategories within phenotypes might identify subsets of women with a similar genetic risk for SPTB. We sought to test this by evaluating candidate genes that might be associated with SPTB within 1 of the subsets that we identified.
Materials and Methods
This is a secondary analysis of a multicenter, prospective cohort of women who were enrolled in the GPN case-control study.
Patient recruitment
Women with SPTB and matched uncomplicated term control subjects were prospectively recruited from November 2007 through January 2011 across 8 clinical sites: the University of Utah/Intermountain Healthcare, University of Texas Medical Branch–Galveston, University of Alabama at Birmingham, Columbia University, Northwestern University, University of Texas–Houston, University of North Carolina–Chapel Hill, and Brown University. This study was approved by the Institutional Review Board at each center, and written informed consent was obtained from all participants.
Women were included in the study if they experienced a preterm birth of a singleton pregnancy between 20 0/7 and 33 6/7 weeks’ gestation after spontaneous labor. The inclusion criteria for the study have been published previously.
Women were excluded from the study if they were diagnosed with a stillbirth before presentation to labor and delivery or if they needed an indicated delivery for maternal or fetal complications. Women who experienced an intrapartum stillbirth or who had spontaneous labor in addition to maternal or fetal complications were not excluded.
A control group was also recruited that consisted of women who experienced a singleton live birth after spontaneous labor at ≥39 weeks’ gestation. Control subjects were excluded if they had a history of a pregnancy that had been complicated by SPTB. Control subjects were used only for the analysis of candidate genes.
Data collection
Clinical and demographic data were collected for cases and control subjects by trained research nurses who used in-person interviews before hospital discharge whenever possible. All interviews and abstraction of medical records were performed within 14 days of delivery. Data that were collected included demographics; medical, social, family, and obstetric history; obstetric course; and complications during the current pregnancy. Patients also completed validated questionnaires to assess factors such as anxiety (Beck Anxiety Inventory), depression (Beck Depression Inventory), perceived stress (Perceived Stress Scale), and attitude of the subject and partner with respect to pregnancy.
Cluster analysis
A phenotyping tool was designed by the authors (M.S.E., T.A.M., M.W.V.), who grouped maternal social, demographic, family history, and obstetric factors into SPTB categories (Table 1). Clinical factors within each category were classified by level of evidence as providing “strong,” “moderate,” or “possible” evidence of the phenotype. Cluster analysis was used to classify 1028 unique SPTB cases. The data included binary indicator variables for 9 SPTB-related phenotypes: infection/inflammation, maternal stress (hypothalamus-pituitary axis activation), decidual hemorrhage, uterine distension, cervical insufficiency, preterm premature rupture of membranes, placental dysfunction, maternal comorbidities, and familial factors. There were 2 or 3 levels of evidence for each phenotype. The levels of evidence for a given phenotype were not mutually exclusive; for example, 1 subject might have strong, moderate, and possible evidence for ≥1 phenotypes. Because the true presence of a phenotype may be more likely in women who had >1 indicator of that phenotype, this information was used to calculate a “weighted” score for each phenotype. Three points were given for “strong” evidence, 2 points for “moderate” evidence, and 1 point for “possible” evidence. A subject with strong, moderate, and possible evidence for a particular phenotype would therefore receive 6 points for that phenotype, which was the maximum score for any single phenotype. There was no limit to the number of phenotypes or levels of evidence that each woman could be assigned, provided she met the criteria.
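The weighted score described above can be sketched in a few lines of Python. This is only an illustration of the 3/2/1 point scheme; the function name, the dictionary layout, and the example subject are assumptions and are not part of the study's phenotyping tool.

```python
# Points per level of evidence, as stated in the text.
POINTS = {"strong": 3, "moderate": 2, "possible": 1}

def weighted_score(levels_present):
    """Sum the points for every distinct level of evidence present for one phenotype."""
    return sum(POINTS[level] for level in set(levels_present))

# Hypothetical subject: levels of evidence recorded for three phenotypes.
subject = {
    "infection": ["strong", "moderate", "possible"],
    "maternal_stress": ["moderate"],
    "familial": [],
}

scores = {phenotype: weighted_score(levels) for phenotype, levels in subject.items()}
print(scores)
```

A phenotype with all 3 levels of evidence reaches the 6-point maximum, and a phenotype with no evidence scores 0.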
Cluster analysis incorporated demographic variables (maternal age, race, Hispanic ethnicity, educational attainment, marital status, and nulliparity), binary indicators for each level of phenotypic evidence, and the weighted score for each phenotype category. With these variables as input, a sample dissimilarity matrix was generated with the “daisy” function in the R “cluster” package (R license: http://www.r-project.org/about.html ; cluster: http://cran.r-project.org/web/packages/cluster/index.html ). χ² analysis was performed to evaluate the potential correlation among specific phenotypes. Figure 1 illustrates the clustering of each individual who was included in the analysis.
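To make the dissimilarity step concrete, the following Python sketch computes a Gower-type dissimilarity (the coefficient that daisy applies to mixed numeric and categorical data) and then groups subjects by naive agglomerative clustering. The linkage method is not stated in the text, so average linkage is an assumption here, as are the variable names and the 4 example rows; this is not the study's R code.

```python
def gower(a, b, kinds, ranges):
    """Gower dissimilarity: mean of per-variable distances, each scaled to [0, 1]."""
    total = 0.0
    for x, y, kind, rng in zip(a, b, kinds, ranges):
        if kind == "num":
            total += abs(x - y) / rng if rng else 0.0
        else:  # categorical or binary variable: simple mismatch
            total += 0.0 if x == y else 1.0
    return total / len(kinds)

def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a full dissimilarity matrix (assumed linkage)."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = sum(dist[p][q] for p in clusters[i] for q in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] += clusters.pop(j)  # merge the closest pair of clusters
    return clusters

# Hypothetical rows: (maternal age, race, nulliparous, weighted infection score).
rows = [
    (24, "white", 1, 6),
    (26, "white", 1, 5),
    (35, "black", 0, 0),
    (33, "black", 0, 1),
]
kinds = ["num", "cat", "cat", "num"]
ranges = [
    max(r[k] for r in rows) - min(r[k] for r in rows) if kinds[k] == "num" else None
    for k in range(len(kinds))
]

dist = [[gower(a, b, kinds, ranges) for b in rows] for a in rows]
clusters = average_linkage(dist, n_clusters=2)
print(sorted(sorted(c) for c in clusters))
```

Scaling each numeric variable by its range keeps age and the weighted scores on the same footing as the binary indicators, which is why a Gower-type coefficient suits this mix of inputs.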
Candidate gene analysis
Once cluster analysis was complete, 1 cluster (cluster 3) was selected to use for gene-based analysis. We chose this cluster because it contained women with a strong familial phenotype, and we thought it likely that they might have a genetic contribution to their SPTB. The women within the sample cluster were compared with 717 term control subjects.
All cluster cases and term control subjects had biologic samples collected at the time of their delivery, and DNA was subsequently extracted for all study subjects. Genotypes for 905,682 SNPs were generated with the Affymetrix SNP 6.0 (Affymetrix, Santa Clara, CA) genotyping array as previously described. For the present study, genotype data were downloaded from the Database of Genotypes and Phenotypes in binary PLINK format ( http://pngu.mgh.harvard.edu/~purcell/plink/ ). The files contained genotypes for 1419 individual mothers, including 702 women with at least 1 SPTB and 717 women with no history of preterm birth. Quality assurance testing was done to identify an appropriate set of samples and SNPs for use in association testing. The samples were screened for sex discrepancies, sample duplications, and high Mendelian error rates. Principal components analysis was performed to ascertain population stratification within the data and to confirm the reported ancestry of individuals in the study. Identity-by-descent estimates were calculated to assess relationships between all pairs of samples. Samples were removed from the analysis if the mean pairwise identity-by-descent value, compared with all other samples, was >0.04. Samples with autosomal SNP call rates <0.95 were also excluded from further analysis. SNPs with call rates <0.95, minor allele frequency <0.005, or significant departure from Hardy-Weinberg equilibrium (P < 5 × 10⁻⁸ in non-Hispanic white control subjects) were removed from analysis. A total of 841,350 SNPs on chromosomes 1-22 and the X chromosome passed all criteria and were used for association testing.
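The SNP-level filters above (call rate, minor allele frequency, and Hardy-Weinberg equilibrium) can be sketched as follows. The thresholds come from the text; the function names, the use of a 1-degree-of-freedom χ² test for Hardy-Weinberg equilibrium, and the genotype counts are assumptions for illustration. In the study the Hardy-Weinberg test was restricted to non-Hispanic white control subjects, so the counts passed in would come from that subgroup.

```python
import math

def hwe_p_value(n_aa, n_ab, n_bb):
    """Chi-square (1 df) test of Hardy-Weinberg equilibrium from genotype counts."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Survival function of a chi-square variable with 1 degree of freedom.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(n_aa, n_ab, n_bb, n_missing):
    """Apply the call-rate, minor-allele-frequency, and HWE thresholds from the text."""
    called = n_aa + n_ab + n_bb
    if called == 0:
        return False
    call_rate = called / (called + n_missing)
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)
    return (
        call_rate >= 0.95
        and maf >= 0.005
        and hwe_p_value(n_aa, n_ab, n_bb) >= 5e-8
    )

# Hypothetical genotype counts (AA, AB, BB, missing).
print(passes_qc(360, 480, 160, 0))    # genotypes in Hardy-Weinberg proportions
print(passes_qc(360, 480, 160, 100))  # too many missing calls
print(passes_qc(500, 0, 500, 0))      # no heterozygotes at all
```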
Gene-based testing
Genotype-association tests were performed to compare SPTB cases in cluster 3 with the non-SPTB control subjects. All tests were performed with Golden Helix SNP and Variation Suite software (version 8.1; Golden Helix Institute, Bozeman, MT). Significance was tested with logistic regression that assumed an additive genetic model. Results were adjusted for 3 principal components to account for population structure and ethnic stratification within the data. The output of the SNP tests was processed with the VEGAS program ( http://gump.qimr.edu.au/VEGAS/ ) to generate gene-level association test results. The VEGAS program compiles the significance of all SNPs in or near each gene to determine the significance of the entire gene region with the use of a simulation procedure. The test for each gene includes all SNPs within 50 kb of the gene, thereby capturing most cis regulatory regions and other important features in the region of the gene. VEGAS combines the significance of individual SNPs using a linkage disequilibrium model to determine the expected correlation patterns within the gene. Several independent SNP associations within a gene thus may be combined to assess the overall significance of the gene region.
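The simulation idea behind a gene-based test of this kind can be illustrated with a small Python sketch: the per-SNP χ² statistics in a gene are summed, and the sum is compared with sums simulated from correlated normal variables whose correlation follows the linkage disequilibrium (LD) matrix. This is a simplified illustration under stated assumptions, not the VEGAS implementation; the function names, the LD matrix, the SNP statistics, and the add-one empirical P value convention are all hypothetical choices.

```python
import math
import random

def cholesky(matrix):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def gene_p_value(snp_chi2, ld, n_sim=10000, seed=0):
    """Empirical P value for the summed chi-square statistic of one gene."""
    rng = random.Random(seed)
    lower = cholesky(ld)
    observed = sum(snp_chi2)
    n = len(snp_chi2)
    hits = 0
    for _ in range(n_sim):
        z = [rng.gauss(0.0, 1.0) for _ in range(n)]
        # Correlated normals with covariance equal to the LD matrix.
        corr = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if sum(v * v for v in corr) >= observed:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one convention (an assumption)

# Hypothetical 3-SNP gene: pairwise LD correlations and per-SNP chi-square statistics.
ld = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.2],
    [0.1, 0.2, 1.0],
]
p_signal = gene_p_value([9.0, 8.0, 0.5], ld)
p_null = gene_p_value([0.1, 0.1, 0.1], ld)
print(p_signal, p_null)
```

Because the simulated statistics share the LD structure of the observed SNPs, two strongly correlated SNPs contribute less evidence than two independent ones would.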
Nine hundred sixty-six genes from previously identified inflammatory pathways were selected for evaluation in this candidate gene analysis. We chose to use genes from inflammatory pathways because inflammation is a common underlying mechanism for multiple causes of SPTB. Based on the number of tests that are required to evaluate 966 genes, a probability value of approximately 5e-5 was required to declare significance in the analysis.
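The significance threshold quoted above is consistent with a Bonferroni-style correction, that is, dividing a family-wise error rate by the number of genes tested. The text does not name the correction method or the error rate, so both the method and the 0.05 level in this short check are assumptions.

```python
alpha = 0.05   # assumed family-wise error rate
n_genes = 966  # number of candidate genes, from the text

threshold = alpha / n_genes
print(f"{threshold:.2e}")
```

The result is close to 5.2e-5, in line with the approximate value of 5e-5 given in the text.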
Results
We applied the phenotyping tool to 1028 women with SPTB. Hierarchic clustering of the dissimilarity matrix in R revealed 5 major data clusters, which can be visualized in Figure 2. The raw values for each of the variables that were assessed in each cluster can be found in Table 3. Cluster 1 (n = 445) is characterized by “maternal stress” (hypothalamus-pituitary axis activation); cluster 2 (n = 294) is characterized by premature membrane rupture; cluster 3 (n = 120) is characterized by familial factors, and cluster 4 (n = 63) is characterized by maternal comorbidities. Cluster 5 (n = 106) is multifactorial and is characterized by infection, decidual hemorrhage, and placental dysfunction. Significant co-occurrence was observed among these 3 phenotypic categories. χ² analysis shows correlation between placental dysfunction and decidual hemorrhage (P < 2.2e-6), between placental dysfunction and inflammation/infection (P = 6.2e-10), and between inflammation/infection and decidual hemorrhage (P = .0036).
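A test of co-occurrence between 2 binary phenotypes can be sketched as a Pearson χ² test on a 2 × 2 table. The counts below are hypothetical and are not study data, and the text does not state whether a continuity correction was applied, so the uncorrected statistic used here is an assumption.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    p = math.erfc(math.sqrt(chi2 / 2))  # chi-square survival function, 1 df
    return chi2, p

# Hypothetical counts: rows = phenotype A present/absent,
# columns = phenotype B present/absent.
chi2, p = chi2_2x2(40, 60, 20, 180)
print(chi2, p)
```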
Variable | Cases in cluster 3 | Control subjects | P value |
---|---|---|---|
n | 78 | 717 | |
Mean age, y ± SD | 25.6 ± 5.5 | 25.5 ± 5.7 | .897 |
White, n (%) | 66 (84.6) | 486 (67.9) | .003 |
African American, n (%) | 8 (10.3) | 169 (23.6) | .011 |
Hispanic (any race), n (%) | 10 (12.8) | 133 (18.6) | .273 |
Education | |||
13-16 y, n (%) | 41 (52.6) | 313 (43.7) | .166 |
9-12 y, n (%) | 34 (43.6) | 347 (48.5) | .492 |
Married/living with partner, n (%) | 47 (60.3) | 400 (55.9) | .525 |
Never married, living with partner, n (%) | 14 (17.9) | 118 (16.5) | .860 |
Never married, not living with partner, n (%) | 12 (15.4) | 174 (24.3) | .105 |
Variable | Cluster 1, % | Cluster 2, % | Cluster 3, % | Cluster 4, % | Cluster 5, % |
---|---|---|---|---|---|
Married, live together | 40.2 | 48.6 | 62.5 | 54.0 | 78.3 |
Never married, living together | 26.3 | 19.0 | 16.7 | 25.4 | 7.5 |
Never married, not living together | 28.1 | 29.9 | 14.2 | 15.9 | 10.4 |
Education | |||||
9-12 y | 54.8 | 53.1 | 40.8 | 50.8 | 34.0 |
13-16 y | 30.3 | 43.2 | 55.8 | 30.2 | 60.4 |
Race | |||||
Black | 19.1 | 44.9 | 9.2 | 9.5 | 0.9 |
White | 68.8 | 49.3 | 85.0 | 74.6 | 92.5 |
Hispanic (any race) | 31.7 | 5.8 | 11.7 | 46.0 | 9.4 |
Nulliparity | 45.4 | 51.0 | 49.2 | 36.5 | 44.3 |
Infection | |||||
1 | 3.8 | 2.0 | 6.7 | 0.0 | 50.9 |
2 | 6.3 | 5.8 | 10.0 | 9.5 | 32.1 |
3 | 27.2 | 25.5 | 25.0 | 20.6 | 30.2 |
Any | 33.3 | 31.0 | 38.3 | 28.6 | 78.3 |
Maternal stress | |||||
2 | 44.7 | 5.1 | 47.5 | 34.9 | 32.1 |
3 | 63.1 | 25.9 | 25.0 | 42.9 | 15.1 |
Any | 80.7 | 29.9 | 49.2 | 65.1 | 41.5 |
Hemorrhage | |||||
1 | 0.4 | 0.0 | 0.0 | 0.0 | 0.9 |
2 | 12.8 | 6.1 | 11.7 | 6.3 | 53.8 |
3 | 20.0 | 11.2 | 29.2 | 17.5 | 32.1 |
Any | 30.3 | 15.3 | 37.5 | 23.8 | 71.7 |
Distension | |||||
2 | 26.1 | 3.4 | 16.7 | 22.2 | 14.2 |
3 | 1.6 | 4.4 | 5.8 | 4.8 | 13.2 |
Any | 26.7 | 7.8 | 22.5 | 25.4 | 26.4 |
Cervical | |||||
1 | 4.0 | 10.2 | 7.5 | 4.8 | 6.6 |
2 | 2.9 | 7.1 | 6.7 | 9.5 | 2.8 |
3 | 1.6 | 1.4 | 3.3 | 3.2 | 4.7 |
Any | 7.4 | 16.0 | 15.0 | 14.3 | 10.4 |
Placental dysfunction | |||||
1 | 0.7 | 0.3 | 4.2 | 3.2 | 23.6 |
2 | 2.5 | 1.4 | 7.5 | 4.8 | 42.5 |
3 | 3.1 | 0.7 | 3.3 | 0.0 | 33.0 |
Any | 4.9 | 2.0 | 9.2 | 4.8 | 61.3 |
Maternal comorbidity | |||||
1 | 1.1 | 7.1 | 4.2 | 81.0 | 5.7 |
2 | 10.3 | 13.9 | 21.7 | 76.2 | 20.8 |
Any | 10.6 | 18.4 | 23.3 | 100.0 | 24.5 |
Familial | |||||
1 | 13.9 | 21.4 | 60.8 | 17.5 | 17.0 |
2 | 10.8 | 13.9 | 35.8 | 4.8 | 7.5 |
Any | 23.4 | 29.9 | 84.2 | 22.2 | 22.6 |
Preterm premature rupture of membranes | |||||
1 | 12.1 | 34.4 | 8.3 | 30.2 | 25.5 |
2 | 13.3 | 19.0 | 12.5 | 7.9 | 6.6 |
3 | 3.6 | 6.8 | 3.3 | 3.2 | 6.6 |
Any | 26.5 | 53.7 | 22.5 | 38.1 | 34.0 |