Cluster analysis of spontaneous preterm birth phenotypes identifies potential associations among preterm birth mechanisms




Objective


We sought to use an innovative tool that is based on common biologic pathways to identify specific phenotypes among women with spontaneous preterm birth (SPTB) to enhance investigators’ ability to identify and to highlight common mechanisms and underlying genetic factors that are responsible for SPTB.


Study Design


We performed a secondary analysis of a prospective case-control multicenter study of SPTB. All cases delivered a preterm singleton at ≤34.0 weeks’ gestation. Each woman was assessed for the presence of underlying SPTB causes. A hierarchic cluster analysis was used to identify groups of women with homogeneous phenotypic profiles. One of the phenotypic clusters was selected for candidate gene association analysis with the use of VEGAS software.


Results


One thousand twenty-eight women with SPTB were assigned phenotypes. Hierarchic clustering of the phenotypes revealed 5 major clusters. Cluster 1 (n = 445) was characterized by maternal stress; cluster 2 (n = 294) was characterized by premature membrane rupture; cluster 3 (n = 120) was characterized by familial factors, and cluster 4 (n = 63) was characterized by maternal comorbidities. Cluster 5 (n = 106) was multifactorial and characterized by infection (INF), decidual hemorrhage (DH), and placental dysfunction (PD). These 3 phenotypes were correlated highly by χ² analysis (PD and DH, P < 2.2e-6; PD and INF, P = 6.2e-10; INF and DH, P = .0036). Gene-based testing identified the INS (insulin) gene as significantly associated with cluster 3 of SPTB.


Conclusion


We identified 5 major clusters of SPTB based on a phenotype tool and hierarchic clustering. There was significant correlation between several of the phenotypes. The INS gene was associated with familial factors that were underlying SPTB.


Spontaneous preterm birth (SPTB) remains the leading cause of morbidity and death in nonanomalous newborn infants, yet our understanding of the causes of SPTB is limited. This is, in part, because SPTB is a multifactorial condition that likely results from specific interactions between the environment and genetic factors. Several lines of evidence support a genetic component to SPTB. First, racial disparities persist despite controlling for multiple risk factors. Second, there is a strong risk for recurrence of SPTB in women with a personal history of SPTB in a previous pregnancy. Third, a clear familial predisposition has been demonstrated. Finally, twin studies support the role of genetic risk factors in preterm birth by estimating the heritability at 20-40%.


Efforts to identify the genetic causes of SPTB have produced overall disappointing results. A recent large genome-wide association study of SPTB identified specific single nucleotide polymorphisms (SNPs) that were associated with SPTB, but these subsequently could not be validated. One attempt to summarize the genetic contribution to SPTB concluded that no robustly validated genetic variants that contribute to this complex disease process have been identified. This lack of success is likely due, at least in part, to inadequate phenotyping of SPTB cases, the heterogeneity of the disease process, differences among patient populations, or a combination of these factors.


The Genomic and Proteomic Network for Preterm Birth Research (GPN) was established by the Eunice Kennedy Shriver National Institute for Child Health and Human Development to study the genetic and environmental causes, with a goal to decipher mechanisms underlying SPTB. Accurate and precise phenotypes were needed to accomplish this goal. We previously created a unique phenotyping tool using clinical features that are present at the time of delivery to define 9 phenotypes that are suggestive of underlying causes of SPTB. We applied the phenotype tool to >1000 women with SPTB, were able to classify >95% of women into ≥1 phenotype categories, and demonstrated that most cases of SPTB have evidence of ≥2 phenotypes present and that phenotypes vary by gestational age at delivery and by race. Assigning a phenotype that suggests similar underlying etiology for SPTB among a group of women will likely result in an enhanced ability to identify genes or pathways that are associated with that phenotype.


We hypothesized that associations between SPTB phenotypes would highlight common mechanisms responsible for SPTB and thereby enhance our ability to identify the underlying genetic factors that are responsible for this complication. We further hypothesized that cluster analysis with the use of subcategories within phenotypes might identify subsets of women with a similar genetic risk for SPTB. We sought to test this by evaluating candidate genes that might be associated with SPTB among 1 of the subsets that we identified.


Materials and Methods


This is a secondary analysis of a multicenter, prospective cohort of women who were enrolled in the GPN case-control study.


Patient recruitment


Women with SPTB and matched uncomplicated term control subjects were prospectively recruited from November 2007 through January 2011 across 8 clinical sites that included the University of Utah/Intermountain Healthcare, University of Texas Medical Branch–Galveston, University of Alabama at Birmingham, Columbia University, Northwestern University, University of Texas–Houston, University of North Carolina–Chapel Hill, and Brown University. This study was approved by the Institutional Review Board at each center, and written informed consent was obtained from all participants.


Women were included in the study if they experienced a preterm birth of a singleton pregnancy between 20 0/7 and 33 6/7 weeks’ gestation after spontaneous labor. The inclusion criteria for the study have been published previously.


Women were excluded from the study if they were diagnosed with a stillbirth before presentation to labor and delivery or if they needed an indicated delivery for maternal or fetal complications. Women who experienced an intrapartum stillbirth or who had spontaneous labor in addition to maternal or fetal complications were not excluded.


A control group was also collected that consisted of women who experienced a singleton live birth after spontaneous labor at ≥39 weeks’ gestation. Control subjects were excluded if they had a history of a pregnancy that had been complicated by SPTB. Control subjects were used only for the analysis of candidate genes.


Data collection


Clinical and demographic data were collected for cases and control subjects by trained research nurses who used in-person interviews before hospital discharge whenever possible. All interviews and abstraction of medical records were performed within 14 days of delivery. Data that were collected included demographics; medical, social, family, and obstetric history; obstetric course, and complications during the current pregnancy. Patients also completed validated questionnaires to assess factors such as anxiety (Beck anxiety index), depression (Beck depression inventory), perceived stress (Perceived stress scale), and attitude of the subject and partner with respect to pregnancy.


Cluster analysis


A phenotyping tool was designed by the authors (M.S.E., T.A.M., M.W.V.), who grouped maternal social, demographic, family history, and obstetric factors into SPTB categories (Table 1). The clinical factors in each category were classified into levels of evidence as providing “strong,” “moderate,” or “possible” evidence of the phenotype. Cluster analysis was used to classify 1028 unique SPTB cases. The data included binary indicator variables for several SPTB-related phenotypes: infection/inflammation, maternal stress (hypothalamus-pituitary axis activation), decidual hemorrhage, uterine distension, cervical insufficiency, preterm premature rupture of membranes, placental dysfunction, maternal comorbidities, and familial phenotypes. There were 2 or 3 levels of evidence for each of the phenotypes.

Identification of 1 level of evidence for a specific phenotype did not exclude the other levels of evidence for the same phenotype. For example, 1 subject might have strong, moderate, and possible evidence for ≥1 phenotypes. It is possible that the true presence of a phenotype may be more likely in women who had >1 indicator of the phenotype. Thus, this information was used to calculate a “weighted” score for each phenotype. Three points were given for “strong” evidence, 2 points for “moderate” evidence, and 1 point for “possible” evidence for each phenotype. A subject with evidence from each of the categories of strong, moderate, and possible for a particular phenotype would receive 6 points for that phenotype. The maximum score any individual could receive for each phenotype was therefore 6 points. There was no limit to the number of phenotypes or levels of evidence that each woman could be assigned, provided she met the criteria.
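The weighted score described above can be illustrated with a minimal Python sketch. The point values (3, 2, 1) come from the text; the dictionary layout, the function names, and the example subject are hypothetical and are not taken from the study data.

```python
# Points per level of evidence, as described in the text.
POINTS = {"strong": 3, "moderate": 2, "possible": 1}


def weighted_score(levels_present):
    """Sum the points for every distinct level of evidence found for one phenotype."""
    return sum(POINTS[level] for level in set(levels_present))


def score_subject(evidence):
    """Score each phenotype for one woman.

    evidence maps a phenotype name to the levels of evidence present
    (a hypothetical data layout chosen for illustration).
    """
    return {phenotype: weighted_score(levels) for phenotype, levels in evidence.items()}


# Hypothetical subject: all 3 levels for infection, moderate only for familial.
subject = {
    "infection": ["strong", "moderate", "possible"],
    "familial": ["moderate"],
    "placental_dysfunction": [],
}
scores = score_subject(subject)
print(scores)  # {'infection': 6, 'familial': 2, 'placental_dysfunction': 0}
```

A phenotype with every level of evidence reaches the maximum of 6 points, and a phenotype with no evidence scores 0.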



Cluster analysis incorporated demographic variables (including maternal age, race, Hispanic ethnicity, educational attainment, marital status, and nulliparity), binary indicators for each level of phenotypic evidence, and the weighted score for each phenotype category. Using these variables as input, a sample dissimilarity matrix was generated using the “Daisy” method in the R “Cluster” package (R license: http://www.r-project.org/about.html; Cluster: http://cran.r-project.org/web/packages/cluster/index.html). χ² analysis was performed to evaluate the potential correlation among specific phenotypes. Figure 1 illustrates the clustering of each individual who was included in the analysis.
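To make the dissimilarity and clustering steps concrete, the following Python sketch computes a Gower-type dissimilarity for mixed numeric and categorical variables and then merges samples agglomeratively. It is an illustration under stated assumptions, not the authors' code: the study used the R function, the linkage rule is not stated in the text (average linkage is assumed here), binary indicators are treated as simple match/mismatch variables, and the 4 example records are invented.

```python
def gower_dissimilarity(rows, kinds):
    """Pairwise Gower-type dissimilarity for mixed data.

    rows  : list of equal-length records
    kinds : per-column type; "num" uses the range-scaled absolute difference,
            "cat" uses 0 for a match and 1 for a mismatch (assumption: binary
            indicators are handled the same way as categories).
    """
    n, p = len(rows), len(kinds)
    ranges = []
    for j in range(p):
        if kinds[j] == "num":
            col = [r[j] for r in rows]
            ranges.append((max(col) - min(col)) or 1.0)  # avoid division by zero
        else:
            ranges.append(None)
    d = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(a + 1, n):
            total = 0.0
            for j in range(p):
                if kinds[j] == "num":
                    total += abs(rows[a][j] - rows[b][j]) / ranges[j]
                else:
                    total += 0.0 if rows[a][j] == rows[b][j] else 1.0
            d[a][b] = d[b][a] = total / p
    return d


def average_linkage(d, k):
    """Repeatedly merge the 2 clusters with the smallest mean pairwise
    dissimilarity until k clusters remain (assumed linkage rule)."""
    clusters = [[i] for i in range(len(d))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                dist = sum(d[i][j] for i in clusters[a] for j in clusters[b])
                dist /= len(clusters[a]) * len(clusters[b])
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


# Invented records: (maternal age, married flag, familial score, infection score)
rows = [(24, 1, 5, 0), (26, 1, 6, 0), (31, 0, 0, 6), (33, 0, 0, 5)]
kinds = ["num", "cat", "num", "num"]
d = gower_dissimilarity(rows, kinds)
clusters = average_linkage(d, k=2)
print(clusters)  # [[0, 1], [2, 3]]
```

In this toy input the 2 women with high familial scores group together and the 2 women with high infection scores form the second cluster.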




Figure 1


Hierarchic clustering of 1028 women with spontaneous preterm birth

The samples were divided into 5 main clusters for further analysis, as indicated by the 5 different colors.

Esplin. Cluster analysis of SPTB phenotypes. Am J Obstet Gynecol 2015 .


Candidate gene analysis


Once cluster analysis was complete, 1 cluster (cluster 3) was selected to use for gene-based analysis. We chose this cluster because it contained women with a strong familial phenotype, and we thought it likely that they might have a genetic contribution to their SPTB. The women within the sample cluster were compared with 717 term control subjects.


All cluster cases and term control subjects had biologic samples collected at the time of their delivery, and DNA was subsequently extracted for all study subjects. Genotypes for 905,682 SNPs were generated with the Affymetrix SNP 6.0 (Affymetrix, Santa Clara, CA) genotyping array as previously described. For the present study, genotype data were downloaded from the Database of Genotypes and Phenotypes in binary PLINK format (http://pngu.mgh.harvard.edu/~purcell/plink/). The files contained genotypes for 1419 individual mothers, including 702 women with at least 1 SPTB and 717 women with no history of preterm birth. Quality assurance testing was done to identify an appropriate set of samples and SNPs for use in association testing. The samples were screened for sex discrepancies, sample duplications, and high Mendelian error rates. Principal components analysis was performed to ascertain population stratification within the data and to confirm the reported ancestry of individuals in the study. Identity-by-descent estimates were calculated to assess relationships between all pairs of samples. Samples were removed from the analysis if the mean pairwise identity-by-descent value, compared with all other samples, was >0.04. Samples with autosomal SNP call rates <0.95 were also excluded from further analysis. SNPs with call rates <0.95, minor allele frequency <0.005, or significant departure from Hardy-Weinberg equilibrium (P < 5 × 10⁻⁸ in non-Hispanic white control subjects) were removed from analysis. A total of 841,350 SNPs on chromosomes 1-22 and the X chromosome passed all criteria and were used for association testing.
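The SNP-level filters can be sketched in Python as follows. The 3 thresholds are those given in the text; everything else is an assumption for illustration. In particular, the Hardy-Weinberg check here is a 1-degree-of-freedom Pearson χ² test (the text does not say which test was used), the function name `snp_qc` is hypothetical, and the caller is expected to pass genotype counts from the appropriate group (the study restricted the Hardy-Weinberg test to non-Hispanic white control subjects).

```python
import math


def snp_qc(n_aa, n_ab, n_bb, n_missing,
           min_call_rate=0.95, min_maf=0.005, hwe_p_cutoff=5e-8):
    """Return (passes, stats) for one SNP given its genotype counts."""
    called = n_aa + n_ab + n_bb
    total = called + n_missing
    if called == 0:
        return False, {"call_rate": 0.0, "maf": 0.0, "hwe_p": float("nan")}

    call_rate = called / total
    freq_a = (2 * n_aa + n_ab) / (2 * called)
    maf = min(freq_a, 1 - freq_a)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    expected = (called * freq_a ** 2,
                2 * called * freq_a * (1 - freq_a),
                called * (1 - freq_a) ** 2)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    hwe_p = math.erfc(math.sqrt(chi2 / 2))

    passes = (call_rate >= min_call_rate
              and maf >= min_maf
              and hwe_p >= hwe_p_cutoff)
    return passes, {"call_rate": call_rate, "maf": maf, "hwe_p": hwe_p}


# Invented genotype counts for 3 SNPs.
ok, ok_stats = snp_qc(500, 400, 100, 0)          # well-behaved SNP
hwe_fail, _ = snp_qc(900, 0, 100, 0)             # no heterozygotes at all
call_fail, call_stats = snp_qc(990, 10, 0, 100)  # about 9% missing genotypes
print(ok, hwe_fail, call_fail)  # True False False
```

Sample-level filters (call rate, identity by descent, sex checks) would be applied separately and are not shown.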


Gene-based testing


Genotype-association tests were performed to compare SPTB cases in cluster 3 with the non-SPTB control subjects. All tests were performed with Golden Helix SNP and Variation Suite software (version 8.1; Golden Helix Institute, Bozeman, MT). Significance was tested with logistic regression that assumed an additive genetic model. Results were adjusted for 3 principal components to account for population structure and ethnic stratification within the data. The output of the SNP tests was processed with the VEGAS program ( http://gump.qimr.edu.au/VEGAS/ ) to generate gene-level association test results. The VEGAS program compiles the significance of all SNPs in or near each gene to determine the significance of the entire gene region with the use of a simulation procedure. The test for each gene includes all SNPs within 50 kb of the gene, thereby capturing most cis regulatory regions and other important features in the region of the gene. VEGAS combines the significance of individual SNPs using a linkage disequilibrium model to determine the expected correlation patterns within the gene. Several independent SNP associations within a gene thus may be combined to assess the overall significance of the gene region.
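The simulation idea behind a gene-level test of this kind can be sketched in Python. This is a simplified illustration, not the VEGAS implementation: it assumes the gene statistic is the sum of 1-degree-of-freedom χ² statistics derived from the SNP P values, that the null distribution is simulated from correlated standard normal variables whose correlation matrix is a user-supplied linkage disequilibrium matrix, and that the empirical P value uses an add-one correction. The function names and the example inputs are hypothetical.

```python
import math
import random
from statistics import NormalDist


def cholesky(m):
    """Lower-triangular Cholesky factor of a positive-definite matrix."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def gene_based_p(snp_pvalues, ld, n_sims=2000, seed=0):
    """Empirical gene-level P value from SNP P values and an LD correlation matrix."""
    nd = NormalDist()
    # Convert each two-sided SNP P value to a 1-df chi-square statistic (z squared).
    observed = sum(nd.inv_cdf(p / 2) ** 2 for p in snp_pvalues)
    low = cholesky(ld)
    rng = random.Random(seed)
    n = len(snp_pvalues)
    hits = 0
    for _ in range(n_sims):
        z = [rng.gauss(0, 1) for _ in range(n)]
        # Impose the LD correlation structure on the independent draws.
        correlated = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]
        if sum(c * c for c in correlated) >= observed:
            hits += 1
    return (hits + 1) / (n_sims + 1)


# Invented example: 2 SNPs in moderate LD (r = 0.5).
ld = [[1.0, 0.5], [0.5, 1.0]]
p_strong = gene_based_p([1e-6, 1e-5], ld)
p_weak = gene_based_p([0.8, 0.9], ld)
print(p_strong < p_weak)  # True
```

With a fixed number of simulations the smallest reportable P value is 1/(n_sims + 1), so a strict gene-level threshold requires correspondingly more simulations.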


Nine hundred sixty-six genes from previously identified inflammatory pathways were selected for evaluation in this candidate gene analysis. We chose to use genes from inflammatory pathways because inflammation is a common underlying mechanism for multiple causes of SPTB. Based on the number of tests that are required to evaluate 966 genes, a probability value of approximately 5e-5 was required to declare significance in the analysis.
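The significance threshold follows from a simple multiple-testing calculation. The text does not name the correction; the sketch below assumes a Bonferroni adjustment of a nominal α of 0.05 across 966 gene-level tests, which reproduces the stated value of approximately 5e-5.

```python
n_genes = 966   # number of candidate genes tested
alpha = 0.05    # assumed nominal significance level

threshold = alpha / n_genes
print(f"{threshold:.2e}")  # 5.18e-05
```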




Results


We applied the phenotyping tool to 1028 women with SPTB. Hierarchic clustering of the dissimilarity matrix using R revealed 5 major data clusters. Clusters can be visualized in Figure 2. The raw values for each of the variables that was assessed in each cluster can be found in Table 3. Cluster 1 (n = 445) is characterized by “maternal stress” (hypothalamus-pituitary axis activation); cluster 2 (n = 294) is characterized by premature membrane rupture; cluster 3 (n = 120) is characterized by familial factors, and cluster 4 (n = 63) is characterized by maternal comorbidities. Cluster 5 (n = 106) is multifactorial, characterized by infection, decidual hemorrhage, and placental dysfunction. Significant co-occurrence was observed between these 3 phenotypic categories. χ² analysis shows correlation between placental dysfunction and decidual hemorrhage (P < 2.2e-6), placental dysfunction and inflammation/infection (P = 6.2e-10), and between inflammation/infection and decidual hemorrhage (P = .0036).
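A test of co-occurrence between 2 binary phenotypes can be sketched in Python as a Pearson χ² test on a 2 × 2 table. The counts below are invented for illustration and are not the study data, and the sketch applies no continuity correction, which may differ from the settings the authors used.

```python
import math


def chi2_2x2(both, first_only, second_only, neither):
    """Pearson chi-square statistic and P value (1 df) for a 2 x 2 table."""
    a, b, c, d = both, first_only, second_only, neither
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper-tail probability of a chi-square variable with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts among 1028 women for 2 phenotypes that tend to co-occur.
chi2, p_value = chi2_2x2(both=40, first_only=60, second_only=60, neither=868)
print(round(chi2, 1))  # 115.6

# With no association the statistic is 0 and the P value is 1.
chi2_null, p_null = chi2_2x2(25, 25, 25, 25)
```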




Figure 2


Graphic representation of phenotype distributions for samples in each cluster

The coloring of each cell in the Figure indicates the proportion of samples in the cluster that are positive for the specified variable. For example, the mean of “maternal stress_any” for cluster 1 is 0.81, which indicates that 81% of the samples in the cluster are positive for some level of hypothalamus-pituitary axis activation. The color gradient is defined in the legend on the top edge of the Figure. The labels on the left side represent the different levels of evidence from each of the phenotype categories that were included in the final cluster analysis. The first word indicates the phenotype category and the number represents the level of evidence within that category (1 = strong; 2 = moderate; 3 = possible).

Dysf, dysfunction; Mar’d, married; Mat, maternal; Nev, never; PPROM, preterm premature rupture of membranes; Tog., together.



Table 2

Comparison of demographic information between cases from cluster 3 and control subjects

| Variable | Cases in cluster 3 | Control subjects | P value |
|---|---|---|---|
| n | 78 | 717 | |
| Mean age, y ± SD | 25.6 ± 5.5 | 25.5 ± 5.7 | .897 |
| White, n (%) | 66 (84.6) | 486 (67.9) | .003 |
| African American, n (%) | 8 (10.3) | 169 (23.6) | .011 |
| Hispanic (any race), n (%) | 10 (12.8) | 133 (18.6) | .273 |
| Education | | | |
| 13-16 y, n (%) | 41 (52.6) | 313 (43.7) | .166 |
| 9-12 y, n (%) | 34 (43.6) | 347 (48.5) | .492 |
| Married/living with partner, n (%) | 47 (60.3) | 400 (55.9) | .525 |
| Never married, living with partner, n (%) | 14 (17.9) | 118 (16.5) | .860 |
| Never married, not living with partner, n (%) | 12 (15.4) | 174 (24.3) | .105 |

There is a significant difference in races between the 2 groups. All gene-association tests are adjusted for principal components to account for racial differences.



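Table 3

The raw values for each of the variables that was assessed in each cluster

| Variable | Cluster 1, % | Cluster 2, % | Cluster 3, % | Cluster 4, % | Cluster 5, % |
|---|---|---|---|---|---|
| Married, live together | 40.2 | 48.6 | 62.5 | 54.0 | 78.3 |
| Never married, living together | 26.3 | 19.0 | 16.7 | 25.4 | 7.5 |
| Never married, not living together | 28.1 | 29.9 | 14.2 | 15.9 | 10.4 |
| Education | | | | | |
| 9-12 y | 54.8 | 53.1 | 40.8 | 50.8 | 34.0 |
| 13-16 y | 30.3 | 43.2 | 55.8 | 30.2 | 60.4 |
| Race | | | | | |
| Black | 19.1 | 44.9 | 9.2 | 9.5 | 0.9 |
| White | 68.8 | 49.3 | 85.0 | 74.6 | 92.5 |
| Hispanic (any race) | 31.7 | 5.8 | 11.7 | 46.0 | 9.4 |
| Nulliparity | 45.4 | 51.0 | 49.2 | 36.5 | 44.3 |
| Infection | | | | | |
| 1 | 3.8 | 2.0 | 6.7 | 0.0 | 50.9 |
| 2 | 6.3 | 5.8 | 10.0 | 9.5 | 32.1 |
| 3 | 27.2 | 25.5 | 25.0 | 20.6 | 30.2 |
| Any | 33.3 | 31.0 | 38.3 | 28.6 | 78.3 |
| Maternal stress | | | | | |
| 2 | 44.7 | 5.1 | 47.5 | 34.9 | 32.1 |
| 3 | 63.1 | 25.9 | 25.0 | 42.9 | 15.1 |
| Any | 80.7 | 29.9 | 49.2 | 65.1 | 41.5 |
| Hemorrhage | | | | | |
| 1 | 0.4 | 0.0 | 0.0 | 0.0 | 0.9 |
| 2 | 12.8 | 6.1 | 11.7 | 6.3 | 53.8 |
| 3 | 20.0 | 11.2 | 29.2 | 17.5 | 32.1 |
| Any | 30.3 | 15.3 | 37.5 | 23.8 | 71.7 |
| Distension | | | | | |
| 2 | 26.1 | 3.4 | 16.7 | 22.2 | 14.2 |
| 3 | 1.6 | 4.4 | 5.8 | 4.8 | 13.2 |
| Any | 26.7 | 7.8 | 22.5 | 25.4 | 26.4 |
| Cervical | | | | | |
| 1 | 4.0 | 10.2 | 7.5 | 4.8 | 6.6 |
| 2 | 2.9 | 7.1 | 6.7 | 9.5 | 2.8 |
| 3 | 1.6 | 1.4 | 3.3 | 3.2 | 4.7 |
| Any | 7.4 | 16.0 | 15.0 | 14.3 | 10.4 |
| Placental dysfunction | | | | | |
| 1 | 0.7 | 0.3 | 4.2 | 3.2 | 23.6 |
| 2 | 2.5 | 1.4 | 7.5 | 4.8 | 42.5 |
| 3 | 3.1 | 0.7 | 3.3 | 0.0 | 33.0 |
| Any | 4.9 | 2.0 | 9.2 | 4.8 | 61.3 |
| Maternal comorbidity | | | | | |
| 1 | 1.1 | 7.1 | 4.2 | 81.0 | 5.7 |
| 2 | 10.3 | 13.9 | 21.7 | 76.2 | 20.8 |
| Any | 10.6 | 18.4 | 23.3 | 100.0 | 24.5 |
| Familial | | | | | |
| 1 | 13.9 | 21.4 | 60.8 | 17.5 | 17.0 |
| 2 | 10.8 | 13.9 | 35.8 | 4.8 | 7.5 |
| Any | 23.4 | 29.9 | 84.2 | 22.2 | 22.6 |
| Preterm premature rupture of membranes | | | | | |
| 1 | 12.1 | 34.4 | 8.3 | 30.2 | 25.5 |
| 2 | 13.3 | 19.0 | 12.5 | 7.9 | 6.6 |
| 3 | 3.6 | 6.8 | 3.3 | 3.2 | 6.6 |
| Any | 26.5 | 53.7 | 22.5 | 38.1 | 34.0 |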


