Objective
Mitochondrial DNA (mtDNA) encodes proteins of the electron transfer chain that produce adenosine triphosphate through oxidative phosphorylation, and is essential to sustain life. mtDNA differs from the nuclear genome in that it is solely maternally inherited (non-mendelian patterning) and shows a relatively high rate of mutation due to the absence of error-checking capacity. While it is generally assumed that most new mutations accumulate through the process of heteroplasmy, it is unknown whether mutations initiated in the mother are inherited, occur in utero, or occur and accumulate early in life. The purpose of this study is to examine the maternally heritable and de novo mutation rate in the fetal mtDNA through high-fidelity sequencing from a large population-based cohort.
Study Design
Samples were obtained from 90 matched maternal (blood) and fetal (placental) pairs. In addition, a smaller cohort (n = 5) of maternal (blood), fetal (placental), and neonatal (cord blood) trios were subjected to DNA extraction and shotgun sequencing. The whole genome was sequenced on the Illumina HiSeq platform (Illumina Inc., San Diego, CA), and haplogroups and mtDNA variants were identified through mapping to the reference mitochondrial genome (NC_012920).
Results
We observed 665 single nucleotide polymorphisms and 82 insertion-deletion variants in the cohort at large. We achieved high sequencing depth of the mtDNA, averaging 65X coverage (range, 20–171X). The proportions of haplogroups identified in the cohort are consistent with the patients’ self-identified ethnicity (>90% Hispanic), and all maternal-fetal pairs mapped to the identical haplogroup. Only variants from samples with average depth >20X and allele frequency >1% were included for further analysis. While the majority of the maternal-fetal pairs (>90%) demonstrated identical variants at the single nucleotide level, we observed rare mitochondrial single nucleotide polymorphism discordance between maternal and fetal mitochondrial genomes.
Conclusion
In this first in-depth sequencing analysis of mtDNA from maternal-fetal pairs at the time of birth, a low rate of de novo mutations appears in the fetal mitochondrial genome. This implies that these mutations likely arise from the maternal heteroplasmic pool (eg, in the oocyte), and accumulate later in the offspring’s life. These findings have key implications for both the occurrence of and screening for mitochondrial disorders.
The maternally transmitted mitochondria are the most abundant organelles in the human oocyte and early embryo, and serve as the primary generator of cellular energy and orchestrator of cellular metabolism via oxidative adenosine triphosphate production. Mitochondrial integrity is crucial not only to oxidative phosphorylation and generation of adenosine triphosphate, but also to lipid and amino acid metabolism, cell proliferation, differentiation, and apoptosis. Mitochondria harbor their own 16.5-kilobase circular, double-stranded DNA genomes (mitochondrial DNA [mtDNA]), which contain a mere 37 genes encoding 13 polypeptides for oxidative phosphorylation, as well as 22 transfer RNAs and 2 ribosomal RNAs necessary for mitochondrial protein synthesis. Despite its diminutive size, mtDNA is of major evolutionary and functional importance as it is a legacy of the endosymbiosis from prokaryotes that likely created the eukaryotic cell >1.5 billion years ago. Although the vast majority of proteins involved in mitochondrial function are encoded by the nuclear genome, the 13 mtDNA-encoded proteins are crucial for human life. Underscoring its role in human health, >200 mtDNA point mutations have been associated with a variety of human diseases ( http://www.mitomap.org ), including diabetes, cancer, and neurodegenerative disorders. With respect to perinatal disorders, there are initial studies suggesting a role for nonsynonymous (protein coding) mtDNA mutations in preeclampsia, oocyte wastage and early embryonic demise, and preterm birth. We have recently demonstrated that mtDNA variation of the human host significantly influences the structure and community members of the vaginal and gut human microbiome. Despite its evident importance in both cellular function and human disease, genomic variation studies of mtDNA have been relatively underrepresented when compared to nuclear DNA analyses.
The mitochondrial genome is inherited in a non-mendelian manner, vertically transmitted solely from the mitochondria of the oocyte. This so-called “maternal inheritance” pattern, whereby a mother carrying a mtDNA mutation will pass it on to all of her children regardless of sex, but only her daughters will transmit it to their progeny, has been the clinical hallmark of mitochondrial disorders. Since each cell contains varying numbers of mtDNA copies, some mtDNA genomes in any given cell or tissue are normal, while others contain mutations (defined as heteroplasmy, or intraindividual variation in mtDNA sequences). Homoplasmic pathogenic mtDNA mutations, in which nearly 100% of mtDNA genomes in a given cell, tissue, or organism carry the variant, occur rarely and are thought to result in fetal or neonatal lethality or organ-specific mitochondrial disease (eg, Leber hereditary optic neuropathy). Conversely, low-level (1-10%) heteroplasmy is believed to be of little clinical significance and may actually represent a reservoir of genetic mtDNA variants that can increase the functional capacity in response to aging and environmental stressors. The degree of mtDNA mutation has been reported to vary between generations, although the degree to which bottlenecking during oogenesis and accumulated lifetime mutation rates contribute to generational differences in the mitochondrial genomic sequence remains unclear. While a recent study by Rebolledo-Jaramillo et al examined heteroplasmy in 39 maternal-child pairs and found disease-associated mutations in 1 in 8 subjects, an accurate estimate of the mtDNA mutation rate at the time of birth from a healthy and nondiseased population is still lacking.
In this study, we investigated the maternally inheritable and de novo mutation frequency in fetal mtDNA from a relatively large population-based cohort composed of 90 maternal (blood)-fetal (placental) pairs as detected by NextGen sequencing technology (Illumina HiSeq platform; Illumina Inc., San Diego, CA). Because it was a formal possibility that neonatal cord blood and fetal (placental) detection of mutations may differ, we additionally sequenced 5 trios composed of maternal (blood), fetal (placental), and neonatal (cord blood) mitochondrial genomes. With the derived high-fidelity and high-depth sequence data we determined: (1) 665 single nucleotide polymorphisms (SNP) and 82 insertion-deletion (indel) variants in the cohort at large; (2) the rate of haplogroup, SNP, and indel variation between maternal-fetal pairs; (3) the rate of mitochondrial SNP (mtSNP) discordance between maternal and fetal mitochondrial genomes; and (4) an estimate of the mtDNA de novo mutation rate at the time of birth. These findings have the potential not only to inform human developmental and perinatal genomics, but also to underscore the rarity with which de novo mitochondrial disease would be anticipated in the newborn, and thus provide subsequent insights into mtDNA disease pathogenesis.
Materials and Methods
Study population
The index study was a prospective, observational single-center longitudinal cohort study of term and preterm births conducted in the Harris County Hospital system (August 2011 through July 2014). The Institutional Review Board for Baylor College of Medicine and Affiliated Hospitals reviewed the study design and protocol, and approved the study along with the consent form (H-27393 and H-26589). After selection, informed consent was obtained. A 9-page consent document including background information, purpose, and procedures in detail was reviewed with the subject by a trained clinical registered nurse assigned to the study trial. Inclusion criteria included ability to sign informed consent, willingness to provide blood samples, confirmation of singleton gestation, and estimated due date established by ultrasound ≤12 and 0/7 weeks or by last menstrual period consistent with ultrasound ≤14 and 0/7 weeks. Exclusion criteria included the following: use of vaginal or vulvar medications in the past; twin gestation; presence of acute medical illness (including preeclampsia, fever, gestational diabetes, type 2 diabetes); chronic disease (including pulmonary, cardiovascular, gastrointestinal, hepatic, or renal disease); maternal history of cancer; positive maternal hepatitis C virus, human immunodeficiency virus, or hepatitis B virus (confirmed by immunoblot or molecular testing); history of major gastrointestinal surgery except appendectomy or cholecystectomy in the last 5 years; confirmed or suspected condition of immunosuppression; history of uncontrolled gastrointestinal disorders (including inflammatory bowel disease; ulcerative colitis; Crohn’s disease; irritable bowel syndrome; persistent, infectious gastroenteritis; colitis; gastritis; persistent or chronic diarrhea; Clostridium difficile infection; untreated Helicobacter pylori infection; or chronic constipation); urinary incontinence with use of incontinence protection garments; condyloma or human papillomavirus diagnosed within the previous 2 years; treatment for or suspicion of having toxic shock syndrome; history of candidiasis; urinary tract infection; active sexually transmitted disease within the previous 2 months; history of dysplasia (vulvar, vaginal, or cervical) within the last 5 years; or history of recurrent rash in the past 6 months (including psoriasis or recurrent eczema).
Sample size estimation
A cohort size of 90 maternal-fetal pairs (180 samples) and 5 trios (15 samples) was based on published observations of others employing 39 maternal-offspring pairs, in which 1 in 8 subjects were found to carry disease-associated mtDNA mutations. In this study, multiplexing of samples to 12/run on the MiSeq resulted in 10⁶ read pairs per sample.
Biologic sample attainment
Samples were collected under institutional review board–approved consent from subjects in the form of maternal blood, placental tissue, and fetal cord blood if available. Samples were flash frozen, or placed in EDTA, heparinized, or PureGene (Qiagen, Mansfield, MA) collection tubes (as appropriate) and labeled by study identification number and by maternal, cord, or placental source, respectively.
Whole genome sequencing
DNA from each sample was extracted using Qiagen Gentra Puregene (Qiagen, Mansfield, MA) following the manufacturer’s protocols. The Illumina NextGen sequencing platform was utilized to perform paired-end whole genome sequencing on all samples as previously described.
Variant calling on assembled mitochondrial genomes
Sequence alignment, quality control, and variant calling were performed with BWA, SAMtools, and Atlas2 as previously described. Prior to variant calling, the mtDNA sequence from each sample was aligned by BWA to the Cambridge Reference Sequence (NC_012920), an established GenBank sequence. SAMtools was used to remove duplicate reads and to convert, sort, and index the aligned data files. The Atlas2 suite was used for variant calling to generate VCF files. Atlas2 employs logistic regression models in conjunction with adjustable cutoffs to separate true SNPs and indels from sequencing errors. This, combined with the probability score and minimal heuristic filters, enabled us to filter out sequencing errors while generating highly accurate variant calls. As such, default parameters were used for variant analysis. Sequence variations found in both maternal and fetal samples were scored as germline variations. Any sequence differences between maternal and placental samples beyond those anticipated from sequencing and platform read error were scored as de novo mtDNA mutations. Each was then checked against the MITOMAP ( www.mitomap.org ) database and mtDB ( www.genpat.uu.se/mtDB ) for its frequency in the general population. SNPs with >1% allele frequency were considered common.
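The germline versus de novo scoring rule described above can be illustrated as a set comparison. The following is a minimal sketch, not the pipeline used in this study: it assumes variants have already been called and reduced to position, reference allele, and alternate allele, each paired with a within-sample allele frequency. The names `Variant`, `score_pair`, and `MIN_ALLELE_FREQ` are hypothetical, and the sketch ignores the error model that Atlas2 applies upstream.

```python
from __future__ import annotations

from typing import NamedTuple


class Variant(NamedTuple):
    """Hypothetical minimal variant record (not the Atlas2 or VCF format)."""
    pos: int   # 1-based position on the reference sequence
    ref: str   # reference allele
    alt: str   # alternate allele


# Assumption: mirrors the >1% allele frequency inclusion cutoff.
MIN_ALLELE_FREQ = 0.01


def score_pair(
    maternal: dict[Variant, float],
    fetal: dict[Variant, float],
) -> tuple[list[Variant], list[Variant]]:
    """Split fetal variants into shared (germline) and fetal-only calls.

    Each argument maps a variant to its within-sample allele frequency.
    Variants at or below the cutoff are dropped from both samples first,
    so a maternal variant below the cutoff would not rescue a fetal call.
    """
    mat = {v for v, af in maternal.items() if af > MIN_ALLELE_FREQ}
    fet = {v for v, af in fetal.items() if af > MIN_ALLELE_FREQ}
    germline = sorted(mat & fet)            # present in both samples
    candidate_de_novo = sorted(fet - mat)   # present in fetal sample only
    return germline, candidate_de_novo
```

Fetal-only calls from such a comparison are candidates only; in practice they would still need review against sequencing error and low-level maternal heteroplasmy before being scored as de novo.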
Haplogroup analysis
HaploGrep was used to define the haplogroup of each individual. HaploGrep is a web-based tool that uses the latest version of Phylotree as a reference to determine haplogroup. It is reported to provide greater accuracy for assignment of haplogroup using NextGen sequencing reads.
Results
Patient population
There are 90 maternal-fetal pairs in the cohort, providing adequate power for analysis of de novo mutation estimates ( Table 1 ). A maternal serum sample and the newborn’s placental tissue sample were collected from each pair. In addition, a fetal cord blood sample was collected for 5 individuals, yielding a total of 195 samples for analysis from the time of birth. This constitutes the largest maternal-offspring cohort analyzed to date, and the only one with subject samples acquired immediately at the time of birth. Consistent with the overall study design aimed at a term and preterm comparison, the mean gestational age was 34.05 weeks (SD 4.06 weeks) ( Table 1 ). Consistent with the largely Latino patient demographic of Harris County, TX, the majority of subjects self-identified as Hispanic ethnicity, and these findings were entirely supported by the haplogroup assignment as described further in the text.
Characteristic | Mean (SD) | Range |
---|---|---|
Maternal characteristics | ||
BMI, kg/m² | 29.56 (6.14) | 17–45 |
Height, in | 62.1 (2.9) | 56–70 |
Weight, lb | 163.18 (39.4) | 97–308 |
Age, y | 29.32 (7) | 17–45 |
Infant characteristics | ||
Gestational age, wk | 34.05 (4.06) | 24–41 |
Weight, g | 2499.25 (873.2) | 635–4310 |
Variant calling and mitochondrial haplogroup assignment
The mtSNPs and indels were identified as described in the “Materials and Methods” section. We achieved high sequencing depth of the mitochondrial genome, averaging 65X (compared with 12X depth of sequencing for most human genome sequencing), which enabled accurate variant calling. The number of variants found in each sample was independent of sequencing depth, further supporting our confidence in the accuracy of variant calls ( Figure 1 ). To reduce risk of error and maximize detection of potentially significant heteroplasmy, only samples with >20X sequencing depth were selected for further analysis, and the Atlas2 suite was used for variant calling (on default parameters). Haplogroup was assigned based on a combination of specific variants (HaploGrep). As shown in Figure 2 , the majority of subjects were assigned to haplogroups A, B, C, and D, which are characteristic of indigenous peoples of the Americas. These results are consistent with the patients’ self-identified ethnicity (Hispanic), geographic distribution (southern Texas, with recent migration patterns from Mexico, and Central and South America), and migration trends of haplogroups. Of note, we observed 100% identical haplogroup assignments between each subject’s maternal blood and the corresponding fetal placental sample, further supporting the methodology of our approach and the validity of sequencing.
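The depth filter and the check that variant counts do not track sequencing depth can be sketched as follows. This is an illustration under stated assumptions rather than the analysis behind Figure 1: the text does not say which statistic was used, so the Pearson correlation here is a stand-in, and the function names and the `(sample_id, mean_depth, n_variants)` input layout are hypothetical.

```python
from math import sqrt

# Assumption: mirrors the >20X mean depth inclusion cutoff.
MIN_MEAN_DEPTH = 20.0


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def depth_vs_variants(samples):
    """Keep samples above the depth cutoff and correlate depth with variant count.

    `samples` is an iterable of (sample_id, mean_depth, n_variants) tuples.
    Returns the retained samples and the Pearson r between depth and count.
    """
    kept = [(sid, depth, n) for sid, depth, n in samples if depth > MIN_MEAN_DEPTH]
    r = pearson_r([depth for _, depth, _ in kept], [n for _, _, n in kept])
    return kept, r
```

A coefficient near zero would be consistent with variant counts being independent of depth, whereas a strongly positive value would suggest that deeper samples are simply yielding more calls.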
SNP analysis within maternal (blood), fetal (placental), and fetal (cord blood) specimens
The initial 5 sets of subject trios were analyzed for concordance of identified SNPs among maternal blood, cord blood, and placental tissues ( Figure 3 ). The mtDNA variants discovered among the members of each trio were nearly identical, except for several de novo variants in the D-loop promoter region and COXII. The D-loop is a noncoding, hypervariable region, and thus changes that occur here do not cause amino acid substitutions or codon disruptions. The mtSNPs identified in placental and cord blood specimens were nearly identical for most sets as well; therefore, we limited all subsequent sequencing to maternal blood and fetal (placental) tissue.
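One simple way to quantify the concordance described above is the fraction of variants shared between each pair of tissues within a trio. The sketch below is illustrative only and is not the method used to produce Figure 3: the choice of the Jaccard index, the function name `pairwise_concordance`, and the representation of each tissue as a set of variant labels are all assumptions.

```python
from itertools import combinations


def pairwise_concordance(trio):
    """Jaccard concordance between every pair of tissues in one trio.

    `trio` maps a tissue label (e.g., "maternal", "placenta", "cord") to the
    set of variant labels called in that tissue. Returns a dict keyed by the
    alphabetically ordered tissue pair, with values in the range 0.0 to 1.0.
    """
    result = {}
    for a, b in combinations(sorted(trio), 2):
        union = trio[a] | trio[b]
        shared = trio[a] & trio[b]
        # Two empty call sets are treated as fully concordant.
        result[(a, b)] = len(shared) / len(union) if union else 1.0
    return result
```

Under this measure, a placenta and cord blood pair with identical call sets scores 1.0, which is the pattern that would justify using placental tissue alone as the fetal sample.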