Whole exome and whole genome sequencing


Mary E. Norton


Introduction


Historically, prenatal diagnosis has focused on detection of chromosomal abnormalities, particularly trisomy 21. Chromosomal microarray analysis (CMA) now provides higher-resolution scanning of the genome, so that more cytogenetic abnormalities can be detected (1). With advances in DNA technology, particularly since the completion of the Human Genome Project, an increasing number of single gene disorders have become amenable to genetic diagnosis. Yet even with these tests, in most fetuses with sonographic abnormalities the underlying cause, and any associated abnormalities, are not identified until after birth. More recently, next-generation sequencing (NGS) has been introduced; it makes it possible to screen for a much larger number of conditions when a genetic diagnosis is suspected but a specific disorder is not evident.


While sequencing of the entire genome (whole genome sequencing [WGS]) is feasible, exome sequencing (ES) is less costly and provides a more manageable amount of data to interpret. ES assesses the protein-coding regions of the genome (which contain most of the known disease-causing variants) and therefore offers broad diagnostic capability for fetuses with sonographic abnormalities. Unlike targeted genetic tests, ES does not use assays designed to detect a single gene or gene variant; rather, it sequences the whole exome in one reaction and then targets the analysis as appropriate. Fetal ES has now been reported in multiple published series (2–11) (Table 37.1). However, the utility of fetal ES is limited by its turnaround time, which can be lengthy, by the limited phenotypic data available through prenatal imaging, and by the scarcity of fetal-specific phenotypic data in the genomic databases used to curate fetal variants.



Definitions of whole genome sequencing, exome sequencing, and gene panels


NGS tests typically target either a panel of selected genes, the exome (the protein-coding genes, which make up 1%–2% of the genome), or the entire genome. Gene panels are usually used to target sets of genes associated with a specific clinical phenotype, such as a skeletal dysplasia or cleft lip and/or palate. ES, in contrast, targets the approximately 20,000 protein-coding genes, whereas WGS is untargeted and also generates sequence data from regulatory, intronic, and intergenic regions. The significance of sequence variants in these noncoding regions is less well understood, and at present WGS is less commonly used in clinical practice. Gene panels are increasingly used in prenatal diagnosis and are designed to maximize coverage, sensitivity, and specificity for the included genes. For this reason, gene panels often have higher diagnostic rates than ES or genome sequencing, although diagnostic rates vary among panels.


Clinical ES targets the approximately 20,000 known protein-coding genes. Not all of these genes are associated with known human diseases; a smaller number of disease-associated genes (4,000–5,000) make up the “clinical exome,” and in some laboratory settings only these selected genes are evaluated. It is important to appreciate that ES, by definition, will not detect variants in noncoding regions, and it also does not detect most copy number variants, inversions, and other structural rearrangements. Therefore, ES is not a “stand-alone” test.


In contrast to panels and ES, clinical WGS is untargeted and generates sequence data that include regulatory, intronic, and intergenic regions. Compared with gene panels or ES, genome sequencing provides a substantially larger quantity of information. Although WGS is also not a completely comprehensive genomic test, emerging analytic approaches can use genome sequencing to detect structural and copy number variants, as well as disease-associated expansions of short nucleotide repeats; these are generally not detectable with ES. However, bioinformatics tools for genome sequencing are less developed than those available for ES, and the cost of genome sequencing remains high, partly because of the cost of data management and analysis. As NGS technologies advance, ES will likely be replaced by WGS, but at present most clinical testing uses ES.


Indications


Most reported series of prenatal ES have studied fetuses with structural anomalies detected by ultrasound (2–11). Depending on the specific anomaly or anomalies, some fetal structural anomalies are not isolated findings but are part of Mendelian syndromes caused by single gene disorders. Given the limited phenotypic information that can be obtained by prenatal imaging, many series include fetuses with one or more anomalies without an obvious cause, usually after a normal CMA. The diagnostic yield varies markedly, from about 5% to 50%, and depends in large part on the stringency of the inclusion criteria (2–11). The routine use of prenatal sequencing as a diagnostic test is not currently recommended because of insufficient validation data and limited knowledge of its benefits and pitfalls (12).


A recent position paper of multiple professional societies, including the International Society for Prenatal Diagnosis (ISPD), the Perinatal Quality Foundation (PQF), and the Society for Maternal-Fetal Medicine (SMFM), describes the following indications for prenatal ES: (1) a current pregnancy with a fetus with a single major anomaly or with multiple organ system anomalies suggestive of a possible genetic etiology, but no genetic diagnosis after CMA; (2) in select situations with no CMA result, following multidisciplinary review and consensus, a fetus with features that strongly suggest a single gene disorder; (3) a couple with a history of a prior undiagnosed fetus (or child) affected with an anomaly or anomalies suggestive of a genetic etiology, and a recurrence of similar anomalies in the current pregnancy without a genetic diagnosis after karyotype or CMA. In addition, when such parents present for preconception counseling and no sample is available from the affected proband, or if a fetal sample cannot be obtained in an ongoing pregnancy, it is considered appropriate to offer sequencing of both biological parents to look for shared carrier status for autosomal recessive mutations that might explain the fetal phenotype. However, where possible, obtaining tissue from a previous abnormal fetus or child for ES is preferable; and (4) families with a history of recurrent stillbirths of unknown etiology after karyotype and/or CMA, in which the fetus in the current pregnancy has a recurrent pattern of anomalies (12).


In most circumstances, interpretation of ES data requires comparing the phenotypic findings with the variants identified, to determine whether any of these variants have previously been associated with similar features. In most cases, if there is no phenotypic overlap, the variants are assumed not to be causative. In part for this reason, there is currently no evidence to support routine prenatal ES in the absence of ultrasound findings.


With decreases in cost, NGS is increasingly being used for the clinical assessment of children and adults suspected to have genetic disorders. While clinical genomic testing of asymptomatic healthy individuals for purposes of disease prediction or risk stratification is not yet supported by evidence, a handful of ongoing studies seek to address the cost and efficacy of predictive genomic testing in both children and adults. In addition to regulatory, ethical, and legal considerations, frameworks from professional bodies have been constructed to guide experienced clinicians in their interpretation and reporting of genomic information (13,14).


Sequencing technique


At the time of the human genome project, scientists used Sanger sequencing, which was expensive and time-consuming. Subsequent efforts to reduce the cost of DNA sequencing have led to the development of the significantly cheaper technologies that are referred to as NGS (15).


In next-generation genomic sequencing, the DNA is cut into small fragments of approximately 1,000–10,000 base pairs (bp); the 50–250 bp sequenced from each end of a fragment constitute a “read.” Each read is paired with the read from the opposite end of the same fragment; these are termed “paired-end” reads. The pool of fragments is called the “sequencing library.” With WGS, the entire genome is sequenced, whereas with ES only the 1%–2% of the genome that comprises the coding exons is sequenced. For ES, the DNA fragments that overlap exons and their flanking intronic sequence are captured (enriched) from the entire library before sequencing. A similar methodology is used for high-throughput sequencing with disease-specific multigene panels.
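To make the terms “read,” “paired-end,” and “sequencing library” concrete, the following minimal Python sketch simulates them on a toy genome. It is not a laboratory protocol: it assumes fixed-length fragments and half-open exon intervals, all function names are hypothetical, and the lengths used are far shorter than the 1,000–10,000 bp fragments and 50–250 bp reads described above.

```python
import random

# Complement table for the four DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def make_library(genome: str, frag_len: int, n_fragments: int,
                 rng: random.Random) -> list[tuple[int, str]]:
    """Cut random fixed-length fragments; the (start, fragment) list is the toy 'library'."""
    library = []
    for _ in range(n_fragments):
        start = rng.randrange(0, len(genome) - frag_len + 1)
        library.append((start, genome[start:start + frag_len]))
    return library


def paired_end_reads(fragment: str, read_len: int) -> tuple[str, str]:
    """Read 1 comes from one end of the fragment; read 2 from the other end, on the opposite strand."""
    read1 = fragment[:read_len]
    read2 = reverse_complement(fragment[-read_len:])
    return read1, read2


def capture_exome(library: list[tuple[int, str]],
                  exons: list[tuple[int, int]]) -> list[tuple[int, str]]:
    """Keep only fragments overlapping at least one exon interval [exon_start, exon_end)."""
    return [
        (start, frag) for start, frag in library
        if any(start < exon_end and start + len(frag) > exon_start
               for exon_start, exon_end in exons)
    ]


if __name__ == "__main__":
    rng = random.Random(42)
    genome = "".join(rng.choice("ACGT") for _ in range(1_000))
    library = make_library(genome, frag_len=60, n_fragments=50, rng=rng)
    exome_library = capture_exome(library, exons=[(100, 160), (700, 760)])
    for start, frag in exome_library[:3]:
        print(start, *paired_end_reads(frag, read_len=15))
```

Read 2 is reverse-complemented because it is sequenced from the opposite strand, which is what allows the two reads of a pair to be placed on the reference facing each other.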


The sequencing library is then immobilized on a solid surface, amplified in clusters, denatured, and sequenced by synthesis of a new complementary strand. In this process, each nucleotide (A, C, G, T) that is incorporated carries a different fluorescent tag, so that its incorporation at a specific position in the sequenced fragment can be recorded (16–18). This cycle is repeated for each successive position, with millions of clusters read simultaneously, hence the term “massively parallel sequencing.” The resulting sequence reads are then aligned using bioinformatics tools to generate a consensus sequence that is compared with the human reference sequence (16,17).
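The alignment and consensus steps can be illustrated with a deliberately naive Python sketch. It assumes gap-free reads and places each read by brute-force mismatch counting; production aligners use indexed seed-and-extend algorithms and handle insertions and deletions, so this shows only the idea, not how clinical pipelines work. All function names are hypothetical.

```python
from collections import Counter


def best_offset(read: str, reference: str) -> int:
    """Return the reference offset where the read has the fewest mismatches (no gaps).

    Ties go to the leftmost offset, which mirrors the ambiguity that
    repeated regions create for short-read mapping.
    """
    best_pos, best_mismatches = -1, len(read) + 1
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        mismatches = sum(1 for a, b in zip(read, window) if a != b)
        if mismatches < best_mismatches:
            best_pos, best_mismatches = pos, mismatches
    return best_pos


def build_pileup(reads: list[str], reference: str) -> list[Counter]:
    """Stack each aligned read's bases onto the reference positions it covers."""
    columns = [Counter() for _ in reference]
    for read in reads:
        offset = best_offset(read, reference)
        for i, base in enumerate(read):
            columns[offset + i][base] += 1
    return columns


def consensus(columns: list[Counter], reference: str) -> str:
    """Majority base at each position; uncovered positions fall back to the reference base."""
    return "".join(col.most_common(1)[0][0] if col else ref_base
                   for col, ref_base in zip(columns, reference))


def differences(cons: str, reference: str) -> list[tuple[int, str, str]]:
    """List (position, reference base, consensus base) wherever the two disagree."""
    return [(i, r, c) for i, (r, c) in enumerate(zip(reference, cons)) if r != c]
```

For example, with the reference `GATTACAGATC` and the overlapping reads `GATTAT`, `TTATAG`, and `ATAGATC`, the consensus is `GATTATAGATC`, and `differences` reports a single C-to-T difference at position 5 (counting from 0).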


The results of the sequence alignment are visualized as the original sequencing reads ordered by their best match to a position on the reference sequence. Software tools are used to identify differences from the reference genome, and a statistical test is applied to determine whether each mismatch represents a true genetic difference or some type of error. Because many regions of the human genome share identical or highly similar sequences, alignment errors are inherent in short-read mapping. Additional alignment errors may arise from errors in the NGS data or from differences between the human reference sequence and the sequence data being aligned. A sequencing test or run should ideally cover each nucleotide in the human genome 30 times (30-fold depth). Given that the human genome contains about 3 billion base pairs, each run produces approximately 90 billion bases of sequence data, and interpretation of this tremendous quantity of data is highly complex.
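The arithmetic behind the 90-billion figure is simply genome size multiplied by depth. The minimal Python sketch below also converts that total into a number of paired-end read pairs; the 150 bp read length is an assumption chosen from within the 50–250 bp range given earlier, and the function names are hypothetical.

```python
import math


def total_sequenced_bases(genome_size_bp: int, mean_depth: int) -> int:
    """Bases of sequence needed to cover every position mean_depth times on average."""
    return genome_size_bp * mean_depth


def read_pairs_needed(genome_size_bp: int, mean_depth: int, read_len_bp: int) -> int:
    """Paired-end read pairs required, since each pair contributes two reads of read_len_bp."""
    return math.ceil(total_sequenced_bases(genome_size_bp, mean_depth) / (2 * read_len_bp))


if __name__ == "__main__":
    bases = total_sequenced_bases(3_000_000_000, 30)
    pairs = read_pairs_needed(3_000_000_000, 30, read_len_bp=150)
    print(f"{bases:,} bases; {pairs:,} read pairs of 2 x 150 bp")
```

Run as a script, this prints 90,000,000,000 bases and 300,000,000 read pairs. Real runs typically require more raw data than this idealized figure, because coverage is never perfectly uniform.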


Two parameters are used to describe the quality of an obtained sequence: the depth, which refers to the number of overlapping reads covering each base pair, and the coverage, which refers to the fraction of the target sequence that is covered at sufficient depth. The American College of Medical Genetics and Genomics (ACMG) recommends that, for diagnostic ES, 90%–95% of the target sequence be covered at least 10-fold and that the average depth be 100-fold (19).
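Depth and coverage can both be computed from a per-position depth profile of the targeted regions. The Python sketch below assumes such a profile is already available as a list of integers (in practice it would be derived from the aligned reads). It treats the lower end of the 90%–95% range and the 100-fold average as pass thresholds, which is an interpretive assumption rather than any laboratory's validated criteria; the function names are hypothetical.

```python
def coverage_metrics(depths: list[int], min_depth: int = 10) -> tuple[float, float]:
    """Return (mean depth, fraction of target positions with depth >= min_depth)."""
    if not depths:
        raise ValueError("no target positions supplied")
    mean_depth = sum(depths) / len(depths)
    covered = sum(1 for d in depths if d >= min_depth) / len(depths)
    return mean_depth, covered


def meets_es_targets(depths: list[int], min_fraction: float = 0.90,
                     min_depth: int = 10, target_mean_depth: float = 100.0) -> bool:
    """Check both quality targets; the default thresholds are illustrative assumptions."""
    mean_depth, covered = coverage_metrics(depths, min_depth)
    return covered >= min_fraction and mean_depth >= target_mean_depth
```

The two metrics are deliberately separate: a run can reach a high average depth while a few poorly captured regions remain below 10-fold, which only the coverage fraction reveals.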


Interpretation


The final step in genomic sequencing is to determine whether any observed genetic differences are likely to alter protein function and cause disease or predisposition to disease; each genome sequencing run therefore requires substantial processing and interpretation. Variation in the genome is ubiquitous: a typical exome sequence identifies approximately 40,000 sequence variants, and a genome sequence approximately 3 million variants, that differ from the human genome reference (17).


After sequencing has been performed, “variant calling,” the use of bioinformatic algorithms to detect genetic differences, identifies mismatches between the reference sequence and the mapped reads. Such mismatches can arise from a true variant, but they can also arise from errors in sequencing chemistry, from biases in the reference sequence related to differences in genetic background between the mapped reads and the individuals who originally contributed to the human reference sequence, or from errors in the alignment process (20,21). Importantly, the reference sequence comes from a group of individuals of different ethnic backgrounds, which introduces complications into current methodologies for genome sequencing (21,22). Statistical models are used to filter variants by assigning a likelihood that a detected mismatch represents a true genotype. Molecular diagnostic laboratories use sophisticated computer algorithms to filter out the large number of variants that do not cause disease, yielding a small subset that is then assessed more closely for potential pathogenicity (14,21). Regardless of which analytical framework is applied, all variant filtering balances sensitivity against specificity: removing false variant calls comes at the cost of excluding some real genetic differences (23).
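Statistical filtering of this kind can be illustrated with a simple genotype-likelihood model. The Python sketch below is a deliberate simplification, not any laboratory's caller: it treats each read as an independent draw, assumes a single fixed error rate of 1%, considers one biallelic site, and applies hypothetical thresholds of 10 reads of depth and a log10 odds of 3 between the best and second-best genotype.

```python
import math

# Fraction of reads expected to carry the alternate allele under each diploid genotype.
GENOTYPE_ALT_FRACTION = {"0/0": 0.0, "0/1": 0.5, "1/1": 1.0}


def genotype_log_likelihoods(ref_reads: int, alt_reads: int,
                             error_rate: float = 0.01) -> dict[str, float]:
    """Natural-log binomial likelihood of the read counts under each genotype.

    The binomial coefficient is omitted because it is identical for every
    genotype. A sequencing error is assumed to swap reference and alternate
    bases with probability error_rate.
    """
    log_liks = {}
    for genotype, f in GENOTYPE_ALT_FRACTION.items():
        p_alt = f * (1 - error_rate) + (1 - f) * error_rate
        log_liks[genotype] = alt_reads * math.log(p_alt) + ref_reads * math.log(1 - p_alt)
    return log_liks


def call_and_filter(ref_reads: int, alt_reads: int, error_rate: float = 0.01,
                    min_depth: int = 10, min_lod: float = 3.0) -> tuple[str, float, bool]:
    """Return (best genotype, log10 odds over the runner-up, passes filters)."""
    log_liks = genotype_log_likelihoods(ref_reads, alt_reads, error_rate)
    ranked = sorted(log_liks.items(), key=lambda item: item[1], reverse=True)
    (best, best_ll), (_, second_ll) = ranked[0], ranked[1]
    lod = (best_ll - second_ll) / math.log(10)
    passes = ref_reads + alt_reads >= min_depth and lod >= min_lod
    return best, lod, passes
```

With 20 reference and 18 alternate reads, the call is a confident heterozygote (`"0/1"`) that passes. With 3 reference and 2 alternate reads, the most likely genotype is still heterozygous, but the call is filtered out. This is the sensitivity and specificity trade-off described above: a real variant covered by only five reads would be discarded.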


Variant classification


Variant annotation is the process of determining the potential effects of a variant on the function of one or more genes and assessing the likelihood that the phenotype is due to the affected gene or genes. NGS generates thousands of sequence variants, and these must be filtered and prioritized for clinical interpretation. Annotation-based filtering enriches for rare variants, which are more likely to be pathogenic, eliminates common variants, which are more likely to be benign, and predicts the functional effect of each variant. Annotation tools draw on information such as the presence of the variant in population databases, its evolutionary conservation across species, and the genomic structure where it is located. Large-scale genomic sequencing databases, such as the Genome Aggregation Database (gnomAD), can be used to distinguish common from rare variants in the population. Databases of previously assessed variants, such as ClinVar, have been established to collect and distribute information about previously interpreted variants (24).
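The frequency- and consequence-based filtering described above can be sketched in a few lines of Python. The record structure, the simplified consequence labels, the placeholder gene names, and the 0.1% allele-frequency cut-off are all hypothetical. Appropriate thresholds depend on the inheritance model and the disorder, and real pipelines also weigh conservation and other evidence.

```python
from dataclasses import dataclass

# Simplified consequence labels (real annotation tools use Sequence Ontology terms).
PROTEIN_ALTERING = {"missense", "nonsense", "frameshift", "splice_site"}


@dataclass(frozen=True)
class AnnotatedVariant:
    gene: str
    consequence: str
    population_af: float  # allele frequency from a population database such as gnomAD


def prioritize(variants: list[AnnotatedVariant],
               max_population_af: float = 0.001) -> list[AnnotatedVariant]:
    """Keep rare, protein-altering variants, ordered rarest first."""
    candidates = [
        v for v in variants
        if v.population_af <= max_population_af and v.consequence in PROTEIN_ALTERING
    ]
    return sorted(candidates, key=lambda v: v.population_af)
```

For example, given a rare missense variant, a rare synonymous variant, a common frameshift variant, and a rare nonsense variant in placeholder genes `GENE_A` to `GENE_D`, only the missense and nonsense variants survive. This mirrors how a list of thousands of calls is reduced to a small set for expert review.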


Other considerations in variant annotation include the strength of the association of the variant with the disease and with the phenotype of the patient; the possibility of phenotypic heterogeneity must always be considered. In addition to the clinical databases discussed previously, a number of matching databases, such as GeneMatcher (https://genematcher.org/), DECIPHER (https://decipher.sanger.ac.uk/), and PhenomeCentral (https://www.phenomecentral.org/), can help to identify matching cases using de-identified data, such as gene names or disease features. These tools are publicly available and do not require computational expertise.


Sequencing results


The data obtained through variant annotation are incorporated into a clinical interpretation. In a clinical setting, such interpretation also relies on the expertise of the clinicians involved and on the medical context in which the test was performed. Standard terminology recommended by the ACMG classifies variants as pathogenic, likely pathogenic, variant of uncertain significance, likely benign, or benign. The ACMG recommendations also describe the process for assigning variants to these five categories using typical types of variant evidence (e.g., population data, computational data, functional data, segregation data) (14). Currently, in most molecular diagnostic laboratories, pathogenicity is evaluated by applying a categorical system of 28 criteria that are combined to estimate the probability of pathogenicity. If the probability that a variant is pathogenic is greater than 99%, it is considered pathogenic, whereas if the probability is between 90% and 99%, the variant is classified as likely pathogenic. If the evidence indicates that the probability of pathogenicity is less than 90%, but the findings do not clearly show that the variant is benign and without health consequences, the variant is termed a variant of uncertain significance (VUS) (17).
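The probability thresholds quoted above can be expressed as a small decision function. This Python sketch is illustrative only. In practice, the ACMG criteria are combined by rules rather than by computing a single number, so the input probability and the `clearly_benign` flag are hypothetical stand-ins for the laboratory's assessment of the evidence. Treating a probability of exactly 0.90 as likely pathogenic is also an assumption.

```python
def classify_by_probability(p_pathogenic: float, clearly_benign: bool = False) -> str:
    """Map an estimated probability of pathogenicity to the categories described above.

    Thresholds follow the text: >99% pathogenic, 90%-99% likely pathogenic.
    """
    if not 0.0 <= p_pathogenic <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if p_pathogenic > 0.99:
        return "pathogenic"
    if p_pathogenic >= 0.90:
        return "likely pathogenic"
    if clearly_benign:
        # Separating benign from likely benign needs its own evidence, not modeled here.
        return "benign or likely benign"
    return "variant of uncertain significance"
```

The last branch reflects the asymmetry in the text: falling below the 90% threshold does not make a variant benign. Unless the evidence clearly points that way, it remains a VUS.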


Given the recent introduction of these technologies, many variants are currently of uncertain significance, although their pathogenicity will likely be clarified over time. Data-sharing efforts by researchers and providers of clinical genetic testing are expanding the body of knowledge about specific genes and variants, and this information will improve the interpretation of genetic variation in an informed clinical context. The ISPD, SMFM, and PQF endorse the position of the ACMG that laboratory and clinical genomic data sharing is crucial for genetic healthcare (12).


Because of the complexity of analysis and interpretation of clinical genetic testing, the ACMG strongly recommends that clinical molecular genetic testing be performed in a laboratory certified under the Clinical Laboratory Improvement Amendments (CLIA), with results interpreted by a board-certified clinical molecular geneticist, a molecular genetic pathologist, or the equivalent (14).


Secondary and incidental findings


In addition to potentially determining the genetic cause of a structural anomaly or other disorder, whole genome or ES results may identify risk variants for genetic diseases that are unrelated to the phenotype being investigated; these results have been referred to as secondary, incidental, or medically actionable findings (14,25). The ACMG recommends that 59 medically actionable genes be assessed and reported as secondary findings. The goal of such reporting is to identify and manage risks for selected highly penetrant genetic disorders, many of which involve cardiovascular or cancer predisposition genes, through established interventions aimed at preventing or significantly reducing morbidity and mortality (26,27). The reported rate of pathogenic (P) or likely pathogenic (LP) variants in these actionable genes has varied, from 1% in Africans (28) to 2.5% in East Asians (29) and 3.3% in Caucasians (30). In prenatal series, reported rates of secondary findings range from 1.6% to 6.1%, although data from such reports are limited (4,9).


In part because of the unique ethical challenges of prenatal diagnosis, the ACMG recommendations regarding the reporting of secondary findings do not address sequencing done in a prenatal diagnosis setting. Rather, the recommendations note that “This evaluation and reporting should be performed for all clinical germline (constitutional) exome and genome sequencing… in all subjects, irrespective of age but excluding fetal samples” (27). The ACMG does not specifically recommend against reporting secondary findings in prenatal series; rather, it is agnostic and leaves the subject unresolved. The guidelines state: “Similarly, these recommendations address incidental findings sought and reported during clinical sequencing for a specific clinical indication but do not address preconception sequencing, prenatal sequencing, newborn sequencing, or sequencing of healthy children and adults” (27).


The consideration and return of other results, such as carrier status for recessive diseases, risk-modifying variants, and pharmacogenomic variants, are less standardized.


Whole genome sequencing, exome sequencing, and gene panels


NGS can be applied to the entire genome (WGS), to all of the protein-coding genes (ES), or to gene panels that include a limited number of selected genes. Panels are often used in the context of a specific suspected disease or group of diseases, such as skeletal dysplasias, and are designed to maximize coverage, sensitivity, and specificity for the included genes. Therefore, gene panels often have higher diagnostic rates than ES or genome sequencing, although diagnostic rates vary among gene panels.


In summary, for patients in whom a specific diagnosis, or category of disorders, is considered, a gene panel may be the optimal approach. When there is more diagnostic uncertainty, data suggest that ES may have a higher diagnostic rate (31). And in patients for whom ES is nondiagnostic, WGS has been reported to provide additional diagnostic yield, although experience with WGS in prenatal diagnosis is extremely limited.


Clinical experience in adults and children


When clinical genome sequencing and ES are used in patients with suspected genetic disorders but no diagnosis, testing yields a molecular diagnosis that is thought to be explanatory in 25%–52% of cases (2,32–34). The diagnostic rate is highly dependent on the population tested, the availability of additional family members, and the definition of a high-likelihood diagnosis; rates of up to 60% have been reported in selected disease cohorts (35). Diagnostic sensitivity may also differ according to the affected organ system (34). The substantial number of cases that remain unexplained suggests that new genetic disorders are yet to be discovered and characterized. Potential biologic mechanisms for these disorders include new Mendelian disorders, gene interactions, epigenetic and regulatory mechanisms, uncaptured genetic variation (such as copy number variation), and environmental contributions.


Prenatal genome sequencing


Clinical interpretation and utility


Clinicians ordering and interpreting genomic sequence tests should appreciate that a genetic test that reports a pathogenic variant is not equivalent to diagnosing the patient with the associated disorder. Rather, the clinician should integrate the genetic test result and the clinical characteristics and family history of the patient to arrive at a clinical-molecular diagnosis. A genetic finding is not an infallible predictive tool; rather, it can help provide evidence for or against conditions that might be causing the phenotype of concern.


In a prenatal setting, ES is most commonly used for evaluation of fetuses for whom standard diagnostic genetic testing, such as CMA, has already been performed and is uninformative. In some cases, when a Mendelian disorder is strongly suspected, ES may be offered concurrently with CMA according to accepted practice guidelines (12). It is important to recognize that ES does not detect all types of variants, including copy number variants, inversions, and other structural rearrangements. Therefore, it is important to continue to use existing cytogenetic techniques in conjunction with ES to ensure maximum variant detection.


In prenatal ES, the highest diagnostic yields and the fastest turnaround times are achieved by trio sequencing, in which the fetus, mother, and father are sequenced at the same time, rather than sequencing the fetus first and only subsequently testing the parents for any candidate variants identified. If proband-only sequencing is performed, validation of diagnostic or potentially diagnostic findings should ideally include determining inheritance through targeted testing of samples from the biological parents.
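One reason trio sequencing is informative is that parental genotypes immediately show how a candidate variant was transmitted. The Python sketch below classifies a single autosomal site from fetal, maternal, and paternal genotypes, coded as alternate-allele counts. It ignores sex chromosomes, mosaicism, uniparental disomy, and compound heterozygosity across two sites, and its category labels are illustrative rather than standard laboratory terminology.

```python
def trio_inheritance(fetus: int, mother: int, father: int) -> str:
    """Classify transmission of one autosomal variant from trio genotypes.

    Genotypes are alternate-allele counts (0, 1, or 2) at a single site.
    """
    for genotype in (fetus, mother, father):
        if genotype not in (0, 1, 2):
            raise ValueError("genotypes must be 0, 1, or 2")
    if fetus == 0:
        return "not present in fetus"
    if mother == 0 and father == 0:
        # "Apparent": parental mosaicism or sample problems can mimic a de novo event.
        return "apparent de novo" if fetus == 1 else "not explained by simple Mendelian transmission"
    if fetus == 1:
        if mother == 2 and father == 2:
            # Two homozygous parents cannot transmit a reference allele.
            return "not explained by simple Mendelian transmission"
        if mother > 0 and father > 0:
            return "inherited; parent of origin ambiguous"
        return "inherited from mother" if mother > 0 else "inherited from father"
    # fetus == 2: each parent must have transmitted one copy of the variant.
    if mother > 0 and father > 0:
        return "homozygous, one copy from each parent (consistent with autosomal recessive inheritance)"
    return "not explained by simple Mendelian transmission"
```

With proband-only sequencing, none of these categories can be assigned until parental samples are tested. This is why the trio design shortens the path to a clinical-molecular diagnosis.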


Interpretation of ES results, including in the prenatal setting, requires consideration of the phenotype and of its consistency with prior reports of the identified variants. The prenatal phenotype is typically limited compared with that available in a neonate or infant. Many important features, such as intellectual disability, seizures, and other neurologic findings, simply cannot be detected prenatally. Additional clinical features are often recognized only after birth, and these should prompt consideration of reanalysis of the ES data. In a study of 20 fetuses who underwent prenatal ES, no diagnoses were identified prenatally. However, after birth, additional findings led to reinterpretation, and a variant thought to be causative was identified in four cases, for a detection rate of 20% (36).


Because this field is evolving so quickly, reanalysis over time often results in reinterpretation. Laboratories should have protocols for reanalysis, and these should be discussed with patients. Some laboratories may routinely consider reinterpretation of results after a specified period of time. In addition, patients who have undergone prenatal ES and are considering another pregnancy should be encouraged to attend a preconception visit. At that time, reinterpretation of ES results can be requested to determine whether any identified variants have been reclassified.


A number of series of prenatal ES cases have been reported (see Table 37.2). The largest of these included 196 patients (9); to date, most series are quite small. The diagnostic yield in such series ranges from as low as 10% to as high as 57% (5,10); the highest yields are associated with multiple anomalies and with recurrent anomalies highly suspected to be genetic. Lower yields are reported when ES is performed in pregnancies with a single structural anomaly identified by ultrasound, as most of these are multifactorial conditions rather than Mendelian disorders caused by variants in a single gene.

