In 2003, the sequencing of the human genome propelled the field of genetics in two important directions. Firstly, the concept of “genomics” emerged, which includes both the DNA sequence itself (the blueprint letters that direct genetic contributions to health, variation, and disease) and the response of the DNA to modifiers that alter gene expression (such as the environment or other genes). Additionally, genomics encompasses the genetic applications used for diagnostic and therapeutic medical decision-making and the role of genetics in policy development. Secondly, advances in molecular technologies also arose from the sequencing of the human genome. These tools continue to develop with refinements that expand their application, provide faster time to results, and require diminishing amounts of DNA at a dramatically reduced cost. These molecular tools increase our knowledge of basic biology as well as contribute to the identification and treatment of genetic diseases.
As genetic investigation accelerates, a previously unappreciated degree of DNA sequence variability between humans is being recognized. This variation modulates an individual’s response to the environment, including differential responses to medications, susceptibility to adult-onset disease (such as diabetes and hypertension), and the likelihood of cancer. Even among recognized Mendelian disorders, gene sequencing continues to document extensive individual variability. For example, the gene responsible for classic cystic fibrosis (CF), the cystic fibrosis transmembrane conductance regulator (CFTR) gene, has more than 1000 different variations. In the prenatal arena, examining genes with sequence variations of unknown significance for the developing fetus is an active area of research.
As of 2018, a number of laboratories offer sequencing of an individual’s DNA for less than $1500, with completion within several months. This is in stark contrast to the $2.7 billion and more than 10 years required to sequence the first human genome. However, although the sequence of nucleotide base pairs within an individual’s DNA can now be determined relatively quickly and inexpensively, the result is an enormous amount of information about the 3 billion base pairs in that individual’s genome. Any one person’s unique DNA sequence, if written in 12-point font, would span from New York to California. Obtaining the DNA sequence is only one part of a much larger endeavor to understand how the person’s genome, or blueprint, impacts cell functioning and ultimately the individual’s appearance and the function of various organs (phenotype). The study of DNA has evolved from a bird’s-eye view of the number of chromosomes (karyotype), to the ability to study a specific gene (genotype), to a more granular level of analysis of the entire genome (whole genome sequencing [WGS]). This chapter will review the various single gene alterations that occur and the technologies in use today, and will help the reader develop a framework for understanding genetic disease and human variation in the context of genomics.
What Are the Molecular Changes That Occur in Single Genes?
The ability to examine a specific segment of DNA for changes in the nucleic acid base sequence has rapidly evolved over the past 10 years. With lower costs and faster time to results, the detection of single gene disorders has accelerated (Fig. 6.1). Equally important, a growing understanding of the complexity of the path from nucleotide base sequence to RNA to protein to disease is emerging.
An individual’s DNA sequence provides instructions for the orderly development of proteins. Proteins play key roles in the development and function of cells and tissues, maintaining system functions and interacting with the environment. Changes in protein function or quantity can affect these systems. Protein production starts with the molecular reading of the DNA nucleotide bases that contain the genetic line-up (or sequence) that will direct protein development (the exons) as well as the spacing DNA segments that are not incorporated into the protein instructions (the introns). The introns are ultimately removed, leaving the exons as the final template for the protein. This final template then directs the order of the amino acids that together construct a unique protein. Further modifications to the protein are then sometimes incorporated, changing the final protein product in ways that impact health, disease, and interaction with the environment.
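The path from DNA to protein described above can be illustrated with a minimal Python sketch. The toy gene sequence and the exon coordinates are invented for illustration, and the sketch assumes the exon boundaries are already known; in the cell, introns are recognized and removed at the RNA level by the splicing machinery rather than by supplied coordinates.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(gene, exons):
    """Join the exon segments, discarding the introns between them.

    `exons` is a list of (start, end) positions (0-based, end exclusive).
    """
    return "".join(gene[start:end] for start, end in exons)


def translate(coding_sequence):
    """Read the template three bases at a time until a stop codon ('*')."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy gene: exon 1 + intron + exon 2.
toy_gene = "ATGGCC" + "GTAAGTCAG" + "GAATGA"
toy_exons = [(0, 6), (15, 21)]

template = splice(toy_gene, toy_exons)   # "ATGGCCGAATGA"
protein = translate(template)            # "MAE" (Met-Ala-Glu, then stop)
print(template, protein)
```

Protein modifications that occur after translation are not modeled here.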
What Are the Different Types of Pathogenic Variants?
Several types of DNA changes can occur, some of which mean that the protein is absent or no longer functions in the typical fashion. As noted in Table 6.1, the specifics of an alteration, such as the location and size, can have profoundly different effects. In some cases, small changes such as a single nucleic acid base substitution can result in significant disease if the protein produced is damaged. A “point mutation” is a change at a single base pair (a substitution, deletion, or insertion) that can lead to a heritable change in the quality or quantity of protein produced. A “missense” variant replaces one amino acid with another. A “nonsense” variant introduces a premature stop signal, halting the template and the protein before they are complete. Frameshift variants are insertions or deletions in the genome that are not in multiples of three nucleotides and therefore result in a change in the gene’s reading frame. Variants can also be “silent” if they do not produce a change in the encoded amino acid. In a large gene, such as the CFTR gene responsible for CF, all three types (missense, nonsense, and frameshift) can occur.
|Type of Variant||Description||Example|
|Missense||Variant leading to a different amino acid in translation||Sickle cell anemia|
|Nonsense||Variant leading to a stop codon, where there is premature termination of translation||Hemophilia A|
|Frameshift||Base pair insertion or deletion that alters the reading frame of translation||Tay-Sachs|
|Splice site||Base pair change that leads to errors in posttranscriptional modifications (splicing)||Adenosine deaminase deficiency|
|Transition||Variant that changes a purine base for a purine (A and G), or a pyrimidine for a pyrimidine (C and T)||Beta thalassemia|
|Transversion||Variant that changes a purine base (A or G) for a pyrimidine (C or T), or vice versa||Beta thalassemia|
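The variant categories in the table can be made concrete with a short Python sketch. It is a simplified illustration, not a clinical annotation tool: the reference sequence is a hypothetical toy coding sequence, the function names are invented here, and the classifier assumes the input is an already spliced coding sequence that starts in frame.

```python
from itertools import product

# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def translate(coding_sequence):
    """Translate codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def substitution_class(ref_base, alt_base):
    """Purine<->purine or pyrimidine<->pyrimidine is a transition; otherwise a transversion."""
    if ref_base == alt_base:
        raise ValueError("no substitution: bases are identical")
    pair = {ref_base, alt_base}
    same_family = pair <= PURINES or pair <= PYRIMIDINES
    return "transition" if same_family else "transversion"


def classify_coding_variant(ref_cds, alt_cds):
    """Classify the effect of a variant by comparing reference and altered coding sequences."""
    if (len(alt_cds) - len(ref_cds)) % 3 != 0:
        return "frameshift"          # length change not a multiple of three
    if len(alt_cds) != len(ref_cds):
        return "in-frame insertion/deletion"
    ref_protein, alt_protein = translate(ref_cds), translate(alt_cds)
    if alt_protein == ref_protein:
        return "silent"              # same amino acids despite the base change
    if len(alt_protein) < len(ref_protein):
        return "nonsense"            # premature stop codon
    if len(alt_protein) > len(ref_protein):
        return "stop-loss"           # original stop codon was altered
    return "missense"                # one amino acid replaced by another


REFERENCE = "ATGGAGGAGTAA"  # hypothetical toy sequence: Met-Glu-Glu-stop

print(classify_coding_variant(REFERENCE, "ATGGTGGAGTAA"))  # missense (GAG -> GTG)
print(classify_coding_variant(REFERENCE, "ATGTAGGAGTAA"))  # nonsense (GAG -> TAG)
print(classify_coding_variant(REFERENCE, "ATGGAAGAGTAA"))  # silent (GAG -> GAA)
print(classify_coding_variant(REFERENCE, "ATGGGGAGTAA"))   # frameshift (one base deleted)
print(substitution_class("A", "T"))                        # transversion
print(substitution_class("G", "A"))                        # transition
```

Note that the two classifications are independent: a transition or a transversion describes the base change itself, whereas missense, nonsense, and silent describe its effect on the protein.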
Variants can also occur at the stage of remodeling of the longer template to a shorter version with removal of introns (the nonprotein coding regions) leaving only the exons (protein coding). Splice sites and splice site variants can exist at the sites where the introns are removed. Changes at splice sites can alter the final protein product and in some cases lead to disease (Fig. 6.2). Introns also play a part in this step and can produce different exon arrangements to make different protein products. In addition, control of the on/off reading of DNA templates is also vulnerable to variation.
The relationship between a genetic variant and the resulting changes in cell development or function, or the “phenotype,” is complex. Similar phenotypes can be produced by different types of variants within the same gene (termed allelic heterogeneity) and even by variants in different genes (termed locus heterogeneity). Additionally, the same gene variant may create different phenotypes (polypheny) depending on other modifying genes or the environment. One common example of a gene in which variants can cause different disorders is the human beta globin gene. Different variants in this gene can cause beta thalassemia, sickle cell disease, or methemoglobinemia with differing clinical severity and presentation.
For a comprehensive description of phenotypes and their genotype correlations, the database Online Mendelian Inheritance in Man (OMIM) is an excellent resource (www.omim.org/).
What Other Changes to the DNA Can Produce Variation and Disease?
Control of the onset and duration of transcription of segments of the DNA is essential to the development and function of cells and tissues. This timing can be controlled in several different ways, such as by the configuration of the DNA in its condensed form, by specialized proteins, or by other mechanisms such as methylation. Control of DNA function through methylation of designated sites is an example of an epigenetic mechanism, and the genome-wide study of such modifications is known as “epigenomics.” Interaction between the epigenome and the environment plays an important role in the activation or silencing of DNA segments. Methylation and silencing of segments of DNA also play a role in triplet repeat disorders in which a segment of DNA expands over several generations. Examples include fragile X syndrome and Huntington disease (see Chapter 2).
How Are Changes in the DNA Identified for Research or in Clinical Testing?
Genotyping typically evaluates a segment of DNA for previously identified variants. This is in contrast to sequencing, which involves identifying each base pair, or nucleotide, in order, across the length of a gene. Numerous methodologies for genotyping exist. Many of the approaches use indirect means such as altered length of DNA segments, inability to bind to a fluorescently labeled probe, or lack of hybridization to known DNA segments arranged on an array.
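The contrast between genotyping and sequencing can be sketched in a few lines of Python. This is a conceptual illustration only: the sequences, the one-variant panel, and the function names are hypothetical, and the sketch assumes the sample and reference are already aligned and of equal length, which real assays and analysis pipelines cannot take for granted.

```python
def genotype(sample, known_variants):
    """Check only the positions on a panel of previously identified variants.

    `known_variants` maps a 0-based position to the known variant base.
    """
    return {pos: base for pos, base in known_variants.items() if sample[pos] == base}


def sequence_differences(sample, reference):
    """Read every base in order and report each difference from the reference."""
    return {
        pos: sample_base
        for pos, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    }


reference = "ATGGAGGAGTAA"   # hypothetical reference segment
panel = {4: "T"}             # hypothetical panel: one known variant at position 4
sample = "ATGGAGGAATAA"      # carries a change at position 8 that is not on the panel

print(genotype(sample, panel))                  # {} : the panel finds nothing
print(sequence_differences(sample, reference))  # {8: 'A'} : sequencing finds the change
```

The sample's variant is invisible to the panel because genotyping asks only about variants that are already known, whereas sequencing reports whatever differs from the reference.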
The ability to screen large populations of individuals for their carrier status of genetic variants known to cause disease is based largely on genotyping. Genotyping for carrier detection can be useful in homogeneous populations such as the Ashkenazi Jewish (AJ) population. In this population, because of religious and geographic considerations, genetic variants have traditionally been passed from one generation to the next rather than dispersed across larger populations. This has resulted in a higher carrier frequency for specific autosomal recessive conditions because of a limited number of specific variants. Common examples include Tay-Sachs disease, familial dysautonomia, and Canavan disease (Table 6.2).
|Disorder||Inheritance||Carrier Frequency|
|Lysosomal storage diseases|
|Gaucher Type 1||AR||1/18|
|Niemann-Pick Type A||AR||1/80|
|Mucolipidosis Type IV||AR||1/50|
|Non-lysosomal storage diseases|
|Fanconi anemia C||AR||1/90|
|Congenital adrenal hyperplasia||AR||1/10|
|Familial nonsyndromic deafness||AR||1/25|
|Glycogen storage disease type 1A||AR||Unknown|
|Factor XI deficiency||AR||1/190|
Genetic screening for Tay-Sachs disease was first initiated among the AJ population, in whom the carrier rate is high (about 1 in 32), and in whom three hexosaminidase A variants are responsible for 98% of the affected cases. This founder effect means that genotyping targeted to these three specific variants is a very sensitive screen for carriers. Among other ethnic groups, different or unknown variants may occur, and therefore a biochemical test that measures hexosaminidase enzyme activity may be more effective because it does not depend on knowing which variants are responsible.
The same principles apply in testing for other genetic disorders in different ethnic and racial populations. For example, screening for cystic fibrosis carriers usually involves testing for 23 known causative variants. These 23 variants are most common in those of Northern European or AJ heritage, and 90% of CF carriers will be identified by screening for these 23 variants. In more diverse populations, such as among all people in the United States, detection is lower (77%). This decrease in detection reflects the lower ability of the 23-variant panel to identify the more disparate CF variants in a genetically diverse population. For this reason, the detection rate for CF is lower in Hispanic whites, African Americans, and Asian Americans in whom different, often unknown, variants cause more cases of CF.
Limitations exist with genetic carrier testing by genotyping. Genotyping will only detect known variants in a gene and the frequency with which specific known variants cause disease varies by racial and ethnic heritage. For example, delta F508 is the most common cystic fibrosis variant in Northern Europeans (66%), whereas in African Americans, it represents only 48% of the CFTR variants. Overall, 23% of African Americans have CFTR variants not commonly seen in those with Northern European heritage. With increasing cross-cultural heritage, genotyping for a limited number of variants cannot identify all carriers, and there is always a residual risk that an individual is a carrier even after screening. The amount of residual risk will depend on the number of variants genotyped and the gene frequency within the individual’s racial/ethnic background (Table 6.3).
|Race or Ethnicity||Prevalence of CF||Carrier Frequency||Carrier Testing Detection Rate||Carrier Risk after Negative Result|
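The residual carrier risk after a negative screen follows from Bayes' rule and can be sketched in Python. The function name is invented for illustration, and the sketch assumes the test has no false-positive results. The Tay-Sachs figures (carrier rate of about 1 in 32, 98% detection) come from the text above, treating the share of affected cases explained by the three variants as the carrier detection rate; the CF carrier frequency of 1 in 25 is an assumed value used only to show the effect of a lower detection rate.

```python
def residual_carrier_risk(carrier_frequency, detection_rate):
    """Probability of being a carrier despite a negative genotyping result.

    P(carrier | negative) = p * (1 - d) / (p * (1 - d) + (1 - p)),
    where p is the carrier frequency and d is the detection rate.
    Assumes non-carriers always test negative.
    """
    missed_carriers = carrier_frequency * (1.0 - detection_rate)
    non_carriers = 1.0 - carrier_frequency
    return missed_carriers / (missed_carriers + non_carriers)


# Tay-Sachs in the AJ population, using the figures given in the text.
tay_sachs = residual_carrier_risk(1 / 32, 0.98)
print(f"Tay-Sachs residual risk: about 1 in {round(1 / tay_sachs)}")  # about 1 in 1551

# Effect of detection rate alone, with an ASSUMED carrier frequency of 1 in 25.
assumed_cf_carrier_frequency = 1 / 25
for detection_rate in (0.90, 0.77):
    risk = residual_carrier_risk(assumed_cf_carrier_frequency, detection_rate)
    print(f"detection {detection_rate:.0%}: residual risk about 1 in {round(1 / risk)}")
```

With the carrier frequency held fixed, lowering the detection rate raises the residual risk, which is why a negative result is less reassuring in populations where the panel captures fewer of the causative variants.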
Genotyping can be used in prenatal genetic testing with suspected structural fetal anomalies or syndromes. In this setting, the responsible gene and the specific variant must be known to perform the correct test. Some constellations of fetal findings suggest specific genetic conditions, as with common short-limbed dwarfisms including achondroplasia, thanatophoric dysplasia, or campomelic dysplasia. Targeted genotyping panels can assess for the common variants within selected genes that are most often associated with the ultrasound findings. Targeted panels for specific candidate genes can be relatively inexpensive, with a rapid turnaround time, and fewer nonspecific and incidental genetic findings. Although gene panels can be helpful in testing for several different genes associated with a particular ultrasound finding, genotyping panels will test only for the previously recognized genes. In addition, in some cases, gene panels may have been developed based on pediatric findings, and the fetal phenotype may be different as characteristics may differ or appear only later in gestation. Finally, the discovery of new gene alterations and their association with a disease state continues at a rapid pace and targeted gene panels may quickly become outdated.
Gene sequencing identifies the nucleotides and their order in a segment of DNA. The introduction of next-generation sequencing (NGS) (Sanger sequencing being first generation) allowed massively parallel sequencing of millions of DNA segments rapidly and relatively inexpensively. However, the amount of data generated can be enormous, and generating it is only the first step of the analysis. The second step, bioinformatics analysis, is necessary to interpret the results and uses computer algorithms, public databases, expert opinion, and sometimes functional analysis (detailed below). Following this assessment, each identified DNA sequence change is categorized based on recommendations of the American College of Medical Genetics and Genomics (ACMG) into one of five categories: (1) pathogenic, (2) likely pathogenic, (3) uncertain significance, (4) likely benign, and (5) benign. The previously used terms “mutation” and “polymorphism” have been largely replaced by the five categories of genetic variant. What was previously referred to as a “mutation” is now most often considered a “pathogenic” or “likely pathogenic” variant, and a polymorphism should be called a “benign variant.” NGS can be used to evaluate specific genes, a portion of the genome, or the whole genome.
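The five-category scheme and its relationship to the older vocabulary can be written down as a small Python data structure. The class and dictionary names are invented for illustration; only the five category labels and the mapping of the legacy terms come from the text above, and the sketch does not attempt to reproduce the evidence criteria used to assign a category.

```python
from enum import Enum


class VariantClass(Enum):
    """The five ACMG categories, ordered from most to least concerning."""
    PATHOGENIC = 1
    LIKELY_PATHOGENIC = 2
    UNCERTAIN_SIGNIFICANCE = 3
    LIKELY_BENIGN = 4
    BENIGN = 5


# Legacy terms and the categories that most often replace them.
LEGACY_TERMS = {
    "mutation": (VariantClass.PATHOGENIC, VariantClass.LIKELY_PATHOGENIC),
    "polymorphism": (VariantClass.BENIGN,),
}

for term, categories in LEGACY_TERMS.items():
    labels = " or ".join(c.name.replace("_", " ").lower() for c in categories)
    print(f"{term} -> {labels} variant")
```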