Chapter Contents
The nature and structure of a gene
Decoding the information in DNA
Chromosomes and cell division
Chromosome analysis
Patterns of inheritance
Autosomal dominant inheritance
Autosomal recessive inheritance
X-linked recessive inheritance
Polygenic inheritance
Other forms of inheritance
Molecular genetic analysis for single gene disorders
New diagnostic genetic technology
The nature and structure of a gene
Genetics is traditionally defined as the science of biological variation, and has been a scientific discipline for over a century. Human genetics makes up a large part of the field of genetics, but the principal laws of genetics are universal and apply equally to all species, including humans. Mendel’s studies in the 19th century were originally felt to have no relevance to humans, and it is only in retrospect that their importance can be seen. Many of the principles of genetics were discovered through the study of smaller organisms, such as bacteria, yeast and fruit flies. The basic genetic mechanisms of cell division, development and differentiation happen in the same way in widely divergent species. Therefore, it is impossible to look at human genetics in isolation and there are large amounts of information from lower species which have a bearing on human disorders. The study of the genetics of small organisms has had a profound impact on our understanding of human development, and of how human diseases develop. It is likely that such basic science will continue to contribute significantly to the understanding of human genetic disease. In this chapter, I hope to outline the basic elements of genetics, and describe the types of genetic tests now available to help in neonatal diagnosis.
The basic unit of inheritance for any species is the gene. The original concept of a gene arose long before the relationship between genes and nucleic acids was ever understood. A gene was considered to be a stable heritable element which conferred a particular property or phenotype on an individual organism. This element was passed on to subsequent generations of a particular species, and the nature of the phenotype varied according to the nature of the gene. The concept of dominant and recessive traits, which will be discussed below, was derived from studies of inheritance patterns, long before the molecular basis of the gene was understood.
A gene can also be considered in another way, as a specific length of deoxyribonucleic acid (DNA) which encodes a particular function, in most cases the synthesis of a protein. This also is a stable heritable unit. Each nucleated cell in an organism, regardless of its function, has the entire set of genes for that particular organism, but only a proportion of those genes will be active. DNA is found in the cell nucleus as a double helix (Fig. 8.1).
Each strand of the double helix has a backbone of alternating phosphate and deoxyribose sugar molecules, with each phosphate group linking the 5’ carbon of one sugar to the 3’ carbon of the next. Attached to each sugar molecule, lying within the helix, is one of four nitrogen-containing bases. Two of these bases, adenine (A) and guanine (G), are purines, and two are the smaller pyrimidines, cytosine (C) and thymine (T). The A and T bases pair together by hydrogen bonding, and the G and C bases similarly pair by hydrogen bonds (Fig. 8.2). The two strands of the double helix are held together by these A–T and G–C pairs between opposite strands. By convention, a DNA strand is read in one direction only, from 5’ (left hand) to 3’ (right hand). The two strands run in opposite directions and are complementary to each other, so the sequence of one strand can be predicted from its opposite. If one strand reads 5’-CAGCGTA-3’, then the opposite strand must read 5’-TACGCTG-3’. The double-stranded sequence would then be written as below:
5’-CAGCGTA-3’
3’-GTCGCAT-5’
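The pairing rule can be checked mechanically. The following is a minimal Python sketch, not part of the original text; the function names `complement` and `reverse_complement` are illustrative choices. It derives the opposite strand of the example sequence from the A–T and G–C pairing rules alone.

```python
# Watson-Crick pairing rules: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def complement(strand: str) -> str:
    """Pair each base in place; if the input reads 5'->3', the result reads 3'->5'."""
    return "".join(PAIRS[base] for base in strand)


def reverse_complement(strand: str) -> str:
    """Return the opposite strand written in the conventional 5'->3' direction."""
    return complement(strand)[::-1]


top = "CAGCGTA"
print("5'-" + top + "-3'")
print("3'-" + complement(top) + "-5'")
print("Opposite strand, 5'->3':", reverse_complement(top))
```

Reversing the paired sequence is what turns the 3’ to 5’ reading of the lower strand into the 5’ to 3’ form quoted in the text.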
The simplicity of the double-helix structure allows for several important functions for DNA.
First, huge amounts of information can be stored in the DNA strand. If a molecule of DNA is 1 million bases long, then there are 4^1 000 000 (4 to the power of 1 000 000) possible sequences for that stretch of DNA. A genome is the complete DNA sequence of an organism. In humans, the estimated genome size is 3 × 10^9 basepairs (bp). The draft DNA sequence of the entire human genome has been completed, which is a major milestone in human scientific development. It is estimated that the genome has between 22 000 and 25 000 genes. However, despite the DNA sequence being available, the detailed function of many of these genes remains unknown. The next step after the sequencing of the human genome is the understanding of the complexities of all the human genes. The routine practical clinical application of the knowledge of the human genome is still some way off.
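To give a feel for the scale of these figures, the short Python sketch below estimates how many decimal digits the number of possible million-base sequences has, and how much raw storage a genome of 3 × 10^9 bp would need. It rests on one assumption that is not in the text: each base is treated as one of four equally likely symbols, which is 2 bits of information. The function names are illustrative.

```python
import math


def digits_in_sequence_count(n_bases: int) -> int:
    """Decimal digits in 4**n_bases, found with logarithms to avoid building the number."""
    return math.floor(n_bases * math.log10(4)) + 1


def raw_storage_bytes(n_bases: int) -> float:
    """Bytes needed at 2 bits per base (four possible bases = 2 bits), 8 bits per byte."""
    return n_bases * 2 / 8


print(digits_in_sequence_count(1_000_000), "digits in the count of possible sequences")
print(raw_storage_bytes(3 * 10**9) / 1e6, "megabytes for one copy of the genome")
```

On this simple assumption, one copy of the genome sequence occupies roughly 750 megabytes, while the number of possible sequences for even a million-base stretch runs to several hundred thousand digits.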
Second, the double helix provides a framework for DNA replication. One strand of DNA acts as a template for the synthesis of a new strand. The double helix unwinds, allowing DNA replication enzymes access to the template strand of DNA. The replication system builds a new strand of DNA based on the template. The new double helix formed as a result will contain one original strand and a newly synthesised complementary second strand. This is the basic mechanism of DNA replication in all species.
Third, the double helix provides a basis for repair of damaged DNA. A damaged base can be replaced, knowing its complementary base is present on the opposite strand. Damage to the sugar–phosphate backbone can also be repaired using the opposite strand as a template.
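Both replication and repair rely on the same idea: one strand carries enough information to rebuild the other. The Python sketch below illustrates this under simplifying assumptions that are not in the text. Each strand is a string written 5’ to 3’, a helix is a pair of such strings, and a damaged base is marked with the placeholder letter `N`. It is a toy model of the logic, not of the enzymes involved.

```python
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}


def opposite_strand(strand: str) -> str:
    """Synthesise the complementary strand, written 5'->3'."""
    return "".join(PAIRS[base] for base in reversed(strand))


def replicate(helix: tuple[str, str]) -> list[tuple[str, str]]:
    """Each daughter helix keeps one original strand and gains one new strand."""
    top, bottom = helix
    return [(top, opposite_strand(top)), (opposite_strand(bottom), bottom)]


def repair(damaged: str, intact_opposite: str) -> str:
    """Replace each 'N' using the base paired with it on the intact opposite strand."""
    n = len(damaged)
    return "".join(
        # Position i on one strand pairs with position n - 1 - i on the other,
        # because the two strands run in opposite directions.
        PAIRS[intact_opposite[n - 1 - i]] if base == "N" else base
        for i, base in enumerate(damaged)
    )


parent = ("CAGCGTA", "TACGCTG")
for daughter in replicate(parent):
    print(daughter)
print(repair("CAGNGTA", "TACGCTG"))
```

Both daughter helices come out identical to the parent, and the damaged strand is restored to its original sequence.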
Decoding the information in DNA
About 90% of the DNA in the human genome does not code for any specific property; only about 10% of the genome actually contains coding information in the form of a gene. In simple terms, the genetic code in DNA is transcribed into a molecule called messenger RNA (mRNA). The mRNA is then translated into a protein, which carries out the function encoded by the specific DNA.
A gene has several distinct elements (Fig. 8.3). The major part of the gene is divided into coding regions, called exons, and non-coding regions, called introns. Just before (5’ of) the first exon, there is a promoter which indicates where transcription of the gene should start. There can be several promoters for one gene, and different promoters can be used according to the tissue in which the gene is being expressed; in other words, promoter use can be tissue-specific. Further 5’ of the promoter there can also be enhancers or suppressors, which can increase or decrease the level of transcription of the gene. Not all of the mRNA codes for protein: parts of some exons are transcribed into mRNA but are not translated. These areas, known as untranslated regions, can be either at the start (5’) or the end (3’) of the mRNA.
To express the DNA code, mRNA is used. There are several different types of RNA, but mRNA is the most important in decoding DNA. There are three differences between RNA and DNA. First, the sugar in the RNA backbone is ribose rather than deoxyribose. Second, mRNA exists as a single strand and is less stable than DNA. Third, in RNA, the base uracil (U) is used instead of thymine, whereas the other three bases remain the same.
The DNA code in most genes is expressed as a protein, which is a peptide made of the building blocks of individual amino acids. Each amino acid is coded for by a sequence of three DNA bases, known as a codon. For some amino acids, there is more than one codon ( Table 8.1 ). A long series of DNA codons in a gene will thus code for an entire protein. The mRNA codons coding for amino acids are identical to DNA codons, with the substitution of uracil (U) for thymine (T). There is a tightly controlled mechanism for the generation of protein from a DNA template.
First position | Second position U | Amino acid | Second position C | Amino acid | Second position A | Amino acid | Second position G | Amino acid | Third position
---|---|---|---|---|---|---|---|---|---
U | UUU | Phe | UCU | Ser | UAU | Tyr | UGU | Cys | U
U | UUC | Phe | UCC | Ser | UAC | Tyr | UGC | Cys | C
U | UUA | Leu | UCA | Ser | UAA | Stop | UGA | Stop | A
U | UUG | Leu | UCG | Ser | UAG | Stop | UGG | Trp | G
C | CUU | Leu | CCU | Pro | CAU | His | CGU | Arg | U
C | CUC | Leu | CCC | Pro | CAC | His | CGC | Arg | C
C | CUA | Leu | CCA | Pro | CAA | Gln | CGA | Arg | A
C | CUG | Leu | CCG | Pro | CAG | Gln | CGG | Arg | G
A | AUU | Ile | ACU | Thr | AAU | Asn | AGU | Ser | U
A | AUC | Ile | ACC | Thr | AAC | Asn | AGC | Ser | C
A | AUA | Ile | ACA | Thr | AAA | Lys | AGA | Arg | A
A | AUG | Met | ACG | Thr | AAG | Lys | AGG | Arg | G
G | GUU | Val | GCU | Ala | GAU | Asp | GGU | Gly | U
G | GUC | Val | GCC | Ala | GAC | Asp | GGC | Gly | C
G | GUA | Val | GCA | Ala | GAA | Glu | GGA | Gly | A
G | GUG | Val | GCG | Ala | GAG | Glu | GGG | Gly | G
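The codon table is regular enough to be built programmatically, which also makes it easy to count how many codons specify each amino acid. The Python sketch below is an illustration only. The names `BASES`, `AMINO_ACIDS` and `CODON_TABLE` are arbitrary, and the amino acids are listed in the table's own order: first position, then second, then third, each running U, C, A, G.

```python
from collections import Counter

BASES = "UCAG"

# Amino acids in table order: first base varies slowest, third base fastest.
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()

CODON_TABLE = {
    first + second + third: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, first in enumerate(BASES)
    for j, second in enumerate(BASES)
    for k, third in enumerate(BASES)
}

# How many different codons specify each amino acid (or a stop signal)?
codons_per_amino_acid = Counter(CODON_TABLE.values())

print(CODON_TABLE["AUG"])
for name, count in sorted(codons_per_amino_acid.items()):
    print(name, count)
```

The counts show that methionine and tryptophan each have a single codon, whereas leucine, serine and arginine each have six.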
To decode a gene into protein, the DNA is first transcribed into mRNA. One strand of the DNA double helix (the template, or ‘antisense’, strand) is used by the enzyme RNA polymerase to synthesise a complementary strand of RNA, which therefore has the same sequence as the ‘sense’ strand, with U in place of T. Transcription runs from the 5’ end of the first exon of the gene to the end of the most 3’ exon. The intervening introns are initially included, and this first molecule is known as pre-mRNA. The intronic RNA sequences are then spliced out and a 3’ polyadenine (poly(A)) tail is added, producing mature mRNA. The mature mRNA is then exported from the nucleus to ribosomes in the cytoplasm, to be used as a template for the production of protein. The mature mRNA has both 5’ and 3’ untranslated regions.
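The steps from gene to mature mRNA can be followed on a toy example. In the Python sketch below, everything specific is invented for illustration: the 30-base gene sequence, the exon coordinates and the tail length of 10. The strand that RNA polymerase reads is called the template here and is derived from the strand that matches the mRNA.

```python
DNA_PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_PAIRS = {"A": "U", "T": "A", "G": "C", "C": "G"}  # RNA uses U opposite A

# Hypothetical gene, written 5'->3' on the strand that matches the mRNA:
# exon 1 (9 bases), one intron (12 bases), exon 2 (9 bases).
GENE = "ACCATGGCT" + "GTAAGTTTTCAG" + "AAATGACCT"
EXONS = [(0, 9), (21, 30)]  # half-open (start, end) positions within GENE


def template_strand(strand: str) -> str:
    """Base-pair the given strand position by position (result reads 3'->5')."""
    return "".join(DNA_PAIRS[base] for base in strand)


def transcribe(template_3_to_5: str) -> str:
    """Build pre-mRNA complementary to the template; introns are still included."""
    return "".join(RNA_PAIRS[base] for base in template_3_to_5)


def splice(pre_mrna: str, exons: list[tuple[int, int]]) -> str:
    """Keep only the exon segments, joined in order."""
    return "".join(pre_mrna[start:end] for start, end in exons)


def add_poly_a_tail(mrna: str, tail_length: int = 10) -> str:
    """Append a run of A residues to the 3' end."""
    return mrna + "A" * tail_length


pre_mrna = transcribe(template_strand(GENE))
mature_mrna = add_poly_a_tail(splice(pre_mrna, EXONS))
print(pre_mrna)
print(mature_mrna)
```

The pre-mRNA has the same sequence as the original strand with U in place of T, and the mature mRNA is shorter because the intron has been removed.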
Protein synthesis does not begin at the 5’ end of the mRNA, but at the first 5’ AUG codon, which codes for the amino acid methionine. Translation stops at the first in-frame stop codon (UAA, UAG or UGA) thereafter (see Fig. 8.3). At the ribosome, amino acid-specific adaptor molecules, called transfer RNAs (tRNAs), each carry a molecule of their specific amino acid. Each tRNA has an anticodon, which is complementary to the mRNA codon for that amino acid. Using its anticodon, the tRNA binds the specific mRNA codon for its amino acid. By a complex machinery, the amino acid is then added to a growing peptide chain which will eventually form the mature protein (Fig. 8.4). The 5’ end of the mRNA corresponds to the NH2 (amino) terminus of the protein, and the 3’ end of the mRNA corresponds to the COOH (carboxyl) terminus. Many proteins in higher species are modified after translation, for example by the addition of phosphate or lipid groups.
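The reading rule described here (start at the first AUG, read three bases at a time, stop at the first stop codon in that frame) can be sketched in a few lines of Python. The example mRNA is hypothetical, and the codon table is rebuilt from the standard genetic code so that the block runs on its own. The sketch ignores everything the ribosome and tRNAs actually do and models only the decoding logic.

```python
BASES = "UCAG"
AMINO_ACIDS = (
    "Phe Phe Leu Leu Ser Ser Ser Ser Tyr Tyr Stop Stop Cys Cys Stop Trp "
    "Leu Leu Leu Leu Pro Pro Pro Pro His His Gln Gln Arg Arg Arg Arg "
    "Ile Ile Ile Met Thr Thr Thr Thr Asn Asn Lys Lys Ser Ser Arg Arg "
    "Val Val Val Val Ala Ala Ala Ala Asp Asp Glu Glu Gly Gly Gly Gly"
).split()
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(mrna: str) -> list[str]:
    """Return the amino acids from the first AUG up to the first in-frame stop codon."""
    start = mrna.find("AUG")
    if start == -1:
        return []  # no start codon, so nothing is translated
    peptide = []
    # Step through complete codons only, in the reading frame set by the AUG.
    for i in range(start, len(mrna) - 2, 3):
        residue = CODON_TABLE[mrna[i:i + 3]]
        if residue == "Stop":
            break
        peptide.append(residue)
    return peptide


# Hypothetical mRNA: 5' untranslated region, AUG GCU AAA UGA, 3' region and poly(A) tail.
example_mrna = "ACC" + "AUGGCUAAAUGA" + "CCU" + "AAAAAAAAAA"
print("-".join(translate(example_mrna)))
```

The first residue in the list corresponds to the amino terminus of the protein, and the bases before the AUG and after the stop codon are the untranslated regions.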
Chromosomes and cell division
The first coiling of DNA is in the form of the double helix. However, there are subsequent higher orders of coiling and packaging of DNA. In the first order, about 146 bp of DNA is wound around a core of histone proteins; this complex is known as a nucleosome. The highest order of coiling of a large DNA molecule, with its associated histones and other proteins, is known as a chromosome.
A chromosome consists of one very long double helix of DNA, containing very many genes in millions of basepairs. Humans are diploid; that is, they have two copies of every chromosome. The normal human chromosome complement is 46, made up of 22 pairs of autosomes (non-sex chromosomes) and two sex chromosomes, either X and Y in a male, or X and X in a female. Each member of a pair of autosomes carries the same set of genes, although the two copies of a gene may differ in sequence. The pair of X chromosomes in a female likewise carry the same set of genes, but the X and Y chromosomes in a male have only a small number of genes in common. A normal human metaphase karyotype is shown in Figure 8.5.