Key Points
- Asthma and atopy are examples of complex genetic diseases that, despite a strong genetic component, do not exhibit simple Mendelian inheritance.
- The many genes involved have ‘mild’ mutations with small phenotypic effects that combine to influence disease phenotype.
- Numerous genes have been identified that are associated with asthma, atopy, atopic dermatitis and allergic rhinitis. Recent advances have largely been due to improvements in whole genome approaches.
- Research has now moved on to the modifying effects of environment on these genetic susceptibilities including the role of epigenetic changes.
- The hope is that we are now moving into an era of clinical application of these genetic findings such as the use of pharmacogenetics to tailor asthma treatment.
Since the first report of linkage between chromosome 11q13 and atopy in 1989, there have been thousands of published studies of the genetics of asthma and other allergic diseases. Their aim is to identify the genetic factors that modify susceptibility to allergic diseases, determine severity of disease in affected individuals and affect the response to treatment. This recent expansion in our knowledge has provided intriguing insights into the pathophysiology of these complex disorders. In this chapter, we outline the approaches used to undertake genetic studies of common diseases such as atopic dermatitis and asthma and provide examples of how these approaches are beginning to reveal new insights into the pathophysiology of allergic diseases.
Why Undertake Genetic Studies of Allergic Disease?
Susceptibility to allergic disease is likely to result from the inheritance of many gene variants but the underlying cellular defects are unknown. By undertaking research into the genetic basis of these conditions, these gene variants and their gene products can be identified solely by the anomalous phenotypes they produce. Identifying the genes that produce these disease phenotypes provides a greater understanding of the fundamental mechanisms of these disorders, stimulating the development of specific new drugs or biologics to both relieve and prevent symptoms. In addition, genetic variants may also influence the response to therapy and the identification of individuals with altered response to current drug therapies will allow optimization of current therapeutic measures (i.e. disease stratification and pharmacogenetics). The study of genetic factors in large longitudinal cohorts with extensive phenotype and environmental information allows the identification of external factors that initiate and sustain allergic diseases in susceptible individuals and the periods of life in which this occurs, with a view to identifying those environmental factors that could be modified for disease prevention or for changing the natural history of the disorder. For example, early identification of vulnerable children would allow targeting of preventative therapy or environmental intervention, such as avoidance of allergen exposure. Genetic screening in early life may eventually become a practical and cost-effective option for allergic disease prevention.
Approaches to Genetic Studies of Complex Genetic Diseases
What is a Complex Genetic Disease?
The use of genetic analysis to identify genes responsible for simple Mendelian traits such as cystic fibrosis has become almost routine in the 30 years since it was recognized that genetic inheritance can be traced with naturally occurring DNA sequence variation. However, many of the most common medical conditions known to have a genetic component to their etiology, including diabetes, hypertension, heart disease, schizophrenia and asthma, have much more complex inheritance patterns.
Complex disorders show a clear hereditary component; however, the mode of inheritance does not follow any simple Mendelian pattern. Furthermore, unlike single-gene disorders, they tend to have an extremely high prevalence. Asthma occurs in at least 10% of children in the UK, and the prevalence of atopy is as high as 40% in some population groups, compared with cystic fibrosis at 1 in 2,000 live white births. Characteristic features of Mendelian diseases are that they are rare and involve mutations in a single gene that are ‘severe’, resulting in large phenotypic effects that may be independent of environmental influences. In contrast, complex disease traits are common and involve many genes, with ‘mild’ mutations leading to small phenotypic effects with strong environmental interactions.
How to Identify Genes Underlying Complex Disease
Before any genetic study of a complex disease can be initiated, a number of different factors need to be considered. These include: (1) assessing the heritability of the disease of interest to establish whether there is indeed a genetic component to it; (2) defining the phenotype (or physical characteristics) to be measured in a population; (3) deciding the size and nature of the population to be studied; (4) determining which genetic markers are to be typed in the DNA samples obtained from the population; (5) deciding how the relationships between the genetic data and the phenotype measures in individuals are to be analyzed; and (6) deciding how the resulting data can be used to identify the genes underlying the disease.
One of the most important considerations in genetic studies of complex disease susceptibility is the choice of the methods of genetic analysis to be used. This choice will both reflect and be reflected in the design of the study. Will the study be a population study or a family-based study? What numbers of subjects will be needed?
Inheritance
The first step in any genetic analysis of a complex disease is to determine whether genetic factors contribute at all to an individual’s susceptibility to disease. The fact that a disease has been observed to ‘run in families’ may reflect common environmental exposures and biased ascertainment, as well as a potential true genetic component. There are a number of approaches that can be taken to determine if genetics contributes to a disease or disease phenotype of interest including family studies, segregation analysis, twin and adoption studies, heritability studies and population-based relative risk to relatives of probands.
There are three main steps involved in the identification of genetic mechanisms for a disease.
1. Determine whether there is familial aggregation of the disease – does the disease occur more frequently in relatives of cases than of controls?
2. If there is evidence for familial aggregation, is this because of genetic effects or other factors such as environmental or cultural effects?
3. If there are genetic factors, which specific genetic mechanisms are operating?
Family studies involve the estimation of the frequency of the disease in relatives of affected, compared with unaffected, individuals. The strength of the genetic effect can be measured as λ_R, the ratio of the risk to relatives of type R (sibs, parents, offspring, etc.) to the population risk (λ_R = κ_R/κ, where κ_R is the risk to relatives of type R and κ is the population risk). The stronger the genetic effect, the higher the value of λ. For example, for a recessive single-gene Mendelian disorder such as cystic fibrosis, the value of λ is about 500; for a dominant disorder such as Huntington’s disease, it is about 5,000. For complex disorders the values of λ are much lower, e.g. 20–30 for multiple sclerosis, 15 for insulin-dependent diabetes mellitus (IDDM), and 4 to 5 for Alzheimer’s disease. It is important to note, though, that λ is a function of both the strength of the genetic effect and the frequency of the disease in the population. Therefore, if a disease has a λ value of 3 to 4 it does not mean that genes are less important in that trait than in a trait with a λ of 30 to 40. A strong effect in a very common disease will have a smaller λ than the same strength of effect in a rare disease.
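The dependence of λ on disease frequency can be made concrete with a small calculation. The following Python sketch computes the relative risk ratio as the risk to relatives divided by the population risk, together with the ceiling on λ that follows from the fact that a risk cannot exceed 1. The function names and all risk figures are hypothetical, chosen only for illustration, and are not estimates for any real disorder.

```python
def recurrence_risk_ratio(risk_to_relatives: float, population_risk: float) -> float:
    """Return lambda_R = kappa_R / kappa for one class of relatives."""
    if not 0 < population_risk <= 1:
        raise ValueError("population_risk must be in (0, 1]")
    if not 0 <= risk_to_relatives <= 1:
        raise ValueError("risk_to_relatives must be in [0, 1]")
    return risk_to_relatives / population_risk


def max_possible_lambda(population_risk: float) -> float:
    """Upper bound on lambda_R: the risk to relatives can be at most 1."""
    return recurrence_risk_ratio(1.0, population_risk)


# Hypothetical figures, for illustration only.
rare = recurrence_risk_ratio(risk_to_relatives=0.02, population_risk=0.001)
common = recurrence_risk_ratio(risk_to_relatives=0.30, population_risk=0.10)

print(f"rare disease:   lambda = {rare:.1f}")
print(f"common disease: lambda = {common:.1f}")
print(f"ceiling on lambda at 10% prevalence: {max_possible_lambda(0.10):.1f}")
```

In this made-up example the common disease has the smaller ratio even though the absolute risk to relatives is far higher, and a disease affecting 10% of the population can never reach a ratio above 10.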
Determining the relative contribution of common genes versus common environment to clustering of disease within families can be undertaken using twin studies where the concordance of a trait in monozygotic and dizygotic twins is assessed. Monozygotic twins have identical genotypes, whereas dizygotic twins share, on average, only one half of their genes. In both cases, they share the same childhood environment. Therefore, a disease that has a genetic component is expected to show a higher rate of concordance in monozygotic than in dizygotic twins. Another approach used to disentangle the effects of nature versus nurture in a disease is in adoption studies, where, if the disease has a genetic basis, the frequency of the disease should be higher in biologic relatives of probands than in their adopted family.
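As a minimal sketch of the twin comparison, the Python snippet below computes pairwise concordance (the fraction of twin pairs with at least one affected member in which both are affected) separately for monozygotic and dizygotic pairs. Pairwise concordance is only one of several concordance measures, and the pair counts are invented for illustration.

```python
def pairwise_concordance(both_affected: int, one_affected: int) -> float:
    """Fraction of pairs with at least one affected twin in which both are affected."""
    total = both_affected + one_affected
    if total == 0:
        raise ValueError("no twin pairs with an affected member")
    return both_affected / total


# Hypothetical counts of twin pairs, for illustration only.
mz = pairwise_concordance(both_affected=30, one_affected=70)
dz = pairwise_concordance(both_affected=10, one_affected=90)

print(f"MZ concordance: {mz:.2f}")
print(f"DZ concordance: {dz:.2f}")
if mz > dz:
    print("Higher concordance in MZ twins is consistent with a genetic component.")
```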
Once familial aggregation with a probable genetic etiology for a disease has been established, the mode of inheritance can be determined by observing the pattern of inheritance of a disease or trait and how it is distributed within families. For example, is there evidence of a single major gene and is it dominantly or recessively inherited? Segregation analysis is the most established method for this purpose. The observed frequency of a trait in offspring and siblings is compared with the distribution expected with various modes of inheritance. If the distribution is significantly different than predicted, that model is rejected. The model that cannot be rejected is therefore considered the most likely. However, for complex disease, it is often difficult to undertake segregation analysis, because of the multiple genetic and environmental effects making any one model hard to determine. This has implications for the methods of analysis of genetic data in studies, because some methods, such as the parametric logarithm (base 10) of odds (LOD) score approach, require a model to be defined to obtain estimates of parameters such as gene frequency and penetrance (see Approaches to analysis).
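The comparison of observed and expected trait frequencies can be illustrated with a simple chi-square goodness-of-fit test. This is a deliberately simplified sketch: formal segregation analysis typically uses likelihood-based models and corrects for the way families were ascertained, neither of which is attempted here. The offspring counts and the two candidate segregation ratios are hypothetical.

```python
CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value for 1 df at alpha = 0.05


def segregation_chi_square(affected: int, unaffected: int, expected_ratio: float) -> float:
    """Chi-square statistic comparing observed offspring counts with a model ratio."""
    if not 0 < expected_ratio < 1:
        raise ValueError("expected_ratio must be strictly between 0 and 1")
    total = affected + unaffected
    if total == 0:
        raise ValueError("no offspring observed")
    expected_affected = total * expected_ratio
    expected_unaffected = total * (1 - expected_ratio)
    return ((affected - expected_affected) ** 2 / expected_affected
            + (unaffected - expected_unaffected) ** 2 / expected_unaffected)


# Hypothetical data: 28 affected and 72 unaffected offspring.
models = {"segregation ratio 1/4": 0.25, "segregation ratio 1/2": 0.5}
for name, ratio in models.items():
    stat = segregation_chi_square(28, 72, ratio)
    verdict = "rejected" if stat > CHI2_CRITICAL_1DF_5PCT else "not rejected"
    print(f"{name}: chi-square = {stat:.2f} ({verdict})")
```

With these invented counts the model predicting one quarter of offspring affected is not rejected, whereas the model predicting one half is.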
Phenotype
Studies of a genetic disorder require that a phenotype be defined, to which genetic data are compared. Phenotypes can be classified in two ways. They may be complex, such as asthma or atopy, and are likely to involve the interaction of a number of genes. Alternatively, intermediate phenotypes may be used, such as bronchial hyperresponsiveness (BHR) and eosinophilia for asthma and serum immunoglobulin E (IgE) levels and specific IgE responsiveness or positive skin prick tests to particular allergens for atopy. Together, these phenotypes contribute to an individual’s expression of the overall complex disease phenotype but are likely to involve the interaction of fewer genetic influences, thus increasing the chances of identifying specific genetic factors predisposing toward the disease. Phenotypes may also be discrete or qualitative, such as the presence or absence of wheeze, atopy and asthma, or quantitative. Quantitative phenotypes, such as blood pressure (mm Hg), lung function measures (e.g. FEV1) and serum IgE levels, can be measured as continuous variables. With quantitative traits, no arbitrary cut-off point has to be assigned. This is an important advantage, because the clinical criteria used to define an affected or an unaffected phenotype may not reflect whether an individual is a gene carrier or not. In addition, the use of quantitative phenotypes allows the use of alternative methods of genetic analysis that, in some situations, can be more powerful. Cluster analysis has been used to identify individual phenotypic expressions of asthma in a population sample.
Population
Having established that the disease or phenotype of interest does have a genetic component to its etiology, the next step is to recruit a study population in which to undertake genetic analyses to identify the gene(s) responsible. The type and size of study population recruited depend heavily on a number of interrelated factors, including the epidemiology of the disease, the method of genetic epidemiologic analysis being used, and the class of genetic markers genotyped. For example, the recruitment of families is necessary to undertake linkage analysis, whereas association studies are better suited to either a randomly selected or case-control cohort. In family-based linkage studies, the age of onset of a disease will determine whether it is practical to collect multigenerational families or affected sib pairs for analysis. Equally, if a disease is rare, then actively recruiting cases and matched controls will be a more practical approach compared to recruiting a random population that would need to be very large to have sufficient power.
Genetic Markers
Genetic markers can be any identifiable site within the genome (locus) where the DNA sequence is variable (polymorphic) between individuals. The most common genetic markers used for linkage analysis are microsatellite markers comprising short lengths of DNA consisting of repeats of a specific sequence (e.g. (CA)n). The number of repeats varies between individuals, thus providing polymorphic markers that can be used in genetic analysis to follow the transmission of a chromosomal region from one generation to the next. Single-nucleotide polymorphisms (SNPs) are the simplest class of polymorphism in the genome, resulting from a single base substitution: for example, cytosine substituted for thymine. SNPs are much more frequent than microsatellites in the human genome, occurring in introns, exons, promoters and intergenic regions, with several million SNPs now having been identified and mapped. Another source of variation in the human genome that has recently been recognized to be present to a much greater extent than was previously thought is copy number variations (CNVs). CNVs are either a deletion or insertion of a large piece of DNA sequence; CNVs can contain whole genes and therefore are correlated with gene expression in a dose-dependent manner. Sequencing of an individual human genome revealed that non-SNP variation (which includes CNVs) made up 22% of all variation in that individual but involved 74% of all variant DNA bases in that genome.
Approaches to Analysis
Linkage analysis involves proposing a model to explain the inheritance pattern of phenotypes and genotypes observed in a pedigree. Linkage is evident when a gene that produces a phenotypic trait and its surrounding markers are co-inherited. In contrast, those markers not associated with the anomalous phenotype of interest will be randomly distributed among affected family members as a result of the independent assortment of chromosomes and crossing over during meiosis. In complex disease, non-parametric linkage approaches, such as allele sharing, are usually used. Allele-sharing methods test whether the inheritance pattern of a particular chromosomal region is not consistent with random Mendelian segregation by showing that pairs of affected relatives inherit identical copies of the region more often than would be expected by chance. While family-based analysis utilizing linkage analysis or allele-sharing methods was the mainstay of gene identification for monogenic diseases in the past, it has been largely superseded for analysis of common disease by the use of genome-wide association studies (for common variants) and next-generation sequencing of whole or partial (e.g. protein-coding fraction or exome) individual genomes.
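One simple allele-sharing statistic is the affected sib-pair mean test, sketched below in Python. Under random Mendelian segregation, a sib pair shares 0, 1 or 2 alleles identical by descent (IBD) at a locus with probabilities 1/4, 1/2 and 1/4, so the expected proportion of alleles shared is 0.5 with a per-pair variance of 1/8. The sketch assumes that IBD status is known without ambiguity for every pair, which is rarely true in practice, and the counts are invented for illustration.

```python
import math


def sib_pair_mean_test(ibd0: int, ibd1: int, ibd2: int) -> tuple[float, float]:
    """Return (mean proportion of alleles shared IBD, z statistic against 0.5)."""
    n = ibd0 + ibd1 + ibd2
    if n == 0:
        raise ValueError("no sib pairs supplied")
    # A pair sharing 1 allele IBD contributes 0.5; a pair sharing 2 contributes 1.
    mean_sharing = (0.5 * ibd1 + 1.0 * ibd2) / n
    # Null variance of the per-pair proportion is 1/8, so the mean has variance 1/(8n).
    z = (mean_sharing - 0.5) / math.sqrt(1.0 / (8.0 * n))
    return mean_sharing, z


# Hypothetical counts of affected sib pairs sharing 0, 1 and 2 alleles IBD.
mean_sharing, z = sib_pair_mean_test(ibd0=20, ibd1=50, ibd2=30)
print(f"mean IBD sharing = {mean_sharing:.2f}, z = {z:.2f}")
```

A one-sided test is the natural choice here, because linkage predicts excess sharing among affected relatives rather than a deficit.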
Association studies do not examine inheritance patterns of alleles; rather, they are case-control studies based on a comparison of allele frequencies between groups of affected and unaffected individuals from a population. The strength of the association is then expressed as an odds ratio: the odds of carrying the allele among affected individuals divided by the odds of carrying it among unaffected individuals. The greatest problem in association studies is the selection of a suitable control group to compare with the affected population group. Although association studies can be performed with any random DNA polymorphism, they have the most significance when applied to polymorphisms that have functional consequences in genes relevant to the trait (candidate genes).
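A minimal sketch of the case-control comparison is given below: an allelic odds ratio computed from a 2×2 table of allele counts, with an approximate 95% confidence interval obtained from the standard error of the log odds ratio (Woolf's method). The allele counts are hypothetical, and a real analysis would also need to address genotyping quality and the choice of controls discussed above.

```python
import math


def allelic_odds_ratio(case_a1: int, case_a2: int, control_a1: int, control_a2: int):
    """Return (odds ratio, lower 95% limit, upper 95% limit) for allele A1 versus A2."""
    counts = (case_a1, case_a2, control_a1, control_a2)
    if min(counts) <= 0:
        raise ValueError("all four allele counts must be positive")
    odds_ratio = (case_a1 * control_a2) / (case_a2 * control_a1)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(sum(1.0 / c for c in counts))
    lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
    upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical allele counts (two alleles per person), for illustration only.
or_, lo, hi = allelic_odds_ratio(case_a1=120, case_a2=280, control_a1=80, control_a2=320)
print(f"OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```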
It is important to remember with association studies that there are a number of reasons leading to an association between a phenotype and a particular allele:
- A positive association between the phenotype and the allele will occur if the allele is the cause of, or contributes to, the phenotype. This association would be expected to be replicated in other populations with the same phenotype, unless there are several different alleles at the same locus contributing to the same phenotype, in which case association would be difficult to detect, or if the trait was predominantly the result of different genes in the other population (genetic heterogeneity).
- Positive associations may also occur between an allele and a phenotype if that particular allele is in linkage disequilibrium (LD) with the phenotype-causing allele. That is, the allele tends to occur on the same parental chromosome that also carries the trait-causing mutation more often than would be expected by chance. Linkage disequilibrium will occur when most causes of the trait are the result of relatively few ancestral mutations at a trait-causing locus and the allele is present on one of those ancestral chromosomes and lies close enough to the trait-causing locus that the association between them has not been eroded away through recombination between chromosomes during meiosis. LD is the non-random association of adjacent polymorphisms on a single strand of DNA in a population; the allele of one polymorphism in an LD block (haplotype) can predict the allele of adjacent polymorphisms (one of which could be the causal variant).
- Positive association between an allele and a trait can also be artefactual as a result of recent population admixture. In a mixed population, any trait present in a higher frequency in a subgroup of the population (e.g. an ethnic group) will show positive association with an allele that also happens to be more common in that population subgroup. Thus, to avoid spurious association arising through admixture, studies should be performed in large, relatively homogeneous populations. An alternative method to test for association in the presence of linkage is the ‘transmission test for linkage disequilibrium’ (transmission/disequilibrium test [TDT]). The TDT uses families with at least one affected child, and the transmission of the associated marker allele from a heterozygous parent to an affected offspring is evaluated. If a parent is heterozygous for an associated allele A1 and a non-associated allele A2, then A1 should be passed on to the affected child more often than A2.
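The transmission/disequilibrium test can be sketched in a few lines. In its basic form the statistic is a McNemar-type chi-square, (b − c)²/(b + c) with one degree of freedom, where b and c are the numbers of times heterozygous parents transmitted A1 and A2, respectively, to an affected child. The transmission counts below are hypothetical.

```python
CHI2_CRITICAL_1DF_5PCT = 3.841  # chi-square critical value for 1 df at alpha = 0.05


def tdt_statistic(a1_transmitted: int, a2_transmitted: int) -> float:
    """Basic TDT chi-square from transmissions by A1/A2 heterozygous parents."""
    total = a1_transmitted + a2_transmitted
    if total == 0:
        raise ValueError("no informative (heterozygous) parental transmissions")
    return (a1_transmitted - a2_transmitted) ** 2 / total


# Hypothetical counts: heterozygous parents passed A1 to an affected child
# 60 times and A2 40 times.
stat = tdt_statistic(a1_transmitted=60, a2_transmitted=40)
print(f"TDT chi-square = {stat:.2f}")
print("excess transmission of A1" if stat > CHI2_CRITICAL_1DF_5PCT else "no evidence of distortion")
```

Because only transmissions within families are compared, the test does not rely on an external control group.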
However, advances in array-based SNP genotyping technologies and haplotype mapping of the human genome mean genome-wide association studies (GWAS) have revolutionized the study of genetic factors in complex common disease over the last decade. For more than 150 phenotypes – from common diseases to physiological measurements such as height and BMI and biological measurements such as circulating lipid levels and blood eosinophil levels – GWAS have provided compelling statistical associations for thousands of different loci in the human genome and are now the method of choice for identification of genetic variants influencing physiological or disease phenotypes.
Identify Gene
If, as in most complex disorders, the exact biochemical or physiologic basis of the disease is unknown, there are three main approaches to finding the disease gene(s). One method is to test markers randomly spaced throughout the entire genome for linkage with the disease phenotype. If linkage is found between a particular marker and the phenotype, then further typing of genetic markers including SNPs and association analysis will enable the critical region to be further narrowed. The genes positioned in this region can be examined for possible involvement in the disease process and the presence of disease-causing mutations in affected individuals. This approach is often termed positional cloning, or genome scanning if the whole genome is examined in this manner. Although this approach requires no assumptions to be made as to the particular gene involved in genetic susceptibility to the disease in question, it does require considerable molecular genetic analysis to be undertaken in large family cohorts, involving considerable time, resource and expense.
As noted above, this approach has now been superseded by genome-wide association studies, which use SNPs evenly spaced throughout the genome as an assumption-free approach to locating genes involved in disease pathogenesis. Because GWAS test very large data sets, with up to one million SNPs, for association, stringent methods for genotype calling, quality control, correction for population stratification (genomic controls) and statistical analysis have been developed to handle such data. Studies start by reporting single-marker analyses of the primary outcome; SNPs are considered to be strongly associated if their P-values fall below the 1% false discovery rate (FDR) threshold, and weakly associated if they fall between the 1% and 5% FDR thresholds. A cluster of P-values below the 1% FDR from SNPs in one chromosomal location is defined as the region of ‘maximal association’ and is the first candidate gene region to examine further, with analysis of secondary outcome measures, gene database searches, fine mapping to find the causal locus and replication in other cohorts/populations. It is unlikely that the SNP showing the strongest association will be the causal locus, as SNPs are chosen to provide maximal coverage of variation in that region of the genome and not on the basis of biological function. Therefore, GWAS will often include fine mapping/haplotype analysis of the region with the aim of identifying the causal locus. If linkage disequilibrium prevents the identification of a specific gene in a haplotype block, then it may be necessary to utilize different racial and ethnic populations to home in on the causative candidate gene that accounts for the genetic signal in GWAS.
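The FDR thresholds mentioned above can be illustrated with the Benjamini-Hochberg step-up procedure, which is one widely used way of controlling the false discovery rate. Whether any particular GWAS used this exact procedure is an assumption made here for illustration, and the P-values are invented. The sketch labels SNPs as strongly associated when they pass at a 1% FDR and as weakly associated when they pass only at 5%.

```python
def benjamini_hochberg(p_values: list[float], fdr: float) -> list[int]:
    """Return the indices of tests declared significant at the given FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    largest_passing_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: find the largest rank whose P-value is <= (rank / m) * fdr.
        if p_values[idx] <= rank / m * fdr:
            largest_passing_rank = rank
    return sorted(order[:largest_passing_rank])


def classify_snps(p_values: list[float]) -> tuple[list[int], list[int]]:
    """Split SNP indices into strongly (1% FDR) and weakly (1% to 5% FDR) associated."""
    strong = benjamini_hochberg(p_values, fdr=0.01)
    at_five_percent = benjamini_hochberg(p_values, fdr=0.05)
    weak = [i for i in at_five_percent if i not in strong]
    return strong, weak


# Hypothetical single-marker P-values for ten SNPs.
p_values = [0.0001, 0.004, 0.019, 0.03, 0.2, 0.5, 0.8, 0.9, 0.95, 0.99]
strong, weak = classify_snps(p_values)
print("strongly associated SNP indices:", strong)
print("weakly associated SNP indices:", weak)
```

A real GWAS applies this kind of correction across hundreds of thousands of markers, so the per-SNP thresholds are far more stringent than in this ten-marker toy example.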
Finally, candidate genes can be selected for analysis because of a known role for the encoded product of the gene in the disease process. The gene is then screened for polymorphisms, which are tested for association with the disease or phenotype in question. A hybrid approach is the selection of candidate genes based not only on their function but also on their position within a genetic region previously linked to the disease (positional candidate). This approach may help to reduce the considerable work required to narrow a large genetic region of several megabases of DNA identified through linkage containing tens to hundreds of genes to one single gene to test for association with the disease.
Once a gene has been identified, further work is required to understand its role in the disease pathogenesis. Further molecular genetic studies may help to identify the precise genetic polymorphism that is having functional consequences for the gene’s expression or function as opposed to those that are merely in linkage disequilibrium with the causal SNP. Often the gene identified may be completely novel and cell and molecular biology studies will be needed to understand the gene product’s role in the disease and to define genotype/phenotype correlations. Furthermore, by using cohorts with information available on environmental exposures, it may be possible to define how the gene product may interact with the environment to cause disease. Ultimately, knowledge of the gene’s role in disease pathogenesis may lead to the development of novel therapeutics.
Allergy and Asthma as Complex Genetic Diseases
From studies of the epidemiology and heritability of allergic diseases, it is clear that these are complex diseases in which the interaction between genetic and environmental factors plays a fundamental role in the development of IgE-mediated sensitivity and the subsequent development of clinical symptoms. The development of IgE responses by an individual, and therefore allergies, is the function of several genetic factors. These include the regulation of basal serum immunoglobulin production, the regulation of the switching of Ig-producing B cells to IgE, and the control of the specificity of responses to antigens. Furthermore, the genetic influences on allergic diseases such as asthma are more complex than those on atopy alone, involving not only genes controlling the induction and level of an IgE-mediated response to allergen but also ‘lung-’ or ‘asthma’-specific genetic factors that result in the development of asthma. This also applies equally to other clinical manifestations of atopy such as rhinitis and atopic dermatitis.
Phenotypes for Allergy and Allergic Disease: What Should We Measure?
The term atopy (from the Greek word for ‘strangeness’) was originally used by Coca and Cooke in 1923 to describe a particular predisposition to develop hypersensitivity to common allergens associated with an increase of circulating reaginic antibody, now defined as IgE, and with clinical manifestations such as whealing-type reactions, asthma and hay fever. Today, even if the definition of atopy is not yet precise, the term is commonly used to define a disorder involving IgE antibody responses to ubiquitous allergens that is associated with a number of clinical disorders such as asthma, allergic dermatitis, allergic conjunctivitis and allergic rhinitis.
Atopy can be defined in several ways, including raised total serum IgE levels, the presence of antigen-specific IgE antibodies, and/or a positive skin test to common allergens. Furthermore, because of their complex clinical phenotype, atopic diseases can be studied using intermediate or surrogate disease-specific measurements such as BHR or lung function for asthma. As discussed earlier, phenotypes can be defined in several ways: subjective measures (e.g. symptoms), objective measures (e.g. BHR, blood eosinophils or serum IgE levels), or both. In addition, some studies have used quantitative scores that are derived from both physical measures such as serum IgE and BHR and questionnaire data. It is a lack of a clear definition of atopic phenotypes that presents the greatest problem when reviewing studies of the genetic basis of atopy, with multiple definitions of the same intermediate phenotype often being used in different studies. Likewise, the definition of asthma can be problematic as this can be clinical (symptoms, parental reports), pharmacological (bronchodilator reversibility, steroid responsiveness) or derived from intermediate measures (BHR, lung function).
The Heritability of Atopic Disease: Are Atopy and Atopic Disease Heritable Conditions?
In 1916, the first comprehensive study of the heritability of atopy was undertaken by Robert Cooke and Albert Vander Veer at the Department of Medicine of the Postgraduate Hospital and Medical School of New York. Although the atopic conditions they included, as well as those excluded (e.g. eczema), may be open for debate today, the conclusions nonetheless remain the same: that there is a high heritable component to the development of atopy and atopic disease, and as is now more clearly understood biologically, this is owing to the inheritance of a tendency to generate specific IgE responses to common proteins.
Subsequent to the work of Cooke and Vander Veer, the results of many studies have established that atopy and atopic disease such as asthma, rhinitis and eczema have strong genetic components. Family studies have shown an increased prevalence of atopy, and phenotypes associated with atopy, among the relatives of atopic compared with non-atopic subjects. In a study of 176 normal families, Gerrard and colleagues found a striking association between asthma in the parent and asthma in the child, between hay fever in the parent and hay fever in the child, and between eczema in the parent and eczema in the child. These studies suggest that ‘end-organ sensitivity’, or which allergic disease an allergic individual will develop, is controlled by specific genetic factors, differing from those that determine susceptibility to atopy per se. This hypothesis is borne out by a questionnaire study involving 6,665 families in southern Bavaria. Children with atopic diseases had a positive family history in 55% of cases compared with 35% in children without atopic disease (P < .001). Subsequent researchers used the same population to investigate familial influences unique to the expression of asthma and found that the prevalence of asthma alone (i.e. without hay fever or eczema) increased significantly if the nearest of kin had asthma alone (11.7% vs 4.7%, P < .0001). A family history of eczema or hay fever (without asthma) was unrelated to asthma in the offspring.
Numerous twin studies have shown a significant increase in concordance for atopy among monozygotic twins compared with dizygotic twins, and both twin and family studies have shown a strong heritable component to atopic asthma. Using a twin-family model, Laitinen and colleagues reported that in families with asthma in successive generations, genetic factors alone accounted for as much as 87% of the development of asthma in offspring, and that the incidence of the disease in twins with affected parents was four times that in twins without affected parents. This indicates that asthma recurs in families as a result of shared genes rather than shared environmental risk factors. This has been further substantiated in a study of 11,688 Danish twin pairs, which suggested that 73% of susceptibility to asthma was the result of the genetic component. However, a substantial part of the variation in liability to asthma was the result of environmental factors; there was also no evidence for genetic dominance or shared environmental effects.
Molecular Regulation of Atopy and Atopic Disease, I: Susceptibility Genes
Positional Cloning by Genome-Wide Screens
Many genome-wide screens for atopy and atopic disorder susceptibility genes have been undertaken. Multiple regions of the genome have been observed to be linked to varying phenotypes with differences between cohorts recruited from both similar and different populations. This illustrates the difficulty of identifying susceptibility genes for complex genetic diseases. Different genetic loci will show linkage in populations of different ethnicities and different environmental exposures. As mentioned earlier, in studies of complex disease, the real challenge has not been identification of regions of linkage, but rather identification of the precise gene and genetic variant underlying the observed linkage. To date, several genes have been identified as the result of positional cloning using a genome-wide scan for allergic disease phenotypes, including for example ADAM33, GPRA, DPP10, PHF11 and UPAR for asthma, COL29A1 for atopic dermatitis and PCDH1 for bronchial hyperresponsiveness.
Genes Identified by Genome-Wide Association Studies
Subsequent to positional cloning studies, improvements in technology have now enabled genome-wide association studies to be performed with great success in allergic diseases such as asthma, eczema and allergic sensitization. Figure 3-1 illustrates allergy-associated genes reported in GWAS for asthma, rhinitis, serum IgE, atopy and atopic dermatitis, and the overlap between genes associated with different allergic diseases.
The first novel asthma susceptibility locus to be identified by a GWAS approach contains the ORMDL3 and GSDML genes on chromosome 17q12-21.1. In that study, 317,000 SNPs (in genes or surrounding sequences) were characterized in 994 subjects with childhood-onset asthma and 1,243 non-asthmatics, followed by replication in a further 2,320 subjects. This revealed five significantly associated SNPs. In subsequent gene expression studies, ORMDL3 expression was found to be strongly associated with the disease-associated markers identified by the GWAS (P < 10^−22 for rs7216389).
Importantly, a number of subsequent studies have replicated the association between variation in the chromosome 17q21 region (mainly rs7216389) and childhood asthma in ethnically diverse populations. A GWAS by the GABRIEL consortium of 26,475 people confirmed the association between GSDML-ORMDL3 and childhood-onset asthma as well as implicating a number of genes involved in Th2 activation including IL33, IL1RL1 and SMAD. The loci associated with asthma were not associated with serum IgE levels.
However, a study of association between SNPs and gene expression levels found that a distant SNP rs1051740 (greater than 4 megabases away and on a different chromosome) in the EPHX1 gene associates with ORMDL3 gene expression at a more significant level than rs7216389. Long-distance genomic interactions can mean that the gene within which the SNP is located is not necessarily the causal gene. Therefore, it is important to remember that considerable work is still required to fully characterize this region of the genome before accepting ORMDL3 as the causal gene through ‘guilt by association’ because many genes in a region of linkage disequilibrium will be associated with disease in a GWAS without, necessarily, being the causative gene. GWAS have also identified novel genes underlying blood eosinophil levels (and also associated with asthma), occupational asthma, total serum IgE levels and eczema.
Studies of other atopic diseases have focussed on serum IgE levels and/or allergic sensitization. Weidinger et al identified a locus associated with the high-affinity IgE receptor (FCER1A) as strongly associated with both serum IgE and sensitization, as well as confirming candidate gene findings of STAT6 and the 5q31 region related to Th2 cytokines. An Icelandic study showed an association between IL1RL1 (the IL-33 receptor coding gene) and blood IgE levels. This region was also identified in the asthma GWAS by Moffatt et al; however, that study did not find an association between asthma and loci associated with serum IgE levels. A meta-analysis of GWAS of allergic sensitization, which included a total of 16,170 sensitized individuals, identified 10 loci that are estimated to account for 25% of allergic sensitization and allergic rhinitis. Nine of the 10 SNPs identified also showed a directionally consistent association with asthma. Associations were also identified with atopic dermatitis, albeit weaker than with asthma. The authors also investigated known susceptibility loci and found only weak associations with total IgE levels (FCER1A and HLA-A) and asthma (17q12-21 and IL33). This suggests that these loci do not increase asthma risk through allergic sensitization.
Until recently, very little was known of the genetic causes of atopic dermatitis (AD), aside from filaggrin, which is described in more detail below. However, recent studies have expanded this knowledge: a meta-analysis of atopic dermatitis studies by Paternoster et al on 11,025 cases and 40,398 controls revealed loci at OVOL1 and ACTL9 associated with epidermal proliferation and KIF3A in the 5q31 Th2 cytokine cluster. The study also confirmed the filaggrin (FLG) locus association. Meanwhile, Weidinger et al studied childhood-onset AD and again identified the FLG association as well as the KIF3A locus mentioned above and the previously identified 11q13.5 and 5q31 regions. They also noted some overlap with asthma and psoriasis, strengthening the view that AD arises from both epithelial and immune dysfunction. This theory is backed up by the discovery of an AD-associated SNP adjacent to C11orf30, which was previously identified as a susceptibility locus for Crohn’s disease, another disease of immune and epithelial dysfunction. Sun et al identified TMEM232 and SLC25A46 at 5q22 and TNFRSF6B and ZGPAT at 20q13 in association with AD in Chinese populations.
Atopic rhinitis is poorly understood but GWAS have identified loci in C11orf30, mentioned above, as well as the HLA region, MRPL4 and BCAP. Candidate gene studies found an association with IL13 loci, and GWAS have identified several rhinitis-associated loci and loci associated with the phenotype ‘asthma and hay fever’. Likewise, there is much overlap between food allergy and atopy with candidate gene studies showing associations with CD14, STAT6, SPINK5 and IL10 but, to date, there have been no GWAS in food allergy.
These studies show the power of the GWAS approach for identifying complex disease susceptibility variants, and current research is both expanding these known variants and confirming their associations with clinical phenotypes. GWAS has now moved on from simple loci of association with a broad disease definition, such as asthma, and studies are now identifying particular regions associated with phenotypes of disease or subgroups. For example, Du et al identified CRTAM as associated only with asthma exacerbations in those with low vitamin D, and another recent GWAS has identified CDHR3 as being associated with severe asthma. We are also gaining a better understanding of how atopic and non-atopic asthma overlap with other atopic diseases such as atopic dermatitis and rhinitis. We may also be able to integrate epigenetic information into the expression patterns of known and novel SNPs; for example, asthma risk resulting from the IL4R polymorphism rs3024685 is dramatically increased by higher levels of IL4R DNA methylation. Although GWAS has not fully explained the heritability of asthma and atopic disease, geneticists remain optimistic, as it is believed that this ‘missing heritability’ can be accounted for. It is thought that the inability to find genes could be explained by limitations of GWAS, such as variants not screened for, analyses not adjusted for gene-environment and gene-gene interactions, or epigenetic changes in gene expression. One explanation for missing heritability, after assessing common genetic variation in the genome, is that rare variants (below the frequency of SNPs included in GWAS) of high genetic effect, or common copy number variants, may be responsible for some of the genetic heritability of common complex diseases.
Candidate Gene/Gene Region Studies
A large number of candidate regions have been studied for both linkage to and association with a range of atopy-related phenotypes. In addition, SNPs in the promoter and coding regions of a wide range of candidate genes have been examined. Candidate genes are selected for analysis based on a wide range of evidence, for example biological function, differential expression in disease, involvement in other diseases with phenotypic overlap, affected tissues, cell type(s) involved and findings from animal models. There are now more than 500 studies that have examined polymorphism in more than 200 genes for association with asthma and allergy phenotypes. When assessing the significance of association studies, it is important to consider several things. Was the study adequately powered if negative results are reported? Were the cases and controls appropriately matched? Could population stratification account for the associations observed? Which phenotypes were measured (and which were not), and how were they measured? Have the authors taken multiple testing into account when assessing the significance of association? Publications by Weiss, Hall, and Tabor and colleagues review these issues in depth.
Genetic variants showing association with a disease are not necessarily causal, because of the phenomenon of linkage disequilibrium (LD), whereby polymorphism A is not affecting gene function but rather it is merely in LD with polymorphism B that is exerting an effect on gene function or expression. Positive association may also represent a Type I error; candidate gene studies have suffered from non-replication of findings between studies, which may be due to poor study design, population stratification, different LD patterns between individuals of different ethnicity and differing environmental exposures between study cohorts. The genetic association approach can also be limited by under-powered studies and loose phenotype definitions.
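The relationship between polymorphism A and polymorphism B described above is usually quantified with the standard LD measures D, D′ and r². The following is a minimal sketch of how these are computed from allele and haplotype frequencies; the function name and the example frequencies are hypothetical and are not taken from any study cited in this chapter.

```python
def ld_stats(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Return (D, D', r^2) for two biallelic loci.

    p_ab: frequency of the haplotype carrying allele A and allele B
    p_a:  frequency of allele A at the first locus
    p_b:  frequency of allele B at the second locus
    """
    # D is the excess of the AB haplotype over its expectation under independence.
    d = p_ab - p_a * p_b

    # D' rescales D by the largest magnitude it could take given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else 0.0

    # r^2 is the squared correlation between the alleles at the two loci.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else 0.0
    return d, d_prime, r2


# Hypothetical example: allele B (frequency 0.1) always occurs with allele A (frequency 0.5).
d, d_prime, r2 = ld_stats(p_ab=0.1, p_a=0.5, p_b=0.1)
print(f"D = {d:.3f}, D' = {d_prime:.2f}, r^2 = {r2:.2f}")
```

In this hypothetical case D′ is 1 while r² is only about 0.11, which illustrates why a typed marker can be in complete LD with a functional variant and yet be a weak statistical proxy for it.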
An Example of a Candidate Gene: Interleukin-13
Given the importance of Th2-mediated inflammation in allergic disease, and the biological roles of IL13, including switching B cells to produce IgE, wide-ranging effects on epithelial cells, fibroblasts, and smooth muscle promoting airway remodeling and mucus production, IL13 is a strong biological candidate gene. Furthermore, IL13 is also a strong positional candidate. The gene encoding IL13, like IL4, is located in the Th2 cytokine gene cluster on chromosome 5q31 within 12 kb of IL4, with which it shares 40% homology. This genomic location has been extensively linked with a number of phenotypes relevant to allergic disease including asthma, atopy, specific and total IgE responses, blood eosinophils and BHR.
Asthma-associated polymorphisms have been identified in the IL13 gene, including a single-base pair substitution in the promoter of IL13 adjacent to a consensus nuclear factor of activated T cell binding site. Asthmatics are significantly more likely to be homozygous for this polymorphism (P = .002, odds ratio = 8.3) and the polymorphism is associated, in vitro, with reduced inhibition of IL13 production by cyclosporine and increased transcription factor binding. Hypotheses proposed to explain the association of this IL13 polymorphism and development of atopic disease include decreased affinity for the decoy receptor IL13Rα2, increased functional activity through IL13Rα1 and enhanced stability of the molecule in plasma (reviewed in Kasaian and Miller).
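For readers less familiar with how an odds ratio such as the one quoted above is derived, the sketch below computes an odds ratio and an approximate 95% confidence interval (Woolf's logit method) from a 2 × 2 table of genotype counts. The counts are purely hypothetical and are not those of the study cited; the function name is also illustrative.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with an approximate confidence interval (Woolf's logit method).

    a: cases homozygous for the variant      b: cases with other genotypes
    c: controls homozygous for the variant   d: controls with other genotypes
    z: normal quantile for the interval (1.96 gives about 95%)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, for illustration only.
or_value, ci_low, ci_high = odds_ratio_ci(a=20, b=80, c=5, d=95)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

With small counts in any cell the interval becomes very wide, which is one reason large odds ratios reported from modest samples often shrink on replication.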
An amino acid polymorphism of IL13 has also been described: R110Q (rs20541). The 110Q variant enhances allergic inflammation compared to the 110R wild-type IL-13 by inducing STAT6 phosphorylation, CD23 expression in monocytes and hydrocortisone-dependent IgE switching in B cells. It also has a lower affinity for the IL-13Rα2 decoy receptor and produced a more sustained eotaxin response in primary human fibroblasts expressing low levels of IL-13Rα2.
IL-13 polymorphism associations have been inconsistent with some studies showing association with atopy in children while others show associations with asthma and not atopy. Howard and colleagues also showed that the −1112 C/T variant of IL13 contributes significantly to BHR susceptibility (P = .003) but not to total serum IgE levels. Thus, it is possible that polymorphisms in IL13 may confer susceptibility to airway remodeling in persistent asthma, as well as to allergic inflammation in early life.
As discussed previously, positive association observed between an SNP and a phenotype does not imply that the SNP is causal. IL13 lies adjacent to IL4, an equally strong biological candidate in which SNPs have shown association with relevant phenotypes, and within the chromosome 5q31 gene cluster that is known to contain an asthma susceptibility gene. Therefore association observed with IL13 SNPs may simply represent a proxy measure of the effect of polymorphisms in IL4 or another gene in the region. For example, a recent genome-wide association study of total IgE levels reported significant associations between polymorphisms in an adjacent gene, RAD50, and total serum IgE levels, in a region containing a number of evolutionarily conserved non-coding sequences that may play a role in regulating IL4 and IL13 transcription. However, given the extensive biologic evidence for functionality and recent studies examining polymorphisms across the gene region showing independent effects of the IL13 R110Q SNP, it is likely that the reported IL13 associations are real.
Many studies have observed positive associations of specific genetic polymorphisms with differential response to environmental factors in asthma and other respiratory phenotypes. IL13 levels have been shown to be increased in children whose parents smoke and interaction between IL13 −1112 C/T and smoking with childhood asthma as an outcome has been reported, as well as evidence for this same SNP modulating the adverse effect of smoking on lung function in adults. Thus, differences in smoking exposure between studies may account for some of the differences in findings between studies. DNA methylation is affected by both genetic variants and environment, which may later determine disease risk. For example, Patil et al demonstrated that while rs20541 polymorphisms interacted with maternal smoking to determine methylation at the cg13566430 IL13 promoter region methylation site, a relationship between the rs1800925 SNP in the IL13 locus and the same cg13566430 methylation site affected lung function. This demonstrates the ‘two-step’ model of environment and genetic variance affecting disease state, as shown in Figure 3-2 .
An Example of a Candidate Gene: Interleukin-33
Since its identification in 2005, IL-33 has emerged as one of the most important cytokines in Th2 differentiation, and its receptor, ST2, is an excellent marker of Th2 cells. IL-33 is a member of the IL-1 family; its gene, IL33, is located on chromosome 9 and is therefore separate from the chromosome 5q31 cluster of IL13 and IL4, and not in LD with these genes. Its receptor is encoded by IL1RL1 on chromosome 2, associated with the IL1 cluster. IL33 polymorphisms within two LD blocks have been identified in GWAS as associated with asthma, but these findings have not always been replicated. A number of polymorphisms have also been identified by candidate gene approaches and by both candidate gene and GWAS in the IL1RL1 gene (IL-33 receptor). The IL-33/IL1RL1 pathway has been implicated in the stimulation of type 2 innate lymphoid cells (ILC2s) that produce IL-4, IL-5 and IL-13 and thus may have a pivotal role in initiating the Th2 phenotype in atopy/asthma. Indeed IL1RL1 polymorphisms have been shown to be associated with lower levels of IL1RL1 transcription.
Any observed association of IL13 or IL33/IL1RL1 polymorphisms should have its effect reported in context by considering variation in other relevant genes whose products may modulate its effects. For example, there are a number of other functional polymorphisms in genes encoding other components of the IL4/IL13 signaling pathway (IL4, IL13, IL4RA, IL13Rα1, IL13Rα2 and STAT6) with synergistic effects. Likewise, the IL1RL1 locus is closely related to the IL-18 receptor gene (IL18R1), which has a complex LD structure. IL-18 is associated with Th1 responses and cell adhesion. This difficulty has been reviewed by Grotenboer et al, who describe the multiple genetic signals in the IL33 and IL1RL1 loci that contribute to asthma pathogenesis. Their suggestion is that the complex LD may be overcome by performing further association studies in other populations with less LD or using meta-analysis with a number of conditional sub-analyses. Further functional and mechanistic studies are also needed.
The IL13/IL33 polymorphism studies illustrate many of the difficulties of genetic analysis in complex disease. Replication is often not found between studies and this may be accounted for by the lack of power to detect the small increases in disease risk that are typical for susceptibility variants in complex disease. Differences in genetic make-up, in environmental exposure between study populations, and failure to ‘strictly replicate’ in either phenotype (IgE and atopy vs asthma and BHR) or genotype (different polymorphisms in the same gene) can all contribute to the lack of replication between studies. Furthermore, studies of a single polymorphism, or even a single gene in isolation can over-simplify the complex genetic variants in asthma pathogenesis and the cross-talk between implicated cytokines, as shown by the roles of IL-13, IL-33, IL1RL1 and Th2/ILC2 cells in asthma pathogenesis.
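The point about power can be made concrete with a rough sample size calculation. The sketch below uses the standard normal-approximation formula for comparing two proportions to estimate how many alleles (chromosomes) must be typed in cases and in controls to detect a given allelic odds ratio. It is a simplified illustration under stated assumptions (an allelic test, equal numbers of cases and controls, no allowance for LD between the marker and the causal variant); the function names and the input values are hypothetical.

```python
import math
from statistics import NormalDist


def case_allele_freq(p_control: float, odds_ratio: float) -> float:
    """Risk-allele frequency in cases implied by the control frequency and allelic odds ratio."""
    odds = odds_ratio * p_control / (1 - p_control)
    return odds / (1 + odds)


def alleles_needed_per_group(p_control: float, odds_ratio: float,
                             alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate number of alleles needed in each group (two-sided test, equal group sizes)."""
    if odds_ratio == 1:
        raise ValueError("an odds ratio of 1 means there is no effect to detect")
    p_case = case_allele_freq(p_control, odds_ratio)
    p_bar = (p_case + p_control) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p_case * (1 - p_case)
                                      + p_control * (1 - p_control))) ** 2
    return math.ceil(numerator / (p_case - p_control) ** 2)


# Hypothetical inputs: risk-allele frequency of 0.3 in controls.
n_modest = alleles_needed_per_group(0.3, odds_ratio=1.2)
n_large = alleles_needed_per_group(0.3, odds_ratio=2.0)
n_modest_gwas = alleles_needed_per_group(0.3, odds_ratio=1.2, alpha=5e-8)
print(n_modest, n_large, n_modest_gwas)
```

Each individual contributes two alleles, so the number of subjects per group is about half the returned value. The comparison between the three calls shows how quickly the required sample grows as the odds ratio approaches 1 or as the significance threshold is tightened for multiple testing.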
Analysis of Clinically Defined Subgroups
One approach is to identify genes in a rare, severely affected subgroup of patients, in whom disease appears to follow a pattern of inheritance that indicates the effect of a single major gene. The assumption is that mutations (polymorphisms) of milder functional effect in the same gene in the general population may play a role in susceptibility to the complex genetic disorder. One example of this has been the identification of the gene encoding the protein filaggrin as a susceptibility gene for atopic dermatitis.
Filaggrin
Filaggrin (filament-aggregating protein) has a key role in epidermal barrier function. The protein is a major component of the protein-lipid cornified envelope of the epidermis important for water permeability and blocking the entry of microbes and allergens. In 2002, the condition ichthyosis vulgaris, a severe skin disorder characterized by dry flaky skin and a predisposition to atopic dermatitis and associated asthma, was mapped to the epidermal differentiation complex on chromosome 1q21; this gene complex includes the filaggrin gene ( FLG ). In 2006, Smith and colleagues reported that loss of function mutations in the filaggrin gene caused ichthyosis vulgaris.
Noting the common occurrence of atopic dermatitis in individuals with ichthyosis vulgaris, these researchers subsequently showed that common loss of function variants (combined carrier frequencies of 9% in the European population) were associated with atopic dermatitis in the general population. Subsequent studies have confirmed an association with atopic dermatitis, and also with asthma and allergy but only in the presence of atopic dermatitis. Atopic dermatitis in children is often the first sign of atopic disease and these studies of filaggrin mutation have provided a molecular mechanism for the co-existence of asthma and dermatitis. It is thought that deficits in epidermal barrier function could initiate systemic allergy by allergen exposure through the skin and start the ‘atopic march’ in susceptible individuals.