Evidence-based medicine (EBM) seeks to improve both the care of patients and the delivery of that care.
Descriptive-observational studies, including cross-sectional studies and case series, help generate hypotheses and characterize the context of disease.
Case-control studies allow us to study rare diseases and to evaluate a wide range of exposures.
Cohort studies allow us to study many outcomes over time.
Randomized controlled trials (RCTs) are considered to be the gold standard of experimental clinical study design.
Pragmatic clinical trials (PCTs) are designed to study the effectiveness of an intervention in the real world.
Comparative effectiveness research (CER) encompasses patient-centered research, PCTs, meta-analyses, systematic reviews, evidence-based guidelines, and health services research (HSR), studying the benefits and harms of interventions to improve patient care at both the individual and population levels.
Estimating the value of health care involves assessment of the quality and integration of care and the overall cost to provide all services included in that care.
Hippocrates, often hailed as the “father” of Western medicine, introduced the notion that an individual’s disease originates from natural causes that can be observed and described within that person’s environment. Although medicine has evolved in innumerable ways since the era of Hippocrates, the concept of disease as an observable entity related to an individual’s environment is the foundation of evidence-based medicine. Basic and clinical sciences have built significantly on this foundation in the last 50 to 60 years, undergoing an exponential accumulation of research and advancement of knowledge. The main goal of this research is to gain knowledge of disease processes, identify causes and effects of disease states, and develop and assess the efficacy of treatments and interventions. The construction of a translational bridge from the laboratory to clinical practice presents a unique challenge to researchers and clinicians. Published results do not necessarily imply meaningful clinical utility and must often be evaluated in the context of the inherent constraints imposed by research study designs. This chapter discusses traditional clinical study designs and explores modern research constructs, comparative effectiveness research (CER), and health services research (HSR).
Introduction to evidence-based medicine
The goal of evidence-based medicine is to guide clinical decision making using the full body of knowledge built from well-designed and well-conducted research. However, research evidence rarely applies directly to a particular individual or clinical problem. Clinical decisions must be formulated within the specific context of patient care by integrating that evidence with clinical expertise and with the values and goals of the individual patient. Clinical decision making must incorporate the most recent and valid information regarding disease prevention, diagnosis, prognosis, and treatment.
Epidemiologic studies form the foundation on which clinical evidence-based studies and the practice of evidence-based medicine are built. The World Health Organization (WHO) defines epidemiology as “the study of distribution and determinants of health-related states or events (including disease), and the application of this study to the control of disease and other health problems” ( www.who.int ). In simpler terms, epidemiology studies the cause and effect of a particular disease within a defined population in an attempt to assess association and/or causality between exposure and outcome.
One of the most influential studies in gynecology, the Women’s Health Initiative, was designed after epidemiologic data indicated an association between the use of hormone replacement therapy (HRT) and the prevention of coronary heart disease and osteoporosis. The collection of this observational data led to the development of one of the largest randomized controlled trials and U.S. prevention studies ever published, with more than 160,000 postmenopausal women enrolled. Interestingly, this trial, rather than validating the observational finding of a cardioprotective effect of HRT, instead showed an increased risk of coronary heart disease. It also demonstrated an increased risk of venous thromboembolism, stroke, and breast cancer, which were unexpected results ( ). This study is a good example both of the limitations of observational epidemiologic study design and of the importance of developing well-designed experimental clinical trials to test the validity, when plausible, of prior observations and associations.
Epidemiologic studies can be classified as either observational or experimental. The three most common types of observational epidemiologic studies are cohort, case-control, and cross-sectional studies. Case series can also be included among these, although their data are of lower quality. The gold standard of experimental study is the randomized controlled trial (RCT), in part because of its ability to control for confounding variables through eligibility criteria and randomization. However, although the RCT is traditionally heralded as the gold standard, it often does not directly represent the real-world therapeutic population.
In acknowledgment of these limitations, new fields of study design have been introduced and implemented, including CER and HSR. The primary objective of CER is to compare both clinical and public health interventions to determine which are most efficacious. CER is geared toward assisting “consumers, clinicians, purchasers, and policy makers to make informed decisions that will improve health care at both the individual and population levels” ( ). The importance of this decision-making assistance has been increasingly recognized; $1.1 billion of the 2009 American Recovery and Reinvestment Act stimulus package was allocated specifically to CER. HSR, another emerging field, examines how patients get access to care, the cost and quality of that care, and ultimately the result of delivery of care. Health economic analysis is an emerging subgroup of HSR. There are several forms of economic analysis; the most common is cost-effectiveness analysis (CEA), which is used to compare the relative cost and effectiveness of alternative strategies, usually using a standard willingness-to-pay threshold.
In the following sections we review both traditional clinical study design and emerging clinical research methods.
Traditional clinical study design
Traditional clinical study designs are not created equal regarding the quality of evidence they produce. Table 5.1 demonstrates a grading system that assesses clinical study design and evidence quality. Blinded RCTs offer the highest quality of evidence. Some authorities advocate that systematic reviews and meta-analyses of these types of trials produce an equal quality of evidence, although the validity of such studies relies on the quality and validity of the chosen articles ( ). The next level of evidence comes from cohort studies and case-control studies. The lowest-quality ranking is assigned to case series, case reports, and expert opinion. Table 5.2 compares advantages, limitations, and statistical considerations of each study design. Whether experimental or observational, these clinical studies are invaluable to modern medicine and affect patient care. In this section we discuss each clinical study design individually.
|1a||Systematic review (with homogeneity) of randomized controlled trials|
|1b||Individual randomized controlled trials|
|2a||Systematic review (with homogeneity) of cohort studies|
|2b||Individual cohort study|
|2c||“Outcomes research” and ecological studies|
|3a||Systematic review of case-control studies|
|3b||Individual case-control study|
|4||Case series (and poor-quality cohort and case-control studies)|
|5||Expert opinion without explicit critical appraisal, or based on physiology, bench research, or “first principles”|
|Study Design||Advantages||Limitations||Statistical Analysis|
|Randomized controlled trials (RCTs)||Gold standard; prospective; multiple study groups; randomization; can determine causality or a treatment advantage; internal validity||Time consuming; expensive; selection bias; confounding factors; performance bias; detection bias (RCTs can control for confounding factors and biases with double-blinding and randomization); limited external validity||Relative risk (RR); absolute or attributable risk (AR); confidence interval (CI); number needed to treat (NNT)|
|Cohort studies||Prospective; can assess many outcomes over time||Take many years to complete; expensive; selection bias; confounding factors; patients can be lost to follow-up; changing exposure profile||Incidence; RR; AR|
|Case-control studies||Efficient; inexpensive; can study multiple exposures; can study rare diseases; good external validity||Retrospective; recall bias; sampling bias; confounding factors||Odds ratio (OR)|
|Cross-sectional studies||Can determine the frequency of disease or outcomes; highlights possible associations; efficient||Capture one moment in time; cannot determine incidence or causality; sampling bias; participation bias; recall bias||Prevalence|
|Case series||Descriptive of rare or new entities; hypothesis generating||Lack of comparison group; no clinical conclusions||No statistical analysis|
The term observational study describes a wide range of study designs. Observational studies can be classified as either analytical or descriptive. Analytical studies contain a control group for comparison; this category includes nonrandomized prospective and retrospective cohort studies, case-control studies, and cross-sectional studies. Descriptive studies lack a control or comparison group and consist of case reports and case series. Observational studies play an important role in evidence-based medicine and are an important source of information when RCTs cannot be performed. The descriptive aspect of all observational studies is an invaluable attribute to clinical research, offering statistics about incidence, prevalence, and mortality rates of diseases in particular populations that provide clinicians with the context of a disease within a population. However, observational studies cannot determine causality, even if such associations appear highly plausible, and they are ill suited to test hypotheses or answer etiologic questions. Despite these clear limitations, they still play an important role in generating new hypotheses to be tested by more formal, experimental study design.
Case reports and case series
The basic element or unit of observational studies, as described by Grimes and Schulz, is the case report ( ). Case reports and case series are the least methodologically sound of all observational study designs, but this does not mean that they are not valuable contributors to the literature. Case reports often describe rare or new entities in medicine and offer an opportunity to describe characteristics about a disease and allow for postulation of hypotheses of pathophysiology. It was through case reports of unusual infections and disturbed immunity that acquired immunodeficiency syndrome (AIDS) was first described ( ). Case reports can describe infrequent adverse events associated with medications and other treatments. The association of phocomelia with thalidomide, a drug used to treat pregnancy-associated nausea in the 1960s, was first published in the form of two case reports in 1962, resulting in the swift removal of the drug from the market ( ; ). Case reports can also describe the plausibility and early use of novel treatment methods or surgery. However, the scientific audience may use weaker objective landmarks, such as historical controls, when interpreting the meaning of noted observations. Case reports and case series should be considered no more than “the first step toward more sophisticated research” ( ).
Cross-sectional studies

Cross-sectional studies are prevalence studies that examine the relationship between exposure and the outcomes of interest in a defined population at a single point in time. Prevalence is defined as the number of cases in a population at a given time. It is a ratio, or proportion, of affected individuals in relation to a pooled population. Cross-sectional studies cannot determine incidence, or the number of new cases in a population over time. Rather, they are snapshots of a disease in a specific population. With reference to only a designated moment in time, these studies are not able to provide causal evidence. Case reports and cross-sectional studies can highlight possible associations that deserve additional evaluation, but they cannot determine causality.
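The distinction between prevalence (a snapshot proportion) and incidence (new cases accumulating over follow-up) can be sketched as a short calculation; the counts below are hypothetical and purely illustrative:

```python
# Minimal sketch with hypothetical numbers: prevalence is a snapshot
# proportion, while incidence requires longitudinal follow-up.

def prevalence(existing_cases, population):
    """Proportion of a population affected at a single point in time."""
    return existing_cases / population

def incidence(new_cases, at_risk):
    """Proportion of an initially disease-free population that develops
    the outcome over a defined follow-up period."""
    return new_cases / at_risk

# A cross-sectional survey can estimate only the first quantity.
print(prevalence(50, 10_000))   # 0.005, i.e., 0.5% of the population affected
print(incidence(20, 9_950))     # requires following the at-risk group over time
```

The same arithmetic underlies why a cross-sectional snapshot cannot yield incidence: the second function needs a follow-up period that a single-moment study does not have.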
One advantage of cross-sectional studies is efficiency. Because the study population is examined at one moment in time, conclusions can be generated at the same time as data collection. However, cross-sectional studies are plagued by uncertain causality. Population selection, participation bias, and recall bias are also possible limitations. If a tertiary care or major referral center conducts a research study with a study population drawn from patients presenting to that facility, those patients are unlikely to accurately represent the general population, or even a more specific population of patients with a particular disease undergoing therapy within the community. In 1990, Gayle et al. published data regarding the prevalence of human immunodeficiency virus (HIV) among university students, examining more than 17,000 specimens from 19 universities ( ). Thirty students, or 0.2%, had detectable HIV antibodies, a rate higher than that found in prior studies of the general public. The media sensationalized these data, reporting that more than 25,000 college students across the nation might be infected with HIV. However, patient selection in this study was poor: the specimens examined were not collected at random but rather came from students who presented to student health services with conditions that warranted a blood sample. Researchers must be careful that patients selected for cross-sectional studies are representative of the desired study population.
Participation bias arises when selected subjects do not participate, such as in survey studies. If 100,000 surveys are sent out but only 10,000 are completed, the study is likely affected by participation bias. The minority of patients who respond may not be representative of the desired study population. Recall bias becomes an issue when self-reporting, as in survey studies, is a part of study design. Patients often report inaccurate information regarding certain exposures or events. Despite these limitations, well-conducted cross-sectional studies have their place in evidence-based medicine. They are simply prevalence studies, which allow us to determine frequencies of disease or outcomes within particular populations or groups.
Case-control studies

The purpose of a case-control study is to determine whether an exposure is associated with an outcome (e.g., a disease of interest). Study participants are selected on the basis of having or not having the outcome of interest (the case group versus the control group, respectively). Case-control studies are always retrospective because they start with an outcome and then evaluate previous exposures or habits. Fig. 5.1 illustrates the differences in methodology between case-control and cohort studies. Participants in the case group need to be carefully defined and should include all cases of new-onset disease drawn from an identifiable population. Controls should be sampled from that same population. The purpose of the control group is to allow for comparison of the frequencies of exposures in a case group with the outcome of interest versus the control group without that outcome. In 1971, Herbst et al. published a case-control study of 8 cases and 32 matched controls identifying a strong association between vaginal adenocarcinoma and in utero exposure to diethylstilbestrol (DES) ( ). Although further cohort studies were required to confirm causality, this case-control study allowed for identification of a suspected culprit (exposure) for the development of vaginal adenocarcinoma in young women (outcome of interest).
Case-control studies offer the advantages of being relatively inexpensive, simple to conduct, and efficient. They are retrospective and thus do not require a prolonged period of data collection. They can be used to study multiple exposures as they relate to a particular outcome of interest, and they offer the ability to study rare diseases. The quality of the results from these studies is dependent on meticulous selection of cases, control groups, and data collection among the groups. There are also disadvantages to case-control studies, including the risk of recall and sampling biases. Recall bias is particularly problematic because cases and controls are likely to recount historical exposures differently. Patients or families coping with an illness may recall in great detail all events they believe might be associated with the illness, whereas healthy controls may not remember similar exposures. The potential impact of recall bias was highlighted in the case-control studies that suggested a correlation between talcum powder use and epithelial ovarian cancer. Although significant legal claims were made concerning this correlation, no significant relationship was ever documented in a prospective study. In the case-control studies, talcum powder use was a solely subjective measure that could not be tracked by any method other than patient report; thus the potential for recall bias was great ( ). Recall bias may also occur when information on the case group is obtained by chart review but information on the control group is obtained either by interview or mail survey. It may ultimately be impossible to eliminate recall bias.
Sample selection, or sampling bias, arises if the cases selected do not appropriately represent a particular disease or outcome. This is similar to sample selection issues in cross-sectional studies. Sampling bias can also occur within the control group if a representation of the desired general population either underestimates or overestimates exposures. Matching, or selecting control group participants similar in characteristics to the case group, helps to decrease bias in the selection of controls. Matching also helps to decrease possible confounding, which occurs when factors relate both to the measured outcome and measured exposures. As Stephen Gehlbach, a renowned epidemiologist, once wrote, “Confounding is the epidemiologist’s eternal triangle…Are we seeing cause and effect, or is a confounding factor exerting its unappreciated influence?” ( ). Controlling sample selection and confounding factors allows for external validity, or the generalizability of the study to the desired population. Researchers can use statistical techniques such as multivariate analysis and logistic regression to help eliminate confounders.
Because case-control studies are retrospective by design, they are also limited in their statistical analysis. They cannot provide data on incidence, relative risks, or attributable risks between an exposure and a measured outcome. Case-control study results are reported as odds ratios, which represent the odds that an individual affected by the specific disease being studied has been exposed to a particular risk factor (case group) divided by the odds that the control group has been exposed. It is loosely considered a reasonable estimate of relative risk, but it is not a true calculation of relative risk.
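The odds ratio described above can be illustrated with a hypothetical 2×2 table; the counts and the `odds_ratio` helper below are illustrative only, not data from any study cited in this chapter:

```python
# Hypothetical case-control 2x2 table:
#                 exposed   unexposed
#   cases:          a=40      b=60
#   controls:       c=20      d=80

def odds_ratio(a, b, c, d):
    """Odds of exposure among cases (a/b) divided by odds of exposure
    among controls (c/d); algebraically the cross-product ad/(bc)."""
    return (a * d) / (b * c)

print(odds_ratio(40, 60, 20, 80))  # ~2.67: cases had roughly 2.7x the odds of exposure
```

Note that no incidence or relative risk can be computed from this table, because the investigator, not nature, fixed the ratio of cases to controls; the OR is the only valid effect measure here.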
Cohort studies

A cohort study selects a group of individuals at risk for an outcome of interest and divides them into subgroups based on the presence or absence of one or more exposures to be studied. Subgroups are then followed prospectively to evaluate the potential development of the outcome of interest. Cohort studies are unique in that the study participants select their exposure, as opposed to experimental research, in which the investigator selects, either knowingly or unknowingly, the exposure. Several types of comparisons can be made in a cohort study: An intervention or exposure can be compared with an alternate intervention or exposure, or it can be compared with no intervention. These comparisons can be made in general or restricted populations ( ).
An excellent example of a cohort study is the Nurses’ Health Study (NHS). In 1976, the NHS began surveying more than 120,000 female nurses regarding medical history, hormone use, and many other points of interest. The investigators have updated the study every 2 years by mailing out questionnaires to the original enrolled patients. In 2001, Grodstein et al. published data regarding the use of postmenopausal hormone therapy and the secondary prevention of coronary events ( ). The results demonstrated a short-term increased risk of coronary events in patients with a history of coronary disease but a decreased risk with long-term hormonal use.
A strength of cohort studies is the possibility of assessing many different outcomes over time. Also, although RCTs establish the outcomes of interest before study initiation, cohort study outcomes are more flexible and can be defined after the intervention. From a statistical standpoint, cohort studies also allow for the calculation of incidence rates, relative risks, and attributable risks. Incidence is defined as the number of new outcomes of interest in a given population over a set period. In cohort studies, incidence can be calculated for the population as a whole, but it is most often calculated for populations with and without an identifiable risk factor. Relative risk (RR), or risk ratio, can be calculated from these incidence rates. RR should be thought of as a simple ratio of the probability of an outcome (disease) occurring within an exposed population compared with a nonexposed population. It is calculated as the ratio of the incidence in a population exposed to the risk factor over the incidence in the unexposed population. Attributable risk (AR), or absolute risk, represents the absolute additional risk in the exposed population over what may be considered the baseline occurrence in the population. It is determined by calculating the difference between the incidence in the exposed population and the incidence of the unexposed population.
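As a worked sketch with hypothetical cohort counts, incidence, relative risk, and attributable risk (along with the number needed to treat listed in Table 5.2) can be computed as follows; all numbers are invented for illustration:

```python
# Hypothetical cohort: 1,000 exposed and 1,000 unexposed participants
# followed for the same period.
exposed_cases, exposed_n = 30, 1000
unexposed_cases, unexposed_n = 10, 1000

incidence_exposed = exposed_cases / exposed_n        # 0.03 (3 per 100)
incidence_unexposed = unexposed_cases / unexposed_n  # 0.01 (1 per 100)

# RR: ratio of incidences; AR: absolute difference in incidences.
relative_risk = incidence_exposed / incidence_unexposed
attributable_risk = incidence_exposed - incidence_unexposed

# NNT (or number needed to harm for a harmful exposure) is the
# reciprocal of the absolute risk difference.
nnt = 1 / attributable_risk

print(f"RR={relative_risk:.1f}, AR={attributable_risk:.2f}, NNT={nnt:.0f}")
# RR=3.0, AR=0.02, NNT=50
```

These quantities are available only because a cohort design observes incidence directly; a case-control study, as noted earlier, cannot supply them.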
Cohort studies also have disadvantages: They are often time consuming, take many years to complete, and can become costly. The NHS is a clear example of the time and money it takes to complete a prospective cohort study. As with all analytic-observational studies, subject selection is important to control for selection bias and confounding factors. Matching of patients in control and study groups helps address these issues. To the extent that information is collected about known or suspected confounding factors, it is also possible to control for their effect in the statistical analysis using techniques such as regression and stratification ( ). Adjustment techniques can work only for confounding variables that an investigator knows about and measures. Participants may also be lost to follow-up over the often years-long courses of these studies. Investigators must be diligent in keeping accurate follow-up records of each subject. In the same manner, researchers should be aware that patient habits may change over time, thus changing their exposure risks.
Experimental studies: Randomized controlled trials
RCTs are considered the gold standard of clinical study design. Within the constructs of epidemiologic studies, the RCT is another type of analytical study. In cohort studies the patient controls the exposure to a factor of interest, whereas in RCTs the clinical investigator controls exposure to the factor of interest. RCTs are designed to establish evidence of causal associations. They are characterized by the prospective assignment of study participants to a study group (who receive the factor of interest, typically a new treatment) or a control group (placebo, no treatment, or standard care). Although there are often only two study arms or groups within an RCT, investigators can develop designs involving multiple study groups. These groups are followed over time to evaluate for differences in outcomes. Outcomes of interest may include prevention or cure of a disease, reduction in severity of a condition, or differences in costs, quality of life, or side effects between treatments. An example is the Women’s Health Initiative already mentioned, which examined the use of HRT and the prevention of coronary heart disease, osteoporosis, and other conditions. The trial did not demonstrate a cardioprotective effect of HRT but instead showed an increased risk of coronary heart disease. Although it did demonstrate protection against osteoporosis and colon cancer, it showed an increased risk of venous thromboembolism, stroke, and breast cancer ( ). Thus the results of this prospective randomized trial directly contradicted findings of previous observational studies.
RCTs are considered the gold standard because of several key design features. Perhaps the most important of these features is randomization, a process that eliminates selection bias and allows for better control of known and unknown confounding factors. Confounding factors can also be controlled by strict eligibility criteria, eliminating possible interference from peripheral contributing factors. This capacity to control for confounding factors and demonstrate a true causal association is referred to as internal validity. An additional feature of most RCTs is blinding, which helps decrease both performance and detection bias. Performance bias is encountered when systematic differences exist in the care delivered to subjects. In other words, performance bias occurs when a patient receives less or more therapy based on knowledge of the particular group or treatment to which the patient is randomized. Detection bias occurs when patients are evaluated more intensely as a result of being in the study group of interest. The double-blinded RCT is a superior study design because blinding both subjects and investigators controls for performance and detection bias while also controlling for selection bias and confounding factors.
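Randomization as described above is commonly implemented with permuted blocks so that the arms stay balanced throughout accrual. The sketch below is illustrative only; the function name and parameters are invented, and real trials rely on validated randomization systems rather than ad hoc scripts:

```python
import random

def permuted_block_randomization(n_subjects, block_size=4, seed=None):
    """Assign subjects to 'treatment' or 'control' in shuffled blocks so
    that the two arms remain balanced as enrollment proceeds.
    Illustrative sketch only."""
    rng = random.Random(seed)  # seeded for reproducibility of the sketch
    assignments = []
    while len(assignments) < n_subjects:
        # Each block contains equal numbers of each arm, in random order.
        block = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_subjects]

schedule = permuted_block_randomization(12, seed=42)
print(schedule.count("treatment"), schedule.count("control"))  # 6 6
```

Because each block is internally balanced, no interim point in accrual can drift far from a 1:1 allocation, which is one practical way randomization controls selection bias.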
Despite the theoretical design superiority of the RCT approach, these studies are not without limitations and provide the best evidence only if the study has been thoughtfully designed, implemented, analyzed, and reported. Both ethical and practical considerations may limit the use of RCT to answer clinical questions. It is clearly unethical to expose patients to potential disease-causing factors just to learn about their negative effects on a particular outcome of interest. Additional ethical concerns arise when designing treatment groups to be studied. A placebo control group is often most efficient to study the effect of a given treatment. However, if an effective treatment already exists, it is not ethical to use a placebo control group. The concept of primum non nocere, or “first, do no harm,” also applies in clinical research, and harm can be caused by denying patients a known, effective treatment. If a condition is mild, the treatment period is brief, or effective treatment is not generally available, most investigators believe that a placebo control group is ethical. Most often in RCTs study groups will have similar, or at least balanced, benefits and harms. This leaves the researchers to instead investigate what treatment is preferred and if there is an advantage of one treatment over another. This uncertainty can be defined as therapeutic equipoise, or the general uncertainty of the benefits and harms of competing treatments. The assumption of therapeutic equipoise underlies many RCTs, providing patients with an equal chance to undergo at least standard-of-care treatment versus a possible improved treatment plan. Thus RCTs allow an unbiased assessment to assist in determining a preferred treatment. It should be noted that RCTs are ideal to study outcomes over short periods; RCTs intended to study long-term or rare outcomes are time consuming, cost prohibitive, and much more difficult to complete. 
As already discussed, cohort studies may be more appropriate to evaluate rare outcomes. Unfortunately, for many clinical questions, the time, effort, and expense involved in carrying out an RCT become prohibitive.
One further weakness of RCTs is the possibility that subjects and controls may be special populations whose results are not generalizable to the public, or even a specific subset of the public. In 1971, Cochrane noted this issue, stating “Between measurements based on RCTs and benefit…in the community there is a gulf which has been much under-estimated” ( ). Although the strength of the RCT design is its internal validity, it often lacks strong external validity. External validity is the ability of a result to be generalizable in the real world. Often the more complex the study protocol, the greater the difference between RCT results and general clinical outcomes. RCTs also often use surrogate markers to substitute for clinical outcomes. A surrogate marker is defined as “an outcome measure that substitutes for a clinical event of true importance…an intermediate measure…commonly laboratory measurements or imaging studies thought to be involved in the causal pathway to a clinical event of interest” ( ). An ideal surrogate marker is a measurable event that is necessary along the pathway to the clinical endpoint. For example, Skaznik-Wikiel et al. demonstrated in a retrospective analysis of 124 women that normalization of CA-125 levels in ovarian cancer after three cycles of chemotherapy was associated with an improved overall survival ( ). However, this was a retrospective analysis, and no prospective studies have demonstrated an association between CA-125 response and overall survival. Researchers must be careful when using surrogate markers because they may not always equate with the disease process being assessed. For example, a medication that has the effect of lowering cholesterol does not necessarily prevent heart attacks; lipid levels therefore may be a flimsy surrogate marker if myocardial infarction is the actual clinical endpoint of interest. When interpreting randomized trials, surrogate markers must be used with caution.
Grimes and Schulz have emphasized that surrogate markers should, among other characteristics, have similar confounders and influences, and they should show a near identical response to a treatment as the desired clinical endpoint. As an example they cite fluoride treatments, which improve bone mineral density (a surrogate marker) but increase fracture risk, which is the true clinical endpoint of interest. Occasionally authors use a combined outcome, which includes a surrogate marker and a valid clinical outcome. This combined outcome should be interpreted cautiously because the relative effect of treatments on the various components remains unknown.
Researchers may also report secondary outcomes, or subgroup analyses, within an RCT. These outcomes may or may not have similar validity as the set primary outcomes. RCTs are designed to test hypotheses on primary outcomes, controlling for confounding variables that affect the primary outcome. Often, secondary outcomes are not considered in the study design, and thus the study does not control for confounding factors affecting secondary outcomes. A subgroup exploratory analysis was conducted for ICON7, a large RCT that evaluated the addition of bevacizumab, a vascular endothelial growth factor inhibitor, to the traditional adjuvant treatment of ovarian cancer. Although the trial demonstrated an improvement in progression-free survival, there was no significant increase in overall survival for patients in the bevacizumab arm. However, a subgroup analysis performed during evaluation of the final survival results indicated a significant improvement in overall survival among subjects with the worst prognoses ( ). Although such results may appear promising, it must be remembered that they are not the result of the initial study framework and therefore must be interpreted with caution in clinical settings unless the original study design included internal validity regarding these subgroups. This concern is particularly pertinent when secondary outcomes and subgroup analyses are incorporated into meta-analyses, which will be addressed later.
Epidemiologic studies use a quantitative approach to describe both exposures and outcomes. Case-control studies, cohort studies, and RCTs all attempt to present their results as a single number, usually referred to as the point estimate, that quantifies the relationship between the exposure and the outcome. This number is an estimate of the truth rather than the truth itself because each study, however large, includes only a sample of all the people who are affected by the exposure-outcome relationship. The point estimate expresses the strength of the association between exposure and outcome. In an RCT or a cohort study the point estimate is the RR. Risk in the study subjects is the proportion of subjects who develop the outcome over a defined period: the number of cases divided by the number of subjects at risk. The RR is simply the risk of disease (or other outcome) among the exposed or treated subjects divided by the risk among the unexposed subjects. As already discussed, case-control studies do not measure risk directly but instead report an odds ratio (OR), which is generally considered a rough estimate of RR.
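The arithmetic behind RR and OR can be sketched in a few lines of Python. The 2×2 table counts below are hypothetical, chosen only for illustration, not drawn from any study discussed in this chapter:

```python
# Hypothetical 2x2 table (illustrative counts only):
#                outcome    no outcome
# exposed         a = 30      b = 70
# unexposed       c = 10      d = 90
a, b = 30, 70   # exposed group:   cases, non-cases
c, d = 10, 90   # unexposed group: cases, non-cases

risk_exposed = a / (a + b)            # proportion of exposed with the outcome
risk_unexposed = c / (c + d)          # proportion of unexposed with the outcome
rr = risk_exposed / risk_unexposed    # relative risk, ~3.0 here

odds_exposed = a / b                  # odds of the outcome among the exposed
odds_unexposed = c / d                # odds of the outcome among the unexposed
or_ = odds_exposed / odds_unexposed   # odds ratio, ~3.86 here

print(f"RR = {rr:.2f}, OR = {or_:.2f}")
```

Note that the OR (≈3.86) overestimates the RR (≈3.0) when the outcome is common, as in this illustrative table; the two converge only when the outcome is rare.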
Interpretations of RR and OR are similar. As the ratio approaches 1.0, there is little to no association. For both RRs and ORs, the further the value lies from 1.0, the stronger the relationship between the exposure and the outcome. Values less than 1.0 represent a negative association, or a decreased risk associated with the exposure; values greater than 1.0 represent a positive association, or an increased risk. A strong positive RR may be greater than 2.0, and a strong negative RR may be less than 0.5. Weaker associations can often be explained by confounding variables; however, enthusiastic investigators, worried patients, or sensationalistic media often overinterpret weak associations. One must also remember that an OR or RR is based on the results of a specific study sample and may not represent the general population. This possible difference between a study's statistics and the actual, unknown values in the population is known as sampling error.

As a result, investigators use a confidence interval (CI) to express the precision of a point estimate rather than relying solely on the traditional P value to determine whether a study's findings are due to chance, especially when discussing ratios and risks. A 95% CI represents the range of values within which the actual population value would fall with high probability. A narrow CI indicates strong precision and is easier to achieve in larger studies or studies with very little variance; a wide CI indicates poor precision and may reflect an underpowered study or considerable variance among results. It must be noted that CIs do not address uncertainty in the results caused by confounding factors or poor study quality. Furthermore, a wide CI is not definitive proof of a lack of association between the exposure and the outcome; a CI is simply a measure of precision.
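A minimal sketch of how a 95% CI around an RR point estimate is typically computed, using the standard log-transform (Katz) method; the 2×2 counts are hypothetical and chosen only for illustration:

```python
import math

# Hypothetical counts (illustrative only): 30/100 cases among the exposed,
# 10/100 cases among the unexposed.
a, b, c, d = 30, 70, 10, 90

rr = (a / (a + b)) / (c / (c + d))   # relative risk point estimate, ~3.0

# Standard error of ln(RR), per the Katz log method
se_log_rr = math.sqrt(1/a - 1/(a + b) + 1/c - 1/(c + d))

z = 1.96  # standard normal quantile for 95% confidence
lower = math.exp(math.log(rr) - z * se_log_rr)
upper = math.exp(math.log(rr) + z * se_log_rr)

print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With these counts the interval is roughly 1.6 to 5.8: wide, but excluding 1.0, so the imprecise point estimate still suggests a positive association. Larger samples shrink each reciprocal term in the standard error, narrowing the interval.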
Even an imprecise point estimate remains the best explanation of the relationship until a larger or better study is performed. CIs are most often graphically represented as a straight line around a point estimate to show the width of their range. However, a more accurate visualization of the concept is a bell-shaped curve centered around the point estimate value. Fig. 5.2 demonstrates examples of point estimates and their respective confidence intervals.