Clinical Research Methodology and Statistics

Donna Mazloomdoost


Clinicians, even those actively engaged in research, often express discomfort with statistical methods.1 While they believe research is important, few practitioners agree that more emphasis should be placed on research education during residency training.2 Not surprisingly, skills learned by clinicians diminish as the years pass,3 plausibly as a result of time constraints and workload pressures.4 Nonetheless, an understanding of research methodologies can improve critical appraisal and interpretation of research and ultimately aid in determining whether the results should be implemented into clinical practice.


Human subjects research is critical to advancing the understanding of human pathologies and enhancing medical care, yet subjects involved are asked to incur potential risk from which society may ultimately benefit.5 Ethical principles are, therefore, imperative to protect the rights of research participants.

Codes and Regulations

Prior to formalized research, medical treatments were mostly experimental because large data sets rarely existed.5 Much of medical ethics at this point relied on the Hippocratic oath,6 and although most investigators likely had good intentions, concerns over deception, self-interest, or exploitation eventually led to a need for regulations. Some regulations were in response to specific situations, whereas others were developed as new information was obtained.5 Among the first of these regulations was the Food and Drugs Act, also known as the Wiley Act, signed by President Roosevelt in 1906. This act paved the way for the first regulatory and consumer protection agency, the U.S. Food and Drug Administration. It prohibited the transport or sale of altered or mislabeled food or drug products between states and required food and drugs to carry labels listing active ingredients.7 Despite the protections laid out in the act, in 1937 a Tennessee drug company marketed an untested new sulfa drug geared toward children. The drug was found to contain a toxic analogue of antifreeze and resulted in over 100 deaths, many of them children. This prompted Congress to pass the Food, Drug, and Cosmetic Act in 1938, which mandated premarket approval of all new drugs.8

World War II ushered in a new era of ethical concerns as research saw tremendous growth. The Nuremberg trials set the foundation for modern-day ethical protection of human subjects.9,10 The trials, held in Nuremberg, Germany, accused Nazi physicians of conducting torturous and fatal experiments on concentration camp prisoners and led to the Nuremberg Code of 1947. The first principle of the Nuremberg Code asserts that the informed consent of the competent human subject is essential.9 While the Nuremberg Code was widely accepted and resulted in guidelines from government agencies and the National Institutes of Health (NIH) Clinical Center regarding informed consent, some scientists continued to ignore these principles.11

In 1966, Henry Beecher, a respected Harvard anesthesiologist and researcher, published a groundbreaking article in The New England Journal of Medicine (NEJM), highlighting ethical problems in a preliminary sample of 17 studies, which ultimately expanded to 50 manuscripts.12 Beecher called out reputable institutions such as Harvard and the NIH Clinical Center as well as journals such as NEJM and the Journal of the American Medical Association for their roles or complacency in these ethical violations. He described 22 examples of breaches including withholding of accepted disease treatments, active administration of viral infections to cause disease, and dispensing of medications with known serious adverse events. Of particular concern was the inclusion of vulnerable populations such as institutionalized mentally “defective” individuals, infants, and inmates. Beecher argued subjects in some cases had received harm without any clear benefit, whereas others may not have been appropriately counseled on the risks.

In 1972, the nation was shocked by the revelation of the U.S. Public Health Service Tuskegee syphilis study.13,14 The study was initiated in 1932 by the Public Health Service with the intent to study the natural history of syphilis. At the Tuskegee Institute, it enrolled 399 black men with syphilis from Macon County, Alabama, along with 201 men who were disease free. The men were promised medical examinations, meals, and burial insurance free of charge as an incentive to participate. The study was scheduled to last 6 months but continued for 40 years, while hundreds of black men had intervention deliberately withheld even once penicillin became the accepted treatment. An advisory panel convened in response to the public outcry found that the men had been misled and that informed consent had not been properly obtained. Publicization of Tuskegee along with Beecher’s advocacy ultimately led to enough public scrutiny for Congress to pass the 1974 National Research Act.5,14

Belmont Report

The 1974 Act convened the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research, charged to identify basic ethical principles and develop guidelines for the conduct of research involving human subjects.15 The Belmont Report described the following basic ethical principles necessary in research: (1) respect for persons, (2) beneficence, and (3) justice. These principles described the need to (1) respect the autonomous decision making of the participant, (2) protect them from intentional or unnecessary harm, and (3) select research subjects in a representative and just manner. The Commission then recommended that the application of these general principles to the conduct of research would require (1) informed consent, (2) risk/benefit assessment, and (3) the selection of subjects of research.15 Influenced by the Belmont Report, in 1991, the “Common Rule” was published and required research funded by 17 federal agencies to comply with this Federal Policy for the Protection of Human Subjects which outlines provisions for human subjects protections, institutional review boards (IRBs), informed consent, and assurances of compliance.16 Further protection developed to include traditionally neglected populations. The NIH now requires those receiving funding to include underrepresented populations such as women, minorities, and children when eligible or provide justification for their exclusion.17,18 Although the Common Rule is enforced with regard to research involving federal funding through the U.S. Department of Health and Human Services (HHS), most research is impacted because the Common Rule applies to any research involving human subjects conducted at institutions receiving direct or indirect federal funding.19 Therefore, investigators are often bound to the Common Rule because their institution might receive applicable funding, and everyone at that institution must comply.

Informed Consent

Research is ever-evolving, and as new technologies and applications are introduced, a universal framework is necessary to guide investigators, IRBs, sponsors, and other involved personnel to determine whether a research protocol is ethically compliant.20 Seven requirements have been suggested to provide this systematic framework: social or scientific value, scientific validity, fair selection of subjects, favorable risk-to-benefit ratio, independent review, informed consent, and respect for eligible/enrolled participants.5,20 Of these seven principles, informed consent has potentially received the most attention, as it ensures that competent subjects or their decision makers voluntarily agree without coercion to participate only after being fully informed of the purpose, methods, risks, benefits, and alternatives of the research.20 Unfortunately, the importance of the informed consent process was observed and verified through the historical scandals that led to the ethical discussions.

Informed consent should involve appropriate disclosure of the risks and benefits, allow for adequate comprehension of what the trial entails, and permit uncoerced enrollment.21 The informed consent should also explain randomization and potential assignment to placebo groups, though subjects show lower comprehension of these elements.21,22 A particular challenge is verifying the subject’s understanding, especially when the participant is a minor or incapacitated. The most effective way to educate a participant appears to be a direct conversation in which a member of the research team explains the study.23 Despite the burden, especially in an era of ever-increasing time demands and complex research methods, it is imperative that this be done whenever possible.22 To decrease the burden, the addition of a video explaining the study protocol appears to enhance patient understanding,24 although alternative methods to the traditional consent process remain controversial.22,23 Whether incentives or payments undermine the voluntary nature of consent has long been debated, owing to concerns that payments unduly coerce subjects into participation. However, most ethicists agree that because coercion involves threats and incentives do not, payments are unlikely to coerce subjects into participation and can be used when appropriate for recruitment or retention of participants.25 Consent is traditionally obtained by a written signature documenting the subject’s willingness to participate.5

Institutional Review Boards

IRBs were established to assure that the well-being and rights of human subjects are considered in clinical trials.26 While nonhuman research is exempt from IRB review, the use of animals in federally funded research is subject to the Animal Welfare Act (AWA), and NIH-funded research is subject to the U.S. Public Health Service’s Policy on Humane Care and Use of Laboratory Animals.27,28 The AWA seeks to ensure the humane treatment of animals in the research setting, recommending the establishment of an oversight committee, the Institutional Animal Care and Use Committee.28 The AWA describes “the 3Rs,” reduction, refinement, and replacement, which encourage reducing the number of animals used, minimizing suffering, and replacing animals with technologies or other models when appropriate.29

As previously discussed, the Common Rule dictates that, with some exceptions, federally funded clinical trials involving human subjects must receive IRB approval.30 Specifically, six categories were designated exempt from IRB approval as they were not deemed to expose subjects to physical, social, or psychological harm beyond what exists in daily life (Table 7.1).26 Researchers, however, should consult with their institutions regarding institution-specific guidelines, particularly when not funded by applicable federal agencies.31

Research involving minimal risk to subjects may be eligible for expedited review. Institutions that offer this option allow the IRB chair or a designee to approve the study on behalf of the entire group, allowing for a more prompt response than a fully convened IRB.32

Federal regulations require that an IRB have at least five members, with at least one individual whose expertise is in the scientific area of interest, someone with expertise in nonscientific areas, and one individual not affiliated with the institution. Obtaining IRB approval has become standard practice, and investigators can employ for-profit organizations, known as independent or commercial IRBs, if an institutionally provided IRB is not available.19

Multicenter trials have long struggled with the inefficiency of multiple IRB submissions and approvals, which delay research and potentially increase costs without any evidence that the duplicative approvals enhance human subject protection.33,34 Therefore, beginning in January 2020, the Common Rule required that relevant federally funded multisite trials use a single IRB for approval, with the aim of improving the efficiency of trials and human subject protection.34

Data and Safety Monitoring

In addition to informed consent, randomized controlled trials (RCTs) raise further ethical considerations compared with alternative trials. A hallmark of clinical trial design is equipoise, or the uncertainty behind the intervention being examined.35 Equipoise requires that, in the design of a clinical trial, the investigators should be certain that there is no evidence that either treatment provided to the subjects is more valuable such that a known beneficial therapeutic would be withheld. Subjects must not knowingly receive an inferior treatment or the study becomes invalid.20 Equipoise influences another ethical dilemma in RCTs: the selection of controls. Choosing an appropriate control is paramount to the validity and generalizability of the study (to be discussed later); however, electing to compare a treatment to a placebo requires justification to withhold a potentially beneficial treatment, or equipoise.5

Because researchers are invested in their studies, an objective and independent safety review is necessary.36 In the United States, data and safety monitoring boards (DSMBs) or safety monitors are often used and first surfaced in the 1950s under different titles.36,37 Early experiences with safety monitoring occurred during the first chronic disease trials in the 1960s-1970s.36,38 Though all clinical trials need safety oversight, not all need an independent DSMB, which can be confusing and frustrating to investigators. The World Health Organization has suggested a DSMB be used in eight areas of study involving: (1) mortality, (2) severe morbidity, (3) high-risk interventions, (4) novel interventions with limited safety, (5) complex design or data accrual, (6) potential early termination after an interim analysis, (7) emergency situations, and (8) vulnerable populations.39 NIH now requires all its institutes and centers to designate a system to oversee and monitor funded clinical trials to ensure the safety of the participants, and multicenter trials require an independent DSMB composed of varying expertise to evaluate the safety of ongoing clinical trials.40 The DSMB should evaluate the protocol prior to implementation and also review data while the trial is ongoing.38 Importantly, the DSMB has the authority to stop a trial if there are safety concerns, clear efficacy with regard to a treatment, or futility.37 Perhaps one of the best-known examples of this was from the Women’s Health Initiative (WHI).

The WHI trial was an NIH-funded study designed in 1991-1992 to assess benefits and risks of postmenopausal hormone replacement therapy on multiple outcomes, a complicated approach which more closely resembled real-world scenarios where such outcomes can occur concurrently.41 Women were randomized to one of two treatment groups and compared to controls receiving placebo. In the treatment groups, participants with a uterus received a combination estrogen/progesterone pill, whereas posthysterectomy subjects received oral estrogen alone. Over 164,500 women were projected to be enrolled at over 40 clinical sites, making it perhaps the most considerable and challenging trial ever undertaken.42 Due to the preventive nature of the study design, the investigators placed more emphasis on global health, and therefore, traditional methods for determining when to stop the study could not apply. They held discussions a priori with the DSMB to determine scenarios under which members would vote to continue versus stop the trial, and from this, they published that a mixed approach to finding statistical guidelines would likely best represent appropriate monitoring of the trial.43 One of the unanimous determinations from the DSMB members concerned the occurrence of breast cancer, and on May 31, 2002, the DSMB recommended early cessation of the combined treatment arm after a mean of only 5.2 years of follow-up because a predesignated boundary regarding breast cancer was exceeded in the treatment group.44 The estrogen-alone arm was continued. In February 2004, the NIH decided to halt the trial on the determination that the estrogen-alone arm, after 7 years of follow-up, had failed to demonstrate cardioprotection and that stroke risk was elevated similarly to the combination group of the alternative arm. Of note, the breast cancer risk remained similar between the estrogen-alone and placebo groups.45 A statement was issued by the director of the NIH’s National Heart, Lung, and Blood Institute determining that enough data had been collected for analysis, and interim assessments deemed the risks sufficient to halt subjecting healthy women to a preventative medication that was not showing prevention benefits.46 The trial challenged traditional epidemiologic study and uncovered approaches that have benefited and will continue to benefit future studies.41


A fundamental and perhaps difficult step in research development can be the selection of a question and the subsequent appropriate study design. A proper understanding of the state of the science is imperative to pursuing a relevant study question. Often, the first step involves a literature search to evaluate what has already been discovered and understand the gaps about the topic of interest. Once the investigator has determined the question to be answered, the study design can be developed.

Study Designs

Clinical research is divided into two categories: observational and experimental.47 The determination rests on whether the exposure was assigned by the investigator. Observational studies are analytical or descriptive, whereas experimental studies may be randomized or nonrandomized (Fig. 7.1). Understanding study designs is critical to developing an appropriate protocol. The strength of a study is determined by its design, and guidelines impacting clinical practice are often based on the strength of the evidence obtained from studies. The U.S. Preventive Services Task Force (USPSTF) is an example of an agency that takes an evidence-based approach to clinical practice guidelines.48 The USPSTF refers to an accepted hierarchy of study quality (Table 7.2), and practice recommendations are also graded with regard to strength (Table 7.3).47
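
The branching just described can be written as a small decision function. The following Python sketch is illustrative only: the function name classify_study and its parameters are hypothetical, and the comparison-group test used to separate analytical from descriptive studies is the criterion this chapter gives in its discussion of observational studies.

```python
def classify_study(exposure_assigned: bool,
                   randomized: bool = False,
                   has_comparison_group: bool = False) -> str:
    """Return the broad design category for a clinical study.

    Hypothetical helper for illustration; not a formal taxonomy.
    """
    if exposure_assigned:
        # The investigator assigned the exposure: experimental study.
        if randomized:
            return "experimental, randomized"
        return "experimental, nonrandomized"
    # The exposure was not assigned: observational study.
    if has_comparison_group:
        return "observational, analytical"
    return "observational, descriptive"


if __name__ == "__main__":
    # A randomized controlled trial.
    print(classify_study(exposure_assigned=True, randomized=True))
    # A study comparing groups whose exposure was not assigned.
    print(classify_study(exposure_assigned=False, has_comparison_group=True))
    # A case report or case series.
    print(classify_study(exposure_assigned=False))
```

The randomized flag is ignored for observational designs and the comparison-group flag is ignored for experimental ones, which mirrors the two separate branches in the text.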

Observational Trials


While an RCT is considered the gold standard, it is not always feasible or even appropriate. Observational studies are acceptable when an intervention or exposure cannot be assigned, whether for reasons of safety or practicality. Observational studies are further divided into descriptive or analytical (see Fig. 7.1). Descriptive studies do not include a comparison group and are commonly known as case reports, case series, or case studies. They can be used to describe the frequency, natural history, or possible factors of a condition.47 Despite occupying the lowest tier of the study quality hierarchy, descriptive studies can be critical in announcing or understanding emerging medical issues.49,50 Such studies may be minimized or even discouraged; with quality review, however, their publication can be invaluable to medical advancement.51 The discovery of HIV/AIDS is a well-known example,52 followed most recently by the 2020 pandemic caused by the novel coronavirus.53 Although case reports or series can alert the medical community to new conditions, one must use caution in changing clinical practice because of these discoveries.


Analytical studies, alternatively, compare groups, which can allow for better understanding of associations. They are categorized as cohort studies, case-control studies, and cross-sectional studies (see Fig. 7.1). The classifications are based on how the groups are selected.


A cohort study selects subjects on the basis of exposure and often follows them prospectively until the outcome is observed while comparing them to individuals not exposed.47 A retrospective cohort study examines exposures and outcomes that have already occurred. It involves reviewing exposure information that was collected at a time in the past and examining for the outcome at the time of the study. For example, records documenting head injury from World War II veterans were reviewed to identify those with and without this exposure, and the sample was evaluated in the 1990s for dementia or Alzheimer disease.54 The authors found that moderate and severe head injuries may be associated with the development of dementia or Alzheimer disease later in life. A common retrospective cohort design within pelvic floor research is a review of charts from women who have undergone surgery to examine risk factors potentially associated with a particular outcome.

General strengths of cohort studies include a better appreciation of exposure preceding the outcome, minimized recall bias (to be discussed later), and estimations of populations at risk.55 Cohort studies allow for the calculation of incidence rates, relative risks (RRs), and attributable risks.47 However, they cannot infer causality, may be lengthy and/or expensive, and are not appropriate for rare events.47,55 Additionally, retrospective cohort studies may miss cases that were either of short duration or fatal.56 Nonetheless, they provide invaluable information on potential disease development.
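
As a worked illustration of the measures named above, the following Python sketch computes them from a 2 x 2 table of cohort counts. It is a minimal example under stated assumptions: the class and function names are hypothetical, the counts are invented for illustration, and cumulative incidence (cases divided by people at risk) stands in for an incidence rate, which strictly would divide by person-time of follow-up.

```python
from dataclasses import dataclass


@dataclass
class CohortCounts:
    """Counts from a cohort study (hypothetical container)."""
    exposed_cases: int
    exposed_total: int
    unexposed_cases: int
    unexposed_total: int


def cumulative_incidence(cases: int, total: int) -> float:
    """Proportion of the at-risk group that developed the outcome."""
    if total <= 0:
        raise ValueError("group size must be positive")
    if not 0 <= cases <= total:
        raise ValueError("cases must be between 0 and group size")
    return cases / total


def cohort_measures(counts: CohortCounts) -> dict:
    """Return incidence in each group, relative risk, and attributable risk."""
    risk_exposed = cumulative_incidence(counts.exposed_cases, counts.exposed_total)
    risk_unexposed = cumulative_incidence(counts.unexposed_cases, counts.unexposed_total)
    if risk_unexposed == 0:
        raise ValueError("relative risk is undefined when the unexposed risk is 0")
    return {
        "incidence_exposed": risk_exposed,
        "incidence_unexposed": risk_unexposed,
        # Relative risk: ratio of incidence in exposed to unexposed.
        "relative_risk": risk_exposed / risk_unexposed,
        # Attributable risk: excess incidence among the exposed.
        "attributable_risk": risk_exposed - risk_unexposed,
    }


if __name__ == "__main__":
    # Invented counts: 30 of 200 exposed and 10 of 200 unexposed develop the outcome.
    example = CohortCounts(exposed_cases=30, exposed_total=200,
                           unexposed_cases=10, unexposed_total=200)
    for name, value in cohort_measures(example).items():
        print(f"{name}: {value:.3f}")
```

With these invented counts, the exposed group has three times the incidence of the unexposed group, and the attributable risk is 10 additional cases per 100 exposed people.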

A prominent example of an effective cohort is the Nurses’ Health Study. This ambitious study, which has received continued NIH funding, launched in 1976 and enrolled 121,700 married registered nurses aged 30 to 55 years. It remains one of the largest and longest running studies on women’s health, boasting a continued 90% response rate of the original cohort after accounting for deaths.57 The study’s original focus was on contraceptive methods, smoking, cancer, and heart disease, but it has since expanded and contributed vital information about lifestyle factors, behaviors, personal characteristics, and more than 30 diseases.58

Case control

When a study begins with an outcome and looks back for an exposure, it is called a case-control study. These studies are often retrospective because the disease or condition of interest is identified first, and investigators evaluate for potential exposures prior to diagnosis that may have been associated with the development of the outcome. Case-control studies are advantageous when evaluating rare outcomes or those that have a prolonged period of development.47 They are also typically shorter in duration and less expensive to conduct than cohort studies. However, one significant limitation is recall bias, a systematic error where participants do not recall memories accurately, with better recollection of exposures among cases compared to controls, potentially due to impaired memory, confusion, or even a desire to cooperate with an investigator.47,59 Recall bias will be discussed further in the bias section.


Posted May 1, 2023, in Gynecology