Study designs are often mischaracterized in the obstetrics literature; in particular, studies that are in fact prospective cohorts are frequently mislabeled as retrospective (historical) cohorts. This is especially true for studies based on electronic health records, many of which should properly be considered prospective cohorts. Epidemiologic study designs were developed in earlier eras of research and healthcare, when researchers directly contacted study participants or relied on data from paper medical records. Accordingly, standard epidemiologic study design definitions are difficult to apply to digitized data, which have become common in the modern era of healthcare and computing. In this article, we briefly review the characteristics of the 3 main types of cohort studies. We then build on existing definitions by proposing several subdesignations of prospective cohort studies that we believe will reduce confusion in terminology. We provide illustrative examples from obstetrics to concretely demonstrate connections and distinctions among study designs. First, a prospective cohort study can be “active” (participants are deliberately and explicitly enrolled in a prospective research study) or “passive” (participants are followed up in real time for some nonresearch activity, such as clinical care or quality improvement).
An active prospective cohort study never stops being a prospective cohort study; however, when it is reused to answer a new, secondary question, we propose that it be called a “reused (active) prospective cohort.” The de novo cohort study that answered the original question should be considered an “intended (active) prospective cohort.” Lastly, when a randomized controlled trial is reused to study a new question in which the randomization variable is not under study, this is also a subtype of prospective cohort study: a “repurposed randomized controlled trial.” Using more detailed descriptors for prospective cohort studies will enable more accurate identification of this study design going forward. Further refinements will likely be needed, given the ongoing evolution of how we engage with patients or participants and how data are collected, stored, and linked.
Introduction
In our previous publication, we noted that although the distinctions among the main types of cohort studies are familiar and well established (Table 1), study designs are often mischaracterized in the obstetrics literature. In particular, authors’ designation of their study as a retrospective cohort is frequently in error. Furthermore, we noted that the term “retrospective” lacks a clear or consensus definition, and we discouraged its use. We also stated that standard epidemiologic definitions are especially difficult to apply to the proliferating data sources, such as administrative databases and electronic medical records, that have become common in the modern era of healthcare and computing. Epidemiologic study designs were developed in earlier eras of research and healthcare, when researchers contacted study participants directly (in person or via telephone or mail) or relied on data from paper medical records. Confusion about study design has become particularly acute with the explosion of secondary data analysis and data reuse.
Characteristic | Prospective cohort | Historical (retrospective) cohort | Ambidirectional cohort |
---|---|---|---|
Participant eligibility | Based on exposure, without regard to outcome | Based on exposure, without regard to outcome | Based on exposure, without regard to outcome |
Participants at risk of outcome at study entry | Yes | Yes | Yes |
Participant follow-up occurs in real time (ie, concurrently with the study conduct) | Yes | No | Yes |
Follow-up occurred in the past | No | Yes | Yes |
Exposure recorded before study begins | Not necessarily | Yes | Yes |
Exposure likely to precede outcome occurrence | Yes | Yes | Yes |
Can calculate incidence rate of outcome | Yes | Yes | Yes |
In this article, we briefly review the characteristics of the 3 main types of cohort studies and then propose several subdesignations of prospective cohort studies that we believe will reduce confusion in terminology. In theory, epidemiologic terminology already exists to describe all of these novel data sources and research applications. In practice, study design classifications have become very confusing, and we propose that adding subcategories to existing study design categories will aid the important process of accurately identifying the study designs used in research. To avoid further confusion from new terminology, we have grounded each term in existing study design concepts and nomenclature, and we use several specific criteria to define each term and distinguish among them. Throughout this article, we provide illustrative examples from obstetrics to concretely demonstrate connections and distinctions among the study designs.
Major types of cohort studies
As we previously noted, all cohort studies share the following characteristics:
1. Participants enter the cohort independent of their outcome, but they are at risk of experiencing the outcome and are classified according to their exposure of interest.
2. They are followed up over time.
3. The purpose of the follow-up is to determine who develops incident (new-onset) outcomes.
As described in Table 1, there are 3 major types of cohort studies: prospective, historical (a term we prefer to “retrospective”), and ambidirectional.
The primary difference among the 3 major types lies in the timing of follow-up. In a prospective cohort, follow-up occurs in “real time” (Table 1); that is, characteristics are identified and recorded as they occur. Historical and ambidirectional cohorts are best described by example. Suppose an investigator wants to study whether individuals who developed preeclampsia during pregnancy are at increased risk of cardiovascular disease when they reach middle age. A prospective cohort would require that people with and without preeclampsia be actively followed up for ≥25 years to determine who develops cardiovascular disease.
In contrast, a historical cohort would require the investigators to begin by reviewing existing obstetrical records from ≥25 years ago to determine who did and did not develop preeclampsia. The investigators would then need to locate these women in the present day and, through interviews or medical record releases, reconstruct 25 years of follow-up time to determine who developed cardiovascular disease (and also, as part of tracing, determine who has already died of cardiovascular disease, eg, by consulting death records). In this design, there was “follow-up,” but it occurred in the past and was not active (ie, not concurrent with and under the observation of the present-day researchers) (Table 1); therefore, it had to be reconstructed.
To continue this example, an ambidirectional cohort describes the design in which an investigator uses both historical and prospective follow-up. Specifically, after identifying participants in the present day and reconstructing historical follow-up time (eg, through interviews or medical record releases), she would continue to observe them forward in real time (eg, through annual e-mail or telephone surveys, or further record releases) to identify additional cases of cardiovascular disease. Thus, the ambidirectional design uses follow-up both in the past (passively, making use of historical records) and in real time (actively, under the observation of the researchers). Interested readers can find additional details and comparisons of the relative advantages of each design in our previous work.
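Because the 3 designs differ chiefly in when follow-up occurs relative to the conduct of the study, the follow-up rows of Table 1 can be read as a simple decision rule. The Python sketch below is illustrative only; the function and enum names are hypothetical and do not correspond to any established tool, and it deliberately ignores the other criteria that every cohort must meet (eg, eligibility based on exposure and being at risk at entry).

```python
from enum import Enum


class CohortType(Enum):
    """The 3 major cohort designs summarized in Table 1 (hypothetical labels)."""
    PROSPECTIVE = "prospective"
    HISTORICAL = "historical (retrospective)"
    AMBIDIRECTIONAL = "ambidirectional"


def classify_cohort(follow_up_in_past: bool, follow_up_in_real_time: bool) -> CohortType:
    """Classify a cohort by the timing of its follow-up, per the Table 1 rows.

    follow_up_in_past: follow-up time had already elapsed when the study began
        and must be reconstructed (eg, from old records or interviews).
    follow_up_in_real_time: follow-up occurs concurrently with study conduct.
    """
    if follow_up_in_past and follow_up_in_real_time:
        return CohortType.AMBIDIRECTIONAL
    if follow_up_in_real_time:
        return CohortType.PROSPECTIVE
    if follow_up_in_past:
        return CohortType.HISTORICAL
    # Every cohort study follows participants over time in some period.
    raise ValueError("A cohort study requires follow-up over time.")


# Preeclampsia example: reconstruct 25 years of past follow-up, then keep
# observing the same women forward in time.
print(classify_cohort(follow_up_in_past=True, follow_up_in_real_time=True).value)
```

In the preeclampsia example, reconstructing 25 years of past follow-up and then continuing annual surveys sets both flags, which yields the ambidirectional design.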
Prospective cohort subtypes: active and passive
The aforementioned terminology and concepts are still applicable and do not need fundamental alteration, but their application to many modern data sources often leads to confusion. This is particularly so for studies that rely on preexisting digitized data and especially data that can be linked to individuals. By expanding the taxonomy of cohort designs, we hope to provide a language that clarifies to the reader exactly what was done in the study and, therefore, what the strengths and weaknesses inherent in the design might be.
Building on our previous work and other foundational discussions of epidemiologic study design, we propose subcategories of prospective cohort studies. This proposal is motivated by the observation that many studies using digitized and/or reused records that are identified as retrospective cohorts are in fact prospective cohort studies. We build on existing definitions and propose 2 levels at which prospective cohort studies can be subdivided (Supplemental Figure). The subdivisions are characterized in Table 2. First, a prospective cohort study can be “active” or “passive.” In an “active prospective cohort,” participants are deliberately and explicitly enrolled in a prospective research study that includes a formal study protocol, institutional review board approval, Health Insurance Portability and Accountability Act authorization, and collection of research information beyond what is typically clinically indicated (Table 2, top panel). This is the canonical and familiar prospective cohort study: participants interact with research personnel repeatedly over time and know that they are involved in research.
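The features listed for an active prospective cohort can be treated as a checklist. The Python sketch below encodes them as hypothetical boolean fields. It rests on a loudly stated simplifying assumption: a prospective cohort lacking any of these features is treated as passive, ie, its real-time follow-up arose from a nonresearch activity such as clinical care or quality improvement. Real studies may require judgment beyond such a rule. The fields describe how the original real-time follow-up was conducted, not the approvals obtained for a later analysis of the data.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProspectiveCohortFeatures:
    """Hypothetical checklist describing how real-time follow-up was conducted."""
    explicit_enrollment: bool                 # participants deliberately enrolled in research
    formal_protocol: bool                     # formal study protocol
    irb_approval: bool                        # IRB approval for the prospective study
    hipaa_authorization: bool                 # HIPAA authorization from participants
    research_data_beyond_clinical_care: bool  # data collected beyond clinical indication


def prospective_subtype(features: ProspectiveCohortFeatures) -> str:
    """Return "active" if every checklist feature is present, otherwise "passive".

    Simplifying assumption (not a formal definition): any missing feature implies
    that follow-up occurred for a nonresearch purpose.
    """
    if all(getattr(features, f.name) for f in fields(features)):
        return "active"
    return "passive"


# Women followed up through routine prenatal care documented in an electronic
# health record, with no research enrollment at the time of care:
ehr_cohort = ProspectiveCohortFeatures(False, False, False, False, False)
print(prospective_subtype(ehr_cohort))
```

Under this rule, a cohort built from electronic health record data captured during routine care is classified as a passive prospective cohort rather than a historical one, because its follow-up was recorded in real time.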