Epidemiologic and Research Methods in Fetal Medicine





Key Points





  • The randomised controlled trial (RCT) is the least biased method of assessing the effectiveness of clinical interventions. It has been little used in fetal medicine, but reports are increasing.



  • The details of good clinical trial methodology are now well established. It is critically important to avoid selection bias by ensuring allocation concealment at randomisation.



  • Research synthesis allows the reader to review the totality of relevant evidence on a particular topic. Systematic reviews can be performed of RCTs (‘reviews of effectiveness’), screening and diagnostic tests, or other types of scientific literature. Meta-analysis may or may not be a component of a systematic review.



  • The Cochrane Database of Systematic Reviews is the largest source of high-quality systematic reviews of health care interventions.



  • The likelihood ratio describes the usefulness of a screening or diagnostic test.



  • Routinely collected perinatal datasets can generate useful information and important hypotheses (e.g., the ‘Barker hypothesis’) as long as the quality of data is sound.





Introduction


Epidemiology, the study of the distribution and causes of diseases in populations, has produced a rich set of tools that are increasingly applied in clinical research. This growth has led to the birth of a subspecialty termed ‘clinical epidemiology’. This chapter explores these tools and the concepts that underpin them and illustrates their application with reference to diagnostic and screening tests and therapeutic interventions in fetal and perinatal medicine. Fetal medicine is itself a young speciality, and its short history and rapid progress have inevitably resulted in some errors and blind alleys. The methodological concepts presented here should help obstetricians caring for fetuses (‘maternal-fetal medicine’ specialists in North America and ‘fetal medicine’ specialists in Europe) learn from past mistakes, and they provide an introduction to the scientific research foundations needed to ensure that the application of fetal medicine does more good than harm.


Care during pregnancy and childbirth has been among the vanguard areas of clinical activity in moving towards ‘evidence-based’ clinical practice. The basis of this process, the production of systematic reviews of scientifically rigorous studies, has been likened in scale and importance to the Human Genome Project. This chapter is organised under two themes. The first describes the epidemiologic methods necessary to understand and interpret research findings in fetal medicine, and the second is devoted to principles and concepts in general research methods. Empirical applications of these methods to fetal medicine are woven throughout to make the concepts easier to grasp.




Epidemiologic Study Designs


Epidemiologic study designs fall under two broad categories: experimental and observational designs. Randomised controlled trials (RCTs) are experimental designs (discussed later). Observational designs can be classified as analytical or descriptive. The most common analytical study designs include prospective (including the longitudinal design), retrospective (including the case-control design) and cross-sectional study designs. Descriptive studies include meta-analysis (both aggregate and individual patient-level meta-analysis) and case series. Interested readers are referred to the large body of literature on epidemiological study designs.


Randomised Controlled Trials


Not all interventions can be evaluated by randomised trials; one cannot foresee, for example, randomised trials of fetal transfusion for severe fetal anaemia or of immediate versus delayed delivery for prolonged fetal bradycardia in labour. However, different methods of fetal transfusion or of techniques of caesarean delivery would be obvious candidates for further evaluation.


Randomisation


The RCT is a simple but powerful method of avoiding systematic errors, or bias, by ensuring that experimental (study) and control groups are comparable in all important respects other than in their exposure to the intervention being tested. By random allocation, the investigator accounts not only for known confounding variables but also for factors that are unknown but are also potentially important determinants of final outcome. Random allocation depends on allocation solely on the basis of chance–metaphorically, on the basis of the flip of a coin.


The essence of secure randomisation is that those involved in the study cannot know in advance to which group a particular woman will be allocated (i.e., concealment of allocation). Concealment prevents clinicians who have preconceptions about the effectiveness of the two treatment options from selectively enrolling patients based on the next treatment assignment. Allocation by hospital case number, alternate days or date of birth does not adequately conceal the direction of allocation. These methods of participant allocation are sometimes called ‘quasi-random’ and, under current concepts of good trial methodology, should not be used.


Even apparently robust methods of random allocation, such as the commonly used sealed opaque envelope to be opened only after the woman has consented to entering the trial, have been known to be abused on occasion. The gold-standard methods, now used in large trials and increasingly web based, include computerised online and telephone randomisation in which someone based at a remote site gives randomisation instructions only after basic descriptive data about the woman and confirmation of eligibility have been recorded. Electronic communication may be particularly difficult in parts of the developing world, yet randomised trials may be particularly important in such settings because rates of both maternal and fetal mortality are high. The Collaborative Eclampsia Trial, which for the first time demonstrated the indisputable preeminence of magnesium sulphate as the anticonvulsant of choice for eclampsia, took place mainly in developing countries. This trial used identical boxes containing magnesium sulphate, diazepam or phenytoin, which were opened only when a woman had an eclamptic seizure.
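The principle of remote randomisation with concealed allocation can be sketched in a few lines of code. The example below is a minimal illustration only, not the system used in any of the trials mentioned: the class name `RemoteRandomiser`, its fields and the participant identifiers are all hypothetical, and simple (coin-flip) randomisation is assumed. The point it shows is that the assignment is generated only after eligibility and consent have been recorded, so the recruiting clinician cannot know it in advance.

```python
import random
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class RemoteRandomiser:
    """Hypothetical central randomisation service (illustration only)."""
    arms: Tuple[str, str] = ("intervention", "control")
    seed: Optional[int] = None          # fixed only to make the demo repeatable
    log: List[Tuple[str, str]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Private random number generator held at the remote site.
        self._rng = random.Random(self.seed)

    def allocate(self, participant_id: str, eligible: bool, consented: bool) -> str:
        # Allocation is refused until eligibility and consent are on record.
        if not (eligible and consented):
            raise ValueError("Record eligibility and consent before randomising")
        # Simple randomisation: each arm is chosen purely by chance.
        arm = self._rng.choice(self.arms)
        # Audit trail of every allocation, in the order it was issued.
        self.log.append((participant_id, arm))
        return arm


if __name__ == "__main__":
    service = RemoteRandomiser(seed=2020)
    for pid in ("P001", "P002", "P003"):
        print(pid, service.allocate(pid, eligible=True, consented=True))
```

In contrast, allocation by date of birth or case number is a fixed rule that anyone can apply before enrolment, which is why such ‘quasi-random’ schemes cannot conceal the next assignment.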


Explanatory Versus Pragmatic Trials


There are two types of randomised trials; both are valid, and the appropriate trial design depends on the underlying research question to be answered. The explanatory trial assesses efficacy, or the performance of the intervention under ideal circumstances; the pragmatic trial assesses effectiveness, or performance under what may be less than optimal, but real life, circumstances.


The Term Breech Trial was a pragmatic RCT that compared the outcome of term fetuses presenting in labour in the breech position after either a planned caesarean section or a planned vaginal delivery. Clinicians were required to consider themselves ‘skilled’ at vaginal breech delivery, with confirmation by their head of department. Although study sites were located worldwide, randomisation was controlled in Toronto, Canada, with a computerised system accessed by touch tone telephone. Because this was a pragmatic trial evaluating the safety of breech deliveries in real-world practice, 90% of women in the planned caesarean group delivered by caesarean section, and 57% in the vaginal delivery group delivered vaginally. The trial showed a considerable advantage to babies in the planned caesarean group (perinatal mortality or neonatal mortality or serious neonatal morbidity: 1.6% vs 5%; relative risk 0.33; 95% confidence interval, 0.19–0.56). There were no differences in maternal mortality or serious maternal morbidity, nor were there differences in neurodevelopmental delay among newborns followed until 2 years of age.
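The relative risk and confidence interval quoted above can be reproduced with the standard log-based (Wald-type) method. The sketch below is illustrative: the function name is hypothetical, and the event counts (17 of 1039 vs 52 of 1039) are assumed for the purpose of the example because they are consistent with the percentages given here; they are not stated in this chapter.

```python
import math


def relative_risk(events_a, n_a, events_b, n_b, z=1.96):
    """Relative risk of group A vs group B with an approximate 95% CI."""
    risk_a = events_a / n_a
    risk_b = events_b / n_b
    rr = risk_a / risk_b
    # Standard error of log(RR) from the usual large-sample approximation.
    se_log_rr = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Assumed counts: planned caesarean 17/1039, planned vaginal birth 52/1039.
rr, lower, upper = relative_risk(17, 1039, 52, 1039)
print(f"RR = {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
# prints "RR = 0.33 (95% CI 0.19 to 0.56)"
```

Because the whole interval lies below 1, the data are incompatible (at the 5% level) with no difference between the groups.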


The Term Breech Trial has had a major impact on clinical practice in many countries but has also generated considerable controversy, mainly around the issue of generalisability of the findings (external vs internal validity). The true results obtained in one population (internal validity) may not necessarily be the same in another population (external validity). Can the results of a large trial performed in diverse settings with differing levels of facility and expertise be applied to other institutions and practices? This is a necessary consideration with any trial. Investigators of the Term Breech Trial were well aware of this.


Sample Size Calculations


All statistical tests are subject to two forms of error. A type I error, the probability of which is denoted α, occurs when the results of a trial suggest a difference when, in fact, none exists. In contrast, a type II error, the probability of which is denoted β, occurs when the results do not suggest a difference, although one does, in fact, exist. The principal protection against both types of error lies in planning, in advance, an adequate sample size predicated on knowledge of the baseline incidence of the primary outcome and a realistic judgment of what would be a clinically useful change attributable to the new treatment. Another way to describe the importance of prespecified sample size calculations is to compare the clinical value of a study that has confirmed a predicted reduction in perinatal mortality after an intervention with that of a study that has observed a difference between two groups only when looking at the data in retrospect. Clearly, the former should carry more weight.


An excellent free online resource for estimating sample size for common study designs, including randomised trials, cohort studies and case-control studies, can be found at http://www.openepi.com/Menu/OE_Menu.htm. Although underpowered trials are not ideal, they may still be useful, and their results can be included in meta-analyses as long as the methodology is sound.
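To make the ingredients of a sample size calculation concrete, the sketch below applies the widely used normal-approximation formula for comparing two proportions with equal group sizes. It is a simplified illustration, not the method of any particular calculator: the function name and the example incidences (5% falling to 2.5%) are hypothetical, and no continuity correction or allowance for loss to follow-up is made.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate number needed per group for a two-sided test of two proportions."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # guards against type I error
    z_beta = NormalDist().inv_cdf(power)            # power = 1 - beta (type II error)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    effect = p_control - p_treatment                # clinically useful difference
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect ** 2)


# Hypothetical example: baseline incidence 5%, hoped-for reduction to 2.5%.
n_per_group = sample_size_two_proportions(0.05, 0.025)
print(f"Women needed per group: {n_per_group}")
```

Under these assumptions roughly 900 women are needed in each group. Halving the detectable difference roughly quadruples the requirement, which is why a realistic judgment of the clinically useful change matters so much.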


Data Monitoring


It is also a principle of good clinical trial design and execution to ensure that an independent panel of experts will have access to interim results to advise whether or not a trial should continue. A charter now exists to guide the workings of data monitoring committees. Advice may be given to abort a trial early if there is overwhelming evidence that either the treatment group or the control group is at significant advantage or disadvantage on the basis of treatment. Thus in both the Term Breech Trial and the Magpie Trial (magnesium sulphate vs placebo for preeclampsia prevention), recruitment was stopped earlier than planned on the recommendation of the Data Monitoring Committees because of large, clinically and statistically significant differences in the primary outcomes between the randomised groups.


Data monitoring committees may also have to decide whether further recruitment in pursuit of the prespecified sample size can be ethically justified when the tested intervention is clearly ineffective. Such a trial may be abandoned on grounds of futility. It is bad practice for the researchers themselves to monitor the results continually, because stopping a trial as soon as a ‘statistically significant’ result is obtained is likely to produce a type I error.
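The inflation of the type I error caused by repeated, unplanned looks at accumulating data can be illustrated with a small simulation. The sketch below makes several assumptions that are not from the text: there is truly no difference between the groups, outcomes are normally distributed with unit variance, the data are inspected at five arbitrary points, and the function name and parameters are hypothetical.

```python
import math
import random


def simulate_false_positive_rates(n_trials=2000, looks=(20, 40, 60, 80, 100),
                                  z_crit=1.96, seed=0):
    """Return (rate with repeated looks, rate with a single final analysis)."""
    rng = random.Random(seed)
    significant_at_any_look = 0
    significant_at_final = 0
    for _ in range(n_trials):
        sum_a = sum_b = 0.0
        n = 0
        crossed = False
        z = 0.0
        for look in looks:
            # Recruit up to the next look; both groups share the same true mean.
            while n < look:
                sum_a += rng.gauss(0.0, 1.0)
                sum_b += rng.gauss(0.0, 1.0)
                n += 1
            # z statistic for the difference in means (known unit variance).
            z = (sum_a / n - sum_b / n) / math.sqrt(2.0 / n)
            if abs(z) > z_crit:
                crossed = True   # researchers would have stopped and claimed an effect
        significant_at_any_look += crossed
        significant_at_final += abs(z) > z_crit
    return significant_at_any_look / n_trials, significant_at_final / n_trials


repeated, single = simulate_false_positive_rates()
print(f"False-positive rate, stopping at any significant look: {repeated:.3f}")
print(f"False-positive rate, single final analysis:            {single:.3f}")
```

With a single final analysis the false-positive rate stays near the nominal 5%, whereas stopping at the first ‘significant’ look gives a clearly higher rate. This is the reason interim analyses are left to an independent committee working to prespecified stopping rules.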




Evaluation of Screening and Diagnostic Tests


Before turning to the evaluation of tests, it is important to distinguish clearly between screening and diagnostic tests (terms that are easily confused and often used interchangeably in practice; see Chapter 16). Screening tests are those that can be applied to large populations to identify clinically unsuspected disease. In contrast, a diagnostic test is usually more complex, expensive and precise and is applied to make a specific diagnosis of a disease, particularly in high-risk populations.


The advent of a variety of screening and diagnostic tests in obstetrics and fetal medicine has led to substantial reductions in adverse fetal and maternal conditions. Importantly, antepartum and intrapartum fetal surveillance, including ultrasound and Doppler velocimetry, the nonstress test, the contraction stress test and the fetal biophysical profile, has not only enabled the early identification of ‘at-risk’ fetuses but has also paved the way to reducing the burden of fetal deaths. The primary purpose of fetal surveillance is to identify fetuses with impending asphyxia early in the course of disease so that delivery can be timed to prevent fetal or neonatal demise or long-term damage. A secondary purpose is to avoid neonatal complications arising from asphyxia-related causes.


An ideal test would predict the occurrence or absence of a condition with perfect accuracy. Unfortunately, no such test exists for fetal surveillance. The optimal screening test maximises the identification of ‘sick’ fetuses with the lowest occurrence of false-positive findings. We highlight a set of epidemiologic methods to evaluate the effectiveness of a screening test. Consider a 2 × 2 table (Table 14.1a). The two columns denote the true disease states, and the two rows denote the results of the test. Cross-classification of the true disease state with the result of the test yields four subpopulations: (i) diseased fetuses with positive test results, represented as ‘cell a’; (ii) nondiseased fetuses with positive test results, represented as ‘cell b’; (iii) diseased fetuses with negative test results, represented as ‘cell c’; and (iv) nondiseased fetuses with negative test results, represented as ‘cell d’.



TABLE 14.1a

Layout of a 2 × 2 Table Illustrating the Cross-Classification of the Disease State Based on the Results of a Screening Test

| Test result   | Disease present         | Disease absent             | Total                             |
|---------------|-------------------------|----------------------------|-----------------------------------|
| Test positive | a: true positive (TP)   | b: false positive (FP)     | a + b: total test positive        |
| Test negative | c: false negative (FN)  | d: true negative (TN)      | c + d: total test negative        |
| Total         | a + c: total diseased   | b + d: total disease free  | a + b + c + d = N: total subjects |


The most useful characteristics for evaluating a screening test include sensitivity, specificity, positive and negative predictive values, and likelihood ratios (LRs) for a positive and negative test result. These characteristics are summarised in Table 14.1b. Sensitivity is the probability that a test will classify subjects as diseased when, in fact, they are diseased, and specificity is the probability that a test will classify subjects as nondiseased when, in fact, they are nondiseased. An optimal test is one that maximises both sensitivity and specificity (cells ‘a’ and ‘d’ denote the disease states that are classified correctly by the test). Sensitivity is analogous to the statistical power (1 − β) of a study: each is the probability of detecting a condition or difference that is truly present.



TABLE 14.1b

Characteristics for Evaluating a Screening Test

| Test characteristic                   | Mathematical formulation                                  | Alternate formulation                               |
|---------------------------------------|-----------------------------------------------------------|-----------------------------------------------------|
| Sensitivity (Se)                      | $\mathrm{Se} = \frac{a}{a+c}$                             | $\mathrm{TPR} = \frac{TP}{TP+FN}$                   |
| Specificity (Sp)                      | $\mathrm{Sp} = \frac{d}{b+d}$                             | $\mathrm{TNR} = \frac{TN}{TN+FP}$                   |
| Positive predictive value (PPV)       | $\mathrm{PPV} = \frac{a}{a+b}$                            | $\mathrm{PPV} = \frac{TP}{TP+FP}$                   |
| Negative predictive value (NPV)       | $\mathrm{NPV} = \frac{d}{c+d}$                            | $\mathrm{NPV} = \frac{TN}{TN+FN}$                   |
| Likelihood ratio, positive test (LR+) | $\mathrm{LR}^{+} = \frac{a/(a+c)}{1 - d/(b+d)}$           | $\mathrm{LR}^{+} = \frac{Se}{1-Sp}$                 |
| Likelihood ratio, negative test (LR−) | $\mathrm{LR}^{-} = \frac{1 - a/(a+c)}{d/(b+d)}$           | $\mathrm{LR}^{-} = \frac{1-Se}{Sp}$                 |
| Odds ratio (OR; diagnostic test)      | $\mathrm{OR} = \frac{Se/(1-Sp)}{(1-Se)/Sp}$               | $\mathrm{OR} = \frac{\mathrm{LR}^{+}}{\mathrm{LR}^{-}}$ |
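The characteristics in Table 14.1b follow directly from the four cells of the 2 × 2 table, as the short sketch below illustrates. The function name and the cell counts are hypothetical and chosen only to make the arithmetic easy to follow; they do not come from any study.

```python
def screening_test_characteristics(a, b, c, d):
    """Compute the Table 14.1b measures from 2 x 2 cell counts.

    a = true positives, b = false positives,
    c = false negatives, d = true negatives.
    """
    se = a / (a + c)                 # sensitivity
    sp = d / (b + d)                 # specificity
    lr_pos = se / (1 - sp)           # likelihood ratio, positive test
    lr_neg = (1 - se) / sp           # likelihood ratio, negative test
    return {
        "Se": se,
        "Sp": sp,
        "PPV": a / (a + b),          # positive predictive value
        "NPV": d / (c + d),          # negative predictive value
        "LR+": lr_pos,
        "LR-": lr_neg,
        "OR": lr_pos / lr_neg,       # diagnostic odds ratio
    }


# Hypothetical screening study of 1000 fetuses, 100 of whom are affected.
results = screening_test_characteristics(a=90, b=50, c=10, d=850)
for name, value in results.items():
    print(f"{name}: {value:.3f}")
```

In this example the test detects 90% of affected fetuses, yet only about 64% of positive results are true positives, because unaffected fetuses greatly outnumber affected ones. The odds ratio computed as LR+ divided by LR− equals the familiar cross-product (a × d)/(b × c).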


Mar 19, 2020 | Posted by in GYNECOLOGY | Comments Off on Epidemiologic and Research Methods in Fetal Medicine
