The article below summarizes a roundtable discussion of a study published in this issue of the Journal in light of its methodology, relevance to practice, and implications for future research. Article discussed:
Bernard J-P, Cuckle HS, Stirnemann JJ, et al. Screening for fetal spina bifida by ultrasound examination in the first trimester of pregnancy using fetal biparietal diameter. Am J Obstet Gynecol 2012;207:306.e1-5.
Discussion Questions
- What was the study design?
- How were the data analyzed?
- What were the results?
- What are the study’s strengths and weaknesses?
- What do the study results mean clinically?
- Where should we go from here?
Twenty years ago, the US Public Health Service recommended that women of childbearing age ingest 400 mcg of folic acid daily to help prevent neural tube defects in their babies. To further ensure sufficient intake, the government mandated folic acid fortification of enriched cereal grain products by January 1998. This simple measure is associated with a 31% drop in the prevalence of spina bifida and, more broadly, with an estimated 1000 fewer cases of neural tube defects per year. Still, about 1500 babies are born with spina bifida yearly in the United States. A new study suggests that basic biometric measurements could allow earlier detection in about half of affected pregnancies.
See related article, page 306
An initial step
Bernard and colleagues wanted to see whether straightforward, reproducible fetal biometric measurements routinely acquired at the 11- to 14-week nuchal translucency ultrasound scan could be used to predict spina bifida. Eighteen cases of spina bifida were diagnosed among 34,951 scans. These 18 were designated as Group 1; the remaining 34,933 served as controls. Another 28 cases, referred to the researchers’ fetal medicine unit from other institutions with well-documented biometrics, served as Group 2.
The authors converted each fetus’s biparietal diameter (BPD), head circumference, and abdominal circumference to multiples of the median (MoM) for crown-rump length (CRL). As is standard practice, the observed biometric measure was divided by the expected measure for unaffected pregnancies with the same CRL, using a regression equation that the researchers had previously published. The distributions of MoMs in cases and controls were compared based on the median value and on the proportions below the fifth centile in controls, which yield the detection rate and the false-positive rate. The area under the receiver operating characteristic (ROC) curve and the likelihood ratio (LR) for positive and negative results were also calculated.
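The MoM conversion and the centile-based cutoff described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the linear form and the coefficients in `expected_bpd_mm` are hypothetical placeholders, not the researchers’ published regression equation, and the function names are invented for this example.

```python
from statistics import quantiles

# HYPOTHETICAL coefficients: placeholders for illustration only,
# NOT the regression equation published by the study authors.
HYPOTHETICAL_INTERCEPT_MM = 1.0
HYPOTHETICAL_SLOPE = 0.3


def expected_bpd_mm(crl_mm):
    """Expected BPD (mm) in unaffected pregnancies at a given CRL (mm)."""
    return HYPOTHETICAL_INTERCEPT_MM + HYPOTHETICAL_SLOPE * crl_mm


def to_mom(observed_bpd_mm, crl_mm):
    """Multiple of the median: observed value divided by expected value."""
    return observed_bpd_mm / expected_bpd_mm(crl_mm)


def screen_rates(case_moms, control_moms):
    """Return the cutoff, detection rate, and false-positive rate.

    The cutoff is the 5th centile of the control MoMs; a fetus screens
    positive when its MoM falls strictly below that cutoff.
    """
    # quantiles(..., n=20) returns 19 cut points; the first is the 5th centile.
    cutoff = quantiles(control_moms, n=20, method="inclusive")[0]
    detection_rate = sum(m < cutoff for m in case_moms) / len(case_moms)
    false_positive_rate = sum(m < cutoff for m in control_moms) / len(control_moms)
    return cutoff, detection_rate, false_positive_rate
```

Because a positive screen requires a value strictly below the cutoff, the false-positive rate among controls in a finite sample can land slightly under the nominal 5%.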
The technique constructed by Bernard et al is designed for screening rather than diagnostic purposes. Screening tests are offered to asymptomatic people who may or may not have early disease or disease precursors; some might be at high risk for developing a particular condition. The results then guide the decision to move on to diagnostic tests. In contrast, diagnostic tests are offered to people who have specific indications of possible illness, whether it is an element in the history, signs, symptoms, or a positive screening test.
Journal Club members thought the researchers’ objective was worthwhile. Devising a screen that worked in women at 11-14 weeks’ gestation might make it possible for those who test positive to undergo an earlier diagnostic ultrasound; subsequently, they might be able to make an earlier decision about continuing or terminating the pregnancy. Or, earlier identification might allow swifter consideration for fetal surgical interventions.
Testing the test
The researchers used LRs in their study, estimates that Journal Club members rarely have an opportunity to discuss. LRs gauge the likelihood that a given test result would be anticipated in a patient with the target disorder compared with the likelihood that the same result would be expected in a patient without the target disorder. In this study, 50% of the fetuses with spina bifida aperta had a BPD less than the 5th centile compared with roughly 5% of the controls. The positive LR is then 50%/5%, or 10. This means that a BPD below the 5th centile would be 10 times as likely to be seen in a fetus with, as opposed to one without, spina bifida aperta. In fact, the authors reported a positive LR of 10.9, since the percentage of controls with a BPD less than the 5th centile was actually a bit below 5%.
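The arithmetic behind these ratios can be written out as a short Python sketch. The function name is invented for this example, and the inputs are the rounded figures quoted above (a 50% detection rate and a 5% false-positive rate), not the study’s exact counts.

```python
def likelihood_ratios(detection_rate, false_positive_rate):
    """Return (positive LR, negative LR) for a dichotomous test.

    positive LR = P(test positive | disease) / P(test positive | no disease)
    negative LR = P(test negative | disease) / P(test negative | no disease)
    """
    lr_positive = detection_rate / false_positive_rate
    lr_negative = (1.0 - detection_rate) / (1.0 - false_positive_rate)
    return lr_positive, lr_negative


# Rounded figures from the text: 50% of cases and about 5% of controls
# had a BPD below the 5th centile.
lr_pos, lr_neg = likelihood_ratios(0.50, 0.05)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")  # LR+ = 10.0, LR- = 0.53
```

If the detection rate is taken to be exactly 50%, the reported positive LR of 10.9 would correspond to a false-positive rate of about 4.6%.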
The LR is used to judge a diagnostic test’s value. As such, practitioners can press it into service when choosing the best diagnostic test, or series of tests, for a particular situation. Other useful points: LRs from several different diagnostic tests can be combined into an aggregate finding, the LR can be used to determine the post-test probability that a patient has a specific condition, and it is less influenced by the prevalence of an illness than are predictive values.
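The step from pre-test to post-test probability goes through odds: the pre-test odds are multiplied by the LR, and the result is converted back to a probability. A minimal Python sketch follows. The function names are invented for this example, treating the study’s observed frequency (18 cases in 34,951 scans) as the pre-test probability is an assumption made purely for illustration, and multiplying LRs from several tests is valid only if the tests are conditionally independent.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


def combined_likelihood_ratio(*likelihood_ratios):
    """Aggregate LR for several tests, ASSUMING conditional independence."""
    combined = 1.0
    for lr in likelihood_ratios:
        combined *= lr
    return combined


# Illustrative assumption: use the study frequency as the pre-test probability.
pre_test = 18 / 34951
post_test = post_test_probability(pre_test, 10.9)
print(f"pre-test {pre_test:.4%} -> post-test {post_test:.2%}")
```

Under these assumptions, a positive screen raises the probability from about 0.05% to about 0.56%, which illustrates why a positive result leads to diagnostic testing rather than to a diagnosis.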
An uncertain world
One of the tenets of clinical research is that the role of uncertainty or chance in the findings of a study must be assessed. This is fairly clear-cut for randomized clinical trials, cohort studies, and case-control studies, where well-known bivariate statistics (eg, χ² test and t test) and multivariate techniques are employed. Evaluating the contribution of uncertainty in studies of screening and diagnostic tests requires a different approach. For these, the degree of precision around estimates of sensitivity, specificity, area under the ROC curve, or LRs can be quantified, helping readers to decide how certain they are of the clinical utility of a test.
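One common way to express the precision of a positive LR is a 95% confidence interval computed on the log scale. The Python sketch below uses that log method, which is a standard approach but not necessarily what the study authors did. The counts in the usage lines are illustrative assumptions only (50% of the 18 Group 1 cases and roughly 5% of the 34,933 controls), not figures reported in the paper.

```python
import math


def positive_lr_with_ci(true_positives, n_cases, false_positives, n_controls, z=1.96):
    """Positive LR with a confidence interval from the log method.

    SE[ln(LR+)] = sqrt(1/TP - 1/n_cases + 1/FP - 1/n_controls)
    """
    detection_rate = true_positives / n_cases
    false_positive_rate = false_positives / n_controls
    lr_positive = detection_rate / false_positive_rate
    se_log = math.sqrt(
        1.0 / true_positives - 1.0 / n_cases
        + 1.0 / false_positives - 1.0 / n_controls
    )
    lower = math.exp(math.log(lr_positive) - z * se_log)
    upper = math.exp(math.log(lr_positive) + z * se_log)
    return lr_positive, lower, upper


# ILLUSTRATIVE counts, not taken from the paper: 9 of 18 cases and
# 1747 of 34,933 controls (about 5%) below the cutoff.
lr, low, high = positive_lr_with_ci(9, 18, 1747, 34933)
print(f"LR+ = {lr:.1f} (95% CI {low:.1f} to {high:.1f})")
```

With so few cases, the interval is wide, which conveys the uncertainty around the point estimate more directly than a P value does.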
Journal Club members suggested that 95% confidence intervals around the area under the ROC curve and the LRs might have been more useful than P values. Overall, though, they believed the study question was novel, and they liked that the proposed test was quite simple. They would like to see other studies validate the work presented in this paper.