Routine second-trimester transvaginal ultrasonographic (TVU) screening for short cervical length (CL) predicts spontaneous preterm delivery (SPTD), albeit with limited sensitivity (35-40%) and a moderate positive likelihood ratio of 4-6. However, CL describes one of the multidimensional changes that are associated with precocious cervical ripening (PCCR) and that also include cervical softening, cervical funneling (CF), and dilation. PCCR, a precursor and a strong predictor for SPTD, was proposed as a potential screening target. We hypothesized that screening for composite measures of PCCR (eg, CL, CF, cervical consistency, and dilation) with the use of either digital examination or TVU would improve the prediction of SPTD compared with screening for short CL alone. We searched PubMed and EMBASE electronic databases for observational cohort studies to evaluate cervical screening in asymptomatic obstetric populations. Multidimensional composite cervical measures were assessed in 10 datasets (n = 22,050 pregnancies) and 12 publications. Appreciable heterogeneity in cervical measurements, data quality, and outcomes across studies prevented quantitative metaanalysis. Only one study reported intra- and interobserver reliability of cervical measurements. The prevalence of CF ranged from 0.7–9.1%. Five studies compared composite measures of PCCR (ie, CL and CF) with short CL alone and consistently reported improved screening performance. Among 3 TVU studies, gains in sensitivity ranged from 5–27%, and increases in positive likelihood ratio ranged from 3–16. Our findings suggest that composite measures of PCCR might serve as valuable screening targets. High-quality interdisciplinary studies that integrate epidemiologic approaches are needed to test this hypothesis and to accelerate the translation of advances in cervical pathophysiology into effective preventive interventions.
Spontaneous preterm delivery (SPTD) is an unsolved public health problem of global proportions that requires more effective prevention strategies. Timely prevention of SPTD begins with early identification of a modifiable target by means of effective screening programs. In general, short cervical length (CL) is the screening target in routine second-trimester transvaginal ultrasonography (TVU; Figure 1), which represents a simple, safe, and reproducible technical advance compared with digital examination (DE). Although TVU has distinct advantages, DE does not require sonographic training or equipment and therefore may be more suitable for resource-limited settings. Whereas the false-positive rates of TVU and DE screening are similar, the limited sensitivity (35-40%) of TVU screening for shortened CL is marginally better than that of DE (25-30%). For short CL, defined as <25 mm at <20 weeks of gestation, a review from the United Kingdom reported a moderate positive likelihood ratio (LR+) of 6.29 to predict SPTD at <34 weeks of gestation, and a Canadian review reported an LR+ of 4.31 to predict SPTD at <35 weeks of gestation. Moderate LR+s for short cervices that were assessed at 20-24 weeks of gestation were reported consistently and included a value of 2.86 for asymptomatic high-risk women with a history of SPTD.
If the obstetric community had an effective and efficient means of screening for SPTD, this approach could be expanded to routine use in all pregnant women. Previously, there was a lack of evidence for the value of early intervention. However, a recent metaanalysis demonstrated that vaginal progesterone administration to asymptomatic women with a sonographic short cervix not only reduced the risk of SPTD but also led to a 43% reduction in neonatal morbidity and death. Although universal screening for short cervices followed by progesterone treatment is cost-effective, large numbers of mothers (400 or 588) must be screened to prevent 1 SPTD. Clearly, more efficient screening strategies are needed. According to The American Congress of Obstetricians and Gynecologists Practice Bulletins, the maternal cervix “should be examined as clinically appropriate when technically feasible,” and universal cervical screening of pregnant women without a previous preterm birth may be considered despite “limited or inconsistent scientific evidence (level B recommendation).” Furthermore, evidence-based research is required for greater quality assurance. Although the United States Preventive Services Task Force acknowledges the importance of predicting preterm delivery through screening, it has not recommended any screening targets.
Theory and reasoning for prediction
For years, multidimensional cervical features were used to predict the early onset of labor or SPTD. In 1964, the Bishop scoring system (cervical dilation, effacement, consistency, and position as assessed by DE) was correlated with the time to the onset of labor. In 1965, Wood et al first reported that an internal cervical os dilated to 1 finger breadth and an effaced cervix predicted SPTD. Papiernik et al reported a decline in SPTD at <32 weeks of gestation (1.7% in 1971-1974 vs 0.8% in 1979-1982) in the French city of Haguenau after implementation of uncontrolled and multilevel interventions. Prominent among the targets of this population-based risk assessment and screening system were both shortened cervices and patency of the internal os. These precocious signs of cervical ripening can be recognized during a vaginal examination several weeks before the onset of SPTD and may be useful in predicting it. Despite the presentation of this French screening experience at conferences and in a March of Dimes monograph aimed at the American medical audience, the article, which was published in 1986, has not been cited widely in the past 3 decades (78 citations on Web of Knowledge and 102 on Google Scholar in March 2014) and deserves a new look. Furthermore, we again acknowledge recent progress in effective treatments, such as vaginal progesterone; the availability of an effective treatment is an essential criterion to support screening.
The identification of effective screening targets for SPTD relies on an understanding of its natural history and pathophysiology; in the latter circumstance, our understanding is lacking. Because precocious cervical ripening (PCCR) is an important precursor state in the SPTD pathway and a strong predictor for it, PCCR is a potential target for screening. Precursors are pathologic states that have a high probability of progressing to disease after a latent stage. Accordingly, ascertainment of properly defined precursors can increase the effectiveness of screening and prevention. As a recognizable stage in parturition, the term PCCR was coined initially by Papiernik et al in 1986. PCCR describes multidimensional cervical changes that include softening, shortening, funneling, and dilation of the internal os. These changes, which are visible with ultrasound, progress from T- to Y-, V- or U-shape funnels ( Figure 1 ) before the onset of SPTD. Cervical pathophysiologic condition has been investigated further through molecular and cellular approaches. Romero et al described cervical ripening as a general feature of the “premature parturition syndrome.” In 2011, routine recording of cervical ripening was recommended by the Global Alliance to Prevent Prematurity and Stillbirth. In 2012, Caritis and Simhan proposed that the term PCCR was more appropriate and less confusing than either “cervical incompetence” or “cervical insufficiency,” both being ill-defined biologically. In this review, we use the term PCCR and operationalize it as at least 2 measurable cervical dimensions.
It is logical to ask how well the performance of PCCR has been evaluated to date in predicting SPTD. The effectiveness of a screening program depends on the interrelations among (1) the performance, timing, and frequency of screening procedures, (2) the efficacy of timely interventions, and (3) the risk profile of target populations. We chose to investigate both reviews and individual studies, but we confine our comments regarding reviews to the introduction. Reviews by Owen et al and Honest et al grouped only observational studies; other reviews mixed clinical trials and observational studies together. Despite providing useful insights concerning diverse populations, designs, and analytical methods, previous reviews failed to consider PCCR, with most investigators focusing entirely on CL as measured by TVU. Reports from 5 investigative teams over the past 15 years did not cite the screening study of Papiernik et al from 1986 but considered some of the hypotheses that form the basis for the present analysis. The first team was Leitich et al from Austria, who concluded that dilation of the internal cervical os was among the most effective markers for preterm delivery. The second team was Honest et al from the United Kingdom, who published 3 reviews and reported that (1) the larger the funnel (eg, dilation of internal os >5 mm), the more accurate the prediction of SPTD, and (2) CL and cervical funneling (CF), used alone or in combination, appeared useful for SPTD prediction, although no data were highlighted. The third team was Crane and Hutchens from Canada, who included CF in their tables but did not summarize its predictive performance. The fourth team was Reiter et al from Denmark, who published the only review that chose to target “premature cervical ripening”; their methods of estimation were unclear, and they reported insufficient evidence for routine screening. However, they neither justified this target nor included CF from studies such as the one from Iams et al. Finally, Barros-Silva et al from Portugal reported inconsistent findings in comparing combined screening targets with short CL alone in 3 studies and recommended combining CL “with other markers (sonographic, biochemical and/or clinical) that reflect the multiplicity of mechanisms involved in the pathogenesis of SPTD.”
Objective
We hypothesized that comprehensive assessment for multidimensional PCCR (eg, CL, CF, cervical consistency, and cervical dilation in combination) is more effective (eg, improved sensitivity and LR+) than screening for short CL alone with the use of either TVU or DE. DE was included because it is suitable for resource-limited settings and serves as a historical comparison. The primary outcome measure was SPTD at specified gestational weeks. In this systematic review, we aimed to identify, appraise, select, and synthesize all high-quality research evidence. We assessed elements of study methods and variations in cervical assessment, risk profiles of participants, and health care contexts. Further, we identified research gaps and suggested future research to improve the performance of cervical screening and its cross-cultural applicability.
Methods for review
Selection criteria and data sources
A comprehensive literature search was conducted to identify articles published in English-language journals from 1980 to March 2014. We included high-quality observational cohort studies that evaluated multidimensional aspects of PCCR to predict singleton SPTD in unselected obstetric populations. We excluded studies that assessed only 1 cervical dimension (ie, CF or CL). This systematic review included only published manuscripts and therefore was exempt from institutional review board review.
Using the key words cervi*, preterm, prematurity, predict*, and singleton*, we searched PubMed and EMBASE electronic databases in March 2014. We identified 538 reports through the database search and 2 reports from other sources, as depicted in a flow diagram (Figure 2) that follows the Preferred Reporting Items for Systematic reviews and Meta-Analyses (PRISMA); screening of 397 citations and abstracts yielded 160 potentially relevant articles for full-text review. We excluded 119 articles that considered only 1 cervical dimension and 29 articles that included multiple cervical measures but did not meet the other inclusion criteria. Reference lists were searched manually but failed to reveal other studies. Twelve reports that described 10 datasets comprising 22,050 pregnancies met the inclusion criteria.
Screening is the identification of asymptomatic disease or risk factors. Assessment of high-risk mothers (eg, mothers with a history of SPTD) or of symptomatic mothers is not an appropriate design for comparative screening evaluation and limits the generalizability of findings to other populations. We applied the inclusion and exclusion criteria carefully so that we could identify all high-quality evidence of screening effectiveness. Twenty-five of the 29 excluded studies (citations are available on request) assessed the cervices of high-risk (n = 4) or symptomatic (n = 21) mothers; 1 study excluded women with a history of SPTD. Among the 3 remaining studies with asymptomatic participants, 2 studies did not evaluate the predictive performance of cervical assessment, and 1 study evaluated cervical measures other than CL.
Measurements and evaluation process
The 2 essential characteristics of a screening test are its reliability and validity. Therefore, when it was available, we abstracted data on reliability, sensitivity, specificity, positive and negative predictive values, positive and negative LRs, and receiver operating characteristic (ROC) curves. However, because of the variation in reporting across studies, we focused on reporting sensitivity and LRs. We calculated LRs based on the following formulas:

$$\mathrm{LR+} = \frac{\text{sensitivity}}{1 - \text{specificity}}, \qquad \mathrm{LR-} = \frac{1 - \text{sensitivity}}{\text{specificity}}$$
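As an illustration only, the likelihood-ratio calculation can be sketched in Python with the standard definitions (LR+ is sensitivity divided by 1 minus specificity; LR- is 1 minus sensitivity divided by specificity). The function name `likelihood_ratios` is hypothetical and is not part of the review's methods; the example values are the first sensitivity and specificity pair listed for Mortensen et al in Table 3.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a screening test.

    Both arguments are proportions between 0 and 1 (eg, 0.33 for 33%).
    Specificity of exactly 0 or 1 is rejected because one of the two
    ratios would require division by zero.
    """
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be a proportion in [0, 1]")
    if not 0.0 < specificity < 1.0:
        raise ValueError("specificity must be a proportion strictly between 0 and 1")
    lr_pos = sensitivity / (1.0 - specificity)   # true-positive rate / false-positive rate
    lr_neg = (1.0 - sensitivity) / specificity   # false-negative rate / true-negative rate
    return lr_pos, lr_neg


# First sensitivity/specificity pair for Mortensen et al in Table 3: 33% and 88%.
lr_pos, lr_neg = likelihood_ratios(0.33, 0.88)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

For this pair the sketch gives values close to the LR+ of 2.8 and LR- of 0.76 tabulated for that study; small differences elsewhere in the table are to be expected because the published sensitivities and specificities are rounded.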
To assess study quality, we used the recommendations from the Standards for Reporting of Diagnostic Accuracy Steering Group and a review of screening tests to document the following study criteria: sample characteristics (consecutive sample), study design (prospective or retrospective cohort), type of cervical assessment (DE or TVU), evaluator background (eg, sonographers, obstetricians), blinding, reliability, outcomes (definition of SPTD), and quantitative analysis (screening performance and statistical association between cervical measures and SPTD).
The use of multidimensional cervical changes in the inclusion criteria allowed us to evaluate the performance of individual PCCR components. First, within a given population, we calculated the difference in sensitivity and/or LRs of composite measures of PCCR compared with short CL alone. Second, we assessed the consistency of within-study comparisons across studies. However, a quantitative metaanalysis could not be performed because of (1) the lack of sensitivity and LR data or data required to reproduce the 2 × 2 contingency tables and (2) the appreciable heterogeneity in design, screening, definitions of CF, and variable cutoff values.
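The within-study comparison described above can be sketched as follows. This is a minimal illustration, not the review's analysis code: the names `SENSITIVITY_PCT`, `sensitivity_gains`, and `gain_range` are hypothetical, the sensitivities are read from the 3 TVU rows of Table 3, and treating the second value in each pair as the composite measure ("either" or "add CF present") is an assumption about how the table is marked.

```python
# Sensitivity (%) for short CL alone and for a composite measure, read from
# Table 3. Which tabulated value is the composite is an assumption.
SENSITIVITY_PCT = {
    "Taipale and Hiilesmaa, 1998": {"cl_alone": 19, "composite": 29},
    "De Carvalho et al, 2005": {"cl_alone": 7, "composite": 34},
    "Leung et al, 2005": {"cl_alone": 37, "composite": 42},
}


def sensitivity_gains(table):
    """Within-study difference: composite sensitivity minus CL-alone sensitivity."""
    return {study: row["composite"] - row["cl_alone"] for study, row in table.items()}


def gain_range(gains):
    """Smallest and largest within-study gain across studies."""
    values = list(gains.values())
    return min(values), max(values)


gains = sensitivity_gains(SENSITIVITY_PCT)
low, high = gain_range(gains)
print(gains)
print(f"Gain in sensitivity: {low} to {high} percentage points")
```

Because each difference is taken within a single study population, the comparison does not depend on pooling across heterogeneous studies; the same pattern could be applied to LR+ where both values are reported.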
Results
General time and geographic trends
The 12 studies that we reviewed were published at a rate of approximately 3-4 per decade in the past 30 years ( Table 1 ). They were conducted in Europe (5 publications, 41%), North America (3 publications, 25%), Asia (2 publications, 17%), and South America (2 publications, 17%). Most of the studies were from high-income countries (ie, France, Sweden, Japan, Finland, the United States, and the United Kingdom) or high-income regions (ie, Hong Kong), but 2 originated in middle-income countries (ie, Brazil and Colombia).
Source | Screening period | Location/venue | Population, n | Weeks of gestation | Type of examination |
---|---|---|---|---|---|
Papiernik et al, 1986 | 1971-1976 | Haguenau, France a | 4430 | 18, 19-24, 25-28, 29-31, 32-34, and 35-36 | DE |
Bouyer et al, 1986 | 1971-1976 | Haguenau, France a | 4390 | 18, 19-24, 25-28, 29-31, 32-34, and 35-36 | DE |
Mortensen et al, 1987 | 1982 | Skaraborg, Sweden a | 581 | 24, 28 & 32 | DE |
Hartmann et al, 1999 | 1995-2000 | 4 sites in North Carolina | 871 | 24-29 | DE |
Newman et al, 2008 | 1992-1994 | 10-Site National Institutes of Health Maternal Fetal Medicine Unit Network in the United States | 2916, 2538 | 22-24, 26-29 | DE & TVU |
Iams et al, 1996 | 1992-1994 | 10-Site National Institutes of Health Maternal Fetal Medicine Unit Network in the United States | 2915, 2531 | 22-24, 26-29 | DE & TVU |
Hasegawa et al, 1996 | 1994 | 10 centers in Japan | 729 | 15-34 | TVU |
Taipale & Hiilesmaa, 1998 | 1995-1996 | Helsinki, Finland a | 3694 | 18-22 | TVU |
To et al, 2001 | 1997-2000 | London, United Kingdom a | 6334 | 22-24 | TVU |
De Carvalho et al, 2005 | 1998-2001 | São Paulo, Brazil | 1958 | 21-24 | TVU |
Leung et al, 2005 | 2000-2002 | Hong Kong, China a | 2880 | 18-22 | TVU |
Parra-Saavedra et al, 2011 | 2009-2010 | Barranquilla, Colombia | 1115 | 5-36 | TVU |
Study quality
All studies used prospective cohorts with consecutive cases ( Table 2 ). The patients and providers in 2 studies and providers in 5 studies were blinded to cervical measures. Intra-/interobserver reliability in studies of ultrasound measures was poorly described; despite 3 studies claiming to have used rigorous quality control processes, only 1 reported the intra- and interobserver reliability of the cervical consistency index. Older studies reported associations (eg, adjusted relative risk) between cervical measurements and the risk of SPTD. However, 2 of these did not include measures of sensitivity, specificity, or LRs, and 3 others reported only a single criterion (ie, sensitivity or positive predictive value).
Source | Consecutive sample | Prospective cohort design | Type of examination | Evaluators | Blinded | Reliability quantified | Outcome defined | Predictive performance assessed | Statistical association reported |
---|---|---|---|---|---|---|---|---|---|
Papiernik et al, 1986 | Yes | Yes | DE | Obstetricians | Unclear | No | Yes | No | Yes |
Bouyer et al, 1986 | Yes | Yes | DE | Obstetricians | Unclear | No | Yes | Yes | Yes |
Mortensen et al, 1987 | Yes | Yes | DE | Midwives | Unclear | No | Yes | Yes | Yes |
Hartmann et al, 1999 | Yes | Yes | DE | Obstetricians & nurses | Provider | No | Yes | Yes | No |
Newman et al, 2008 | Yes | Yes | DE | Nurses & examiners | Provider | No | Yes | Yes | No |
Iams et al, 1996 | Yes | Yes | DE & TVU | Nurses & examiners | Provider | No | Yes | Yes | Yes |
Hasegawa et al, 1996 | Yes | Yes | TVU | Unclear | Unclear | No | Yes | Yes | Yes |
Taipale & Hiilesmaa, 1998 | Yes | Yes | TVU | Obstetricians and midwives | Provider | No | Yes | Yes | Yes |
To et al, 2001 | Yes | Yes | TVU | Obstetricians & sonographers | Provider | No | Yes | No | Yes |
De Carvalho et al, 2005 | Yes | Yes | TVU | Sonographers | Patient & provider | No | Yes | Yes | No |
Leung et al, 2005 | Yes | Yes | TVU | Sonographers | Patient & provider | No | Yes | Yes | No |
Parra-Saavedra et al, 2011 | Yes | Yes | TVU | Obstetricians | Unclear | Yes | Yes | Yes | No |
Study populations and definitions of SPTD
The low incidence rates of SPTD (eg, <9% at <37 weeks of gestation, <5% at <34 or <35 weeks of gestation, and <1% at <33 weeks of gestation; Table 3 ) reflected generally low-risk populations. Mothers were recruited from multicenter studies or hospitals. Only hospitals from Finland, France, Hong Kong, Sweden, and the United Kingdom integrated cervical assessment into routine prenatal service and institutionalized preventive intervention to predict SPTD.
Source | Screen, wk | Measurements | Funneling dilation, % | Preterm, <wk | Preterm rate, % | Sensitivity, % | Specificity, % | Positive predictive value, % | Negative predictive value, % | ROC curves | Positive likelihood ratio a | Negative likelihood ratio a | Association |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Papiernik et al, 1986 | 18 | Dilation Station Length Uterine contraction Expanded lower segment | 0.8-12.4 By ≤18, 24, 28, 31, 34, 36 wk | 37 | Unclear | 2.5-3.4 (0.8-4.3) 0.9-2.9 1.2-2.9 0.6-1.9 | |||||||
Bouyer et al, 1986 | 18 | Score f | 1.1-14 | 37 | 5.9 b 5.5 | 44-57 b 56-64 | 71-78 b 73-78 | | | | 1.5-2.4 b 2.1-2.8 | 0.59-0.78 b 0.47-0.58 | 0.8-3.1/1.8-6.7 b 0.9-2.7/1.6-3.5 |
Mortensen et al, 1987 | 24 | Modified Bishop score Dilation Effacement | 4.0 | 37 | 1.5 | 33 11 | 88 97 | 4 6 | 97 99 | | 2.8 3.6 | 0.76 0.92 | 8.3 3.1 |
Hartmann et al, 1999 | 24-29 | CL <2.0 cm Dilation ≥1.0 cm Cervical score d <2 c | 6.0 | 37 or pPROM | 8.3 | 13 c 8 20 c | 93 99 93 | 15 38 21 | 92 92 92 | | 1.9 c 8 2.9 c | 0.94 0.93 0.86 | |
Newman et al, 2008 | 22-24 | T1 e : Bishop score ≥4 Cervical score d <1.5 T2 e : Bishop score ≥5 CL <2.0 cm CF present Cervical score d <1.5 c | Unclear Unclear | 35 | 4.4 | 28 13 32 32 c 32 36 c | 90.9 97.9 93.0 95 91 95 | 12 21 14 17 11 20 | 98 98 98 | 0.66 0.61 0.68 0.68 | 3.0 6.4 4.6 6.4 c 3.6 7.4 c | 0.80 0.88 0.73 0.72 0.75 0.67 | |
Iams et al, 1996 | 15-34 | T1 e : Bishop ≥4 CL ≤2.5 cm CF present T2 e : Bishop ≥4 CL ≤2.5 cm CF present | 6.4 9.1 | 35 | 4.3 | 28 37 25 43 49 33 | 91 92 95 83 87 92 | 12 18 17 10 11 17 | 97 97 97 97 98 98 | | 3.0 4.8 4.6 2.4 3.7 3.9 | 0.80 0.68 0.79 0.70 0.58 0.34 | 6.19 9.57 |
Hasegawa et al, 1996 | 15-34 | CL ≤2.7 cm Open internal os | 7.8 | 36 | 3.3 | 10 b /2 7 b /11 | 4.86 (1.85-12.72) b 6.00 (1.65-21.71) | ||||||
Taipale and Hiilesmaa, 1998 | 18-22 | CL ≤2.9 cm Dilation ≥0.5 cm Either c | 0.7 | 35 37 | 0.8 2.4 | 19 c 16 29 c | 97 99 97 | 6 20 7 | | | 6.3 c 16 9.7 c | 0.84 0.85 0.73 | 8 28 11 |
To et al, 2001 | 22-24 | CL Internal os ≥0.5 cm | 4 | 33 | 0.9 | 24.9 1.8 | |||||||
De Carvalho et al, 2005 | 21-24 | CL ≤2 cm Add CF present c | 1.5 | 34 | 3.4 | 7 c 34 c | |||||||
Leung et al, 2005 | 18-22 | CL ≤2.7 cm CF Both c Either c | 6.3 | 34 | 0.7 | 37 c 32 26 42 c | 96 94 99 91 | 6 3 15 3 | 100 100 100 100 | | 9.8 c 5.2 26 c 4.7 | 0.66 0.73 0.74 0.64 | |
Parra-Saavedra et al, 2011 | 5-36 | Consistency index c CL | Excluded | 34 | 2.1 | 64 c 9 c | 98 98 | 47 9 | 99 98 | 0.94 | 39.7 c 4.3 c | | |