Recent guidelines issued jointly by the American College of Obstetricians and Gynecologists and the Society for Maternal-Fetal Medicine for assessing labor progress differ substantially from those described initially by Friedman, which have guided clinical practice for decades. The guidelines are based on results obtained from new and untested methods of analyzing patterns of cervical dilatation and fetal descent. Before these new guidelines are adopted into clinical practice, the results obtained by these unconventional analytic approaches should be validated and shown to be superior, or at least equivalent, to currently accepted standards. The new guidelines indicate the patterns of labor originally described by Friedman are incorrect and, further, are inapplicable to modern obstetric practice. We contend that the original descriptions of normal and abnormal labor progress, which were based on direct clinical observations, accurately describe progress in dilatation and descent, and that the differences reported more recently are likely attributable to patient selection and the potential inaccuracy of very high-order polynomial curve-fitting methods. The clinical evaluation of labor is a process of serially estimating the likelihood of a safe vaginal delivery. Because many factors contribute to that likelihood, such as cranial molding, head position and attitude, and the bony architecture and capacity of the pelvis, graphic labor patterns should never be used in isolation. The new guidelines are based heavily on unvalidated notions of labor progress and ignore clinical parameters that should remain cornerstones of intrapartum decision-making.
The seemingly inexorable increase in the use of cesarean delivery, and the substantial contribution that dystocia and related diagnoses have made to that increase, have prompted a reevaluation of what constitutes normal labor. As a result, new guidelines promulgated jointly by the American College of Obstetricians and Gynecologists (ACOG) and the Society for Maternal-Fetal Medicine (SMFM) were released. The new recommendations define abnormal labor and provide guidelines for its management that differ sharply from those originally described by Friedman, which have formed the basis of the clinical management of labor for many decades in the United States and elsewhere. For that reason, a thorough analysis of the proposed standards is warranted to ensure that changes recommended for obstetric care during labor are justified by the available evidence.
The guidelines are based heavily on analytic methods used by Zhang and colleagues to describe the patterns of cervical dilatation and fetal descent as functions of time elapsed in labor. Their findings, which have been rapidly adopted in some parts of the obstetric community, have not yet been validated. For the reasons we briefly summarize in this commentary, we believe the new ACOG/SMFM recommendations provide definitions of dysfunctional labor and guidelines for its management that, however well intentioned, are likely to impose undue risk on mother and fetus.
Historical background
Prior to the mid-1950s, the evaluation of progress in labor was based primarily on its duration. Vague admonitions such as, “Never let the sun set twice on a laboring woman,” which were based on prevailing observations about average labor duration and outcomes, were commonly intoned. This approach was, however, ineffective in identifying when intervention would be appropriate or optimal.
In 1954, the first of hundreds of studies of labor by, or based on the work of, Emanuel Friedman was published. Friedman’s work built upon previous investigators’ attempts to describe the events of labor as a function of time. Their recognition of the practical implications of this approach was hampered by what we now know to have been erroneous assumptions about labor, particularly with regard to the role of membrane rupture. The first publications describing the graphic patterns of dilatation and descent stimulated the interest of many investigators, and led to the formulation of criteria that made the assessment of progress in labor objective rather than arbitrary. Unfortunately, the criteria have not always been applied appropriately, in part because of some misunderstandings about the curves and their proper place in clinical care.
Misconceptions
It has often been alleged that Friedman’s seminal observations regarding the labor curves rest on a fragile foundation because they were never corroborated by others. In fact, numerous studies done in different parts of the world over the course of several decades confirmed the basic nature of the original curves, and validated their usefulness in clinical practice. There have been disagreements over the importance of the latent phase or even the existence of the deceleration phase of dilatation, but the core finding that active-phase cervical dilatation progresses linearly, with a lower limit of normal approximately 1.0 cm/h in nulliparas, has been remarkably consistent among studies. It is also noteworthy that in many institutions the introduction of labor curves to clinical care was associated with a decline in the cesarean rate.
Some of the early data were collected using a mechanical cervimeter to obviate the potential subjectivity in clinical examination, and cervimetry by investigators using various tools confirmed the sigmoid nature of the dilatation curve. Limited data from more recently developed techniques to automate cervical assessment also appear consistent with the earlier observations. Sigmoid-shaped curves of cervical dilatation have even been described in cows, suggesting a common pattern of labor among mammalian species.
Given the large body of evidence confirming the basic pattern of progress in normal labor, it is difficult to believe that labor progresses very differently today from how it was originally described. Why, then, do the labor curves of Zhang and his colleagues differ from those of previous observers? One explanation was provided by Zhang himself when he and his colleagues applied their analytical methods to the very same data Friedman had analyzed from the Collaborative Perinatal Project. Friedman’s analysis of those data revealed a sigmoid-shaped dilatation curve; that of Zhang et al revealed an exponential curve, essentially the same as they had found from contemporary labors. Clearly, what had changed was not the nature of progress in labor, but how the data were analyzed. This raises the question of which analytic technique provides a more accurate model of labor progress: that of Friedman or that of Zhang et al?
In trying to address that question it is important to understand that the original dilatation and descent curves were based on and confirmed by direct experimental observations made on women in labor. The primacy of direct observation over theoretical conceptualization or indirect analysis of data in hypothesis testing has been a central tenet of the scientific method since the Enlightenment. When the results of an analytic approach differ from those derived from observation, it is important to understand why this has occurred, and try to adjudicate accordingly, before declaring the direct objective findings invalid.
Analytical issues
The labor curves in Friedman’s original reports were not created by using complex mathematical formulae, as some have suggested. The initial data were collected by a single observer. Subsequently, data from multiple practitioners in a single institution were reported. In both instances, the curves were drawn by hand, the descriptions were empiric, and the statistical analysis basic. Only later was a more sophisticated method of assessing the labor graphs by computer used to analyze >10,000 nulliparas from multiple institutions. This more sophisticated analysis confirmed the initial findings regarding the nature of the cervical dilatation and head descent time functions.
The computer algorithm used was developed with the Office of Biometry of the National Institutes of Health. Raw labor data were plotted on a probit (ie, the normal probability) scale, to convert the sigmoid curves to straight lines. The maximum slope data were converted to logarithms to normalize their right-skewed distribution. The linearity thus achieved made the data amenable to descriptive statistical study for determining distributions and limits of normal, which have until recently stood the tests of time and clinical applicability.
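The effect of the probit scale can be illustrated with a minimal sketch. It assumes, purely for illustration, that the sigmoid curve is a cumulative normal function of time; the midpoint and spread values are hypothetical, and the code does not reproduce the original NIH algorithm.

```python
from statistics import NormalDist


def probit(p: float) -> float:
    """Inverse of the standard normal CDF (the normal probability scale)."""
    return NormalDist().inv_cdf(p)


# Hypothetical sigmoid: fraction of total dilatation completed at time t (hours),
# modeled as a normal CDF. Both parameters are assumed, illustrative values.
MIDPOINT_H = 8.0
SPREAD_H = 2.0
curve = NormalDist(mu=MIDPOINT_H, sigma=SPREAD_H)

times = [4.0, 6.0, 8.0, 10.0, 12.0]
fractions = [curve.cdf(t) for t in times]   # S-shaped in time
probits = [probit(f) for f in fractions]    # straight line in time

# On the probit scale, equal time steps give equal increments.
steps = [b - a for a, b in zip(probits, probits[1:])]
print([round(s, 6) for s in steps])
```

Under these assumptions every 2-hour step adds the same increment on the probit scale, which is what makes the transformed data suitable for simple descriptive statistics.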
By contrast, Zhang and colleagues used a high-order polynomial curve-fitting program to analyze dilatation and descent data, and interval-censored regression to fit curves based on centimeter-by-centimeter median traverse times. We have concerns about the application of this technique to labor.
We do not profess personal expertise in this area, but we are impressed by the negative comments and strong skepticism encountered in the engineering literature pertaining to the limitations of high-order curve-fitting methods. Such models do not guarantee reliable results. Indeed, high-order curve fitting may not be appropriate or even necessary for most situations. Low-order quadratic curve fitting is preferable, whenever possible, and yields results that are at least as accurate. In fact, the higher the order, the less satisfactory curve-fitting accuracy tends to be. This is so because ‘noise’ (ie, unstable data points, especially if those points are spread apart from each other or are located at the ends of the range of data) is magnified. As a consequence, portions of the derived curve are distorted. In this regard a leading authority opined that, “It is important to keep the order of the model as low as possible…As a general rule the use of high-order polynomials (k >2) should be avoided unless they can be justified for reasons outside the data…Arbitrary fitting of high-order polynomials is a serious abuse of regression analysis.” Zhang et al used polynomial curve-fitting models of the order of 8-10, far in excess of the cited recommendation of no higher than 1 or 2.
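How a high-order polynomial can magnify an error in a single data point can be shown with a small sketch. It uses the extreme case of a ninth-order polynomial passed exactly through ten equally spaced, hypothetical observation times, and compares it with a least-squares straight line through the same points. This is an illustration of the general phenomenon only; it is not a reproduction of the regression models used by Zhang et al, and the function names are our own.

```python
def polynomial_weight(nodes, i, x):
    """Weight of the observation at nodes[i] in the value at x of the unique
    polynomial of degree len(nodes) - 1 through all points (Lagrange form)."""
    w = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            w *= (x - xj) / (nodes[i] - xj)
    return w


def line_weight(nodes, i, x):
    """Weight of the observation at nodes[i] in the value at x of the
    least-squares straight line fitted to all points."""
    n = len(nodes)
    mean = sum(nodes) / n
    sxx = sum((xj - mean) ** 2 for xj in nodes)
    return 1.0 / n + (x - mean) * (nodes[i] - mean) / sxx


# Hypothetical design: ten equally spaced observation times.
nodes = [float(k) for k in range(10)]
x_eval = 0.5   # a location near the end of the data range
i = 4          # index of the observation assumed to carry an error

# Both fits are linear in the observed values, so an error of size d in
# observation i shifts the fitted value at x_eval by d times this weight.
poly_gain = abs(polynomial_weight(nodes, i, x_eval))
line_gain = abs(line_weight(nodes, i, x_eval))
print(f"order-9 polynomial gain: {poly_gain:.2f}, straight-line gain: {line_gain:.2f}")
```

In this toy setting an error in one mid-range observation is amplified more than threefold near the end of the range by the ninth-order polynomial, whereas the straight line damps it to roughly one-eighth of its size.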
Other investigators have used interval data to create labor curves, with varying results. Gurewitsch et al found a sigmoid curve of dilatation, but Chen and Chu found results similar to those of Zhang et al in terms of curve shape and much lower rates of dilatation.
Thus, the differences alleged to exist between the Friedman and the Zhang curves are likely due to the different mathematical models used to fit these curves. This is confirmed by Zhang’s finding, noted above, that the same data Friedman and Neff analyzed decades ago yielded exponential curves with the curve-fitting methods used by Zhang and his colleagues.
The approach by Zhang et al is likely to have introduced an important set of selection biases, which also cast doubt on the validity of their findings. Women with rapidly progressing labors tend to present themselves for obstetric care and be first examined at more advanced cervical dilatation than those with longer labor. Thus, the intervals at the distal end of the dilatation curve are likely to have been loaded with progressively more rapid labors. This may explain the exponential nature of the dilatation curve derived in this manner. It may also explain why the descent curve, which was unencumbered by that problem because all patients were present and under observation for their entire second stage, looks very much like that originally reported.
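The admission effect described above can be sketched with a toy cohort. Every labor in it dilates at a constant rate, and the only assumption added is that faster labors are first examined at more advanced dilatation. The rates and the admission rule are hypothetical and chosen only to make the mechanism visible.

```python
from statistics import median

# Hypothetical cohort: each labor dilates at a constant rate (cm/h), so no
# individual labor accelerates. All values are illustrative assumptions.
rates_cm_per_h = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]


def admission_cm(rate: float) -> float:
    """Assumed rule: the faster the labor, the more dilated at first examination."""
    return 2.0 + 2.0 * rate


def median_traverse_time(from_cm: int) -> float:
    """Median hours to dilate from from_cm to from_cm + 1 among women who are
    already under observation at from_cm."""
    observed = [1.0 / r for r in rates_cm_per_h if admission_cm(r) <= from_cm]
    return median(observed)


medians = {cm: median_traverse_time(cm) for cm in range(3, 10)}
for cm, hours in medians.items():
    print(f"{cm} to {cm + 1} cm: median {hours:.2f} h")
```

The median centimeter-by-centimeter traverse time shortens as dilatation advances, giving the appearance of accelerating dilatation even though each simulated labor is strictly linear.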
In addition, the labor curves of Zhang et al were generated after excluding women delivered by cesarean. Many of these were undoubtedly having slow, dysfunctional labor patterns that led to a diagnosis of dystocia and the need for cesarean delivery. Their exclusion is likely, therefore, to have falsely increased the average rate of dilatation in residual study cases, contributing to the exponential appearance of the curves. Zhang et al also excluded women whose cervix was >6 cm dilated at admission, probably thus excluding many of the most rapid labors and contributing to the overall appearance of slow average dilatation.
In fact, these and other biases were acknowledged by Zhang and his colleagues. They stated that the fact that their study excluded first-stage cesareans “limit[ed] the generalizability of the results.” They also acknowledged the probable disparity in dilatation rates among parturients admitted at different time points in labor, thus raising doubts about the comparability of data derived from these sequential points. They further reported that their labor curves “are unadjusted for potential confounders, such as oxytocin use. While it is technically possible to control for confounders…it complicates the interpretation of the results….” They also acknowledged that “…time intervals for more advanced cervical dilation were affected to an extent by dropout of women because of caesarean delivery for labor arrest. Such dropouts usually are not random. Slow progressing labors often dropped out early, making the average time intervals for the remaining women appear shorter than otherwise. The degree of bias depends on the incidence of first-stage caesarean delivery. Unfortunately, we have not yet recognized an easy solution to overcome this informed censoring.”
To summarize, Zhang and colleagues have themselves acknowledged that both the selection biases and the unadjusted confounders likely influenced the shape of their dilatation curve either by slowing the early aspects of the active phase (or the transition from latent to active phase) or speeding the late aspects of the active phase, or both. The combined effect of these biases probably explains in part their finding that the rate of active-phase dilatation increases exponentially, rather than linearly as Friedman and many others have previously found.
Transition to active phase
One critically important way in which the new guidelines depart from the old is in identifying the transition from latent to active phase during the first stage. It is widely, but erroneously, concluded from the Friedman dilatation curve that the active phase of labor begins at 4 cm. Some studies have even used 3 cm as the definition of entry into active phase. According to the guidelines, the active phase begins at 6 cm. The difference is of critical importance, because it has a dramatic effect on whether dysfunctional labor can be diagnosed early in the active phase. Important labor abnormalities (protracted active phase and arrest of dilatation) that would be identified by the Friedman curve prior to 6 cm of dilatation would be classified as normal by the new guidelines.
Why the active phase of first-stage labor has been inferred to begin at 4 cm is puzzling. We, in fact, have never suggested that the active phase begins at either 4 or 3 cm of cervical dilatation; on the contrary, we have expressly discouraged the use of any specific degree of dilatation for the identification of the active phase. Observations of dilatation data make it clear the active phase can begin anywhere from 3-6 cm, and, occasionally, earlier or later, depending on the individual labor. Using an arbitrary cutoff sacrifices accuracy for ease, and this unnecessary oversimplification risks incorrect diagnosis. The transition from the latent phase to the active phase can be correctly identified only by interpretation of serial clinical examinations for each patient as her labor progresses.
Consider, for example, a labor that begins with the cervix 2 cm dilated for several hours. It then dilates rapidly to 5 cm in 1 hour, but fails to dilate further over the next 2 hours. According to the new guidelines, that would be normal latent-phase labor. To us it is an arrest of dilatation in active-phase labor that requires thorough evaluation to search for a cause. The likelihood that it will resolve itself (as many arrest disorders do) or would benefit from oxytocin stimulation would depend on the clinical circumstances, determinable by evaluation of mother and fetus. If there were significant molding and a narrow pelvis, little would be gained by further labor, and the fetus might be exposed to unnecessary risk.
Diagnosis of arrest of dilatation
Under the new guidelines, neither protracted active phase nor arrest of dilatation should be diagnosed in a nullipara before 6 cm cervical dilatation, and the lower limit of normal active-phase dilatation is about 0.5 cm/h, rather than the 1.0 or 1.2 cm/h reported by Friedman and others. The guidelines do recognize that there can be slow but progressive first-stage dilatation (protracted active phase), and that it should not be an indication for cesarean delivery, but they conflate protracted active phase and arrest of dilatation, despite evidence that they may be distinct disorders that respond differently to therapy and have a different prognosis. A protracted active phase, unless it has been caused by factors that inhibit contractility, such as anesthesia, infection, and (possibly) obesity, does not respond to oxytocin stimulation with an increased rate of dilatation. Contractility does, however, increase, thus conferring risk with no offsetting benefit.
Role of contractile force
To diagnose arrest of dilatation, the guidelines require that the cervix be ≥6 cm dilated, the membranes be ruptured, and there be no progress for ≥4 hours with adequate contractions, or for ≥6 hours of oxytocin administration with inadequate contractions. They define adequate uterine contractility as “e.g., >200 Montevideo Units” (MVU), but recommend no alternative means of assessment. Moreover, no upper boundary of MVU is provided, thus condoning the potential exposure of the fetus to excessive uterine contractility. The definition also implies that an internal uterine pressure transducer (IUPT) is useful to diagnose an arrest of dilatation, but this is questionable.
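The criteria as paraphrased in the preceding paragraph can be written out as a simple predicate, which makes it easy to see which labors they leave unclassified. This is a sketch for discussion only, not clinical software; the class and function names are hypothetical, and how the no-progress clock interacts with the timing of membrane rupture is not modeled.

```python
from dataclasses import dataclass


@dataclass
class LaborStatus:
    dilatation_cm: float
    membranes_ruptured: bool
    hours_without_progress: float  # hours with no cervical change
    adequate_contractions: bool    # e.g., >200 MVU, as quoted from the guidelines
    on_oxytocin: bool


def meets_guideline_arrest_criteria(s: LaborStatus) -> bool:
    """True only if the arrest criteria summarized in the text are all met."""
    if s.dilatation_cm < 6 or not s.membranes_ruptured:
        return False
    if s.adequate_contractions:
        return s.hours_without_progress >= 4
    return s.on_oxytocin and s.hours_without_progress >= 6


# A labor halted at 5 cm for 2 hours falls outside the definition altogether.
early_halt = LaborStatus(5, True, 2, True, False)
# A labor halted at 8 cm with strong contractions qualifies only after 4 hours.
late_halt_3h = LaborStatus(8, True, 3, True, False)
late_halt_4h = LaborStatus(8, True, 4, True, False)

print(meets_guideline_arrest_criteria(early_halt))
print(meets_guideline_arrest_criteria(late_halt_3h))
print(meets_guideline_arrest_criteria(late_halt_4h))
```

Note that nothing in this predicate refers to molding, fetal position, pelvic capacity, infection, or preceding labor abnormalities, because the criteria as stated contain no such terms.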
The use of MVUs is problematic for several reasons. Intrauterine catheters carry risk, and there is no evidence of benefit. Studies have demonstrated that the use of IUPTs has no advantage over noninvasive means of assessing uterine contractility during labor. In addition, IUPT readings may depend on patient position or on the location of the catheter within the uterus and, most importantly, they do not correlate well with progress in cervical dilatation or with the need for cesarean delivery. Normal progress in dilatation is achieved over a broad range of uterine activity, and the pattern of contractions may be as important as their strength.
The definition of arrest of dilatation proposed by the guidelines would, for example, allow a labor arrested at 8 cm with strong contractions to continue for at least 4 hours (and an additional 4 hours if the membranes were not ruptured until after the first 4 hours) at that dilatation before an arrest could be diagnosed and the recommended 4 hours of treatment begun. This recommendation would be inadvisable in many circumstances, because it fails to consider any preceding labor abnormalities, the results of clinical cephalopelvimetry, the presence of infection, and other factors that might be contributing to the dysfunction, some of which might not be surmountable. Of even more concern, the recommendations in the guidelines implicitly deny the possibility that the fetus could be put at risk by prolonged exposure to strong uterine contractions during an arrest of labor.