Background
Contemporary interpretation of fetal heart rate patterns is based largely on the tenets of Drs Quilligan and Hon. This method differs from an older method, championed by Dr Caldeyro-Barcia, in both recording speed and the classification of decelerations. The older method uses a paper speed of 1 cm/min and classifies decelerations relative to uterine contractions as type I or type II dips, whereas the conventional method uses a paper speed of 3 cm/min and classifies decelerations as early, late, or variable. We hypothesized that the 3 cm/min speed may lead to overanalysis of the fetal heart rate and that 1 cm/min may provide adequate information without compromising accuracy or efficiency.
Objective
The purpose of this study was to compare the Hon-Quilligan method of fetal heart rate interpretation with the Caldeyro-Barcia method among groups of obstetric care providers using an online interactive testing tool.
Study Design
We deidentified 40 fetal heart rate tracings from the terminal 30 minutes before delivery. We created a website to display these tracings with the standard Hon-Quilligan method and rescaled the same tracings to the 1 cm/min monitoring speed for the Caldeyro-Barcia method. We invited 2–4 caregivers from each of the following groups to participate: maternal-fetal medicine experts, practicing maternal-fetal medicine specialists, maternal-fetal medicine fellows, obstetrics nurses, and certified nurse midwives. After completing an introductory tutorial and quiz, participants interpreted the fetal heart rate tracings (presented in scrambled order), made management decisions, and predicted maternal and neonatal outcomes using both methods. Their results were compared with those of our expert, Edward Quilligan, and among groups. Analysis was performed with 3 measures: percent agreement, Kappa, and adjusted Gwet-Kappa (P < .05 was considered significant).
Results
Overall, our results showed moderate to almost perfect agreement with the expert, both between and within examiners (Gwet-Kappa, mostly 0.4–0.8). Within each practitioner group, agreement was generally highest for ascertainment of the baseline and for management and lowest for assessment of variability.
Conclusion
We examined the agreement of fetal heart rate interpretation with a defined set of rules among several groups of obstetric practitioners using 3 statistical methods and found moderate-to-substantial agreement between the clinicians' interpretations and those of the expert. This suggests that the simpler Caldeyro-Barcia method may perform as well as the newer classification system.
Electronic fetal monitoring has been used widely in clinical practice since the 1970s and is applied nearly universally in developed countries. The current tracing speed in the United States is 3 cm/min, although fetal heart rate (FHR) tracings historically ran at 1 cm/min. The faster speed is intended to plot the FHR on a larger field and thereby enable more accurate interpretation of patterns. Many experts, however, have countered that 1 cm/min produces records that are sufficient for clinical purposes and has the advantage of limiting resources and costs. Regardless of recording speed, FHR monitoring has remained merely a screening test and has not decreased the incidence of cerebral palsy. In fact, well-designed prospective studies of electronic FHR monitoring have demonstrated a higher incidence of operative deliveries, including cesarean births, without improvement in cord pH values or 5-minute Apgar scores.
The interpretation of fetal monitoring is also not without controversy. Since its inception, terminology has been used inconsistently in the literature. It was not until 1997 that a National Institute of Child Health and Human Development (NICHD) workshop reached a consensus on standardizing fetal monitoring interpretation. The workshop produced FHR pattern definitions and research guidelines for interpretation, with a full description of what an intrapartum FHR tracing should include. Whether any FHR pattern is causally related to cerebral palsy remained unclear. The NICHD workshop reconvened in 2008 to standardize FHR pattern definitions further and recommended further research into the reliability and validity of interpretive techniques and into the causal relationship between FHR patterns and outcomes. Finally, in July 2009, the American College of Obstetricians and Gynecologists (ACOG) reconvened to develop a set of guidelines that would be recognized universally and used consistently. After extensive examination, the clinical guidelines were revised to include a 3-tier classification system. Despite this, the incidence of cerebral palsy has not decreased, and cesarean rates remain high. The mere fact that the interpretation systems have required frequent revisions demonstrates the need for further clarification and simplification.
A different classification system of FHR decelerations, championed by Dr Caldeyro-Barcia, existed in the 1960s. A fundamental feature of the Caldeyro-Barcia (CB) method was a slower recording speed of 1 cm/min and a simpler classification of decelerations based on the timing of their nadir relative to the peak of the contraction. Just as the new NICHD guidelines have helped to standardize and simplify FHR interpretation, we hypothesized that this older method, which visually condenses the FHR information and simplifies the interpretation of decelerations, may perform as well as the newer classification system; such a finding would further highlight the need for continued research into other methods of fetal evaluation. The objective of this study was to use an online interactive testing tool to compare interobserver reliability among groups of obstetric care providers for the contemporary Hon-Quilligan (HQ) method of FHR interpretation and the CB method.
Materials and Methods
Our web-based study used 40 representative FHR tracings that were identified from Las Vegas area hospitals in accordance with the study protocol under Western Institutional Review Board approval (#IRB00000533). Tracings were chosen to represent the full spectrum of fetal outcomes, from normal to poor, according to 5-minute Apgar scores. FHR tracings were deidentified in accordance with current Health Insurance Portability and Accountability Act requirements and Good Clinical Practice. The tracings were selected from women at ≥36 weeks of gestation with a known umbilical cord arterial pH. Exclusion criteria were suspected chorioamnionitis and multifetal gestation.
Each tracing represented the terminal 30 minutes before delivery and ended no more than 10 minutes before delivery. Each strip was edited in Photoshop (Adobe Systems Incorporated, San Jose, CA) to produce a consistent, clean presentation with accurate rescaling between the HQ tracing system (3 cm/min speed) and the CB tracing system (1 cm/min speed). Clinical information from each case, including labor and delivery outcomes and neonatal outcomes, was summarized and presented along with each of the 40 tracings for both methods. The tracings were then interpreted by our standards expert (E.J.Q.), who was blinded to the outcomes.
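Rescaling between the 2 recording speeds amounts to a fixed compression of the time axis. The Python sketch below works through that arithmetic for a 30-minute tracing. It assumes that only the horizontal (time) axis was rescaled and that the vertical FHR scale was unchanged, which the text does not state explicitly; the function names are illustrative and not part of any tool used in the study.

```python
"""Time-axis arithmetic for rescaling a tracing from 3 cm/min to 1 cm/min.

Assumption (not stated in the study): only the horizontal time axis is
rescaled; the vertical FHR axis (beats/min) is left unchanged.
"""

HQ_SPEED_CM_PER_MIN = 3.0  # Hon-Quilligan recording speed
CB_SPEED_CM_PER_MIN = 1.0  # Caldeyro-Barcia recording speed


def paper_length_cm(duration_min: float, speed_cm_per_min: float) -> float:
    """Length of paper needed to record `duration_min` minutes at a given speed."""
    return duration_min * speed_cm_per_min


def horizontal_scale(from_speed: float, to_speed: float) -> float:
    """Factor applied to horizontal distances when changing paper speed."""
    return to_speed / from_speed


def rescale_x(x_cm: float, from_speed: float, to_speed: float) -> float:
    """Map a horizontal position on one strip to the same moment on the other strip."""
    return x_cm * to_speed / from_speed


if __name__ == "__main__":
    minutes = 30  # each study tracing covered the terminal 30 minutes
    print(paper_length_cm(minutes, HQ_SPEED_CM_PER_MIN))  # 90.0
    print(paper_length_cm(minutes, CB_SPEED_CM_PER_MIN))  # 30.0
    # A feature 45 cm into the HQ strip (minute 15) sits 15 cm into the CB strip.
    print(rescale_x(45.0, HQ_SPEED_CM_PER_MIN, CB_SPEED_CM_PER_MIN))  # 15.0
```

The same 30 minutes therefore occupy one-third of the paper length at 1 cm/min, which is the visual condensation the CB method relies on.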
The tracings were then presented to study participants via the study website, which was created by the High Risk Pregnancy Center, Research Department. The website provided training instructions that could be reviewed as desired, pop-up context-aware help screens, and an intuitive layout for chart presentation and question sections. The training section contained a detailed description of both methods and a list of definitions for all study variables. Definitions for the HQ method were taken from the 2009 ACOG publication. For the CB method, type I dips were defined as decelerations whose nadir coincides with the peak of the contraction; they are not considered a cause of decreased fetal PO2. Type II dips, whose nadir typically occurs 30–60 seconds after the peak of the contraction, are considered a cause of decreased fetal PO2. Type I dips are analogous to early decelerations in the HQ method, and type II dips are considered similar to variable and late decelerations. Participants then took a short quiz to test their proficiency in the CB method and were allowed to start the study once they scored at least 80%. Participants could log in multiple times to complete the study; on returning, they resumed where they had left off, which made the study easy to navigate and flexible enough to fit participants' schedules. Each participant was assigned a randomly generated login identification that was not linked to their identity. No identifying information was collected from study participants or stored on the server.
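The CB rule for dips depends only on the lag between the deceleration nadir and the contraction peak. The Python sketch below encodes that rule under stated assumptions: the study gives no numeric cutoff for a nadir that "corresponds to" the peak, so the 15-second tolerance is a hypothetical choice, as is the "unclassified" label for a nadir that precedes the peak by more than that tolerance. The names `Dip` and `classify_cb_dip` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical tolerance: the study does not define how closely the nadir
# must coincide with the contraction peak to count as a type I dip.
TYPE_I_TOLERANCE_S = 15.0


@dataclass
class Dip:
    nadir_time_s: float             # time of the FHR nadir (seconds from strip start)
    contraction_peak_time_s: float  # time of the associated contraction peak


def classify_cb_dip(dip: Dip, tolerance_s: float = TYPE_I_TOLERANCE_S) -> str:
    """Classify a deceleration by the timing of its nadir relative to the contraction peak."""
    lag_s = dip.nadir_time_s - dip.contraction_peak_time_s
    if abs(lag_s) <= tolerance_s:
        return "type I"        # nadir coincides with the peak (analogous to an early deceleration)
    if lag_s > tolerance_s:
        return "type II"       # nadir lags the peak, typically by 30-60 s
    return "unclassified"      # nadir well before the peak: outside this simplified rule


if __name__ == "__main__":
    print(classify_cb_dip(Dip(nadir_time_s=120.0, contraction_peak_time_s=118.0)))  # type I
    print(classify_cb_dip(Dip(nadir_time_s=165.0, contraction_peak_time_s=120.0)))  # type II
```

Because the rule needs only 2 time points per deceleration, it illustrates why the CB classification is considered simpler than distinguishing early, late, and variable decelerations by shape and timing.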
We then invited 2–4 participants from each area of expertise: maternal-fetal medicine (MFM) experts (individuals in the field with research experience in FHR monitoring), actively practicing MFMs, MFM fellows, obstetrics nurses, and certified nurse midwives. After completing the introductory tutorial and quiz, participants were asked to interpret the FHR tracings and predict maternal and neonatal outcomes using both methods on our secure, password-protected website. The order of tracings was scrambled. For each 10-minute segment, participants were asked to answer 7 questions (with provided answer choices):
- (1) What is the baseline? (bradycardia/normal/tachycardia)
- (2) What is the baseline variability? (absent/minimal/moderate/marked/sinusoidal)
- (3) How many decelerations and/or accelerations are present? (0/1/2/3/4/5)
- (4) What is the umbilical artery pH? (<7.0/7.0–7.09/7.1–7.19/≥7.2)
- (5) What is the 5-minute Apgar score? (0–3/4–6/7–10)
- (6) Would you continue expectant management at this time? (yes/no)
- (7) Would you deliver the patient at this time? (yes/no)
The same 7 questions were answered for each full 30-minute tracing. Each participant thus reviewed 240 10-minute strip segments, representing the 40 30-minute FHR tracings in the HQ tracing system (3 cm/min speed) and the same 40 tracings in the CB tracing system (1 cm/min speed). Answer choices were predefined and presented in either a drop-down menu or a list.
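The segment count follows from the study design: 40 tracings, three 10-minute segments per 30-minute tracing, and 2 recording speeds. The Python sketch below reproduces that count and shows one way the presentation order could be scrambled; the seeded shuffle is an assumption for illustration, since the study does not describe how the website randomized the order.

```python
import random

N_TRACINGS = 40
TRACING_MINUTES = 30
SEGMENT_MINUTES = 10
METHODS = ("HQ", "CB")  # 3 cm/min and 1 cm/min presentations
QUESTIONS_PER_SEGMENT = 7


def build_segments() -> list[tuple[str, int, int]]:
    """Return one (method, tracing, segment) tuple per 10-minute strip segment."""
    segments_per_tracing = TRACING_MINUTES // SEGMENT_MINUTES  # 3
    return [
        (method, tracing, segment)
        for method in METHODS
        for tracing in range(1, N_TRACINGS + 1)
        for segment in range(1, segments_per_tracing + 1)
    ]


def scrambled_order(segments, seed: int = 0):
    """Hypothetical scrambling: a seeded shuffle of a copy of the segment list."""
    order = list(segments)
    random.Random(seed).shuffle(order)
    return order


if __name__ == "__main__":
    segments = build_segments()
    print(len(segments))                          # 240
    print(len(segments) * QUESTIONS_PER_SEGMENT)  # 1680 segment-level answers
    print(scrambled_order(segments)[:3])
```

The 1680 figure counts only the segment-level answers; the questions answered for each full 30-minute tracing come on top of that.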
Participants' results were compared with those of Dr Quilligan and between groups. Analysis was performed with 3 measures: percent agreement, Kappa, and adjusted Gwet-Kappa (AC1); P < .05 was considered significant. Standard Kappa interpretation was taken from Landis and Koch: agreement was considered slight for values of 0–0.20, fair for 0.21–0.40, moderate for 0.41–0.60, substantial for 0.61–0.80, and almost perfect for 0.81–1.00. For each group of examiners, Kappa and AC1 scores were averaged with inverse-variance weighting, whereas percent agreement was calculated as a simple average.
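The 3 agreement measures can be sketched for a pair of raters (for example, the expert and one examiner) rating the same items on one categorical variable. The Python below implements percent agreement, Cohen's kappa for 2 raters (assuming that is the Kappa used), Gwet's AC1 for 2 raters, and the group-level averaging described above. It is a minimal sketch, not the study's analysis code: the variances that drive the inverse-variance weights are taken as inputs rather than estimated, and the example ratings are made up for illustration.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Sequence


def percent_agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Proportion of items on which the two raters chose the same category."""
    if len(a) != len(b) or not a:
        raise ValueError("ratings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a, b, categories) -> float:
    """Cohen's kappa: chance agreement from the product of each rater's marginals."""
    n = len(a)
    p_o = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum((count_a[k] / n) * (count_b[k] / n) for k in categories)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


def gwet_ac1(a, b, categories) -> float:
    """Gwet's AC1 for two raters: chance agreement from the averaged marginals pi_k."""
    n, q = len(a), len(categories)
    p_a = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    pis = [(count_a[k] + count_b[k]) / (2 * n) for k in categories]
    p_e = sum(pi * (1 - pi) for pi in pis) / (q - 1)
    return (p_a - p_e) / (1 - p_e)


def inverse_variance_mean(estimates: Sequence[float], variances: Sequence[float]) -> float:
    """Average of estimates weighted by 1/variance (used for Kappa and AC1)."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, estimates)) / sum(weights)


if __name__ == "__main__":
    cats = ("bradycardia", "normal", "tachycardia")
    expert = ["normal"] * 8 + ["tachycardia"] * 2
    examiner = ["normal"] * 7 + ["tachycardia"] * 3
    print(round(percent_agreement(expert, examiner), 3))  # 0.9
    print(round(cohen_kappa(expert, examiner, cats), 3))  # 0.737
    print(round(gwet_ac1(expert, examiner, cats), 3))     # 0.877
    # Group summaries: weighted for Kappa/AC1, simple mean for percent agreement.
    print(round(inverse_variance_mean([0.7, 0.5], [0.01, 0.04]), 3))  # 0.66
    print(round(mean([0.90, 0.80]), 3))                                # 0.85
```

In the example, AC1 stays higher than Kappa because AC1's chance-agreement term is small when one category dominates, which is the usual motivation for reporting AC1 alongside Kappa on skewed clinical ratings.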
A power calculation was not conducted before this pilot study because of the paucity of data in the literature on the proportion of positive ratings for the different characteristics observed in FHR tracings. Previous studies have used 3–5 reviewers, each of whom reviewed 50–100 10-minute segments.
Results
We invited up to 4 participants in each category and allowed them 1 month to complete the study; Table 1 lists the number of participants who completed the study within that time frame.
Category | Participants completing study, n |
---|---|
Maternal-fetal medicine expert | 3 |
Obstetrics nurse | 3 |
Maternal-fetal medicine practitioner | 3 |
Maternal-fetal medicine fellow | 4 |
Certified nurse midwife | 2 |
Overall, our results showed moderate to almost perfect agreement for most parameters, both between examiners and within examiners, with most Gwet's AC1 scores between 0.4 and 0.8 (Table 2). Within each practitioner group, agreement was generally highest for ascertainment of the baseline and for management and lowest for assessment of variability. The only group with a lower level of agreement was the obstetrics nurses: in the categories of management and delivery mode, their Gwet's AC1 scores ranged from 0.069 to 0.387, which indicates slight-to-fair agreement. Because the CB method lacks a definition of acceleration and the 2 methods define decelerations differently, these 2 variables were excluded from the comparisons that involved the CB method.
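The verbal labels used above follow the Landis and Koch bands listed in the Methods. A minimal Python sketch of that mapping is below, applied to the obstetrics nurses' AC1 scores for delivery mode and management from Table 2. Two details are assumptions rather than the study's stated rules: values falling between the printed band edges (for example, 0.205) are assigned to the higher band, and negative values are labeled "poor", as on Landis and Koch's original scale.

```python
def landis_koch(value: float) -> str:
    """Map a Kappa or AC1 value to a Landis-Koch agreement label."""
    if value > 1:
        raise ValueError("agreement coefficients cannot exceed 1")
    if value < 0:
        return "poor"
    if value <= 0.20:
        return "slight"
    if value <= 0.40:
        return "fair"
    if value <= 0.60:
        return "moderate"
    if value <= 0.80:
        return "substantial"
    return "almost perfect"


if __name__ == "__main__":
    # Obstetrics nurses, Gwet's AC1 (Table 2): expert-to-examiner HQ, CB, and examiner-to-self.
    nurses = {
        "delivery mode": [0.069, 0.101, 0.2698],
        "management": [0.387, 0.251, 0.332],
    }
    for variable, scores in nurses.items():
        print(variable, [landis_koch(s) for s in scores])
    # delivery mode ['slight', 'slight', 'fair']
    # management ['fair', 'fair', 'fair']
```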
Each cell shows percent agreement, expressed as a proportion, with the adjusted Gwet-Kappa (AC1) in parentheses.

Variable | Expert-to-examiner, Hon-Quilligan | Expert-to-examiner, Caldeyro-Barcia | Examiner-to-self, Hon-Quilligan vs Caldeyro-Barcia |
---|---|---|---|
Maternal-fetal medicine experts | | | |
Baseline | 0.83 (0.783) | 0.85 (0.812) | 0.90 (0.90) |
Variability a | 0.49 (0.401) | 0.62 (0.565) | 0.67 (0.632) |
Deceleration b | 0.53 (0.497) | N/A | N/A |
Acceleration b | 0.78 (0.580) | N/A | N/A |
Apgar score 5’ | 0.80 (0.755) | 0.76 (0.691) | 0.87 (0.834) |
Delivery mode | 0.71 (0.440) | 0.68 (0.387) | 0.81 (0.654) |
Management | 0.69 (0.405) | 0.78 (0.565) | 0.81 (0.660) |
Maternal-fetal medicine practitioners | |||
Baseline | 0.76 (0.681) | 0.81 (0.751) | 0.83 (0.821) |
Variability | 0.50 (0.416) | 0.57 (0.510) | 0.58 (0.512) |
Deceleration b | 0.57 (0.540) | N/A | N/A |
Acceleration b | 0.74 (0.478) | N/A | N/A |
Apgar score 5’ | 0.78 (0.747) | 0.78 (0.729) | 0.85 (0.829) |
Delivery mode | 0.73 (0.474) | 0.69 (0.407) | 0.78 (0.637) |
Management | 0.75 (0.525) | 0.69 (0.403) | 0.70 (0.437) |
Maternal-fetal medicine fellows | |||
Baseline | 0.81 (0.719) | 0.79 (0.756) | 0.87 (0.838) |
Variability | 0.51 (0.430) | 0.59 (0.532) | 0.69 (0.668) |
Deceleration | 0.51 (0.470) | N/A | N/A |
Acceleration | 0.85 (0.693) | N/A | N/A |
Apgar score 5’ | 0.69 (0.623) | 0.65 (0.552) | 0.69 (0.622) |
Delivery mode | 0.71 (0.439) | 0.72 (0.486) | 0.77 (0.606) |
Management | 0.70 (0.448) | 0.73 (0.501) | 0.83 (0.748) |
Obstetrics nurses | |||
Baseline | 0.88 (0.837) | 0.61 (0.506) | 0.66 (0.599) |
Variability | 0.44 (0.335) | 0.51 (0.495) | 0.62 (0.618) |
Deceleration b | 0.48 (0.442) | N/A | N/A |
Acceleration b | 0.84 (0.685) | N/A | N/A |
Apgar score 5’ | 0.79 (0.751) | 0.67 (0.604) | 0.78 (0.806) |
Delivery mode | 0.53 (0.069) | 0.54 (0.101) | 0.59 (0.270) |
Management | 0.68 (0.387) | 0.61 (0.251) | 0.63 (0.332) |
Certified nurse midwives | |||
Baseline | 0.89 (0.850) | 0.80 (0.740) | 0.86 (0.836) |
Variability | 0.46 (0.363) | 0.55 (0.487) | 0.54 (0.466) |
Deceleration b | 0.51 (0.476) | N/A | N/A |
Acceleration b | 0.79 (0.525) | N/A | N/A |
Apgar score 5’ | 0.59 (0.513) | 0.65 (0.572) | 0.73 (0.669) |
Delivery mode | 0.75 (0.524) | 0.68 (0.393) | 0.81 (0.660) |
Management | 0.76 (0.536) | 0.80 (0.617) | 0.81 (0.657) |