Objective
To evaluate the effect of simulation training vs traditional hands-on surgical instruction on learner operative skills and patient outcomes in gynecologic surgeries.
Data Sources
PubMed, Embase, ClinicalTrials.gov, and the Cochrane Central Register of Controlled Trials were searched from inception to January 12, 2021.
Study Eligibility Criteria
Randomized controlled trials, prospective comparative studies, and prospective single-group studies with pre- and posttraining assessments that reported surgical simulation-based training before gynecologic surgery were included.
Methods
Reviewers independently identified the studies, extracted data, and assessed study quality. The results were analyzed according to the type of gynecologic surgery, simulation, comparator, and outcome, including clinical and patient-related outcomes. Restricted maximum likelihood random effects meta-analyses of odds ratios and standardized mean differences were calculated with 95% confidence intervals.
Results
Twenty studies, including 13 randomized controlled trials, 1 randomized crossover trial, 5 nonrandomized comparative studies, and 1 prepost study, were identified. Most of the included studies (14/20, 70%) evaluated laparoscopic simulators and had a moderate quality of evidence. Meta-analysis showed that, compared with traditional surgical teaching, high- and low-fidelity simulators improved surgical technical skills in the operating room as measured by global rating scales, and high-fidelity simulators decreased the operative time. Moderate-quality evidence was found favoring warm-up exercises before laparoscopic surgery. There was insufficient evidence to conduct a meta-analysis for other gynecologic procedures.
Conclusion
Current evidence supports incorporating simulation-based training for a variety of gynecologic surgeries to increase technical skills in the operating room, but data on patient-related outcomes are lacking.
Introduction
There is an increasing need for additional teaching opportunities, such as simulation training, to help surgeons in training achieve surgical proficiency. These modalities allow trainees to acquire surgical skills in a simulated environment rather than with real patients and thus offer the possibility of developing and improving surgical technique while minimizing the risk of patient harm.
Surgical simulation training may be accomplished in several ways, including the use of human cadavers, animal models, or constructed replicas. Depending on how closely the simulators mimic real procedures, they can be classified as either low- or high-fidelity. Low-fidelity models include bench-top and laparoscopic box simulators, whereas high-fidelity models include virtual reality trainers and robotic simulators. Other strategies include mental imagery (which encourages learners to mentally rehearse the actual movements of each step of the procedure) and warm-up exercises with simulators immediately before the procedure. These ancillary modalities of preparation for surgery are less complex than a surgical simulation program that allows repeated practice in advance of the surgical procedure, but they are likewise aimed at enhancing surgical performance.
Because of the potential benefits of simulation training, the limitations imposed by duty hour restrictions, and the diversity of gynecologic surgery, residency and fellowship programs are increasingly incorporating simulation modalities. However, training programs recognize that simulation education requires a thoughtful, organized, and structured curriculum. Accordingly, the American Board of Obstetrics and Gynecology requires successful completion of the Fundamentals of Laparoscopic Surgery certification for physicians seeking board certification who graduated from residency after May 2020. Most published studies of surgical simulation training reflect the needs of general surgery learners and address general surgery procedures; there is a paucity of published studies describing the impact of simulation training in gynecologic surgery. Our aim was to evaluate the effect of simulation training vs traditional hands-on surgical instruction on learner operative skills and patient outcomes in gynecologic surgeries by performing a systematic review and meta-analysis.
Why was this study conducted?
This systematic review aimed to assess the impact of surgical simulation-based training on surgical skills as measured in the operating room. Patient-level outcomes were also evaluated.
Key findings
Low- and high-fidelity laparoscopic simulation-based training improves surgical skills as measured with global rating scales in the operating room. High-fidelity simulation-based training is associated with reduced operating time. However, patient-level outcome data are lacking, and there was insufficient evidence to conduct a meta-analysis for other gynecologic procedures.
What does this add to what is known?
This is a uniquely comprehensive systematic review of simulation training in gynecologic procedures.
Materials and Methods
The review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. The protocol was registered with PROSPERO (registration number CRD42020143327). The study was considered exempt from institutional review board approval because it was deemed nonhuman subject research. An electronic search of PubMed, Embase, ClinicalTrials.gov, and the Cochrane Central Register of Controlled Trials from inception to January 12, 2021, was performed by the Systematic Review Group of the Society of Gynecologic Surgeons (SGS) to identify studies addressing simulation teaching (including training, education, coaching, video, virtual reality, animal and anatomy models, and mental imagery) related to gynecologic surgery (including laparoscopic, open, or robotic gynecologic surgery; cystoscopy; obstetrical anal sphincter injuries; vaginal surgery; and basic surgical skills). Articles were identified using a combination of the following medical subject heading terms: training and gynecologic surgery. References from retrieved articles were hand-searched for additional articles. The complete search strategy is presented in Appendix 1 and is available online at http://∗∗∗.
Study selection
Studies were eligible for inclusion if they included gynecologic surgeons and learners at various stages of training (ie, obstetrics and gynecology [OBGYN] residents, fellows, and physicians in practice). Surgeons from other specialties testing simulation or general surgical skills applicable to gynecologic surgery (eg, knot tying and general laparoscopic skills) were also included. However, studies specific to nongynecologic procedures (eg, hernia repair, cholecystectomy, and prostatectomy) and studies from the veterinary literature were excluded.
The interventions of interest included any type of gynecologic surgical simulation, including laparoscopic simulators (low- or high-fidelity), animal models, vaginal surgery simulators, inanimate models, and other types of preoperative training such as preoperative warm-up exercises or mental imagery. The comparators of interest included traditional (nonsimulation) training or different types of simulation. The outcomes of interest were categorized using the 3 translational science categories routinely applied in medical education: T1 (skills in the laboratory or testing setting), T2 (skills in the clinical setting), and T3 (patient-level outcomes). Examples of laboratory setting outcomes included time to complete the simulation tasks and performance on skills evaluated while using the simulator. Examples of outcomes in the clinical setting included operating time and any formal evaluation of skills with a global rating scale for each procedure, such as the Objective Structured Assessment of Technical Skills (OSATS) score and the Global Operative Assessment of Laparoscopic Skills (GOALS) score assessed in the operating room. Examples of patient-level outcomes included estimated blood loss, patient quality of life after surgery, and adverse events. Laboratory setting outcomes were not assessed because the focus of this systematic review was on clinical outcomes. Comparative study designs, including randomized controlled trials (RCTs), nonrandomized comparative studies (NRCSs), and prospective single-group studies with pre- and posttraining assessments (comparative controls), were included, as were conference abstracts of comparative designs.
Screening and data extraction
Abstract screening was performed using Abstrackr (abstrackr.cebm.brown.edu/). After a pilot round of training, abstracts were screened in duplicate for inclusion by a group of 11 reviewers, and discrepancies were resolved by a third independent reviewer. Qualifying articles were retrieved in full and assessed for inclusion. The same 11 reviewers rescreened potentially relevant full-text articles in duplicate and extracted data from eligible studies. Data were extracted into a customized Excel spreadsheet by 1 reviewer and checked by at least 1 other reviewer.
Assessment of risk of bias
Each article’s methodological quality was graded as good (A), fair (B), or poor (C) on the basis of a risk of bias assessment using the Cochrane Risk of Bias tool (for RCTs) and selected items from the ROBINS-I tool (for NRCSs and prepost studies). An overall quality grade was assigned to each study (and agreed on by at least 2 team members); the quality pertaining to specific outcomes was downgraded as needed on the basis of methodological deficiencies specific to those outcomes.
Data synthesis
Using the Grading of Recommendations, Assessment, Development, and Evaluations (GRADE) approach, evidence profiles were created for each combination of surgery type, simulation comparison, and outcome for which there were at least 2 studies. The overall quality of the evidence for each outcome was graded as high, moderate, low, or very low on the basis of the methodological quality of the studies, consistency across studies, precision of effect estimates, and directness of the studies to the research question. For this review, the importance of patient-level outcomes was categorized as critical, and that of clinical setting outcomes as high.
When feasible, restricted maximum likelihood random effects model meta-analyses of the odds ratios and standardized mean differences (SMD) were calculated. The SMD expresses the difference between group means in pooled standard deviation units; we used the following thresholds: SMD <0.2, no effect; 0.2 to 0.49, small effect; 0.5 to 0.79, moderate effect; and ≥0.8, large effect. For studies that reported median and interquartile range or full-range data, the mean and standard deviation were estimated using the methods reported by Wan et al or Hozo et al. Consistency was determined on the basis of the I² statistic from the meta-analysis: I² >75% was categorized as high heterogeneity (low consistency), 25% to 75% as moderate heterogeneity and consistency, and <25% as low heterogeneity (high consistency).
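For orientation, the effect size and heterogeneity measures described above can be written out explicitly. The expressions below are a standard formulation given for illustration (with one of the Hozo et al approximations shown for the median-and-range case); the group labels and symbols are ours and are not reproduced from the authors' analysis code.

$\mathrm{SMD}=\dfrac{\bar{x}_{1}-\bar{x}_{2}}{s_{\mathrm{pooled}}}$, where $s_{\mathrm{pooled}}=\sqrt{\dfrac{(n_{1}-1)s_{1}^{2}+(n_{2}-1)s_{2}^{2}}{n_{1}+n_{2}-2}}$ for group 1 (simulation) and group 2 (comparator);

$\bar{x}\approx\dfrac{a+2m+b}{4}$ and $s\approx\dfrac{b-a}{4}$ for a study reporting only the median $m$ and range $[a,\,b]$ (Hozo et al; the divisor used for $s$ depends on the sample size);

$I^{2}=\max\!\left(0,\ \dfrac{Q-(k-1)}{Q}\right)\times100\%$, where $Q$ is Cochran's $Q$ statistic across the $k$ pooled studies.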
The review’s findings were presented for public comment at the SGS annual scientific meeting in Tucson, AZ, in March 2019, reviewed by the SGS board of directors, and distributed to the SGS membership for comment or critique before publication.
Results
The literature search yielded 12,651 citations, of which 913 abstracts were deemed potentially relevant. Of these, 893 were rejected at full-text review ( Figure 1 ). The 20 eligible studies comprised 13 RCTs, 1 randomized crossover trial, 5 NRCSs, and 1 prepost study. The simulated gynecologic procedures of interest in these studies were laparoscopy (14 studies), cystoscopy (2 studies), vaginal hysterectomy (1 study), robotic-assisted laparoscopy (1 study), midurethral sling (1 study), and several different procedures (1 study).
Among the 14 randomized trials (13 RCTs and 1 randomized crossover trial), only 9 adequately described the specific randomization strategy, and none was considered to be at high risk of selection bias. None of the randomized trials was at high risk of ascertainment bias owing to lack of blinded outcome assessment. Two were at risk of attrition bias. All 6 observational studies were at an increased risk of selection bias given their study designs; 3 were at risk of assessment bias because of lack of blinded outcome assessment, and 2 were at high risk of attrition bias. None of the observational studies adjusted for differences between the compared groups.
Four of the 20 studies described training to a prespecified level of proficiency before assessing outcomes in the operating room. In general, the prospective studies lacked a description of cointerventions (eg, how many operating room cases each subject performed concomitantly during the study) or of compliance with the training programs. Three RCTs reported some lack of compliance with the assigned interventions of mental imagery and warm-up exercises. In total, only 4 studies included patient-level outcomes such as estimated blood loss (EBL) and complications.
Laparoscopy
Of the 14 eligible studies evaluating laparoscopy training, most evaluated the use of laparoscopic simulators ( Table ); 1 used a porcine training model. The extent of simulation training ranged from a single 2-hour session to a full training curriculum delivered over 6 months (presented in Appendix 2 , available online at http://∗∗∗). All but 2 studies used adnexal surgery as the index procedure for the evaluation of clinical setting outcomes in the operating room; 1 study used laparoscopic hysterectomy, and the other used extracorporeal peritoneal suturing.
Study country | Study design (Quality) A /Risk of bias | Population | Simulator B (N analyzed) | Comparator (N analyzed) | Assessment method/surgical procedure | Outcome (score range or unit) | Results |
---|---|---|---|---|---|---|---|
Laparoscopy: high-fidelity vs usual teaching | | | | | | | |
Larsen et al, 2009, Denmark | RCT (A) | Lower level residents | LapSimGYN (N=11) | Usual teaching (N=10) | Blinded, video of salpingectomy | OSA-LS score in OR (10–50) | Median (range) Sim: 33 (25–39); Cx: 23 (21–28); P=.001 |
 | | | | | | Total operation time in OR C (min) | Sim: 12 (6–24); Cx: 24 (14–38); P<.001 |
Ahlborg et al, 2013, Sweden | NRCS (C) D Large attrition rate | Residents, level nonspecified | LapSim GYN and 3 BTLs in the OR (N=7) | Usual teaching (N=5) | Blinded, video of BTL | Operative time in OR E (sec) | Mean (IQR) Sim: 340 (285–537); Cx: 760 (573–1218); P<.001 |
Janssens et al, 2015, Australia | Prepost (C) Nonblinded outcome assessment; unclear total length of simulation | Residents, level nonspecified | LapMentor II (N=16) | Usual teaching F (N=15) | Nonblinded, medical records review of adnexal surgeries | Total operation time in OR (min) | Mean (SD) Sim: 75 (62); Cx: 86 (70); P=.28 |
Akdemir et al, 2014, Turkey G | NRCS (B) | Lower and upper level residents | LapSim (N=20 lower level residents) | Usual teaching (N=20 upper level residents) | Blinded, video of BTL | OSA-LS score in OR (5–25) | Mean (range) Sim: 17 (15–19); Cx: 11.5 (10–14); P<.01 |
 | | | | | | Operation time H (sec) | Median (range) Sim: 340 (260–400); Cx: 425 (320–530); P=.003 |
Jokinen et al, 2019 | RCT (A) | Lower level residents | LapMentor (N=10) | Usual teaching (N=9) | Blinded, video of right side salpingectomy | OSATS score in OR (6–30) | Mean (SD) Sim: 16.4 (5.3); Cx: 16.2 (5.7); P=NS |
 | | | | | | Operation time (sec) | Mean (SD) Sim: 14.6 (12.6); Cx: 12.6 (4); P=.349 |
 | | | | | | Bleeding stage (visually graded from video, 0–3) | Mean (SD) Sim: 0.8 (1); Cx: 0.3 (0.7); P=.515 |
Jokinen et al, 2020 | RCT (A) | All residents with experience in diagnostic laparoscopy and adnexal surgery | LapMentor (N=10) | Usual teaching (N=9) | Blinded, video of resident’s first laparoscopic hysterectomy as a surgeon | OSATS score in OR (6–30) I | Mean (SD) Sim: 17 (31); Cx: 11.2 (2.4); P=.002 |
 | | | | | | Operation time (sec) | Mean (SD) Sim: 144 (20.8); Cx: 165 (44.9); P=.205 |
 | | | | | | EBL (mL) | Mean (SD) Sim: 133 (129); Cx: 121 (113); P=.907 |
 | | | | | | Direct complications | N Sim: none; Cx: 1 colon serosa lesion |
Laparoscopy: low-fidelity vs usual teaching | | | | | | | |
Gala et al, 2013, United States | RCT (A) | Residents, all levels | Box trainer (N=48) | Usual teaching (N=60) | Blinded, direct observation of bilateral salpingectomy | OSATS score in OR (7–35) | Mean (SD) Sim: 30 (3); Cx: 27.5 (5); P=.03; no interaction with level of training |
Banks et al, 2007, United States | RCT (A) | Lower level residents | Limbs and Things (N=10) | Usual teaching (N=10) | Blinded, direct observation of BTL | OSATS score in OR (scaled to 100%) | Mean (SD) Sim: 64 (5); Cx: 45 (11); P=.003 |
 | | | | | | Task-specific checklist in OR (% of items) | Mean (SD) Sim: 92 (7); Cx: 57 (20); P=.002 |
 | | | | | | Pass rate in OR (%) | Sim: 100; Cx: 30; P=.003 |
Coleman and Muller, 2002, United States | RCT (A) | Upper level residents | Box trainer (N=11) | Usual teaching (N=7) | Blinded, video of partial salpingectomy | GSAT score in OR (7–35) | Mean Sim: 21.7; Cx: 20.3; P=NS |
Antosh et al, 2012, United States | NRCS (B) High attrition rate | Upper level residents | TASKit (N=4) | Usual teaching (N=5) | Blinded, video of peritoneum closure | OSATS score in OR (7–35) | Mean (SD) Sim: 29 (9); Cx: 23 (6); P=.28 |
 | | | | | | GOALS score in OR (5–25) | Sim: 16 (5); Cx: 13 (3); P=.81 |
 | | | | | | Suturing time in OR (sec) | Sim: 480 (64); Cx: 509 (252); P=.81 |
Akdemir et al, 2014, Turkey J | NRCS (B) | Lower and upper level residents | Box trainer (N=20 lower level residents) | Usual teaching (N=20 upper level residents) | Blinded, video of BTL | OSA-LS score in OR (5–25) | Mean (range) Sim: 17 (16–18); Cx: 11.5 (10–14); P<.01 |
 | | | | | | Operation time K (sec) | Median (range) Sim: 340 (270–430); Cx: 425 (320–530); P=.01 |
Laparoscopy: high-fidelity vs low-fidelity | | | | | | | |
Akdemir et al, 2014, Turkey L | RCT (A) | Lower level residents | LapSim (N=20) | Box trainer (N=20) | Blinded, video of BTL | OSA-LS score in OR (5–25) | Mean (range) High: 17 (15–19); Low: 17 (16–18); P=.71 |
 | | | | | | Operation time M (sec) | Median (range) High: 340 (260–400); Low: 340 (270–430); P=.56 |
Laparoscopy: high- plus low-fidelity vs usual teaching | | | | | | | |
Shore et al, 2016, Canada | RCT (A) | Lower level residents | VR simulator, box trainer, cognitive training, nontechnical skills (N=14) | Usual teaching (N=13) | Blinded, video of left salpingectomy and intracorporeal knot tying in OR | OSA-LS in OR (maximum 50 points) | Median (IQR) Sim: 34 (29–37); Cx: 30 (27–35); P=.043 |
Laparoscopy: warm-up on simulator vs no warm-up | | | | | | | |
Chen et al, 2013, United States | RCT (B) High attrition rate; crossover bias in 16% of cases | Residents, all levels | TASKIT laparoscopic trainer warm-up (N=46) | No warm-up (N=45) | Blinded, direct observation of minor and major surgical cases | Reznick OSATS in OR (range 7–35) N | Mean (SE) Sim: 22.6 (2.4); Cx: 19.5 (2.6); P=.001 |
Polterauer et al, 2016, Austria | Randomized crossover (C) High attrition rate; outcome assessed on 1 side; unclear possibility of cointervention bias if assessment was during the first or the second adnexa | Residents (n=4) and specialists (n=6) | VR simulator warm-up (N=10) | No warm-up (N=10) | Blinded, video of unilateral salpingo-oophorectomy | OSATS in OR (range not reported) | Sim: 19.8 (0.7); Cx: 18.6 (1.7); P=.51 |
 | | | | | | Operating time in OR (min) | Sim: 24.7 (5.6); Cx: 25 (5.6); P=.86 |
 | | | | | | Perioperative complications | ‘None documented’ |
A On the basis of overall assessment across risk of bias domains, good (A), fair (B), or poor (C)
B Appendix 2 provides descriptions of simulator training
C Plus 5 minutes for each recording
D Originally randomized to 3 groups: (1) simulation, (2) simulation+mentorship, and (3) control. Groups 1 and 2 were merged because of loss to follow-up; analyzed as an NRCS
E From entering the bipolar grasper until the last working instrument’s exit from the abdominal cavity
F “Pre” period, before implementation of simulation training
G This study is an RCT for low-fidelity vs high-fidelity comparisons and NRCS for low-fidelity vs control and high-fidelity vs control
H From resident holding instruments to removal of trocars
I OSATS in this table represents the global rating scale score used in the meta-analysis. The authors also present a procedure-specific OSATS and a visual analog scale showing similar results
J This study is an RCT for low-fidelity vs high-fidelity comparisons and NRCS for low-fidelity vs control and high-fidelity vs control
K From resident holding instruments to removal of trocars
L This study is an RCT for low-fidelity vs high-fidelity comparisons and NRCS for low-fidelity vs control and high-fidelity vs control
M From resident holding instruments to removal of trocars
N Similar results on Vassiliou and Kundhal OSATS scales, though differences in Kundhal’s did not reach statistical significance.
Six studies evaluated T2 results for comparisons of high-fidelity laparoscopic simulators (LapSim GYN, LapMentor II) vs traditional training among OBGYN residents, including 3 RCTs and 3 NRCSs ( Table , Laparoscopy: high-fidelity vs usual teaching). A pooled analysis of 4 studies found a large effect of an increase in the OSATS and the surgery-specific Objective Structured Assessment of Laparoscopic Salpingectomy (OSA-LS) scores (SMD, 0.96; 95% CI, 0.23–1.68), corresponding to better surgical performance ( Figure 2 , A). The consistency was moderate (I²=64%). A pooled analysis of 6 studies found a consistent small effect of a decrease in the total operation time (SMD, −0.41; 95% CI, −0.74 to −0.07) ( Figure 2 , B). The overall quality of the evidence for both outcomes was moderate (presented in Appendix 3 , available online at http://∗∗∗). Three of the studies included patient-level outcomes (EBL, direct complications); however, the numbers were small, and there were no differences between the groups.