Assessment of intraoperative judgment during gynecologic surgery using the Script Concordance Test




Objective


We sought to develop a valid, reliable assessment of intraoperative judgment by residents during gynecologic surgery based on Script Concordance Theory.


Study Design


This was a multicenter prospective study involving 5 obstetrics and gynecology residency programs. Surgeons from each site generated case scenarios based on common gynecologic procedures. Construct validity was evaluated by correlating scores to training level, in-service examinations, and surgical skill and experience using a Global Rating Scale of Operative Performance and case volumes.


Results


A final test comprising 42 case scenarios was administered to 75 residents. Internal consistency (Cronbach alpha = 0.73) and test-retest reliability (Lin correlation coefficient = 0.76) were good. Test scores differed significantly across training levels (P = .002) and correlated with in-service examination scores (r = 0.38; P = .001). There was no association between test scores and total case numbers or technical skills.


Conclusion


The Script Concordance Test appears to be a reliable, valid assessment tool for intraoperative decision-making during gynecologic surgery.


An obstetrics and gynecology resident’s knowledge is assessed via multiple-choice examinations, while critical thinking skills and judgment are assessed via oral examinations. However, there is no valid and reliable method to assess a surgeon’s judgment, ie, the ability to make decisions appropriate to an ambiguous clinical situation. For example, a resident may understand the indications for a surgical procedure (medical knowledge) and be technically able to perform it (technical skill), yet still be unable to determine the appropriate next step in an operative situation where several steps could be considered correct but one is more appropriate to take before the others (surgical judgment).


Script Concordance Theory proposes a framework for assessing judgment. It holds that clinicians initially process medical concepts via the basic mechanisms of disease learned in medical school. As one progresses through training, a bank of cases built through patient care (ie, experience) is formed, and this knowledge and understanding is organized into what have been referred to as “illness scripts.” According to this theory, clinicians activate these previously acquired networks of knowledge when confronted with a new yet somewhat similar patient scenario. Expert clinicians develop extensive pattern recognition from experience working through the regular tasks and scenarios they encounter in practice.


The Script Concordance Test (SCT) was developed from these principles to assess clinical reasoning for ill-defined clinical problems; the SCT has been used to assess judgment in surgery, urology, radiology, and internal medicine. Unlike standard multiple-choice questions, SCT scenarios are created with no single correct answer, and the items about each scenario force trainees to confirm or eliminate clinical hypotheses based on their qualitative judgment in the face of an uncertain situation. Assuming that increasing surgical experience improves judgment, a valid test should demonstrate that, with increasing experience, trainees’ answers come closer to those provided by a panel of experts. In a training program, such experience is usually demonstrated by increasing scores across resident training levels.


The primary aim of this study was to develop a valid and reliable method to assess intraoperative judgment by obstetrics and gynecology resident trainees during gynecologic surgery using SCT. The resulting scores could then be used to assess the extent to which knowledge and understanding of gynecologic surgery meet the constraints and complexities of real-life complex operative scenarios that require the surgeon to apply his or her judgment. Secondary aims were to determine whether SCT scores correlate with other measures of knowledge and understanding such as standardized test scores from in-service examinations as well as technical skill as measured by the Global Rating Scale (GRS) of Operative Performance, surgical case numbers, and self-assessment of surgical judgment.


Materials and Methods


This was a multicenter, prospective study involving 5 obstetrics and gynecology residency programs in the United States and Canada: Cleveland Clinic (Cleveland, OH); Los Angeles County-University of Southern California Medical Center (Los Angeles, CA); IWK Health Center, Dalhousie University (Halifax, Nova Scotia, Canada); Brooke Army Medical Center (San Antonio, TX); and Magee Women’s Hospital (Pittsburgh, PA). Institutional review board approval was obtained at all participating sites. All residents at all participating programs were approached for enrollment during their resident didactics by a study coordinator, informed consent was obtained, and trainees’ identities were concealed using random number assignments.


The study was composed of 2 phases: phase 1 included item selection and test development and phase 2 included validity and reliability testing. Phase 1 involved developing the initial pool of scenarios for the SCT instrument, which were generated from the current literature on surgical education in gynecology and gynecologic oncology, as well as from the guidelines for resident education from the American College of Obstetricians and Gynecologists and the Council on Resident Education in Obstetrics and Gynecology (CREOG). Three main principles to constructing a SCT were used: (1) an authentic challenging clinical situation is presented in which there are several relevant options; (2) responses follow a Likert scale that reflects script clinical reasoning theory; and (3) scoring is based on the aggregate scoring method to take into account the variability of the clinical reasoning process among a series of experts.


According to previous work by Charlin et al, 50-60 items are sufficient to achieve internal reliability with an alpha coefficient of 0.80. Anticipating that scenarios or items would be eliminated, gynecologic surgeons from each participating site generated 96 case scenarios, each with 2-4 items, drawn from common gynecologic surgical procedures. All scenarios involved intraoperative decision-making and were designed to be ambiguous, with no 1 correct answer. Common procedures were defined as those surgical procedures a resident should be able to perform, as determined by the CREOG Education Objectives. The initial goal was to generate around 100 scenarios in anticipation of applying the following elimination criteria to scenarios and items: >5% missing answers, minimum concordance >50%, maximum concordance <80%, floor or ceiling effects, or >80% agreement among experts on a particular answer.


Scenario design and scoring


The test design and scoring process is unique to SCT. Figure 1 is an example of a typical scenario with 3 items. Scenarios are constructed to reflect authentic, challenging clinical situations with multiple possible correct answers, which distinguishes this format from standard multiple-choice or oral examinations. The scoring process is based on the principle that any answer given by an expert has intrinsic value, even if the other experts do not agree. A target of 10-20 experts was identified, based on a prior study demonstrating that at least 10 experts are necessary for acceptable reliability and that >20 experts add negligible benefit in terms of psychometric properties. Credits for each answer are transformed proportionally so that the answer most frequently endorsed by the expert panel receives the maximum score of 1 for that item; other experts’ choices receive partial credit, and answers chosen by no expert receive a score of 0. For example, if 6 of 10 experts choose response –1 to item no. 1 in Figure 1, this choice receives 1 point (6/6). If 3 experts choose response 0, this choice receives 0.5 point (3/6). If 1 expert chooses –2, this choice is assigned 0.167 point (1/6). The responses +1 and +2 are assigned 0 points (Figure 2). The scoring system should yield a range of potential scores with a broad expected distribution; ideally, however, the distribution should cluster around a mean, because too broad a distribution invalidates the question. Conversely, if all experts choose the same single answer, the SCT item becomes a multiple-choice question and should not be used; this served as an additional item elimination criterion during test development.




FIGURE 1


Example of single scenario with 4 items designed using Script Concordance Theory to assess intraoperative judgment in gynecologic surgery

Park. Intraoperative judgment assessment using the SCT. Am J Obstet Gynecol 2010.



FIGURE 2


Scoring system for Script Concordance Test

Park. Intraoperative judgment assessment using the SCT. Am J Obstet Gynecol 2010.


For this study, the scoring system was derived from answers generated by designated experts in gynecologic surgery from multiple sites (total n = 17; 2-6/site) using a weighted aggregate scoring method and a Likert-type scale. Experts were selected by the principal investigator at each site for their operating experience and reputation. The total test score was the sum of credits on all items; for ease of interpretation, scores were transformed so that the maximum score was 100.
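As an illustration, the aggregate scoring arithmetic described above can be expressed in a few lines of code. This is a sketch of the method as described in the text, not the study’s actual scoring software; the function name is ours.

```python
from collections import Counter

def sct_item_credits(expert_responses):
    """Credit for each Likert response to one SCT item: the modal expert
    response earns 1 point, other endorsed responses earn a proportional
    fraction of a point, and responses chosen by no expert earn 0."""
    counts = Counter(expert_responses)
    modal = max(counts.values())
    return {response: n / modal for response, n in counts.items()}

# Worked example from the text: of 10 experts, 6 choose -1, 3 choose 0,
# and 1 chooses -2; responses +1 and +2 are chosen by no one.
credits = sct_item_credits([-1] * 6 + [0] * 3 + [-2])
# -1 earns 1.0 (6/6), 0 earns 0.5 (3/6), -2 earns ~0.167 (1/6);
# +1 and +2 earn 0 because no expert endorsed them.
```

A trainee’s total score is then the sum of earned credits over all items, rescaled so that the maximum possible score is 100.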


Scores on the CREOG examination, a standardized, multiple-choice, knowledge-based examination administered yearly at most obstetrics and gynecology training programs in the United States and Canada, were collected to assess evidence of concurrent validity, ie, the relationship between knowledge and understanding and judgment. Surgical volume and technical skill measured with a GRS of Operative Performance were used to investigate the relationship between surgical skill and operative judgment. The GRS is a validated instrument for assessing surgical technical skills in the operating room; it includes 7 domains covering respect for tissue, time and motion, instrument handling, knowledge of instruments, flow of operation, use of assistants, and knowledge of specific procedure. Each domain is scored from 1-5, for a maximum total score of 35. Finally, trainees self-assessed their intraoperative judgment by marking an “X” on a 10-cm line indicating how they would rate their decision-making skills in the operating room compared to an expert surgeon.


Internal consistency was evaluated by computing the Cronbach alpha, and test-retest reliability was evaluated using the Lin concordance correlation coefficient. Construct validity was evaluated by determining the test’s ability to discriminate between levels of training using the Spearman correlation, as a trainee’s judgment presumably improves with increasing experience, particularly early in the learning experience. Test scores were correlated with CREOG scores, surgical volume, technical skills on the GRS, and self-assessment of intraoperative judgment using the Pearson correlation coefficient. A receiver operating characteristic (ROC) curve was generated to determine the threshold separating experts from nonexperts on test performance; because test takers fell into a binary state (resident or expert) and the test score provided a continuous predictor on which to classify them, an ROC curve was appropriate. The resulting threshold denotes the score that most accurately predicts whether a resident reaches the level of judgment demonstrated by the pool of experts on the test. This score is not intended to be a minimum cutoff for determining competence, as that would require different standard-setting methods.
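For reference, the two reliability statistics named above have standard closed-form definitions that can be computed directly. The following is a minimal sketch using the conventional formulas; the helper functions are our own illustration, not the study’s analysis code.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Cronbach alpha for internal consistency.
    item_scores: 2D array, rows = examinees, columns = test items."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_var_sum = x.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = x.sum(axis=1).var(ddof=1)        # variance of total scores
    return k / (k - 1) * (1 - item_var_sum / total_var)

def lin_ccc(x, y):
    """Lin concordance correlation coefficient for test-retest agreement
    between two administrations of the same test."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov_xy = np.cov(x, y, ddof=1)[0, 1]
    mean_shift = (x.mean() - y.mean()) ** 2
    return 2 * cov_xy / (x.var(ddof=1) + y.var(ddof=1) + mean_shift)
```

Unlike the Pearson correlation, the Lin coefficient penalizes systematic shifts between test and retest, so identical rankings with different means score below 1.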


Based on prior data on the validity of the SCT for assessing intraoperative decision-making in general surgery, a sample size of 76 of a possible 124 residents from all participating sites was required to achieve 82% power to detect a 5% difference ± 6% (SD) between training levels using 1-way analysis of variance with an alpha level of 0.05. All statistical analyses were performed with JMP 8.0 (SAS Institute, Cary, NC), SAS 9.1 (SAS Institute), and R 2.7.2 (R Foundation for Statistical Computing, Vienna, Austria) software.
