Establishing cutoff scores on assessments of surgical skills to determine surgical competence




Objective


The aim of this study was to establish minimum cutoff scores on intraoperative assessments of surgical skills to determine surgical competence for vaginal hysterectomy.


Study Design


Two surgical rating scales, the Global Rating Scale of Operative Performance and the Vaginal Surgical Skills Index, were used to evaluate trainees while performing vaginal hysterectomy. Cutoff scores were determined using the Modified Angoff method.


Results


Two hundred twelve evaluations of 76 surgeries performed by 27 trainees were analyzed. Trainees were considered minimally competent to perform vaginal hysterectomy if total absolute scores (95% confidence interval) reached 18 (16.5–20.3) on the Global Rating Scale and 32 (27.7–35.5) on the Vaginal Surgical Skills Index. On average, trainees met the new cutoffs after performing 21 and 27 vaginal hysterectomies, respectively. With the new cutoffs applied to the same cohort of fourth-year obstetrics and gynecology trainees, all residents achieved competency in performing vaginal hysterectomy by the end of their gynecology rotations.


Conclusion


Standard-setting methods using cutoff scores may be used to establish competence in vaginal surgery.


To carry out their charge of protecting the public, licensing and certifying organizations must develop and administer assessment instruments that distinguish between trainees with adequate and inadequate levels of knowledge and skill. Unfortunately, surgical skills in obstetrics and gynecology are not directly assessed by these organizations. Although the American Board of Obstetrics and Gynecology requires a trainee to pass a written and oral examination that allows them to demonstrate they “know how” to perform surgical skills, it assumes that trainees can appropriately “perform” these skills. There is no requirement for these organizations to directly observe performance; rather, direct observation during surgery occurs by supervising surgeons during residency or fellowship. Trainees are often deemed competent to perform procedures based on nonvalidated, subjective or objective global assessments and case logs. However, these forms of assessment do not provide direct evidence that an individual competently performs a skill, because case logs lack content validity and most forms of global assessment have poor reliability and unknown validity.


Therefore, the challenge continues for surgeon educators to establish more robust methods of determining surgical competency that truly reflect safe and effective surgical care. Competency-based learning begins with setting standards. There are several methods of standard setting, and they can be broadly categorized as relative (norm referencing) or absolute (criterion referencing). Norm referencing describes an individual’s performance relative to his or her position within a group. For example, a resident is judged by comparison to the scores achieved by his or her resident colleagues on the same test. This is the most common method of referencing; it ranks trainees and allows them to be compared with one another. However, it cannot provide a clear assessment of a trainee’s abilities, because there will always be a fixed number who fail. Moreover, norm referencing encourages competition, not cooperation, and it is somewhat unstable, because the standard shifts with the performance of the norm group. Criterion referencing, by contrast, provides a clear definition of what the trainee should be able to do and a given standard that indicates a competent level of performance. Criterion referencing is also more responsive to the subject matter being taught and allows the teacher and learner to clearly pinpoint capabilities.


Hysterectomy is a common procedure for obstetrics and gynecology trainees to learn and become competent in performing. One in 5 hysterectomies in the United States is performed using the vaginal approach. Vaginal surgery is considered the approach of choice for most patients requiring hysterectomy, because morbidity appears to be lower with the vaginal approach than with any other method. Therefore, it is imperative for obstetrics and gynecology trainees to be competent in performing vaginal hysterectomy. Our group recently determined the validity and reliability of 2 assessment scales that resident and fellow training programs may use to objectively assess intraoperative vaginal surgical skills: the Global Rating Scale (GRS) of operative performance, developed by Reznick et al, and the Vaginal Surgical Skills Index (VSSI), specifically designed for the evaluation of vaginal surgical skills. The primary aim of this study was to use formal standard-setting techniques to establish credible and defensible minimum cutoff scores on these intraoperative assessments of surgical skills to determine competence in performing vaginal hysterectomy.


Materials and Methods


Institutional review board exemption was obtained, as this project involved the use of educational tests and did not affect the clinical course of patients. This is a supplemental study to our original study in which we determined the reliability and validity of 2 scales that can be used to assess trainees performing vaginal surgery: the new VSSI and the GRS of operative performance. To summarize our original study, the GRS was developed by Reznick et al and consists of a 7-item global rating scale that allows supervising surgeons to directly rate important but generic skills during surgical procedures. The GRS is currently the only recommended assessment instrument listed in the Accreditation Council for Graduate Medical Education’s Assessment Toolbox for assessing surgical skills. This instrument has good internal consistency (Cronbach’s alpha = 0.95) and acceptable intrarater reliability (intraclass correlation coefficient [ICC], 0.64) but low interrater reliability (ICC, 0.31) for assessing a trainee’s surgical skills while performing vaginal surgery. The VSSI consists of 13 surgical principles. The VSSI also has good internal consistency (Cronbach’s alpha = 0.96), acceptable intrarater reliability (ICC, 0.82), and better interrater reliability (ICC, 0.53) than the GRS during the assessment of trainees performing live vaginal surgery. Both instruments appear to be valid assessment scales for vaginal surgery, because they demonstrate convergent validity (how closely a new scale is related to other measures of the same construct to which it should be related) and discriminant validity (ability to distinguish between training levels).


The GRS (range, 0–35) and VSSI (range, 0–52) were used to evaluate postgraduate trainees in obstetrics and gynecology from 2 academic medical centers while they performed vaginal hysterectomy. A 10-cm visual analog scale (VAS) indicating a trainee’s overall level of surgical performance was also completed. A higher score on all scales indicates better performance. Vaginal hysterectomies were performed by obstetrics and gynecology residents and Female Pelvic Medicine and Reconstructive Surgery fellows (postgraduate years [PGY], 1–7) between May 2007 and June 2008 and were observed live and video recorded to validate a new intraoperative surgical scale for vaginal surgery, the VSSI. Trainees performed a vaginal hysterectomy while the procedure was videotaped in a blinded, standardized fashion. An expert surgeon scored the trainee using all 3 assessment scales immediately after the procedure and again 4 weeks after the procedure using the videotape. A second blinded surgeon at a third participating institution evaluated all the videotapes using the same scales.


Because the aim of this study was to establish credible and defensible minimum cutoff scores on these 2 valid and reliable intraoperative assessments of surgical skills, methods of credible standard setting and procedures for establishing defensible absolute passing scores on performance examinations in health profession education were applied to the assessment scales. Content expert surgeons included 7 gynecologic surgeons representing the East, South, Midwest, and West Coast of the United States from 3 different academic medical centers, including: 4 from the Cleveland Clinic, 1 from Mayo Clinic, 1 who recently relocated to the Cleveland area from the University of Tennessee at Knoxville, and 1 surgeon from the University of California, San Francisco. Two of the 7 experts were women, all were board-certified obstetricians/gynecologists, and 4 of the 7 had finished an American Board of Obstetrics and Gynecology/American Urological Association-approved fellowship in Female Pelvic Medicine and Reconstructive Surgery. All experts were familiar with assessment tools, curricula, and trainees. Historically, most standard-setting studies have demonstrated that judges, absent all performance data, tend to set unrealistically high passing scores, which will fail an unreasonably high proportion of trainees. To avoid this, it is recommended that experts be “calibrated” before standard setting so that they have a realistic expectation of actual trainee performance, through first-hand observation of the actual scale scores for trainees and in-depth discussion of the differences between competence and expertise. This calibration was performed for this study.


Most absolute standard-setting methods are based around the concept of a borderline trainee’s performance. The concept of a borderline trainee originated from Angoff and assumes that a borderline trainee is one who has an exactly 50:50 probability of passing or failing the assessment. Therefore, a minimum cutoff score is one that separates those who are competent from those who are not. For this study, cutoff scores were determined using a single method, the Modified Angoff method, and confirmed using scores derived from 2 additional standard-setting methods, the Contrasting Groups method and the Hofstee method. Each of these standard-setting methods is described later.


In this study, the Modified Angoff method began as the experts discussed the characteristics and gave examples of a borderline trainee performing vaginal hysterectomy. Experts came to an agreement on these borderline characteristics, and each expert determined the score a minimally competent trainee should get on each item on the 3 assessment scales. A discussion among the experts then ensued based on each judge’s rating. Experts were given the opportunity to change their scores if desired. Scores were averaged for each item and summed to determine the minimum passing score for this method. To determine the level of agreement of the new cutoff scores between experts, the interobserver reliability was assessed using the ICC.
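The averaging and summing step described above can be illustrated with a minimal sketch. The function name `angoff_cutoff` and the ratings shown are hypothetical and for illustration only; they are not the experts' actual ratings from this study, and the sketch does not reproduce the discussion and revision rounds.

```python
from statistics import mean

def angoff_cutoff(ratings):
    """Average each item's score across judges, then sum the item means.

    ratings[j][i] is the score judge j expects a borderline (minimally
    competent) trainee to receive on item i.
    """
    n_items = len(ratings[0])
    if any(len(judge) != n_items for judge in ratings):
        raise ValueError("every judge must rate the same number of items")
    item_means = [mean(judge[i] for judge in ratings) for i in range(n_items)]
    return sum(item_means), item_means

# Hypothetical final ratings from 3 judges on a 7-item scale (made-up numbers).
example_ratings = [
    [3, 2, 3, 2, 3, 3, 2],
    [2, 3, 3, 3, 2, 3, 3],
    [3, 3, 2, 3, 3, 2, 3],
]
angoff_total, angoff_item_means = angoff_cutoff(example_ratings)
print(round(angoff_total, 1))
```

Because the item means are summed, the result equals the mean of the judges' total scores; keeping the item-level means makes it easier to see which items drive disagreement among judges.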


The minimum cutoff scores obtained by the Modified Angoff approach were then compared with scores obtained using 2 additional methods of standard setting. Furthermore, because the volume of surgical cases is often used clinically as a surrogate for competence, we elected to use volume alone as a possible separating characteristic in 1 of the standard-setting methods. In our experience, the case volume associated with competence in vaginal hysterectomy is approximately 20 cases. This clinical impression is supported by a recent investigation looking at the development of proficient operating times using a newly implemented robotic approach for hysterectomy with lymph node dissection by gynecologic oncologists. Using the Contrasting Groups method, we divided trainees into competent (≥20 vaginal hysterectomies) and noncompetent (<20 vaginal hysterectomies) groups. The distributions of scores from the 3 intraoperative scales were plotted, and the Contrasting Groups method was applied to each scale. The passing score was set at the intersection of the distributions of the 2 groups, assuming false-negative and false-positive errors were of equal weight. The resulting cutoff scores are the ones that best discriminate between the 2 groups. The Contrasting Groups method has been previously applied to assessing laparoscopy skills and is the basis for the widely used cutoff values in the Fundamentals of Laparoscopy Skills curriculum required of all graduating General Surgery residents.
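One way to operationalize the Contrasting Groups step is to pick the score that minimizes the total number of misclassified trainees, which treats false-positive and false-negative errors as equally costly. The sketch below takes that approach as an assumption; it is an empirical stand-in for reading the intersection off plotted distributions, not the procedure used in this study, and the scores are hypothetical.

```python
def contrasting_groups_cutoff(noncompetent, competent):
    """Return (cutoff, errors) minimizing equally weighted misclassifications.

    A trainee passes when score >= cutoff. Ties keep the lowest cutoff.
    """
    candidates = sorted(set(noncompetent) | set(competent))
    best_cutoff, best_errors = None, None
    for c in candidates:
        false_pos = sum(s >= c for s in noncompetent)  # noncompetent who pass
        false_neg = sum(s < c for s in competent)      # competent who fail
        errors = false_pos + false_neg
        if best_errors is None or errors < best_errors:
            best_cutoff, best_errors = c, errors
    return best_cutoff, best_errors

# Hypothetical scale scores (made-up numbers), grouped by case volume.
low_volume_scores = [10, 12, 14, 15, 16, 17, 19, 21]    # <20 cases
high_volume_scores = [16, 18, 20, 21, 22, 24, 25, 27]   # >=20 cases
cg_cutoff, cg_errors = contrasting_groups_cutoff(low_volume_scores,
                                                 high_volume_scores)
print(cg_cutoff, cg_errors)
```

If failing a competent trainee were judged more or less costly than passing a noncompetent one, the two error counts could be weighted differently before summing, which would shift the cutoff accordingly.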


The Hofstee method of standard setting is sometimes referred to as the “relative-absolute compromise method,” because it combines features of both relative and absolute standard setting. For this method, the actual distributions of scores, including the mean, standard deviations, and quartiles, were presented in itemized and graphical form and discussed with the experts. After experts came to an agreement on the characteristics of a borderline trainee, 4 Hofstee questions were presented and discussed, with each judge understanding the implications of each question: What is the lowest acceptable percentage of trainees to fail the procedure (minimum fail rate); What is the highest acceptable percentage of trainees to fail the procedure (maximum fail rate); What is the lowest acceptable percent-correct score for each assessment scale that allows a borderline trainee to pass the procedure (minimum passing score); and What is the highest acceptable percent-correct score for each assessment scale that allows a borderline trainee to pass the procedure (maximum passing score)? The mean percentage across all experts was calculated and each point was plotted as a line on a cumulative frequency distribution of scores for each scale. The midpoint of the minimum and maximum failure rates and pass scores represented the overall competency cutoff score for this method for all experts. All analyses were performed using JMP 7.0 (SAS Institute, Cary, NC).
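The averaging and midpoint arithmetic described above can be sketched as follows. This covers only the step of averaging the 4 Hofstee answers across experts and taking midpoints; it does not plot the cumulative frequency distribution. The function name, dictionary keys, and judgments are hypothetical, and converting the percent-correct midpoint to a raw score by multiplying by the scale maximum is an assumption made for illustration.

```python
from statistics import mean

HOFSTEE_KEYS = ("min_fail", "max_fail", "min_pass", "max_pass")

def hofstee_midpoints(judgments, scale_max):
    """Average each Hofstee answer across judges and take the midpoints.

    judgments: list of dicts, one per judge, with percentage answers for
    the minimum/maximum acceptable fail rate and passing score.
    scale_max: maximum possible raw score on the assessment scale.
    """
    avg = {k: mean(j[k] for j in judgments) for k in HOFSTEE_KEYS}
    pass_pct = (avg["min_pass"] + avg["max_pass"]) / 2
    fail_pct = (avg["min_fail"] + avg["max_fail"]) / 2
    return {
        "pass_pct": pass_pct,                          # midpoint passing score, %
        "fail_rate_pct": fail_pct,                     # midpoint fail rate, %
        "cutoff_score": pass_pct / 100 * scale_max,    # assumed raw-score conversion
    }

# Hypothetical answers from 2 judges (made-up numbers), on a 0-52 scale.
example_judgments = [
    {"min_fail": 0, "max_fail": 20, "min_pass": 40, "max_pass": 60},
    {"min_fail": 10, "max_fail": 30, "min_pass": 50, "max_pass": 70},
]
hofstee_result = hofstee_midpoints(example_judgments, scale_max=52)
print(hofstee_result)
```

In many descriptions of the Hofstee method, the cutoff is instead read from the point where the diagonal of the rectangle formed by these 4 bounds crosses the cumulative score distribution; the midpoint shown here follows only the wording of the paragraph above.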




Results


A total of 212 evaluations of 76 surgeries performed by 27 trainees were analyzed, and all evaluations were assumed to be independent of one another for purposes of this analysis. The Table summarizes cutoff scores on the 3 assessment instruments using all 3 standard-setting methods. Based on the Modified Angoff method, trainees should be considered minimally competent to perform vaginal hysterectomy if total absolute scores reach 32 on the VSSI (95% CI, 27.7–35.5), 18 on the GRS (95% CI, 16.5–20.3), or 51 on the VAS (95% CI, 39.6–62.4). The level of agreement of the new cutoff scores between experts (the interobserver reliability) was high, with an ICC of 0.81.


