16.1 Introduction
The risk of finding an occult leiomyosarcoma (LMS) at surgery for presumed leiomyomas, and the subsequent outcomes for patients who have these tumours morcellated, are subjects of controversy in gynaecology today. The dispute arose in 2013 from a single case in which a physician underwent surgery for fibroids with power morcellation. She was ultimately diagnosed with an LMS, and she and her family began a media campaign and created a public forum to effect a ban on power morcellation. They argued that the prevalence of these occult tumours was much higher than originally believed and that the use of power morcellation worsened outcomes. Originally, a friend of the physician performed a cursory meta-analysis using incomplete data, suggesting that the rate of uterine sarcoma in women having surgery for presumed fibroids was 1 in 323. The US Food and Drug Administration (FDA) used similar but slightly different methodology and found a prevalence of 1 in 352 for uterine sarcoma and 1 in 498 for leiomyosarcoma [1]. Neither analysis has been published in a peer-reviewed journal, and both represent incomplete and biased attempts at establishing a prevalence rate.
The FDA also performed a systematic review of the literature addressing outcomes after morcellation of these occult tumours. They reported worse outcomes for women after ‘power’ morcellation than after intact removal. In the majority of the cases they reported upon, however, the tumours were morcellated by ‘scalpel’ or by ‘hand’. Nevertheless, their numbers are repeatedly cited and are used as the argument against the use of power morcellation.
Two more groups then undertook meta-analyses and systematic reviews to address prevalence and outcomes. Both groups found prevalence rates and post-morcellation outcomes that differed markedly from those reported by the FDA [2, 3].
What follows is a historical perspective and review of some fundamentals of evidence in medicine. The morcellation controversy will then be addressed and the best available evidence will be provided regarding true prevalence and outcomes after morcellation. This will be followed by a closer look at the reasons for disparities in the findings of the studies.
16.2 The History of Evidence-Based Medicine and Gynaecology
Obstetrics and gynaecology has long been viewed as the specialty that relies on the lowest-quality research in formulating treatments for patients. In 1979, Archie Cochrane (of the Cochrane Library) awarded obstetrics and gynaecology the ‘wooden spoon’ for being the least evidence-based medical specialty. The origins of this prize stem from a tradition upheld at Cambridge University until the early twentieth century: an actual ‘wooden spoon’ was presented each year to the student with the lowest score on the mathematics examination. The implication was that the student was better equipped to become a cook than a scholar. Since then, our specialty has not only produced better-quality research, but has begun to embrace evidence-based medicine. However, there are still examples of ‘junk science’ or ‘emotional science’ being utilized to influence the medical care of women.
Prior to 1980, Bendectin, an antiemetic made of vitamin B6 and an antihistamine, was used in early pregnancy for hyperemesis gravidarum. In 1975, a child was born to a woman who had used many medications, including Bendectin, in her pregnancy. The child had multiple anomalies, and the mother placed the blame on Bendectin.
This initially resulted in what can be called a legal blitzkrieg with thousands of plaintiffs claiming fetal anomalies due to this drug. This was rapidly followed by a large number of epidemiologic publications of poor quality. The initial reports supported the initial complaint, and due to the publicity of the cases and the cost of litigation, Bendectin was removed from the market. Gradually, however, better data, not influenced by special interests, began to surface. When the dust eventually settled, in fact, there was no convincing evidence that Bendectin was responsible for any type of birth defect. Bendectin has now resurfaced as a category B drug under the name Diclegis. It has been studied more thoroughly than any other drug for pregnant women, and has substantially decreased the suffering of women with hyperemesis gravidarum.
A second glaring example occurred less than 10 years later: the silicone breast implant controversy. Several women who had received silicone implants blamed them for autoimmune diseases they later developed. The FDA reviewed the available data and found no evidence to support the claim. However, as with Bendectin, a flurry of poor-quality studies and case reports surfaced, as well as a plethora of ‘expert’ clinical opinion, supporting the finding of harm with the use of these implants. In 2000, a report was published by the Institute of Medicine with conclusions drawn from high-quality evidence, showing no association between the implants and autoimmune disease (or any other disease for that matter). However, the court of public opinion had already convened. As researchers who championed evidence-based medicine began speaking out against the claims of harm, they were harassed and threatened with lawsuits. Ultimately, Dow Corning, the maker of the implant, filed for bankruptcy, and silicone-based medical technology and research came to a standstill.
16.3 The Current Controversy
Prior to April 2014, there were case reports of occult LMS and single-institution retrospective chart reviews as well as expert opinion regarding the true prevalence of these disorders, but no published meta-analysis or prospective database existed. In April 2014, the FDA issued a warning regarding the use of power morcellation during surgeries for presumed uterine fibroids. They presented an internal meta-analysis and systematic review of the literature. They claimed much higher prevalence rates of occult LMSs than previously presumed and worse outcomes for women when morcellation was utilized during surgery [1].
Since that time, two groups have performed and published meta-analyses addressing the prevalence of these occult tumours and systematic reviews exploring outcomes after unintentional morcellation [2, 3]. Both analyses used more accurate analytic methodology than that of the FDA, and their results were similar to one another but very different from those claimed by the FDA.
16.4 The Basics of Evidence-Based Medicine
Historically, we have relied less upon science and more upon opinion to formulate treatment plans. Until the early 1990s, clinicians solved clinical problems by considering their own experiences in treating the disease, by considering the underlying physiology and by referring to a textbook or local expert. This meant relying on our veteran physicians; their expert opinion ruled all. An older physician always had the last word when novice physicians were attempting to treat patients. However, attempts to validate this model failed repeatedly. Beginning in the 1970s, there was awareness of the value of answers generated via clinical research rather than mere expert opinion. This was further bolstered by a more thorough understanding of the different types of clinical trials and the relative value of each in determining optimal clinical pathways. It became widely recognized that randomized trials are the gold standard for evaluating clinical issues. Cohort studies are of secondary importance, with prospective data collection deemed more valuable than retrospective review. The hierarchical system of study value gradually became a central tenet of evidence-based medicine. Its value was clear: physiologic rationale alone repeatedly failed to predict the results of randomized trials. The shift drew criticism from those in positions of authority; evidence-based medicine was more readily accepted by younger, less experienced physicians, to whom it lent some credibility in their newfound field. Although it is hard for practitioners to admit, we are the largest source of bias in forming treatment plans for our patients [4].
As stated above, retrospective cohort studies are relatively low in the hierarchy of trusted outcomes. They are generally used as a step in the creation of scientifically valid patient treatment regimens. Retrospective cohort studies have numerous inherent biases and unmeasured confounders which serve to reduce their reliability.
Bias in the collection of prevalence data can affect the results in either direction. For example, the chart of a patient with an occult LMS may be sent directly to the hospital risk management department, rendering it inaccessible to researchers. This could falsely lower prevalence rates. Conversely, retrospective studies can be initiated because of an index case. When the index case is included in the prevalence calculation, the resulting bias can overestimate the prevalence. By restricting or extending the range of years included in the chart review, the denominator (the overall population count) can easily be manipulated. This was the case in several studies that have been included in recent meta-analyses. The most significant came from a referral centre that identified an occult LMS during a surgery for presumed fibroids. The index case was included in the study, but the retrospective chart review went back only 2 years [5]. The prevalence rate in this study was out of line with those of studies that covered longer periods and larger populations.
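The effect of the review window on an index-case study can be illustrated with a few lines of arithmetic. The sketch below is purely hypothetical: the surgical volume (400 fibroid surgeries per year) and the assumption that the index case is the only occult LMS in every window are invented for illustration and are not drawn from any study cited here.

```python
def prevalence_per_1000(cases: int, surgeries: int) -> float:
    """Crude prevalence expressed per 1,000 surgeries."""
    return 1000 * cases / surgeries


# Hypothetical referral centre (assumed figures, not from the cited studies):
# 400 fibroid surgeries per year, and the index case is the only occult LMS
# found no matter how far back the chart review goes.
SURGERIES_PER_YEAR = 400
INDEX_CASES = 1

for years_reviewed in (2, 5, 10):
    surgeries = SURGERIES_PER_YEAR * years_reviewed
    rate = prevalence_per_1000(INDEX_CASES, surgeries)
    print(f"{years_reviewed:>2} years reviewed: "
          f"{rate:.2f} per 1,000 (1 in {surgeries // INDEX_CASES})")
```

Under these assumptions the same single case yields a fivefold range of apparent prevalence, depending only on how many years of charts are counted in the denominator.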
The origin of retrospective datasets may also confound results. Referral centres are different from the average medical centre in that patients are often much sicker or suffer from unusual maladies. Thus, clinical studies performed in such centres often draw from patient populations radically different from the general populations seen in community hospitals. When dealing with prevalence rates, relatively rare diagnoses are overrepresented, as is disease severity.
The best available evidence comes from studies in which the data are prospectively collected. It includes randomized trials and/or prospective cohort data. In comparing retrospective versus prospectively collected data, retrospective data collection is measurably less accurate than prospective collection. In some instances, more than 50% of the data is unavailable if collected in a retrospective fashion [6].
In prospective investigations, the data collection is begun at a predefined time point, consecutive cases are included and the data are uniformly collected on all patients until the study is completed. This lessens the confounding factors such as selection bias, patient exclusion and referral bias.
16.5 The Utility of the Meta-Analysis
Rigorously conducted systematic reviews and meta-analyses are widely recognized as among the highest standards of evidence for informed medical decision making [7]. This technique involves combining results of multiple studies in an attempt to discern a single best pathway or rate. With rare events, this may be the only reliable and accessible approach to formulate sound medical treatments.
Some have argued that prevalence rates for occult LMSs can be calculated from the current literature using the crude method of summing the number of reported cases of disease and dividing this by the total number of surgeries performed. However, the combination of data from multiple populations is not the same as data from a single large population undergoing sampling. The heterogeneity among studies for inclusion and exclusion, confounders, and even definitions of risk factors and outcomes leads to tremendous bias in calculating a crude prevalence [8, 9]. Crude calculations are only appropriate if (1) each study was an independent and identically distributed measure of the overall population and (2) the variance of each study’s estimate is known. These conditions are rarely if ever met.
Heterogeneity among studies in a meta-analysis also dictates the type of analysis performed. When studies investigate the same population with the same research questions and structure, a fixed effects model can be used. The clear majority of studies pooled in meta-analyses were not designed to examine similar populations with the same questions, so some degree of statistical heterogeneity is likely. A random effects meta-analysis corrects for design differences.
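A small numerical sketch may make the problem with crude pooling concrete. The three studies below are hypothetical (their names and counts are invented for illustration); the point is only that one study drawn from a different population can dominate a crude pooled rate.

```python
# Hypothetical (cases, surgeries) pairs; not taken from any real study.
studies = {
    "community cohort A": (1, 6000),
    "community cohort B": (0, 3000),
    "referral centre C": (4, 1000),
}


def per_1000(cases: int, surgeries: int) -> float:
    """Prevalence expressed per 1,000 surgeries."""
    return 1000 * cases / surgeries


def crude_rate(selected: dict) -> float:
    """Crude pooling: sum all cases, divide by the sum of all surgeries."""
    total_cases = sum(cases for cases, _ in selected.values())
    total_surgeries = sum(n for _, n in selected.values())
    return per_1000(total_cases, total_surgeries)


for name, (cases, n) in studies.items():
    print(f"{name}: {per_1000(cases, n):.2f} per 1,000")

crude_all = crude_rate(studies)
crude_without_c = crude_rate(
    {name: counts for name, counts in studies.items() if name != "referral centre C"}
)
print(f"crude, all studies: {crude_all:.2f} per 1,000")
print(f"crude, without referral centre C: {crude_without_c:.2f} per 1,000")
```

In this invented example the referral centre contributes one tenth of the surgeries but four of the five cases, so the crude figure says little about either population.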
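The difference between fixed and random effects pooling can be sketched in code. The example below uses one widely taught random effects estimator (DerSimonian and Laird) applied to raw proportions; this choice is an assumption made for illustration, since the text does not state which estimator any of the analyses used, and it is not the Bayesian model discussed in this chapter. The study counts are hypothetical, and the sketch deliberately refuses studies with zero cases, which need special handling that is not attempted here.

```python
from typing import List, Tuple


def pool_proportions(studies: List[Tuple[int, int]]) -> Tuple[float, float, float]:
    """Return (fixed_effect, random_effects, tau2) for (cases, n) pairs.

    Illustrative only. Every study must have 0 < cases < n, because a zero
    count gives a zero variance estimate that this sketch cannot weight.
    """
    if len(studies) < 2:
        raise ValueError("need at least two studies to estimate heterogeneity")
    rates, variances = [], []
    for cases, n in studies:
        if not 0 < cases < n:
            raise ValueError("each study needs 0 < cases < n in this sketch")
        p = cases / n
        rates.append(p)
        variances.append(p * (1 - p) / n)  # binomial variance of a proportion

    # Fixed effect: inverse-variance weights, assuming one shared true rate.
    w = [1 / v for v in variances]
    fixed = sum(wi * yi for wi, yi in zip(w, rates)) / sum(w)

    # Cochran's Q: weighted scatter of the studies around the fixed estimate.
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, rates))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(studies) - 1)) / c)  # between-study variance

    # Random effects: add tau2 to each study's variance before weighting.
    w_star = [1 / (v + tau2) for v in variances]
    random_effects = sum(wi * yi for wi, yi in zip(w_star, rates)) / sum(w_star)
    return fixed, random_effects, tau2


# Hypothetical heterogeneous studies: (cases, surgeries).
example = [(2, 4000), (1, 2500), (5, 1000), (3, 800)]
fixed, random_effects, tau2 = pool_proportions(example)
print(f"fixed effect:   {1000 * fixed:.2f} per 1,000")
print(f"random effects: {1000 * random_effects:.2f} per 1,000")
print(f"between-study variance (tau^2): {tau2:.3g}")
```

When the between-study variance is estimated as zero, the two estimates coincide; when it is positive, the random effects weights are more even across studies. Weighting raw proportions by inverse variance is known to behave poorly for rare events, which is one reason the sketch should not be read as a recommended method.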
There are also several random effects models from which to choose. Many choose the classical model, but this model does not correct for variation in study size. The FDA in their meta-analysis chose to use this model. Bayesian random effects meta-analysis has been used extensively for clinical decision making and policy analysis. This type of analysis is much more robust and automatically corrects for variation in study size. It has proven to be the method of choice for combining multiple studies to determine rates of rarely occurring diseases.
16.5.1 Prevalence: The Data
The meta-analysis presented by the FDA was executed by the Senior Advisor in the Office of Planning and Policy and has yet to be formally published. The FDA initially identified 41 studies, but only 9 met inclusion/exclusion criteria. Eight of the studies were retrospective; only one included prospectively collected data. Nineteen occult leiomyosarcomas were identified from their dataset comprising 9,160 uterine fibroid surgeries, and the FDA utilized the classical meta-analytic model. The estimated prevalence rate of occult leiomyosarcoma was 2.01/1,000 or 1 occult leiomyosarcoma for every 498 surgeries for presumed fibroids [1].
The next meta-analysis was completed by our group, utilizing comparable study dates, from 1980 to early 2014. We included data from studies where pathology was confirmed for every study participant. Our database, however, was stratified based upon the quality of evidence. Sensitivity analyses confirming the validity and robustness of our calculations were also performed. We initially identified 4,864 candidate studies, excluding 3,844 after abstract review. The remaining 1,020 manuscripts were reviewed in their entirety. One hundred thirty-three publications fit the inclusion/exclusion criteria and comprised our evidence base. We identified 32 total occult leiomyosarcomas in 30,193 surgeries.
We initially analysed the information using only the prospectively collected data. These data were extracted from 64 published prospective analyses: 38 as prospective cohorts and 26 as part of randomized clinical trials. The women were undergoing a combination of hysterectomy and myomectomy. These analyses encompassed 5,223 women, with 3 leiomyosarcomas identified post-surgically. The prevalence of occult leiomyosarcoma using only data derived from prospective studies was 0.12 per 1,000 surgeries. Stated alternatively, surgeons can expect to find approximately one occult leiomyosarcoma per 8,300 surgeries for presumed benign uterine fibroids, with a 97.5% probability that the true rate is less than 0.75 per 1,000 surgeries.
Seventy published analyses with retrospective cohorts also qualified for our analysis, encompassing a total of 24,970 patients. There was a mix of patients undergoing myomectomy and hysterectomy. Of these, 29 were noted to have leiomyosarcomas post-surgically. The prevalence of occult leiomyosarcoma using all of the data derived from both prospective and retrospective databases was 0.51/1,000 surgeries. When all the data were included, surgeons could expect to find approximately one occult leiomyosarcoma per 2,000 surgeries performed for fibroids, with a 97.5% probability that the true rate is less than 0.98 per 1,000 surgeries [2]. These data were reported to the FDA during the medical device special meeting in July of 2014, and subsequently published in peer-reviewed journals. If only hysterectomies and no myomectomies are included in the analysis, the rate remains similar at 0.55 per 1,000 surgeries (CI 0.06–1.3) or 1 in 1,818 procedures (D. Olive and D. Vanness, personal communication).
The third meta-analysis estimating the prevalence of occult leiomyosarcomas comes from the Agency for Healthcare Research and Quality (AHRQ) of the Department of Health and Human Services and was published on 14 December 2017. They confirmed and validated our search analysis, then expanded the search to include data published after our cut-off date in 2014. They identified 539 additional candidate publications and found another 24 retrospective studies, 2 prospective studies and a single randomized controlled trial that fit the inclusion and exclusion criteria. They included studies of women undergoing myomectomy and/or hysterectomy for presumed benign fibroid tumours where pathology results were available for all women in the publication. A further 71,153 retrospective cases were analysed along with the 24,970 cases in our original database, and 34,842 prospective cases were added to our original 5,230. Using the Bayesian analytical model, the predicted prevalence rate was 2.1 per 10,000 surgeries (1 occult LMS per 4,761 surgeries) using only the prospective data, and 8.5 per 10,000 surgeries (1 occult tumour per 1,176 surgeries) using the retrospective data. They initially combined the prospective and retrospective datasets, but chose to separate them because statistical heterogeneity made the two estimates markedly different. Sub-analyses were performed that excluded hysteroscopy and restricted the analysis to prospective data in which they had high confidence of histopathologic evaluation for all subjects. With all of these exclusions applied, the prevalence was 0.5 occult LMSs per 10,000 surgeries, or 1 occult tumour in 20,000 surgeries [3].
Taken together, the more comprehensive meta-analyses reveal an estimated prevalence of LMSs in surgeries for presumed leiomyomas that is substantially less than that previously estimated by the FDA.
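Because the three analyses quote prevalence on different scales (per 1,000, per 10,000 and ‘1 in N’), a short conversion helper may make the figures easier to compare. The sketch below uses only rates quoted in this section; the function name and the labels are illustrative.

```python
def one_in_n(rate: float, per: int) -> float:
    """Convert 'rate per `per` surgeries' into the N of '1 in N'."""
    return per / rate


# (label, rate, denominator) as quoted in the text of this section.
quoted = [
    ("FDA, occult LMS [1]", 2.01, 1_000),
    ("Prospective studies only [2]", 0.12, 1_000),
    ("Prospective plus retrospective [2]", 0.51, 1_000),
    ("Hysterectomy only", 0.55, 1_000),
    ("AHRQ, prospective [3]", 2.1, 10_000),
    ("AHRQ, retrospective [3]", 8.5, 10_000),
    ("AHRQ, restricted sub-analysis [3]", 0.5, 10_000),
]

for label, rate, per in quoted:
    print(f"{label}: {rate} per {per:,} is about 1 in {one_in_n(rate, per):,.0f}")
```

The converted values agree with the figures given in the text to within rounding; the text rounds some of them (for example, to 8,300 and 2,000).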
16.6 Exploring the Disparities in the Data
The variation in outcomes between the initial FDA analysis and the ensuing meta-analyses can be ascribed to both the statistical methodology employed and the base of evidence identified.
Classical meta-analytic techniques were utilized by the FDA, while Bayesian techniques were utilized by our group and the AHRQ. This accounted for 8% of the variation between studies.
The remaining 92% of the difference is attributed to the datasets themselves. The search terms used in the two comprehensive meta-analyses included any studies in which surgery was performed for presumed benign fibroids with histopathology explicitly provided for every subject. This strategy yielded 134 studies for our dataset and 160 for the AHRQ dataset. In contrast, to obtain their evidence, the FDA performed a targeted search using the search terms ‘uterine cancer’ AND ‘hysterectomy or myomectomy’ AND ‘incidental cancer or uterine prolapse, pelvic pain, uterine bleeding, and uterine fibroids’. By using the conjunction ‘and’ during their search, the term ‘uterine cancer’ was necessary in the title, abstract, or key words. Those studies not including the search term ‘uterine cancer’ in title, abstract or listed keywords would be overlooked. Indeed, this was the case: eight of the nine studies found in the FDA database contained at least one LMS; the ninth study used the term malignancy in the abstract.
We note that in the FDA’s review of the nine studies referenced, eight were retrospective studies and one was a report from prospectively collected data, while all nine in the unpublished dataset were retrospective. Such a preponderance of retrospective reports raises concerns of significant ascertainment bias in the resulting prevalence rate. The subsequent two analyses contained a sufficient number of both retrospective and prospective studies to allow analyses restricted to each, producing what we believe to be the most appropriate evidence base from which to calculate prevalence.
Another difference lies in the fact that only studies with more than 100 subjects were included in the evidence base compiled by the FDA, while the unpublished work required 50 subjects for inclusion; their reasoning was that this would reduce bias from smaller studies. Recognizing the arbitrary nature of any predefined size threshold, our preferred approach included eligible studies of all sizes, while using a statistical model that allowed weighting of each study according to its size and degree of statistical heterogeneity. This was possible because both published groups collaborated with professors of biostatistics who specialize in the analysis of complex clinical research and meta-analyses. The FDA did not have a statistician collaborate in their endeavours.
Third, the FDA and the unpublished analysis included only studies that exclusively examined procedures performed for presumed leiomyomas; if multiple indications were listed by the author of the study, it was excluded from their evidence base and was unavailable for analysis. However, many publications containing multiple indications for surgery contained unequivocal information about those women with a primary surgical indication of fibroids and the data were easily extractable. They were included in the published evidence bases if the patients undergoing hysterectomy or myomectomy for fibroids were clearly identified, if histopathology was performed on all cases, and if results were explicitly provided.
Fourth, the FDA and the unpublished analysis excluded all non-English articles from consideration, a decision that makes reviews much easier to perform but is highly elitist and incomplete. We felt that the inclusion of non-English publications made for a more comprehensive review of the subject. The AHRQ and our group therefore included studies regardless of the language of publication, as long as they appeared in peer-reviewed journals. This also leads to a more real-world application of the data.
The FDA included one non-peer-reviewed abstract and one letter to the editor in their dataset. We excluded these and other similar data, restricting our analysis to peer-reviewed publications containing five or more applicable subjects. Parenthetically, the letter to the editor included in the FDA evidence base was written in English. The original data were reported in their entirety in a French language publication. We excluded the letter to the editor, but found the original, peer-reviewed publication and included it in our evidence base. There were three LMSs presented in this study.
In examining our own data, we found that seven of the leiomyosarcomas defined were inconsistent with the World Health Organization (WHO) criteria utilized to diagnose a tumour as such. The criteria used for classification are the so-called Stanford criteria [10] published in 1994 and later adopted by the WHO [11]. These criteria indicate that a uterine smooth muscle tumour with coagulative tumour cell necrosis (not hyaline necrosis) is an LMS. If no such necrosis exists, then the diagnosis is made only if the mitotic index is ≥10 mitoses per 10 high-power fields and there is diffuse, moderate to severe cytological atypia. As seen in Table 16.1, seven of these did not fit criteria [12–14].
Author | Leiomyoma subtype | Age (years) | Pathology | Recurrence |
---|---|---|---|---|
Leibsohn [12] | Atypical | 36 | 6 mitoses/10 HPF, ‘poorly demarcated’, cellular atypia | NED 6 months |
Leibsohn [12] | Atypical | 48 | 7 mitoses/10 HPF, cellular atypia | NED 16 months |
Parker [13] | Atypical | 30 | Irregular infiltrative borders, mild nuclear atypia, 5–8 mitoses/10 HPF | NED ‘Years’ |
Seki [14] | Mitotically active | 33 | 6 mitoses/10 HPF, NO cellular atypia | NED 11 months |
Seki [14] | Mitotically active | 34 | 5 mitoses/10 HPF, NO cellular atypia | NED 57 months |
Seki [14] | Mitotically active | 43 | 8 mitoses/10 HPF, NO cellular atypia | NED 61 months |
Seki [14] | Mitotically active | 43 | 9 mitoses/10 HPF, NO cellular atypia | NED 92 months |