Discussion: ‘Treatment of symptomatic uterine fibroids’ by van der Kooij et al




In the roundtable that follows, clinicians discuss a study published in this issue of the Journal in light of its methodology, relevance to practice, and implications for future research. Article discussed:


van der Kooij SM, Hehenkamp WJK, Volkers NA, et al. Uterine artery embolization vs hysterectomy in the treatment of symptomatic uterine fibroids: 5-years outcome from the randomized EMMY trial. Am J Obstet Gynecol 2010;203:105.e1-13.


Discussion Questions





  • How does the study contribute to knowledge in the field?



  • What was the research question?



  • What type of study was this?



  • What statistical methods were used?



  • What data are in the tables?



  • What did the authors conclude?



  • How does this study apply to your clinical practice?





Introduction


Classically, only surgical treatments were available for patients with symptomatic uterine fibroids. While hysterectomy is the definitive treatment, it is a major surgical procedure. Uterine artery embolization (UAE), first described 15 years ago, has garnered increased attention as a therapy. Previous publications from the EMbolization versus hysterectoMY (EMMY) trial indicate that compared with hysterectomy, UAE was associated with an equal rate of major complications but a shorter hospital stay and a faster return to usual daily activities. The latest study of EMMY participants, by van der Kooij and colleagues, examined how patients who underwent UAE were faring at the 5-year postprocedure mark.




See related article, page 105




For a summary and analysis of this discussion, see page 186



Kristen A. Matteson, MD, MPH and George A. Macones, MD, MSCE, Associate Editor




Background


Matteson: Can you talk a little about the background?


Holman: UAE has garnered increased attention over the past 10 years as a conservative treatment for symptomatic uterine fibroids. Several studies have reported similar findings when comparing UAE with surgical options, such as hysterectomy or myomectomy. The EMMY trial is a randomized controlled trial (RCT) that compared UAE with hysterectomy. Periprocedural and 2-year follow-up results showed a low rate of major complications in the UAE arm and similar improvement in health-related quality of life (QOL) and menorrhagia between the UAE and hysterectomy arms. This led the authors to conclude that UAE is a reasonable alternative to hysterectomy. The study under discussion today is a 5-year follow-up of EMMY participants.


Matteson: How does the study contribute to knowledge in the field?


Holman: Although there have been several RCTs comparing hysterectomy and UAE, they have only followed patients for up to 2 years postprocedure. This is one of the first trials to evaluate patients 5 years after their procedure. This information is important when we counsel patients regarding treatment of fibroids. For example, if most patients were found to require hysterectomy within 5 years after UAE, it might not make sense to recommend UAE for management of symptomatic fibroids.




Study Design


Matteson: What was the research question?


Cronin: The researchers were exploring whether the 2-year results of the EMMY trial change when the cohort is followed out to 5 years. Is UAE still a good alternative to hysterectomy with 5 years of follow-up? Are a large percentage of patients still able to avoid hysterectomy after 5 years?


Matteson: Did they clearly state this?


Cronin: This is stated but not clearly. The last sentence of the introduction section seems to suggest that the authors are trying to look at whether or not clinical and QOL outcomes are different between hysterectomy and UAE 5 years postprocedure. They provide a vague sentence about whether or not UAE is a viable alternative to hysterectomy. The question might have been stated more definitively.


Matteson: What was their hypothesis? How did you decide what the hypothesis was?


Cronin: Again, the hypothesis was not stated explicitly, but it could be deduced from what the authors discussed in the introduction and from the sample-size calculation in the methods section. I assumed their hypothesis was that fewer than 25% of patients randomized to UAE would receive a hysterectomy in the 5 years postprocedure. The authors do not suggest a clear hypothesis for how the treatments compare with one another in terms of the other clinical and QOL outcomes.


Matteson: Where was this study conducted?


Raker: The study was conducted in the Netherlands. Patients in this multicenter study were recruited from gynecological outpatient clinics at 28 different hospitals.


Matteson: Who was the study population?


Raker: Enrollees were women who suffered from symptomatic uterine fibroids and who were eligible for hysterectomy.


Matteson: How were participants recruited? What was the refusal rate for participation in this study?


Phipps: Women who visited the clinics with symptomatic fibroids and who were eligible for hysterectomy between March 2002 and February 2004 were evaluated for possible participation in the trial. Figure 1 shows that there were 349 eligible patients with a 51% participation rate. Women were ineligible or declined to participate because they had a hysterectomy or UAE prior to randomization or were waiting to have a procedure and, presumably, were unwilling to be randomized. Other reasons included seeking alternative medicine, a difficult home situation, or not wanting to participate.


Matteson: What type of study was this?


Raker: This study is actually a secondary analysis of an RCT comparing UAE to hysterectomy. Specifically, it was a noninferiority trial because the objective was to show that UAE was not worse than hysterectomy with respect to “alleviating menorrhagia.” With an equivalence study, the goal is to show that a new intervention is not worse or better than the standard by more than a prespecified margin. In this noninferiority trial, the emphasis is only upon UAE not being worse than hysterectomy. In the previous analysis of this trial, the authors stated that this design was chosen because UAE could not prove superior to hysterectomy. It is a bit unconventional in that the main outcome—avoidance of hysterectomy 5 years postprocedure—could only be measured in 1 treatment arm.


Matteson: Are surgical RCTs typically easy or difficult to recruit for?


Cronin: I think surgical RCTs are probably quite difficult to set up and recruit patients for. Generally, patients have certain expectations and preconceived notions for what type of intervention they desire. It is human nature for women to want to retain personal control over large decisions; it is difficult to recruit patients for studies where a so-called flip of a coin will decide whether they will have a major surgical procedure or a less common radiologic procedure.


Matteson: What benefits are there to doing an RCT rather than a cohort study?


Holman: Randomization of study subjects to 2 or more groups is an effective way to control for possible confounders. In this study, for instance, factors such as prior treatment, number and volume of fibroids, and duration of symptoms could all affect the results. RCT design allows potential confounders that are both known and unknown to be randomly and evenly distributed into each group as long as the sample size is large enough.


Matteson: How did they randomize their subjects?


Raker: Patients were randomized equally to UAE and hysterectomy. Randomization was stratified by study center to make sure that center-specific differences would be evenly distributed between groups. Instead of using the usual stratified block randomization to allocate patients, the investigators used a computer-based minimization scheme. In this case, after randomly assigning the first patient to a group, each subsequent assignment is selected so that the imbalance in baseline characteristics between groups is minimized. Technically, this assignment is not random, but it has been shown to produce very similar groups in terms of balancing baseline characteristics throughout recruitment.
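
As a rough illustration of the minimization idea Dr Raker describes, the following Python sketch assigns each new patient to whichever arm leaves the groups most balanced, and chooses at random only when the arms are tied (as they are for the first patient). It is not the EMMY investigators' software. The class name, the single balancing factor "center," and the tie-breaking rule are assumptions made for the example.

```python
import random
from collections import defaultdict

ARMS = ("UAE", "hysterectomy")


class Minimizer:
    """Illustrative minimization scheme (hypothetical, not the trial's code)."""

    def __init__(self, factors, seed=None):
        self.factors = factors          # names of baseline factors to balance
        self.rng = random.Random(seed)  # used only to break ties
        # counts[arm][(factor, level)] = patients already assigned with that level
        self.counts = {arm: defaultdict(int) for arm in ARMS}

    def _imbalance_if(self, candidate, patient):
        """Total imbalance across factors if the patient joined `candidate`."""
        total = 0
        for factor in self.factors:
            key = (factor, patient[factor])
            per_arm = [
                self.counts[arm][key] + (1 if arm == candidate else 0)
                for arm in ARMS
            ]
            total += max(per_arm) - min(per_arm)
        return total

    def assign(self, patient):
        """Pick the arm that minimizes imbalance; choose randomly on a tie."""
        scores = {arm: self._imbalance_if(arm, patient) for arm in ARMS}
        best = min(scores.values())
        tied = [arm for arm in ARMS if scores[arm] == best]
        arm = self.rng.choice(tied)
        for factor in self.factors:
            self.counts[arm][(factor, patient[factor])] += 1
        return arm


# Hypothetical usage: balance on study center only.
minimizer = Minimizer(factors=("center",), seed=2010)
for center in ["A", "A", "B", "A", "B", "B"]:
    print(center, minimizer.assign({"center": center}))
```

Because each assignment depends on the patients already enrolled, no allocation list can be printed in advance. Many implementations also add a random element to every assignment rather than only to ties.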


Matteson: What do you think about their method of randomization?


Raker: Their method of group assignment was appropriate for their objectives. One point to mention is that with a minimization scheme, randomization lists are not generated in advance, and this can make the process more logistically challenging.


Matteson: What were their inclusion and exclusion criteria? To what population will the findings apply?


Phipps: Women were eligible if they were premenopausal, were diagnosed with uterine fibroids, had menorrhagia, had no treatment options other than hysterectomy, and had no desire for future pregnancy. The findings will apply to this same limited population of women.


Matteson: What was their intervention?


Holman: Their intervention was hysterectomy or UAE.




Statistical Analyses


Matteson: What were the study outcomes, and how were they measured?


Cronin: The main study outcome, as Dr Raker mentioned earlier, was alleviation of menorrhagia, which they defined as avoidance of hysterectomy 5 years postprocedure. This is a little unusual because this outcome could only be measured in the UAE treatment arm and could not be compared between women undergoing UAE or hysterectomy. Other study outcomes were menorrhagia, menopause and menopausal symptoms, QOL, urinary and defecation function, and satisfaction with the received treatment. Patients received questionnaires designed to measure outcomes at baseline and at regular intervals until 2 years after the initial intervention—all of these were identical. The questionnaire distributed at the 5-year mark was condensed to optimize response rates. It also examined additional interventions between 2 and 5 years, menstrual characteristics, and health-related QOL. Many validated questionnaires were used to measure outcomes. Health-related QOL was assessed using the Medical Outcome Study Short Form 36 (SF-36), which provides summary scores for physical and mental components; this was validated for the Dutch population. Menopausal symptoms were assessed by Kupperman score and by answers to the question of whether patients felt as if they were in or beyond menopause. The Urogenital Distress Inventory (UDI) assessed urinary symptoms, while the Defecation Distress Inventory (DDI) was used to measure defecation complaints. Participants were also asked to rate overall quality of urinary and stool function and were asked about usage of urinary incontinence pads or laxatives. Lastly, participants were asked to rate their satisfaction with their assigned treatment, whether they would recommend it to a friend, and whether they would choose the assigned treatment again.


Matteson: Did they measure outcomes both at enrollment (before treatment) and after treatment?


Cronin: Questionnaires were distributed at baseline and at fixed intervals until 2 years after treatment. All patients received a final questionnaire in the fall of 2008 (median follow-up, 5 years). This is actually important because many studies look at abnormal uterine bleeding and only measure outcomes such as health-related QOL after the treatment rather than at baseline and after the intervention. So you don’t know if the groups were different in terms of health-related QOL at baseline and what the relative changes in their health-related QOL scores were.


Matteson: Was anyone blinded in this study? How could lack of blinding affect some of our outcomes?


Holman: Neither the investigators nor the study subjects were blinded in this study. I think it would be pretty difficult in this case. Obviously, you are going to have an incision somewhere for hysterectomy and a puncture site in the groin for UAE, which would make blinding of the participants almost impossible. It would be possible, however, to blind the research assistants who were assessing the outcomes and obtaining the baseline and follow-up data from participants. The study doesn’t mention whether or not the research assistants were blinded.


Lack of blinding has the potential to affect outcomes in a major way, especially the QOL outcomes. Health-related QOL is a self-reported measure. If patients have a preconceived notion of how they are supposed to feel after one procedure vs another, this could factor into how they answer their questions—their expectations could affect their answers and their perceived QOL.


Matteson: This study used many different validated tools to evaluate their health-related QOL and urinary and defecatory dysfunction outcomes. Why is the use of validated tools important?


Phipps: Using validated tools allows investigators and clinicians to compare results across studies. These tools add validity to what they are measuring; investigators can feel confident that the questions they are asking actually measure and test for what they intend to evaluate. Validated instruments oftentimes are tested to make sure they are “responsive,” meaning that the answers and scores change in an expected fashion in response to an intervention. This is especially important in this study, as change in baseline scores was a main measure.


Matteson: How was sample size determined for this study?


Raker: The sample size was based on a noninferiority design and the expected number of clinical failures after 2 years of follow-up. The investigators expected failures in 12.5% of the UAE group, and they decided that a 25% failure rate was the maximum that could be tolerated. Sixty patients per group were needed to reject the null hypothesis of inferiority greater than 25% with 90% power and a 1-sided 5% significance level.


While this calculation is fine for a noninferiority trial, it is not necessarily applicable to the outcome at 5 years or the QOL measures. One could perform a post-hoc power calculation with the fixed sample size, but it would be more informative to present 95% confidence intervals for all estimated measures of association. This would allow the reader to assess whether clinically relevant differences were compatible or incompatible with the data.
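
To illustrate the point about confidence intervals, the sketch below computes a Wilson 95% confidence interval for a single proportion, such as the share of one arm that goes on to a later hysterectomy, so that it can be read against a prespecified margin like the 25% mentioned above. The function name and the counts in the usage lines are hypothetical and are not taken from the trial. The Wilson method is one common choice, not necessarily the one the authors would use.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(events, n, confidence=0.95):
    """Wilson score interval for a binomial proportion events / n."""
    if n <= 0 or not 0 <= events <= n:
        raise ValueError("need n > 0 and 0 <= events <= n")
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = events / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical counts for illustration only: 20 failures among 80 patients.
low, high = wilson_interval(20, 80)
margin = 0.25
print(f"95% CI: {low:.3f} to {high:.3f}")
print("upper limit below margin:", high < margin)
```

An interval that lies entirely below the margin would be compatible with noninferiority at that threshold, whereas one that extends above it leaves a clinically relevant excess of failures possible.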


Matteson: If the investigators truly wanted to set this up as an equivalence trial, what would that mean for the necessary sample size?


Raker: It is actually set up in a very similar way. One difference is that an equivalence trial requires a 2-sided significance test. Also, to show 2 treatments are exactly the same, a very large sample size is necessary because the minimal detectable difference (delta) established for the study is nearly zero, and the smaller the minimal detectable difference, the larger the sample size.
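
The relationship between the margin and the sample size can be sketched with a textbook normal approximation for an equivalence comparison of 2 proportions, assuming the true difference is zero and both arms share the same event rate. The function name, the default alpha and power, and the example rate are assumptions made for illustration. Conventions for how alpha is applied in equivalence testing differ between texts, so the absolute numbers matter less than how they scale with delta.

```python
from math import ceil
from statistics import NormalDist


def equivalence_n_per_group(p, delta, alpha=0.05, power=0.90):
    """Approximate patients per group to show two proportions differ by < delta.

    Assumes both arms truly have event rate p (true difference of zero) and
    applies alpha to each of two one-sided tests; a textbook approximation.
    """
    if not 0 < p < 1 or delta <= 0:
        raise ValueError("need 0 < p < 1 and delta > 0")
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(1 - (1 - power) / 2)
    return 2 * p * (1 - p) * (z_alpha + z_beta) ** 2 / delta**2


# Hypothetical event rate of 12.5% in both arms, with shrinking margins.
for delta in (0.125, 0.10, 0.05, 0.025):
    n = equivalence_n_per_group(0.125, delta)
    print(f"delta = {delta:.3f}: about {ceil(n)} per group")
```

Because delta enters the formula squared in the denominator, halving the margin roughly quadruples the required sample size, which is why a margin near zero is impractical.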


Matteson: What statistical methods were used to compare the groups?


Raker: The primary outcome, number of hysterectomies in the UAE group over 5 years, was calculated as the cumulative incidence proportion or risk at 5 years. In addition, a Kaplan-Meier analysis was performed to show the distribution of failures over the 5-year follow-up period, and this accounted for subjects lost to follow-up or censored for other reasons. Finally, baseline predictors of failure in the UAE group were examined by multiple logistic regression.
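
For readers less familiar with Kaplan-Meier analysis, the following minimal Python sketch shows how censored subjects are handled: they count as at risk until they leave follow-up but never count as failures. The function name and the follow-up data are hypothetical and are not the trial's data.

```python
def kaplan_meier(times, events):
    """Return [(time, estimated failure-free proportion)] at each failure time.

    times  : follow-up time for each subject (e.g. years since the procedure)
    events : True if the subject failed at that time, False if censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        failures = 0
        censored = 0
        # Collect everyone whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            if data[i][1]:
                failures += 1
            else:
                censored += 1
            i += 1
        if failures:
            # Subjects censored at t are still counted as at risk at t.
            survival *= 1 - failures / at_risk
            curve.append((t, survival))
        at_risk -= failures + censored
    return curve


# Hypothetical follow-up: six subjects, three failures, three censored.
times = [1, 2, 2, 3, 4, 5]
events = [True, False, True, True, False, False]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"year {t}: failure-free {s:.3f}, cumulative failures {1 - s:.3f}")
```

One minus the final failure-free estimate gives a cumulative failure estimate that accounts for censoring, whereas a simple proportion of failures among all enrolled subjects does not.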


Matteson: What statistical measures were used for secondary outcomes?


Raker: The QOL measures were examined as a change in mean score from baseline, which was gauged with repeated measures analysis, most likely ANOVA or linear regression. These are flexible methods for analyzing the average change in groups over time and for testing whether the rate of change in scores differed between groups. In addition, multiple linear regression was used to see if baseline characteristics predicted change scores at 5 years of follow-up.


Matteson: Do you think the methods were adequate?


Raker: Overall, the analysis methods were appropriate for each objective and endpoint.


Matteson: Were participants analyzed in the groups to which they were allocated?


Cronin: Yes, they were analyzed by intention to treat. In the hysterectomy arm, 14 of 89 women did not undergo hysterectomy. In the UAE arm, 7 of 81 did not undergo UAE; 6 of these 7 had a hysterectomy.


Matteson: Did the study account for participants at each stage of the study?


Holman: Figure 1 in the article demonstrates subject flow throughout the study and indicates when and why patients were not included at the various follow-up points. All participants were accounted for, and similar numbers of women were lost to follow-up in the 2 groups throughout the study.
