Systematic reviews of observational studies: evaluating evidence quality







Related article, page 21.



Systematic reviews, metaanalyses, and other forms of evidence synthesis help summarize existing data relevant to a clinical or policy question and, through judgments about the quality of that data, can identify those areas in which further research may resolve any ongoing uncertainty.


Metaanalyses frequently have been restricted to providing summary estimates of randomized clinical trials because of the recognition that other study designs are subject to systematic biases that may lead to under- or overestimation of the association between a treatment or exposure and the outcomes of interest. However, there is a growing recognition that, for some questions, randomized clinical trials are unlikely to provide clear evidence, either because some outcomes of interest are rare relative to the sample sizes typical of randomized clinical trials or require extended durations of follow-up (eg, the development of cancers potentially attributable to the treatment), or because randomizing the exposure/treatment of interest is unethical or not feasible.


Both the Effective Healthcare Program of the US Agency for Healthcare Research and Quality and the Grading of Recommendations Assessment, Development and Evaluation Working Group have provided guidance for using nonrandomized study designs in systematic reviews and guideline development.


In this issue of the Journal, Mowat et al report the results of a systematic review and metaanalysis of one such exposure, surgeon volume, and its association with complications of gynecological surgery.


As in studies of nongynecological procedures, the authors found a consistent and statistically significant association between surgeon volume and complication rates and, for gynecological oncologists, perioperative mortality. The authors used the Grading of Recommendations Assessment, Development, and Evaluation approach to judging evidence quality and rated the quality as very low to moderate.


Although a detailed description of the Grading of Recommendations Assessment, Development, and Evaluation approach is beyond the scope of this editorial (both the Grading of Recommendations Assessment, Development, and Evaluation Working Group web site [www.gradeworkinggroup.org] and the series of detailed papers in the Journal of Clinical Epidemiology are highly recommended), a brief overview of the approach is helpful for putting the review of Mowat et al in context and for providing a basis for evaluating other reviews that include nonrandomized studies.


Grading of Recommendations Assessment, Development, and Evaluation was initially designed to help in guideline development, and its approach to judgments about evidence quality arises from this purpose. Under Grading of Recommendations Assessment, Development, and Evaluation, recommendations are either strong, indicating a high degree of certainty about the evidence on the critical outcomes necessary to make a recommendation for or against a particular intervention, or weak, indicating either residual uncertainty about the evidence or the likelihood that the optimal choice may vary based on individual preferences.


Ratings of evidence quality are based on reviewers’ judgments about the likelihood that estimates of the association between the interventions or exposures of interest and the specific outcome of interest could be incorrect: in earlier iterations of Grading of Recommendations Assessment, Development, and Evaluation, this was expressed in terms of the probability that further research would change the observed results.


Factors considered in rating the quality of evidence go beyond risk of bias (which is largely a function of study design and conduct) to include imprecision (related to the statistical power of the studies included in the review and to variability in the quantitative estimate of the association), indirectness (related to the use of surrogate outcomes or to studies done in populations or settings that may be substantially different from the one for which the guidelines are intended), inconsistency (related to the consistency of the direction of the association between exposure/intervention and outcome), and the likelihood of publication bias.


It is important to note that these factors interact in judgments about the overall strength of evidence. It is possible that consistent evidence from randomized trials might still be judged less than high quality if other considerations, such as directness, are important. For example, a recent review of the evidence on breast cancer screening done to support guidelines for the United States graded the evidence on the overall quantitative estimate of mortality reduction with mammography as moderate, primarily because of the differences between the evidence from the available randomized trials (the majority of which were performed using older technology and in non-US settings in which other factors related to access to diagnosis and treatment might affect mortality) and current and future US practice.


Whether there is a true association between low surgical volume and an increased risk of complications, and what effect this association has on the absolute risk of a complication at both the individual and population level, are clearly important questions for patients, clinicians, and policy makers. If the association is real and the absolute risk clinically significant, then efforts to reduce this risk are justified. However, these are clearly questions for which a randomized trial would be almost impossible to design and implement, so we are dependent on observational studies.


The observational evidence reviewed by Mowat et al was consistent in showing a positive association between low volume and complications, both within and across gynecological procedures, and, as previously noted, is also consistent with studies of other interventions. However, there is still considerable imprecision in the estimates of the relative association (relative risks and odds ratios). This is important because estimating the absolute risk of a complication attributable to surgeon volume is critical both for patient decision making and for estimating the public health impact of potential strategies to ensure adequate surgeon volume.
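As an illustration of why this matters, the sketch below uses purely hypothetical numbers (a 3% baseline complication risk and an assumed relative risk with its 95% confidence interval, none of which are taken from Mowat et al) to show how imprecision on the relative scale translates into a wide range of absolute risk differences and numbers needed to harm.

```python
# Illustrative only: hypothetical numbers, not estimates from Mowat et al.
# Shows how imprecision in a relative estimate propagates to the absolute
# scale, which is what matters for patients and for policy.

baseline_risk = 0.03                         # assumed risk with a high-volume surgeon
rr_point, rr_low, rr_high = 1.5, 1.1, 2.1    # hypothetical relative risk and 95% CI

for label, rr in [("point estimate", rr_point), ("lower CI", rr_low), ("upper CI", rr_high)]:
    risk_low_volume = baseline_risk * rr
    ard = risk_low_volume - baseline_risk        # absolute risk difference
    nnh = float("inf") if ard == 0 else 1 / ard  # number needed to harm
    print(f"{label}: risk with low volume = {risk_low_volume:.3f}, "
          f"ARD = {ard:.3f}, NNH ≈ {nnh:.0f}")
```

Even with this modest assumed baseline risk, the absolute risk difference spans roughly 3 per 1000 to 33 per 1000 procedures across the hypothetical confidence interval, a range wide enough to change the policy calculus.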


One factor contributing to this is the lack of consistency in the literature in defining low vs high volume. It seems unlikely that there is a discrete volume threshold below which the risk of complications increases or that the mathematical relationship between volume and complications is a simple linear one. Because any policy for reducing complications attributable to low surgical volume would necessarily need an explicit definition of low volume, much more work is needed to clarify this issue.
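One way future studies could avoid arbitrary cutoffs is to model volume as a continuous predictor and let the data describe the shape of the volume-risk curve. The following is a minimal sketch under that assumption, using entirely synthetic data and assumed variable names, with a B-spline term in a logistic model rather than a low/high dichotomy.

```python
# A minimal sketch (synthetic data, assumed variable names) of modeling annual
# surgeon volume as a continuous predictor rather than an arbitrary cutoff.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 5000
volume = rng.integers(1, 100, size=n)  # assumed annual case volume per surgeon

# Assume, for illustration, that risk declines smoothly and nonlinearly with volume.
log_odds = -2.0 - 0.8 * np.log(volume / 10.0)
complication = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

df = pd.DataFrame({"volume": volume, "complication": complication})

# bs() builds a B-spline basis, letting the data determine the shape of the
# volume-risk curve instead of imposing a threshold or a straight line.
model = smf.logit("complication ~ bs(volume, df=4)", data=df).fit(disp=False)
print(model.summary())
```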


Another major limitation of the literature is that the data sources most likely to provide enough observations for reasonable statistical power, large administrative databases, rarely provide sufficient detail to adjust for differences between patients in factors that might affect the risk of complications, particularly intraoperative complications. For example, uterine size, the stage of endometriosis, a history of prior surgery, and body mass index are not captured in the codes of the International Classification of Diseases, ninth revision, or the International Classification of Diseases, 10th revision.


There may be a relationship between surgical volume and degree of difficulty: some surgeons may have lower volume because they more frequently operate on complex patients who are at higher risk of complications and have longer mean operative times. In this case, the failure to adjust for surgical complexity would result in a biased estimate of the association between volume and complications.
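To make the direction of this bias concrete, the simulation below (entirely synthetic data and assumed effect sizes, not drawn from any of the reviewed studies) constructs a scenario in which case complexity both lowers a surgeon's volume and raises complication risk while volume itself has no effect; the crude odds ratio then suggests harm from low volume that disappears after adjustment.

```python
# Illustrative simulation (all numbers assumed) of confounding by case complexity.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 20000
complexity = rng.binomial(1, 0.3, size=n)  # 1 = complex case

# Assumption: surgeons who take complex cases tend to be lower volume.
low_volume = rng.binomial(1, np.where(complexity == 1, 0.6, 0.3))

# True model: complexity raises complication risk; volume has no effect here.
p = 1 / (1 + np.exp(-(-3.0 + 1.2 * complexity)))
complication = rng.binomial(1, p)

df = pd.DataFrame({"complication": complication,
                   "low_volume": low_volume,
                   "complexity": complexity})

crude = smf.logit("complication ~ low_volume", data=df).fit(disp=False)
adjusted = smf.logit("complication ~ low_volume + complexity", data=df).fit(disp=False)
print("crude OR for low volume:   ", np.exp(crude.params["low_volume"]).round(2))
print("adjusted OR for low volume:", np.exp(adjusted.params["low_volume"]).round(2))
```

In real data the truth could run in either direction; the point is that without case-mix detail, administrative data cannot distinguish these scenarios.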


Setting may be important as well. There are likely considerable differences in referral patterns between US and non-US settings, and factors affecting those patterns may have an impact both on surgical volume and on factors other than volume that affect complication risk.


A final limitation is that, other than 5-year survival after the initial surgery for ovarian cancer, Mowat et al did not identify any evidence for an association between surgical volume and the therapeutic goals of surgery: in other words, are patients who have procedures performed by higher-volume surgeons more likely to achieve the desired outcome of their surgery (whether specific goals such as pain relief or bleeding cessation, or more general improvements in quality of life or survival) or to have a more durable result?


Given that any policy designed to ensure that patients receive their care from high-volume surgeons has, at the very least, the potential to affect access to and timeliness of care, consideration needs to be given to all relevant outcomes, not just complications. If the absolute risk of complications is high enough, such a policy might be justified, but the justification would be stronger if there were evidence supporting positive benefit as well as reduction of harm.


Minimizing the risk of complications of gynecological surgery is a fundamental obligation for clinicians. At the population level, minimizing risk helps improve the quality and efficiency of the health care system. At both levels, identifying factors that can potentially be addressed is important. The evidence presented by Mowat et al strongly suggests that lower surgical volume is a risk factor for surgical complications. However, to design and evaluate potential interventions to minimize complications attributable to surgeon volume, better evidence is needed to estimate the absolute magnitude of the effect as well as the potential impact of volume on other outcomes, including surgical success and access to care.
