See related article, page 147
As an active clinician, administrator, and researcher in health care quality, I am often reminded, with some amusement, of Robert Pirsig’s Zen and the Art of Motorcycle Maintenance (1974) as I search for what defines quality in medicine and obstetrics. In this book, the protagonist takes a 17-day motorcycle journey across the country, consumed with a spiritual inquiry into the metaphysics of quality. In harmony with the atmosphere of the early 1970s, one comes to realize there is probably no objective definition of quality; rather, quality is a concept that folds spiritual experience into rational thought. Appreciating the book requires one to suspend linear approaches to problem solving, metaphorically gripping the handlebars on a wild intellectual ride that explores uncertainties and ends in inconclusive conclusions. If you are confused as to the point, that is intentional. Quality is hard to define, and the quest itself probably means more than finding the answer.
At its simplest level, quality from a medical perspective is an assessment of how close care is to the optimal state. Yet one discovers that this definition is probably insufficient for the practical work of measuring and improving care. What is the optimal state, how do we quantify it, and how do we weight quality measures against one another and across diverse settings? Thus, in medicine we find ourselves on a journey much like that of Pirsig’s protagonist, and nowhere is this path more tortuous than in obstetrics. Measuring quality in medicine is a challenge anywhere, but it is made harder in obstetrics by the fact that most childbirth is physiologic and normal, so our patients usually pass through the medical system unharmed. Even when things go wrong, for instance through an error or mishap, the resilient physiology of a healthy woman or fetus can often deflect a potential injury into a near-miss. For the most part, we achieve the optimal in obstetrics despite ourselves. This makes it all the more surprising and devastating, to us and to our patients, when error does lead to harm.
In contrast to Pirsig’s metaphysical adventure, our path to define quality must actually reach conclusive ends. Quality plays a critical role in justifying costs as we search for value in health care spending. Regulatory bodies, credentialing boards, government and third-party payers, hospitals, and, of course, our patients demand improved outcomes, better value, and fewer instances of harm. For over 10 years, obstetric researchers and clinicians have proposed, and in many cases rejected, individual measures and sets of measures to standardize quality. Early efforts focused on outcome measures, the first and most widely disseminated being the Obstetric Adverse Outcomes Index proposed by Mann and colleagues. This set of measures has demonstrated internal validity in many studies but is problematic, in particular because of intrahospital variation and because a single measure, severe perineal lacerations (itself an imperfect measure), dominates the index. Further work has included culture measures, such as the Safety Attitudes Questionnaire and the Agency for Healthcare Research and Quality (AHRQ) Hospital Survey on Patient Safety Culture, and process measures, such as nonindicated deliveries before 39 weeks of gestation and the rate of antenatal steroid administration before anticipated preterm delivery. The latter 2 are among the 5 most widely accepted measures, the publicly reported Perinatal Core Measures, endorsed by the National Quality Forum and tracked by hospitals for The Joint Commission.
This issue of the American Journal of Obstetrics and Gynecology presents the second publication from the ambitious Assessment of Perinatal Excellence (APEX) study, a Maternal-Fetal Medicine Units (MFMU) Network program aiming not just to evaluate the use of a handful of fundamental intrapartum obstetric quality measures but also to ask how comparisons can be made with these measures. It is not enough to find metrics for evaluating improvement; we must also figure out how to compare those metrics, judiciously, across diverse populations and health care settings. This multicenter prospective cohort study, involving over 115,000 patients in 25 hospitals, collected background data on patients, providers, and hospital settings in an attempt to provide risk-adjusted comparisons across these domains. The first publication demonstrated that the investigators were unable to risk-adjust across hospitals for patient characteristics and conditions, intimating that judging quality across diverse settings with the chosen measures might not be fair.
Taking this one step further, the report from Grobman et al asks whether good adherence to standard and best practices (care processes) is associated with good outcomes. The association could be direct and causative, where implementing an evidence-based practice affects the outcome (for example, the use of prophylactic measures such as intermittent compression devices or low-dose anticoagulation in high-risk patients to reduce the incidence of venous thromboembolism). Or the association could be correlative, where an institution’s culture of safe medical practice creates a high-reliability setting.
Unfortunately, but importantly, this report by Grobman et al concludes that hospital characteristics and practices do not seem to account for variations in quality across settings. Differences in patient populations do seem to matter, accounting for 20-40% of the variation in quality outcomes. Furthermore, although common care processes are associated with adverse outcome risks, differences in those practices between hospitals do not seem to explain the differences in outcomes between them.
This is very valuable information, but it should be wielded carefully: though this is a large and heterogeneous study, it is still limited in the scope of the outcomes and processes it measures. The adverse outcomes chosen for study were venous thromboembolism, postpartum hemorrhage, peripartum infection, severe perineal laceration, and a composite neonatal adverse outcome. This is a rather limited set, especially because thromboembolic events were too few to include in the modeling. Further, there are numerous care practices that the researchers did not account for. For instance, they did not assess which hospitals had comprehensive protocols for certain high-risk practices and procedures, such as oxytocin administration, induction techniques, or preeclampsia management.
In addition, they did not measure some nonclinical practices and processes that have been shown to correlate with improved practice environments, such as simulation and team-training programs. The presence of a detailed chain-of-command policy, mandatory electronic fetal monitoring certification for staff, or hospital Magnet designation could also contribute to differences in the care delivered. Differences in any of these factors could account for some of the variation across hospitals.
The authors’ summary of the point of all of this, in their discussion, is very important: if we are going to measure and report care processes, we had better know that improvement in those processes has a significant impact on outcomes, so that we are not just measuring for measurement’s sake. Although it may make logical sense to adhere to evidence-based standards, it may not be completely fair to hold providers and hospitals to certain standards if those standards are not responsible for the variations in quality our patients see and experience. A recent study by Clark et al demonstrates this from another angle. This review of over 100,000 deliveries validated the Joint Commission Perinatal Core Measure for counting elective deliveries before 39 weeks (“PC-01”), but it also pointed out that, because strict inclusion/exclusion criteria leave low numbers of cases in the denominator, up to 60% of labor units would report rates out of compliance (>5% nonmedically indicated deliveries from 37 0/7 to 38 6/7 weeks), even when reporting quarterly data. To illustrate with hypothetical numbers, a unit with only 15 measure-eligible deliveries in a quarter needs just 1 elective early-term delivery to post a rate of nearly 7%, well above the 5% threshold. The results of studies like these demonstrate that we have much work ahead of us to validate and improve our quality measurement and reporting. For all of us interested in quality, this means hopping on our motorcycles and getting back on the road.