We read with interest the article on the use of a comparability scoring system in reporting observational studies that was published in February in the American Journal of Obstetrics and Gynecology. The article proposes a checklist-based scoring system to compare intervention and control groups in terms of geographic setting, healthcare setting, healthcare providers, confounding interventions, time interval, and consensus statement impacts. Although this approach has merit for some observational studies, we believe that comment on its utility for population-based studies is warranted.
The comparability score is proposed to reduce the impact of selection bias in cohort and case-control studies. However, selection bias is unlikely to occur in studies that use whole-population data because the intervention and control groups are drawn from the same unselected population. Further, the article proposes weighting the confidence interval by the comparability score if the circumstances of healthcare for the intervention and control groups are not quantified. Although easy to apply, this approach assigns equal weight to each circumstance of care, which precludes assessment of the clinical and statistical relevance of the individual components of the scoring system.
An alternative to a comparability scoring system in population studies is to use multilevel modeling to adjust for unmeasured characteristics that cluster within hospitals, for example, the types of patients treated and the medical care provided at each hospital. In an unselected population, multilevel modeling has the advantage of being able to quantify the degree of clustering and to test whether it is statistically significant. Multilevel models also down-weight small hospitals, where erratic results are more common, thereby improving the generalizability of the results. Additionally, multilevel modeling can be used to explore how case mix and hospital factors contribute to the observed variation between institutions.
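To illustrate how clustering is quantified and how small hospitals are down-weighted, the following is a minimal sketch for the simplest case: a linear random-intercept model whose between-hospital and within-hospital variances are assumed to be known. The function names and all numeric values are hypothetical and chosen for illustration only. In practice the variance components would be estimated by fitting the model in statistical software, and binary outcomes would call for a multilevel logistic model rather than this linear form.

```python
def intraclass_correlation(between_var, within_var):
    """Share of total outcome variance attributable to hospitals."""
    return between_var / (between_var + within_var)


def shrinkage_weight(n, between_var, within_var):
    """Weight given to a hospital's own mean in a random-intercept model.

    The weight approaches 1 as the number of patients n grows, and
    falls toward 0 for small hospitals with noisy observed means.
    """
    return between_var / (between_var + within_var / n)


def shrunken_mean(hospital_mean, n, overall_mean, between_var, within_var):
    """Hospital estimate pulled toward the overall mean by its weight."""
    w = shrinkage_weight(n, between_var, within_var)
    return w * hospital_mean + (1 - w) * overall_mean


# Illustrative (made-up) variance components and hospital results.
BETWEEN_VAR = 1.0   # assumed variance between hospitals
WITHIN_VAR = 9.0    # assumed variance between patients within a hospital
OVERALL_MEAN = 10.0

icc = intraclass_correlation(BETWEEN_VAR, WITHIN_VAR)
small = shrunken_mean(20.0, 9, OVERALL_MEAN, BETWEEN_VAR, WITHIN_VAR)
large = shrunken_mean(20.0, 90, OVERALL_MEAN, BETWEEN_VAR, WITHIN_VAR)

print(f"ICC: {icc:.2f}")
print(f"Small hospital (n=9) estimate: {small:.2f}")
print(f"Large hospital (n=90) estimate: {large:.2f}")
```

Both hypothetical hospitals report the same observed mean, but the estimate for the smaller hospital is pulled further toward the overall mean because its observed result is less reliable.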
Finally, it is worth noting that there is usually a tradeoff between participant comparability and generalizability. A study from a single healthcare provider at a single healthcare facility conducted over a short time period would receive the maximum comparability score. However, conclusions based on single-institution studies may not be generalizable to other institutions, counties, states, or countries. Further, although single-institution studies may control for confounding that arises from the circumstances of care, their sample sizes are often small, and small studies may be more likely to report false-positive findings because of publication bias.