Objective
The purpose of this study was to test the diagnostic performance of 5 existing classification systems (developed by Lawson, Tafesse, Goh, Waaldijk, and the World Health Organization) and a prognostic scoring system that was derived empirically from our data to predict fistula closure 3 months after surgery.
Study Design
Women with genitourinary fistula (n = 1274) who received surgical repair services at 11 health facilities in sub-Saharan Africa and Asia were enrolled in a prospective cohort study. Using one-half of the sample, we created multivariate generalized estimating equation models to obtain weighted prognostic scores for components of each existing classification system and the empirically derived scoring system. With the second one-half, we developed receiver operating characteristic curves using the prognostic scores and calculated areas under the curves (AUCs) and 95% confidence intervals (CIs) for each system.
Results
Among existing systems, the scoring systems that represented the World Health Organization, Goh, and Tafesse classifications had the highest predictive accuracy: AUC, 0.63 (95% CI, 0.57–0.68); AUC, 0.62 (95% CI, 0.57–0.68); and AUC, 0.60 (95% CI, 0.55–0.65), respectively. The empirically derived prognostic score achieved similar predictive accuracy (AUC, 0.62; 95% CI, 0.56–0.67); it included significant predictors of closure that are found in the other classification systems but contained fewer, nonoverlapping components. The differences in AUCs were not statistically significant.
Conclusion
The prognostic values of existing urinary fistula classification systems and the empirically derived score were poor to fair. Further evaluation of the validity and reliability of existing classification systems to predict fistula closure is warranted; consideration should be given to a prognostic score that is evidence-based, simple, and easy to use.
Although it has garnered worldwide attention only in the past decade, female genitourinary fistula (an abnormal opening between the genital and urinary systems) is an ancient condition that is caused predominantly by obstructed labor. Since the mid-19th century, when the first consistently successful surgical techniques for repairing genitourinary fistulas were developed, efforts have been made to develop a schema for the classification of fistulas. At least 25 systems exist, although the reliability and validity of most of them have not been empirically tested. Although there is widespread acknowledgment that a standardized classification system is needed, disagreement remains about which fistula characteristics should be included and what purposes (eg, prognostic or descriptive) the system should serve.
The purposes of existing systems and the components that they include vary. They are used for didactic purposes, to facilitate communication and learning, and for planning and conducting repairs, which includes the assessment of prognosis and determination of the need for referral. Some systems, particularly older ones (eg, Sims, Lawson), describe the location of the fistula only. Others (eg, Goh, Tafesse, and Waaldijk) are more detailed, describing the extent to which varying anatomic structures are affected and factors such as bladder and fistula size. The more detailed systems allow for a precise description of the fistula, with the implicit assumption that, as type increases by number or letter combination (eg, type 2Bb vs type 2A), the prognosis worsens. Indeed, the systems developed by Goh and Waaldijk have been tested empirically to determine the extent to which their parameters predict repair outcomes. An additional system presented by the World Health Organization (WHO) classifies fistula on the degree of repair difficulty (simple or complex). However, to our knowledge, this system has not been validated nor is it currently used. None of the systems we are aware of are scoring systems, and none evaluate patient characteristics that include comorbidities.
These systems were developed with clinical judgment, rather than empiric evidence. Few studies have examined the ability of individual patient or fistula characteristics to predict fistula repair outcomes, and the evidence base on most predictor-repair outcome relationships remains thin. One recent study directly compared Goh's and Waaldijk's classification systems; although it made an important contribution to the evidence base, its limitations included a small sample and a short follow-up period.
A standardized evidence-based prognostic classification system would facilitate communication and learning across fistula services and assist with patient triage and selection. An evidence-based prognostic scoring system, in particular, would have unique advantages. A scoring system could facilitate (1) surgeons' decisions regarding patient referral by providing thresholds for what constitutes a "good" or "poor" prognosis, (2) a comparison of studies that examine treatment outcomes, (3) the evaluation of surgical success rates across facilities, and (4) the assessment of the effectiveness of interventions independent of confounding by indication.
To be clinically and analytically useful, a classification system must be both simple and sufficient. A simple and sufficient system would be used more readily and would increase prognostic accuracy. For analytic purposes, it would need to decrease opportunities for residual confounding, yet not over-adjust and unnecessarily increase variance.
Using data that were collected as part of a multicountry prospective cohort study, our primary aim was to test the diagnostic performance of 5 classification systems (Lawson, Waaldijk, Tafesse, Goh, and the WHO) to predict fistula closure. These systems either are used commonly in clinical settings (Waaldijk and Goh) or represent a range of detail, from more sparse (Lawson) to more exhaustive (Tafesse and WHO). Our secondary aim was to evaluate whether the inclusion of patient or fistula characteristics that are not included in other classification systems is warranted or whether existing systems could be simplified for prognostic purposes.
Methods
Study participants and procedures
At 11 sites in Bangladesh, Guinea, Niger, Nigeria, and Uganda, 1389 women who had surgery for repair of a genitourinary fistula were enrolled between September 2007 and September 2010. All sites were hospitals or clinics that received support from EngenderHealth’s Fistula Care project to conduct repairs. Twenty-five women were excluded because they underwent repair for rectovaginal fistula only. An additional 35 women were excluded because they were referred to other facilities, did not have surgery for medical/safety reasons, or were treated by catheterization. Excluded women were distributed evenly across all facilities. Most of the women who were enrolled (95.9%) returned for the 3-month follow-up examination; these 1274 women constitute the sample for these analyses.
Data were collected by site staff on standardized case report forms that were used at all sites. Site staff who carried out the study were trained in study procedures and interview techniques. Before surgery, information was collected on sociodemographics, obstetric history, clinical examination results, and the medical care that was provided. At the time of surgery, detailed information was collected on fistula characteristics, intraoperative procedures that had been performed, and surgical outcomes. Before discharge, data on postoperative care that had been provided and surgical outcomes were recorded. Three months after surgery, surgical outcomes were assessed during a clinical examination.
National level ethical approval was obtained in Nigeria, Uganda, Guinea, and Niger. Facility-based ethical review was required and was obtained at 2 of 3 facilities in Bangladesh; the third facility gave permission for the study to be conducted. All patients provided signed informed consent (if the patient was not literate, consent was indicated by thumbprint and a witness signed the form).
Measures
The primary outcome was genitourinary fistula closure vs “not closed” at 3 months after surgery. Closure was assessed by pelvic examination, with a dye test when there was leakage of urine. At 2 sites, pelvic examinations were not conducted routinely. In these cases (186 women; 14.6%), closure was determined with the question: “Does the client have continuous and uncontrolled leakage of urine?” A dye test was used to assess fistula closure in any patient complaining of urine leakage.
We directly measured many components of existing classifications in our dataset; however, in some cases, it was necessary to create variables that closely corresponded to these components with the use of measures in our dataset. The operationalization of components of the classification systems that used variables from our dataset is detailed in Table 1. There are no agreed-upon standard definitions for many fistula characteristics; thus, the degree of scarring and tissue loss and bladder size were assessed subjectively by operating surgeons.
Classification system | Classification system component | Variable used to operationalize component | |
---|---|---|---|
Goh | |||
Type 1 | Distal edge of the fistula >3.5 cm from external urinary meatus | Urethral length >3.5 cm | |
Type 2 | Distal edge of the fistula 2.5–3.5 cm from external urinary meatus | Urethral length 2.5–3.5 cm | |
Type 3 | Distal edge of the fistula 1.5–<2.5 cm from external urinary meatus | Urethral length 1.5–<2.5 cm | |
Type 4 | Distal edge of the fistula <1.5 cm from external urinary meatus | Urethral length <1.5 cm | |
a. | Size, <1.5 cm in the largest diameter | Available from dataset | |
b. | Size, 1.5–3 cm in the largest diameter | Available from dataset | |
c. | Size, >3 cm in the largest diameter | Available from dataset | |
i. | None or only mild fibrosis (around fistula and/or vagina) and/or vaginal length >6 cm, normal bladder capacity | None or only mild fibrosis and normal bladder capacity a | |
ii. | Moderate or severe fibrosis (around fistula and/or vagina) and/or reduced vaginal length and/or bladder capacity | Moderate or severe fibrosis and small bladder capacity | |
iii. | Special considerations eg, after radiation, ureteric involvement, circumferential fistula, b previous repair | Ureteric involvement, circumferential fistula, previous repair | |
Lawson | |||
i. | Juxta-urethral | Available from dataset | |
ii. | Mid vaginal | Available from dataset | |
iii. | Juxta-cervical | Available from dataset | |
iv. | Vault | Available from dataset | |
v. | Massive combination fistula a | Available from dataset | |
Tafesse | |||
Class 1 | Noncircumferential, not previously operated | Available from dataset | |
Class 2 | Noncircumferential, previously operated | Available from dataset | |
Class 3 | Circumferential, not previously operated | Available from dataset | |
Class 4 | Circumferential, previously operated | Available from dataset | |
Urethral involvement | |||
I. | No involvement (length, >4 cm) | Available from dataset | |
II. | Urethra involved, but not middle 1/3 (length, 2.7–3.9 cm) | Available from dataset | |
III. | Middle 1/3 partly involved (length 1.4–2.6 cm) | Available from dataset | |
IV. | Middle 1/3 completely involved, but some urethral tissue remains (<1.4 cm) | Collapsed categories IV and V | |
V. | No urethra | ||
Bladder size | |||
a. | Longitudinal diameter, >7 cm | Normal bladder | |
b. | Longitudinal diameter, 4–7 cm | Small or no bladder | |
c. | Longitudinal diameter, <4 cm | Small or no bladder | |
Anterior vaginal tissue loss | |||
1. | <50% of anterior vagina involved (>3.5 cm of healthy vagina remains) | Minimal tissue loss | |
2. | >50% of the anterior vaginal wall involved (<3.5 cm of healthy vagina remains) | Moderate tissue loss | |
3. | Obliterated vagina (cannot admit >1 finger) | Extensive tissue loss or obliterated vagina | |
Waaldijk | |||
Type 1 | Not involving the closing mechanism | Not involving the closing mechanism and not involving complete destruction of bladder neck | |
Type 2 | Involves closing mechanism | Involves closing mechanism or destruction of bladder neck | |
A. | Without (sub)total urethra involvement | Intact or partially damaged urethra | |
B. | With (sub)total urethra involvement | Completely destroyed urethra | |
a. | Without circumferential defect | Available from dataset | |
b. | With circumferential defect | Available from dataset | |
Type 3 | Ureteric and other exceptional fistulas | Mixed vesicovaginal and rectovaginal fistulas, cervical and ureteric fistulas | |
Size of fistula | |||
Small | <2 cm | Available from dataset | |
Medium | 2–3 cm | Available from dataset | |
Large | 4–5 cm | Available from dataset | |
Extensive | ≥6 cm | Available from dataset | |
World Health Organization | Simple | Complex | |
No. of fistulas | Single | Multiple | Available from dataset |
Site | VVF | All non-VVF urinary fistula | Non-VVF excludes ureteric and urethral fistulas a |
RVF | |||
Mixed VVF/RVF fistula | |||
Involvement of cervix | |||
Size (diameter) | <4 cm | >4 cm | Available from dataset |
Involvement of the urethra/continence mechanism | Absent | Present | Available from dataset |
Scarring of vaginal tissue | Absent | Present | Available from dataset |
Presence of circumferential defect | Absent | Present | Not included in multivariate analysis c |
Degree of tissue loss | Minimal | Extensive | Moderate and minimal tissue loss considered “minimal” |
Ureter/bladder involvement | Ureters are inside the bladder, not draining into the vagina | One or both ureters draining into the vagina; 1 or both ureters at edge of fistula | Created composite measure that represented ureteric involvement (either ureteric location or ureters draining into vagina or at edge of vagina) |
No. of previous repair attempts | No previous attempt | Failed previous repair attempts | Available from dataset |
a Ureteric and urethral fistulas were excluded because ureteric fistulas were captured under “ureter/bladder involvement” and urethral fistulas were captured by “involvement of the urethra/continence mechanism”;
b Complete separation of the urethra from the bladder;
c Excluded from multivariate analysis because of overlap with “involvement of the urethra/continence mechanism.”
We also evaluated whether variables that were not included in existing classification systems merited inclusion in a scoring system or whether variables that were included already should be revised or recategorized. In particular, we evaluated individual characteristics that included patient age, fistula duration, and comorbidities before surgery. Age and duration of the fistula were measured as continuous variables. Comorbidities that were assessed included the presence of malnutrition (yes/no, as determined through either a skin-fold measurement, body mass index, or visual assessment), anemia (yes/no, as determined through either hemoglobin level, hematocrit, or visual assessment), urinary tract infection (based on clinician reports), and parasitic infections, which included malaria (based on clinician reports). Finally, we examined the distributions of ordinal variables that were included in existing classification systems to determine whether cut-points should be revised.
Statistical analyses
We used a split-sample design. One-half of the sample (the derivation cohort) was used to create scoring systems that represented the 5 existing classifications and the 1 scoring system that was derived empirically from our data. The second one-half of the sample (the validation cohort) was used to test the scores.
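The split-sample step can be illustrated with a minimal Python sketch. This is not the study's code (the analyses were performed in SAS), and it assumes a simple random split with a fixed seed; the text does not state how patients were allocated to the 2 cohorts. The function name `split_sample` and the sequential patient identifiers are hypothetical.

```python
import random

def split_sample(patient_ids, seed=0):
    """Randomly divide patient IDs into derivation and validation halves.

    Assumption: a simple random split; the allocation method actually
    used in the study is not described in the text.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]

# Hypothetical sequential IDs for the 1274 women in the analysis sample.
derivation, validation = split_sample(range(1, 1275))
print(len(derivation), len(validation))
```

With 1274 patients, each half contains 637 women, which matches the cohort sizes reported in Table 2.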
Sociodemographics and repair outcomes in the 2 cohorts were compared with the use of t-tests for continuous variables and χ² tests or Fisher exact tests (where cell sizes were <5) for ordinal or dichotomous variables.
Characteristics of patients whose fistulas were closed at the 3-month follow-up visit were compared with those whose fistulas were not closed with the use of risk ratios (RRs) and corresponding 95% confidence intervals (CIs). RRs were reported instead of odds ratios for all analyses because the likelihood of failed fistula closure was >10%; therefore, odds ratios would overestimate rather than approximate RRs. RRs and 95% CIs were derived by generalized estimating equations, with an exchangeable correlation structure with a robust standard error estimator to account for clustering of patient outcomes by facility; results that accounted for clustering by attending surgeon (not shown) were similar. RRs were generated with the logarithm link function and binomial distribution specification in SAS PROC GENMOD.
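The reason for preferring RRs when the outcome is common can be seen in a short numeric sketch. The risks used here are hypothetical and chosen only for illustration; they are not study results.

```python
def risk_ratio(p_exposed, p_unexposed):
    """Ratio of outcome risks between two groups."""
    return p_exposed / p_unexposed

def odds_ratio(p_exposed, p_unexposed):
    """Ratio of outcome odds between two groups."""
    def odds(p):
        return p / (1 - p)
    return odds(p_exposed) / odds(p_unexposed)

# Hypothetical common outcome: risks of 30% vs 15%.
rr_common = risk_ratio(0.30, 0.15)
or_common = odds_ratio(0.30, 0.15)

# Hypothetical rare outcome: risks of 2% vs 1%.
rr_rare = risk_ratio(0.02, 0.01)
or_rare = odds_ratio(0.02, 0.01)

print(round(rr_common, 2), round(or_common, 2))
print(round(rr_rare, 2), round(or_rare, 2))
```

In both scenarios the RR is 2, but the odds ratio is approximately 2.43 for the common outcome and approximately 2.02 for the rare one, so the odds ratio approximates the RR only when the outcome is uncommon.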
Using the derivation cohort, we constructed separate multivariate generalized estimating equation models for the components of each classification system. RRs were generated by log-binomial models; in 2 models in which the log-binomial model failed to converge, SAS PROC GENMOD's Poisson regression capability with a log-link function and robust variance was used. Weighted scores for individual classification system components were derived from adjusted RRs (ARRs); scores were assigned only to those components that were significant (P < .05). Weights were rounded to the nearest whole number.
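A minimal Python sketch of the scoring step follows. It assumes that each weight is the ARR itself rounded to the nearest whole number, with halves rounded up; the text states only that weights were derived from ARRs and rounded, so this mapping is an assumption. The component names, ARRs, and P values are hypothetical and are not study estimates.

```python
import math

def component_weights(adjusted_rrs, p_values, alpha=0.05):
    """Assign whole-number weights to statistically significant components.

    Assumption: weight = ARR rounded to the nearest whole number
    (halves rounded up); nonsignificant components receive no weight.
    """
    weights = {}
    for name, arr in adjusted_rrs.items():
        if p_values[name] < alpha:
            weights[name] = math.floor(arr + 0.5)
    return weights

def prognostic_score(present_components, weights):
    """Sum the weights of the components present in one patient."""
    return sum(weights.get(name, 0) for name in present_components)

# Hypothetical ARRs and P values for three illustrative components.
hypothetical_arrs = {
    "previous_repair": 1.6,
    "circumferential_defect": 2.4,
    "small_bladder": 1.2,
}
hypothetical_p = {
    "previous_repair": 0.01,
    "circumferential_defect": 0.001,
    "small_bladder": 0.30,
}

weights = component_weights(hypothetical_arrs, hypothetical_p)
score = prognostic_score(
    ["previous_repair", "circumferential_defect", "small_bladder"], weights
)
print(weights, score)
```

In this illustration, the nonsignificant component contributes nothing, so a patient with all 3 characteristics receives a score of 4.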
The multivariate model that was used to develop the empirically derived score included variables that were associated with repair failure at P ≤ .20 in bivariate analysis or were associated conceptually with repair outcome. In the event that 2 candidate variables were intercorrelated, the variable with the most clinical significance was included.
Using the validation cohort, we calculated sensitivity and specificity for each scoring system. Receiver operating characteristic (ROC) curves that depict the relationship between the proportion of true-positives and false-positives for each system were drawn and compared visually; areas under the ROC curves (AUCs) and 95% CIs were calculated for each curve. With the use of methods for paired data, AUCs were compared by calculation of the contrast χ² and corresponding probability value for the difference between the AUCs. ROC curves and AUCs were also generated using the derivation cohort to assess model robustness (Figure 1 and Table 5). All analyses were performed with SAS software (version 9.2; SAS Institute Inc, Cary, NC); AUCs were calculated with the %ROC macro, and ROC curves were constructed with the %ROCPLOT macro.
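The quantities in this step can be sketched in Python as follows. This is an illustration only and does not reproduce the SAS %ROC macro: it computes the AUC as the proportion of failed-closed patient pairs in which the failed repair has the higher score (ties counted as one-half), and it omits the CIs and the paired contrast test. Treating repair failure as the "positive" outcome is an assumption, and the scores and outcomes are hypothetical.

```python
def sensitivity_specificity(scores, failed, threshold):
    """Sensitivity and specificity when score >= threshold predicts failure."""
    pairs = list(zip(scores, failed))
    tp = sum(1 for s, f in pairs if f and s >= threshold)
    fn = sum(1 for s, f in pairs if f and s < threshold)
    tn = sum(1 for s, f in pairs if not f and s < threshold)
    fp = sum(1 for s, f in pairs if not f and s >= threshold)
    return tp / (tp + fn), tn / (tn + fp)

def auc(scores, failed):
    """AUC as the probability that a failed repair outscores a closed one."""
    pos = [s for s, f in zip(scores, failed) if f]
    neg = [s for s, f in zip(scores, failed) if not f]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Hypothetical prognostic scores and outcomes (True = not closed).
scores = [0, 0, 2, 2, 4, 4]
failed = [False, False, False, True, False, True]

sens, spec = sensitivity_specificity(scores, failed, threshold=2)
area = auc(scores, failed)
print(sens, spec, area)
```

For these hypothetical data, a threshold of 2 yields a sensitivity of 1.0 and a specificity of 0.5, and the AUC is 0.75.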
Results
Baseline characteristics and repair outcomes were similar between the derivation and validation cohorts (Table 2). The proportions of successful fistula closure at 3 months were 81.5% and 82.0% in the derivation and validation cohorts, respectively.
Variable | Total | Derivation cohort | Validation cohort |
---|---|---|---|
Total, n (%) | 1274 (100) | 637 (100) | 637 (100) |
Rural residence, n (%) | 1088 (86.1) | 546 (86.4) | 542 (85.8) |
Mean age, y a | 28.2 ± 11.0 | 28.2 ± 11.1 | 28.1 ± 11.0 |
≥Primary education, n (%) | 267 (21.0) | 120 (18.9) | 147 (23.1) |
Years with fistula a | 3.3 ± 5.5 | 3.4 ± 5.6 | 3.2 ± 5.4 b |
Previous repair | 294 (23.1) | 149 (23.4) | 145 (22.9) |
Type of fistula, n (%) | |||
Vesicovaginal fistula only | 1229 (97.1) | 622 (98.3) | 607 (95.9) c |
Rectovaginal fistula and vesicovaginal fistula | 37 (2.9) | 11 (1.7) | 26 (4.1) |
Current marital status, n (%) | |||
Single | 23 (1.8) | 10 (1.6) | 13 (2.1) |
Married/as if married | 830 (66.1) | 403 (64.4) | 427 (67.8) |
Widowed | 61 (4.9) | 34 (5.4) | 27 (4.3) |
Divorced or separated | 341 (27.1) | 178 (28.4) | 163 (25.9) |
Other | 1 (0.1) | 1 (0.2) | 0 |
Parity a | 3.4 ± 2.9 | 3.3 ± 2.9 | 3.4 ± 2.9 |
Commodities in residence, n (%) | |||
Piped water | 288 (22.7) | 129 (20.3) | 159 (25.0) c |
Flush toilet | 46 (3.6) | 24 (3.8) | 22 (3.5) |
Electricity | 256 (20.1) | 119 (18.7) | 137 (21.5) |
Radio | 881 (69.2) | 438 (68.8) | 443 (69.5) |
TV | 199 (15.7) | 94 (14.8) | 105 (16.5) |
Mobile phone | 457 (36.0) | 221 (34.7) | 236 (37.2) |
Land-line phone | 24 (1.9) | 12 (1.9) | 12 (1.9) |
Refrigerator | 49 (3.9) | 22 (3.5) | 27 (4.2) |
Current ability to meet basic needs, n (%) | |||
Easily meet needs | 327 (25.8) | 153 (24.2) | 174 (27.4) |
Somewhat meet needs | 660 (52.1) | 336 (53.1) | 324 (51.0) |
Barely satisfy needs | 281 (22.2) | 144 (22.7) | 137 (21.6) |
Closed at discharge, n (%) | 1058 (84.7) | 534 (85.6) | 524 (84.3) |
Closed at 3 mo, n (%) | 1041 (81.6) | 519 (81.5) | 522 (82.0) |