Objective
The objective of the study was to externally validate and assess the robustness of 2 nomograms designed to predict the probability of lymphatic dissemination (LD) for patients with early-stage endometrioid endometrial cancer.
Study Design
Using a prospective multicenter database, we assessed the discrimination, calibration, and clinical utility of 2 nomograms in patients with surgically treated early-stage endometrioid endometrial cancer.
Results
Among the 322 eligible patients identified, the overall LD rate was 9.9% (32 of 322). Predictive accuracy according to discrimination was 0.65 (95% confidence interval, 0.61–0.69) for the full nomogram and 0.71 (95% confidence interval, 0.68–0.74) for the alternative nomogram. The correspondence between the observed LD rate and the nomogram predictions suggests moderate calibration of the nomograms in the validation cohort.
Conclusion
The nomograms were externally validated and shown to be partly generalizable to a new and independent patient population. Although these tools provide a more individualized estimation of LD, additional parameters are needed to allow higher accuracy for counseling patients in clinical practice.
Endometrial cancer (EC) is the most common gynecological malignancy in developed countries. Most patients are diagnosed with early-stage endometrioid EC, for which overall survival is between 85% and 91% for stage I.
The primary surgical treatment involves removal of the uterus, tubes, and ovaries. Indications for systematic lymphadenectomy remain a matter of debate. However, in the last decade, lymphatic dissemination (LD) status has progressively gained importance as a major prognostic marker for recurrence and survival in patients with early-stage EC. The precise quantification of the risk of LD is therefore key to defining an accurate and evidence-based algorithm for adjuvant therapies. Although sentinel lymph node (SLN) biopsy and sophisticated imaging techniques (eg, positron emission tomography/computed tomography) are less invasive ways of assessing LD status than lymphadenectomy, their predictive accuracy is either still under evaluation or has already been shown to be more limited.
For several years, researchers have proposed cancer-related prognostic markers for LD such as depth of myometrial invasion, histological grade and type, and lymphovascular space involvement (LVSI). In practice, none of these characteristics accurately identifies either the risk of LD or even a subset of patients for whom systematic lymphadenectomy could be unnecessary.
A complementary approach based on individualized prediction models such as the nomogram has recently been introduced to help patients make informed decisions about the benefits and risks of treatment. In this area of research, AlHilli et al developed 2 nomograms in patients with surgically treated stages I–IV endometrioid EC to predict the probability of LD. These nomograms are built on clinicopathological parameters and were validated internally using bootstrapping methods. Hence, an external validation on an independent set of patients was required to ensure applicability to patients from different institutions.
The aim of this prospective multicenter database study was therefore to externally validate these recently introduced nomograms predicting the probability of LD in patients with endometrioid early-stage EC.
Materials and Methods
Study population
Data of all patients with apparent early-stage EC who received primary surgical treatment between January 2007 and December 2012 were abstracted from 4 institutions in France with maintained EC databases (Tenon University Hospital, Reims University Hospital, Dijon Cancer Center, and Creteil hospital) and from the Senti-Endo trial applying the same inclusion criteria as the study by AlHilli et al.
To be included for validation analysis, the patients had to have an endometrioid EC and all nomogram variables documented. Patients with histologically proven EC were staged on the basis of final pathological findings according to the 2009 International Federation of Gynecology and Obstetrics (FIGO) classification.
Clinical and pathological variables included patient age, body mass index (calculated as weight in kilograms divided by the square of height in meters), surgical procedure, 2009 FIGO stage, and final pathological analysis (histology type and grade, depth of myometrial invasion, tumor diameter, and LVSI status). A tumor was considered LVSI positive when tumor emboli were found within a space clearly lined by endothelial cells.
Adjuvant therapy was administered according to both the decisions of multidisciplinary committees and international guidelines. The research protocol was approved by the Consultative Committee for Protection of Persons in Biomedical Research of Paris 6 (France).
The nomograms of AlHilli et al
Patients who underwent primary surgery for endometrioid EC between Jan. 1, 1999, and Dec. 31, 2008, were considered for inclusion in the study by AlHilli et al. The primary outcome measure was the presence of pelvic and/or paraaortic (P/PA) lymphatic dissemination. This was defined as (1) positive P/PA lymph nodes when P/PA lymphadenectomy had been performed or (2) P/PA lymph node recurrence after negative lymphadenectomy or when P/PA lymphadenectomy had not been performed.
The nomograms included the following covariates: FIGO stage, histological grade, LVSI status, cervical stromal invasion, and depth of myometrial invasion, with or without primary tumor diameter (TD). Two final models were considered: a full model including all of the variables that were significant on univariate analysis and an alternative model that does not take TD into account and would be useful for patients with unknown TD.
Validation
The discrimination, calibration accuracy, average (E aver) and maximal (E max) errors, and clinical utility of both nomograms were assessed. Discrimination is the ability to differentiate between patients with LD and those without. It is measured using the receiver-operating characteristic curve and summarized by the area under the curve (AUC). An AUC of 1.0 indicates perfect concordance, whereas an AUC of 0.5 indicates no relationship.
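As an illustration of the definition above, the AUC can be computed as the proportion of (LD, no-LD) patient pairs in which the patient with LD received the higher predicted probability, with ties counted as one half. The following is a minimal Python sketch under that assumption; it is not the code used in the study (which was run in R), and the function name `auc` and the 6 example patients are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    scores: predicted probabilities of LD, one per patient.
    labels: 1 if LD was observed, 0 otherwise.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one patient in each outcome group")
    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0   # LD patient ranked above non-LD patient
            elif p == n:
                concordant += 0.5   # tie counts as half
    return concordant / (len(pos) * len(neg))


# Hypothetical predicted probabilities and outcomes (1 = LD, 0 = no LD).
predicted = [0.05, 0.10, 0.20, 0.30, 0.40, 0.60]
observed = [0, 0, 1, 0, 1, 1]
example_auc = auc(predicted, observed)  # 8 of 9 pairs are concordant
```

In this toy example, 8 of the 9 possible pairs are correctly ordered, giving an AUC of about 0.89. The pairwise loop is quadratic in the number of patients, which is acceptable for a cohort of a few hundred.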
Calibration is the agreement between the frequency of the observed outcome and the predicted probabilities. It was studied using calibration curves, that is, graphical representations of the relationship between observed frequencies and predicted probabilities, for each of the 2 nomograms. Average (E aver) and maximal (E max) errors quantify the differences between predictions and observations obtained from a calibration curve. Although discrimination is a popular evaluation criterion, it does not reflect the accuracy of a model, and its clinical significance is poor. In contrast, the clinical significance of calibration is high: it reflects the accuracy of individual predictions. In addition, patients were clustered into deciles according to their nomogram score. For each decile group, we calculated the difference between the predicted and the observed LD probability. A subgroup analysis was performed according to the European Society for Medical Oncology (ESMO) risk stratifications.
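The group-wise comparison described above can be sketched as follows. This is a simplified Python illustration, not the study's R code: it assumes that patients are sorted by predicted probability and split into groups of roughly equal size, and it summarizes the group errors by their mean and maximum as a stand-in for E aver and E max (the text does not state how the calibration curve itself was estimated). The function names and the 8 example patients are hypothetical.

```python
def grouped_calibration(predicted, observed, n_groups=10):
    """Sort patients by predicted risk, split them into groups, and compare
    the mean predicted probability with the observed event rate per group.

    Returns a list of (group size, mean predicted, observed rate, abs error).
    """
    pairs = sorted(zip(predicted, observed))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue  # more groups than patients
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        obs_rate = sum(y for _, y in chunk) / len(chunk)
        rows.append((len(chunk), mean_pred, obs_rate, abs(mean_pred - obs_rate)))
    return rows


def error_summary(rows):
    """Mean and maximum absolute error across groups (unweighted)."""
    errors = [row[3] for row in rows]
    return sum(errors) / len(errors), max(errors)


# Hypothetical cohort: 4 low-risk and 4 higher-risk patients (1 = LD).
predicted = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5]
observed = [0, 0, 0, 1, 1, 1, 0, 0]
rows = grouped_calibration(predicted, observed, n_groups=2)
mean_error, max_error = error_summary(rows)
```

Here the low-risk group has a predicted probability of 10% against an observed rate of 25%, while the higher-risk group is perfectly calibrated at 50%, so the mean and maximum errors are 7.5% and 15%, respectively.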
Other statistical analyses
Statistical analysis was based on the χ² test or Fisher exact test, as appropriate, for categorical variables. Values of P < .05 were considered to denote significant differences. Data were managed with an Excel database (Microsoft, Redmond, WA) and analyzed using R 2.15 software, available online (http://cran.r-project.org/).
Results
During the study period, 650 patients with EC were documented as having received primary surgical treatment. Among them, 322 patients with endometrioid EC who fulfilled the inclusion criteria of AlHilli et al were selected for validation analysis according to the following distribution: Dijon cancer center (n = 111; 34.5%), Tenon University Hospital (n = 67; 20.8%), Reims University Hospital (n = 71; 22.0%), and Senti-Endo trial (n = 73; 22.7%).
The demographics and clinicopathological characteristics of both the AlHilli cohort and the current validation cohort are reported in Table 1. The median follow-up time was 27 (range, 1–151) months. The overall LD and recurrence rates were 9.9% and 9.6%, respectively. Both cohorts were mainly composed of early-stage EC. The validation cohort had significantly higher rates of grade 2 and 3 tumors and of cervical stromal invasion. Additional differences included higher rates of lymph node dissection and of adjuvant treatment in the validation cohort (65.5% vs 59.0% and 73.0% vs 24.1%, respectively) (Table 1).
Parameters | AlHilli et al, n, % (n = 883) | Validation cohort, n, % (n = 322) | P value |
---|---|---|---|
Age at surgery, y, mean (SD) | 63.6 (11.3) a; 64.2 (12.7) b | 65.7 (10.6) a; 66.8 (14.9) b | — |
BMI, kg/m², mean (SD) | 34.2 (9.7) a; 33.0 (9.9) b | 29.5 (15.8) a; 26.9 (7.9) b | — |
Histological grade | |||
1 | 538 (60.9) | 172 (53.4) | |
2 | 271 (30.7) | 109 (33.8) | |
3 | 74 (8.4) | 41 (12.8) | .02 |
Primary tumor diameter | |||
≤2 cm | 313 (35.5) | 92 (28.5) | |
>2 cm | 570 (64.5) | 230 (71.5) | .02 |
Myometrial invasion, %, mean (SD) | 20.0 (23.0) | 40.24 (14.88) | — |
Cervical stromal invasion | |||
No | 858 (97.1) | 297 (92.2) | |
Yes | 25 (2.9) | 25 (7.8) | < .001 |
Lymphovascular space invasion | |||
No | 794 (89.9) | 242 (75.1) | |
Yes | 89 (10.1) | 80 (24.8) | < .001 |
FIGO stage | |||
I | 801 (90.7) | 239 (74.2) | |
II | 20 (2.3) | 25 (7.8) | |
III | 58 (6.6) | 53 (16.5) | |
IV | 4 (0.4) | 5 (1.5) | < .001 |
Nodal staging | |||
Nodal staging (P/PA) | 521 (59.0) | 211 (65.5) | |
SLN biopsy | — | 61 (28.9) | — |
Lymphatic dissemination | |||
Overall lymphatic dissemination rate | 7.6% (67/883) | 9.9% (32/322) | — |
Positive P/PA nodes among those with a LND or SLN biopsy | 10.9% (57/521) | 14.7% (31/211) | — |
P/PA recurrence in those without LND or who had negative LND | 1.2% (10/826) | 0.9% (1/111) | — |
Adjuvant therapy | |||
No adjuvant therapy | 671 (75.9) | 87 (27.0) | |
EBRT ± brachytherapy | 137 (15.6) | 43 (13.3) | |
Brachytherapy | — | 118 (36.7) | |
Chemotherapy | 18 (2.1) | 5 (1.6) | |
Hormonal therapy | — | 1 (0.3) | |
Multimodal therapy | 26 (2.9) | 68 (21.1) | |
NA | 31 (3.5) | 0 (0) | — |
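To make the categorical comparisons in Table 1 concrete, the following Python sketch applies a Pearson χ² test to the cervical stromal invasion counts (858 vs 25 in the AlHilli cohort, 297 vs 25 in the validation cohort). It is an illustration only: the study used R, the text does not say whether a continuity correction was applied (none is used here), and the function name `chi_square_2x2` is hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (1 degree of freedom, no continuity
    correction) for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value).
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Cervical stromal invasion (no, yes): AlHilli cohort, then validation cohort.
stat, p = chi_square_2x2(858, 25, 297, 25)
```

With these counts the statistic is about 14.4 and the P value is below .001, in line with the value reported in Table 1. The smallest expected cell count is about 13, so the χ² approximation is reasonable here; the Fisher exact test would typically be preferred when expected counts are small.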
Validation
AUCs were 0.65 (95% confidence interval [CI], 0.61–0.69) and 0.71 (95% CI, 0.68–0.74) for the full nomogram and the alternative nomogram without TD, respectively (Figure 1). The predicted and the actual probabilities of LD are shown in the calibration plot (Figure 2, A and B). The performance of both nomograms appears to be partly inaccurate, with mean errors of 3.9% and 9.7%, respectively, in the whole population according to the decile of risk stratification (Table 2). The performance also appears to be heterogeneous across ESMO risk groups: the subgroup analysis shows acceptable discrimination but poor calibration accuracy in high-risk patients (Table 3).
Decile | Full nomogram: patients, n | Full nomogram: nomogram score | Full nomogram: predicted probability | Full nomogram: observed probability | Full nomogram: error between predicted and observed probability | Alternative nomogram without TD: patients, n | Alternative nomogram without TD: nomogram score | Alternative nomogram without TD: predicted probability | Alternative nomogram without TD: observed probability | Alternative nomogram without TD: error between predicted and observed probability |
---|---|---|---|---|---|---|---|---|---|---|
Whole population | 322 | — | 13.8% | 9.9% | 3.9% | 322 | — | 19.6% | 9.9% | 9.7% |
I | 61 | <45 | 0.9% | 4.9% | 4.0% | 93 | <62 | 4.9% | 3.5% | 1.4% |
II | 25 | 45–64 | 1.9% | 4.0% | 2.1% | 29 | 62–65 | 10% | 0% | 10% |
III | 63 | 65–86 | 4.7% | 7.9% | 3.2% | 20 | 66–79 | 13% | 10% | 3% |
IV | 31 | 87–97 | 5.0% | 3.2% | 1.8% | 26 | 80–81 | 15% | 15.3% | 0.3% |
V | 11 | 98–109 | 8.6% | 18.1% | 9.5% | 25 | 82–95 | 23.8% | 10.2% | 13.6% |
VI | 28 | 110–124 | 15.0% | 10.7% | 4.3% | 27 | 96–97 | 30% | 0% | 30% |
VII | 27 | 125–132 | 24.4% | 14.8% | 9.6% | 24 | 98–101 | 30% | 16.6% | 13.4% |
VIII | 42 | 133–149 | 27.5% | 16.6% | 10.9% | 30 | 102–111 | 38.5% | 26.6% | 11.9% |
IX | 14 | 150–156 | 37.1% | 0% | 37.1% | 25 | 112–117 | 40% | 0% | 40% |
X | 20 | ≥157 | 52.2% | 30% | 22.2% | 23 | ≥118 | 57.7% | 26.0% | 31.7% |
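The whole-population row of Table 2 appears to be consistent with a patient-weighted average of the decile rows. The short Python check below recomputes it for the full nomogram from the tabulated values; the weighting scheme is an assumption inferred from the numbers rather than a method stated in the text, and the variable names are illustrative.

```python
# (patients, predicted %, observed %) for the full nomogram, deciles I to X,
# copied from Table 2.
full_nomogram = [
    (61, 0.9, 4.9), (25, 1.9, 4.0), (63, 4.7, 7.9), (31, 5.0, 3.2),
    (11, 8.6, 18.1), (28, 15.0, 10.7), (27, 24.4, 14.8), (42, 27.5, 16.6),
    (14, 37.1, 0.0), (20, 52.2, 30.0),
]

total = sum(n for n, _, _ in full_nomogram)

# Patient-weighted mean predicted and observed LD probabilities (in %).
weighted_predicted = sum(n * p for n, p, _ in full_nomogram) / total
weighted_observed = sum(n * o for n, _, o in full_nomogram) / total

# Whole-population error and the largest single-group error (in %).
overall_error = abs(weighted_predicted - weighted_observed)
largest_group_error = max(abs(p - o) for _, p, o in full_nomogram)
```

Rounded to one decimal place, this gives 13.8% predicted, 9.9% observed, and an error of 3.9%, matching the whole-population row, while the largest group error (37.1%) occurs in decile IX, where no LD was observed among 14 patients.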