Validation of a second-generation multivariate index assay for malignancy risk of adnexal masses




Background


Women with adnexal mass suspected of ovarian malignancy are likely to benefit from consultation with a gynecologic oncologist, but imaging and biomarker tools to ensure this referral show low sensitivity and may miss cancer at critical stages.


Objective


The multivariate index assay (MIA) was designed to improve the detection of ovarian cancer among women undergoing surgery for a pelvic mass. To improve the prediction of benign masses, we undertook the redesign and validation of a second-generation MIA (MIA2G).


Study Design


MIA2G was developed using banked serum samples from a previously published prospective, multisite registry of patients who underwent surgery to remove an adnexal mass. Clinical validity was then established using banked serum samples from the OVA500 trial, a second prospective cohort of adnexal surgery patients. Based on the final pathology results of the OVA500 trial, this intended-use population for MIA2G testing was high risk, with an observed cancer prevalence of 18.7% (92/493). Coded samples were assayed for MIA2G biomarkers by an external clinical laboratory. Then MIA2G results were calculated and submitted to a clinical statistics contract organization for decoding and comparison to MIA results for each subject. Sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated, among other measures, and stratified by menopausal status, stage, and histologic subtype.


Results


Three MIA markers (cancer antigen 125, transferrin, and apolipoprotein A-1) and 2 new biomarkers (follicle-stimulating hormone and human epididymis protein 4) were included in MIA2G. A single cut-off separated high and low risk of malignancy regardless of patient menopausal status, eliminating potential for confusion or error. MIA2G specificity (69%, 277/401 [n/N]; 95% confidence interval [CI], 64.4–73.4%) and PPV (40%, 84/208; 95% CI, 33.9–47.2%) were significantly improved over MIA (specificity, 54%, 215/401; 95% CI, 48.7–58.4%, and PPV, 31%, 85/271; 95% CI, 26.1–37.1%, respectively) in this cohort. Sensitivity and NPV were not significantly different between the 2 tests. When combined with physician assessment, MIA2G correctly identified 75% of the malignancies missed by physician assessment alone.


Conclusion


MIA2G specificity and PPV were significantly improved compared with MIA, while sensitivity and NPV were unchanged. The second-generation test significantly improved the predicted efficiency of triage vs MIA without sacrificing high sensitivity and NPV, which are essential for effectiveness.


Introduction


The number of women diagnosed annually with adnexal mass far exceeds the number of ovarian cancer cases, posing a serious clinical challenge to accurately identify the subgroup of patients most likely to benefit from consultation with a gynecologic oncologist. Although numerous prediction models and referral guidelines have been tested in the preoperative evaluation of the adnexal mass, no single method has received widespread acceptance. In addition, numerous studies indicate that the majority of new ovarian cancer cases fail to be appropriately referred or treated at the time of first surgery, with significant detrimental effects on 5-year survival.


To improve the presurgical detection of ovarian cancer among women undergoing removal of adnexal masses, the multivariate index assay (MIA) OVA1 was developed. The test was cleared by the Food and Drug Administration (FDA) in 2009 for presurgical risk assessment limited to cases where adnexal excision is warranted, and validated in 2 previously published clinical trials. The MIA combined the results of the biomarker concentrations from the Siemens BNII assays (Siemens, Malvern, PA) for apolipoprotein A-1, transthyretin, beta-2-microglobulin, and transferrin (TRF) and the Elecsys assay for cancer antigen 125 (CA125-II) (Roche, Indianapolis, IN). In an intended-use clinical cohort (adnexal surgery patients enrolled from nongynecologic oncology practices), MIA showed significantly higher sensitivity for predicting malignancy compared with clinical impression, CA125-II, or modified American Congress of Obstetricians and Gynecologists (ACOG) criteria of Dearking et al, and negative predictive value (NPV) ranging from 95-98%. Limitations of the MIA assay, however, included a less than ideal specificity of 43-50%, as a consequence of a high false-positive rate resulting in a positive predictive value (PPV) of 30-42% in the cohorts examined. These results predict that many patients with benign masses may be classified as high risk, reducing overall triage effectiveness. The current study was undertaken to evaluate the clinical validity of a second-generation MIA (MIA2G), in which the MIA panel was redesigned to improve specificity and PPV of the assay while maintaining high sensitivity and NPV.


Herein we report that the MIA2G assay improves both specificity and PPV relative to MIA without compromising sensitivity, NPV, or detection of early-stage ovarian cancer.




Materials and Methods


The MIA2G algorithm was derived from samples from an intended use cohort described by Ueland et al. This proprietary algorithm was derived from the serum proteins apolipoprotein A-1, CA125-II, human epididymis protein 4 (HE4), follicle-stimulating hormone (FSH), and TRF using methods described by the coauthors.


For the validation study, archived serum samples from an independent, prospectively collected set of specimens (the OVA500 study) were used. This study cohort had the same enrollment criteria as the OVA1 pivotal study and had been previously evaluated for the first-generation serum biomarker MIA as part of an independent verification of MIA. Consecutive patients who met inclusion criteria were prospectively enrolled from 27 sites throughout the United States, all of which had institutional review board approval. All enrolling clinicians were from nongynecologic oncology specialty practices, although patients may have had consultation with or undergone surgery by a gynecologic oncologist. Inclusion criteria were: women age ≥18 years, signed informed consent, agreeable to phlebotomy, and documented pelvic mass planned for surgical intervention within 3 months of imaging. A pelvic mass was confirmed by imaging (computed tomography, ultrasonography, or magnetic resonance imaging) prior to enrollment. Exclusion criteria included a diagnosis of malignancy in the previous 5 years (except for nonmelanoma skin cancers) or enrollment by a gynecologic oncologist. Menopause was defined as the absence of menses for ≥12 months or age ≥50 years. Demographic and clinicopathologic information was collected on case report forms.


A preoperative blood sample of ≤80 mL was processed within 1-6 hours of collection, and serum was frozen at the collection site. Serum samples were shipped on dry ice to an archive site (PrecisionMed Inc, Solana Beach, CA) where they were thawed and aliquoted, then frozen and stored at –65 to –85°C. All aliquots were thawed only once and consumed entirely during testing, so that every sample underwent exactly 2 freeze-thaw cycles. Serum biomarker concentrations were determined on the Roche cobas 6000 clinical analyzer, utilizing the c501 and e601 modules. The c501 module is a medium-throughput (up to 600 samples/h) photometric detection module used for clinical chemistry applications, homogeneous immunoassays, and whole blood measurement. The e601 module is a medium-throughput electrochemiluminescent detection module used for heterogeneous immunoassays. Biomarker assays were run according to the manufacturer’s instructions. All measurements were performed on coded samples (blinded as to patient demographics or pathology outcome) at the Clinical Laboratory Improvement Amendments–/College of American Pathologists–certified laboratory of the Division of Clinical Chemistry, Department of Pathology, Johns Hopkins Medical Institution. The MIA2G combines the results of the biomarker concentrations from the cobas assays for apolipoprotein A-1, CA125-II, HE4, FSH, and TRF. Assays for apolipoprotein A-1 and TRF are immunoturbidimetric assays; CA125-II, HE4, and FSH assays use electrochemiluminescent detection. Package inserts for these assays indicate a maximum coefficient of variation of 1.1-2.8% for repeatability and 2.5-4.5% for intermediate precision for serum samples on these individual biomarker assays.


The MIA2G risk score was calculated using software (OvaCalc, Version 4.0.0, Vermillion, Inc., Austin, TX) that uses the 5 biomarker values and a proprietary algorithm to return a dimensionless numerical score from 0.0-10.0. While MIA was optimized using different cut-offs of ≥5.0 (premenopausal) and ≥4.4 (postmenopausal) to separate higher- from lower-risk subjects, MIA2G uses a single risk cut-off of ≥5.0 regardless of menopausal status.
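The difference between the two cut-off schemes can be illustrated with a minimal sketch. Only the thresholds and the 0.0-10.0 score range are taken from the text; the function names are hypothetical, and the proprietary algorithm that produces the score itself is not reproduced here.

```python
# Illustrative only: thresholds come from the text above; the function
# names are hypothetical and the score is assumed to be already computed.

SCORE_MIN, SCORE_MAX = 0.0, 10.0


def _check_score(score: float) -> None:
    """Reject scores outside the documented 0.0-10.0 range."""
    if not SCORE_MIN <= score <= SCORE_MAX:
        raise ValueError(f"score {score} outside {SCORE_MIN}-{SCORE_MAX}")


def mia_high_risk(score: float, postmenopausal: bool) -> bool:
    """First-generation MIA: the cut-off depends on menopausal status."""
    _check_score(score)
    cutoff = 4.4 if postmenopausal else 5.0
    return score >= cutoff


def mia2g_high_risk(score: float) -> bool:
    """MIA2G: a single cut-off of >=5.0 regardless of menopausal status."""
    _check_score(score)
    return score >= 5.0
```

With a score of 4.5, for example, the first-generation rule returns a different answer for premenopausal and postmenopausal patients, whereas the MIA2G rule needs no menopausal status at all.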


In the original OVA500 trial, clinicians were required to document the results of physical examination, family history, imaging, laboratory tests, and formal presurgical assessment of malignancy. In cases where a formal assessment was done by a clinician other than the enrolling physician, the referral history and the specialty of the clinician who made the prediction were recorded, as was the specialty of the surgeon who ultimately operated on each patient. To reflect routine clinical judgment and referral behavior, physicians were not asked to either follow any specific prediction algorithm or justify their prediction. The same prospectively obtained physician assessment (PA) prediction of malignancy was utilized in the present trial, to compare routine clinical assessment to MIA2G independently and in combination with PA. Postoperative pathology diagnosis was recorded at each enrolling site and independently reviewed.


The coded sample MIA2G scores and biomarker values were submitted to an independent clinical statistics contract research organization, Applied Clinical Intelligence (Bala Cynwyd, PA), where they were matched by a subject identifier to the information from the case report form and used for statistical analyses. Clinical diagnostic performance criteria (sensitivity, specificity, NPV, and PPV) were calculated for MIA2G alone, PA alone, and MIA2G + PA.


Triage decision rules followed those previously published for MIA. The combined test result was declared positive when the patient had either a high-risk MIA2G score or the presurgical PA predicted a malignancy. Accordingly, MIA2G + PA was scored negative only when both MIA2G and PA predicted a benign outcome. A subset of the OVA500 cohort meeting study inclusion/exclusion criteria had MIA scores generated as part of a previously published study, and these data were used for comparison (N = 493).
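The combined rule described above is a logical OR of the two predictions. A minimal sketch, assuming the MIA2G score is already available and PA is recorded as a yes/no prediction of malignancy (the function and parameter names are hypothetical):

```python
MIA2G_CUTOFF = 5.0  # single high-risk cut-off stated in the Methods


def combined_positive(mia2g_score: float, pa_predicts_malignancy: bool) -> bool:
    """MIA2G + PA is positive if either component predicts malignancy.

    It is negative only when MIA2G is low risk AND the physician
    assessment predicts a benign outcome.
    """
    mia2g_high_risk = mia2g_score >= MIA2G_CUTOFF
    return mia2g_high_risk or pa_predicts_malignancy
```

Because the rule is an OR, the combined test can never be less sensitive than either component alone; the trade-off is that its specificity can never be higher than that of either component.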


Statistical analyses were performed with software (SAS, Version 9.2 or later; SAS Institute Inc, Cary, NC). Sensitivity was defined as: (true test positives/all subjects with identified malignancy) × 100. Specificity was defined as: (true test negatives/all subjects without identified malignancy) × 100. PPV was defined as: (true test positives/all test positive subjects) × 100. NPV was defined as: (true test negatives/all test negative subjects) × 100. Overall accuracy was defined as: ([true test positives + true test negatives]/all subjects) × 100. These clinical performance measures are presented as a percentage score, followed by the numbers of subjects that define the measure, followed by the 95% confidence interval (CI) of the measure. Wilson score-corrected 95% CI was used throughout to provide better estimates for smaller subgroups. The McNemar test was used to test the marginal homogeneity of the proportions of true positives (negatives) as identified by pairs of diagnostic tests under consideration. Additionally, statistics included CI for the differences between the risk indices taking into account the correlated nature of the indices, receiver operating characteristic (ROC) curve, ROC area under the curve (AUC), and their corresponding CIs. The P value from a test of the equality of the areas under the empirical ROC curves is presented (using methods indicated by DeLong et al as implemented in SAS, Version 9.4 software). Comparisons were considered significant if the lower bound of the CI for a difference comparison was >0 or the lower bound for a ratio comparison was >1. A total of approximately 500 evaluable subjects in the validation subset with an assumed prevalence of 20% would provide 95% 2-tailed CIs for estimates of sensitivity (%) and specificity (%) within ±7% and ±5% (absolute), respectively, using the defined cut-off value and assuming comparable levels of sensitivity and specificity seen in previous studies. Under the same assumption, the 95% 2-tailed CI would be at ±4.0% for PPV (%) and ±2.0% for NPV (%).
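The performance measures and their intervals can be sketched as follows. This is an illustration rather than the SAS code used in the study. It assumes the plain (uncorrected) Wilson score interval with z ≈ 1.96. The 2 × 2 counts are derived from figures reported in the abstract (92 malignant and 401 benign subjects, specificity 277/401, PPV 84/208) and are not quoted from a results table. The function names are hypothetical.

```python
import math
from typing import Dict, Tuple


def wilson_ci(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (as fractions)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


def measure(successes: int, total: int) -> Tuple[float, float, float]:
    """Return (percent, lower 95% bound, upper 95% bound), all in percent."""
    lo, hi = wilson_ci(successes, total)
    return 100 * successes / total, 100 * lo, 100 * hi


def diagnostic_summary(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Apply the definitions given in the text to a 2 x 2 table."""
    return {
        "sensitivity": measure(tp, tp + fn),
        "specificity": measure(tn, tn + fp),
        "ppv": measure(tp, tp + fp),
        "npv": measure(tn, tn + fn),
        "accuracy": measure(tp + tn, tp + fp + fn + tn),
    }


# Counts derived from the abstract (an assumption of this sketch):
# TP = 84, FP = 208 - 84 = 124, TN = 277, FN = 92 - 84 = 8.
mia2g = diagnostic_summary(tp=84, fp=124, fn=8, tn=277)
for name, (pct, lo, hi) in mia2g.items():
    print(f"{name}: {pct:.1f}% (95% CI, {lo:.1f}-{hi:.1f}%)")
```

Under these assumptions the specificity and PPV intervals agree with those reported in the abstract (64.4–73.4% and 33.9–47.2%), which suggests that the reported intervals are consistent with the uncorrected Wilson score method.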








Results


This study followed a Prospective Specimen Collection Retrospective Blinded Evaluation design using clinically annotated sera collected for the OVA500 trial. From August 2010 through December 2011, a total of 520 subjects were consecutively enrolled, all of whom provided a specimen. One subject was found to have been enrolled twice, leaving 519 for analysis. Subjects were excluded from the final analysis for the following reasons: failure to meet eligibility criteria (imaging outside of window prior to inclusion, surgery >12 weeks, previous cancer <5 years; n = 12), primary contact was a gynecologic oncologist (n = 6), and no ovarian pathology (n = 8). The remaining 493 fully evaluable patients were scored after MIA2G testing. The demographic characteristics of the subject cohort are presented in Table 1. All 493 subjects had a nongynecologic oncologist as their primary contact. The specialty of physicians making the clinical assessment was a nongynecologic oncologist in 249 patients and a gynecologic oncologist in 244 patients.



Table 1

Demographics and pathology of OVA500 clinical cohort

| Characteristic | All enrolled subjects (N = 519) | All evaluable subjects (N = 493) | Evaluable premenopausal women (N = 276) | Evaluable postmenopausal women (N = 217) |
|---|---|---|---|---|
| Age, y | | | | |
| N | 519 | 493 | 276 | 217 |
| Mean (SD) | 48.4 (14.32) | 48.6 (14.16) | 39.5 (8.96) | 60.2 (10.74) |
| Median | 47 | 48 | 41 | 60 |
| Range (minimum, maximum) | 18, 87 | 18, 87 | 18, 60 | 33, 87 |
| Ethnicity/race, n (%) | | | | |
| Asian | 13 (2.5) | 13 (2.6) | 8 (2.9) | 5 (2.3) |
| Black or African American | 86 (16.6) | 81 (16.4) | 54 (19.6) | 27 (12.4) |
| Native Hawaiian/Pacific islander | 1 (0.2) | 1 (0.2) | 1 (0.4) | 0 (0.0) |
| White | 365 (70.3) | 347 (70.4) | 173 (62.7) | 174 (80.2) |
| Other | 5 (1.0) | 5 (1.0) | 4 (1.4) | 1 (0.5) |
| Hispanic or Latino | 49 (9.4) | 46 (9.3) | 36 (13.0) | 10 (4.6) |
| Surgery performed, n (%) | | | | |
| Yes | 511 (98.5) | 493 (100.0) | 276 (100.0) | 217 (100.0) |
| No | 8 (1.5) | 0 (0.0) | 0 (0.0) | 0 (0.0) |
| Time to surgery, wk | | | | |
| N | 511 | 493 | 276 | 217 |
| Mean (SD) | 2.1 (2.19) | 2.0 (1.72) | 1.9 (1.68) | 2.1 (1.76) |
| Median | 1 | 1 | 1 | 2 |
| Range (minimum, maximum) | 0, 24 | 0, 11 | 0, 10 | 0, 11 |
| Specialty of surgeon, n (%) | | | | |
| Obstetrics/gynecology | 212 (40.8) | 204 (41.4) | 144 (52.2) | 60 (27.6) |
| Gynecological oncology | 299 (57.6) | 289 (58.6) | 132 (47.8) | 157 (72.4) |
| Pathology diagnosis, n (%) | | | | |
| Benign ovarian tumor | 415 (80.0) | 401 (81.3) | 245 (88.8) | 156 (71.9) |
| Nonovarian primary malignancy with no involvement of ovaries | 4 (0.8) | 4 (0.8) | 1 (0.4) | 3 (1.4) |
| Nonovarian primary malignancy with involvement of ovaries | 6 (1.2) | 6 (1.2) | 2 (0.7) | 4 (1.8) |
| Low malignant potential (borderline) | 17 (3.3) | 17 (3.4) | 5 (1.8) | 12 (5.5) |
| Primary ovarian malignancy | 69 (13.3) | 65 (13.2) | 23 (8.3) | 42 (19.4) |
| Predominant histology of primary ovarian malignancy, n (%) | | | | |
| Epithelial: serous | 24 (4.6) | 24 (4.9) | 8 (2.9) | 16 (7.4) |
| Epithelial: mucinous | 12 (2.3) | 9 (1.8) | 1 (0.4) | 8 (3.7) |
| Epithelial: endometrioid | 13 (2.5) | 13 (2.6) | 5 (1.8) | 8 (3.7) |
| Epithelial: clear cell | 5 (1.0) | 5 (1.0) | 1 (0.4) | 4 (1.8) |
| Epithelial: carcinosarcoma | 1 (0.2) | 1 (0.2) | 1 (0.4) | 0 (0.0) |
| Epithelial: mixed | 2 (0.4) | 1 (0.2) | 0 (0.0) | 1 (0.5) |
| Other | 12 (2.3) | 12 (2.4) | 7 (2.5) | 5 (2.3) |
| Tumor grade, n (%) | | | | |
| N | 69 | 65 | 23 | 42 |
| Grade 1 | 9 (13.0) | 9 (13.8) | 1 (4.3) | 8 (19.0) |
| Grade 2 | 12 (17.4) | 10 (15.4) | 4 (17.4) | 6 (14.3) |
| Grade 3 | 43 (62.3) | 41 (63.1) | 15 (65.2) | 26 (61.9) |
| Not graded | 5 (7.2) | 5 (7.7) | 3 (13.0) | 2 (4.8) |
| Tumor stage, n (%) | | | | |
| N | 69 | 65 | 23 | 42 |
| Stage I | 30 (43.5) | 28 (43.1) | 9 (39.1) | 19 (45.2) |
| Stage II | 7 (10.1) | 7 (10.8) | 2 (8.7) | 5 (11.9) |
| Stage III | 26 (37.7) | 25 (38.5) | 10 (43.5) | 15 (35.7) |
| Stage IV | 6 (8.7) | 5 (7.7) | 2 (8.7) | 3 (7.1) |
