A correct diagnosis of any adnexal mass is essential to triage women to appropriate treatment pathways. Several imaging techniques can be used to assess a mass before treatment, including transvaginal ultrasonography, magnetic resonance imaging, computed tomography, and positron emission tomography combined with computed tomography. In this chapter, we focus in depth on the role of transvaginal ultrasonography, as current evidence suggests it is the most appropriate initial imaging investigation to identify and characterise a mass in women suspected of having adnexal pathology. Subjective assessment by an experienced ultrasound examiner is the optimal approach to diagnose masses, followed by the risk models and rules developed by the International Ovarian Tumor Analysis study. A group of tumours has proven difficult to classify with transvaginal ultrasound and remains a diagnostic challenge for which accurate second-stage tests would be of value. Some studies suggest that magnetic resonance imaging (MRI), compared with other imaging modalities, may play a role in the assessment of this cohort of ‘difficult to classify’ adnexal masses. These studies, however, did not report the quality of transvaginal ultrasonography (i.e. the experience level of the examiner) and lacked uniformity in the criteria used to define such ‘difficult’ masses. On the basis of the standardised terminology developed by the International Ovarian Tumor Analysis study to describe adnexal masses, as well as the prediction models and rules developed in the course of the study, we propose new criteria to clearly define complex or ‘difficult to classify’ adnexal masses. These criteria focus the role of second-line imaging tests, such as conventional magnetic resonance imaging combined with dynamic contrast-enhanced or diffusion-weighted sequences, on masses for which tests other than ultrasonography would be of value.
Introduction
An ovarian neoplasm or cyst is a relatively common clinical condition that occurs at all stages of life, and is a leading indication for gynaecologic surgery. The annual hospitalisation rate for women with suspected ovarian neoplasms is reported to range from 160,000 to 289,000 women in the USA. Most hospitalised women will eventually undergo surgery. Fortunately, most women with an adnexal mass do not have cancer. This implies that an accurate pre-surgical assessment of the likely pathology of any mass is pivotal: unnecessary or overly radical surgery is a significant risk to women with a cyst that is inappropriately characterised as malignant, whereas failing to recognise cancer will significantly affect prognosis. Most presumed benign cysts in pre- and postmenopausal women can either be safely managed expectantly or removed using laparoscopic surgery, therefore avoiding unnecessary costs and morbidity. On the other hand, when suspicion of cancer is high, referral to a specialist oncology centre is warranted to improve overall survival.
Transvaginal ultrasonography
Subjective assessment of gray-scale and colour Doppler ultrasound findings with transvaginal ultrasonography (TVS) is the first-line imaging technique for detecting and characterising adnexal masses. The optimal approach using ultrasound to discriminate between benign and malignant adnexal masses before surgery is the subjective assessment of gray-scale and Doppler ultrasound findings by an expert level III examiner with a special interest in gynaecological ultrasonography. In the International Ovarian Tumor Analysis (IOTA) study, six categories of diagnostic certainty have been proposed for the subjective assessment of adnexal masses (i.e. certainly malignant, probably malignant, uncertain but more likely to be malignant, uncertain but more likely to be benign, probably benign, or certainly benign). When expert examiners are highly or moderately confident about the histological nature of an adnexal mass, a large study on 3511 adnexal masses by the IOTA collaboration reported a sensitivity of 91% and a specificity of 96% for malignancy. Only a small proportion (6–8%) of masses cannot be confidently classified as benign or malignant when using subjective assessment by experienced ultrasound examiners, and accuracy is limited to 68% in this group of tumours, with rather poor sensitivity ranging from 57–70%, and specificity of only 60–77%. The IOTA studies have evaluated several candidate secondary tests to improve the test performance of expert examiners, such as biochemical markers (serum CA125 and serum human epididymis protein-4) or algorithms that incorporate these biomarkers (i.e. the Risk of Malignancy Index [RMI] or the Risk of Ovarian Malignancy Algorithm). None, however, has proved useful in this group of difficult-to-classify tumours.
The ability to characterise adnexal tumours correctly with TVS when using subjective assessment of gray-scale and colour Doppler ultrasound findings clearly improves with the level of experience of the ultrasound operator. Therefore, investment in education and training in gynaecological ultrasound examination is pivotal to minimise the healthcare burden related to misclassified adnexal tumours. The European Federation of Societies for Ultrasound in Medicine and Biology has published guidelines on how much training and education in gynaecological ultrasound imaging is needed to obtain competence at different levels.
Difficult tumours
Most tumours that remain unclassifiable after expert subjective evaluation are benign, with only 16% being invasive cancers and 14% borderline malignant tumours. Among unclassifiable cases, serous and mucinous cystadenomas and cystadenofibromas, fibromas, rare benign tumours, and borderline malignancies were two to three times more common than in cases where the examiner had a higher degree of confidence. Unclassifiable adnexal tumours have certain typical morphological features. They were larger than classifiable masses, more often had a unilocular-solid or multilocular-solid appearance, and more frequently had irregular walls and papillary projections. Multilocular cysts with more than 10 cyst locules were also more often observed among unclassifiable masses. An absence of colour Doppler signals was less common in these masses, whereas a moderate amount of colour Doppler signals (colour score 3) was more common. Ultrasound examples of masses that are ‘difficult to classify’ for expert examiners are shown in Fig. 1.
Ultrasound-based models and rules to characterise adnexal masses
An alternative approach to using subjective assessment is to use risk models or diagnostic rules to triage women as being at low or high risk of cancer. Such models and rules have been developed to assist clinicians with variable training backgrounds and levels of expertise. The most recent systematic review and meta-analysis to address the performance of mathematical models and scoring systems included a total of 195 diagnostic accuracy studies. It considered 116 different prediction models for characterising adnexal masses. The meta-analysis focused on 19 different models that had been externally validated in 96 studies. The RMI was the most frequently validated model, with a pooled sensitivity of 72% (67–76%) and specificity of 92% (89–93%), using a cut-off level of 200. The IOTA logistic regression model LR2 (with a risk cut-off of 10%) and the simple rules were superior to all other models included in the meta-analysis, with a pooled sensitivity and specificity of 92% (88–95%) and 83% (77–88%) for LR2, and 93% (89–95%) and 81% (76–85%) for simple rules. The logistic regression model based on 12 variables (IOTA LR1) was not included in this meta-analysis, because it had not been validated in over 1000 patients. Previous IOTA studies, however, have confirmed that LR1 has a performance that is at least as good as that of LR2. On the other hand, LR2 is based on six variables only (see below), which facilitates its use in clinical practice. Thus, contrary to the conclusion of a previous systematic review, current evidence supports IOTA strategies as the primary test to characterise adnexal masses.
Simple rules
The simple rules developed by the IOTA collaborative group are based on five ultrasound features of malignancy (M-features) and five ultrasound features suggestive of a benign lesion (B-features) (Table 1). An adnexal mass is classified as malignant if at least one M-feature and no B-features are present, and as benign if at least one B-feature and no M-features are present. When no B- or M-features are present, or if both B- and M-features are present, then simple rules are considered inconclusive (uncertain), and a different diagnostic method should be used. So far, these simple rules have been externally validated in five studies in 17 clinical centres. In these studies, the rules could be applied to 79–89% of all adnexal masses. In the group of tumours in which the simple rules could not be applied, the malignancy rate varied from 23–51%. The rules worked well for endometriomas, dermoid cysts, simple cysts, and advanced invasive cancers, but they worked less well for hydrosalpinges, peritoneal cysts, abscesses, fibromas, rare benign tumours, stage I borderline tumours, and early stage primary invasive cancers. This implies that the rules work well in tumours that are usually easily classifiable using subjective assessment but less well in tumours that tend to be more difficult to classify using subjective assessment, with the exception of hydrosalpinges, which are relatively easy to classify using subjective assessment.
Code | Features for predicting a malignant tumour (M-features) | Code | Features for predicting a benign tumour (B-features) |
---|---|---|---|
M1 | Irregular solid tumour | B1 | Unilocular tumour |
M2 | Presence of ascites | B2 | Presence of solid components where the largest solid component has a largest diameter <7 mm |
M3 | At least four papillary structures | B3 | Presence of acoustic shadows |
M4 | Irregular multilocular solid tumour with largest diameter ≥100 mm | B4 | Smooth multilocular tumour with largest diameter <100 mm |
M5 | Very strong blood flow (colour score 4) | B5 | No blood flow (colour score 1) |
The IOTA studies have suggested subjective assessment by an experienced examiner as the best secondary test to classify these inconclusive cases. On prospective validation, this two-step strategy reached a sensitivity of 90% and a specificity of 93% for detecting ovarian malignancy. Another approach, should an experienced ultrasound examiner be unavailable, is to classify all inconclusive cases as malignant. This would minimise the number of missed cancers, but at the cost of an increased number of benign tumours misclassified as malignant. In 2011, the Royal College of Obstetricians and Gynaecologists included the simple rules in their guideline for evaluating ovarian pathology in premenopausal women.
One advantage of using simple rules is that they offer clear criteria that define a group of more difficult tumours for which second-stage diagnostic tests might be of value. No information, however, is available about the extent to which any test other than expert subjective assessment can help to classify tumours correctly as benign or malignant where the simple rules do not apply.
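The decision logic of the simple rules, together with the two fallback strategies for inconclusive cases described above, can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the string labels, and the use of the Table 1 codes (M1–M5, B1–B5) as input are assumptions made for the example, and the sketch is not a validated clinical tool.

```python
# Feature codes as listed in Table 1.
M_FEATURES = frozenset({"M1", "M2", "M3", "M4", "M5"})
B_FEATURES = frozenset({"B1", "B2", "B3", "B4", "B5"})


def classify_simple_rules(observed_features):
    """Apply the simple rules to the feature codes seen on ultrasound.

    Returns "malignant", "benign", or "inconclusive".
    (Hypothetical helper; names and labels are illustrative.)
    """
    observed = set(observed_features)
    unknown = observed - M_FEATURES - B_FEATURES
    if unknown:
        raise ValueError(f"Unknown feature codes: {sorted(unknown)}")

    has_m = bool(observed & M_FEATURES)
    has_b = bool(observed & B_FEATURES)

    if has_m and not has_b:
        return "malignant"
    if has_b and not has_m:
        return "benign"
    # Neither M- nor B-features, or both present: rules do not apply.
    return "inconclusive"


def two_step_classification(observed_features, expert_opinion=None):
    """Simple rules first; inconclusive cases go to a second step.

    expert_opinion: "benign" or "malignant" from subjective assessment
    by an experienced examiner, or None if no expert is available, in
    which case the inconclusive mass is classified as malignant.
    """
    result = classify_simple_rules(observed_features)
    if result != "inconclusive":
        return result
    if expert_opinion is not None:
        return expert_opinion
    return "malignant"
```

For example, a mass showing only ascites (M2) would be classified as malignant, whereas a mass showing both very strong blood flow (M5) and acoustic shadows (B3) would be inconclusive and passed to the second step.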
Logistic regression model LR2
The IOTA LR2 model uses six variables: (1) patient age (years); (2) presence of ascites (yes = 1, no = 0); (3) presence of blood flow within a papillary projection (yes = 1, no = 0); (4) maximum diameter of the solid component (expressed in millimetres and truncated at 50 mm); (5) irregular internal cyst walls (yes = 1, no = 0); and (6) presence of acoustic shadows (yes = 1, no = 0). LR2 estimates the probability of malignancy for an adnexal tumour as 1/(1 + exp(−z)), where z = −5.3718 + 0.0354(1) + 1.6159(2) + 1.1768(3) + 0.0697(4) + 0.9586(5) − 2.9486(6), and each bracketed number stands for the value of the corresponding variable. A probability cut-off of 10% was proposed to classify tumours as benign or malignant based on LR2 risk scores. The advantage of this mathematical model over simple rules is that it can be applied to all tumours. Although all adnexal tumours are classifiable with LR2, test performance might improve if women with an intermediate risk of malignancy (5–25%) are referred for second-stage testing (e.g. magnetic resonance imaging or subjective assessment by expert examiners), as is done for inconclusive results with simple rules.
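As an illustration of how the LR2 formula is evaluated, the following Python sketch computes the estimated probability of malignancy from the six variables and coefficients given above. The function and parameter names are hypothetical, and the treatment of the 5% and 25% boundaries as inclusive in the triage helper is an assumption, since the text does not specify it. The sketch is for explanation only and is not a validated clinical calculator.

```python
import math


def lr2_probability(age_years, ascites, papillary_flow,
                    max_solid_diameter_mm, irregular_walls,
                    acoustic_shadows):
    """Estimated probability of malignancy from the LR2 formula.

    Binary findings are passed as True/False (coded 1/0). The solid
    component diameter is truncated at 50 mm, as stated in the text.
    """
    solid_mm = min(max_solid_diameter_mm, 50.0)
    z = (-5.3718
         + 0.0354 * age_years
         + 1.6159 * int(ascites)
         + 1.1768 * int(papillary_flow)
         + 0.0697 * solid_mm
         + 0.9586 * int(irregular_walls)
         - 2.9486 * int(acoustic_shadows))
    return 1.0 / (1.0 + math.exp(-z))


def lr2_classify(probability, cutoff=0.10):
    """Binary classification at the proposed 10% cut-off."""
    return "malignant" if probability >= cutoff else "benign"


def lr2_triage(probability, low=0.05, high=0.25):
    """Three-way triage using the 5-25% intermediate-risk band.

    Boundary handling (inclusive at both ends) is an assumption.
    """
    if probability < low:
        return "low risk"
    if probability <= high:
        return "second-stage testing"
    return "high risk"
```

Because acoustic shadows carry a large negative coefficient, their presence lowers the estimated risk substantially, whereas ascites, papillary flow, irregular walls, and a larger solid component all raise it.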
Predicting subtypes of malignant adnexal pathology: multiclass risk models
Current ultrasound-based prediction models used in practice for characterising adnexal tumours classically discriminate between two outcomes (i.e. cancer or no cancer). From a clinical point of view, it is relevant to obtain more information on the different subtypes of malignant disease (e.g. metastatic, early or advanced primary invasive, or borderline malignancy) because each is managed differently, with implications for the type of surgery, length of hospitalisation, and financial cost. Recently, the IOTA study demonstrated that predicting multiclass risk estimates for adnexal tumours is feasible, but the approach needs further refinement.