Adnexal masses difficult to classify as benign or malignant using subjective assessment of gray-scale and Doppler ultrasound findings: logistic regression models do not help

Valentin, L; Ameye, L; Savelli, L; Fruscio, R; Leone, FPG; Czekierdowski, A; Lissoni, AA; Fischerova, D; Guerriero, S; Van Holsbeke, C; Van Huffel, S; Timmerman, D.

doi:10.1002/uog.9030

Abstract

OBJECTIVE: To develop a logistic regression model that can discriminate between benign and malignant adnexal masses perceived to be difficult to classify by subjective evaluation of gray-scale and Doppler ultrasound findings (subjective assessment) and to compare its diagnostic performance with that of subjective assessment, serum CA 125 and the risk of malignancy index (RMI).

METHODS: We used data from the 3511 patients with an adnexal mass included in the International Ovarian Tumor Analysis (IOTA) studies. All patients had been examined using transvaginal gray-scale and Doppler ultrasound following a standardized research protocol carried out by an experienced ultrasound examiner using a high-end ultrasound system. In addition to prospectively collecting information on more than 40 clinical and ultrasound variables, the ultrasound examiner classified each mass as certainly or probably benign, unclassifiable, or certainly or probably malignant. A logistic regression model to discriminate between benignity and malignancy was developed for the unclassifiable masses (n = 244, i.e. 7% of all tumors) using a training set (160 tumors, 45 malignancies) and then tested on a test set (84 tumors, 28 malignancies). The gold standard was the histological diagnosis of the surgically removed adnexal mass. The area under the receiver-operating characteristics curve (AUC), sensitivity, specificity, positive likelihood ratio (LR+) and negative likelihood ratio (LR-) were used to describe diagnostic performance and were compared between subjective assessment, CA 125, the RMI and the logistic regression model created.

RESULTS: One variable was retained in the logistic regression model: the largest diameter (in mm) of the largest solid component of the tumor (odds ratio (OR) = 1.04; 95% CI, 1.02-1.06). The model had an AUC of 0.68 (95% CI, 0.59-0.78) on the training set and an AUC of 0.65 (95% CI, 0.53-0.78) on the test set. On the test set, a cut-off of 25% probability of malignancy (corresponding to a largest diameter of the largest solid component of 23 mm) resulted in a sensitivity of 64% (18/28), a specificity of 55% (31/56), an LR+ of 1.44 and an LR- of 0.65. The corresponding values for subjective assessment were 68% (19/28), 59% (33/56), 1.65 and 0.55. On the test set of patients with available CA 125 results, the LR+ and LR- were 1.29 and 0.73 for the logistic regression model (cut-off = 25% probability of malignancy), 1.45 and 0.63 for subjective assessment, 1.24 and 0.84 for CA 125 (cut-off = 35 U/mL) and 1.21 and 0.92 for the RMI (cut-off = 200).

CONCLUSIONS: About 7% of adnexal masses that are considered appropriate for surgical removal cannot be classified as benign or malignant by experienced ultrasound examiners using subjective assessment. Logistic regression models to estimate the risk of malignancy, CA 125 measurements and the RMI are not helpful in these masses.
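The likelihood ratios reported for the test set follow directly from the stated counts, using the standard definitions LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The sketch below is illustrative only and is not the authors' analysis code. The function name `diagnostic_metrics` and its argument names are hypothetical, and the false-negative and false-positive counts are derived from the reported fractions (for example, 18/28 malignancies detected implies 10 missed).

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return sensitivity, specificity, LR+ and LR- from 2x2 table counts.

    Hypothetical helper for illustration. It assumes specificity is
    strictly between 0 and 1, so neither ratio divides by zero.
    """
    sensitivity = tp / (tp + fn)  # detected malignancies / all malignancies
    specificity = tn / (tn + fp)  # correctly classified benign / all benign
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "lr_pos": sensitivity / (1 - specificity),
        "lr_neg": (1 - sensitivity) / specificity,
    }


# Test-set counts taken from the abstract: 28 malignant and 56 benign masses.
# Logistic regression model at the 25% cut-off: 18/28 and 31/56 correct.
model = diagnostic_metrics(tp=18, fn=10, tn=31, fp=25)
# Subjective assessment: 19/28 and 33/56 correct.
subjective = diagnostic_metrics(tp=19, fn=9, tn=33, fp=23)

for name, result in (("model", model), ("subjective", subjective)):
    print(name, {key: round(value, 2) for key, value in result.items()})
```

Rounded to two decimals, the output reproduces the values given in the abstract (LR+ 1.44 and LR- 0.65 for the model; 1.65 and 0.55 for subjective assessment). All four ratios lie close to 1, which means a test result shifts the probability of malignancy only slightly in either direction.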