Using machine learning to predict ovarian cancer

Cited by: 70
Authors
Lu, Mingyang [1, 2, 3]
Fan, Zhenjiang [4]
Xu, Bin [1, 2, 3]
Chen, Lujun [1, 2, 3]
Zheng, Xiao [1, 2, 3]
Li, Jundong [5]
Znati, Taieb [4]
Mi, Qi [6]
Jiang, Jingting [1, 2, 3]
Affiliations
[1] Soochow Univ, Dept Tumor Biol Treatment, Affiliated Hosp 3, Changzhou, Jiangsu, Peoples R China
[2] Jiangsu Engn Res Ctr Tumor Immunotherapy, Changzhou, Jiangsu, Peoples R China
[3] Soochow Univ, Inst Cell Therapy, Changzhou, Jiangsu, Peoples R China
[4] Univ Pittsburgh, Dept Comp Sci, Pittsburgh, PA 15260 USA
[5] Univ Virginia, Dept Elect & Comp Engn, Charlottesville, VA USA
[6] Univ Pittsburgh, Dept Sports Med & Nutr, Pittsburgh, PA 15260 USA
Funding
National Natural Science Foundation of China;
Keywords
Ovarian Cancer; Tumor Marker; Machine Learning; DIAGNOSTIC-VALUE; HE-4; CA125; CLASSIFICATION; PROTEIN; INDEX; ROMA; BIOMARKER; RISK;
DOI
10.1016/j.ijmedinf.2020.104195
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
Objective: Ovarian cancer (OC) is one of the most common types of cancer in women. Accurate discrimination between benign ovarian tumors (BOT) and OC has important practical value. Methods: Our dataset consists of 349 Chinese patients with 49 variables, including demographics, routine blood tests, general chemistry, and tumor markers. The Minimum Redundancy - Maximum Relevance (MRMR) feature selection method was applied to data from 235 patients (89 BOT and 146 OC) to select the most relevant features, from which a simple decision tree model was constructed. The model was tested on the remaining 114 patients (89 BOT and 25 OC). The results were compared with predictions from the risk of ovarian malignancy algorithm (ROMA) and a logistic regression model. Results: Eight notable features were selected by MRMR, of which the decision tree model identified two as the top features: human epididymis protein 4 (HE4) and carcinoembryonic antigen (CEA). In particular, CEA is a valuable marker for OC prediction in patients with low HE4. The model also yields better prediction results than ROMA. Conclusion: Machine learning approaches were able to accurately classify BOT and OC. Our goal was to derive a simple predictive model that also performs well. Using our approach, we obtained a model that consists of just two biomarkers, HE4 and CEA. The model is simple to interpret and outperforms existing OC prediction methods. This demonstrates that machine learning has good potential for predictive modeling of complex diseases.
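The abstract names MRMR feature selection but gives no algorithmic detail. The following is a minimal sketch of one common greedy variant (the "difference" form: relevance minus mean redundancy, both measured by mutual information on discrete values). It is an illustration only, not the authors' implementation: the function names, the binarised toy markers, and the toy labels are all hypothetical, and the study's actual discretisation and MRMR variant are not stated here.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy mRMR, difference form (an assumed variant): at each step pick the
    feature with the largest relevance I(f; labels) minus its mean mutual
    information with the features already selected."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        def score(name):
            if not selected:
                return relevance[name]
            redundancy = sum(
                mutual_information(features[name], features[s]) for s in selected
            ) / len(selected)
            return relevance[name] - redundancy

        remaining = [name for name in features if name not in selected]
        selected.append(max(remaining, key=score))
    return selected


# Synthetic toy data (NOT from the study): binarised markers, 1 = elevated.
toy_features = {
    "marker_a": [1, 1, 1, 0, 0, 0, 0, 0],
    "marker_a_copy": [1, 1, 1, 0, 0, 0, 0, 0],  # fully redundant with marker_a
    "marker_b": [1, 1, 0, 1, 0, 0, 1, 0],
}
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]  # hypothetical: 1 = malignant, 0 = benign

print(mrmr_select(toy_features, toy_labels, k=2))
```

In this toy example the duplicated marker is as relevant as the original but is passed over in the second step, because its redundancy with the already-selected marker outweighs its relevance. That trade-off is the reason for using MRMR rather than ranking features by relevance alone.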
Pages: 8
Related papers
35 in total
[1]  
[Anonymous], 2013, EUR C MACH LEARN PRI
[2]  
[Anonymous], 2019, Introduction to Data Mining
[3]   A comparison of CA125, HE4, risk ovarian malignancy algorithm (ROMA), and risk malignancy index (RMI) for the classification of ovarian masses [J].
Anton, Cristina ;
Carvalho, Filomena Marino ;
Oliveira, Elci Isabel ;
Rosa Maciel, Gustavo Arantes ;
Baracat, Edmund Chada ;
Carvalho, Jesus Paula .
CLINICS, 2012, 67 (05) :437-441
[4]   Comparison of HE 4, CA 125, ROMA score and ultrasound score in the differential diagnosis of ovarian masses [J].
Aslan, Koray ;
Onan, M. Anil ;
Yilmaz, Canan ;
Bukan, Neslihan ;
Erdem, Mehmet .
JOURNAL OF GYNECOLOGY OBSTETRICS AND HUMAN REPRODUCTION, 2020, 49 (05)
[5]  
Bray F, 2018, CA-CANCER J CLIN, V68, P394, DOI [10.3322/caac.21492, 10.3322/caac.21609]
[6]  
Cheng J, 1999, UNCERTAINTY IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, P101
[7]   Clinical value of ROMA index in diagnosis of ovarian cancer: meta-analysis [J].
Cui, Ranliang ;
Wang, Yichao ;
Li, Ying ;
Li, Yueguo .
CANCER MANAGEMENT AND RESEARCH, 2019, 11 :2545-2551
[8]   Minimum redundancy feature selection from microarray gene expression data [J].
Ding, C ;
Peng, HC .
PROCEEDINGS OF THE 2003 IEEE BIOINFORMATICS CONFERENCE, 2003, :523-528
[9]  
Ding C.H. Q., 2002, RECOMB, P127
[10]   Comparison of discrimination methods for the classification of tumors using gene expression data [J].
Dudoit, S ;
Fridlyand, J ;
Speed, TP .
JOURNAL OF THE AMERICAN STATISTICAL ASSOCIATION, 2002, 97 (457) :77-87