Breast Cancer Prediction Based on Differential Privacy and Logistic Regression Optimization Model

Times cited: 1
Authors
Chen, Hua [1]
Wang, Nan [1]
Zhou, Yuan [1,2]
Mei, Kehui [1]
Tang, Mengdi [1]
Cai, Guangxing [1]
Affiliations
[1] Hubei University of Technology, School of Science, Wuhan 430068, People's Republic of China
[2] Wuhan University of Bioengineering, School of Computer Science and Technology, Wuhan 430415, People's Republic of China
Source
APPLIED SCIENCES-BASEL | 2023 / Vol. 13 / Issue 19
Funding
National Natural Science Foundation of China
Keywords
breast cancer; feature selection; batch gradient descent; differential privacy; logistic regression
DOI
10.3390/app131910755
Chinese Library Classification number
O6 [Chemistry]
Subject classification code
0703
Abstract
To improve the classification performance of the logistic regression (LR) model for breast cancer prediction, a new hybrid feature selection method is proposed to preprocess the data: the Pearson correlation test and an iterative random forest algorithm based on out-of-bag estimation (RF-OOB) are used to select the 17 optimal features as inputs to the model. Next, the LR model is optimized with batch gradient descent (BGD-LR), which trains the model by minimizing its loss function. To protect the privacy of breast cancer patients, differential privacy is then added to the BGD-LR model, yielding an LR optimization model based on differential privacy with batch gradient descent (BDP-LR). Finally, experiments are carried out on the Wisconsin Diagnostic Breast Cancer (WDBC) dataset, with accuracy, precision, recall, and F1-score as the four main evaluation metrics and the hyperparameters of each model determined by grid search and cross-validation. The experimental results show that, after hybrid feature selection, the best values of the four metrics for the BGD-LR model are 0.9912, 1, 0.9886, and 0.9943, with accuracy, recall, and F1-score increased by 2.63%, 3.41%, and 1.76%, respectively. For the BDP-LR model, a privacy budget of ε = 0.8 gives an effective balance between classification performance and privacy protection; at this setting the four metrics are 0.9721, 0.9975, 0.9664, and 0.9816, improvements of 1.58%, 0.26%, 1.81%, and 1.07%, respectively. Comparative analysis shows that the BGD-LR and BDP-LR models constructed in this paper outperform the other classification models compared.
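The abstract names the Pearson correlation test as one stage of the hybrid feature selection but gives no thresholds or procedure. The following is a minimal sketch of one common way such a filter is used, not the authors' method: it keeps features whose |r| with the label exceeds a relevance threshold and drops any feature that is nearly collinear with a stronger one already kept. The function names and both thresholds (`min_relevance`, `max_redundancy`) are hypothetical, and the RF-OOB stage is not shown.

```python
import math
from typing import List, Sequence


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # constant column: treat as uncorrelated
    return cov / math.sqrt(var_a * var_b)


def pearson_filter(columns: List[List[float]], labels: List[int],
                   min_relevance: float = 0.1,
                   max_redundancy: float = 0.9) -> List[int]:
    """Return indices of kept feature columns (thresholds are hypothetical)."""
    relevance = [abs(pearson(col, labels)) for col in columns]
    # Visit features from most to least correlated with the label.
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    kept: List[int] = []
    for j in order:
        if relevance[j] < min_relevance:
            continue  # too weakly related to the diagnosis label
        if any(abs(pearson(columns[j], columns[k])) > max_redundancy for k in kept):
            continue  # near-duplicate of a stronger feature already kept
        kept.append(j)
    return sorted(kept)


if __name__ == "__main__":
    labels = [0, 0, 1, 1]
    cols = [[1.0, 2.0, 3.0, 4.0],   # strongly tied to the label
            [2.0, 4.0, 6.0, 8.1],   # almost a copy of column 0
            [5.0, 1.0, 5.0, 1.0]]   # unrelated to the label
    print(pearson_filter(cols, labels))
```

Here column 1 is discarded as redundant with column 0 and column 2 as irrelevant, so only index 0 survives. Ranking by relevance before the redundancy check ensures that, of two collinear features, the one more informative about the label is the one retained.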
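The abstract describes BGD-LR as logistic regression trained by batch gradient descent and BDP-LR as the same model with differential privacy added, but it does not say where the noise enters. The sketch below trains LR by full-batch gradient descent on the mean cross-entropy loss. When a privacy budget `epsilon` is passed, it switches to one standard option, gradient perturbation: each per-example gradient is clipped to L1 norm at most `clip`, Laplace noise is added to the averaged gradient, and ε is split evenly over the iterations by basic composition. The noise placement, `clip`, learning rate, and epoch count are assumptions rather than the authors' settings, and `train_lr` and `predict` are hypothetical names.

```python
import math
import random
from typing import List, Optional, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def laplace(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample as the difference of two i.i.d. exponentials."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def train_lr(X: List[List[float]], y: Sequence[int], learning_rate: float = 0.5,
             epochs: int = 200, epsilon: Optional[float] = None,
             clip: float = 1.0, seed: int = 0) -> List[float]:
    """Full-batch gradient descent on the mean cross-entropy loss.

    epsilon=None gives plain BGD-LR. With an epsilon, each averaged gradient
    is perturbed with Laplace noise (an assumed DP mechanism).
    Returns the weights; the last entry is the bias.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0]) + 1
    rows = [list(x) + [1.0] for x in X]  # append a constant 1 for the bias
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for x, t in zip(rows, y):
            # Gradient of the cross-entropy loss for one example: (p - t) * x.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - t
            g = [err * xi for xi in x]
            if epsilon is not None:
                # Clip each per-example gradient to L1 norm <= clip.
                norm = sum(abs(v) for v in g)
                if norm > clip:
                    g = [v * clip / norm for v in g]
            grad = [a + b for a, b in zip(grad, g)]
        grad = [v / n for v in grad]
        if epsilon is not None:
            # Replacing one record moves the mean gradient by <= 2*clip/n in L1.
            # Spending epsilon/epochs per step gives this Laplace scale.
            scale = 2.0 * clip * epochs / (n * epsilon)
            grad = [v + laplace(scale, rng) for v in grad]
        w = [wi - learning_rate * gi for wi, gi in zip(w, grad)]
    return w


def predict(w: Sequence[float], x: Sequence[float]) -> int:
    """Class 1 if the predicted probability is at least 0.5, else class 0."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + w[-1]
    return int(sigmoid(z) >= 0.5)
```

For example, `train_lr([[-2.0], [-1.0], [1.0], [2.0]], [0, 0, 1, 1])` learns a positive slope that separates the two classes. The design choice shows the privacy-utility trade-off the abstract refers to: the per-coordinate noise scale is 2·clip·epochs/(n·ε), so with n = 569 (the full WDBC dataset), clip = 1, 200 epochs, and ε = 0.8 it is about 0.88. Fewer iterations, a larger ε, or a tighter composition bound all reduce the noise.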
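The four evaluation metrics in the abstract follow from the confusion-matrix counts, and F1-score is the harmonic mean of precision and recall. That lets the reported BGD-LR figures be cross-checked: F1 = 2 × 1 × 0.9886 / (1 + 0.9886) ≈ 0.9943, which matches the stated value. A minimal sketch, assuming labels are encoded with 1 as the positive class (the paper's encoding is not stated in the abstract):

```python
from typing import Dict, Sequence


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    s = precision + recall
    return 2 * precision * recall / s if s else 0.0


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 with 1 as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


if __name__ == "__main__":
    # Cross-check the BGD-LR F1-score from its reported precision and recall.
    print(round(f1_score(1.0, 0.9886), 4))
```

Precision is guarded against the no-positive-predictions case. This matters for a model that predicts almost no positives, because it would otherwise raise a division error rather than score 0.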
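The abstract states that hyperparameters were chosen by grid search with cross-validation but gives neither the grid, the number of folds, nor the selection metric. The following is a generic sketch assuming shuffled k-fold cross-validation and a caller-supplied `fit_and_score` callback (a hypothetical name) that trains on the training indices and returns a validation score, such as the validation accuracy of an LR model. Every combination in the grid is scored by its mean validation score, and the best one is returned.

```python
import itertools
import random
from typing import Any, Callable, Dict, List, Sequence, Tuple

Params = Dict[str, Any]


def k_fold_splits(n: int, k: int, seed: int = 0) -> List[Tuple[List[int], List[int]]]:
    """Shuffle range(n) once, cut it into k folds, return (train, val) pairs."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    return [([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
            for i in range(k)]


def grid_search(grid: Dict[str, Sequence[Any]], n: int,
                fit_and_score: Callable[[Params, List[int], List[int]], float],
                k: int = 5) -> Tuple[Params, float]:
    """Return the parameter combination with the best mean validation score."""
    splits = k_fold_splits(n, k)
    names = list(grid)
    best: Tuple[Params, float] = ({}, float("-inf"))
    # Try every combination of values in the grid (Cartesian product).
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        mean = sum(fit_and_score(params, tr, va) for tr, va in splits) / k
        if mean > best[1]:
            best = (params, mean)
    return best
```

Reusing the same splits for every combination keeps the comparison fair, because each candidate is evaluated on identical validation folds.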
Pages: 21