An advanced machine learning method for simultaneous breast cancer risk prediction and risk ranking in Chinese population: A prospective cohort and modeling study

Cited by: 1
Authors
Liu, Liyuan [1, 2]
He, Yong [2, 3]
Kao, Chunyu [3]
Fan, Yeye [2]
Yang, Fu [3]
Wang, Fei [1, 4]
Yu, Lixiang [1, 4]
Zhou, Fei [1, 4]
Xiang, Yujuan [1, 4]
Huang, Shuya [1, 4]
Zheng, Chao [1, 4]
Cai, Han [1, 4]
Bao, Heling [5]
Fang, Liwen [6]
Wang, Linhong [6]
Chen, Zengjing [2]
Yu, Zhigang [1, 4]
Affiliations
[1] Shandong Univ, Hosp 2, Cheeloo Coll Med, Dept Breast Surg, Jinan 250033, Shandong, Peoples R China
[2] Shandong Univ, Sch Math, Jinan 250100, Shandong, Peoples R China
[3] Shandong Univ, Zhongtai Secur Inst Financial Studies, Jinan 250100, Shandong, Peoples R China
[4] Shandong Univ, Inst Translat Med Breast Dis Prevent & Treatment, Jinan 250033, Shandong, Peoples R China
[5] Peking Univ, Sch Publ Hlth, Dept Maternal & Child Hlth, Beijing 100191, Peoples R China
[6] Chinese Ctr Dis Control & Prevent, Natl Ctr Chron & Noncommunicable Dis Control & Pre, Beijing 100050, Peoples R China
Funding
China Postdoctoral Science Foundation
Keywords
Breast cancer; Cancer prevention; Models; Women; Risk assessment; VARIABLE SELECTION; STATISTICS; VALIDATION; WOMEN; SNPS;
DOI
10.1097/CM9.0000000000002891
Chinese Library Classification number
R5 [Internal Medicine]
Discipline classification code
1002; 100201
Abstract
Background: Highly accurate and interpretable breast cancer (BC) risk-stratification tools for Asian women are lacking. We aimed to develop risk-stratification models to predict long- and short-term BC risk among Chinese women and to simultaneously rank potential non-experimental risk factors.
Methods: The Breast Cancer Cohort Study in Chinese Women, a large ongoing prospective dynamic cohort study, includes 122,058 women aged 25-70 years from eastern China. Using parametric methods (penalized logistic regression, bootstrap, and ensemble learning), we developed two machine-learning models to estimate BC risk: the short-term ensemble penalized logistic regression (EPLR) risk prediction model and the ensemble penalized long-term (EPLT) risk prediction model. The models were assessed for calibration and discrimination and were then externally validated in new study participants from 2017 to 2020.
Results: The area under the receiver operating characteristic curve (AUC) of the short-term EPLR risk prediction model was 0.800 in internal validation and 0.751 in external validation. For the long-term EPLT risk prediction model, the AUC was 0.692 and 0.760 in internal and external validation, respectively. In external validation, the net reclassification improvement index of the EPLT model relative to the Gail model and the Han Chinese Breast Cancer Prediction Model (HCBCP) was 0.193 and 0.233, respectively, indicating that the EPLT model has higher classification accuracy.
Conclusions: We developed the EPLR and EPLT models to screen for populations at high risk of developing BC. These models can serve as useful tools to aid in risk-stratified screening and BC prevention.
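The two evaluation quantities reported in the abstract, the AUC and the net reclassification improvement (NRI), can be illustrated with a minimal sketch. It is not the authors' code. The abstract does not say which NRI variant or risk threshold was used, so the two-category NRI and the 0.20 threshold below are assumptions, and the function names and toy scores are hypothetical and exist only to make the arithmetic concrete.

```python
def auc(scores, labels):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def categorical_nri(old_scores, new_scores, labels, threshold):
    """Two-category NRI of a new model relative to an old one (assumed variant).

    NRI = [P(up | case) - P(down | case)] + [P(down | non-case) - P(up | non-case)],
    where "up" means moving from below the threshold to at or above it.
    """
    def net_up(indices):
        up = sum(1 for i in indices
                 if old_scores[i] < threshold <= new_scores[i])
        down = sum(1 for i in indices
                   if new_scores[i] < threshold <= old_scores[i])
        return (up - down) / len(indices)

    cases = [i for i, y in enumerate(labels) if y == 1]
    non_cases = [i for i, y in enumerate(labels) if y == 0]
    return net_up(cases) - net_up(non_cases)


# Hypothetical toy data: 4 cases followed by 6 non-cases.
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
old_model = [0.30, 0.10, 0.05, 0.25, 0.22, 0.04, 0.06, 0.03, 0.21, 0.02]
new_model = [0.35, 0.28, 0.06, 0.30, 0.08, 0.05, 0.07, 0.02, 0.24, 0.03]

print("AUC (old model):", round(auc(old_model, labels), 3))
print("AUC (new model):", round(auc(new_model, labels), 3))
print("NRI (new vs old):", round(categorical_nri(old_model, new_model, labels, 0.20), 3))
```

A positive NRI means the new model moves cases up and non-cases down across the threshold more often than the reverse, which is the sense in which the abstract reads the values 0.193 and 0.233 as higher classification accuracy.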
Pages: 2084-2091
Page count: 8