Cardiovascular Risk Prediction Method Based on CFS Subset Evaluation and Random Forest Classification Framework

Times cited: 0
Authors
Xu, Shan [1]
Zhang, Zhen [2]
Wang, Daoxian [2]
Hu, Junfeng [2]
Duan, Xiaohui [2]
Zhu, Tiangang [3]
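Affiliations
[1] China Academy of Information and Communications Technology, Beijing, People's Republic of China
[2] Peking University, School of Electronics Engineering and Computer Science, Beijing, People's Republic of China
[3] Peking University People's Hospital, Beijing, People's Republic of China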
Source
2017 IEEE 2nd International Conference on Big Data Analysis (ICBDA) | 2017
Keywords
Cardiovascular disease (CVD); risk prediction; data mining; feature selection; random forest; heart disease; system
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Cardiovascular disease (CVD) is a major contributor to the loss of both quality and quantity of life worldwide. Early detection and risk prediction are very important for patient treatment and physician diagnosis. This paper focuses on building a more accurate and practical risk prediction system based on data mining techniques to provide auxiliary medical services. To make the system practical for collecting and analyzing patient data in the healthcare industry, it consists of four parts: data interface, data preprocessing, feature selection, and classification. The data interface is responsible for obtaining raw data from hospitals; data preprocessing handles data integration, data cleaning, rating mapping, and similar tasks. Key features are then selected by CFS Subset Evaluation combined with the Best-First search method to reduce dimensionality. Random forest is introduced as the base classifier to identify the risk level, which is a pioneering attempt in the CVD risk prediction field. Both the Cleveland Heart Disease Database (CHDD) and the cardiology inpatient dataset of PKU People's Hospital were used for testing to confirm accuracy as well as practicality. On CHDD, our system achieves an accuracy of 91.6%, significantly higher than other methods. On the People's Hospital dataset, it achieves an accuracy of 97%, better than most other classifiers except SVM (98.9%); however, random forest takes only half the time of SVM. Taken together, these results show that the risk prediction system is of great value, in both accuracy and practical use, for patient treatment and physician diagnosis.
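The feature-selection step described in the abstract can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation. It assumes the standard CFS merit of a k-feature subset S, Merit_S = k * r_cf / sqrt(k + k(k-1) * r_ff), where r_cf is the mean feature-class correlation and r_ff the mean feature-feature correlation. It uses absolute Pearson correlation as a stand-in for whatever correlation measure the paper used, and a forward-only best-first search that stops after a hypothetical `max_stale` number of consecutive non-improving expansions. All function names here are illustrative.

```python
import heapq
import math


def pearson(x, y):
    """Absolute Pearson correlation of two equal-length sequences; 0.0 if either is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return abs(sxy / math.sqrt(sxx * syy))


def cfs_merit(subset, r_cf, r_ff):
    """CFS merit: reward class correlation, penalize redundancy among the chosen features."""
    k = len(subset)
    if k == 0:
        return 0.0
    avg_cf = sum(r_cf[i] for i in subset) / k
    pairs = [(i, j) for i in subset for j in subset if i < j]
    avg_ff = sum(r_ff[i][j] for i, j in pairs) / len(pairs) if pairs else 0.0
    return k * avg_cf / math.sqrt(k + k * (k - 1) * avg_ff)


def best_first_cfs(columns, target, max_stale=5):
    """Forward best-first search over feature subsets scored by CFS merit.

    columns: list of feature columns (each a list of numbers); target: class labels.
    Returns (best subset as a sorted tuple of column indices, its merit).
    """
    n = len(columns)
    # Precompute feature-class and feature-feature correlations once.
    r_cf = [pearson(col, target) for col in columns]
    r_ff = [[pearson(columns[i], columns[j]) for j in range(n)] for i in range(n)]

    best_subset, best_merit = (), 0.0
    open_list = [(0.0, ())]  # min-heap of (negated merit, subset)
    visited = {()}
    stale = 0  # consecutive expansions that did not improve the best merit
    while open_list and stale < max_stale:
        _, subset = heapq.heappop(open_list)  # most promising subset so far
        improved = False
        for f in range(n):
            if f in subset:
                continue
            child = tuple(sorted(subset + (f,)))
            if child in visited:
                continue
            visited.add(child)
            merit = cfs_merit(child, r_cf, r_ff)
            heapq.heappush(open_list, (-merit, child))
            if merit > best_merit:
                best_subset, best_merit, improved = child, merit, True
        stale = 0 if improved else stale + 1
    return best_subset, best_merit
```

For example, with a perfectly predictive feature, an exact duplicate of it, and an uncorrelated feature, the search keeps only the first predictive column, because adding the duplicate brings no extra merit:

    cols = [[0, 0, 0, 0, 1, 1, 1, 1], [0, 0, 0, 0, 1, 1, 1, 1], [0, 1, 0, 1, 0, 1, 0, 1]]
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    print(best_first_cfs(cols, y))  # -> ((0,), 1.0)

The selected columns would then be passed to a random forest classifier, for instance scikit-learn's `sklearn.ensemble.RandomForestClassifier`; that step is left out here to keep the sketch free of third-party dependencies.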
Pages: 233-237
Page count: 5