A Hybrid Outlier Detection Method for Health Care Big Data

Cited by: 12
Authors
Yan, Ke [1]
You, Xiaoming [2, 3]
Ji, Xiaobo [1]
Yin, Guangqiang [4]
Yang, Fan [1, 5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[2] 32 Inst China Elect Technol Grp Corp, Shanghai, Peoples R China
[3] Tongji Univ, Sch Software Engn, Shanghai, Peoples R China
[4] Univ Elect Sci & Technol China, Sch Elect Engn, Chengdu, Peoples R China
[5] Chengdu Community Univ, Chengdu, Peoples R China
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016) | 2016
Keywords
K-Nearest Neighbor; pruning; health care; outlier detection; attribute overlapping rate; case classification quality character; big data; ALGORITHM;
DOI
10.1109/BDCloud-SocialCom-SustainCom.2016.34
Chinese Library Classification
TP3 [Computing technology, computer technology];
Discipline classification code
0812;
Abstract
Technological advances in health care informatics, the digitization of health records, and telemedicine have resulted in rapid growth of health care data. One challenge is how to effectively discover useful and important information in such a massive amount of data through techniques such as data mining. Outlier detection is a typical technique used in many fields to analyze big data. However, for large-scale, high-dimensional health care data, conventional outlier detection methods are not efficient. This paper proposes a novel hybrid outlier detection method, Pruning-Based K-Nearest Neighbor (PB-KNN), which integrates density-based and cluster-based methods with the KNN algorithm to conduct effective outlier detection. The proposed PB-KNN adopts the case classification quality character (CCQC) as the medical quality evaluation model and uses the attribute overlapping rate (AOR) algorithm for data classification and dimensionality reduction. To evaluate the performance of the pruning operations in PB-KNN, we conduct extensive experiments. The experimental results show that PB-KNN outperforms k-nearest neighbor (KNN) and local outlier factor (LOF) in terms of accuracy and efficiency.
Pages: 157-162
Number of pages: 6
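
For orientation, the plain KNN baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of distance-based KNN outlier scoring only, not the paper's PB-KNN: the pruning, CCQC, and AOR steps are not described in this record and are not reproduced here. The function names, the sample records, and the choice of "distance to the k-th nearest neighbor" as the score are assumptions made for the example.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    A larger score means the point lies farther from its neighbors and is
    a stronger outlier candidate. Brute force, O(n^2) distance evaluations.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < len(points)")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, ascending.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n highest-scoring points."""
    scores = knn_outlier_scores(points, k)
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return ranked[:n]


# Hypothetical two-attribute records: four clustered, one far away.
records = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.2), (1.2, 1.1), (8.0, 8.0)]
print(top_outliers(records, k=2, n=1))  # prints [4]
```

The brute-force scan compares every pair of records, which illustrates why the abstract calls conventional methods inefficient on large, high-dimensional data and why a pruning step is attractive.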