A Hybrid Outlier Detection Method for Health Care Big Data

Cited by: 12
Authors
Yan, Ke [1]
You, Xiaoming [2,3]
Ji, Xiaobo [1]
Yin, Guangqiang [4]
Yang, Fan [1,5]
Affiliations
[1] Univ Elect Sci & Technol China, Sch Comp Sci & Engn, Chengdu, Peoples R China
[2] 32 Inst China Elect Technol Grp Corp, Shanghai, Peoples R China
[3] Tongji Univ, Sch Software Engn, Shanghai, Peoples R China
[4] Univ Elect Sci & Technol China, Sch Elect Engn, Chengdu, Peoples R China
[5] Chengdu Community Univ, Chengdu, Peoples R China
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCES ON BIG DATA AND CLOUD COMPUTING (BDCLOUD 2016) SOCIAL COMPUTING AND NETWORKING (SOCIALCOM 2016) SUSTAINABLE COMPUTING AND COMMUNICATIONS (SUSTAINCOM 2016) (BDCLOUD-SOCIALCOM-SUSTAINCOM 2016) | 2016
Keywords
K-Nearest Neighbor; pruning; health care; outlier detection; attribute overlapping rate; case classification quality character; big data; ALGORITHM;
DOI
10.1109/BDCloud-SocialCom-SustainCom.2016.34
Chinese Library Classification number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Technology advancements in health care informatics, the digitalization of health records, and telemedicine have resulted in rapid growth of health care data. One challenge is how to effectively discover useful and important information in such a massive amount of data through techniques such as data mining. Outlier detection is a typical technique used in many fields to analyze big data. However, for large-scale, high-dimensional health care data, conventional outlier detection methods are not efficient. This paper proposes a novel hybrid outlier detection method, namely Pruning-based K-Nearest Neighbor (PB-KNN), which integrates density-based and cluster-based methods with the KNN algorithm to conduct effective outlier detection. The proposed PB-KNN adopts the case classification quality character (CCQC) as the medical quality evaluation model and uses the attribute overlapping rate (AOR) algorithm for data classification and dimensionality reduction. To evaluate the performance of the pruning operations in PB-KNN, we conduct extensive experiments. The experimental results show that the PB-KNN method outperforms k-nearest neighbor (KNN) and local outlier factor (LOF) in terms of accuracy and efficiency.
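For orientation, the conventional KNN baseline named in the abstract can be sketched as follows. This is a minimal illustration only, not the paper's PB-KNN: the pruning step, CCQC, and AOR are not specified in the abstract and are not reproduced here. The function names, the choice of Euclidean distance, and the toy records are all assumptions made for the example.

```python
import math


def knn_outlier_scores(points, k):
    """Score each point by the distance to its k-th nearest neighbor.

    Assumption: Euclidean distance and a brute-force O(n^2) scan,
    which is the kind of cost that pruning is meant to reduce.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


def top_outliers(points, k, n):
    """Return the indices of the n points with the largest k-NN distance."""
    scores = knn_outlier_scores(points, k)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


# Hypothetical toy data: four clustered records and one distant record.
records = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
print(top_outliers(records, k=2, n=1))  # prints [4]
```

The brute-force scan compares every pair of records, which is why this baseline scales poorly to large, high-dimensional data.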
Pages: 157-162
Number of pages: 6