A Method for Duplicate Record Detection Based on Decision Tree

Times cited: 0
Authors
Lin, Guangyan [1]
Qian, Yuxiang [1]
Zhang, Yiqiong [1]
Affiliation
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
Source
2016 3RD INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS (PES 2016) | 2016 / Vol. 4
Keywords
Duplicate Detection; Decision Tree; Data Cleaning; Attribute Similarity; LINKAGE
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications]
Subject classification code
081203; 0835
Abstract
Duplicate records are a common problem in information systems. Computing the similarity of two records by comparing their attributes one by one is time-consuming and complex. This paper proposes a duplicate detection method based on a decision tree. First, attribute similarity algorithms for common data types are summarized. Then, by mapping attribute similarities to decision tree nodes, whether two records are duplicates can be determined early, without computing the similarity of every attribute. This significantly reduces time complexity while preserving precision. In the experiments, precision exceeds 98% and the F-score reaches 97%.
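The core idea of the abstract, evaluating attribute similarities only along the decision-tree path that a record pair actually takes, can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name its similarity algorithms, tree-construction method, or thresholds, so the normalized edit-distance and relative-difference measures, the hypothetical attributes `name`, `phone`, and `age`, and the hand-built `TREE` with its thresholds are all illustrative choices.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Dict, Union


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


# Assumed per-type similarity measures, each returning a value in [0, 1].
def string_sim(a: str, b: str) -> float:
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def numeric_sim(a: float, b: float) -> float:
    if a == b:
        return 1.0
    return max(0.0, 1.0 - abs(a - b) / max(abs(a), abs(b)))


def exact_sim(a: object, b: object) -> float:
    return 1.0 if a == b else 0.0


Record = Dict[str, object]


@dataclass
class Leaf:
    is_duplicate: bool


@dataclass
class Node:
    attribute: str
    sim: Callable[[object, object], float]
    threshold: float
    high: "Tree"  # followed when similarity >= threshold
    low: "Tree"   # followed otherwise


Tree = Union[Leaf, Node]


def classify(tree: Tree, r1: Record, r2: Record) -> tuple[bool, int]:
    """Walk the tree, computing an attribute similarity only when its node
    is reached. Returns (is_duplicate, similarities_computed)."""
    computed = 0
    node = tree
    while isinstance(node, Node):
        s = node.sim(r1[node.attribute], r2[node.attribute])
        computed += 1
        node = node.high if s >= node.threshold else node.low
    return node.is_duplicate, computed


# Hypothetical tree: attribute order and thresholds are illustrative only.
TREE = Node("phone", exact_sim, 1.0,
            high=Node("name", string_sim, 0.8, high=Leaf(True), low=Leaf(False)),
            low=Node("name", string_sim, 0.9,
                     high=Node("age", numeric_sim, 0.95,
                               high=Leaf(True), low=Leaf(False)),
                     low=Leaf(False)))

a = {"name": "Zhang Wei", "phone": "010-1234", "age": 30}
b = {"name": "Zhang  Wei", "phone": "010-1234", "age": 30}
c = {"name": "Li Na", "phone": "021-9999", "age": 45}
d = {"name": "Zhang Wei", "phone": "010-5678", "age": 31}

print(classify(TREE, a, b))  # duplicate after 2 similarity computations
print(classify(TREE, a, c))  # rejected after 2; age never computed
print(classify(TREE, a, d))  # needs all 3 attributes
```

For the pair `a`/`c`, the phone and name checks already reject the match, so in this sketch the `age` similarity is never computed; skipping such comparisons is where the abstract's claimed time saving comes from.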
Pages: 146-150
Number of pages: 5
Related papers
50 in total
  • [31] A Similar Duplicate Data Detection Method Based on Fuzzy Clustering for Topology Formation
    Guo, Lejiang
    Wang, Wei
    Chen, Fangxin
    Tang, Xiao
    Wang, Weijiang
    PRZEGLAD ELEKTROTECHNICZNY, 2012, 88 (1B): 26 - 30
  • [32] A Defect Detection Technology Based on Software Behavior Decision Tree
    Chen, Xiangzhou
    Ding, Huixia
    Fang, Shuai
    Li, Zhe
    He, Xiao
    2017 INTERNATIONAL CONFERENCE ON COMPUTER SYSTEMS, ELECTRONICS AND CONTROL (ICCSEC), 2017, : 717 - 724
  • [33] Decision Tree based AIS strategy for Intrusion Detection in MANET
    Jim, Lincy Elizebeth
    Chacko, Jim
    PROCEEDINGS OF THE 2019 IEEE REGION 10 CONFERENCE (TENCON 2019): TECHNOLOGY, KNOWLEDGE, AND SOCIETY, 2019, : 1191 - 1195
  • [34] Evolutionary Decision Tree-Based Intrusion Detection System
    Azad, Chandrashekhar
    Mehta, Ashok Kumar
    Jha, Vijay Kumar
    PROCEEDINGS OF THE THIRD INTERNATIONAL CONFERENCE ON MICROELECTRONICS, COMPUTING AND COMMUNICATION SYSTEMS, MCCS 2018, 2019, 556 : 271 - 282
  • [35] A life-threatening arrhythmia detection method based on pulse rate variability analysis and decision tree
    Chou, Lijuan
    Liu, Jicheng
    Gong, Shengrong
    Chou, Yongxin
    FRONTIERS IN PHYSIOLOGY, 2022, 13
  • [36] Decision Tree based Support Vector Machine for Intrusion Detection
    Mulay, Snehal A.
    Devale, P. R.
    Garje, G. V.
    2010 INTERNATIONAL CONFERENCE ON NETWORKING AND INFORMATION TECHNOLOGY (ICNIT 2010), 2010, : 59 - 63
  • [37] A decision tree based decomposition method for oil refinery scheduling
    Gao, Xiaoyong
    Huang, Dexian
    Jiang, Yongheng
    Chen, Tao
    Chinese Journal of Chemical Engineering, 2018, 26 (08) : 1605 - 1612
  • [38] Optimizing Method for Mobile Agents Migration Based on Decision Tree
    Wang, Ke-gang
    2012 THIRD INTERNATIONAL CONFERENCE ON TELECOMMUNICATION AND INFORMATION (TEIN 2012), 2012, : 88 - 92
  • [39] Radar emitter recognition method based on AdaBoost and decision tree
    Tang, Xiaojing
    Chen, Weigao
    Zhu, Weigang
    PROCEEDINGS OF THE 2017 2ND INTERNATIONAL CONFERENCE ON AUTOMATION, MECHANICAL CONTROL AND COMPUTATIONAL ENGINEERING (AMCCE 2017), 2017, 118 : 326 - 330
  • [40] A money laundering risk evaluation method based on decision tree
    Wang, Su-Nan
    Yang, Jian-Gang
    PROCEEDINGS OF 2007 INTERNATIONAL CONFERENCE ON MACHINE LEARNING AND CYBERNETICS, VOLS 1-7, 2007, : 283 - +