A Method for Duplicate Record Detection Based on Decision Tree

Cited by: 0
Authors
Lin, Guangyan [1]
Qian, Yuxiang [1]
Zhang, Yiqiong [1]
Affiliations
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
Source
2016 3RD INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS (PES 2016) | 2016 / Vol. 4
Keywords
Duplicate Detection; Decision Tree; Data Cleaning; Attribute Similarity; LINKAGE;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification codes
081203 ; 0835 ;
Abstract
Duplicate records are a common problem that affects many information systems. When computing the similarity of two records, comparing their attributes one by one is time-consuming and complex. This paper proposes a duplicate detection method based on a decision tree. It first summarizes attribute similarity algorithms for common data types. Building on that summary, attribute similarities are mapped to decision tree nodes, so that whether two records are duplicates can be decided early, without computing the similarity of every attribute. Precision is preserved while the time complexity is reduced significantly. In the experiments, precision is above 98% and the F-score 97%.
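The early-exit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the tree shape, the attribute names (`name`, `email`, `age`), the thresholds, and the two similarity functions are all assumptions chosen for illustration, and the tree is hand-built rather than learned from labeled record pairs.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Union


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] for text attributes (difflib ratio, case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def numeric_similarity(a: float, b: float) -> float:
    """Similarity in [0, 1] for numeric attributes, based on relative difference."""
    largest = max(abs(a), abs(b))
    return 1.0 if largest == 0 else max(0.0, 1.0 - abs(a - b) / largest)


@dataclass
class Node:
    """One decision tree node: tests the similarity of a single attribute."""
    attribute: str
    similarity: Callable[[object, object], float]
    threshold: float
    below: Union["Node", bool]        # branch taken when similarity < threshold
    at_or_above: Union["Node", bool]  # branch taken when similarity >= threshold


def is_duplicate(tree: Union[Node, bool], rec_a: dict, rec_b: dict):
    """Walk the tree; return (verdict, attributes actually compared)."""
    node, compared = tree, []
    while isinstance(node, Node):
        score = node.similarity(rec_a[node.attribute], rec_b[node.attribute])
        compared.append(node.attribute)
        node = node.at_or_above if score >= node.threshold else node.below
    return node, compared


# Hypothetical hand-built tree; a real one would be trained on labeled pairs.
TREE = Node(
    "name", string_similarity, 0.8,
    below=False,  # names differ too much: stop immediately
    at_or_above=Node(
        "email", string_similarity, 0.9,
        below=Node("age", numeric_similarity, 0.95, below=False, at_or_above=True),
        at_or_above=True,  # name and email agree: age is never compared
    ),
)

# Made-up example records.
rec_a = {"name": "Jonathan Smith", "email": "jsmith@example.com", "age": 41}
rec_b = {"name": "Jonathon Smith", "email": "jsmith@example.com", "age": 41}
rec_c = {"name": "Maria Garcia", "email": "mgarcia@example.com", "age": 29}

print(is_duplicate(TREE, rec_a, rec_b))  # (True, ['name', 'email'])
print(is_duplicate(TREE, rec_a, rec_c))  # (False, ['name'])
```

In both calls a verdict is reached before all three attributes are compared, which is the source of the time saving the abstract claims; how large that saving is in practice depends on the tree that is actually learned.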
Pages: 146-150
Page count: 5
Related papers
50 in total
  • [41] A quantitative method for pulse strength classification based on decision tree
    Wang, Huiyan
    Zhang, Peiyong
    Journal of Software, 2009, 4 (04) : 323 - 330
  • [42] A decision tree based decomposition method for oil refinery scheduling
    Gao, Xiaoyong
    Huang, Dexian
    Jiang, Yongheng
    Chen, Tao
    CHINESE JOURNAL OF CHEMICAL ENGINEERING, 2018, 26 (08) : 1605 - 1612
  • [43] A decision tree based method for fault classification in transmission lines
    Shahrtash, S. M.
    Jamehbozorg, A.
    2008 IEEE/PES TRANSMISSION & DISTRIBUTION CONFERENCE & EXPOSITION, VOLS 1-3, 2008, : 1039 - 1043
  • [44] Method of Bearing Fault Identification Based On SVM Decision Tree
    Cheng Hang
    Li Xi
    Qin Zheng-bo
    Huang Chao-young
    DIGITAL MANUFACTURING & AUTOMATION III, PTS 1 AND 2, 2012, 190-191 : 1010 - 1015
  • [45] A Quantitative Method for Pulse Strength Classification Based on Decision Tree
    Wang, Huiyan
    ISISE 2008: INTERNATIONAL SYMPOSIUM ON INFORMATION SCIENCE AND ENGINEERING, VOL 2, 2008, : 111 - 115
  • [46] An LSH-Based Model-Words-Driven Product Duplicate Detection Method
    Hartveld, Aron
    van Keulen, Max
    Mathol, Diederik
    van Noort, Thomas
    Plaatsman, Thomas
    Frasincar, Flavius
    Schouten, Kim
    ADVANCED INFORMATION SYSTEMS ENGINEERING, CAISE 2018, 2018, 10816 : 409 - 423
  • [47] Intrusion Detection with Neural Networks Based on Knowledge Extraction by Decision Tree
    Guevara, Cesar
    Santos, Matilde
    Lopez, Victoria
    INTERNATIONAL JOINT CONFERENCE SOCO'16- CISIS'16-ICEUTE'16, 2017, 527 : 508 - 517
  • [48] Decision Tree Applied in Web-based Intrusion Detection System
    Wei, Mingjun
    Liu, Yufang
    Chen, Xuebin
    Li, Jianmin
    SECOND INTERNATIONAL CONFERENCE ON FUTURE NETWORKS: ICFN 2010, 2010, : 110 - 113
  • [49] High Impedance Fault Detection Based on HSTransfo and Decision Tree Techniques
    Nakho, A.
    Moloi, K.
    Hamam, Y.
    2021 SOUTHERN AFRICAN UNIVERSITIES POWER ENGINEERING CONFERENCE/ROBOTICS AND MECHATRONICS/PATTERN RECOGNITION ASSOCIATION OF SOUTH AFRICA (SAUPEC/ROBMECH/PRASA), 2021,
  • [50] Fault Detection for Automatic Guided Vehicles Based on Decision Tree and LSTM
    Ding, Xiaohu
    Zhang, Dongdong
    Zhang, Liangang
    Zhang, Lei
    Zhang, Changjiang
    Xu, Bin
    2021 5TH INTERNATIONAL CONFERENCE ON SYSTEM RELIABILITY AND SAFETY (ICSRS 2021), 2021, : 42 - 46