A Method for Duplicate Record Detection Based on Decision Tree

Cited by: 0
Authors
Lin, Guangyan [1]
Qian, Yuxiang [1]
Zhang, Yiqiong [1]
Affiliation
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
Source
2016 3RD INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS (PES 2016) | 2016 / Vol. 4
Keywords
Duplicate Detection; Decision Tree; Data Cleaning; Attribute Similarity; LINKAGE;
DOI
Not available
Chinese Library Classification number
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Duplicate records are a common problem in information systems. Computing the similarity of two records by comparing their attributes one by one is time-consuming and complex. This paper proposes a duplicate detection method based on a decision tree. It first summarizes attribute similarity algorithms for common data types. Building on that summary, it maps attribute similarities to decision tree nodes, so that whether two records are duplicates can be decided early, without computing the similarity of every attribute. This reduces time complexity significantly while preserving precision. In the experiments, precision was above 98% and the F-score was 97%.
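The early-exit idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the attribute names (`email`, `name`, `city`), the thresholds, the hand-built tree, and the use of `difflib.SequenceMatcher` as a string similarity measure are all assumptions made for illustration. Each node computes one attribute similarity only when the traversal reaches it, so a pair that is decided near the root never pays for the remaining attributes.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Union

Record = Dict[str, str]


def string_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1] between two strings (stand-in measure, not from the paper)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_similarity(a: str, b: str) -> float:
    """1.0 when the values are identical, otherwise 0.0."""
    return 1.0 if a == b else 0.0


@dataclass
class Node:
    """Internal tree node: tests one attribute similarity against a threshold."""
    attribute: str
    similarity: Callable[[str, str], float]
    threshold: float
    if_similar: Union["Node", bool]    # subtree or leaf when score >= threshold
    if_different: Union["Node", bool]  # subtree or leaf otherwise


def is_duplicate(tree: Union[Node, bool], r1: Record, r2: Record,
                 visited: Optional[List[str]] = None) -> bool:
    """Walk the tree, computing a similarity only at the nodes actually visited."""
    node = tree
    while not isinstance(node, bool):
        if visited is not None:
            visited.append(node.attribute)
        score = node.similarity(r1[node.attribute], r2[node.attribute])
        node = node.if_similar if score >= node.threshold else node.if_different
    return node


# Hypothetical hand-built tree; in practice the structure and thresholds
# would be learned from labeled record pairs.
TREE = Node(
    "email", exact_similarity, 1.0,
    if_similar=True,  # identical email: decide immediately
    if_different=Node(
        "name", string_similarity, 0.85,
        if_similar=Node("city", string_similarity, 0.8, True, False),
        if_different=False,  # names too different: stop early
    ),
)

a = {"name": "Alice Smith", "email": "alice@example.com", "city": "Boston"}
b = {"name": "A. Smith", "email": "alice@example.com", "city": "Cambridge"}

trace: List[str] = []
print(is_duplicate(TREE, a, b, trace), trace)
```

Passing a list as `visited` records which attributes were compared, which makes the saving visible: the pair above is decided at the root, so only the `email` similarity is computed.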
Pages: 146-150
Page count: 5
Related papers
50 in total
  • [1] Duplicate record detection: A survey
    Elmagarmid, Ahmed K.
    Ipeirotis, Panagiotis G.
    Verykios, Vassilios S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (01) : 1 - 16
  • [2] Efficient Duplicate Record Detection Based on Similarity Estimation
    Li, Mohan
    Wang, Hongzhi
    Li, Jianzhong
    Gao, Hong
    WEB-AGE INFORMATION MANAGEMENT, PROCEEDINGS, 2010, 6184 : 595 - 607
  • [3] A Survey On Duplicate Record Detection In Real World Data
    Dhivyabharathi, G., V
    Kumaresan, S.
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2016,
  • [4] Performance Analysis of Duplicate Record Detection Techniques
    Adil, Syed Hasan
    Ebrahim, Mansoor
    Ali, Syed Saad Azhar
    Raza, Kamran
    ENGINEERING TECHNOLOGY & APPLIED SCIENCE RESEARCH, 2019, 9 (05) : 4755 - 4758
  • [5] Compound record clustering algorithm for design pattern detection by decision tree learning
    Dong, Jing
    Sun, Yongtao
    Zhao, Yajing
    PROCEEDINGS OF THE 2008 IEEE INTERNATIONAL CONFERENCE ON INFORMATION REUSE AND INTEGRATION, 2008, : 226 - +
  • [6] Detection of Finger Flexions Based on Decision Tree
    Prilepok, Michal
    Jahan, Ibrahim Salem
    Snasel, Vaclav
    PROCEEDINGS OF THE THIRD INTERNATIONAL AFRO-EUROPEAN CONFERENCE FOR INDUSTRIAL ADVANCEMENT-AECIA 2016, 2018, 565 : 57 - 67
  • [7] Web-based Arabic/English Duplicate Record Detection with Nested Blocking Technique
    Higazy, Azza
    El Tobely, Tarek
    Yousef, Ahmed H.
    Sarhan, Amany
    2013 8TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING & SYSTEMS (ICCES), 2013, : 313 - 318
  • [8] The Method of Attribute Reduction Based on Decision Tree
    Li, Fachao
    Yin, Hongze
    Guan, Fei
    FRONTIERS OF MANUFACTURING SCIENCE AND MEASURING TECHNOLOGY, PTS 1-3, 2011, 230-232 : 1303 - 1307
  • [9] Decision Tree-Based Anonymized Electronic Health Record Fusion for Public Health Informatics
    Khalique, Fatima
    Khan, Shoab Ahmed
    Mubarak, Qurat-ul-ain
    Safdar, Hasan
    INTELLIGENT COMPUTING, VOL 1, 2019, 858 : 404 - 414
  • [10] Hybrid islanding detection method based on decision tree and positive feedback for distributed generations
    Zhou, Bin
    Cao, Chi
    Li, Canbing
    Cao, Yijia
    Chen, Chen
    Li, Yong
    Zeng, Long
    IET GENERATION TRANSMISSION & DISTRIBUTION, 2015, 9 (14) : 1819 - 1825