A Method for Duplicate Record Detection Based on Decision Tree

Cited by: 0
Authors
Lin, Guangyan [1 ]
Qian, Yuxiang [1 ]
Zhang, Yiqiong [1 ]
Affiliations
[1] Beihang Univ, Sch Software, Beijing, Peoples R China
Source
2016 3RD INTERNATIONAL CONFERENCE ON POWER AND ENERGY SYSTEMS (PES 2016) | 2016 / Vol. 4
Keywords
Duplicate Detection; Decision Tree; Data Cleaning; Attribute Similarity; LINKAGE;
DOI
Not available
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Discipline classification code
081203 ; 0835 ;
Abstract
Duplicate records are a common problem in information systems. Computing the similarity of two records by comparing their attributes one by one is time-consuming and complex. This paper proposes a duplicate detection method based on a decision tree. It first summarizes attribute similarity algorithms for common data types. Attribute similarities are then mapped to decision tree nodes, so that whether two records are duplicates can be determined early, without computing the similarity of every attribute. This reduces time complexity significantly while maintaining precision. The experiments achieve a precision above 98% and an F-score of 97%.
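The early-exit idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the attribute names (title, price), the similarity functions, the thresholds, and the hand-built tree below are all hypothetical choices made for illustration, and a real system would presumably learn the tree from labeled record pairs. Each tree node tests the similarity of one attribute, and a leaf is reached as soon as the decision is clear, so remaining attributes are never compared.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, Tuple, Union

Record = Dict[str, object]


def string_similarity(a: object, b: object) -> float:
    """Case-insensitive similarity in [0, 1] (assumed measure, not from the paper)."""
    return SequenceMatcher(None, str(a).lower(), str(b).lower()).ratio()


def numeric_similarity(a: object, b: object) -> float:
    """Relative closeness of two numbers in [0, 1] (assumed measure)."""
    x, y = float(a), float(b)
    largest = max(abs(x), abs(y))
    return 1.0 if largest == 0 else max(0.0, 1.0 - abs(x - y) / largest)


@dataclass
class Node:
    """Internal tree node: tests one attribute's similarity against a threshold."""
    attribute: str
    similarity: Callable[[object, object], float]
    threshold: float
    if_similar: Union["Node", bool]    # subtree or leaf (True = duplicate)
    if_different: Union["Node", bool]  # subtree or leaf (False = distinct)


def is_duplicate(tree: Union[Node, bool], r1: Record, r2: Record) -> Tuple[bool, int]:
    """Walk the tree; return the decision and how many attributes were compared."""
    node, compared = tree, 0
    while not isinstance(node, bool):
        compared += 1
        score = node.similarity(r1[node.attribute], r2[node.attribute])
        node = node.if_similar if score >= node.threshold else node.if_different
    return node, compared


# Hypothetical hand-built tree: check the title first, the price only if needed.
TREE = Node(
    "title", string_similarity, 0.8,
    if_similar=Node("price", numeric_similarity, 0.95,
                    if_similar=True, if_different=False),
    if_different=False,
)

a = {"title": "Data Cleaning Handbook", "price": 20.0}
b = {"title": "data cleaning handbook", "price": 19.99}
c = {"title": "Cooking for Two", "price": 20.0}

print(is_duplicate(TREE, a, b))  # (True, 2): both attributes were compared
print(is_duplicate(TREE, a, c))  # (False, 1): decided after the title alone
```

In this sketch the second pair is rejected after a single comparison, which is the kind of saving the abstract attributes to deciding before all attributes are computed.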
Pages: 146-150
Page count: 5
Related papers
50 in total
  • [21] The Method to Determine Bibliographic Types Based on Decision Tree
    Geng, Si
    Li, Ning
    Zhao, Lin
    Tian, Ying'ai
    PROCEEDINGS OF THE 2016 4TH INTERNATIONAL CONFERENCE ON ELECTRICAL & ELECTRONICS ENGINEERING AND COMPUTER SCIENCE (ICEEECS 2016), 2016, 50 : 755 - 761
  • [22] Method of Web Information Extraction Based on Decision Tree
    Chen Hong-ye
    2009 INTERNATIONAL FORUM ON INFORMATION TECHNOLOGY AND APPLICATIONS, VOL 1, PROCEEDINGS, 2009, : 664 - 666
  • [23] Progressive Duplicate Detection
    Papenbrock, Thorsten
    Heise, Arvid
    Naumann, Felix
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1316 - 1329
  • [25] An Efficient Duplicate Detection System for XML Documents
    Lwin, Thandar
    Nyunt, Thi Thi Soe
    2010 SECOND INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND APPLICATIONS: ICCEA 2010, PROCEEDINGS, VOL 2, 2010, : 178 - 182
  • [26] EEG feature selection method based on decision tree
    Duan, Lijuan
    Ge, Hui
    Ma, Wei
    Miao, Jun
    BIO-MEDICAL MATERIALS AND ENGINEERING, 2015, 26 : S1019 - S1025
  • [27] The Method of Vehicle Ontology Building Based on Decision Tree
    Ma, Bingxian
    Wang, Aixia
    Qu, Shouning
    2009 IITA INTERNATIONAL CONFERENCE ON SERVICES SCIENCE, MANAGEMENT AND ENGINEERING, PROCEEDINGS, 2009, : 503 - 506
  • [28] An Evaluation Method of ATR Algorithm Based on Decision Tree
    Zhang, Yifei
    Zhou, Bin
    Dou, Hao
    Ming, Delie
    EIGHTH INTERNATIONAL CONFERENCE ON DIGITAL IMAGE PROCESSING (ICDIP 2016), 2016, 10033
  • [29] Early Detection of Ball Bearing Faults Using the Decision Tree Method
    Istanto, Iwan
    Sulaiman, Robi
    Wijaya, Rio Natanael
    Suhendro, Budi
    Arifianto, Rokhmat
    Slamet
    EMITTER-INTERNATIONAL JOURNAL OF ENGINEERING TECHNOLOGY, 2024, 12 (02) : 150 - 166
  • [30] Fall Detection Algorithm based on Gradient Boosting Decision Tree
    Ning, Yunkun
    Zhang, Sheng
    Nie, Xiaofen
    Li, Guanglin
    Zhao, Guoru
    CONFERENCE PROCEEDINGS OF 2019 IEEE INTERNATIONAL CONFERENCE ON SIGNAL PROCESSING, COMMUNICATIONS AND COMPUTING (IEEE ICSPCC 2019), 2019,