Duplicate Data Detection Using GNN

Cited by: 0
Authors
Lu, Hanrong [1]
Chen, Xin [1]
Lan, Xuhui [1]
Zheng, Feng [1]
Affiliations
[1] Air Force Early Warning Acad, Dept Early Warning Intelligence, Wuhan, Peoples R China
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016) | 2016
Keywords
record detection; data cleaning; neural network; genetic algorithm; GNN; NETWORK; PERFORMANCE;
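DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract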
In applications such as data warehousing, data mining, and information integration, data must be cleaned as a preprocessing step to ensure the quality of the data and the performance of the application. An essential task in data cleaning is duplicate record detection. Existing detection methods apply to different data models and record types, but certain difficulties with those studies remain to be overcome. This paper proposes a genetic neural network (GNN) based approach to duplicate record detection. The topology and weight vector of a neural network are first optimized by a genetic algorithm for the given data set, and the optimized network is then used to perform the detection. The method can enhance detection accuracy and alleviate many of the problems of previous work.
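The abstract gives no implementation details, so the following is only a minimal sketch of the general idea it names: a genetic algorithm searching over both the topology (here, just the number of hidden units) and the weight vector of a small neural network that classifies record pairs as duplicate or not. The similarity features, the toy data, the genome layout, and every function name and parameter value are assumptions made for illustration and are not taken from the paper.

```python
import math
import random

# Hypothetical toy data: (name_similarity, address_similarity) -> 1 if duplicate.
DATA = [((0.95, 0.90), 1), ((0.90, 0.85), 1), ((0.85, 0.95), 1), ((0.80, 0.80), 1),
        ((0.20, 0.10), 0), ((0.30, 0.40), 0), ((0.10, 0.35), 0), ((0.45, 0.20), 0)]

def random_genome(n_in, rng, max_hidden=4):
    # Genome = (hidden unit count, flat weights). Each hidden unit owns
    # n_in input weights + 1 bias + 1 output weight; the last entry is the output bias.
    hidden = rng.randint(1, max_hidden)
    return hidden, [rng.uniform(-1, 1) for _ in range(hidden * (n_in + 2) + 1)]

def forward(genome, x):
    hidden, w = genome
    n_in, idx, out = len(x), 0, 0.0
    for _ in range(hidden):
        s = w[idx + n_in] + sum(w[idx + j] * x[j] for j in range(n_in))
        idx += n_in + 1
        out += w[idx] * math.tanh(s)
        idx += 1
    out = max(-60.0, min(60.0, out + w[idx]))  # clamp to keep exp() safe
    return 1.0 / (1.0 + math.exp(-out))

def fitness(genome, data):
    # Negative mean squared error, so larger is better.
    return -sum((forward(genome, x) - y) ** 2 for x, y in data) / len(data)

def crossover(a, b, rng):
    # Uniform crossover only when topologies match; otherwise keep parent a.
    if a[0] != b[0]:
        return a
    return a[0], [rng.choice(pair) for pair in zip(a[1], b[1])]

def mutate(genome, n_in, rng, max_hidden=4):
    hidden, w = genome
    w = [wi + rng.gauss(0, 0.3) for wi in w]  # perturb every weight
    block, r = n_in + 2, rng.random()
    if r < 0.1 and hidden < max_hidden:       # topology mutation: add a unit
        w = w[:-1] + [rng.uniform(-1, 1) for _ in range(block)] + w[-1:]
        hidden += 1
    elif r < 0.2 and hidden > 1:              # topology mutation: drop a unit
        w = w[:-1 - block] + w[-1:]
        hidden -= 1
    return hidden, w

def evolve(data, n_in, generations=60, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(n_in, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data), reverse=True)
        elite = pop[:pop_size // 3]           # elitism: best third survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            children.append(mutate(crossover(a, b, rng), n_in, rng))
        pop = elite + children
    return max(pop, key=lambda g: fitness(g, data))

best = evolve(DATA, n_in=2)
print("hidden units:", best[0], "fitness:", fitness(best, DATA))
```

In this sketch the evolved network would then score unseen record pairs with `forward(best, features)`. A fuller treatment would typically follow the genetic search with gradient-based fine-tuning and evaluate on held-out pairs, neither of which is shown here.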
Pages: 167-170
Page count: 4
Related papers
50 in total
  • [1] Data Duplicate Detection
    Medidar, Nikita
    Chavan, Manik
    2018 9TH INTERNATIONAL CONFERENCE ON COMPUTING, COMMUNICATION AND NETWORKING TECHNOLOGIES (ICCCNT), 2018,
  • [2] Duplicate Detection Exploiting Data Relationships
    Herschel, Melanie
    IT-INFORMATION TECHNOLOGY, 2009, 51 (04): : 231 - 234
  • [3] Efficient and Effective Duplicate Detection in Hierarchical Data
    Leitao, Luis
    Calado, Pavel
    Herschel, Melanie
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2013, 25 (05) : 1028 - 1041
  • [4] A Survey On Duplicate Record Detection In Real World Data
    Dhivyabharathi, G., V
    Kumaresan, S.
    2016 3RD INTERNATIONAL CONFERENCE ON ADVANCED COMPUTING AND COMMUNICATION SYSTEMS (ICACCS), 2016,
  • [5] A GNN-Based False Data Detection Scheme for Smart Grids
    Qiu, Junhong
    Zhang, Xinxin
    Wang, Tao
    Hou, Huiying
    Wang, Siyuan
    Yang, Tiejun
    ALGORITHMS, 2025, 18 (03)
  • [6] Progressive Duplicate Detection
    Papenbrock, Thorsten
    Heise, Arvid
    Naumann, Felix
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2015, 27 (05) : 1316 - 1329
  • [7] A Similar Duplicate Data Detection Method Based on Fuzzy Clustering for Topology Formation
    Guo, Lejiang
    Wang, Wei
    Chen, Fangxin
    Tang, Xiao
    Wang, Weijiang
    PRZEGLAD ELEKTROTECHNICZNY, 2012, 88 (1B): : 26 - 30
  • [8] Duplicate record detection: A survey
    Elmagarmid, Ahmed K.
    Ipeirotis, Panagiotis G.
    Verykios, Vassilios S.
    IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2007, 19 (01) : 1 - 16
  • [9] MPI Errors Detection using GNN Embedding and Vector Embedding over LLVM IR
    El Karchi, Jad
    Chen, Hanze
    TehraniJamsaz, Ali
    Jannesari, Ali
    Popov, Mihail
    Saillard, Emmanuelle
    PROCEEDINGS 2024 IEEE INTERNATIONAL PARALLEL AND DISTRIBUTED PROCESSING SYMPOSIUM, IPDPS 2024, 2024, : 595 - 607
  • [10] Elaborated Framework for Duplicate Device Detection from Multisourced Mobile Device Location Data
    Kabiri, Aliakbar
    Darzi, Aref
    Pan, Yixuan
    Namadi, Saeed Saleh
    Zhao, Guangchen
    Sun, Qianqian
    Yang, Mofeng
    Ashoori, Mohammad
    TRANSPORTATION RESEARCH RECORD, 2024, 2678 (06) : 881 - 890