Duplicate Data Detection Using GNN

Authors
Lu, Hanrong [1]
Chen, Xin [1]
Lan, Xuhui [1]
Zheng, Feng [1]
Affiliation
[1] Air Force Early Warning Acad, Dept Early Warning Intelligence, Wuhan, Peoples R China
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016) | 2016
Keywords
record detection; data cleaning; neural network; genetic algorithm; GNN; NETWORK; PERFORMANCE;
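DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract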
In applications such as data warehousing, data mining, and information integration, data must be cleaned as a preprocessing step to ensure data quality and application performance. An essential task in data cleaning is duplicate record detection. Existing detection methods apply to different data models and record types, and certain difficulties with them remain to be overcome. This paper proposes a genetic neural network (GNN) based approach to duplicate record detection. The topology and weight vector of a neural network are first optimized by a genetic algorithm for the given data set, and the optimized network is then used to perform the detection. The method can enhance detection accuracy and alleviate many of the problems with previous work.
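The idea summarized in the abstract, a genetic algorithm that tunes both the topology and the weights of a neural network before it classifies record pairs, can be illustrated with a minimal sketch. This is not the authors' implementation: the character-set Jaccard features, the single hidden layer, the elitist selection scheme, the mutation rates, and every function name below (`similarity_features`, `evolve`, and so on) are assumptions made for illustration, and the toy records are invented.

```python
import math
import random

def similarity_features(a, b):
    """Per-field character-set Jaccard similarity of two records (assumed feature)."""
    feats = []
    for x, y in zip(a, b):
        sx, sy = set(x.lower()), set(y.lower())
        union = sx | sy
        feats.append(len(sx & sy) / len(union) if union else 1.0)
    return feats

def random_unit(n_in, rng):
    # One hidden unit: n_in input weights, then its bias, then its output weight.
    return [rng.uniform(-1, 1) for _ in range(n_in + 2)]

def random_genome(n_in, rng, max_hidden=4):
    # Genome = (list of hidden units, output bias); the list length is the topology.
    units = [random_unit(n_in, rng) for _ in range(rng.randint(1, max_hidden))]
    return units, rng.uniform(-1, 1)

def predict(genome, feats):
    """Probability that the pair is a duplicate (tanh hidden layer, sigmoid output)."""
    units, out = genome
    for u in units:
        z = u[-2] + sum(w * f for w, f in zip(u, feats))
        out += u[-1] * math.tanh(z)
    out = max(-60.0, min(60.0, out))  # keep exp() in a safe range
    return 1.0 / (1.0 + math.exp(-out))

def fitness(genome, data):
    # Fraction of labeled pairs classified correctly at a 0.5 threshold.
    return sum((predict(genome, f) >= 0.5) == label for f, label in data) / len(data)

def crossover(a, b, rng):
    # Child inherits its size from one parent and each unit from either parent.
    size = len(rng.choice((a, b))[0])
    units = []
    for i in range(size):
        donors = [p[0][i] for p in (a, b) if i < len(p[0])]
        units.append(list(rng.choice(donors)))
    return units, rng.choice((a[1], b[1]))

def mutate(genome, n_in, rng, max_hidden=4, sigma=0.3):
    units, bias = genome
    # Weight mutation: perturb roughly 20% of the weights with Gaussian noise.
    units = [[w + rng.gauss(0, sigma) if rng.random() < 0.2 else w for w in u]
             for u in units]
    # Topology mutation: occasionally add or remove one hidden unit.
    r = rng.random()
    if r < 0.1 and len(units) < max_hidden:
        units.append(random_unit(n_in, rng))
    elif r > 0.9 and len(units) > 1:
        units.pop(rng.randrange(len(units)))
    return units, bias + rng.gauss(0, sigma)

def evolve(data, n_in, generations=40, pop_size=30, seed=0):
    rng = random.Random(seed)
    pop = [random_genome(n_in, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data), reverse=True)
        elite = pop[: pop_size // 3]  # the best third survives unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), n_in, rng)
            for _ in range(pop_size - len(elite))
        ]
        pop = elite + children
    return max(pop, key=lambda g: fitness(g, data))

# Invented toy records: (name, address) pairs with a duplicate / non-duplicate label.
pairs = [
    (("John Smith", "12 Oak St"), ("Jon Smith", "12 Oak Street"), True),
    (("Mary Jones", "4 Elm Rd"), ("Mary Jones", "4 Elm Road"), True),
    (("John Smith", "12 Oak St"), ("Alice Wong", "99 Pine Ave"), False),
    (("Mary Jones", "4 Elm Rd"), ("Bob Lee", "7 Hill Ct"), False),
]
data = [(similarity_features(a, b), label) for a, b, label in pairs]
best = evolve(data, n_in=2)
print("hidden units:", len(best[0]), "training accuracy:", fitness(best, data))
```

Representing each hidden unit as its own weight list keeps topology changes simple: adding or removing a unit never requires re-indexing a flat weight vector. A realistic setup would likely use stronger field similarity measures and score fitness on held-out pairs rather than on the training pairs.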
Pages: 167-170
Number of pages: 4