Duplicate Data Detection Using GNN

Times cited: 0
Authors
Lu, Hanrong [1]
Chen, Xin [1]
Lan, Xuhui [1]
Zheng, Feng [1]
Affiliation
[1] Air Force Early Warning Acad, Dept Early Warning Intelligence, Wuhan, Peoples R China
Source
PROCEEDINGS OF 2016 IEEE INTERNATIONAL CONFERENCE ON CLOUD COMPUTING AND BIG DATA ANALYSIS (ICCCBDA 2016) | 2016
Keywords
record detection; data cleaning; neural network; genetic algorithm; GNN; NETWORK; PERFORMANCE;
DOI
Not available
CLC number
TP [Automation technology, computer technology];
Discipline code
0812 ;
Abstract
In some applications, such as data warehousing, data mining, and information integration, data must be cleaned as a preprocessing step to ensure data quality and application performance. An essential task in data cleaning is duplicate record detection. Existing detection methods apply to different data models and record types, but certain difficulties with them remain to be overcome. This paper proposes a genetic neural network (GNN) based approach to duplicate record detection. The topology and weight vector of a neural network are first optimized by a genetic algorithm for the given data set, and the optimized network is then used to perform the detection. The method can improve detection accuracy and alleviate many of the problems of previous work.
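The abstract does not specify the genome encoding, network shape, or fitness function. As a rough illustration only, the Python sketch below assumes that each record pair is turned into a vector of per-field string similarities (via `difflib.SequenceMatcher`), that the network has one hidden layer whose units a genetic algorithm can switch on or off (the "topology" part of the genome) while it also evolves every weight, and that fitness is training accuracy minus a small penalty per active hidden unit. All names and parameters (`H_MAX`, `SIZE_PENALTY`, `pair_features`, `evolve`, population size, mutation rates) are hypothetical and not taken from the paper.

```python
# Hypothetical sketch of GA-optimised neural-network duplicate detection;
# not the authors' implementation. Encoding and fitness are assumptions.
import math
import random
from difflib import SequenceMatcher

H_MAX = 6            # assumed upper bound on hidden units (topology search space)
SIZE_PENALTY = 0.01  # assumed cost per active hidden unit; favours smaller nets


def pair_features(a, b):
    """Per-field string similarity in [0, 1] for two records of equal arity."""
    return [SequenceMatcher(None, x.lower(), y.lower()).ratio() for x, y in zip(a, b)]


def random_genome(n_in, rng):
    """Genome = hidden-unit on/off mask (topology) plus all weights and biases."""
    return {
        "mask": [rng.random() < 0.5 for _ in range(H_MAX)],
        "w1": [[rng.gauss(0, 1) for _ in range(n_in)] for _ in range(H_MAX)],
        "b1": [rng.gauss(0, 1) for _ in range(H_MAX)],
        "w2": [rng.gauss(0, 1) for _ in range(H_MAX)],
        "b2": rng.gauss(0, 1),
    }


def predict(g, x):
    """Forward pass: one tanh hidden layer, sigmoid output = P(duplicate)."""
    z = g["b2"]
    for j in range(H_MAX):
        if g["mask"][j]:  # switched-off units are pruned from the topology
            h = math.tanh(g["b1"][j] + sum(w * xi for w, xi in zip(g["w1"][j], x)))
            z += g["w2"][j] * h
    z = max(-60.0, min(60.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))


def fitness(g, data):
    """Accuracy on labelled pairs minus a small penalty per active hidden unit."""
    correct = sum((predict(g, x) >= 0.5) == bool(y) for x, y in data)
    return correct / len(data) - SIZE_PENALTY * sum(g["mask"])


def crossover(p, q, rng):
    """Uniform crossover that inherits whole hidden units from either parent."""
    child = {"mask": [], "w1": [], "b1": [], "w2": [], "b2": rng.choice([p, q])["b2"]}
    for j in range(H_MAX):
        src = rng.choice([p, q])
        child["mask"].append(src["mask"][j])
        child["w1"].append(list(src["w1"][j]))
        child["b1"].append(src["b1"][j])
        child["w2"].append(src["w2"][j])
    return child


def mutate(g, rng, p_flip=0.05, sigma=0.3):
    """Flip topology bits and add Gaussian noise to every weight (in place)."""
    for j in range(H_MAX):
        if rng.random() < p_flip:
            g["mask"][j] = not g["mask"][j]
        g["w1"][j] = [w + rng.gauss(0, sigma) for w in g["w1"][j]]
        g["b1"][j] += rng.gauss(0, sigma)
        g["w2"][j] += rng.gauss(0, sigma)
    g["b2"] += rng.gauss(0, sigma)
    return g


def evolve(data, pop_size=30, generations=40, seed=0):
    """Evolve topology and weights jointly; return the fittest genome found."""
    rng = random.Random(seed)
    n_in = len(data[0][0])
    pop = [random_genome(n_in, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda g: fitness(g, data), reverse=True)
        nxt = pop[:2]  # elitism: carry the two best over unchanged
        while len(nxt) < pop_size:
            p = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))  # tournament
            q = max(rng.sample(pop, 3), key=lambda g: fitness(g, data))
            nxt.append(mutate(crossover(p, q, rng), rng))
        pop = nxt
    return max(pop, key=lambda g: fitness(g, data))
```

After training, a pair `(a, b)` would be flagged as a duplicate when `predict(best, pair_features(a, b)) >= 0.5`. Evolving the on/off mask together with the weights is just one simple way to let a genetic algorithm search topology and weights at the same time; the paper may encode them differently.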
Pages: 167-170
Page count: 4
Related papers (50 in total)
  • [21] Duplicate detection in adverse drug reaction surveillance
    Noren, G. Niklas
    Orre, Roland
    Bate, Andrew
    Edwards, I. Ralph
    DATA MINING AND KNOWLEDGE DISCOVERY, 2007, 14 (03) : 305 - 328
  • [22] A text and GNN based controversy detection method on social media
    Benslimane, Samy
    Aze, Jerome
    Bringay, Sandra
    Servajean, Maximilien
    Mollevi, Caroline
    WORLD WIDE WEB-INTERNET AND WEB INFORMATION SYSTEMS, 2023, 26 (02) : 799 - 825
  • [23] IRGraphSeg: Infrared Small Target Detection Based on Hierarchical GNN
    Jia, Guimin
    Cheng, Yu
    Chen, Tao
    IEEE GEOSCIENCE AND REMOTE SENSING LETTERS, 2024, 21 : 1 - 5
  • [24] A Novel GNN Model for Fraud Detection in Online Trading Activities
    Long, Jing
    Fang, Fei
    Luo, Haibo
    ALGORITHMS AND ARCHITECTURES FOR PARALLEL PROCESSING, ICA3PP 2021, PT II, 2022, 13156 : 603 - 614
  • [25] Accelerating Duplicate Data Chunk Recognition Using NN Trained by Locality-Sensitive Hash
    Berman, Amit
    Birk, Yitzhak
    Mendelson, Avi
    2014 IEEE 28TH CONVENTION OF ELECTRICAL & ELECTRONICS ENGINEERS IN ISRAEL (IEEEI), 2014
  • [26] Duplicate Literature Detection for Cross-Library Search
    Liu, Wei
    Zeng, Jianxun
    CYBERNETICS AND INFORMATION TECHNOLOGIES, 2016, 16 (02) : 160 - 178
  • [27] AdaSG: A Lightweight Feature Point Matching Method Using Adaptive Descriptor with GNN for VSLAM
    Liu, Ye
    Huang, Kun
    Li, Jingyuan
    Li, Xiangting
    Zeng, Zeng
    Chang, Liang
    Zhou, Jun
    SENSORS, 2022, 22 (16)
  • [28] Duplicate Bug Report Detection: How Far Are We?
    Zhang, Ting
    Han, Donggyun
    Vinayakarao, Venkatesh
    Irsan, Ivana Clairine
    Xu, Bowen
    Thung, Ferdian
    Lo, David
    Jiang, Lingxiao
    ACM TRANSACTIONS ON SOFTWARE ENGINEERING AND METHODOLOGY, 2023, 32 (04)
  • [29] DCGG: drug combination prediction using GNN and GAE
    Ziaee, S. Sina
    Rahmani, Hossein
    Tabatabaei, Mina
    Vlot, Anna H. C.
    Bender, Andreas
    PROGRESS IN ARTIFICIAL INTELLIGENCE, 2024, 13 (01) : 17 - 30
  • [30] Efficient Similarity Joins for Near-Duplicate Detection
    Xiao, Chuan
    Wang, Wei
    Lin, Xuemin
    Yu, Jeffrey Xu
    Wang, Guoren
    ACM TRANSACTIONS ON DATABASE SYSTEMS, 2011, 36 (03)