PYRAMID: A HETEROGENEOUS DATA INTEGRATION ALGORITHM BASED ON HIERARCHICAL GRAPH

Cited by: 0
Authors
Jiang, Sining [1]
Lan, Yujun [1]
Wang, Weigang [1]
Guo, Zhongwen [1]
Affiliation
[1] Ocean Univ China, Dept Informat Sci & Engn, Qingdao, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
data integration; entity match; duplicate elimination; hierarchical graph
DOI
10.1109/ICASSP48485.2024.10447879
CLC number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
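The bottom-up encoding and top-down matching described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the three-level hierarchy (database, table, attribute), the bag-of-tokens vectors standing in for the transformer encoder, the cosine-similarity matcher, and every name below (`encode_database`, `match_top_down`, the sample databases) are assumptions made only for illustration.

```python
import math
import re
from collections import Counter


def encode_text(text):
    """Bag-of-tokens vector; a simple stand-in for a learned encoder."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def merge(vectors):
    """Bottom-up step: a parent's vector is the sum of its children's vectors."""
    total = Counter()
    for vec in vectors:
        total.update(vec)
    return total


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(x * x for x in a.values()))
    norm_b = math.sqrt(sum(x * x for x in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def encode_database(db):
    """Encode values into attributes, then attributes into tables.

    db has the assumed shape {table: {attribute: [values]}}.
    """
    attr_vecs = {
        table: {
            attr: merge([encode_text(attr)] + [encode_text(str(v)) for v in values])
            for attr, values in attrs.items()
        }
        for table, attrs in db.items()
    }
    table_vecs = {
        table: merge([encode_text(table)] + list(attr_vecs[table].values()))
        for table in db
    }
    return table_vecs, attr_vecs


def best_match(query, candidates):
    """Return the candidate name whose vector is most similar to the query."""
    return max(candidates, key=lambda name: cosine(query, candidates[name]))


def match_top_down(db_a, db_b):
    """Match tables first, then attributes only inside the matched table."""
    tables_a, attrs_a = encode_database(db_a)
    tables_b, attrs_b = encode_database(db_b)
    mapping = {}
    for table_a, table_vec in tables_a.items():
        table_b = best_match(table_vec, tables_b)
        for attr_a, attr_vec in attrs_a[table_a].items():
            attr_b = best_match(attr_vec, attrs_b[table_b])
            mapping[(table_a, attr_a)] = (table_b, attr_b)
    return mapping


# Hypothetical sample data, invented for this sketch.
db_a = {
    "customer": {"name": ["Alice Smith", "Bob Jones"], "city": ["Qingdao", "Beijing"]},
    "product": {"name": ["laptop", "phone"], "price": ["999", "499"]},
}
db_b = {
    "client": {"full_name": ["Alice Smith", "Carol White"], "town": ["Qingdao", "Shanghai"]},
    "item": {"title": ["laptop", "tablet"], "cost": ["999", "299"]},
}

mapping = match_top_down(db_a, db_b)
for source, target in sorted(mapping.items()):
    print(source, "->", target)
```

Because attributes are compared only within the table pair chosen one level up, an attribute of `product` is never compared against an attribute of `client`, which is one plausible reading of how a top-down pass curbs attribute misalignment across entity types.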
Pages: 6220-6224
Page count: 5