PYRAMID: A HETEROGENEOUS DATA INTEGRATION ALGORITHM BASED ON HIERARCHICAL GRAPH

Times cited: 0
Authors
Jiang, Sining [1]
Lan, Yujun [1]
Wang, Weigang [1]
Guo, Zhongwen [1]
Affiliation
[1] Ocean Univ China, Dept Informat Sci & Engn, Qingdao, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
data integration; entity match; duplicate elimination; hierarchical graph;
DOI
10.1109/ICASSP48485.2024.10447879
Chinese Library Classification
O42 [Acoustics];
Discipline classification code
070206 ; 082403 ;
Abstract
The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
Pages: 6220-6224
Page count: 5