PYRAMID: A HETEROGENEOUS DATA INTEGRATION ALGORITHM BASED ON HIERARCHICAL GRAPH

Citations: 0
Authors
Jiang, Sining [1]
Lan, Yujun [1]
Wang, Weigang [1]
Guo, Zhongwen [1]
Affiliation
[1] Ocean Univ China, Dept Informat Sci & Engn, Qingdao, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
data integration; entity match; duplicate elimination; hierarchical graph
DOI
10.1109/ICASSP48485.2024.10447879
Chinese Library Classification number
O42 [Acoustics];
Discipline classification codes
070206 ; 082403 ;
Abstract
The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
Pages: 6220-6224
Page count: 5
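
The abstract describes a hierarchical graph with bottom-up encoding and top-down matching but gives no implementation detail. The following is a minimal sketch of that general idea only, not the authors' method. It assumes a three-level hierarchy (database, table, attribute), and it swaps the paper's transformer and contrastive-learning encoder for plain character-trigram counts so that it runs with the standard library alone. The names `Node`, `trigrams`, `encode`, `cosine`, and `match`, the sample data, and the 0.3 threshold are all hypothetical.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the hierarchy: a database, a table, or an attribute."""
    name: str
    children: list = field(default_factory=list)
    values: list = field(default_factory=list)   # sample cell values (attributes only)
    vec: Counter = field(default_factory=Counter)


def trigrams(text):
    """Stand-in encoder (assumption): bag of character trigrams."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def encode(node):
    """Bottom-up: encode own name and values, then add every child's vector."""
    node.vec = trigrams(node.name)
    for value in node.values:
        node.vec.update(trigrams(value))
    for child in node.children:
        node.vec.update(encode(child))
    return node.vec


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def match(left, right, threshold=0.3):
    """Top-down: greedily pair children, then recurse only inside matched pairs,
    so attributes are never compared across unmatched tables."""
    pairs, used = [], set()
    for l in left.children:
        candidates = [r for r in right.children if id(r) not in used]
        best = max(candidates, key=lambda r: cosine(l.vec, r.vec), default=None)
        if best is not None and cosine(l.vec, best.vec) >= threshold:
            used.add(id(best))
            pairs.append((l.name, best.name))
            pairs.extend(match(l, best, threshold))
    return pairs


# Hypothetical sample data: two small databases with differently named schemas.
db_a = Node("db_a", children=[
    Node("customer", children=[
        Node("name", values=["Alice Smith", "Bob Jones"]),
        Node("phone", values=["555-0101"]),
    ]),
    Node("product", children=[
        Node("title", values=["blue widget"]),
        Node("price", values=["9.99"]),
    ]),
])
db_b = Node("db_b", children=[
    Node("client", children=[
        Node("full_name", values=["Alice Smith", "Carol White"]),
        Node("telephone", values=["555-0101"]),
    ]),
    Node("item", children=[
        Node("product_title", values=["blue widget", "red widget"]),
        Node("cost", values=["9.99"]),
    ]),
])

encode(db_a)
encode(db_b)
pairs = match(db_a, db_b)
print(pairs)
```

Because a table's vector is the sum of its attributes' vectors, tables are matched on their data context before any attribute is compared. In this toy example that restricts `price` to candidates inside the table already matched to `product`, which is one simple way to limit attribute misalignment across entity types.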