PYRAMID: A HETEROGENEOUS DATA INTEGRATION ALGORITHM BASED ON HIERARCHICAL GRAPH

Cited by: 0
Authors
Jiang, Sining [1]
Lan, Yujun [1]
Wang, Weigang [1]
Guo, Zhongwen [1]
Affiliations
[1] Ocean Univ China, Dept Informat Sci & Engn, Qingdao, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
data integration; entity match; duplicate elimination; hierarchical graph
DOI
10.1109/ICASSP48485.2024.10447879
Chinese Library Classification
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
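The bottom-up encoding and top-down matching described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: character-trigram counts stand in for the transformer and contrastive-learning features, the hierarchy is reduced to two levels (table and attribute), and every function name and sample value below is hypothetical.

```python
from collections import Counter
from math import sqrt


def trigram_vector(values):
    """Encode one attribute as a bag of character trigrams over its values."""
    counts = Counter()
    for value in values:
        text = f"  {str(value).lower()} "  # pad so short values still yield trigrams
        counts.update(text[i:i + 3] for i in range(len(text) - 2))
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def encode_bottom_up(database):
    """Encode attributes first, then build each table vector from its attributes.

    `database` is assumed to look like {table: {attribute: [values]}}.
    """
    encoded = {}
    for table, attributes in database.items():
        attr_vecs = {name: trigram_vector(vals) for name, vals in attributes.items()}
        table_vec = Counter()
        for vec in attr_vecs.values():
            table_vec.update(vec)  # the parent node aggregates its children
        encoded[table] = (table_vec, attr_vecs)
    return encoded


def match_top_down(source, target):
    """Match tables first, then match attributes only inside matched tables."""
    src, tgt = encode_bottom_up(source), encode_bottom_up(target)
    mapping = {}
    for s_table, (s_vec, s_attrs) in src.items():
        t_table = max(tgt, key=lambda t: cosine(s_vec, tgt[t][0]))
        t_attrs = tgt[t_table][1]
        for s_attr, s_attr_vec in s_attrs.items():
            best = max(t_attrs, key=lambda a: cosine(s_attr_vec, t_attrs[a]))
            mapping[(s_table, s_attr)] = (t_table, best)
    return mapping


# Made-up sample data for illustration only.
source = {
    "customers": {"name": ["Alice Smith", "Bob Jones"], "city": ["Qingdao", "Beijing"]},
    "products": {"title": ["USB cable", "Laptop stand"], "price": ["9.99", "24.50"]},
}
target = {
    "clients": {"full_name": ["alice smith", "Carol White"], "town": ["Qingdao", "Shanghai"]},
    "items": {"label": ["usb cable", "Phone case"], "cost": ["9.99", "12.00"]},
}
mapping = match_top_down(source, target)
```

Restricting attribute comparison to already-matched tables is what keeps an attribute of one entity type from being aligned with an attribute of an unrelated type, which is the misalignment the abstract says the top-down step is meant to curb.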
Pages: 6220-6224
Page count: 5