PYRAMID: A HETEROGENEOUS DATA INTEGRATION ALGORITHM BASED ON HIERARCHICAL GRAPH

Cited by: 0
Authors
Jiang, Sining [1]
Lan, Yujun [1]
Wang, Weigang [1]
Guo, Zhongwen [1]
Affiliations
[1] Ocean Univ China, Dept Informat Sci & Engn, Qingdao, Peoples R China
Source
2024 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING, ICASSP 2024 | 2024
Keywords
data integration; entity match; duplicate elimination; hierarchical graph
DOI
10.1109/ICASSP48485.2024.10447879
Abstract
The surging volume of big data underscores the imperative of integrating heterogeneous datasets into a unified, semantically consistent format. We introduce Pyramid, a comprehensive framework for heterogeneous data integration, addressing schema transformation, feature encoding, entity matching, deduplication, and mapping retrieval. At its core, a hierarchical graph captures relationships across databases, bridging diverse data sources. We employ a bottom-up encoding strategy, factoring in data context, and a top-down matching mechanism, curbing attribute misalignment across entity types. Enhanced by the transformer model and contrastive learning, our approach realizes unsupervised feature synthesis, bolstering integration. Extensive experiments and evaluations validate the broad applicability and superior performance of our method across a variety of heterogeneous datasets.
Pages: 6220-6224
Number of pages: 5