Evaluating diabetes dataset for knowledge graph embedding based link prediction

Citations: 0
Authors
Singh, Sushmita [1]
Siwach, Manvi [1]
Affiliations
[1] JC Bose Univ Sci & Technol, Dept Comp Engn, Faridabad, India
Keywords
Link prediction; Knowledge graphs; Knowledge graph embeddings; Knowledge graph completion; Translational embeddings; Diabetes
DOI
10.1016/j.datak.2025.102414
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
Accurate analysis or prediction requires a complete and well-populated dataset. Medical data for a disease such as diabetes is highly coupled and heterogeneous, with numerous interconnections. This inherently complex data cannot be analysed by simple relational databases, which makes knowledge graphs an ideal tool for representing it, since they handle intricate relationships efficiently. Knowledge graphs can therefore be leveraged to analyse diabetes data, enhancing both the accuracy and the efficiency of data-driven decision-making. Although substantial data on diabetes exists in various formats, organized and complete datasets are scarce, highlighting the critical need for a well-populated knowledge graph. Moreover, a knowledge graph under development is inevitably incomplete because of missing links or relationships, so knowledge graph completion is needed to fill in the absent information by predicting the missing data with Link Prediction (LP) techniques. Among the various link prediction methods, approaches based on knowledge graph embeddings have demonstrated superior performance and effectiveness. Such knowledge graphs can support in-depth analysis and improve the prediction of diabetes-associated risks. This paper introduces a dataset specifically designed for link prediction on a diabetes knowledge graph, so that it can be used to fill information gaps and thereby contribute to risk analysis in diabetes. The accuracy of the dataset is assessed through validation with state-of-the-art embedding-based link prediction methods.
Pages: 14