Link prediction using low-dimensional node embeddings: The measurement problem

被引:0
作者
Menand, Nicolas [1 ]
Seshadhri, C. [2 ]
机构
[1] Univ Penn, Dept Comp & Informat Sci, Philadelphia, PA 19104 USA
[2] Univ Calif Santa Cruz, Dept Comp Sci, Santa Cruz, CA 95064 USA
关键词
-dimensional embeddings; link prediction; graph representational learning; graph embeddings; machine learning metrics;
D O I
暂无
中图分类号
O [数理科学和化学]; P [天文学、地球科学]; Q [生物科学]; N [自然科学总论];
学科分类号
07 ; 0710 ; 09 ;
摘要
Graph representation learning is a fundamental technique for machine learning (ML) success in link prediction. On closer investigation, we observe that the performance is measured by the AUC (area under the curve), which suffers biases. Since the ground truth in link prediction is sparse, we design a vertex-centric measure of performance, called the VCMPR@k plots. Under this measure, we show that link predictors using graph representations show poor scores. Despite having extremely high AUC scores, the predictors miss much of the ground truth. We identify a mathematical connection between this performance, the sparsity of the ground truth, and the low-dimensional geometry of the node embeddings. Under a formal theoretical framework, we prove that low-dimensional vectors cannot capture sparse ground truth using dot product similarities (the standard practice in the literature). Our results call into question existing results on link prediction and pose a significant scientific challenge for graph representation learning. The VCMPR plots identify specific scientific challenges for link prediction using low-dimensional node embeddings.
引用
收藏
页数:10
相关论文
共 45 条
  • [1] Friends and neighbors on the Web
    Adamic, LA
    Adar, E
    [J]. SOCIAL NETWORKS, 2003, 25 (03) : 211 - 230
  • [2] Role-Based Graph Embeddings
    Ahmed, Nesreen K.
    Rossi, Ryan A.
    Lee, John Boaz
    Willke, Theodore L.
    Zhou, Rong
    Kong, Xiangnan
    Eldardiry, Hoda
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2022, 34 (05) : 2401 - 2415
  • [3] Al Hasan M, 2011, SOCIAL NETWORK DATA ANALYTICS, P243
  • [4] Bordes A, 2013, ADV NEURAL INFORM PR, V2, P2787
  • [5] Cao Shaosheng, 2015, P 24 ACM INT C INF K, P891, DOI DOI 10.1145/2806416.2806512
  • [6] GRAPE for fast and scalable graph processing and random-walk-based embedding
    Cappelletti, Luca
    Fontana, Tommaso
    Casiraghi, Elena
    Ravanmehr, Vida
    Callahan, Tiffany J. J.
    Cano, Carlos
    Joachimiak, Marcin P. P.
    Mungall, Christopher J. J.
    Robinson, Peter N. N.
    Reese, Justin
    Valentini, Giorgio
    [J]. NATURE COMPUTATIONAL SCIENCE, 2023, 3 (06): : 552 - +
  • [7] Chami I, 2022, Arxiv, DOI [arXiv:2005.03675, 10.48550/arXiv.2005.03675]
  • [8] Chanpuriya S., 2020, Neural Information Processing Systems (NeurIPS)
  • [9] Query-based Music Recommendations via Preference Embedding
    Chen, Chih-Ming
    Tsai, Ming-Feng
    Lin, Yu-Ching
    Yang, Yi-Hsuan
    [J]. PROCEEDINGS OF THE 10TH ACM CONFERENCE ON RECOMMENDER SYSTEMS (RECSYS'16), 2016, : 79 - 82
  • [10] An introduction to ROC analysis
    Fawcett, Tom
    [J]. PATTERN RECOGNITION LETTERS, 2006, 27 (08) : 861 - 874