Efficient and Effective Academic Expert Finding on Heterogeneous Graphs through (k, P)-Core based Embedding

Cited by: 0
Authors
Wang, Yuxiang [1]
Liu, Jun [1]
Xu, Xiaoliang [1]
Ke, Xiangyu [2]
Wu, Tianxing [3]
Gou, Xiaoxuan [1]
Affiliations
[1] Hangzhou Dianzi Univ, 2 Ave, Hangzhou 310018, Zhejiang, Peoples R China
[2] Zhejiang Univ, 866 Yuhangtang Rd, Hangzhou 310058, Zhejiang, Peoples R China
[3] Southeast Univ, 2 Southeast Univ Rd, Nanjing 210096, Jiangsu, Peoples R China
Keywords
Expert finding; (k, P)-core community; document/expert embedding; heterogeneous graph; MODELS
DOI
10.1145/3578365
Chinese Library Classification (CLC)
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
Expert finding is crucial for a wealth of applications in both academia and industry. Given a user query and a trove of academic papers, expert finding aims at retrieving the most relevant experts for the query from those papers. Existing studies focus on embedding-based solutions that consider academic papers' textual semantic similarities to a query via document representation and extract the top-n experts from the most similar papers. Beyond implicit textual semantics, however, papers' explicit relationships (e.g., co-authorship) in a heterogeneous graph (e.g., DBLP) are critical for expert finding, because they help improve the representation quality. Despite their importance, the explicit relationships of papers generally have been ignored in the literature. In this article, we study expert finding on heterogeneous graphs by considering both the explicit relationships and implicit textual semantics of papers in one model. Specifically, we define the cohesive (k, P)-core community of papers w.r.t. a meta-path P (i.e., relationship) and propose a (k, P)-core based document embedding model to enhance the representation quality. Based on this, we design a proximity graph-based index (PG-Index) of papers and present a threshold algorithm (TA)-based method to efficiently extract top-n experts from papers returned by PG-Index. We further optimize our approach in two ways: (1) we boost effectiveness by considering the (k, P)-core community of experts and the diversity of experts' research interests, to achieve high-quality expert representation from paper representation; and (2) we streamline expert finding, going from "extract top-n experts from top-m (m > n) semantically similar papers" to "directly return top-n experts". The process of returning a large number of top-m papers as intermediate data is avoided, thereby improving the efficiency. Extensive experiments using real-world datasets demonstrate our approach's superiority.
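The abstract does not spell out the (k, P)-core definition, so the following is only a minimal sketch under an assumed, commonly used reading: a (k, P)-core is a maximal set of papers in which every paper has at least k neighbors reachable through the meta-path P inside the set. Here P is assumed to be Paper-Author-Paper (co-authorship), and the names `p_neighbors`, `k_p_core`, and the toy `paper_authors` mapping are hypothetical, not taken from the article.

```python
from collections import defaultdict


def p_neighbors(paper_authors):
    """Build paper-to-paper neighbors along the assumed meta-path
    Paper-Author-Paper: two papers are neighbors if they share an author."""
    author_papers = defaultdict(set)
    for paper, authors in paper_authors.items():
        for author in authors:
            author_papers[author].add(paper)
    nbrs = {paper: set() for paper in paper_authors}
    for papers in author_papers.values():
        for paper in papers:
            nbrs[paper] |= papers - {paper}
    return nbrs


def k_p_core(paper_authors, k):
    """Return the papers that survive iterative peeling: repeatedly drop
    any paper with fewer than k remaining P-neighbors."""
    nbrs = p_neighbors(paper_authors)
    alive = set(nbrs)
    queue = [p for p in alive if len(nbrs[p]) < k]
    while queue:
        p = queue.pop()
        if p not in alive:
            continue  # already peeled via another path
        alive.discard(p)
        for q in nbrs[p]:
            if q in alive:
                nbrs[q].discard(p)  # p no longer counts toward q's degree
                if len(nbrs[q]) < k:
                    queue.append(q)
    return alive


# Toy data (illustrative only): paper -> set of authors.
paper_authors = {
    "p1": {"a", "b"},
    "p2": {"a", "b"},
    "p3": {"a", "c"},
    "p4": {"c"},
    "p5": {"d"},
}

print(sorted(k_p_core(paper_authors, 2)))  # ['p1', 'p2', 'p3']
```

With k = 2, papers p4 and p5 are peeled because they have fewer than two co-authorship neighbors, leaving p1, p2, and p3. The result may span several connected components; treating each component as a separate community is a further assumption the abstract neither confirms nor rules out.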
Pages: 35