ExpertBert: Pretraining Expert Finding

Cited by: 8
Authors
Liu, Hongtao [1 ]
Lv, Zhepeng [1 ]
Yang, Qing [1 ]
Xu, Dongliang [1 ]
Peng, Qiyao [2 ]
Affiliations
[1] Du Xiaoman Financial, Beijing, Peoples R China
[2] Tianjin Univ, Sch New Media & Commun, Tianjin, Peoples R China
Source
PROCEEDINGS OF THE 31ST ACM INTERNATIONAL CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT, CIKM 2022 | 2022
Keywords
Expert Finding; Community Question Answering; Pretraining;
DOI
10.1145/3511808.3557597
CLC number
TP [Automation Technology, Computer Technology]
Discipline code
0812
Abstract
Expert Finding is an important task in Community Question Answering (CQA) platforms, where it helps route questions to users with the expertise to answer them. The key is to accurately model both the question content and the experts, based on the questions each expert has historically answered. Recently, Pretrained Language Models (PLMs, e.g., BERT) have shown superior text modeling ability and have been preliminarily applied to expert finding. However, most PLM-based models operate at corpus or document granularity during pretraining, which is inconsistent with the downstream expert modeling and finding task. In this paper, we propose an expert-level pretrained language model named ExpertBert, which aims to model questions, experts, and question-expert matching effectively in a pretraining manner. In our approach, we aggregate an expert's historically answered questions as the expert-specific input. In addition, we integrate the target question into the input and design a label-augmented Masked Language Model (MLM) task to further capture the matching pattern between questions and experts, making the pretraining objective more closely resemble the downstream expert finding task. Experimental results and detailed analysis on real-world CQA datasets demonstrate the effectiveness of ExpertBert.
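The input construction the abstract describes can be illustrated with a minimal sketch: an expert's historical questions are concatenated into one expert-specific sequence, the target question is appended, and a label token is added so the MLM objective also learns question-expert matching. The token names (`[MATCH]`, `[MISMATCH]`) and the exact layout below are assumptions for illustration, not the paper's verified implementation.

```python
import random

def build_expert_input(history, target_question, is_match):
    """Aggregate an expert's historically answered questions and the
    target question into one sequence, then append a label token so the
    masked-LM objective can also learn question-expert matching."""
    tokens = ["[CLS]"]
    for q in history:                          # expert-specific context
        tokens += q.split() + ["[SEP]"]
    tokens += target_question.split() + ["[SEP]"]
    tokens.append("[MATCH]" if is_match else "[MISMATCH]")  # label augmentation
    return tokens

def mask_tokens(tokens, ratio=0.15, seed=0):
    """Standard MLM masking over content tokens, but always mask the
    final label token so the model must predict the match label."""
    rng = random.Random(seed)
    special = {"[CLS]", "[SEP]"}
    masked, targets = [], []
    for i, tok in enumerate(tokens):
        is_label = (i == len(tokens) - 1)
        if is_label or (tok not in special and rng.random() < ratio):
            masked.append("[MASK]")
            targets.append((i, tok))           # positions the LM head must recover
        else:
            masked.append(tok)
    return masked, targets
```

In this reading, predicting the masked label token is what aligns the pretraining objective with the downstream ranking task: the same LM head that reconstructs question words also scores question-expert compatibility.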
Pages: 4244-4248 (5 pages)