Towards unsupervised keyphrase extraction via an autoregressive approach

Cited by: 2
Authors
Li, Tuohang [1]
Hu, Liang [1]
Li, Hongtu [1]
Sun, Chengyu [1]
Li, Shuai [1]
Chi, Ling [1]
Affiliations
[1] Jilin Univ, Coll Comp Sci & Technol, 2699 Qianjin St, Changchun 130012, Jilin, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Keyphrase extraction; Autoregressive structure; Optimizer; Unsupervised model; Coverage decay optimizer;
DOI
10.1016/j.knosys.2023.110664
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Keyphrase extraction is a technique used to capture the core information of documents and is an upstream task for advanced information retrieval systems, particularly in the academic realm. Current unsupervised methods are primarily built on a score-and-rank framework and are consistently unable to capture mutual information between extracted keyphrases; this is especially true of graph-based models. Utilizing the autoregressive structure that is typically used in sequence-to-sequence text generation models, we propose a plug-and-play optimizer named C-Decay that can be integrated into any graph-based unsupervised keyphrase extraction model for a stable performance boost, and that mitigates the bias toward certain semantically or lexically dominant tokens by directly optimizing the original score distribution output by graph-based models. The architecture of C-Decay includes the keyphrase pool, the gain vector, and the decay factor: the keyphrase pool is designed to realize an autoregressive structure, and the gain vector and the decay factor serve as the optimization operators. Herein, we examine three graph-based models integrated with C-Decay, and conduct experiments on four datasets: KDD, SemEval, Nguyen, and Krapivin. Moreover, we show that C-Decay can improve accuracy and F-measure by an average of approximately 50% and 20%, respectively. © 2023 Elsevier B.V. All rights reserved.
Pages: 10