A Contrastive Learning Framework for Keyphrase Extraction

Cited by: 0
Authors
Song, Jing [1]
Zu, Xian [2, 3]
Xie, Fei [2]
Affiliations
[1] Hefei Normal Univ, Dept Elect Informat & Elect Engn, Lianhua Rd, Hefei 230601, Peoples R China
[2] Hefei Normal Univ, Dept Comp & Artificial Intelligence, Lianhua Rd, Hefei 230601, Peoples R China
[3] Univ Sci & Technol China, Dept Safety Sci Engn, Hefei 230026, Anhui, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Keyphrase extraction; Contrastive learning; Supervised; n-gram features; Document embedding; KEYWORD EXTRACTION;
DOI
10.3724/2096-7004.di.2024.0018
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Discipline classification code
0812;
Abstract
Keyphrase extraction aims to extract important phrases that reflect the main topics of a document. Recently, deep learning methods have been used to model semantic information and rank candidates based on the similarities between the n-grams and the document. However, existing keyphrase extraction methods mainly treat the keyphrase extraction task as independent of the embedding. Based on the fact that phrases that are semantically closer to the document are more likely to become keyphrases, we propose a novel contrastive learning strategy for supervised keyphrase extraction that integrates local and global information of the document. A pre-trained RoBERTa model is used to model contextual information of sub-words in the document. Then, the embedding vectors of the n-grams and the document are calculated by convolutional neural layers. Finally, we propose a novel loss function for efficiently ranking candidate phrases by combining n-gram features and document embeddings during the training of the model.
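The pipeline outlined in the abstract (sub-word vectors, convolution-style n-gram embeddings, a document embedding, and a contrastive ranking loss) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' implementation: random vectors stand in for RoBERTa outputs, a fixed weighted window stands in for the convolutional layers, mean pooling stands in for the document embedding, and an InfoNCE-style loss stands in for the paper's loss function, which the abstract does not specify. All function names are hypothetical.

```python
import math
import random


def ngram_embeddings(token_vecs, n, kernel):
    """Weighted sum over each width-n window of token vectors.

    A simplified stand-in for a 1-D convolution layer (assumption):
    `kernel` holds one scalar weight per window position.
    """
    dim = len(token_vecs[0])
    out = []
    for i in range(len(token_vecs) - n + 1):
        window = token_vecs[i:i + n]
        out.append([sum(kernel[k] * window[k][d] for k in range(n))
                    for d in range(dim)])
    return out


def mean_pool(vecs):
    """Average a list of vectors; used here as the document embedding (assumption)."""
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb + 1e-12)


def contrastive_loss(doc_vec, cand_vecs, positive_idx, temperature=0.1):
    """InfoNCE-style loss: pull gold keyphrases toward the document embedding.

    `positive_idx` must be a non-empty list of indices of gold keyphrases.
    """
    logits = [cosine(doc_vec, c) / temperature for c in cand_vecs]
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    pos = sum(exps[i] for i in positive_idx)
    return -math.log(pos / sum(exps))


def rank_candidates(doc_vec, cand_vecs):
    """Return candidate indices sorted by similarity to the document, best first."""
    scores = [cosine(doc_vec, c) for c in cand_vecs]
    return sorted(range(len(cand_vecs)), key=lambda i: scores[i], reverse=True)


# Toy data: 6 "sub-word" vectors of dimension 8 (random stand-ins for RoBERTa outputs).
random.seed(0)
tokens = [[random.gauss(0, 1) for _ in range(8)] for _ in range(6)]
bigrams = ngram_embeddings(tokens, n=2, kernel=[0.5, 0.5])
doc_vec = mean_pool(tokens)
loss = contrastive_loss(doc_vec, bigrams, positive_idx=[1, 3])
ranking = rank_candidates(doc_vec, bigrams)
```

In this sketch, training would lower `loss` by moving the gold n-gram embeddings closer to `doc_vec` than the other candidates, and `ranking` gives the order in which candidates would be returned at inference time.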
Pages: 1032-1056
Page count: 25