A Contrastive Learning Framework for Keyphrase Extraction

Cited by: 0

Authors:

Song, Jing [1]

Zu, Xian [2,3]

Xie, Fei [2]

Affiliations:

[1] Hefei Normal Univ, Dept Elect Informat & Elect Engn, Lianhua Rd, Hefei 230601, Peoples R China

[2] Hefei Normal Univ, Dept Comp & Artificial Intelligence, Lianhua Rd, Hefei 230601, Peoples R China

[3] Univ Sci & Technol China, Dept Safety Sci Engn, Hefei 230026, Anhui, Peoples R China

Source:

DATA INTELLIGENCE | 2024, Vol. 6, No. 4

Funding:

National Natural Science Foundation of China;

Keywords:

Keyphrase extraction; Contrastive learning; Supervised; n-gram features; Document embedding; KEYWORD EXTRACTION;

DOI:

10.3724/2096-7004.di.2024.0018

Chinese Library Classification number:

TP [Automation technology, computer technology];

Discipline classification code:

0812 ;

Abstract:

Keyphrase extraction aims to extract important phrases that reflect the main topics of a document. Recently, deep learning methods have been used to model semantic information and rank candidates based on the similarities between the n-grams and the document. However, existing keyphrase extraction methods largely treat the keyphrase extraction task as independent of the embedding. Based on the fact that phrases that are semantically closer to the document are more likely to become keyphrases, we propose a novel contrastive learning strategy for supervised keyphrase extraction that integrates local and global information of the document. A pre-trained RoBERTa model is used to model contextual information of sub-words in the document. Then, the embedding vectors of the n-grams and the document are calculated by convolutional neural layers. Finally, we propose a novel loss function for efficiently ranking candidate phrases by combining n-gram features and document embeddings during the training of the model.
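The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes candidate and document embeddings are already available as plain vectors (the toy values below are made up), scores candidates by cosine similarity to the document, and uses a generic InfoNCE-style contrastive loss. The function names, the temperature value, and the loss form are assumptions for illustration only; the abstract does not specify the paper's actual loss function, and the RoBERTa and convolutional encoding steps are not reproduced here.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_candidates(doc_vec, candidates):
    """Sort candidate phrases by similarity to the document, highest first.

    `candidates` maps each phrase to its (hypothetical) n-gram embedding.
    """
    scored = [(phrase, cosine(vec, doc_vec)) for phrase, vec in candidates.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


def contrastive_loss(doc_vec, positives, negatives, temperature=0.1):
    """Generic InfoNCE-style loss (an assumption, not the paper's loss).

    For each gold keyphrase embedding, the loss is small when it is closer
    to the document embedding than the non-keyphrase embeddings are.
    """
    neg_sum = sum(math.exp(cosine(n, doc_vec) / temperature) for n in negatives)
    losses = []
    for p in positives:
        pos_term = math.exp(cosine(p, doc_vec) / temperature)
        losses.append(-math.log(pos_term / (pos_term + neg_sum)))
    return sum(losses) / len(losses)


# Toy 2-dimensional embeddings, invented purely for this example.
doc = [1.0, 0.0]
candidates = {
    "keyphrase extraction": [0.9, 0.1],
    "contrastive learning": [0.7, 0.7],
    "last year": [-0.2, 1.0],
}

ranking = rank_candidates(doc, candidates)
loss_aligned = contrastive_loss(doc, [candidates["keyphrase extraction"]],
                                [candidates["last year"]])
loss_misaligned = contrastive_loss(doc, [candidates["last year"]],
                                   [candidates["keyphrase extraction"]])
```

In this toy setting the phrase whose vector points most nearly in the document's direction ranks first, and the loss is lower when the labeled keyphrase is the one closer to the document, which is the behavior a similarity-based training objective is meant to encourage.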


Pages: 1032-1056

Page count: 25

