Unsupervised Deep Keyphrase Generation

Cited by: 0

Authors:

Shen, Xianjie [1]

Wang, Yinghan [2]

Meng, Rui [3]

Shang, Jingbo [1]

Affiliations:

[1] Univ Calif San Diego, La Jolla, CA 92093 USA

[2] Amazon Com Inc, Seattle, WA USA

[3] Salesforce Res, Palo Alto, CA USA

Source:

THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022

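Funding:

National Science Foundation (USA);

Keywords:

DOI:

Not available

Chinese Library Classification:

TP18 [Artificial intelligence theory];

Discipline classification codes: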

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
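
The phrase-bank idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names are hypothetical, "partial matching" is assumed here to mean that every word of a phrase occurs somewhere in the document even if not contiguously, and only a simple word-frequency score stands in for the lexical ranking. The semantic ranking and the bootstrapped generative model are omitted.

```python
from collections import Counter


def build_phrase_bank(corpus_phrases):
    """Pool the phrases extracted from every document into one set."""
    bank = set()
    for phrases in corpus_phrases:
        bank.update(p.lower() for p in phrases)
    return bank


def match_candidates(document, phrase_bank):
    """Split bank phrases into present and absent candidates for a document.

    Assumption (not from the paper): a phrase is a partial match when all
    of its words occur in the document, even if not contiguously.
    """
    text = " ".join(document.lower().split())
    words = set(text.split())
    present, absent = [], []
    for phrase in sorted(phrase_bank):
        if f" {phrase} " in f" {text} ":
            present.append(phrase)  # phrase appears verbatim
        elif all(w in words for w in phrase.split()):
            absent.append(phrase)   # words appear, but not as a phrase
    return present, absent


def rank_by_lexical_score(document, candidates):
    """Rank candidates by the summed in-document frequency of their words.

    Ties are broken alphabetically so the ordering is deterministic.
    """
    counts = Counter(document.lower().split())
    return sorted(
        candidates,
        key=lambda p: (-sum(counts[w] for w in p.split()), p),
    )
```

Under these assumptions, `match_candidates` returns phrases that occur verbatim separately from phrases whose words are scattered through the document, which mirrors the present/absent distinction the abstract draws. Tokenization is plain whitespace splitting, so punctuation handling and stemming would need to be added for real text.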


Pages: 11303-11311

Page count: 9

