Unsupervised Deep Keyphrase Generation

Cited by: 0
Authors
Shen, Xianjie [1]
Wang, Yinghan [2]
Meng, Rui [3]
Shang, Jingbo [1]
Affiliations
[1] Univ Calif San Diego, La Jolla, CA 92093 USA
[2] Amazon Com Inc, Seattle, WA USA
[3] Salesforce Res, Palo Alto, CA USA
Source
THIRTY-SIXTH AAAI CONFERENCE ON ARTIFICIAL INTELLIGENCE / THIRTY-FOURTH CONFERENCE ON INNOVATIVE APPLICATIONS OF ARTIFICIAL INTELLIGENCE / TWELFTH SYMPOSIUM ON EDUCATIONAL ADVANCES IN ARTIFICIAL INTELLIGENCE | 2022
Funding
National Science Foundation (USA);
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrase generation aims to summarize long documents with a collection of salient phrases. Deep neural models have demonstrated remarkable success in this task, with the capability of predicting keyphrases that are even absent from a document. However, such abstractiveness is acquired at the expense of a substantial amount of annotated data. In this paper, we present a novel method for keyphrase generation, AutoKeyGen, without the supervision of any annotated doc-keyphrase pairs. Motivated by the observation that an absent keyphrase in a document may appear in other places, in whole or in part, we construct a phrase bank by pooling all phrases extracted from a corpus. With this phrase bank, we assign phrase candidates to new documents by a simple partial matching algorithm, and then we rank these candidates by their relevance to the document from both lexical and semantic perspectives. Moreover, we bootstrap a deep generative model using these top-ranked pseudo keyphrases to produce more absent candidates. Extensive experiments demonstrate that AutoKeyGen outperforms all unsupervised baselines and can even beat a strong supervised method in certain cases.
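The phrase-bank idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the tokenizer, the rule that a phrase is an absent candidate when all of its words occur somewhere in the document, and the frequency-based score are all assumptions made for illustration. The semantic ranking and the bootstrapped generative model mentioned in the abstract are left out.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_phrase_bank(phrases_per_doc):
    """Pool the phrases extracted from every document of a corpus into one set."""
    bank = set()
    for phrases in phrases_per_doc:
        bank.update(" ".join(tokenize(p)) for p in phrases)
    bank.discard("")  # drop phrases that contained no alphanumeric token
    return bank


def match_candidates(document, bank):
    """Split bank phrases into present and absent candidates for a document.

    Assumed rule: a phrase is 'present' if its tokens occur contiguously,
    and 'absent' if every one of its words occurs somewhere in the document.
    """
    tokens = tokenize(document)
    padded = " " + " ".join(tokens) + " "
    vocab = set(tokens)
    present, absent = [], []
    for phrase in sorted(bank):
        if f" {phrase} " in padded:
            present.append(phrase)
        elif all(word in vocab for word in phrase.split()):
            absent.append(phrase)
    return present, absent


def rank_lexical(document, candidates):
    """Rank candidates by mean in-document frequency of their words (assumed score)."""
    counts = Counter(tokenize(document))

    def score(phrase):
        words = phrase.split()
        return sum(counts[w] for w in words) / len(words)

    # Higher score first; ties are broken alphabetically for determinism.
    return sorted(candidates, key=lambda p: (-score(p), p))


# Hypothetical toy corpus: one list of extracted phrases per document.
corpus_phrases = [
    ["keyphrase generation", "neural network"],
    ["deep keyphrase extraction", "phrase bank"],
]
bank = build_phrase_bank(corpus_phrases)
new_doc = "We study deep models for extraction of each keyphrase without labels."
present, absent = match_candidates(new_doc, bank)
ranked_absent = rank_lexical(new_doc, absent)
```

In this toy run the phrase "deep keyphrase extraction" never appears contiguously in the new document, but each of its words does, so it surfaces as an absent candidate drawn from the bank.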
Pages: 11303-11311
Page count: 9