WAGRank: A word ranking model based on word attention graph for keyphrase extraction

Cited by: 0
Authors
Bian, Rong [1 ,2 ]
Cheng, Bing [1 ,3 ]
Affiliations
[1] Chinese Acad Sci, Acad Math & Syst Sci, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Sch Math Sci, Beijing, Peoples R China
[3] Chinese Acad Sci, Ctr Forecasting Sci, Acad Math & Syst Sci, Beijing, Peoples R China
Funding
National Key Research and Development Program of China;
Keywords
Keyphrase extraction; attention mechanism; graph-based model; pre-trained language model; semantic feature;
DOI
10.1177/1088467X241296257
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Keyphrase extraction, the task of identifying representative words or phrases in a document, is essential to document processing. Most traditional models rely on word-frequency features computed from the document and its associated corpus. The word-frequency approach has two major limitations: first, as a bag-of-words method, it fails to fully exploit the semantic information in the document; second, when the associated corpus is unavailable or incomplete, it tends to be skewed by local word frequencies in the short current text. This paper proposes WAGRank, a novel unsupervised ranking model built on a word attention graph, in which nodes are words and edges are semantic relations between words. To assign edge weights, two interpretable statistical methods for assessing the correlation strength between words are designed using the attention mechanism. Rather than relying only on frequency in the current text, WAGRank depends on word semantics, drawing on external knowledge stored in a pre-trained language model. WAGRank was evaluated on two publicly available datasets against twelve baselines, demonstrating its effectiveness and robustness. In addition, a Granger causality test showed that word attention has a statistically significant predictive effect on word frequency, providing a more reasonable explanation for word frequency analysis.
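The abstract describes ranking words on a graph whose edge weights come from attention. The record does not give the paper's actual weighting or ranking formulas, so the following is only a minimal sketch of the general idea, assuming a PageRank-style random walk as the ranking step. The function `rank_words`, its parameters, and the toy `attention` matrix are all hypothetical; in practice the scores would be derived from a pre-trained language model rather than written by hand.

```python
# Hypothetical sketch: PageRank-style ranking on a toy word attention graph.
# The attention values are invented for illustration and are NOT taken from
# the paper; WAGRank's real edge-weighting methods are not given in this record.

def rank_words(words, attention, damping=0.85, iterations=100, tol=1e-9):
    """Rank words by a random walk on a weighted, directed word graph.

    attention[i][j] is the assumed strength with which word i attends to
    word j, used here as the weight of the edge i -> j.
    """
    n = len(words)

    # Row-normalise the weights so each word distributes its score
    # over its outgoing edges; self-loops are ignored.
    transition = []
    for i in range(n):
        row = [attention[i][j] if i != j else 0.0 for j in range(n)]
        total = sum(row)
        if total == 0:
            row = [1.0 / n] * n  # node without out-edges: spread uniformly
        else:
            row = [w / total for w in row]
        transition.append(row)

    # Power iteration until the scores stop changing.
    scores = [1.0 / n] * n
    for _ in range(iterations):
        new_scores = [
            (1.0 - damping) / n
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        converged = sum(abs(a - b) for a, b in zip(new_scores, scores)) < tol
        scores = new_scores
        if converged:
            break

    return sorted(zip(words, scores), key=lambda pair: pair[1], reverse=True)


words = ["keyphrase", "extraction", "the", "graph"]
attention = [
    [0.0, 0.6, 0.1, 0.3],
    [0.7, 0.0, 0.1, 0.2],
    [0.4, 0.4, 0.0, 0.2],
    [0.5, 0.3, 0.2, 0.0],
]
ranking = rank_words(words, attention)
for word, score in ranking:
    print(f"{word}: {score:.3f}")
```

In this toy graph the function word "the" receives little attention from the other words and therefore ends up at the bottom of the ranking, even though frequency is never consulted.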
Pages: 23
Related papers
50 in total
  • [31] Performance Analysis of Graph based Keyphrase Extraction metrics for uncertain User-generated data
    Garg, Muskan
    Kumar, Mukesh
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2018), 2018, 143 : 419 - 425
  • [32] A graph-based unsupervised N-gram filtration technique for automatic keyphrase extraction
    Kumar, Niraj
    Srinathan, Kannan
    Varma, Vasudeva
    INTERNATIONAL JOURNAL OF DATA MINING MODELLING AND MANAGEMENT, 2016, 8 (02) : 124 - 143
  • [33] Event-Oriented Keyphrase Extraction Based on Bi-clustering Model
    Zhao, Lin
    Zang, Liangjun
    Huang, Longtao
    Han, Jizhong
    Hu, Songlin
    COMPUTATIONAL SCIENCE - ICCS 2019, PT V, 2019, 11540 : 207 - 220
  • [34] A semantic graph-based keyword extraction model using ranking method on big social data
    Devika, R.
    Subramaniyaswamy, V
    WIRELESS NETWORKS, 2021, 27 (08) : 5447 - 5459
  • [35] Attention-based Multi-layer Chinese Word Embedding
    Ma, Bing
    Sun, Haifeng
    Wang, Jingyu
    Qi, Qi
    2019 IEEE INTERNATIONAL CONFERENCE ON BIG DATA (BIG DATA), 2019, : 2895 - 2902
  • [36] A semantic graph-based keyword extraction model using ranking method on big social data
    R. Devika
    V. Subramaniyaswamy
    Wireless Networks, 2021, 27 : 5447 - 5459
  • [37] Thesaurus-Based Method of Increasing Text-via-Keyphrase Graph Connectivity During Keyphrase Extraction for e-Tourism Applications
    Paramonov, Ilya
    Lagutina, Ksenia
    Mamedov, Eldar
    Lagutina, Nadezhda
    KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2016, 2016, 649 : 129 - 141
  • [38] Learning from Twitter Hashtags: Leveraging Proximate Tags to Enhance Graph-based Keyphrase Extraction
    Bellaachia, Abdelghani
    Al-Dhelaan, Mohammed
    2012 IEEE INTERNATIONAL CONFERENCE ON GREEN COMPUTING AND COMMUNICATIONS, CONFERENCE ON INTERNET OF THINGS, AND CONFERENCE ON CYBER, PHYSICAL AND SOCIAL COMPUTING (GREENCOM 2012), 2012, : 348 - 357
  • [39] Comparison of Naïve Bayes with graph based methods for keyphrase extraction in modern standard Arabic language
    Loukam M.
    International Journal of Speech Technology, 2023, 26 (1) : 141 - 150
  • [40] Acoustic Word Embedding Based on Multi-Head Attention Quadruplet Network
    Zhu, Shirong
    Zhang, Ying
    He, Kai
    Zhao, Lasheng
    IEEE SIGNAL PROCESSING LETTERS, 2022, 29 : 184 - 188