Diverse feature set based Keyphrase extraction and indexing techniques

Cited by: 4
Authors
Sharma, Saurabh [1]
Gupta, Vishal [1]
Juneja, Mamta [1]
Affiliations
[1] Panjab Univ, Univ Inst Engn & Technol, Chandigarh, India
Keywords
Keyphrase extraction; Word embedding; Keyphrase indexing; External knowledge; Free indexing; Natural language processing; DOCUMENT; TEXT; MODEL
DOI
10.1007/s11042-020-09423-2
Chinese Library Classification (CLC) number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract
The internet has changed the way people communicate, and this has led to a vast amount of text being available in electronic format, including e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing keyphrases for these extensive text collections allows users to grasp the essence of lengthy content quickly and helps them locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick distinguishing properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed method relies on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for individual and selected features are reported through experiments on four datasets, namely SemEval, KDD, Inspec and DUC. The features chosen by feature selection and the word embedding based features are the best feature sets for keyword extraction and indexing across all of these datasets. That is, the proposed distributed word vectors with additional knowledge improve the results significantly over individual features, over combined features after feature selection, and over the state of the art. Having developed these keyphrase extraction methods, we also applied them to a document classification task.
Pages: 4111-4142
Page count: 32
Related papers
50 in total
  • [21] Precursory Pattern Based Feature Extraction Techniques for Earthquake Prediction
    Zhang, Lei
    Si, Langchun
    Yang, Haipeng
    Hu, Yuanzhi
    Qiu, Jianfeng
    IEEE ACCESS, 2019, 7 : 30991 - 31001
  • [22] Feature Extraction for Human Motion Indexing of Acted Dance Performances
    Aristidou, Andreas
    Chrysanthou, Yiorgos
    2014 PROCEEDINGS OF THE 9TH INTERNATIONAL CONFERENCE ON COMPUTER GRAPHICS THEORY AND APPLICATIONS (GRAPP 2014), 2014, : 277 - 287
  • [23] Keyphrase extraction-based query expansion in digital libraries
    Song, Min
    Song, Il-Yeol
    Allen, Robert B.
    Obradovic, Zoran
    OPENING INFORMATION HORIZONS, 2006, : 202 - +
  • [24] Patent Keyphrase Extraction Based on Patent Term and Layer Information
    Yan Y.
    Li W.
    Siyu Z.
    Data Analysis and Knowledge Discovery, 2023, 7 (06) : 99 - 112
  • [25] Keyphrase extraction for legal questions based on a sequence to sequence model
    Zeng D.
    Tong G.
    Dai Y.
    Li F.
    Han B.
    Xie S.
    Qinghua Daxue Xuebao/Journal of Tsinghua University, 2019, 59 (04) : 256 - 261
  • [26] Automatic Keyphrase Extraction using Graph-based Methods
    Mothe, Josiane
    Ramiandrisoa, Faneva
    Rasolomanana, Michael
    33RD ANNUAL ACM SYMPOSIUM ON APPLIED COMPUTING, 2018, : 728 - 730
  • [27] A Semantic-Based Approach for Keyphrase Extraction from Vietnamese Documents Using Thematic Vector
    Linh Viet Le
    Tho Thi Ngoc Le
    INTELLIGENT INFORMATION AND DATABASE SYSTEMS, ACIIDS 2022, PT I, 2022, 13757 : 416 - 427
  • [28] Automatic keyphrase extraction for Arabic news documents based on KEA system
    Duwairi, Rehab
    Hedaya, Mona
    JOURNAL OF INTELLIGENT & FUZZY SYSTEMS, 2016, 30 (04) : 2101 - 2110
  • [29] Keyphrase Extraction Based on Optimized Random Walks on Multiple Word Relations
    Chen, Wenyan
    Liu, Zheng
    Shi, Wei
    Yu, Jeffrey Xu
    WEB AND BIG DATA (APWEB-WAIM 2018), PT II, 2018, 10988 : 359 - 367
  • [30] Keyphrases Concentrated Area Identification from Academic Articles as Feature of Keyphrase Extraction: A New Unsupervised Approach
    Miah, Mohammad Badrul Alam
    Awang, Suryanti
    Azad, Md Saiful
    Rahman, Md Mustafizur
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2022, 13 (01) : 788 - 796