Diverse feature set based Keyphrase extraction and indexing techniques

Cited by: 4
Authors
Sharma, Saurabh [1]
Gupta, Vishal [1]
Juneja, Mamta [1]
Affiliations
[1] Panjab Univ, Univ Inst Engn & Technol, Chandigarh, India
Keywords
Keyphrase extraction; Word embedding; Keyphrase indexing; External knowledge; Free indexing; Natural language processing; DOCUMENT; TEXT; MODEL;
DOI
10.1007/s11042-020-09423-2
Chinese Library Classification number
TP [Automation Technology, Computer Technology];
Discipline classification code
0812;
Abstract
The internet has changed the way people communicate, and this has led to a vast amount of text being available in electronic format, including e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing keyphrases for these extensive text collections allows users to grasp the essence of lengthy content quickly and helps them locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick distinguishing properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed method relies on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for individual features and for selected feature sets are reported from experiments on four datasets: SemEval, KDD, Inspec, and DUC. The features chosen by feature selection and the word-embedding-based features form the best feature sets for keyword extraction and indexing across all of these datasets. That is, the proposed distributed word vector with additional knowledge improves the results significantly over individual features, combined features after feature selection, and the state of the art. After developing these keyphrase extraction methods, we also evaluated them on a document classification task.
Pages: 4111-4142
Number of pages: 32