Diverse feature set based Keyphrase extraction and indexing techniques

Cited by: 4
Authors
Sharma, Saurabh [1]
Gupta, Vishal [1]
Juneja, Mamta [1]
Affiliations
[1] Panjab Univ, Univ Inst Engn & Technol, Chandigarh, India
Keywords
Keyphrase extraction; Word embedding; Keyphrase indexing; External knowledge; Free indexing; Natural language processing; DOCUMENT; TEXT; MODEL;
DOI
10.1007/s11042-020-09423-2
Chinese Library Classification number
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract
The internet has changed the way people communicate, and this has led to a vast amount of text being available in electronic format, including e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing keyphrases for these extensive text collections allows users to grasp the essence of lengthy content quickly and helps them locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick distinguishing properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed method relies on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for the individual features and for the selected features are reported through experiments on four datasets: SemEval, KDD, Inspec, and DUC. The features chosen by feature selection and the word-embedding-based features are the best feature sets for keyword extraction and indexing across all of these datasets. That is, the proposed distributed word vectors with additional knowledge improve the results significantly over individual features, over combined features after feature selection, and over the state of the art. Having developed these keyphrase extraction methods, we also applied them to a document classification task.
Pages: 4111 - 4142
Page count: 32
Related papers
50 in total
  • [41] Exploring Word Embeddings in CRF-based Keyphrase Extraction from Research Papers
    Patel, Krutarth
    Caragea, Cornelia
    PROCEEDINGS OF THE 10TH INTERNATIONAL CONFERENCE ON KNOWLEDGE CAPTURE (K-CAP '19), 2019, : 37 - 44
  • [42] Bert-Based Chinese Medical Keyphrase Extraction Model Enhanced with External Features
    Ding, Liangping
    Zhang, Zhixiong
    Zhao, Yang
    TOWARDS OPEN AND TRUSTWORTHY DIGITAL SOCIETIES, ICADL 2021, 2021, 13133 : 167 - 176
  • [43] TermITH-Eval: a French Standard-Based Resource for Keyphrase Extraction Evaluation
    Bougouin, Adrien
    Barreaux, Sabine
    Romary, Laurent
    Boudin, Florian
    Daille, Beatrice
    LREC 2016 - TENTH INTERNATIONAL CONFERENCE ON LANGUAGE RESOURCES AND EVALUATION, 2016, : 1924 - 1927
  • [44] Thesaurus-Based Method of Increasing Text-via-Keyphrase Graph Connectivity During Keyphrase Extraction for e-Tourism Applications
    Paramonov, Ilya
    Lagutina, Ksenia
    Mamedov, Eldar
    Lagutina, Nadezhda
    KNOWLEDGE ENGINEERING AND SEMANTIC WEB, KESW 2016, 2016, 649 : 129 - 141
  • [45] Evaluating Word Embedding Feature Extraction Techniques for Host-Based Intrusion Detection Systems
    Paul K. Mvula
    Paula Branco
    Guy-Vincent Jourdan
    Herna L. Viktor
    Discover Data, 1 (1):
  • [46] Survey of feature selection and extraction techniques for stock market prediction
    Htun, Htet Htet
    Biehl, Michael
    Petkov, Nicolai
    FINANCIAL INNOVATION, 2023, 9 (01)
  • [47] Domain-Specific Keyphrase Extraction and Near-Duplicate Article Detection based on Ontology
    Nhon Do
    LongVan Ho
    2015 IEEE RIVF INTERNATIONAL CONFERENCE ON COMPUTING & COMMUNICATION TECHNOLOGIES - RESEARCH, INNOVATION, AND VISION FOR THE FUTURE (RIVF), 2015, : 123 - 126
  • [48] A supervised keyphrase extraction method based on the logistic regression model for social question answering sites
    Lin, Ge
    Xiang, Yi
    Wang, Zhong
    Wang, Ruomei
    Journal of Information and Computational Science, 2014, 11 (10): 3571 - 3583
  • [49] Performance Analysis of Graph based Keyphrase Extraction metrics for uncertain User-generated data
    Garg, Muskan
    Kumar, Mukesh
    8TH INTERNATIONAL CONFERENCE ON ADVANCES IN COMPUTING & COMMUNICATIONS (ICACC-2018), 2018, 143 : 419 - 425
  • [50] RankUp: Enhancing graph-based keyphrase extraction methods with error-feedback propagation
    Figueroa, Gerardo
    Chen, Po-Chi
    Chen, Yi-Shin
    COMPUTER SPEECH AND LANGUAGE, 2018, 47 : 112 - 131