Diverse feature set based Keyphrase extraction and indexing techniques

Cited by: 4
Authors
Sharma, Saurabh [1]
Gupta, Vishal [1]
Juneja, Mamta [1]
Affiliation
[1] Panjab Univ, Univ Inst Engn & Technol, Chandigarh, India
Keywords
Keyphrase extraction; Word embedding; Keyphrase indexing; External knowledge; Free indexing; Natural language processing; DOCUMENT; TEXT; MODEL
DOI
10.1007/s11042-020-09423-2
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Discipline code
0812
Abstract
The internet has changed the way people communicate, and this has led to a vast amount of text that is available in electronic format. This text includes e-mail, technical and scientific reports, tweets, physician notes and military field reports. Providing keyphrases for these extensive text collections allows users to grasp the essence of lengthy content quickly and helps them locate information efficiently. When designing a keyword extraction and indexing system, it is essential to pick unique properties, called features. In this article, we propose several unsupervised keyword extraction approaches that are independent of the structure, size and domain of the documents. The proposed methods rely on a novel, cognitively inspired set of standard, phrase, word embedding and external knowledge source features. Results for individual and selected features are reported through experiments on four different datasets, viz. SemEval, KDD, Inspec and DUC. Across all four datasets, the features chosen by feature selection and the word-embedding-based features form the best feature sets for keyword extraction and indexing. That is, the proposed distributed word vectors with additional knowledge improve the results significantly over individual features, combined features after feature selection, and the state of the art. Having developed these keyphrase extraction methods, we also tested them on a document classification task.
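The abstract describes unsupervised keyphrase extraction that scores candidate phrases with a mix of standard, phrase-level and word-embedding features. As a rough illustration only, the Python sketch below combines three generic features (normalized frequency, first-occurrence position, and cosine similarity between a phrase's averaged word vector and the document's averaged vector) into a weighted sum. It is not the authors' feature set or weighting: the stopword list, the toy vectors in `TOY_VECTORS`, the equal weights and all function names are hypothetical, and external-knowledge features and feature selection are left out.

```python
import math
import re
from collections import Counter

# Assumed, deliberately tiny stopword list; real systems use a much larger one.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "the", "to", "with"}

# Hypothetical 3-d toy vectors standing in for pretrained word embeddings.
TOY_VECTORS = {
    "keyphrase": [0.9, 0.1, 0.0],
    "extraction": [0.8, 0.2, 0.1],
    "word": [0.5, 0.5, 0.0],
    "embedding": [0.6, 0.4, 0.1],
    "features": [0.7, 0.3, 0.2],
    "weather": [0.0, 0.1, 0.9],
}


def tokenize(text):
    """Lowercase words plus punctuation marks, which act as phrase boundaries."""
    return re.findall(r"[a-z]+|[.,;:!?]", text.lower())


def candidates(tokens):
    """Yield (phrase, start_index) for each maximal run of non-stopword words."""
    run, start = [], 0
    for i, tok in enumerate(tokens):
        if tok in STOPWORDS or not tok.isalpha():
            if run:
                yield " ".join(run), start
            run = []
        else:
            if not run:
                start = i
            run.append(tok)
    if run:
        yield " ".join(run), start


def mean_vector(words):
    """Average the toy vectors of in-vocabulary words; None if none are known."""
    vecs = [TOY_VECTORS[w] for w in words if w in TOY_VECTORS]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def score_candidates(text, weights=(1.0, 1.0, 1.0), top_k=3):
    """Rank candidates by a weighted sum of frequency, position and embedding features."""
    tokens = tokenize(text)
    occurrences = list(candidates(tokens))
    if not occurrences:
        return []
    tf = Counter(phrase for phrase, _ in occurrences)
    first = {}
    for phrase, start in occurrences:
        first.setdefault(phrase, start)
    doc_vec = mean_vector(tokens)
    max_tf = max(tf.values())
    w_tf, w_pos, w_emb = weights
    scores = {}
    for phrase, count in tf.items():
        f_tf = count / max_tf                      # normalized frequency
        f_pos = 1.0 - first[phrase] / len(tokens)  # earlier first mention scores higher
        vec = mean_vector(phrase.split())
        # Similarity of the phrase to the whole document in embedding space
        f_emb = cosine(vec, doc_vec) if vec and doc_vec else 0.0
        scores[phrase] = w_tf * f_tf + w_pos * f_pos + w_emb * f_emb
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]


text = ("Keyphrase extraction with word embedding features. "
        "The weather is mild. Keyphrase extraction for indexing.")
# Expected ranking: keyphrase extraction, word embedding features, weather
for phrase, score in score_candidates(text):
    print(f"{phrase}: {score:.2f}")
```

A plain weighted sum keeps every feature inspectable; in practice the weights would be tuned or the feature subset chosen by a selection step, which is where a method like the one summarized above would differ from this toy version.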
Pages: 4111 - 4142
Page count: 32
Related papers
  • [1] Diverse feature set based Keyphrase extraction and indexing techniques
    Saurabh Sharma
    Vishal Gupta
    Mamta Juneja
    Multimedia Tools and Applications, 2021, 80 : 4111 - 4142
  • [2] Y-Rank: A Multi-Feature-Based Keyphrase Extraction Method for Short Text
    Liu, Qiang
    Hui, Yan
    Liu, Shangdong
    Ji, Yimu
    APPLIED SCIENCES-BASEL, 2024, 14 (06)
  • [3] Thesaurus based automatic keyphrase indexing
    Medelyan, Olena
    Witten, Ian H.
    OPENING INFORMATION HORIZONS, 2006: 296 - +
  • [4] Automatic Keyphrase Extraction Techniques: A Review
    Lim, Vicky Min-How
    Wong, Siew Fan
    Lim, Tong Ming
    2013 IEEE SYMPOSIUM ON COMPUTERS AND INFORMATICS (ISCI 2013), 2013
  • [5] A Keyphrase Extraction Method Based on Multi-feature Evaluation and Mask Mechanism
    Ma, Liwen
    Liu, Weifeng
    2022 11TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION AND INFORMATION SCIENCES (ICCAIS), 2022: 164 - 170
  • [6] Keyphrase Distance Analysis Technique from News Articles as a Feature for Keyphrase Extraction: An Unsupervised Approach
    Miah, Mohammad Badrul Alam
    Awang, Suryanti
    Rahman, Md Mustafizur
    Hosen, A. S. M. Sanwar
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2023, 14 (10) : 995 - 1002
  • [7] Experiment Research on Feature Selection and Learning Method in Keyphrase Extraction
    Wang, Chen
    Li, Sujian
    Wang, Wei
    COMPUTER PROCESSING OF ORIENTAL LANGUAGES: LANGUAGE TECHNOLOGY FOR THE KNOWLEDGE-BASED ECONOMY, 2009, 5459 : 305 - 312
  • [8] Keyphrase Extraction Based on Prior Knowledge
    He, Guoxiu
    Fang, Junwei
    Cui, Haoran
    Wu, Chuan
    Lu, Wei
    JCDL'18: PROCEEDINGS OF THE 18TH ACM/IEEE JOINT CONFERENCE ON DIGITAL LIBRARIES, 2018: 341 - 342
  • [9] A LDA-based approach to keyphrase extraction
    Department of Automation, University of Science and Technology of China, Hefei 230026, China
    Unknown, 230031, China
    Zhongnan Daxue Xuebao (Ziran Kexue Ban), 6: 2142 - 2148
  • [10] The Hot Keyphrase Extraction based on TF*PDF
    Gao, Yan
    Liu, Jin
    Ma, PeiXun
    TRUSTCOM 2011: 2011 INTERNATIONAL JOINT CONFERENCE OF IEEE TRUSTCOM-11/IEEE ICESS-11/FCST-11, 2011: 1524 - 1528