Deep-KeywordNet: automated English keyword extraction in documents using deep keyword network based ranking

Cited by: 1
Authors
Khatun, Rubaya [1]
Sarkar, Arup [1]
Affiliations
[1] Raiganj Univ, Coll Para, Dept Comp & Informat Sci, Univ Rd, Raiganj 733134, West Bengal, India
Keywords
Parts of speech tagging; Word2Vector; Term frequency; Inverse average document frequency; Attention mechanism; Keyword extraction; TEXTRANK
DOI
10.1007/s11042-024-18110-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
During information retrieval, users locate relevant web pages by entering specific keywords. If users provide inaccurate keywords, or if the keywords are absent from the intended page, retrieval effectiveness is significantly compromised. Keywords therefore remain of utmost importance in text processing. In intricate contexts in particular, relying on manual analysis by readers can be both time-intensive and infeasible. Most existing methods offer limited accuracy, leading to elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several key stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The effectiveness of the keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word elimination, Parts of Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is fed into the Deep-KeywordNet model, where it is tokenized into individual words. The Word2Vec (W2V) Skip-gram embedding layer facilitates the categorization of distributed vector representations. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), along with the softmax layer, assigns class labels, and the network's loss optimization employs the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The overall accuracy achieved by the Python implementation is 98.87%, with minimized time complexity.
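The pre-processing and ranking stages named in the abstract can be illustrated with a minimal sketch. The abstract does not define the TF-IADF formula, so the code below substitutes a plain, smoothed TF-IDF score as a stand-in; it is not the authors' method. The stop-word list and the function names `tokenize` and `rank_keywords` are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the abstract does not specify one).
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "is", "are",
              "to", "for", "on", "by", "with"}


def tokenize(text):
    """Lowercase the text, keep alphabetic runs, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def rank_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by a smoothed TF-IDF score.

    This is plain TF-IDF used as a stand-in, NOT the paper's TF-IADF model.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    tf = Counter(tokenized[doc_index])
    total = sum(tf.values())
    scores = {
        w: (count / total) * math.log((1 + n_docs) / (1 + df[w]))
        for w, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a deterministic order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


docs = [
    "keyword extraction ranks keyword candidates",
    "deep networks classify text",
    "text retrieval uses text queries",
]
top = rank_keywords(docs, 0, top_k=1)
```

Here a word that occurs in every document scores zero, so it ranks below any word specific to the target document. The system described in the abstract additionally applies PoS tagging, stemming, Word2Vec embeddings, and a neural classifier before ranking, none of which this sketch attempts.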
Pages: 68959-68991
Number of pages: 33