Deep-KeywordNet: automated English keyword extraction in documents using deep keyword network based ranking

Cited by: 1
Authors
Khatun, Rubaya [1]
Sarkar, Arup [1]
Affiliations
[1] Raiganj Univ, Coll Para, Dept Comp & Informat Sci, Univ Rd, Raiganj 733134, West Bengal, India
Keywords
Parts of speech tagging; Word2Vector; Term frequency; Inverse average document frequency; Attention mechanism; Keyword extraction; TEXTRANK
DOI
10.1007/s11042-024-18110-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
During information retrieval, users locate relevant web pages by entering keywords. If the keywords are inaccurate, or if they are absent from the intended page, retrieval effectiveness is significantly compromised. Keywords therefore remain of utmost importance in text processing. In intricate contexts in particular, relying on manual analysis by readers is both time-intensive and unfeasible. Most existing methods achieve limited accuracy, leading to elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several key stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The effectiveness of the keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word removal, Parts of Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is tokenized into individual words and fed into the Deep-KeywordNet model. The Word2Vec (W2V) Skip-gram embedding layer facilitates the categorization of distributed vector representations. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with the softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87%, with minimized time complexity.
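The pre-processing stages named in the abstract (sentence segmentation, tokenization, stop-word removal, stemming) can be illustrated with a minimal sketch. This is not the authors' implementation: the stop-word list, the suffix-stripping rule, and all function names are hypothetical stand-ins, PoS tagging is omitted, and plain term frequency is used in place of the TF-IADF model, whose formula the abstract does not give.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "are", "to", "by", "for", "on", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep runs of ASCII letters only."""
    return re.findall(r"[a-z]+", sentence.lower())


def crude_stem(word):
    """Stand-in for a real stemmer: strip one common suffix if at least 3 letters remain."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text):
    """Segment, tokenize, drop stop words, then stem."""
    tokens = []
    for sentence in split_sentences(text):
        for tok in tokenize(sentence):
            if tok not in STOP_WORDS:
                tokens.append(crude_stem(tok))
    return tokens


def rank_by_frequency(tokens, top_k=3):
    """Rank candidate keywords by raw term frequency (a placeholder for TF-IADF)."""
    return Counter(tokens).most_common(top_k)


sample = "Keywords help retrieval. Ranking keywords improves retrieval of documents."
tokens = preprocess(sample)
ranked = rank_by_frequency(tokens)
print(tokens)
print(ranked)
```

In a fuller pipeline, the token list would feed an embedding layer and a classifier before ranking; here the frequency count only shows where a ranking step would sit.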
Pages: 68959-68991
Page count: 33