Deep-KeywordNet: automated English keyword extraction in documents using deep keyword network based ranking

Cited by: 1

Authors:

Khatun, Rubaya [1]

Sarkar, Arup [1]

Affiliations:

[1] Raiganj Univ, Coll Para, Dept Comp & Informat Sci, Univ Rd, Raiganj 733134, West Bengal, India

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2024 / Vol. 83 / Issue 27

Keywords:

Parts of speech tagging; Word2Vector; Term frequency; Inverse average document frequency; Attention mechanism; Keyword extraction; TEXTRANK

DOI:

10.1007/s11042-024-18110-5

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

During information retrieval, users locate relevant web pages by entering specific keywords. If users supply inaccurate keywords, or if the keywords do not appear on the intended page, retrieval effectiveness is significantly compromised. Keywords therefore remain of utmost importance in text processing. In intricate contexts in particular, manual analysis by readers is both time-intensive and infeasible. Most existing methods offer limited accuracy, leading to elevated error rates and compromised training capabilities. To overcome these limitations, the proposed approach introduces an automated keyword extraction and ranking system based on deep learning. It comprises several key stages: data acquisition, pre-processing, tokenization, word-to-vector transformation, keyword classification, and ranking. The effectiveness of the keyword extraction process is evaluated on the 500N-KPCrowd, KPTimes, and KP20k datasets. Text pre-processing consists of stop-word removal, Parts of Speech (PoS) tagging, stemming, and sentence segmentation. The pre-processed text is tokenized into individual words and fed into the Deep-KeywordNet model. The Word2Vec (W2V) Skip-gram embedding layer facilitates the categorization of distributed vector representations. The Attention Bidirectional Long Short-Term Memory Gated Convolutional Neural Network (Attn Bi-GCNN), together with the softmax layer, assigns class labels, and the network's loss is optimized with the Dwarf Mongoose Algorithm (DMA). Significant keywords are ranked using the Term Frequency-Inverse Average Document Frequency (TF-IADF) model. The Python implementation achieves an overall accuracy of 98.87% with minimized time complexity.
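
The pre-processing and ranking stages named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the TF-IADF formula, so the sketch substitutes the standard smoothed TF-IDF weight as a stand-in, and it omits PoS tagging, stemming, the Word2Vec embedding, and the Attn Bi-GCNN classifier. The stop-word list, the example corpus, and the function names `split_sentences`, `tokenize`, and `rank_keywords` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; the paper's list is not given).
STOP_WORDS = {"a", "an", "and", "are", "by", "in", "is", "of", "on", "the", "to", "with"}


def split_sentences(text):
    """Naive sentence segmentation: split after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Lowercase alphabetic tokens with stop words removed (no stemming)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def rank_keywords(doc, corpus, top_k=3):
    """Rank the tokens of `doc` by smoothed TF-IDF.

    Standard TF-IDF is used here as a stand-in for TF-IADF, whose exact
    definition is not stated in the abstract.
    """
    tokens = tokenize(doc)
    tf = Counter(tokens)
    doc_token_sets = [set(tokenize(d)) for d in corpus]
    n_docs = len(corpus)
    scores = {}
    for word, count in tf.items():
        # Document frequency: number of corpus documents containing the word.
        df = sum(1 for toks in doc_token_sets if word in toks)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[word] = (count / len(tokens)) * idf
    # Highest score first; ties are broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


# Hypothetical three-document corpus, for illustration only.
corpus = [
    "Keyword extraction finds each keyword. Extraction of a keyword is ranked.",
    "Deep learning models classify each token.",
    "Graph ranking is used by TextRank.",
]

if __name__ == "__main__":
    print(split_sentences(corpus[0]))
    print(rank_keywords(corpus[0], corpus, top_k=2))
```

In the pipeline the abstract describes, the ranking step is applied only to words that the classifier has already labeled as keywords. The sketch ranks every non-stop-word token because it has no classifier.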


Pages: 68959-68991

Page count: 33

