Hybrid Fruit-Fly Optimization Algorithm with K-Means for Text Document Clustering

Cited by: 80
Authors
Bezdan, Timea [1]
Stoean, Catalin [2]
Al Naamany, Ahmed [3]
Bacanin, Nebojsa [1]
Rashid, Tarik A. [4]
Zivkovic, Miodrag [1]
Venkatachalam, K. [5]
Affiliations
[1] Singidunum Univ, Fac Informat & Comp, Danijelova 32, Belgrade 11010, Serbia
[2] Univ Bucharest, Human Language Technol Res Ctr, Bucharest 010014, Romania
[3] Modern Coll Business & Sci, Dept Math & Comp Sci, Muscat 113, Oman
[4] Univ Kurdistan Hewler, Comp Sci & Engn Dept, Erbil 44001, Iraq
[5] CHRIST, Dept Comp Sci & Engn, Bangalore 560029, Karnataka, India
Keywords
machine learning; text document clustering; metaheuristic algorithms; fruit-fly optimization algorithm; K-means
DOI
10.3390/math9161929
CLC classification number
O1 [Mathematics]
Discipline classification code
0701; 070101
Abstract
The fast-growing Internet produces massive amounts of text data. Because this data is voluminous and unstructured, extracting and analyzing relevant information is very challenging. Text document clustering is a text-mining process that partitions a set of text documents into mutually exclusive clusters such that documents within the same cluster are similar to each other, while documents from different clusters differ in content. One of the biggest challenges in text clustering is partitioning the collection of text data by measuring the relevance of the content in the documents. To address this issue, this work proposes a hybrid of a swarm intelligence algorithm, the fruit-fly optimization algorithm, and the K-means algorithm for text clustering. First, the hybrid fruit-fly optimization algorithm is tested on ten unconstrained CEC2019 benchmark functions. Next, the proposed method is evaluated on six standard benchmark text datasets. The experimental evaluation on the unconstrained functions, as well as on the text datasets, indicates that the proposed approach is robust and superior to other state-of-the-art methods.
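To make the clustering task in the abstract concrete, here is a minimal sketch of plain K-means over TF-IDF document vectors, using cosine similarity. It is an illustration only and does not implement the fruit-fly optimization component or the authors' hybrid. The toy documents, the whitespace tokenizer, the TF-IDF weighting, the fixed initial centroids (`init`), and all function names are assumptions chosen for the example.

```python
import math
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a sparse vector (dict term -> weight) to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else dict(vec)


def tfidf_vectors(docs):
    """Build unit-length TF-IDF vectors from whitespace-tokenised documents."""
    tokenised = [doc.lower().split() for doc in docs]
    n = len(tokenised)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenised for term in set(tokens))
    vectors = []
    for tokens in tokenised:
        tf = Counter(tokens)
        vec = {t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()}
        vectors.append(normalise(vec))
    return vectors


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def kmeans(vectors, init, iterations=20):
    """Plain K-means with cosine similarity; `init` lists the indices of
    the documents used as initial centroids (an assumption of this sketch)."""
    centroids = [dict(vectors[i]) for i in init]
    k = len(centroids)
    labels = None
    for _ in range(iterations):
        # Assignment step: each document joins its most similar centroid.
        new_labels = [
            max(range(k), key=lambda c: cosine(v, centroids[c])) for v in vectors
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: each centroid becomes the normalised sum of its members.
        for c in range(k):
            total = defaultdict(float)
            for v, label in zip(vectors, labels):
                if label == c:
                    for t, w in v.items():
                        total[t] += w
            if total:
                centroids[c] = normalise(total)
    return labels


docs = [
    "fruit fly swarm optimization algorithm",
    "swarm intelligence optimization metaheuristic",
    "text document clustering with kmeans",
    "clustering text documents by content",
]
labels = kmeans(tfidf_vectors(docs), init=[0, 2])
print(labels)
```

The result of plain K-means depends on the initial centroids, which are fixed by hand here. In a hybrid of the kind the abstract describes, a metaheuristic search would be combined with K-means, but how the two are combined is not stated in this record.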
Pages: 19