Automatic text classification using machine learning and optimization algorithms

Cited by: 16
Authors
Janani, R. [1]
Vijayarani, S. [1]
Affiliation
[1] Bharathiar Univ, Dept Comp Sci, Coimbatore, Tamil Nadu, India
Keywords
Text mining; Information retrieval; Document classification; Content analysis; Feature selection; Bio-inspired algorithms; PSO; ACO; ABC; FA; OTFS algorithm; Machine learning algorithms; NB; KNN; SVM; PNN; MLearn-ATC; DOCUMENTS;
DOI
10.1007/s00500-020-05209-8
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the volume of text documents in digital form has grown enormously. As a consequence, there is a need to automatically group and classify documents based on their content. The main goal of text classification is to partition an unstructured set of documents into their respective categories based on their content. The main aim of this research work is to automatically classify documents stored on a personal computer into their relevant categories. The work has two phases: in the first, the important features are selected for classification; in the second, the text documents are classified. For selecting the optimal features, this work proposes a new algorithm, the optimization technique for feature selection (OTFS) algorithm. To assess its effectiveness, the OTFS algorithm was compared with existing approaches: artificial bee colony (ABC), the firefly algorithm (FA), ant colony optimization (ACO) and particle swarm optimization (PSO). For the second phase, this work proposes the machine learning-based automatic text classification (MLearn-ATC) algorithm, which was compared with widely used classification techniques: probabilistic neural network (PNN), support vector machine (SVM), K-nearest neighbor (KNN) and Naive Bayes (NB). The output of the first phase is used as the input to the classification phase. The results indicate that the proposed algorithms achieve better accuracy in optimizing the features and classifying text documents based on their content.
Pages: 1129-1145
Page count: 17
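
The two-phase structure described in the abstract (select features first, then classify using only those features) can be illustrated with a minimal sketch. This is not the OTFS or MLearn-ATC algorithm, whose details are not given in this record. As assumptions for illustration only, phase 1 uses a simple document-frequency filter as a stand-in for OTFS, and phase 2 uses multinomial Naive Bayes, one of the baseline classifiers the abstract names. The function names, the toy documents and the value of `k` are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (deliberately simplistic)."""
    return text.lower().split()


def select_features(docs, labels, k):
    """Phase 1 stand-in (NOT OTFS): keep the k terms whose per-class
    document frequency differs most between classes."""
    classes = sorted(set(labels))
    docs_per_class = Counter(labels)
    doc_freq = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        doc_freq[label].update(set(tokenize(doc)))
    vocab = set().union(*doc_freq.values())

    def score(term):
        rates = [doc_freq[c][term] / docs_per_class[c] for c in classes]
        return max(rates) - min(rates)

    # Sort by descending score, then alphabetically so ties are deterministic.
    ranked = sorted(vocab, key=lambda t: (-score(t), t))
    return set(ranked[:k])


def train_nb(docs, labels, features):
    """Phase 2: multinomial Naive Bayes restricted to the selected features."""
    classes = sorted(set(labels))
    term_counts = {c: Counter() for c in classes}
    for doc, label in zip(docs, labels):
        term_counts[label].update(t for t in tokenize(doc) if t in features)
    log_priors = {c: math.log(labels.count(c) / len(labels)) for c in classes}
    return log_priors, term_counts, features


def predict_nb(model, doc):
    """Return the class with the highest log-posterior (Laplace smoothing)."""
    log_priors, term_counts, features = model
    tokens = [t for t in tokenize(doc) if t in features]
    best_class, best_score = None, -math.inf
    for c, log_prior in log_priors.items():
        total = sum(term_counts[c].values())
        score = log_prior + sum(
            math.log((term_counts[c][t] + 1) / (total + len(features)))
            for t in tokens
        )
        if score > best_score:
            best_class, best_score = c, score
    return best_class


# Toy corpus, invented purely for illustration.
docs = [
    "the match ended with a late goal",
    "the team won the league match",
    "the striker scored the winning goal for the team",
    "the bank raised the interest rate",
    "the market fell as the bank cut the rate",
    "investors watch the stock market",
]
labels = ["sports"] * 3 + ["finance"] * 3

selected = select_features(docs, labels, k=6)   # phase 1
model = train_nb(docs, labels, selected)        # phase 2 consumes phase 1 output
print(sorted(selected))
print(predict_nb(model, "a late goal decided the match"))
```

The point of the sketch is the hand-off: the classifier never sees terms that phase 1 discarded, so a term such as "the", which occurs equally often in both classes, has no influence on the prediction.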