Automatic text classification using machine learning and optimization algorithms

Cited by: 16
Authors
Janani, R. [1]
Vijayarani, S. [1]
Affiliation
[1] Bharathiar Univ, Dept Comp Sci, Coimbatore, Tamil Nadu, India
Keywords
Text mining; Information retrieval; Document classification; Content analysis; Feature selection; Bio-inspired algorithms; PSO; ACO; ABC; FA; OTFS algorithm; Machine learning algorithms; NB; KNN; SVM; PNN; MLearn-ATC; DOCUMENTS;
DOI
10.1007/s00500-020-05209-8
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In recent years, the volume of digital text documents has grown enormously. As a consequence, there is a need to automatically collect and classify documents based on their content. The main goal of text classification is to partition an unstructured set of documents into their respective categories based on content. The aim of this research is to automatically classify the documents stored on a personal computer into their relevant categories. The work has two phases: in the first, the important features are selected for classification; in the second, the text documents are classified. For selecting the optimal features, this work proposes a new algorithm, the optimization technique for feature selection (OTFS). To assess its performance, the OTFS algorithm was compared with the existing approaches artificial bee colony, firefly algorithm, ant colony optimization, and particle swarm optimization. For the second phase, this work proposes the machine learning-based automatic text classification (MLearn-ATC) algorithm, which was compared with the widely used classification techniques probabilistic neural network, support vector machine, K-nearest neighbor, and Naive Bayes. The output of the first phase is used as the input to the classification phase. The results indicate that the proposed algorithms achieve better accuracy in optimizing the features and in classifying text documents based on their content.
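The two-phase structure described in the abstract (select features first, then classify on the selected features) can be illustrated with a minimal sketch. This is not the OTFS or MLearn-ATC algorithm, whose details are not given here. As stated assumptions, phase 1 uses a simple stand-in score (the spread of a term's document frequency across classes), and phase 2 uses multinomial Naive Bayes, one of the baseline classifiers named in the abstract. The corpus, function names, and the value of k are all hypothetical.

```python
import math
from collections import Counter, defaultdict

# Toy corpus, made up for illustration (not data from the paper).
TRAIN = [
    ("the team won the football match", "sports"),
    ("the striker scored a late goal", "sports"),
    ("the coach praised the team after the match", "sports"),
    ("the bank raised interest rates", "finance"),
    ("the stock market fell as rates rose", "finance"),
    ("investors sold bank shares in the market", "finance"),
]

def tokenize(text):
    return text.lower().split()

def select_features(docs, k):
    """Phase 1 (stand-in, NOT OTFS): keep the k terms whose per-class
    document-frequency rate differs most between classes."""
    n_per_class = Counter(label for _, label in docs)
    df = defaultdict(Counter)  # df[label][term] = number of docs containing term
    for text, label in docs:
        for term in set(tokenize(text)):
            df[label][term] += 1
    vocab = {term for counts in df.values() for term in counts}

    def score(term):
        rates = [df[c][term] / n_per_class[c] for c in n_per_class]
        return max(rates) - min(rates)

    # Ties are broken alphabetically so the result is deterministic.
    return sorted(vocab, key=lambda t: (-score(t), t))[:k]

def train_nb(docs, features):
    """Phase 2 (baseline, NOT MLearn-ATC): multinomial Naive Bayes with
    Laplace smoothing, restricted to the selected features."""
    n_per_class = Counter(label for _, label in docs)
    term_counts = defaultdict(Counter)
    for text, label in docs:
        for tok in tokenize(text):
            if tok in features:
                term_counts[label][tok] += 1
    model = {}
    for label, n_c in n_per_class.items():
        total = sum(term_counts[label].values())
        log_prior = math.log(n_c / len(docs))
        log_like = {
            t: math.log((term_counts[label][t] + 1) / (total + len(features)))
            for t in features
        }
        model[label] = (log_prior, log_like)
    return model

def predict(model, features, text):
    toks = [t for t in tokenize(text) if t in features]
    scores = {
        label: prior + sum(log_like[t] for t in toks)
        for label, (prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

# The output of phase 1 is the input to phase 2.
features = select_features(TRAIN, k=5)
model = train_nb(TRAIN, set(features))
print(features)
print(predict(model, set(features), "the team played a great match"))
print(predict(model, set(features), "bank shares fell on the market"))
```

In this toy setting a term such as "the", which occurs in every document of both classes, scores zero and is dropped before classification, which is the kind of pruning a feature-selection phase is meant to provide.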
Pages: 1129-1145
Number of pages: 17