Automatic text classification using machine learning and optimization algorithms

被引:15
|
作者
Janani, R. [1 ]
Vijayarani, S. [1 ]
机构
[1] Bharathiar Univ, Dept Comp Sci, Coimbatore, Tamil Nadu, India
关键词
Text mining; Information retrieval; Document classification; Content analysis; Feature selection; Bio-inspired algorithms; PSO; ACO; ABC; FA; OTFS algorithm; Machine learning algorithms; NB; KNN; SVM; PNN; MLearn-ATC; DOCUMENTS;
D O I
10.1007/s00500-020-05209-8
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
In the recent years, the volume of text documents in the form of digital way has grown up extremely in size. As significance, there is a need to be competent to automatically bring together and classify the documents based on their content. The main goal of text classification is to partition the unstructured set of documents into their respective categories based on its content. The main aim of this research work is to automatically classify the documents which are stored in the personal computer into their relevant categories. This work has two significant phases. In the first phase, the important features are selected for classification and the second phase is the classification of text documents. For selecting the optimal features, this research work proposes a new algorithm, optimization technique for feature selection (OTFS) algorithm. To estimate the proficiency of proposed feature selection algorithm, the OTFS algorithm was compared with the existing approaches artificial bee colony, firefly algorithm, ant colony optimization and particle swarm optimization. In the second phase, this research work proposed machine learning-based automatic text classification (MLearn-ATC) algorithm for text classification. In classification, the MLearn-ATC algorithm was compared with widely used classification techniques probabilistic neural network, support vector machine, K-nearest neighbor and Naive Bayes. From this, the output of first phase is used as the input for classification phase. The decisive results establish that the proposed algorithms achieve the better accuracy for optimizing the features and classifying the text documents based on their content.
引用
收藏
页码:1129 / 1145
页数:17
相关论文
共 50 条
  • [21] Petrofacies classification using machine learning algorithms
    Silva A.A.
    Tavares M.W.
    Carrasquilla A.
    Misságia R.
    Ceia M.
    Silva, Adrielle A. (adrielle@lenep.uenf.br), 1600, Society of Exploration Geophysicists (85): : WA101 - WA113
  • [22] Automatic Vulnerability Classification Using Machine Learning
    Gawron, Marian
    Cheng, Feng
    Meinel, Christoph
    RISKS AND SECURITY OF INTERNET AND SYSTEMS, CRISIS 2017, 2018, 10694 : 3 - 17
  • [23] Automatic text summarization using a machine learning approach
    Neto, JL
    Freitas, AA
    Kaestner, CAA
    ADVANCES IN ARTIFICIAL INTELLIGENCE, PROCEEDINGS, 2002, 2507 : 205 - 215
  • [24] Feature selection and machine learning algorithms for uyghur text sentiment classification
    Turhuntay, Raxida
    Slamu, Wushour
    Dawut, Abdusalam
    Hamdulla, Askar
    Turhun, Erxat
    Boletin Tecnico/Technical Bulletin, 2017, 55 (13): : 56 - 66
  • [25] Clinical Text Classification with Word Representation Features and Machine Learning Algorithms
    Almazaydeh, Laiali
    Abuhelaleh, Mohammed
    Al Tawil, Arar
    Elleithy, Khaled
    INTERNATIONAL JOURNAL OF ONLINE AND BIOMEDICAL ENGINEERING, 2023, 19 (04) : 65 - 76
  • [26] Stemming Text-based Web Page Classification using Machine Learning Algorithms: A Comparison
    Razali, Ansari
    Daud, Salwani Mohd
    Zin, Nor Azan Mat
    Shahidi, Faezehsadat
    INTERNATIONAL JOURNAL OF ADVANCED COMPUTER SCIENCE AND APPLICATIONS, 2020, 11 (01) : 570 - 576
  • [27] Classification and Optimization Scheme for Text Data using Machine Learning Naive Bayes Classifier
    Venkatesh
    Ranjitha, K., V
    PROCEEDINGS OF 2018 IEEE WORLD SYMPOSIUM ON COMMUNICATION ENGINEERING (WSCE), 2018, : 33 - 36
  • [29] An exploration on text classification using machine learning techniques
    Athanasios, Tzimourtas
    Spyros, Bakalakos
    Panagiota, Tselenti
    Athanasios, Voulodimos
    25TH PAN-HELLENIC CONFERENCE ON INFORMATICS WITH INTERNATIONAL PARTICIPATION (PCI2021), 2021, : 247 - 249
  • [30] Domain Text Classification Using Machine Learning Models
    Rao, Akula V. S. Siva Rama
    Bhavani, D. Ganga
    Krishna, J. Gopi
    Swapna, B.
    Varma, K. Rama Sai
    PROCEEDINGS OF SECOND INTERNATIONAL CONFERENCE ON SUSTAINABLE EXPERT SYSTEMS (ICSES 2021), 2022, 351 : 573 - 582