Feature Extraction based Text Classification using K-Nearest Neighbor Algorithm

Times cited: 0
Authors
Azam, Muhammad [1]
Ahmed, Tanvir [1]
Sabah, Fahad [1]
Hussain, Muhammad Iftikhar [2,3]
Affiliations
[1] Super Univ Lahore, Dept Comp Sci & Informat Technol, Lahore, Pakistan
[2] Beijing Univ Technol, Fac Informat Technol, Beijing 100124, Peoples R China
[3] Beijing Univ Technol, Beijing Engn Res Ctr IoT Software & Syst, Beijing 100124, Peoples R China
Source
INTERNATIONAL JOURNAL OF COMPUTER SCIENCE AND NETWORK SECURITY | 2018 / Vol. 18 / No. 12
Keywords
K-NN; Naive Bayes; text classification; RapidMiner; feature extraction
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The number of scientific publications has been increasing enormously, and with this increase the classification of scientific publications is becoming a challenging task. The core objective of this research is to analyze the performance of classification algorithms on a Scopus dataset. In text classification, extracting features from documents and classifying with the extracted features are the major issues that degrade the performance of different algorithms. In this paper, the performance of classification algorithms such as Naive Bayes (NB) and K-Nearest Neighbor (K-NN) improved when combined with Bayesian boosting and bagging. The selected classification algorithms were evaluated on 10K documents from Scopus using the F-measure, and comparison matrices were produced to estimate the accuracy, precision, and recall of the NB and K-NN classifiers. Further, data preprocessing and cleaning steps were applied to the selected dataset, and class imbalance issues were analyzed to increase the performance of the text classification algorithms. Experimental results showed a performance improvement of over 7% using K-NN, which performed better than NB.
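As a rough illustration of the kind of pipeline the abstract describes, the following minimal sketch classifies short texts with K-NN over bag-of-words term counts and scores the result with precision, recall, and F-measure. It is not the authors' RapidMiner workflow: the toy documents, the labels `cs` and `bio`, the cosine-similarity measure, and all function names are assumptions made for illustration only, and no Scopus data is involved.

```python
import math
from collections import Counter


def vectorize(text):
    """Feature extraction (assumed): lowercase, whitespace-split term counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def knn_predict(train, text, k=3):
    """Majority vote among the k training documents most similar to text."""
    query = vectorize(text)
    ranked = sorted(train, key=lambda item: cosine(query, item[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def precision_recall_f1(y_true, y_pred, positive):
    """Precision, recall, and F-measure for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy corpus; the labels and texts are invented for this sketch.
TRAIN_DOCS = [
    ("neural network training deep learning", "cs"),
    ("classification algorithm machine learning", "cs"),
    ("text mining feature extraction algorithm", "cs"),
    ("protein cell gene expression", "bio"),
    ("gene sequencing dna cell", "bio"),
    ("enzyme protein dna biology", "bio"),
]
TRAIN = [(vectorize(text), label) for text, label in TRAIN_DOCS]

if __name__ == "__main__":
    tests = ["machine learning algorithm for text", "dna gene protein"]
    truth = ["cs", "bio"]
    preds = [knn_predict(TRAIN, t, k=3) for t in tests]
    print(preds)
    print(precision_recall_f1(truth, preds, positive="cs"))
```

In practice, term counts would usually be replaced by a weighting such as TF-IDF, and the evaluation would be run per class on a held-out split; both choices are left out here to keep the sketch short.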
Pages: 95-101
Number of pages: 7