Application of improved distributed naive Bayesian algorithms in text classification

Cited by: 0
Authors
Hongyi Gao
Xi Zeng
Chunhua Yao
Affiliation
[1] China Electronics Technology Group Corporation
Source
The Journal of Supercomputing | 2019 / Vol. 75
Keywords
Distributed; Naive Bayesian algorithm; Text classification; Feature selection
DOI
Not available
Abstract
The naive Bayes classifier is a widely used text classification method grounded in statistical theory. Because of the nature of text, related feature terms can jointly carry semantic information that is lost when text is represented with the traditional vector space model. This paper studies the construction and improvement of a distributed naive Bayes automatic classification system, with a focus on applying Hadoop cloud computing to web page classification. First, the text classification system and the Bayesian classification model are analyzed, including the representation and extraction of text information, text classification methods in general, and Bayesian text classification methods in particular. Then, to address the shortcomings of the naive Bayesian text classification method, mutual information is used during training to measure the correlation between the features produced by feature selection, and features with a high degree of correlation are combined. Experimental results from a series of tests show that the improved text classification system achieves better classification results.
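The abstract does not say how feature correlation is scored or how correlated features are combined, so the following is only a minimal sketch under stated assumptions: features are treated as binary term-presence indicators per document, mutual information is computed between pairs of terms, and a pair above a threshold is "combined" by appending a joint token when both terms occur in a document. The function names, the threshold, and the `a+b` token format are hypothetical and do not come from the paper.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(docs, a, b):
    """Mutual information (in bits) between the presence of terms a and b.

    Assumption: each document is a collection of tokens, and a feature is
    simply whether a term occurs in the document.
    """
    n = len(docs)
    joint = Counter((a in d, b in d) for d in docs)
    mi = 0.0
    for (xa, xb), count in joint.items():
        p_ab = count / n
        # Marginals are recovered by summing the joint counts.
        p_a = sum(c for (ya, _), c in joint.items() if ya == xa) / n
        p_b = sum(c for (_, yb), c in joint.items() if yb == xb) / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


def correlated_pairs(docs, vocabulary, threshold):
    """Return term pairs whose mutual information reaches the threshold."""
    doc_sets = [set(d) for d in docs]
    return [
        (a, b)
        for a, b in combinations(sorted(vocabulary), 2)
        if mutual_information(doc_sets, a, b) >= threshold
    ]


def merge_features(doc, pairs):
    """Append a hypothetical joint token for each pair present in doc."""
    tokens = set(doc)
    merged = list(doc)
    for a, b in pairs:
        if a in tokens and b in tokens:
            merged.append(f"{a}+{b}")
    return merged
```

Mutual information measures dependence in either direction, so it also scores highly for terms that never appear together. In this sketch the merge step adds a joint token only when both terms occur in the same document, so such pairs leave the documents unchanged. The merged token lists could then be passed to any naive Bayes trainer.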
Pages: 5831-5847
Number of pages: 16