News Topic Classification using Mutual Information and Bayesian Network

Cited by: 0
Authors
Nurfikri, Fahmi Salman [1]
Mubarok, Mohamad Syahrul [1]
Adiwijaya [1]
Affiliations
[1] Telkom Univ, Sch Comp, Bandung, Indonesia
Source
2018 6TH INTERNATIONAL CONFERENCE ON INFORMATION AND COMMUNICATION TECHNOLOGY (ICOICT) | 2018
Keywords
News Topic; Text Classification; Bayesian Network; Mutual Information; Feature Selection;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
News topic classification, in this research, is the task of assigning a news item in textual form to a particular category based on the information it contains. One method suited to this task is the Bayesian Network, an uncertainty-reasoning method that uses probability and a directed acyclic graph to model conditional dependencies among variables. However, textual data normally contains a large number of variables, which is a problem for a Bayesian Network: a large number of variables leads to high complexity, especially time complexity, when learning both the structure and the parameters of the network. In addition, a large number of variables can degrade accuracy, since some variables may be irrelevant. In this research, we used Mutual Information as a text feature selection method to provide relevant features for the Bayesian Network classifier. The results show that Mutual Information as a feature selector improves the classification performance of the Bayesian Network. The highest micro-averaged F1-score obtained with Mutual Information is 75.34%, whereas the score without Mutual Information is 45.95%.
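The abstract does not spell out how Mutual Information is computed or how many features are kept. As a minimal sketch, assuming the standard definition of mutual information between a binary "term is present" variable and the class label, feature selection could look roughly like the following. The function names, the toy documents, and the choice of `k` are illustrative assumptions, not details taken from the paper.

```python
import math
from collections import Counter


def mutual_information(docs, labels, term):
    """Mutual information (in bits) between presence of `term` and the label.

    `docs` is a list of token sets, `labels` the matching list of class names.
    Assumption: standard MI over a binary presence/absence variable.
    """
    n = len(docs)
    joint = Counter((term in doc, label) for doc, label in zip(docs, labels))
    term_counts = Counter(term in doc for doc in docs)
    label_counts = Counter(labels)

    mi = 0.0
    # Only observed (presence, label) pairs contribute; zero-count pairs add 0.
    for (present, label), count in joint.items():
        p_joint = count / n
        p_term = term_counts[present] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_term * p_label))
    return mi


def select_top_k(docs, labels, k):
    """Return the k terms with the highest MI score (ties broken alphabetically)."""
    vocabulary = set().union(*docs)
    scores = {t: mutual_information(docs, labels, t) for t in vocabulary}
    return sorted(vocabulary, key=lambda t: (-scores[t], t))[:k]


# Hypothetical toy corpus, for illustration only.
docs = [
    {"goal", "match", "team"},
    {"team", "coach", "goal"},
    {"election", "vote", "party"},
    {"party", "vote", "team"},
]
labels = ["sport", "sport", "politics", "politics"]

selected = select_top_k(docs, labels, k=3)
```

Terms that appear in only one class (such as "goal" here) receive the highest score, while terms spread across classes (such as "team") score lower and are dropped first. Only the selected terms would then be passed to the classifier as variables, which is what reduces the cost of learning the network structure and parameters.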
Pages: 162-166
Page count: 5