Semantic Representation in Text Classification Using Topic Signature Mapping

Cited by: 1
Authors
Achananuparp, Palakorn [1]
Zhou, Xiaohua [1]
Hu, Xiaohua [1]
Zhang, Xiaodan [1]
Affiliation
[1] Drexel Univ, Coll Informat Sci & Technol, Philadelphia, PA 19104 USA
Source
2008 IEEE INTERNATIONAL JOINT CONFERENCE ON NEURAL NETWORKS, VOLS 1-8 | 2008
DOI
10.1109/IJCNN.2008.4633926
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Document representation is one of the crucial components that determine the effectiveness of text classification. Traditional approaches typically adopt the popular bag-of-words method as the underlying document representation. Although simple and efficient, bag-of-words representation has a major shortcoming: it assumes that word features are independent of one another. Many researchers have attempted to address this issue by incorporating semantic information into the document representation. In this paper, we study the effect of semantic representation on the effectiveness of text classification systems. We employ a novel semantic smoothing technique to derive semantic information in the form of mapping probabilities between topic signatures and single-word features. Two classifiers, Naive Bayes and Support Vector Machine, were selected to carry out the classification experiments. Overall, our topic-signature semantic representation approaches significantly outperformed the traditional bag-of-words representation on most datasets.
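The idea of smoothing a document's word distribution through topic-signature mappings can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `smoothed_word_probs`, the interpolation weight `lam`, the linear-interpolation form, and the toy mapping table are all assumptions made for illustration. The sketch mixes a maximum-likelihood unigram estimate with a translation-style estimate that sums the mapping probabilities p(w | t) over the topic signatures t found in the document.

```python
from collections import Counter


def smoothed_word_probs(doc_words, doc_signatures, mapping, vocab, lam=0.5):
    """Interpolate a unigram estimate with a topic-signature mapping estimate.

    mapping[t][w] is an assumed, pre-estimated probability p(w | t) that
    topic signature t maps to single word w. lam is a hypothetical weight.
    """
    word_counts = Counter(doc_words)
    sig_counts = Counter(doc_signatures)
    n_words = sum(word_counts.values())
    n_sigs = sum(sig_counts.values())
    # With no signatures in the document, fall back to the unigram model alone.
    weight = lam if n_sigs else 0.0

    probs = {}
    for w in vocab:
        p_ml = word_counts[w] / n_words if n_words else 0.0
        # Mapping component: sum over signatures of p(w | t) * p(t | d).
        p_map = sum(
            mapping.get(t, {}).get(w, 0.0) * (c / n_sigs)
            for t, c in sig_counts.items()
        )
        probs[w] = (1 - weight) * p_ml + weight * p_map
    return probs


# Toy data, invented purely for illustration.
mapping = {"space shuttle": {"nasa": 0.5, "orbit": 0.3, "launch": 0.2}}
vocab = ["nasa", "orbit", "launch", "shuttle"]
probs = smoothed_word_probs(
    doc_words=["shuttle", "launch"],
    doc_signatures=["space shuttle"],
    mapping=mapping,
    vocab=vocab,
    lam=0.5,
)
for word in vocab:
    print(word, round(probs[word], 3))
```

In this toy case the word "nasa" never occurs in the document but still receives non-zero probability through the signature "space shuttle", which is the kind of effect a bag-of-words model with independent word features cannot capture. The resulting probabilities could then feed a classifier such as Naive Bayes.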
Pages: 1034-1040
Number of pages: 7