Estimating a one-class naive Bayes text classifier

Times cited: 6
Authors
Zhang, Yihong [1]
Jatowt, Adam [2]
Affiliations
[1] Osaka Univ, Grad Sch Informat Sci & Technol, Dept Multimedia Engn, Osaka 5650871, Japan
[2] Kyoto Univ, Grad Sch Informat, Dept Social Informat, Kyoto 6068501, Japan
Keywords
Machine learning; naive Bayes; one-class classifier
DOI
10.3233/IDA-194669
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Nowadays, more and more information extraction projects need to classify large amounts of text data. The common way to classify text is to build a supervised classifier trained on human-labeled positive and negative examples. In many cases, however, positive examples are easy to label while negative examples are hard to label. In this paper, we address the problem of building a one-class classifier when only positive examples are labeled. Previous work on building one-class classifiers has mostly used positive examples together with unlabeled data. We show that a configurable one-class classifier, such as one-class naive Bayes, can be optimized by examining the clustering quality of its classification on the target data. We propose using both existing and new quality scores to measure the clustering quality of the classification. Experimental analysis on real-world data shows that our approach generally achieves high classification accuracy, and in some cases improves accuracy by more than 10% compared with state-of-the-art baselines. © 2020 - IOS Press and the authors. All rights reserved.
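The abstract does not specify the exact model or the quality scores, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a Laplace-smoothed multinomial word model of the positive class, a decision threshold `theta` as the single configurable parameter, and the silhouette coefficient (cosine distance on term-frequency vectors) as the clustering-quality score. The paper proposes its own existing and new scores, which may differ. All function names here (`train_positive_model`, `score`, `silhouette`, `fit_threshold`) are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    # Hypothetical whitespace tokenizer; a real pipeline would do more.
    return text.lower().split()

def train_positive_model(pos_docs, vocab, alpha=1.0):
    """Laplace-smoothed multinomial word distribution P(w | positive)."""
    counts = Counter(w for d in pos_docs for w in tokenize(d))
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}

def score(doc, model, vocab_size):
    """Mean per-token log-ratio of P(w | positive) against a uniform background."""
    toks = [w for w in tokenize(doc) if w in model]
    if not toks:
        return float("-inf")
    return sum(math.log(model[w] * vocab_size) for w in toks) / len(toks)

def cosine_distance(a, b):
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)

def silhouette(vectors, labels):
    """Mean silhouette of a two-way partition; -1.0 if only one cluster is non-empty."""
    if len(set(labels)) < 2:
        return -1.0
    n, total = len(vectors), 0.0
    for i in range(n):
        same = [cosine_distance(vectors[i], vectors[j])
                for j in range(n) if j != i and labels[j] == labels[i]]
        other = [cosine_distance(vectors[i], vectors[j])
                 for j in range(n) if labels[j] != labels[i]]
        if not same:
            continue  # singleton cluster: s(i) = 0 by convention
        a, b = sum(same) / len(same), sum(other) / len(other)
        total += (b - a) / max(a, b) if max(a, b) > 0 else 0.0
    return total / n

def fit_threshold(pos_docs, target_docs):
    """Pick the threshold whose positive/negative split of the target data clusters best."""
    vocab = {w for d in pos_docs + target_docs for w in tokenize(d)}
    model = train_positive_model(pos_docs, vocab)
    scores = [score(d, model, len(vocab)) for d in target_docs]
    vectors = [Counter(tokenize(d)) for d in target_docs]
    best = None
    for theta in sorted(set(scores)):
        labels = [s >= theta for s in scores]
        q = silhouette(vectors, labels)
        if best is None or q > best[0]:
            best = (q, theta, labels)
    return best  # (quality, theta, labels)

# Toy usage: labeled positives only, plus unlabeled target documents.
pos_docs = ["goal match team score", "team wins match", "striker scores goal"]
target_docs = ["match goal team", "team score goal win",
               "stock market price falls", "market shares price rise"]
q, theta, labels = fit_threshold(pos_docs, target_docs)
print(labels)  # [True, True, False, False]
```

The design choice being illustrated is that no negative labels are needed: each candidate threshold induces a two-way split of the target data, and the split with the best clustering quality is kept.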
Pages: 567-579
Number of pages: 13