A comparative study on the use of labeled and unlabeled data for large margin classifiers

Times cited: 0
Authors
Takamura, H [1]
Okumura, M [1]
Affiliation
[1] Tokyo Institute of Technology, Precision and Intelligence Laboratory, Midori-ku, Yokohama, Kanagawa 226-8503, Japan
Source
NATURAL LANGUAGE PROCESSING - IJCNLP 2004 | 2005 / Vol. 3248
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We propose using both labeled and unlabeled data with the Expectation-Maximization (EM) algorithm to estimate a generative model, and using this model to construct a Fisher kernel. Documents are modeled with the Naive Bayes generative probability. Through text categorization experiments, we empirically show that (a) the Fisher kernel with labeled and unlabeled data outperforms Naive Bayes classifiers with EM and other methods when a sufficient amount of labeled data is available, (b) the value of additional unlabeled data diminishes once the labeled data is large enough to estimate a reliable model, (c) using categories as latent variables is effective, and (d) larger unlabeled training datasets yield better results.
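The abstract compresses two steps: fitting a Naive Bayes model by EM on labeled plus unlabeled documents, then deriving a Fisher kernel from that model with the categories as latent variables. Below is a minimal Python sketch of that pipeline under stated assumptions; it is not the authors' implementation. The assumptions are: documents are bags of words (`collections.Counter`), parameters use Laplace smoothing, the Fisher score ignores the sum-to-one constraints on the parameters, and the Fisher information matrix is approximated by the identity, so the kernel reduces to a dot product of Fisher scores. All function names (`train_nb_em`, `fisher_score`, `fisher_kernel`, ...) and the toy data are hypothetical.

```python
import math
from collections import Counter

# Hypothetical sketch: a document is a Counter mapping word -> count.


def log_joint(doc, prior, theta):
    """log P(c) + sum_w n_w log P(w|c) for every class c (multinomial coefficient dropped)."""
    return {
        c: math.log(prior[c])
        + sum(n * math.log(theta[c][w]) for w, n in doc.items() if w in theta[c])
        for c in prior
    }


def posterior(doc, prior, theta):
    """P(c|d), computed with the log-sum-exp trick for numerical stability."""
    lj = log_joint(doc, prior, theta)
    top = max(lj.values())
    z = sum(math.exp(v - top) for v in lj.values())
    return {c: math.exp(v - top) / z for c, v in lj.items()}


def m_step(weighted_docs, vocab, classes):
    """Re-estimate P(c) and P(w|c) from (responsibilities, doc) pairs with Laplace smoothing."""
    prior, theta = {}, {}
    for c in classes:
        mass = sum(resp[c] for resp, _ in weighted_docs)
        prior[c] = (1.0 + mass) / (len(classes) + len(weighted_docs))
        counts = Counter()
        for resp, doc in weighted_docs:
            for w, n in doc.items():
                counts[w] += resp[c] * n
        total = sum(counts[w] for w in vocab)
        theta[c] = {w: (1.0 + counts[w]) / (len(vocab) + total) for w in vocab}
    return prior, theta


def train_nb_em(labeled, unlabeled, vocab, classes, iters=10):
    """Naive Bayes fitted by EM on labeled (doc, class) pairs plus unlabeled docs."""
    # Labeled documents keep fixed one-hot responsibilities throughout.
    fixed = [({c: float(c == y) for c in classes}, doc) for doc, y in labeled]
    soft = []  # the first M-step sees labeled data only
    for _ in range(iters + 1):
        prior, theta = m_step(fixed + soft, vocab, classes)  # M-step
        soft = [(posterior(doc, prior, theta), doc) for doc in unlabeled]  # E-step
    return prior, theta


def fisher_score(doc, prior, theta, vocab, classes):
    """Gradient of log P(d) with categories as latent variables.

    d log P(d) / d P(c)   = P(c|d) / P(c)
    d log P(d) / d P(w|c) = P(c|d) * n_w / P(w|c)
    Sum-to-one constraints on the parameters are ignored in this sketch.
    """
    post = posterior(doc, prior, theta)
    score = []
    for c in classes:
        score.append(post[c] / prior[c])
        score.extend(post[c] * doc[w] / theta[c][w] for w in vocab)
    return score


def fisher_kernel(d1, d2, prior, theta, vocab, classes):
    """K(d1, d2) = U_d1 . U_d2, i.e. the Fisher information is approximated by the identity."""
    u1 = fisher_score(d1, prior, theta, vocab, classes)
    u2 = fisher_score(d2, prior, theta, vocab, classes)
    return sum(a * b for a, b in zip(u1, u2))


# Toy usage with hypothetical data: two labeled documents, three unlabeled ones.
vocab = ["goal", "match", "vote", "election"]
classes = ["sports", "politics"]
labeled = [(Counter(goal=2, match=1), "sports"), (Counter(vote=2, election=1), "politics")]
unlabeled = [Counter(goal=1, match=2), Counter(election=2, vote=1), Counter(match=1, goal=1)]

prior, theta = train_nb_em(labeled, unlabeled, vocab, classes)
k_same = fisher_kernel(Counter(goal=1, match=1), Counter(goal=2), prior, theta, vocab, classes)
k_diff = fisher_kernel(Counter(goal=1, match=1), Counter(vote=2), prior, theta, vocab, classes)
print(k_same, k_diff)
```

In this toy run, two sports-like documents get a larger kernel value than a sports/politics pair, because their Fisher scores overlap on the same class and word coordinates. The Gram matrix built from `fisher_kernel` could then be handed to any SVM implementation that accepts a precomputed kernel; that large-margin training step is omitted here.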
Pages: 456-465
Number of pages: 10