Word sense disambiguation by learning from unlabeled data

Cited by: 0
Authors
Park, SB [1]
Zhang, BT [1]
Kim, YT [1]
Affiliations
[1] Seoul Natl Univ, SCAI, Artificial Intelligence Lab, Sch Comp Sci & Engn, Seoul 151742, South Korea
Source
38TH ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS, PROCEEDINGS OF THE CONFERENCE | 2000
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Most corpus-based approaches to natural language processing suffer from a lack of training data, because acquiring a large amount of labeled data is expensive. This paper describes a learning method that exploits unlabeled data to tackle the data sparseness problem. The method uses committee learning to predict the labels of unlabeled data, which then augment the existing training data. Our experiments on word sense disambiguation show that predictive accuracy is significantly improved by using additional unlabeled data.
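This record does not give the paper's committee construction or selection rule, so the following is only a minimal sketch of the general idea in the abstract: a committee votes on unlabeled contexts, and the examples it agrees on are added to the training data. Everything in it is an assumption made for illustration, including the toy word-overlap classifier, the bootstrap-sampled committee, the agreement threshold, and the names `train_profiles`, `predict`, and `committee_label`.

```python
import random
from collections import Counter


def train_profiles(examples):
    """Toy classifier (an assumption): one word-count profile per sense."""
    profiles = {}
    for words, sense in examples:
        profiles.setdefault(sense, Counter()).update(words)
    return profiles


def predict(profiles, words):
    """Return the sense whose profile overlaps most with the context words."""
    # sorted() makes tie-breaking deterministic (alphabetically first sense wins).
    return max(sorted(profiles), key=lambda s: sum(profiles[s][w] for w in words))


def committee_label(labeled, unlabeled, n_members=5, min_agreement=1.0, seed=0):
    """Label unlabeled contexts by committee vote and return the augmented set.

    labeled:   list of (context_words, sense) pairs
    unlabeled: list of context_words lists
    An unlabeled context is added only if the share of committee members
    voting for the winning sense is at least min_agreement.
    """
    rng = random.Random(seed)
    committee = []
    for _ in range(n_members):
        # Each member is trained on a bootstrap resample of the labeled data.
        sample = [rng.choice(labeled) for _ in labeled]
        committee.append(train_profiles(sample))

    augmented = list(labeled)
    for words in unlabeled:
        votes = Counter(predict(member, words) for member in committee)
        sense, count = votes.most_common(1)[0]
        if count / n_members >= min_agreement:
            augmented.append((words, sense))
    return augmented


if __name__ == "__main__":
    labeled = [
        (["bank", "money", "loan"], "finance"),
        (["bank", "river", "water"], "shore"),
    ]
    unlabeled = [["money", "deposit"], ["river", "fishing"]]
    for words, sense in committee_label(labeled, unlabeled, min_agreement=0.6):
        print(sense, words)
```

Raising `min_agreement` admits fewer but more reliable machine-labeled examples; lowering it adds more data at the cost of more label noise.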
Pages: 547-554
Number of pages: 8
Related papers
17 in total
  • [1] ATSUSHI F, 1998, COMPUT LINGUIST, V24, P573
  • [2] Blum A., 1998, Proceedings of the Eleventh Annual Conference on Computational Learning Theory, P92, DOI 10.1145/279943.279962
  • [3] Bagging predictors
    Breiman, L
    [J]. MACHINE LEARNING, 1996, 24 (02) : 123 - 140
  • [4] CHO JM, 1995, P NAT LANG PROC PAC, P691
  • [5] DAGAN I, 1997, P 14 INT C MACH LEAR, P150
  • [6] FREUND Y, 1992, P NIPS 92, P483
  • [7] KIM N, 1996, J KISS, V23, P766
  • [8] Liere R., 1997, AAAI 97, P591
  • [9] THE WEIGHTED MAJORITY ALGORITHM
    LITTLESTONE, N
    WARMUTH, MK
    [J]. INFORMATION AND COMPUTATION, 1994, 108 (02) : 212 - 261
  • [10] Ng H.T., 1996, P 34 ANN M ASS COMPU, P40, DOI 10.3115/981863.981869