Selecting Training Data for Unsupervised Domain Adaptation in Word Sense Disambiguation

Cited by: 0
Authors
Komiya, Kanako [1]
Sasaki, Minoru [1]
Shinnou, Hiroyuki [1]
Kotani, Yoshiyuki [2]
Okumura, Manabu [3]
Affiliations
[1] Ibaraki Univ, 4-12-1 Nakanarusawa, Hitachi, Ibaraki 3168511, Japan
[2] Tokyo Univ Agr & Technol, 2-24-16 Naka Cho, Koganei, Tokyo 1848588, Japan
[3] Tokyo Inst Technol, Midori Ku, 4259 Nagatuta, Yokohama, Kanagawa 2268503, Japan
Source
PRICAI 2016: TRENDS IN ARTIFICIAL INTELLIGENCE | 2016 / Vol. 9810
Keywords
Domain adaptation; Word sense disambiguation; Data selection;
DOI
10.1007/978-3-319-42911-3_18
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper describes a method of domain adaptation, which involves adapting a classifier developed from source to target data. We automatically select the training data set that is suitable for the target data from the whole source data of multiple domains. This is unsupervised domain adaptation for Japanese word sense disambiguation (WSD). Experiments revealed that the accuracies of WSD improved when we automatically selected the training data set using two criteria, the degree of confidence and the leave-one-out (LOO)-bound score, compared with when the classifier was trained with all the data.
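The confidence-based selection mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the classifier, the exact definition of the degree of confidence, or how candidate training sets are formed, so the nearest-centroid classifier, the softmax-style confidence, the per-domain candidate sets, and the domain names and feature vectors below are all assumptions made for illustration. The LOO-bound criterion is not shown.

```python
import math
from collections import defaultdict


def train_centroids(examples):
    """Return one mean feature vector (centroid) per sense label."""
    sums = {}
    counts = defaultdict(int)
    for features, sense in examples:
        if sense not in sums:
            sums[sense] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[sense][i] += value
        counts[sense] += 1
    return {s: [v / counts[s] for v in total] for s, total in sums.items()}


def confidence(centroids, features):
    """Assumed confidence: top softmax probability over negative squared distances."""
    scores = [
        -sum((a - b) ** 2 for a, b in zip(features, centroid))
        for centroid in centroids.values()
    ]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    return max(exps) / sum(exps)


def select_training_set(candidates, target_instances):
    """Pick the candidate set whose classifier is most confident on unlabeled target data."""
    def mean_confidence(name):
        centroids = train_centroids(candidates[name])
        total = sum(confidence(centroids, x) for x in target_instances)
        return total / len(target_instances)

    return max(candidates, key=mean_confidence)


# Hypothetical source domains with toy 2-D features and sense labels.
candidates = {
    "newspaper": [([0.0, 0.0], "s1"), ([0.2, 0.0], "s1"),
                  ([4.0, 4.0], "s2"), ([4.2, 4.0], "s2")],
    "blog": [([1.8, 2.0], "s1"), ([2.0, 2.0], "s1"),
             ([2.2, 2.0], "s2"), ([2.4, 2.0], "s2")],
}
# Unlabeled target-domain instances (no sense labels are used for selection).
target_instances = [[0.1, 0.1], [4.0, 4.1], [0.0, 0.2]]

selected = select_training_set(candidates, target_instances)
print(selected)
```

Because the selection uses only the classifier's own confidence on unlabeled target instances, no target-domain sense labels are required, which matches the unsupervised setting described in the abstract.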
Pages: 220-232
Number of pages: 13