Boosting-Based Ensemble Learning with Penalty Setting Profiles for Automatic Thai Unknown Word Recognition

Times cited: 0
Authors
Techo, Jakkrit [1]
Nattee, Cholwich [1]
Theeramunkong, Thanaruk [1]
Affiliation
[1] Thammasat Univ, Sirindhorn Int Inst Technol, Sch Informat Comp & Commun Technol, Bangkok, Thailand
Source
COMPUTATIONAL COLLECTIVE INTELLIGENCE: TECHNOLOGIES AND APPLICATIONS, PT II | 2010 / Vol. 6422
Keywords
Ensemble learning; Boosting Technique; Data mining; Unknown word recognition; Word boundary detection;
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Boosting-based ensemble learning can improve classification accuracy by constructing multiple classification models, each built to cope with the errors made in the preceding steps. This paper presents an application of boosting-based ensemble learning with penalty setting profiles to automatic unknown word recognition in Thai. Treating this sequential task as a non-sequential problem requires ranking a set of generated candidates for each potential unknown word position. Since the correct candidate may not be ranked highest among the candidates in the set, the proposed method assigns penalties to improper rankings, in the form of a penalty setting profile, in order to reconstruct the succeeding classification model. In addition, a number of alternative penalty setting profiles are introduced, and their performance is compared on the task of extracting unknown words from a large Thai medical text. Using naive Bayes as the base classifier for ensemble learning, the proposed method achieves an accuracy of 89.24%, an improvement of 9.91%, 7.54%, and 5.25% over conventional naive Bayes, the non-ensemble version, and the flat penalty setting profile, respectively.
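The abstract does not define the shape of a penalty setting profile or the exact weight update, so the following is only a hypothetical Python sketch of the general idea: after a boosting round, each training instance (a set of ranked candidates) whose correct candidate was not ranked first has its weight scaled by a penalty that depends on that rank, so the next naive Bayes model focuses on it. The function names `flat_profile`, `linear_profile`, `reweight`, and `rank_of_correct`, and all penalty values, are illustrative assumptions rather than the authors' definitions.

```python
from typing import Callable, Dict, List

# Hypothetical: a penalty setting profile maps the 1-based rank of the
# correct candidate (1 = ranked highest) to a multiplicative penalty.
PenaltyProfile = Callable[[int], float]


def flat_profile(rank: int, penalty: float = 2.0) -> float:
    """Assumed flat profile: one fixed penalty for any improper ranking."""
    return 1.0 if rank == 1 else penalty


def linear_profile(rank: int, step: float = 0.5) -> float:
    """Assumed alternative: penalty grows the further the correct candidate falls."""
    return 1.0 + step * (rank - 1)


def rank_of_correct(scores: Dict[str, float], correct: str) -> int:
    """1-based rank of the correct candidate when sorted by descending score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return ordered.index(correct) + 1


def reweight(weights: List[float], ranks: List[int],
             profile: PenaltyProfile) -> List[float]:
    """Scale each instance weight by its penalty, then renormalize to sum to 1."""
    scaled = [w * profile(r) for w, r in zip(weights, ranks)]
    total = sum(scaled)
    return [s / total for s in scaled]


if __name__ == "__main__":
    # Toy candidate sets scored by the current base classifier (made-up data).
    candidate_sets = [
        ({"ab": 0.9, "abc": 0.1}, "ab"),             # correct ranked 1st
        ({"x": 0.6, "xy": 0.3, "xyz": 0.1}, "xyz"),  # correct ranked 3rd
        ({"p": 0.4, "pq": 0.6}, "p"),                # correct ranked 2nd
    ]
    ranks = [rank_of_correct(s, c) for s, c in candidate_sets]
    uniform = [1 / len(ranks)] * len(ranks)
    print("ranks:", ranks)
    print("flat:  ", reweight(uniform, ranks, flat_profile))
    print("linear:", reweight(uniform, ranks, linear_profile))
```

On this toy data the ranks are `[1, 3, 2]`. Starting from uniform weights, the flat profile yields weights of 0.2, 0.4, and 0.4, treating the rank-2 and rank-3 errors identically, while the linear profile yields about 0.222, 0.444, and 0.333, pushing more weight onto the worse-ranked instance. This contrast is one plausible reading of why non-flat profiles could outperform a flat one.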
Pages: 132-141
Number of pages: 10