Definition extraction with balanced Random Forests

Cited by: 0
Authors
Kobylinski, Lukasz [1]
Przepiorkowski, Adam [2, 3]
Affiliations
[1] Warsaw Univ Technol, Inst Comp Sci, Ul Nowowiejska 15-19, PL-00665 Warsaw, Poland
[2] Polish Acad Sci, Inst Comp Sci, PL-01237 Warsaw, Poland
[3] Warsaw Univ, Inst Informat, PL-02097 Warsaw, Poland
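Source
ADVANCES IN NATURAL LANGUAGE PROCESSING, PROCEEDINGS | 2008 / Vol. 5221
Keywords
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104; 0812; 0835; 1405;
Abstract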
We propose a novel machine learning approach to the task of identifying definitions in Polish documents. Specifics of the problem domain and characteristics of the available dataset have been taken into consideration by carefully choosing a classification method and adapting it to highly imbalanced and noisy data. We evaluate the performance of a Random Forest-based classifier in extracting definitional sentences from natural language text and compare it with previous work.
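The abstract does not say how the forest is balanced. One common scheme, described in the cited Chen (2004) report on learning imbalanced data with random forests, trains each tree on a bootstrap sample with equal numbers of examples from every class. The sketch below illustrates only that sampling step. It is an assumption that the paper uses this particular scheme, and the function name `balanced_bootstrap` and the toy labels are hypothetical.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return indices of a class-balanced bootstrap sample.

    Each class contributes as many draws (with replacement) as the
    smallest class has members, so a tree trained on the sample sees
    the classes in equal proportion.
    """
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    n_minority = min(len(indices) for indices in by_class.values())
    sample = []
    for indices in by_class.values():
        # Sampling with replacement, as in an ordinary bootstrap.
        sample.extend(rng.choices(indices, k=n_minority))
    return sample


# Hypothetical toy data: 1 = definitional sentence, 0 = any other sentence.
labels = [1] * 5 + [0] * 95
rng = random.Random(0)
sample = balanced_bootstrap(labels, rng)
class_counts = Counter(labels[i] for i in sample)
```

In a full forest, this sampler would be called once per tree, and the trees' votes would be aggregated as usual. The majority class is then heavily undersampled in each tree but still well covered across the ensemble.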
Pages: 237 / +
Page count: 3
Related papers
19 in total
  • [1] Androutsopoulos I., 2004, P 20 INT C COMP LING, P1360
  • [2] [Anonymous], 2007, P 6 WORKSH BALT SLAV
  • [3] [Anonymous], P EACL 2006 WORKSH L
  • [4] Bachimont B., 2004, 3 ED COMPUTERM WORKS, P55
  • [5] Breiman L., 2001, Random forests, MACHINE LEARNING, 45 (01): 5-32
  • [6] Chen C, 2004, Using random forest to learn imbalanced data
  • [7] Degorski Lukasz, 2008, P 6 INT C LANG RES E
  • [8] Kingsbury P., 2002, P 3 INT C LANG RES E, P1989
  • [9] KLAVANS JL, 2000, P ANN FALL S AM MED
  • [10] KLAVANS JL, 2001, P AMIA S