Using Low-Cost Annotation to Train a Reliable Czech Shallow Parser

Cited by: 0
Authors
Radziszewski, Adam [1]
Grac, Marek [2]
Affiliations
[1] Wroclaw Univ Technol, Inst Informat, PL-50370 Wroclaw, Poland
[2] Masaryk Univ, Fac Arts, Dept Czech Language, Computat Linguist Ctr, Brno, Czech Republic
Source
TEXT, SPEECH, AND DIALOGUE, TSD 2013 | 2013 / Vol. 8082
Keywords
corpus annotation; shallow parsing; Czech
DOI
Not available
Chinese Library Classification
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
A bushbank is a relatively new concept: a type of annotated corpus in which annotation is driven by automatic tools and the task of human annotators is limited to accepting or rejecting parts of their output. This makes it possible to obtain annotated corpora of considerable size at relatively low cost. In this paper we ask whether the Czech Bushbank is reliable enough to be used for an NLP task in place of a traditional corpus with high annotation rigour. We evaluate three different parsers using its shallow syntactic annotation, including a CRF chunker originally developed for Polish. The results are very promising and suggest that many practical applications could benefit from low-cost annotation.
Pages: 575-582
Page count: 8