Using electronic texts for an annotated corpus building

Times cited: 9
Author
Galicia-Haro, SN [1]
Affiliation
[1] Inst Politecn Nacl, Computat Res Ctr, Nat Language & Text Proc Lab, Mexico City 07738, DF, Mexico
Source
PROCEEDINGS OF THE FOURTH MEXICAN INTERNATIONAL CONFERENCE ON COMPUTER SCIENCE (ENC 2003) | 2003
DOI
10.1109/ENC.2003.1232870
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline classification code
0812
Abstract
Collections of texts annotated on several levels are now valuable resources. They are employed for diverse tasks in both theoretical research and natural language applications. The most important collections are dedicated to English, and developing the corresponding resources for other languages requires a huge effort. In this work, we present the initial steps toward compiling an annotated Mexican corpus from electronic texts obtained from the Web.
Pages: 26-32
Number of pages: 7