Automatic Identification of Lexical Units

Cited by: 0
Authors
Daudaravicius, Vidas [1]
Affiliation
[1] Vytautas Magnus Univ, Fac Informat, Vileikos 8, Kaunas, Lithuania
Source
INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS | 2010 / Vol. 34 / Issue 01
Keywords
lexical unit; lexical unit identification; token/type ratio; Dice score; corpus size; average minimum law
DOI
Not available
Chinese Library Classification number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
A lexical unit is a word or a collocation. Extracting lexical knowledge is an essential and difficult task in NLP. We discuss methods for extracting lexical units and present a method for identifying lexical boundaries. We also discuss the need for large corpora for training. Compared with the traditional window method or a full parsing approach, identifying lexical boundaries within a text significantly reduces the need for human judgment.
Pages: 85-91
Page count: 7