Automatic Identification of Lexical Units

Cited by: 0

Authors:

Daudaravicius, Vidas [1]

Affiliations:

[1] Vytautas Magnus Univ, Fac Informat, Vileikos 8, Kaunas, Lithuania

Source:

INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS | 2010 / Vol. 34 / No. 01

Keywords:

lexical unit; lexical unit identification; token/type ratio; dice score; corpus size; average minimum law;

DOI:

Not available

Chinese Library Classification (CLC) number:

TP31 [Computer software];

Discipline classification code:

081202 ; 0835 ;

Abstract:

A lexical unit is a word or a collocation. Extracting lexical knowledge is an essential and difficult task in NLP. Methods for extracting lexical units are discussed. We present a method for identifying lexical boundaries. The need for large corpora for training is also discussed. Compared with the traditional window method or a full-parsing approach, identifying lexical boundaries within a text significantly reduces the need for human judgment.

Pages: 85-91

Page count: 7

