Morphological Analyzer and Generator for Russian and Ukrainian Languages

Cited by: 95
Author
Korobov, Mikhail [1]
Affiliation
[1] ScrapingHub Inc, Ekaterinburg, Russia
Source
ANALYSIS OF IMAGES, SOCIAL NETWORKS AND TEXTS, AIST 2015 | 2015 / Vol. 542
Keywords
Morphological analyzer; Russian; Ukrainian; Morphological generator; Open source; OpenCorpora; LanguageTool; pymorphy2; pymorphy
DOI
10.1007/978-3-319-26123-2_31
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Subject classification codes
081104; 0812; 0835; 1405
Abstract
pymorphy2 is a morphological analyzer and generator for the Russian and Ukrainian languages. It uses large, efficiently encoded lexicons built from OpenCorpora and LanguageTool data. A set of linguistically motivated rules was developed to enable morphological analysis and generation of out-of-vocabulary words observed in real-world documents. For Russian, pymorphy2 provides state-of-the-art morphological analysis quality. The analyzer is implemented in the Python programming language, with optional C++ extensions. Emphasis is placed on ease of use, documentation and extensibility. The package is distributed under a permissive open-source license, encouraging its use in both academic and commercial settings.
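To make the abstract's three ideas concrete (dictionary-based analysis, generation of inflected forms, and a fallback for out-of-vocabulary words), here is a minimal sketch in Python. Everything in it is hypothetical: the two noun paradigms, the three-stem lexicon, and the names `PARADIGMS`, `LEXICON`, `Parse`, `parse` and `guess` are illustrative and are not pymorphy2's API. The OOV fallback is a simple longest-matching-ending heuristic, not the paper's rule set, and plain Python dicts stand in for the efficiently encoded lexicons the abstract describes.

```python
from dataclasses import dataclass


def _forms(gender, endings):
    # Hypothetical helper: pair each ending with OpenCorpora-style grammemes
    # (nomn = nominative, gent = genitive, datv = dative, accs = accusative,
    # ablt = instrumental, loct = prepositional). Entry 0 is the lemma form.
    cases = ["nomn", "gent", "datv", "accs", "ablt", "loct"]
    return [(e, frozenset({"NOUN", gender, "sing", c})) for e, c in zip(endings, cases)]


# Hypothetical toy paradigms (singular only) and a stem -> paradigm lexicon.
PARADIGMS = {
    "fem_a": _forms("femn", ["а", "ы", "е", "у", "ой", "е"]),    # лампа, школа
    "masc_zero": _forms("masc", ["", "а", "у", "", "ом", "е"]),  # стол
}
LEXICON = {"ламп": "fem_a", "школ": "fem_a", "стол": "masc_zero"}


@dataclass(frozen=True)
class Parse:
    word: str
    stem: str
    paradigm: str
    index: int   # position of this form within its paradigm
    known: bool  # False when the parse was guessed for an OOV word

    @property
    def tag(self):
        return PARADIGMS[self.paradigm][self.index][1]

    @property
    def normal_form(self):
        # The lemma is the stem plus the ending of paradigm entry 0.
        return self.stem + PARADIGMS[self.paradigm][0][0]

    def inflect(self, required):
        """Generate the first form whose grammemes include `required`, or None."""
        for ending, grammemes in PARADIGMS[self.paradigm]:
            if required <= grammemes:
                return self.stem + ending
        return None


def guess(word):
    """OOV fallback: keep only paradigm entries with the longest matching ending."""
    candidates = [
        Parse(word, word[: len(word) - len(ending)], name, i, False)
        for name, forms in PARADIGMS.items()
        for i, (ending, _) in enumerate(forms)
        if word.endswith(ending) and len(word) > len(ending)
    ]
    if not candidates:
        return []
    best = max(len(p.word) - len(p.stem) for p in candidates)
    return [p for p in candidates if len(p.word) - len(p.stem) == best]


def parse(word):
    """Dictionary lookup over every stem/ending split; fall back to guess()."""
    found = [
        Parse(word, word[:cut], LEXICON[word[:cut]], i, True)
        for cut in range(len(word) + 1)
        if word[:cut] in LEXICON
        for i, (ending, _) in enumerate(PARADIGMS[LEXICON[word[:cut]]])
        if word[cut:] == ending
    ]
    return found or guess(word)


if __name__ == "__main__":
    for w in ["лампой", "стол", "вазой"]:
        for p in parse(w):
            source = "dict" if p.known else "guess"
            print(w, p.normal_form, sorted(p.tag), source, p.inflect({"gent"}))
```

Keeping the lemma as entry 0 of every paradigm makes normalization and generation the same operation: keep the stem and swap the ending. "стол" returns two parses (nominative and accusative), which illustrates that analysis yields every reading and leaves disambiguation to later steps.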
Pages: 330 / U476
Page count: 13