Is it possible to create a very large wordnet in 100 days? An evaluation

被引:0
作者
Krister Lindén
Jyrki Niemi
机构
[1] University of Helsinki,Department of Modern Languages
来源
Language Resources and Evaluation | 2014年 / 48卷
关键词
Wordnet; Bilingual lexicon; Quality assessment; Knowledge representation; Word-sense disambiguation;
D O I
暂无
中图分类号
学科分类号
摘要
Wordnets are large-scale lexical databases of related words and concepts, useful for language-aware software applications. They have recently been built for many languages by using various approaches. The Finnish wordnet, FinnWordNet (FiWN), was created by translating the more than 200,000 word senses in the English Princeton WordNet (PWN) 3.0 in 100 days. To ensure quality, they were translated by professional translators. The direct translation approach was based on the assumption that most synsets in PWN represent language-independent real-world concepts. Thus also the semantic relations between synsets were assumed mostly language-independent, so the structure of PWN could be reused as well. This approach allowed the creation of an extensive Finnish wordnet directly aligned with PWN and also provided us with a translation relation and thus a bilingual wordnet usable as a dictionary. In this paper, we address several concerns raised with regard to our approach, many of them for the first time. We evaluate the craftsmanship of the translators by checking the spelling and translation quality, the viability of the approach by assessing the synonym quality both on the lexeme and concept level, as well as the usefulness of the resulting lexical resource both for humans and in a language-technological task. We discovered no new problems compared with those already known in PWN. As a whole, the paper contributes to the scientific discourse on what it takes to create a very large wordnet. As a side-effect of the evaluation, we extended FiWN to contain 208,645 word senses in 120,449 synsets, effectively making version 2.0 of FiWN currently the largest wordnet in the world by these statistics.
引用
收藏
页码:191 / 201
页数:10
相关论文
共 1 条