Valence extraction using EM selection and co-occurrence matrices

Cited by: 0
Authors
Łukasz Dębowski
Affiliations
[1] Instytut Podstaw Informatyki PAN
[2] Centrum Wiskunde and Informatica
Source
Language Resources and Evaluation | 2009 / Volume 43
Keywords
Verb valence extraction; EM algorithm; Co-occurrence matrices; Polish language
DOI
Not available
Abstract
This paper discusses two new procedures for extracting verb valences from raw texts, with an application to the Polish language. The first novel technique, the EM selection algorithm, performs unsupervised disambiguation of valence frame forests, obtained by applying a non-probabilistic deep grammar parser and some post-processing to the text. The second new idea concerns filtering of incorrect frames detected in the parsed text and is motivated by an observation that verbs which take similar arguments tend to have similar frames. This phenomenon is described in terms of newly introduced co-occurrence matrices. Using co-occurrence matrices, we split filtering into two steps. The list of valid arguments is first determined for each verb, whereas the pattern according to which the arguments are combined into frames is computed in the following stage. Our best extracted dictionary reaches an F-score of 45%, compared to an F-score of 39% for the standard frame-based BHT filtering.
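The abstract does not spell out the EM selection algorithm, so the following is only a minimal sketch of the general idea behind EM-based disambiguation, not the paper's procedure. It assumes each parsed clause yields a non-empty set of alternative frames, that exactly one of them is correct, and that frames are drawn from a single global distribution. The function name `em_select` and the frame labels are hypothetical.

```python
from collections import defaultdict


def em_select(candidate_sets, iterations=50):
    """Toy EM over sets of alternative frames (illustrative assumption,
    not the algorithm from the paper).

    candidate_sets: list of non-empty sets of frame labels, one per clause.
    Returns (selected frame per clause, estimated frame probabilities).
    """
    if any(len(cands) == 0 for cands in candidate_sets):
        raise ValueError("every candidate set must be non-empty")

    frames = {f for cands in candidate_sets for f in cands}
    # Start from a uniform distribution over all observed frames.
    prob = {f: 1.0 / len(frames) for f in frames}
    n = len(candidate_sets)

    for _ in range(iterations):
        # E-step: share one unit of mass among the candidates of each clause,
        # in proportion to the current frame probabilities.
        expected = defaultdict(float)
        for cands in candidate_sets:
            total = sum(prob[f] for f in cands)
            for f in cands:
                expected[f] += prob[f] / total
        # M-step: re-estimate frame probabilities from the expected counts.
        prob = {f: expected[f] / n for f in frames}

    # Pick the most probable candidate per clause (sorted for stable ties).
    selected = [max(sorted(cands), key=lambda f: prob[f])
                for cands in candidate_sets]
    return selected, prob


if __name__ == "__main__":
    # Hypothetical frame labels for three clauses of one verb.
    clauses = [
        {"np_nom", "np_nom+np_acc"},
        {"np_nom+np_acc"},
        {"np_nom+np_acc", "np_dat"},
    ]
    chosen, probabilities = em_select(clauses)
    print(chosen)
```

In this toy setting the frame that appears unambiguously in one clause gains probability mass at each iteration, which in turn resolves the ambiguous clauses in its favour.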
Pages: 301-327
Page count: 26