Corpus-based learning of analogies and semantic relations

Cited by: 65
Authors
Turney, PD
Littman, ML
Affiliations
[1] Natl Res Council Canada, Inst Informat Technol, Ottawa, ON K1A 0R6, Canada
[2] Rutgers State Univ, Dept Comp Sci, Piscataway, NJ 08854 USA
Keywords
analogy; metaphor; semantic relations; vector space model; cosine similarity; noun-modifier pairs;
DOI
10.1007/s10994-005-0913-1
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We present an algorithm for learning from unlabeled text, based on the Vector Space Model (VSM) of information retrieval, that can solve verbal analogy questions of the kind found in the SAT college entrance exam. A verbal analogy has the form A:B::C:D, meaning "A is to B as C is to D"; for example, mason:stone::carpenter:wood. SAT analogy questions provide a word pair, A:B, and the problem is to select the most analogous word pair, C:D, from a set of five choices. The VSM algorithm correctly answers 47% of a collection of 374 college-level analogy questions (random guessing would yield 20% correct; the average college-bound senior high school student answers about 57% correctly). We motivate this research by applying it to a difficult problem in natural language processing, determining semantic relations in noun-modifier pairs. The problem is to classify a noun-modifier pair, such as "laser printer", according to the semantic relation between the noun (printer) and the modifier (laser). We use a supervised nearest-neighbour algorithm that assigns a class to a given noun-modifier pair by finding the most analogous noun-modifier pair in the training data. With 30 classes of semantic relations, on a collection of 600 labeled noun-modifier pairs, the learning algorithm attains an F value of 26.5% (random guessing: 3.3%). With 5 classes of semantic relations, the F value is 43.2% (random: 20%). The performance is state-of-the-art for both verbal analogies and noun-modifier relations.
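The selection step described in the abstract (pick the choice pair most similar to the stem pair, or the training pair most similar to a query pair) can be illustrated with a minimal sketch. It assumes each word pair has already been mapped to a numeric vector; the abstract does not say how those vectors are built, so the toy vectors, the class labels, and the function names `cosine`, `most_analogous`, and `classify` are all hypothetical and are not the authors' implementation.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_analogous(stem_vec, choices):
    """Return the choice label whose vector has the highest cosine with the stem vector."""
    return max(choices, key=lambda label: cosine(stem_vec, choices[label]))


def classify(query_vec, training):
    """Single nearest neighbour: return the class of the most similar training vector."""
    _, best_label = max(training, key=lambda item: cosine(query_vec, item[0]))
    return best_label


# Hypothetical toy vectors; real vectors would be derived from corpus statistics.
stem = [3.0, 1.0, 0.0]  # stands in for mason:stone
choices = {
    "carpenter:wood": [2.5, 1.2, 0.1],
    "teacher:chalk": [0.5, 2.0, 1.0],
    "bird:nest": [0.0, 0.3, 2.0],
}
best_choice = most_analogous(stem, choices)

# Hypothetical labeled noun-modifier vectors with made-up class names.
training = [
    ([2.0, 1.0, 0.0], "material"),
    ([0.0, 0.5, 2.0], "location"),
]
predicted_class = classify([3.0, 1.0, 0.0], training)
```

The random-guessing baselines quoted in the abstract follow from the number of options: 1/5 = 20% for five choices or five classes, and 1/30 ≈ 3.3% for thirty classes.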
Pages: 251-278
Page count: 28