Distributional Memory: A General Framework for Corpus-Based Semantics

Times cited: 235
Authors
Baroni, Marco [1]
Lenci, Alessandro [2]
Affiliations
[1] Univ Trento, Ctr Mind Brain Sci CIMeC, I-38068 Rovereto, TN, Italy
[2] Univ Pisa, Dept Linguist T Bolelli, I-56126 Pisa, PI, Italy
Keywords
FEATURE PRODUCTION NORMS; LARGE SET; DISCOVERY; INDUCTION
DOI
10.1162/coli_a_00016
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Research into corpus-based semantics has focused on the development of ad hoc models that treat single tasks, or sets of closely related tasks, as unrelated challenges to be tackled by extracting different kinds of distributional information from the corpus. As an alternative to this "one task, one model" approach, the Distributional Memory framework extracts distributional information once and for all from the corpus, in the form of a set of weighted word-link-word tuples arranged into a third-order tensor. Different matrices are then generated from the tensor, and their rows and columns constitute natural spaces to deal with different semantic problems. In this way, the same distributional information can be shared across tasks such as modeling word similarity judgments, discovering synonyms, concept categorization, predicting selectional preferences of verbs, solving analogy problems, classifying relations between word pairs, harvesting qualia structures with patterns or example pairs, predicting the typical properties of concepts, and classifying verbs into alternation classes. Extensive empirical testing in all these domains shows that a Distributional Memory implementation performs competitively against task-specific algorithms recently reported in the literature for the same tasks, and against our implementations of several state-of-the-art methods. The Distributional Memory approach is thus shown to be tenable despite the constraints imposed by its multi-purpose nature.
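The tuple-to-tensor-to-matrix pipeline described in the abstract can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not the authors' implementation: the tuples and weights are invented, and `matricize` and `cosine` are helper names introduced here. Each weighted (word, link, word) tuple is treated as one cell of a sparse third-order tensor, and choosing which modes become rows and which become columns yields different matrices, for example word by link-word or word-pair by link.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy tuples (word, link, word, weight); the weights are invented
# for illustration and do not come from any Distributional Memory release.
TUPLES = [
    ("soldier", "use", "gun", 5.0),
    ("soldier", "own", "gun", 2.0),
    ("policeman", "use", "gun", 4.0),
    ("policeman", "own", "gun", 1.0),
    ("teacher", "use", "book", 3.0),
    ("soldier", "in", "army", 6.0),
    ("teacher", "in", "school", 5.0),
]


def matricize(tuples, row_modes, col_modes):
    """Flatten the sparse (w1, link, w2) tensor into a sparse matrix.

    row_modes and col_modes are tuples of mode indices (0 = w1, 1 = link,
    2 = w2). Each row or column label is the tuple of the selected elements,
    e.g. row_modes=(0,), col_modes=(1, 2) gives a word by link-word matrix.
    """
    matrix = defaultdict(dict)
    for w1, link, w2, weight in tuples:
        modes = (w1, link, w2)
        row = tuple(modes[i] for i in row_modes)
        col = tuple(modes[i] for i in col_modes)
        # Sum weights if the same cell appears more than once in the input.
        matrix[row][col] = matrix[row].get(col, 0.0) + weight
    return matrix


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(val * v.get(key, 0.0) for key, val in u.items())
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


if __name__ == "__main__":
    # Word by link-word space: rows are single words, columns are (link, w2).
    word_by_linkword = matricize(TUPLES, (0,), (1, 2))
    print(cosine(word_by_linkword[("soldier",)], word_by_linkword[("policeman",)]))
    print(cosine(word_by_linkword[("soldier",)], word_by_linkword[("teacher",)]))

    # Word-pair by link space: rows are (w1, w2) pairs, columns are links.
    pair_by_link = matricize(TUPLES, (0, 2), (1,))
    print(pair_by_link[("soldier", "gun")])
```

In this toy word by link-word matrix, `soldier` and `policeman` share the `use gun` and `own gun` contexts and receive a nonzero cosine (about 0.66), while `soldier` and `teacher` share no context and score 0. Storing only the weighted tuples and deriving each matrix on demand is what allows one set of distributional data to serve several tasks.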
Pages: 673-721
Number of pages: 49