Investigating Word-Class Distributions in Word Vector Spaces

Cited by: 0
Authors
Sasano, Ryohei [1]
Korhonen, Anna [2]
Affiliations
[1] Nagoya Univ, Grad Sch Informat, Nagoya, Aichi, Japan
[2] Univ Cambridge, Language Technol Lab, Cambridge, England
Keywords
MODEL;
DOI
Not available
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
This paper investigates how the word vectors belonging to a given word class are distributed in a pre-trained word vector space. To this end, we made several assumptions about the distribution, modeled the distribution accordingly, and validated each assumption by comparing the goodness of the resulting models. Specifically, we considered two types of word classes (the semantic class of direct objects of a verb and the semantic class in a thesaurus) and tried to build models that properly estimate how likely it is that a word in the vector space is a member of a given word class. Our results on selectional preference and WordNet datasets show that the centroid-based model fails to achieve good enough performance, that the geometry of the distribution and the existence of subgroups have limited impact, and that negative instances need to be considered for adequate modeling of the distribution. We further investigated the relationship between the scores calculated by each model and the degree of membership, and found that discriminative learning-based models are best at finding the boundaries of a class, while models based on the offset between positive and negative instances perform best at determining the degree of membership.
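The abstract names a centroid-based model and a model based on the offset between positive and negative instances without defining them. The following is a minimal sketch of one plausible reading, not the authors' implementation: it assumes the centroid score is the cosine similarity to the mean of the positive vectors, and the offset score is the dot product with the difference between the positive and negative centroids. All function names and the toy vectors are hypothetical.

```python
import math

def mean_vector(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0

def centroid_score(word, positives):
    """Assumed definition: cosine similarity to the positive centroid."""
    return cosine(word, mean_vector(positives))

def offset_score(word, positives, negatives):
    """Assumed definition: projection onto (positive centroid - negative centroid)."""
    pos_c = mean_vector(positives)
    neg_c = mean_vector(negatives)
    offset = [p - n for p, n in zip(pos_c, neg_c)]
    return dot(word, offset)

# Toy 2-d vectors invented for illustration; not data from the paper.
positives = [[1.0, 0.0], [0.8, 0.2]]
negatives = [[0.0, 1.0], [0.2, 0.8]]
member = [0.9, 0.1]
non_member = [0.1, 0.9]

print(centroid_score(member, positives), centroid_score(non_member, positives))
print(offset_score(member, positives, negatives),
      offset_score(non_member, positives, negatives))
```

Under these assumptions the centroid score uses only positive instances, whereas the offset score also depends on where the negative instances lie, which is the distinction the abstract draws.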
Pages: 3657-3666
Number of pages: 10