Reflective Random Indexing and indirect inference: A scalable method for discovery of implicit connections

Cited by: 57
Authors
Cohen, Trevor [1]
Schvaneveldt, Roger [2]
Widdows, Dominic [3]
Affiliations
[1] Univ Texas Houston, Ctr Cognit Informat & Decis Making, Sch Hlth Informat Sci, Houston, TX 77030 USA
[2] Arizona State Univ, Appl Psychol Unit, Tempe, AZ 85287 USA
[3] Google Inc, Mountain View, CA USA
Keywords
Distributional semantics; Literature-based discovery; Implicit associations; Indirect inference; GENERATING HYPOTHESES; RAYNAUDS; OIL;
DOI
10.1016/j.jbi.2009.09.003
Chinese Library Classification (CLC) number
TP39 [Computer applications];
Subject classification code
081203 ; 0835 ;
Abstract
The discovery of implicit connections between terms that do not occur together in any scientific document underlies the model of literature-based knowledge discovery first proposed by Swanson. Corpus-derived statistical models of semantic distance such as Latent Semantic Analysis (LSA) have been evaluated previously as methods for the discovery of such implicit connections. However, LSA in particular is dependent on a computationally demanding method of dimension reduction as a means to obtain meaningful indirect inference, limiting its ability to scale to large text corpora. In this paper, we evaluate the ability of Random Indexing (RI), a scalable distributional model of word associations, to draw meaningful implicit relationships between terms in general and biomedical language. Proponents of this method have achieved comparable performance to LSA on several cognitive tasks while using a simpler and less computationally demanding method of dimension reduction than LSA employs. In this paper, we demonstrate that the original implementation of RI is ineffective at inferring meaningful indirect connections, and evaluate Reflective Random Indexing (RRI), an iterative variant of the method that is better able to perform indirect inference. RRI is shown to lead to more clearly related indirect connections and to outperform existing RI implementations in the prediction of future direct co-occurrence in the MEDLINE corpus. (C) 2009 Elsevier Inc. All rights reserved.
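The contrast the abstract draws between basic RI and its reflective variant can be illustrated with a minimal sketch. The abstract does not give the procedure, so everything below is an assumption made for illustration: documents receive sparse random ternary index vectors, a term vector is the sum of the vectors of the documents containing the term, and one "reflective" cycle rebuilds document vectors from term vectors and then term vectors from those document vectors. The function names, the parameters `dim`, `nonzero`, and `cycles`, and the two toy documents are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def random_index_vector(dim, nonzero, rng):
    """Sparse ternary vector: a few +1/-1 entries, zeros elsewhere."""
    vec = [0.0] * dim
    for i, pos in enumerate(rng.sample(range(dim), nonzero)):
        vec[pos] = 1.0 if i % 2 == 0 else -1.0
    return vec


def add_into(target, source):
    """Add source to target element-wise, in place."""
    for i, value in enumerate(source):
        target[i] += value


def normalize(vec):
    """Scale to unit length; a zero vector is returned unchanged."""
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else list(vec)


def cosine(a, b):
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def term_vectors_from_docs(docs, doc_vectors, dim):
    """Each term vector is the sum of the vectors of documents containing it."""
    terms = defaultdict(lambda: [0.0] * dim)
    for doc, dvec in zip(docs, doc_vectors):
        for term in set(doc):
            add_into(terms[term], dvec)
    return dict(terms)


def doc_vectors_from_terms(docs, term_vectors, dim):
    """Each document vector is the normalized sum of its term vectors."""
    result = []
    for doc in docs:
        dvec = [0.0] * dim
        for term in set(doc):
            add_into(dvec, term_vectors[term])
        result.append(normalize(dvec))
    return result


def train(docs, dim=1000, nonzero=10, cycles=1, seed=0):
    """cycles=1 is the basic (assumed) RI pass; each extra cycle is one reflection."""
    rng = random.Random(seed)
    doc_vectors = [random_index_vector(dim, nonzero, rng) for _ in docs]
    term_vectors = term_vectors_from_docs(docs, doc_vectors, dim)
    for _ in range(cycles - 1):
        doc_vectors = doc_vectors_from_terms(docs, term_vectors, dim)
        term_vectors = term_vectors_from_docs(docs, doc_vectors, dim)
    return term_vectors


# Toy corpus (hypothetical): "raynaud" and "fish" never share a document,
# but both co-occur with "blood" and "viscosity".
docs = [
    ["raynaud", "blood", "viscosity"],
    ["fish", "oil", "blood", "viscosity"],
]

one_pass = train(docs, cycles=1)
two_pass = train(docs, cycles=2)
basic = cosine(one_pass["raynaud"], one_pass["fish"])
reflective = cosine(two_pass["raynaud"], two_pass["fish"])
print("basic RI similarity:", basic)
print("reflective similarity:", reflective)
```

In this toy setting, the single pass leaves the two terms tied only to the nearly orthogonal random vectors of their separate documents, while the extra cycle lets the shared neighbours pull them together. This is a sketch of the general idea under the stated assumptions, not the evaluated implementation.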
Pages: 240-256
Page count: 17