Improving the efficiency of the Bayesian network retrieval model by reducing relationships between terms

Cited by: 3
Authors
De Campos, LM [1]
Fernandez-Luna, JM [1]
Huete, JF [1]
Affiliations
[1] Univ Granada, ETSI Informat, Dept Ciencias Computac & Inteligencia Artificial, Granada 18071, Spain
Keywords
Bayesian networks; information retrieval; learning; dependence; clustering
DOI
10.1142/S0218488503002296
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The Bayesian Network Retrieval Model represents the main (in)dependence relationships between the terms of a document collection by means of a specific type of Bayesian network, namely a polytree. Although the learning and propagation algorithms designed for this topology are very efficient, both tasks can be very time-consuming in collections with a very large number of terms. This paper shows that reducing the size of the polytree, so that it comprises only a subset of terms selected according to their retrieval quality, maintains the performance of the model while considerably reducing the effort needed to learn the model and later propagate in it. A method for selecting the best terms, based on their inverse document frequency and term discrimination value, is also presented.
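The two term-quality measures named in the abstract can be illustrated with a minimal sketch. The abstract does not give the authors' formulas or say how the two scores are combined, so everything below is an assumption: idf is taken in its textbook form log(N / n_t), the term discrimination value follows the classical centroid-density definition (density of the document space without the term minus density with it), and the function names and toy collection are hypothetical.

```python
import math
from collections import Counter


def idf(docs):
    """Textbook idf, log(N / n_t); assumed form, not taken from the paper."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n_docs / n) for term, n in df.items()}


def _cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def _density(vectors):
    """Average similarity between each document and the collection centroid."""
    centroid = {}
    for vec in vectors:
        for t, w in vec.items():
            centroid[t] = centroid.get(t, 0.0) + w / len(vectors)
    return sum(_cosine(vec, centroid) for vec in vectors) / len(vectors)


def term_discrimination_values(docs):
    """Density without the term minus density with it (assumed definition).

    A positive value means that removing the term makes documents look
    more alike, i.e. the term helps to tell documents apart.
    """
    vectors = [Counter(doc) for doc in docs]
    base = _density(vectors)
    vocabulary = set().union(*docs)
    tdv = {}
    for term in vocabulary:
        reduced = [{t: w for t, w in vec.items() if t != term} for vec in vectors]
        tdv[term] = _density(reduced) - base
    return tdv


# Hypothetical toy collection: each document is a list of index terms.
docs = [
    ["bayesian", "network", "retrieval"],
    ["bayesian", "network", "learning"],
    ["bayesian", "polytree", "propagation"],
]
idf_scores = idf(docs)
tdv_scores = term_discrimination_values(docs)
```

In this toy collection a term that occurs in every document gets an idf of zero and a negative discrimination value, while a term confined to one document scores higher on both. A reduced term set would keep the higher-scoring terms, but the actual selection rule is the paper's and is not reproduced here.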
Pages: 101-116
Page count: 16