Contextualized Text OLAP Based on Information Retrieval

Cited by: 3
Authors
Oukid, Lamia [1]
Benblidia, Nadjia [1]
Bentayeb, Fadila [2]
Asfari, Ounas [2]
Boussaid, Omar [2]
Affiliations
[1] Univ Blida 1, LRDSI Lab, Blida, Algeria
[2] Univ Lyon 2, ERIC Lab, Lyon, France
Keywords
Aggregation Operator; Context; Information Retrieval; Query Expansion; Text Cube; Text OLAP; DOCUMENTS; MODEL
DOI
10.4018/ijdwm.2015040101
Chinese Library Classification (CLC) number
TP31 [Computer Software]
Discipline classification code
081202; 0835
Abstract
Current data warehousing and On-Line Analytical Processing (OLAP) systems are not well suited to textual data analysis. A new data model and OLAP system are therefore needed to support such analysis. To this end, this paper proposes an approach based on information retrieval (IR) techniques. Because several contextual factors may significantly affect which information is relevant to a decision-maker, the paper also proposes taking contextual factors into account in an OLAP system to provide relevant results. It provides a generalized approach to Text OLAP analysis that consists of two parts. The first is a context-based text cube model, denoted CXT-Cube, which is characterized by several contextual dimensions. During the OLAP analysis process, CXT-Cube exploits this contextual information to better capture the semantics of textual data. The work also associates with CXT-Cube a new text analysis measure based on an OLAP-adapted vector space model and a relevance propagation technique. The second part is an OLAP aggregation operator called ORank (OLAP-Rank), which aggregates textual data in an OLAP environment while considering relevant contextual factors. To account for the user context, the paper proposes a query expansion method based on a decision-maker profile. Using IR metrics, it evaluates the proposed aggregation operator in different cases with several data analysis queries. The evaluation shows that the precision of the system is significantly better than that of a Text OLAP system based on classical IR, owing to the consideration of contextual factors.
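The abstract names two IR building blocks, a vector space model and profile-based query expansion, without giving their details. The sketch below only illustrates the general idea and is not the paper's CXT-Cube measure or ORank operator: it assumes plain smoothed TF-IDF weighting, cosine similarity, and an expansion step that simply appends profile terms with a reduced weight. All function names and the `profile_weight` parameter are hypothetical.

```python
import math
from collections import Counter


def tf_idf_vectors(docs):
    """Build sparse TF-IDF vectors (term -> weight) for tokenized documents."""
    n = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (an assumption of this sketch, not taken from the paper).
    idf = {t: math.log((1 + n) / (1 + d)) + 1.0 for t, d in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_query(query_terms, profile_terms, profile_weight=0.5):
    """Add profile terms to the query with a reduced weight (hypothetical scheme)."""
    expanded = {t: 1.0 for t in query_terms}
    for t in profile_terms:
        expanded.setdefault(t, profile_weight)
    return expanded


def rank(docs, query_weights):
    """Return (doc_index, score) pairs sorted by decreasing cosine score."""
    vectors, idf = tf_idf_vectors(docs)
    # Terms unknown to the collection cannot match any document, so drop them.
    query = {t: w * idf[t] for t, w in query_weights.items() if t in idf}
    scored = [(i, cosine(query, vec)) for i, vec in enumerate(vectors)]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


if __name__ == "__main__":
    docs = [
        ["sales", "report", "europe"],
        ["text", "cube", "olap", "analysis"],
        ["olap", "sales", "analysis", "quarterly"],
    ]
    print(rank(docs, expand_query(["olap"], [])))          # query only
    print(rank(docs, expand_query(["olap"], ["sales"])))   # query plus profile
```

With the profile term added, a document that matches only the decision-maker's profile receives a non-zero score, which is the effect query expansion is meant to have; how the paper weights and selects profile terms is not stated in the abstract.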
Pages: 1-21
Page count: 21