OLAP on Multidimensional Text Databases: Topic Network Cube and its Applications

Cited by: 0
Authors
Zhang, Zhiyuan [1]
Wang, Hong [1]
Feng, Xingjie [1]
Affiliations
[1] Civil Aviation University of China, School of Computer Science and Technology, Tianjin, People's Republic of China
Funding
National Natural Science Foundation of China
Keywords
multidimensional text database; topic network cube; OLAP; text mining; complex network;
DOI
10.2298/FIL1805973Z
Chinese Library Classification
O29 [Applied Mathematics]
Discipline classification code
070104
Abstract
Multidimensional text data contains both structured attributes and unstructured text. Unlike traditional numerical data, it cannot be handled directly by online analytical processing (OLAP). Some OLAP methods, such as the topic cube, have been proposed to exploit both the structured information and the text, but these methods cannot capture the relations among topic words. Because a topic usually consists of several subtopics, and each subtopic usually contains several topic words, we represent topics as a network in which related topic words are connected, and use this network to express the complex relations within topics. This paper introduces a new concept, the topic network cube, for performing OLAP analysis on multidimensional text databases. First, we propose a method called GL-LDA, based on the Gibbs sampling outputs of Labeled LDA, to measure the relations between topic words. Second, we give a storage model for the topic network cube that can efficiently generate topic networks using GL-LDA. Third, we show how to perform OLAP analysis on the topic network cube. Experimental results show that multidimensional text databases can be analyzed easily and effectively at different granularities using just a few simple SQL statements, and that the output network provides rich and useful information about topics.
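The abstract does not give the storage model or the SQL statements, so the following is only a minimal sketch of the general idea of rolling up a stored word network with one SQL aggregation. Everything in it is an assumption made for illustration: the table name topic_edge, its columns (year, region, topic, word_a, word_b, weight), the sample rows, and the choice to combine edge weights by summation. It is not the GL-LDA measure and not the authors' schema.

```python
import sqlite3


def build_demo_cube():
    """Create an in-memory table of word-to-word edges per cube cell.

    The schema and rows are hypothetical; the paper's real storage
    model is not described in the abstract.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE topic_edge ("
        " year INTEGER, region TEXT, topic TEXT,"
        " word_a TEXT, word_b TEXT, weight REAL)"
    )
    rows = [
        (2016, "north", "engine", "oil", "leak", 3.0),
        (2016, "south", "engine", "oil", "leak", 1.0),
        (2016, "south", "engine", "oil", "pressure", 2.0),
        (2017, "north", "engine", "oil", "leak", 4.0),
    ]
    conn.executemany("INSERT INTO topic_edge VALUES (?, ?, ?, ?, ?, ?)", rows)
    return conn


def roll_up_by_year(conn, year):
    """Roll up over the region dimension for one year.

    Edge weights of the same word pair are summed across regions;
    summation is an assumed aggregation rule, not one taken from the paper.
    """
    query = (
        "SELECT word_a, word_b, SUM(weight) AS weight"
        " FROM topic_edge WHERE year = ?"
        " GROUP BY word_a, word_b"
        " ORDER BY word_a, word_b"
    )
    return conn.execute(query, (year,)).fetchall()


if __name__ == "__main__":
    conn = build_demo_cube()
    for word_a, word_b, weight in roll_up_by_year(conn, 2016):
        print(word_a, word_b, weight)
```

Each returned row is one edge of the coarser-grained network; a drill-down would add the region column back to the GROUP BY clause.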
Pages: 1973-1982
Page count: 10