TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING

Cited by: 2
Authors
Crossno, Patricia J. [1]
Wilson, Andrew T. [1]
Shead, Timothy M. [1]
Davis, Warren L. [1]
Dunlavy, Daniel M. [1]
Affiliation
[1] Sandia Natl Labs, Albuquerque, NM 87185 USA
Keywords
Text analysis; visual model analysis; latent semantic analysis; latent Dirichlet allocation; clustering
DOI
10.1142/S0218213013600087
Chinese Library Classification number
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
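The bipartite matching of LSA concepts to LDA topics described in the abstract can be illustrated with a minimal sketch. It assumes each concept and topic is given as a vector of term weights over the same vocabulary; the function names, the `threshold` parameter, and its default of 0.5 are hypothetical choices made for illustration and are not taken from the paper.

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length term-weight vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat an all-zero factor as similar to nothing
    return dot / (norm_u * norm_v)

def bipartite_edges(lsa_concepts, lda_topics, threshold=0.5):
    """Return (concept_index, topic_index, similarity) edges.

    An edge is kept only when the cosine similarity reaches the
    threshold (an assumed cutoff, not a value from the paper).
    """
    edges = []
    for i, concept in enumerate(lsa_concepts):
        for j, topic in enumerate(lda_topics):
            weight = cosine(concept, topic)
            if weight >= threshold:
                edges.append((i, j, weight))
    return edges

# Toy factors over a three-term vocabulary (illustrative values only).
lsa = [[1, 0, 0], [0, 1, 1]]
lda = [[2, 0, 0], [0, 0, 3]]
edges = bipartite_edges(lsa, lda)
```

Each returned edge links one LSA concept to one LDA topic, and the similarity value could serve as the edge weight in a graph view. LSA factors may contain negative weights, so a real comparison might need to decide how to treat negative similarities; this sketch simply drops them via the threshold.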
Pages: 36