TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING

Cited by: 2
Authors
Crossno, Patricia J. [1]
Wilson, Andrew T. [1]
Shead, Timothy M. [1]
Davis, Warren L. [1]
Dunlavy, Daniel M. [1]
Affiliations
[1] Sandia Natl Labs, Albuquerque, NM 87185 USA
Keywords
Text analysis; visual model analysis; latent semantic analysis; latent Dirichlet allocation; clustering
DOI
10.1142/S0218213013600087
Chinese Library Classification (CLC) number
TP18 [Theory of artificial intelligence]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full-text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout edge weighting and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
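The abstract states only that the bipartite graph links LSA concepts to LDA topics by the cosine similarity of model factors, and that each concept's or topic's terms are listed in decreasing order of importance. The following is a minimal, hypothetical sketch of that idea, not the authors' implementation. It assumes that each LSA concept and each LDA topic is a weight vector over the same vocabulary (for example, a column of the LSA term-factor matrix and a row of the LDA topic-term matrix), that an edge is drawn whenever the absolute cosine similarity reaches an arbitrary threshold (absolute value because the sign of an SVD factor is arbitrary), and that the names `cosine`, `bipartite_edges`, and `top_terms`, the threshold of 0.5, and the toy data are illustrative only.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical helpers illustrating a concept/topic matching step; not TopicView's code.


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def bipartite_edges(
    lsa_concepts: Sequence[Sequence[float]],
    lda_topics: Sequence[Sequence[float]],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
) -> List[Tuple[int, int, float]]:
    """Edges (concept, topic, similarity) whose |cosine| meets the threshold.

    Using the absolute value is an assumption: an SVD factor and its
    negation describe the same LSA concept.
    """
    edges = []
    for i, concept in enumerate(lsa_concepts):
        for j, topic in enumerate(lda_topics):
            sim = cosine(concept, topic)
            if abs(sim) >= threshold:
                edges.append((i, j, sim))
    # Strongest matches first.
    edges.sort(key=lambda e: abs(e[2]), reverse=True)
    return edges


def top_terms(weights: Sequence[float], vocab: Sequence[str], n: int = 3) -> List[str]:
    """Terms in decreasing order of |weight| (LDA weights are already non-negative)."""
    order = sorted(range(len(vocab)), key=lambda k: abs(weights[k]), reverse=True)
    return [vocab[k] for k in order[:n]]


if __name__ == "__main__":
    vocab = ["model", "topic", "graph", "cluster"]
    lsa = [[0.7, 0.6, 0.1, 0.0], [0.0, 0.1, -0.6, -0.7]]        # toy LSA concept vectors
    lda = [[0.45, 0.40, 0.10, 0.05], [0.05, 0.05, 0.40, 0.50]]  # toy LDA topic-term rows
    for i, j, sim in bipartite_edges(lsa, lda):
        print(f"LSA concept {i} <-> LDA topic {j}: cos = {sim:.3f}")
    print(top_terms(lsa[1], vocab, 2))
```

With the toy data, concept 0 is paired with topic 0 and concept 1 with topic 1; the second pair has a negative cosine that the absolute-value rule still accepts. A one-to-one assignment (for example, the Hungarian algorithm) would be an alternative design; the abstract does not say which matching rule TopicView uses.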
Pages: 36
Related papers (50 in total)
  • [41] Multi-document summarization using weighted similarity between topic and clustering-based non-negative semantic feature
    Park, Sun
    Lee, Ju-Hong
    Kim, Deok-Hwan
    Ahn, Chan-Min
    ADVANCES IN DATA AND WEB MANAGEMENT, PROCEEDINGS, 2007, 4505 : 108 - +
  • [42] Validation of scientific topic models using graph analysis and corpus metadata
    Vazquez, Manuel A.
    Pereira-Delgado, Jorge
    Cid-Sueiro, Jesus
    Arenas-Garcia, Jeronimo
    SCIENTOMETRICS, 2022, 127 (09) : 5441 - 5458
  • [43] Topic models meet discourse analysis: a quantitative tool for a qualitative approach
    Jacobs, Thomas
    Tschotschel, Robin
    INTERNATIONAL JOURNAL OF SOCIAL RESEARCH METHODOLOGY, 2019, 22 (05) : 469 - 485
  • [45] Impact of mobility models on clustering based routing protocols in mobile WSNs
    Khan, Atta Ur Rehman
    Ali, Shahzad
    Mustafa, Saad
    Othman, Mazliza
    10TH INTERNATIONAL CONFERENCE ON FRONTIERS OF INFORMATION TECHNOLOGY (FIT 2012), 2012, : 366 - 370
  • [46] Design and evaluation of a parallel document clustering algorithm based on hierarchical latent semantic analysis
    Seshadri, Karthick
    Iyer, K. Viswanathan
    Shalinie, Mercy S.
    CONCURRENCY AND COMPUTATION-PRACTICE & EXPERIENCE, 2019, 31 (13)
  • [47] Proximity-based k-partitions clustering with ranking for document categorization and analysis
    Mei, Jian-Ping
    Chen, Lihui
    EXPERT SYSTEMS WITH APPLICATIONS, 2014, 41 (16) : 7095 - 7105
  • [48] A Comprehensive Tensor Framework for the Clustering of Hyperspectral Paper Data With an Application to Forensic Document Analysis
    Francis, Jobin
    Madathil, Baburaj
    George, Sudhish N.
    George, Sony
    IEEE ACCESS, 2022, 10 : 6194 - 6207
  • [49] Pseudo-document simulation for comparing LDA, GSDMM and GPM topic models on short and sparse text using Twitter data
    Weisser, Christoph
    Gerloff, Christoph
    Thielmann, Anton
    Python, Andre
    Reuter, Arik
    Kneib, Thomas
    Säfken, Benjamin
    COMPUTATIONAL STATISTICS, 2023, 38 : 647 - 674
  • [50] Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists
    Brehmer, Matthew
    Ingram, Stephen
    Stray, Jonathan
    Munzner, Tamara
    IEEE TRANSACTIONS ON VISUALIZATION AND COMPUTER GRAPHICS, 2014, 20 (12) : 2271 - 2280