TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING

Cited by: 2
Authors
Crossno, Patricia J. [1]
Wilson, Andrew T. [1]
Shead, Timothy M. [1]
Davis, Warren L. [1]
Dunlavy, Daniel M. [1]
Affiliation
[1] Sandia Natl Labs, Albuquerque, NM 87185 USA
Keywords
Text analysis; visual model analysis; latent semantic analysis; latent Dirichlet allocation; clustering
DOI
10.1142/S0218213013600087
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We present a new approach for analyzing topic models using visual analytics. We have developed TopicView, an application for visually comparing and exploring multiple models of text corpora, as a prototype for this type of analysis tool. TopicView uses multiple linked views to visually analyze conceptual and topical content, document relationships identified by models, and the impact of models on the results of document clustering. As case studies, we examine models created using two standard approaches: Latent Semantic Analysis (LSA) and Latent Dirichlet Allocation (LDA). Conceptual content is compared through the combination of (i) a bipartite graph matching LSA concepts with LDA topics based on the cosine similarities of model factors and (ii) a table containing the terms for each LSA concept and LDA topic, listed in decreasing order of importance. Document relationships are examined through the combination of (i) side-by-side document similarity graphs, (ii) a table listing the weights for each document's contribution to each concept/topic, and (iii) a full-text reader for documents selected in either of the graphs or the table. The impact of LSA and LDA models on document clustering applications is explored through similar means, using proximities between documents and cluster exemplars for graph layout, edge weighting, and table entries. We demonstrate the utility of TopicView's visual approach to model assessment by comparing LSA and LDA models of several example corpora.
Pages: 36
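
The abstract describes a bipartite graph that matches LSA concepts with LDA topics based on the cosine similarities of model factors. The following is a minimal sketch of that idea, not the authors' implementation. It assumes each model is available as a term-by-factor matrix over the same vocabulary; the function names, the toy matrices, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Cosine similarity between every column of `a` and every column of `b`.

    Both inputs are (n_terms, n_factors) arrays over the same vocabulary.
    Entry [i, j] of the result compares column i of `a` with column j of `b`.
    """
    a_unit = a / np.linalg.norm(a, axis=0, keepdims=True)
    b_unit = b / np.linalg.norm(b, axis=0, keepdims=True)
    return a_unit.T @ b_unit


def bipartite_edges(sim, threshold=0.5):
    """Return (concept, topic, weight) triples for pairs at or above `threshold`.

    The threshold is an assumption of this sketch, used only to keep
    the resulting graph sparse.
    """
    rows, cols = np.nonzero(sim >= threshold)
    return [(int(i), int(j), float(sim[i, j])) for i, j in zip(rows, cols)]


# Hypothetical toy factors: 4 terms, 2 LSA concepts, 2 LDA topics.
lsa_concepts = np.array([[0.9, 0.1],
                         [0.8, 0.0],
                         [0.1, 0.7],
                         [0.0, 0.9]])
lda_topics = np.array([[0.50, 0.05],
                       [0.40, 0.05],
                       [0.05, 0.40],
                       [0.05, 0.50]])

sim = cosine_similarity_matrix(lsa_concepts, lda_topics)
edges = bipartite_edges(sim)
for concept, topic, weight in edges:
    print(f"LSA concept {concept} -- LDA topic {topic}: {weight:.3f}")
```

In this toy data, concept 0 pairs with topic 0 and concept 1 with topic 1. Note that LSA factors are only determined up to sign, so a real comparison may need to decide how to treat negative similarities; this sketch simply drops them.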