(Chinavis 2024) TextLens: large language models-powered visual analytics enhancing text clustering

Citations: 0
Authors
Peng, Ruixiao [1, 2]
Dong, Yu [1]
Li, Guan [1, 2]
Tian, Dong [1, 2]
Shan, Guihua [1, 2, 3]
Affiliations
[1] Chinese Acad Sci, Comp Network Informat Ctr, Beijing, Peoples R China
[2] Univ Chinese Acad Sci, Beijing, Peoples R China
[3] Univ Chinese Acad Sci, Hangzhou Inst Adv Study, Hangzhou, Zhejiang, Peoples R China
Funding
National Natural Science Foundation of China;
Keywords
Text Clustering; Large Language Models; Visual Analytics; Natural Language Processing;
DOI
10.1007/s12650-025-01043-y
Chinese Library Classification
TP39 [Computer applications];
Discipline classification code
081203; 0835;
Abstract
Text clustering is a cornerstone task in natural language processing with a broad spectrum of applications. Given the advancements in large language models (LLMs), employing such models to enhance general text clustering has shown promising potential for improving clustering effectiveness. However, current LLM-driven approaches often act as black boxes when analyzing the text clustering process, leading to poor interpretability. Additionally, these approaches incur significant API usage costs and lack effective techniques for exploring cluster details. To address these challenges, we propose an LLM-powered visual analytics approach, called TextLens, to enhance text clustering. First, we present an LLM-powered framework that integrates LLMs to guide topic extraction, anomaly filtering, and modification assessment. Second, we introduce a visual analytics system designed to support the proposed framework, which facilitates interactive exploration of clusters, analysis of cluster-level thematic extraction, and iterative refinement of clustering results. Finally, we conduct evaluations by applying two datasets in four case studies and a user study to compare clustering outcomes with previous methods, demonstrating the effectiveness and scalability of our approach.
Pages: 625-643
Page count: 19