Trend-based Document Clustering for Sensitive and Stable Topic Detection

Times cited: 0
Authors
Sato, Yoshihide [1]
Kawashima, Harumi [2]
Okuda, Hidenori [2]
Oku, Masahiro [2]
Affiliations
[1] NTT Corp, NTT West Corp, 1-1 Hikarino Oka, Yokosuka, Kanagawa 2390847, Japan
[2] NTT Corp, NTT Cyber Solut Labs, Yokosuka, Kanagawa 2390847, Japan
Source
PACLIC 22: PROCEEDINGS OF THE 22ND PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION | 2008
Keywords
trend; clustering; gradient model; word frequency
DOI
Not available
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Given the huge volume of documents available, the ability to detect new topics and track them is important. This paper introduces a trend-based document clustering algorithm for analyzing such documents. Its key characteristic is that it scores words on the basis of fluctuations in their frequency. Because document distances are calculated in advance, the algorithm generates clusters in practical time, with O(n) processing cost. This property allows the user to settle on the best level of granularity for identifying topics. Experiments show that our algorithm gathers relevant documents with an average F-measure of 63.0% over the whole topic lifetime, from beginning to end, and that it substantially outperforms other algorithms.
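The abstract describes the method only at a high level, so the following is a minimal Python sketch of the general idea under stated assumptions, not the authors' algorithm. It assumes (hypothetically) that a word's trend score is the least-squares slope of its frequency across time periods, clipped at zero and divided by its mean frequency plus one; that documents are compared by cosine similarity of term counts weighted by those scores; and that clusters are formed by single-link grouping of pairs above a user-chosen similarity threshold, which stands in for the "level of granularity". The names `trend_scores`, `weighted_cosine`, `pairwise_similarities` and `cluster` are illustrative only. Note that the pairwise precomputation here costs O(n²) and does not reproduce the paper's O(n) procedure; it only shows why precomputed distances let the threshold be changed without recomputing similarities.

```python
import math
from collections import Counter


def trend_scores(freq_by_period):
    """Score words by how steeply their frequency rises across time periods.

    freq_by_period: list of Counter objects, one per period, oldest first.
    Hypothetical scoring: least-squares slope of each word's frequency series,
    clipped at zero and damped by its mean frequency, so rising words score
    high and flat or falling words score zero.
    """
    t = len(freq_by_period)
    vocab = set().union(*freq_by_period)
    if t < 2:
        return {w: 0.0 for w in vocab}  # no trend can be measured from one period
    x_mean = (t - 1) / 2
    sxx = sum((x - x_mean) ** 2 for x in range(t))
    scores = {}
    for w in vocab:
        ys = [period[w] for period in freq_by_period]  # Counter returns 0 for absent words
        y_mean = sum(ys) / t
        slope = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(ys)) / sxx
        scores[w] = max(slope, 0.0) / (y_mean + 1.0)
    return scores


def weighted_cosine(doc_a, doc_b, scores):
    """Cosine similarity of term-count vectors, each term weighted by its trend score."""
    wa = {w: n * scores.get(w, 0.0) for w, n in doc_a.items()}
    wb = {w: n * scores.get(w, 0.0) for w, n in doc_b.items()}
    dot = sum(v * wb.get(w, 0.0) for w, v in wa.items())
    na = math.sqrt(sum(v * v for v in wa.values()))
    nb = math.sqrt(sum(v * v for v in wb.values()))
    return dot / (na * nb) if na and nb else 0.0


def pairwise_similarities(docs, scores):
    """Compute every document-pair similarity once, so clustering can be rerun cheaply."""
    return [
        (i, j, weighted_cosine(docs[i], docs[j], scores))
        for i in range(len(docs))
        for j in range(i + 1, len(docs))
    ]


def cluster(n_docs, sims, threshold):
    """Single-link grouping: join any pair whose similarity is >= threshold (union-find)."""
    parent = list(range(n_docs))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i, j, s in sims:
        if s >= threshold:
            parent[find(i)] = find(j)
    groups = {}
    for i in range(n_docs):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    # Toy data: "quake" is rising over three periods, "election" is flat.
    periods = [
        Counter({"quake": 0, "election": 5}),
        Counter({"quake": 3, "election": 5}),
        Counter({"quake": 9, "election": 5}),
    ]
    scores = trend_scores(periods)
    docs = [
        Counter({"quake": 2, "tsunami": 1}),
        Counter({"quake": 1, "election": 3}),
        Counter({"election": 4}),
    ]
    sims = pairwise_similarities(docs, scores)
    print(cluster(len(docs), sims, 0.5))  # [[0, 1], [2]]
    print(cluster(len(docs), sims, 1.5))  # [[0], [1], [2]]
```

In the toy run, the flat word "election" receives a zero score, so documents are grouped only by the rising word "quake". Raising or lowering `threshold` changes the granularity of the resulting groups while reusing the same precomputed `sims` list.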
Pages: 331+
Page count: 2