Trend-based Document Clustering for Sensitive and Stable Topic Detection

Cited by: 0
Authors
Sato, Yoshihide [1]
Kawashima, Harumi [2]
Okuda, Hidenori [2]
Oku, Masahiro [2]
Affiliations
[1] NTT Corp, NTT West Corp, 1-1 Hikarino Oka, Yokosuka, Kanagawa 2390847, Japan
[2] NTT Corp, NTT Cyber Solut Labs, Yokosuka, Kanagawa 2390847, Japan
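Source
PACLIC 22: PROCEEDINGS OF THE 22ND PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION | 2008
Keywords
trend; clustering; gradient model; word frequency
DOI
Not available
Chinese Library Classification number
TP [Automation Technology, Computer Technology]
Discipline classification code
0812
Abstract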
The ability to detect and track new topics is important given the huge volume of documents. This paper introduces a trend-based document clustering algorithm for analyzing them. Its key characteristic is that it scores words on the basis of fluctuations in word frequency. The algorithm generates clusters in practical time, with O(n) processing cost due to preliminary calculation of document distances. This attribute allows the user to settle on the best level of granularity for identifying topics. Experiments show that the algorithm gathers relevant documents with an F-measure of 63.0% on average from the beginning to the end of a topic's lifetime, largely surpassing other algorithms.
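The abstract does not give the scoring formula, so the following is only a minimal sketch of one way to score words by the fluctuation in their frequency. It assumes that the trend of a word can be approximated by the least-squares slope of its relative frequency across consecutive time periods. The function name `trend_scores` and the input layout `docs_by_period` are hypothetical and are not taken from the paper.

```python
from collections import Counter


def trend_scores(docs_by_period):
    """Score each word by the slope of its relative frequency over time.

    docs_by_period: list of periods (oldest first); each period is a list
    of tokenized documents, and each document is a list of words.
    This is an illustrative assumption, not the paper's actual model.
    """
    n = len(docs_by_period)

    # Relative frequency of every word within each period.
    freqs = []
    for docs in docs_by_period:
        counts = Counter(word for doc in docs for word in doc)
        total = sum(counts.values()) or 1  # avoid division by zero
        freqs.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*freqs) if freqs else set()

    # Least-squares slope with period indices 0..n-1 as the x axis.
    t_mean = (n - 1) / 2
    denom = sum((t - t_mean) ** 2 for t in range(n))

    scores = {}
    for w in vocab:
        ys = [f.get(w, 0.0) for f in freqs]
        y_mean = sum(ys) / n
        num = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
        scores[w] = num / denom if denom else 0.0
    return scores


if __name__ == "__main__":
    periods = [
        [["a", "b"]],   # period 0
        [["a", "c"]],   # period 1
        [["c", "c"]],   # period 2
    ]
    for word, score in sorted(trend_scores(periods).items()):
        print(word, round(score, 3))
```

Under this assumption, words with a positive score are rising in frequency and words with a negative score are fading. Such scores could then weight terms when computing document distances, although how the paper combines them with clustering is not stated in the abstract.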
Pages: 331 / +
Page count: 2