Trend-based Document Clustering for Sensitive and Stable Topic Detection

Cited by: 0
Authors
Sato, Yoshihide [1]
Kawashima, Harumi [2]
Okuda, Hidenori [2]
Oku, Masahiro [2]
Affiliations
[1] NTT Corp, NTT West Corp, 1-1 Hikarino Oka, Yokosuka, Kanagawa 2390847, Japan
[2] NTT Corp, NTT Cyber Solut Labs, Yokosuka, Kanagawa 2390847, Japan
Source
PACLIC 22: PROCEEDINGS OF THE 22ND PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION | 2008
Keywords
trend; clustering; gradient model; word frequency
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The ability to detect new topics and track them is important given the huge number of documents. This paper introduces a trend-based document clustering algorithm for analyzing them. Its key characteristic is that it gives scores to words on the basis of the fluctuation in word frequency. The algorithm generates clusters in a practical time, with O(n) processing cost due to preliminary calculation of document distances. The attribute allows the user to settle on the best level of granularity for identifying topics. Experiments prove that our algorithm can gather relevant documents with an F-measure of 63.0% on average from the beginning to the end of a topic's lifetime and that it largely surpasses other algorithms.
Pages: 331 / +
Page count: 2