Trend-based Document Clustering for Sensitive and Stable Topic Detection

Times cited: 0
Authors
Sato, Yoshihide [1]
Kawashima, Harumi [2]
Okuda, Hidenori [2]
Oku, Masahiro [2]
Affiliations
[1] NTT Corp, NTT West Corp, 1-1 Hikarino Oka, Yokosuka, Kanagawa 2390847, Japan
[2] NTT Corp, NTT Cyber Solut Labs, Yokosuka, Kanagawa 2390847, Japan
Source
PACLIC 22: PROCEEDINGS OF THE 22ND PACIFIC ASIA CONFERENCE ON LANGUAGE, INFORMATION AND COMPUTATION | 2008
Keywords
trend; clustering; gradient model; word frequency
DOI
Not available
Chinese Library Classification number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
Given the huge volume of documents available, the ability to detect new topics and track them is important. This paper introduces a trend-based document clustering algorithm for analyzing such documents. Its key characteristic is that it scores words on the basis of fluctuations in word frequency. Because document distances are calculated in advance, the algorithm generates clusters in practical time, with O(n) processing cost. This property allows the user to settle on the best level of granularity for identifying topics. Experiments show that the algorithm gathers relevant documents with an average F-measure of 63.0% from the beginning to the end of a topic's lifetime, and that it substantially outperforms the other algorithms compared.
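The abstract does not give the scoring formula, so the following is only a hypothetical sketch of the idea of scoring words by frequency fluctuation, not the authors' method. It assumes a word's score is the largest absolute change in its relative frequency between consecutive time windows (a simple discrete gradient, suggested by the "gradient model" keyword). It also includes the standard F-measure, F = 2PR/(P+R), for comparing a cluster against a reference topic. The names `trend_scores` and `f_measure` are illustrative and do not come from the paper.

```python
from collections import Counter
from typing import Dict, List, Set


def trend_scores(windows: List[List[str]]) -> Dict[str, float]:
    """Hypothetical trend score: the largest absolute change in a word's
    relative frequency between consecutive time windows.

    windows: one token list per time window, in chronological order.
    """
    # Relative frequency of each word within each window
    rel = []
    for tokens in windows:
        counts = Counter(tokens)
        total = sum(counts.values()) or 1
        rel.append({w: c / total for w, c in counts.items()})

    vocab = set().union(*rel)  # every word seen in any window
    scores = {}
    for w in vocab:
        series = [r.get(w, 0.0) for r in rel]
        # Discrete gradient: differences between consecutive windows
        diffs = [b - a for a, b in zip(series, series[1:])]
        scores[w] = max((abs(d) for d in diffs), default=0.0)
    return scores


def f_measure(cluster: Set[int], topic: Set[int]) -> float:
    """Standard F-measure of a cluster against a reference topic (sets of doc ids)."""
    tp = len(cluster & topic)
    if tp == 0:
        return 0.0
    precision = tp / len(cluster)
    recall = tp / len(topic)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    windows = [["news", "news", "weather", "weather"],
               ["news", "news", "quake", "quake"]]
    print(trend_scores(windows))           # "quake" and "weather" change; "news" is stable
    print(f_measure({1, 2, 3, 4}, {3, 4, 5, 6}))
```

In this toy example, a steadily used word such as "news" scores 0, while a word that appears suddenly ("quake") or vanishes ("weather") receives a high score. Under this assumption, such words would be the ones that signal an emerging or fading topic.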
Pages: 331 / +
Number of pages: 2
Related papers
50 in total
  • [1] Trend-Based Granular Representation of Time Series and Its Application in Clustering
    Guo, Hongyue
    Wang, Lidong
    Liu, Xiaodong
    Pedrycz, Witold
    IEEE TRANSACTIONS ON CYBERNETICS, 2022, 52 (09) : 9101 - 9110
  • [2] A comparison of forecasting methods for medical device demand using trend-based clustering scheme
    Shuojiang Xu
    Hing Kai Chan
    Eugene Ch’ng
    Kim Hua Tan
    Journal of Data, Information and Management, 2020, 2 (2) : 85 - 94
  • [3] Trend-based Time Series Prediction Algorithm
    Xin, Qi
    Hong, Liang
    Zhen, Li
    PROGRESS IN MEASUREMENT AND TESTING, PTS 1 AND 2, 2010, 108-111 : 1164 - 1169
  • [4] Topic Detection based on Group Average Hierarchical Clustering
    Gao, Ni
    Gao, Ling
    He, Yiyue
    Wang, Hai
    Sun, Qian
    2013 INTERNATIONAL CONFERENCE ON ADVANCED CLOUD AND BIG DATA (CBD), 2013, : 88 - 92
  • [5] A Novel Approach of Neural Topic Modelling for Document Clustering
    Subramani, Sandhya
    Sridhar, Vaishnavi
    Shetty, Kaushal
    2018 IEEE SYMPOSIUM SERIES ON COMPUTATIONAL INTELLIGENCE (IEEE SSCI), 2018, : 2169 - 2173
  • [6] Unsupervised Topic Aware Document-Level Semantic Representation for Document Clustering
    Rafi, Muhammad
    Khan, Hamza
    Nadeem, Haya
    Shakeel, Hassan
    2021 22ND INTERNATIONAL ARAB CONFERENCE ON INFORMATION TECHNOLOGY (ACIT), 2021, : 170 - 179
  • [7] A Multi-Criteria Document Clustering Method Based on Topic Modeling and Pseudoclosure Function
    Quang Vu Bui
    Sayadi, Karim
    Bui, Marc
    INFORMATICA-JOURNAL OF COMPUTING AND INFORMATICS, 2016, 40 (02) : 169 - 180
  • [8] Multistage optimization filter for trend-based short-term forecasting
    Zafar, Usman
    Kellard, Neil
    Vinogradov, Dmitri
    JOURNAL OF FORECASTING, 2022, 41 (02) : 345 - 360
  • [9] TOPICVIEW: VISUAL ANALYSIS OF TOPIC MODELS AND THEIR IMPACT ON DOCUMENT CLUSTERING
    Crossno, Patricia J.
    Wilson, Andrew T.
    Shead, Timothy M.
    Davis, Warren L.
    Dunlavy, Daniel M.
    INTERNATIONAL JOURNAL ON ARTIFICIAL INTELLIGENCE TOOLS, 2013, 22 (05)
  • [10] Specific Document Sign Location Detection Based on Point Matching and Clustering
    Xiong, Huaixin
    ADVANCES IN VISUAL COMPUTING, ISVC 2018, 2018, 11241 : 180 - 190