A topic detection method based on KM-LSH Fusion algorithm and improved BTM model

Cited by: 0
Authors
Liu, Wenjun [1, 2, 5]
Guo, Huan [1 ]
Gan, Jiaxin [1 ]
Wang, Hai [1 ]
Wang, Hailan [1 ]
Zhang, Chao [3 ]
Peng, Qingcheng [1 ]
Sun, Yuyan [1 ]
Yu, Bao [1 ]
Hou, Mengshu [2, 4]
Li, Bo [1 ]
Li, Xiaolei [1 ]
Affiliations
[1] School of Computer and Software Engineering, XiHua University, Chengdu
[2] School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu
[3] Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou
[4] School of Big Data and Artificial Intelligence, Chengdu Technological University, Chengdu
[5] Sichuan Provincial Engineering Research Center of Hydroelectric Energy Power Equipment Technology, Chengdu
Funding
National Natural Science Foundation of China;
Keywords
BTM model; Cohesiveness; K-means algorithm; Text modeling; Topic detection;
DOI
10.1007/s00500-024-09874-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology];
Subject classification code
0812;
Abstract
Topic detection is an information processing technology designed to help people cope with the growing volume of information on the Internet. In the research literature, topic detection methods classify topics through word embedding, supervised, and unsupervised approaches. However, most topic detection methods address only the clustering problem and do not consider the loss of detection accuracy caused by the cohesiveness of topics. In addition, the order of the words within a biterm during topic detection can cause substantial deviations in the detected topic content. To solve these problems, this paper proposes a topic detection method based on a KM-LSH fusion algorithm and an improved BTM model. The KM-LSH fusion algorithm combines K-means clustering with LSH refinement clustering. The proposed method addresses the cohesiveness problem in topic detection, and the improved BTM model addresses the influence of biterm order on topic detection. First, text vectors are constructed by applying text preprocessing methods to the collected set of microblog texts. Second, the KM-LSH fusion algorithm is used to calculate text similarity and to perform topic clustering and refinement. Finally, the improved BTM model is used to model the texts; it is combined with word position and an improved TF-IDF weight calculation algorithm to adjust the microblog texts within the clusters. The experimental results indicate that the proposed KM-LSH-IBTM method improves the evaluation indexes compared with three other topic detection methods. In conclusion, the proposed KM-LSH-IBTM method improves topic detection with respect to cohesiveness and biterm order. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
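To make the notion of a biterm concrete, the following is a minimal sketch of how biterms (co-occurring word pairs within one short text) can be extracted, and of how treating the pair as ordered or unordered changes the counts. It is not the paper's improved BTM model: the function names `ordered_biterms` and `unordered_biterms` and the toy documents are assumptions made purely for illustration, and the abstract does not specify how the improved model handles word order.

```python
from collections import Counter
from itertools import combinations


def ordered_biterms(tokens):
    """Word pairs (w_i, w_j) with i < j, kept in order of appearance.

    Pairs made of the same word twice are skipped.
    """
    return [(a, b) for a, b in combinations(tokens, 2) if a != b]


def unordered_biterms(tokens):
    """The same pairs, sorted so that (a, b) and (b, a) are one biterm."""
    return [tuple(sorted(pair)) for pair in ordered_biterms(tokens)]


# Two hypothetical, already tokenized microblog texts with the same words
# in a different order.
docs = [
    ["topic", "detection", "microblog"],
    ["microblog", "topic", "detection"],
]

ordered = Counter(b for d in docs for b in ordered_biterms(d))
unordered = Counter(b for d in docs for b in unordered_biterms(d))

print(len(ordered), "distinct ordered biterms")
print(len(unordered), "distinct unordered biterms")
```

In this toy case the two texts share every word, yet an order-sensitive count splits the evidence across more distinct pairs than an order-insensitive count does, which illustrates why the treatment of biterm order can shift the topics a model detects.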
Pages: 11421-11438
Page count: 17