A topic detection method based on KM-LSH Fusion algorithm and improved BTM model

Times cited: 0
Authors
Liu, Wenjun [1, 2, 5]
Guo, Huan [1]
Gan, Jiaxin [1]
Wang, Hai [1]
Wang, Hailan [1]
Zhang, Chao [3]
Peng, Qingcheng [1]
Sun, Yuyan [1]
Yu, Bao [1]
Hou, Mengshu [2, 4]
Li, Bo [1]
Li, Xiaolei [1]
Affiliations
[1] School of Computer and Software Engineering, XiHua University, Chengdu
[2] School of Computer Science and Engineering, University of Electronic Science and Technology of China (UESTC), Chengdu
[3] Intelligent Policing Key Laboratory of Sichuan Province, Sichuan Police College, Luzhou
[4] School of Big Data and Artificial Intelligence, Chengdu Technological University, Chengdu
[5] Sichuan Provincial Engineering Research Center of Hydroelectric Energy Power Equipment Technology, Chengdu
Funding
National Natural Science Foundation of China
Keywords
BTM model; Cohesiveness; K-means algorithm; Text modeling; Topic detection;
DOI
10.1007/s00500-024-09874-x
Chinese Library Classification (CLC) number
TP3 [Computing technology, computer technology]
Discipline code
0812
Abstract
Topic detection is an information processing technology designed to help people cope with the growing volume of information on the Internet. In the research literature, topic detection has been approached through word embeddings and through supervised and unsupervised methods. However, most topic detection methods address only the clustering problem and overlook the loss of detection accuracy caused by poor topic cohesiveness. In addition, the order of biterms during topic detection can cause substantial deviations in the detected topic content. To address these problems, this paper proposes a topic detection method based on the KM-LSH fusion algorithm and an improved BTM model. The KM-LSH fusion algorithm combines K-means clustering with LSH-based refinement clustering to address topic cohesiveness, while the improved BTM model addresses the influence of biterm order on topic detection. First, the collected microblog texts are preprocessed and converted into text vectors. Second, the KM-LSH fusion algorithm computes text similarity and performs topic clustering and refinement. Finally, the improved BTM model is used to model the texts, combining word position with an improved TF-IDF weighting algorithm to adjust the clustered microblog texts. Experimental results indicate that the proposed KM-LSH-IBTM method improves the evaluation metrics compared with three other topic detection methods. The proposed KM-LSH-IBTM method therefore strengthens topic detection with respect to both cohesiveness and biterm order. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2024.
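The abstract names the two clustering stages of KM-LSH but not their details. A minimal sketch of the general idea follows, under stated assumptions: plain smoothed TF-IDF vectors (not the paper's improved TF-IDF or word-position weighting), spherical K-means on cosine similarity, and random-hyperplane LSH (SimHash) as the refinement step that splits each K-means cluster into tighter sub-clusters. The function names `tfidf_vectors`, `kmeans`, `lsh_refine` and `km_lsh`, and the parameters `n_bits` and `seed`, are hypothetical illustrations, not the authors' code; the improved BTM stage is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def normalise(vec):
    """Scale a vector to unit L2 length so that the dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else vec


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tfidf_vectors(docs):
    """Smoothed TF-IDF over pre-tokenised, non-empty texts.

    Assumption: plain TF-IDF, NOT the paper's improved TF-IDF or word-position weighting.
    """
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: i for i, w in enumerate(vocab)}
    df = Counter(w for doc in docs for w in set(doc))
    n = len(docs)
    vectors = []
    for doc in docs:
        vec = [0.0] * len(vocab)
        for w, count in Counter(doc).items():
            idf = math.log((1 + n) / (1 + df[w])) + 1.0
            vec[index[w]] = (count / len(doc)) * idf
        vectors.append(normalise(vec))
    return vectors


def kmeans(vectors, k, max_iter=20, seed=0):
    """Spherical K-means: assign each vector to the centroid with the highest cosine."""
    rng = random.Random(seed)
    centroids = [list(vectors[i]) for i in rng.sample(range(len(vectors)), k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda c: dot(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                mean = [sum(col) / len(members) for col in zip(*members)]
                centroids[c] = normalise(mean)
    return labels


def lsh_refine(vectors, labels, n_bits=4, seed=1):
    """Split each K-means cluster by random-hyperplane (SimHash) signatures.

    Texts whose vectors fall on the same side of every hyperplane share a bucket,
    so each bucket is a sub-cluster of exactly one K-means cluster.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]
    buckets = defaultdict(list)
    for i, v in enumerate(vectors):
        signature = tuple(dot(v, p) >= 0 for p in planes)
        buckets[(labels[i], signature)].append(i)
    return sorted(buckets.values())


def km_lsh(docs, k, n_bits=4, seed=0):
    """Hypothetical KM-LSH pipeline: TF-IDF -> K-means -> LSH refinement."""
    vectors = tfidf_vectors(docs)
    labels = kmeans(vectors, k, seed=seed)
    return labels, lsh_refine(vectors, labels, n_bits=n_bits, seed=seed + 1)


docs = [
    ["apple", "banana", "fruit"],
    ["fruit", "apple", "banana"],
    ["car", "engine", "road"],
    ["road", "car", "engine"],
]
labels, refined = km_lsh(docs, k=2)
print(refined)  # [[0, 1], [2, 3]]
```

Sharing one set of hyperplanes across all clusters keeps the signatures comparable. Raising `n_bits` makes each bucket stricter (two vectors must agree on more hyperplanes), which yields more cohesive but smaller sub-clusters.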
Pages: 11421-11438
Page count: 17