A hybrid linear text segmentation algorithm using hierarchical agglomerative clustering and discrete particle swarm optimization

Cited by: 34
Authors
Wu, Ji-Wei [1]
Tseng, Judy C. R. [2]
Tsai, Wen-Nung
Affiliations
[1] Natl Chiao Tung Univ, Dept Comp Sci, Hsinchu, Taiwan
[2] Chung Hua Univ, Dept Comp Sci & Informat Engn, Hsinchu, Taiwan
Keywords
Linear text segmentation; hierarchical agglomerative clustering; discrete particle swarm optimization; natural language processing
DOI
10.3233/ICA-130446
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Linear text segmentation plays an important role in many natural language processing tasks. Many algorithms have been proposed and shown to improve its performance. However, previous approaches often suffer from either low segmentation accuracy or high computational complexity. Parameter setting is another critical problem in some algorithms: although parameters can be assigned manually, doing so increases the user's burden, and the values provided may not always reflect the real metadata of a text. In this paper, a hybrid algorithm, TSHAC-DPSO, is proposed to tackle these problems. First, a novel linear Text Segmentation algorithm based on Hierarchical Agglomerative Clustering (TSHAC) rapidly generates a satisfactory solution without an auxiliary knowledge base, parameter setting, or user involvement. Then an efficient evolutionary algorithm, Discrete Particle Swarm Optimization (DPSO), is adopted to generate the global optimal solution by refining the solution created by TSHAC. TSHAC-DPSO combines the merits of both algorithms, which not only improves the accuracy of linear text segmentation but also makes execution more efficient and flexible. The experimental results show that TSHAC-DPSO achieves segmentation accuracy comparable to that of several well-known linear text segmentation algorithms.
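The abstract does not give the details of TSHAC, so the following is only a minimal sketch of the general idea of agglomerative clustering applied to linear segmentation, not the authors' algorithm. It assumes bag-of-words sentence vectors, cosine similarity, and a caller-supplied target segment count (`num_segments`); all three are illustrative choices, and the last one is a simplification, since the paper states that TSHAC needs no parameter setting. The DPSO refinement stage is not shown.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def segment(sentences: list[str], num_segments: int) -> list[list[str]]:
    """Greedily merge the most similar ADJACENT segments.

    Only neighbouring segments may merge, so every segment stays a
    contiguous run of sentences (a linear segmentation).
    num_segments is a hypothetical stopping criterion for this sketch.
    """
    if num_segments < 1:
        raise ValueError("num_segments must be at least 1")
    segments = [[s] for s in sentences]
    vectors = [Counter(s.lower().split()) for s in sentences]
    while len(segments) > num_segments:
        # Similarity of each segment to its right-hand neighbour.
        sims = [cosine(vectors[i], vectors[i + 1]) for i in range(len(vectors) - 1)]
        best = max(range(len(sims)), key=sims.__getitem__)
        # Merge the pair: concatenate sentences and add word counts.
        segments[best:best + 2] = [segments[best] + segments[best + 1]]
        vectors[best:best + 2] = [vectors[best] + vectors[best + 1]]
    return segments


if __name__ == "__main__":
    doc = [
        "cats chase mice",
        "mice fear cats",
        "stocks fell today",
        "today stocks rallied",
    ]
    for seg in segment(doc, 2):
        print(seg)
```

Restricting merges to neighbouring segments is what distinguishes this from ordinary agglomerative clustering, where any two clusters may merge. A refinement stage such as the DPSO step described in the abstract would then adjust the resulting boundaries.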
Pages: 35-46
Page count: 12