Clustering Sentence-Level Text Using a Novel Fuzzy Relational Clustering Algorithm

Cited by: 40
Authors
Skabar, Andrew [1]
Abdalgader, Khaled [1]
Affiliation
[1] La Trobe Univ, Dept Comp Sci & Comp Engn, Melbourne, Vic 3086, Australia
Keywords
Fuzzy relational clustering; natural language processing; graph centrality; C-means; classification
DOI
10.1109/TKDE.2011.205
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
In comparison with hard clustering methods, in which a pattern belongs to a single cluster, fuzzy clustering algorithms allow patterns to belong to all clusters with differing degrees of membership. This is important in domains such as sentence clustering, since a sentence is likely to be related to more than one theme or topic present within a document or set of documents. However, because most sentence similarity measures do not represent sentences in a common metric space, conventional fuzzy clustering approaches based on prototypes or mixtures of Gaussians are generally not applicable to sentence clustering. This paper presents a novel fuzzy clustering algorithm that operates on relational input data; i.e., data in the form of a square matrix of pairwise similarities between data objects. The algorithm uses a graph representation of the data, and operates in an Expectation-Maximization framework in which the graph centrality of an object in the graph is interpreted as a likelihood. Results of applying the algorithm to sentence clustering tasks demonstrate that the algorithm is capable of identifying overlapping clusters of semantically related sentences, and that it is therefore of potential use in a variety of text mining tasks. We also include results of applying the algorithm to benchmark data sets in several other domains.
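The abstract does not give the update equations, so the following is only a rough sketch of the general idea, not the authors' algorithm. It assumes PageRank as the graph centrality measure, a damping factor of 0.85, and edge weights scaled by the product of the two endpoints' memberships; the function names `pagerank_scores` and `fuzzy_relational_cluster` are hypothetical.

```python
import numpy as np


def pagerank_scores(weights, damping=0.85, iters=100):
    """Power-iteration PageRank on a non-negative weight matrix (assumed centrality)."""
    n = weights.shape[0]
    col_sums = weights.sum(axis=0)
    col_sums[col_sums == 0] = 1.0          # avoid division by zero for isolated nodes
    transition = weights / col_sums        # each column now sums to 1
    scores = np.full(n, 1.0 / n)
    for _ in range(iters):
        scores = (1 - damping) / n + damping * transition @ scores
    return scores


def fuzzy_relational_cluster(similarity, n_clusters, n_iters=50, seed=0):
    """EM-style loop over a square pairwise-similarity matrix.

    Returns an (n_objects, n_clusters) membership matrix whose rows sum to 1.
    """
    rng = np.random.default_rng(seed)
    n = similarity.shape[0]

    # Random initial memberships, normalised per object.
    membership = rng.random((n, n_clusters))
    membership /= membership.sum(axis=1, keepdims=True)
    mixing = np.full(n_clusters, 1.0 / n_clusters)

    for _ in range(n_iters):
        # Centrality of each object within each cluster, treated as a likelihood.
        likelihood = np.empty((n, n_clusters))
        for k in range(n_clusters):
            m = membership[:, k]
            weighted = similarity * np.outer(m, m)   # assumed weighting scheme
            likelihood[:, k] = pagerank_scores(weighted)

        # E-step: memberships proportional to mixing coefficient times likelihood.
        joint = likelihood * mixing
        membership = joint / joint.sum(axis=1, keepdims=True)

        # M-step: mixing coefficients are the mean memberships.
        mixing = membership.mean(axis=0)

    return membership
```

Calling `fuzzy_relational_cluster(similarity, n_clusters=2)` on a symmetric matrix of sentence similarities would return one membership row per sentence, so a sentence related to several topics can carry non-zero membership in more than one cluster.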
Pages: 62-75
Page count: 14