OvNMTF Algorithm: an Overlapping Non-Negative Matrix Tri-Factorization for Coclustering

Cited by: 1
Authors
de Freitas Junior, Waldyr L. [1]
Peres, Sarajane M. [1]
Freire, Valdinei [1]
Brunialti, Lucas Fernandes [2]
Affiliations
[1] Univ Sao Paulo, Escola Artes Ciencias & Humanidades, Sao Paulo, Brazil
[2] Cobli, Sao Paulo, Brazil
Source
2020 International Joint Conference on Neural Networks (IJCNN) | 2020
Keywords
coclustering; matrix factorization
DOI
10.1109/ijcnn48605.2020.9207364
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Coclustering algorithms are an alternative to classic one-sided clustering algorithms. Because it clusters the rows and columns of a dyadic data matrix simultaneously, coclustering yields richer information: it provides column clusters in addition to row clusters, and it expresses the relationship between them as coclusters. Different cocluster structures are possible, and those that overlap in rows or columns remain an open question with room for improvement. In addition, while most of the related literature treats coclustering as a means of obtaining better one-sided clustering results, few initiatives study it as a tool for providing higher-quality descriptive information about that clustering. In this paper, we present a new coclustering algorithm, OvNMTF, based on matrix tri-factorization, which properly handles overlapping coclusters by adding degrees of freedom to the factorization that enable the discovery of specialized column clusters for each row cluster. As a proof of concept, we modeled text analysis as a coclustering problem with column overlaps, assuming that a given word (data matrix column) can be associated with more than one document cluster (row cluster) because it can take on a different semantic relationship in each association. Experiments on synthetic data sets show that the OvNMTF algorithm behaves reasonably; experiments on real-world text data show its power for extracting high-quality information.
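The abstract does not state the factorization itself, so the following is only a minimal sketch of one way to read "specialized column clusters for each row cluster". It assumes a non-negative model in which row cluster p has its own column-cluster matrix V_p, so that the data matrix is approximated by the sum over p of U[:, p] (S[p, :] V_p^T). The function name `reconstruct_overlapping`, the variable names, and the toy values are all hypothetical and are not taken from the paper.

```python
import numpy as np

def reconstruct_overlapping(U, S, V_list):
    """Rebuild X_hat = sum_p outer(U[:, p], S[p, :] @ V_p.T).

    U      : (n, k) non-negative row-cluster memberships
    S      : (k, l) non-negative cocluster summary values
    V_list : k matrices of shape (m, l), one column-cluster
             matrix per row cluster (assumed model, see lead-in)
    """
    n, m = U.shape[0], V_list[0].shape[0]
    X_hat = np.zeros((n, m))
    for p, V_p in enumerate(V_list):
        # Column profile that row cluster p assigns to the m columns.
        profile = S[p, :] @ V_p.T
        X_hat += np.outer(U[:, p], profile)
    return X_hat

# Toy data: 4 rows in 2 row clusters, 3 columns, 2 column clusters.
U = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
S = np.array([[1, 2], [3, 4]], dtype=float)
# Column 1 sits in column cluster 0 for row cluster 0,
# but in column cluster 1 for row cluster 1 (a column overlap).
V_list = [
    np.array([[1, 0], [1, 0], [0, 1]], dtype=float),
    np.array([[1, 0], [0, 1], [0, 1]], dtype=float),
]
X_hat = reconstruct_overlapping(U, S, V_list)
print(X_hat)
```

In this toy case the middle column is grouped differently depending on the row cluster, which a single shared column-cluster matrix could not express. Fitting U, S, and the V_p matrices to real data is a separate step that this sketch does not cover.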
Pages: 8