A multi-block clustering algorithm for high dimensional binarized sparse data

Cited by: 3
Authors
Kosztyan, Zsolt T. [1 ,2 ,3 ]
Telcs, Andras [1 ,2 ,4 ]
Abonyi, Janos [5 ]
Affiliations
[1] Univ Pannonia, Dept Quantitat Methods, Egyet Str 10, H-8200 Veszprem, Hungary
[2] MTA PE Budapest Ranking Res Grp, Piarista Str 4, H-1052 Budapest, Hungary
[3] Koszeg iASK, Inst Adv Studies, Charnel Str 10, H-9730 Koszeg, Hungary
[4] Wigner Res Ctr Phys, Dept Computat Sci, H-1121 Budapest, Hungary
[5] Univ Pannonia, MTA PE Lendulet Complex Syst Monitoring Res Grp, Egyet Str 10, H-8200 Veszprem, Hungary
Keywords
Multidimensional clustering; High dimensional data; Ranking; Higher educational institutes; EXPRESSION DATA;
DOI
10.1016/j.eswa.2021.116219
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
We introduce a multidimensional multiblock clustering (MDMBC) algorithm in this paper. MDMBC can generate overlapping clusters with similar values along clusters of dimensions. The parsimonious binary vector representation of multidimensional clusters lends itself to the application of efficient meta-heuristic optimization algorithms. In this paper, a hill-climbing (HC) greedy search algorithm has been presented that can be extended by several stochastic and population-based meta-heuristic frameworks. The benefits of the algorithm are demonstrated in a bi-clustering benchmark problem and in the analysis of the Leiden higher education ranking system, which measures the scientific performance of 903 institutions along four dimensions of 20 indicators representing publication output and collaboration in different scientific fields and time periods.
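The abstract's idea of encoding a cluster as a binary vector and improving it by greedy hill climbing can be illustrated with a minimal sketch. This is not the MDMBC algorithm from the paper: the function names `block_score` and `hill_climb`, the objective (ones inside the block minus a penalty for zeros), and the single-bit-flip neighborhood are all assumptions chosen for illustration on a two-dimensional binary matrix.

```python
def block_score(X, z, penalty=1.0):
    """Score a bicluster encoded as a binary vector z (rows first, then columns).

    Assumed objective (not from the paper): count of ones inside the
    selected block minus `penalty` times the count of zeros inside it.
    """
    n_rows = len(X)
    rows = [r for r in range(n_rows) if z[r]]
    cols = [c for c in range(len(X[0])) if z[n_rows + c]]
    ones = sum(X[r][c] for r in rows for c in cols)
    zeros = len(rows) * len(cols) - ones
    return ones - penalty * zeros


def hill_climb(X, z0, penalty=1.0):
    """Greedy search: repeatedly apply the single bit flip that most improves the score."""
    z = list(z0)
    best = block_score(X, z, penalty)
    while True:
        best_flip = None
        for i in range(len(z)):
            z[i] ^= 1                      # try flipping bit i
            s = block_score(X, z, penalty)
            z[i] ^= 1                      # undo the trial flip
            if s > best:
                best, best_flip = s, i
        if best_flip is None:              # local optimum: no flip improves the score
            return z, best
        z[best_flip] ^= 1                  # commit the best flip found


# Toy data: a 6 x 5 binary matrix with a planted 3 x 3 block of ones.
X = [[0] * 5 for _ in range(6)]
for r in (1, 2, 3):
    for c in (0, 1, 2):
        X[r][c] = 1

# Seed the search with one row and one column inside the planted block.
z0 = [0, 1, 0, 0, 0, 0] + [1, 0, 0, 0, 0]
z, score = hill_climb(X, z0)
```

Because the search only accepts strictly improving flips, it stops at a local optimum. The stochastic and population-based frameworks mentioned in the abstract would be ways to escape such optima, and they are not shown here.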
Pages: 12
Related papers
39 in total
  • [1] A multi-step approach to time series analysis and gene expression clustering
    Amato, R
    Ciaramella, A
    Deniskina, N
    Del Mondo, C
    di Bernardo, D
    Donalek, C
    Longo, G
    Mangano, G
    Miele, G
    Raiconi, G
    Staiano, A
    Tagliaferri, R
    [J]. BIOINFORMATICS, 2006, 22 (05) : 589 - 596
  • [2] Ayadi W, 2010, LECT N BIOINFORMAT, V6282, P219, DOI 10.1007/978-3-642-16001-1_19
  • [3] Chandana B. S., 2014, Int. J. Electr. Comput. Eng. (IJECE), V4, P923
  • [4] Cheng Y, 2000, Proc Int Conf Intell Syst Mol Biol, V8, P93
  • [5] Culhane A., 2019, R PACKAGE
  • [6] Biclustering of expression data with evolutionary computation
    Divina, F
    Aguilar-Ruiz, JS
    [J]. IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, 2006, 18 (05) : 590 - 602
  • [7] Biclustering: Overcoming Data Dimensionality Problems in Market Segmentation
    Dolnicar, Sara
    Kaiser, Sebastian
    Lazarevski, Katie
    Leisch, Friedrich
    [J]. JOURNAL OF TRAVEL RESEARCH, 2012, 51 (01) : 41 - 49
  • [8] Forero PA, 2019, INT CONF ACOUST SPEE, P3442, DOI 10.1109/ICASSP.2019.8683789
  • [9] Block clustering with Bernoulli mixture models: Comparison of different approaches
    Govaert, Gerard
    Nadif, Mohamed
    [J]. COMPUTATIONAL STATISTICS & DATA ANALYSIS, 2008, 52 (06) : 3233 - 3245
  • [10] iBBiG: iterative binary bi-clustering of gene sets
    Gusenleitner, Daniel
    Howe, Eleanor A.
    Bentink, Stefan
    Quackenbush, John
    Culhane, Aedin C.
    [J]. BIOINFORMATICS, 2012, 28 (19) : 2484 - 2492