Distributed non-negative matrix factorization with determination of the number of latent features

Cited by: 15
Authors
Chennupati, Gopinath [1 ]
Vangara, Raviteja [2 ]
Skau, Erik [1 ]
Djidjev, Hristo [1 ]
Alexandrov, Boian [2 ]
Affiliations
[1] LANL, Informat Sci CCS 3 Grp, Los Alamos, NM 87545 USA
[2] LANL, Theoret Div T 1 Grp, Los Alamos, NM 87545 USA
Keywords
NMF; Latent features; Distributed processing; Clustering; Parallel programming; Silhouette; Big data; STATISTICAL-INFERENCE; MUTATIONAL PROCESSES; ALGORITHM; IMPLEMENTATION; IDENTIFICATION; DECOMPOSITION; SEPARATION; SIGNATURES; MODEL;
DOI
10.1007/s11227-020-03181-6
Chinese Library Classification number
TP3 [Computing technology, computer technology];
Discipline classification code
0812 ;
Abstract
The holistic analysis and understanding of the latent (that is, not directly observable) variables and patterns buried in large datasets is crucial for data-driven science, decision making and emergency response. Such exploratory analyses require unsupervised learning methods for data mining and extraction of the latent features, and non-negative matrix factorization (NMF) is one of the most prominent of these methods. NMF is based on compute-intensive, non-convex constrained minimization, which, for large datasets, requires fast and distributed algorithms. However, current parallel implementations of NMF do not estimate the number of latent features. In practice, identifying these features is both difficult and significant for pattern recognition and latent feature analysis, especially for large dense matrices. In this paper, we introduce a distributed NMF algorithm coupled with distributed custom clustering followed by a stability analysis on dense data, which we call DnMFk, to determine the number of latent variables. The results on synthetic data and the classical Swimmer data set demonstrate the accuracy of model determination while scaling nearly linearly across multiple processors for large data. Further, we employ DnMFk to determine the number of hidden features from a terabyte matrix.
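The idea summarized in the abstract (repeat NMF for each candidate number of features, cluster the resulting factors, and judge the stability of the clusters with silhouettes) can be illustrated on a single machine. The following is a minimal sketch under stated assumptions, not the authors' DnMFk implementation: it is not distributed, it uses plain Lee-Seung multiplicative updates, it replaces the paper's custom clustering with a greedy cosine-similarity matching of factor columns, and the selection rule (largest k whose minimum silhouette passes a threshold) is an assumed simplification. All function names, the noise level, and the threshold are hypothetical.

```python
import numpy as np

def nmf(X, k, n_iter=200, seed=0, eps=1e-9):
    """Frobenius-norm NMF via Lee-Seung multiplicative updates."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W = rng.random((m, k)) + eps
    H = rng.random((k, n)) + eps
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

def unit_columns(W, eps=1e-12):
    """Scale every column to unit Euclidean norm."""
    return W / (np.linalg.norm(W, axis=0, keepdims=True) + eps)

def align_to_reference(W_ref, W):
    """Greedily permute the columns of W to match W_ref by cosine similarity.
    (Assumed stand-in for the custom clustering step.)"""
    sim = unit_columns(W_ref).T @ unit_columns(W)
    k = sim.shape[0]
    perm = np.empty(k, dtype=int)
    for _ in range(k):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        perm[i] = j
        sim[i, :] = -np.inf  # reference column i is taken
        sim[:, j] = -np.inf  # candidate column j is taken
    return W[:, perm]

def silhouette_cosine(points, labels):
    """Per-point silhouette widths with cosine distance; points are rows.
    Requires at least two distinct labels."""
    P = points / (np.linalg.norm(points, axis=1, keepdims=True) + 1e-12)
    D = np.clip(1.0 - P @ P.T, 0.0, None)
    clusters = np.unique(labels)
    s = np.zeros(len(labels))
    for i in range(len(labels)):
        same = labels == labels[i]
        same[i] = False
        if not same.any():
            continue  # singleton cluster: silhouette 0 by convention
        a = D[i, same].mean()
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        s[i] = (b - a) / max(a, b, 1e-12)
    return s

def stability_for_k(X, k, n_runs=8, noise=0.02, seed=0):
    """Run NMF on perturbed copies of X; return (min silhouette,
    mean silhouette, mean relative reconstruction error). Requires k >= 2."""
    rng = np.random.default_rng(seed)
    cols, labels, errs = [], [], []
    W_ref = None
    for r in range(n_runs):
        Xp = X * rng.uniform(1 - noise, 1 + noise, size=X.shape)
        W, H = nmf(Xp, k, seed=seed + r + 1)
        errs.append(np.linalg.norm(Xp - W @ H) / np.linalg.norm(Xp))
        if W_ref is None:
            W_ref = W
        else:
            W = align_to_reference(W_ref, W)
        cols.append(unit_columns(W).T)   # one row per factor column
        labels.append(np.arange(k))      # column j belongs to cluster j
    s = silhouette_cosine(np.vstack(cols), np.concatenate(labels))
    return float(s.min()), float(s.mean()), float(np.mean(errs))

def select_k(X, ks, sil_threshold=0.8, **kw):
    """Assumed rule: largest k whose minimum silhouette passes the threshold;
    falls back to the smallest candidate if none does."""
    best = min(ks)
    for k in sorted(ks):
        s_min, _, _ = stability_for_k(X, k, **kw)
        if s_min >= sil_threshold:
            best = k
    return best
```

A typical call would be `select_k(X, range(2, 8))` on a non-negative matrix `X`. In practice the reconstruction error returned by `stability_for_k` is worth inspecting alongside the silhouettes, since a stable but poorly fitting k is not a good choice either.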
Pages: 7458 - 7488
Number of pages: 31
Related papers
62 in total
  • [1] Nonnegative tensor decomposition with custom clustering for microphase separation of block copolymers
    Alexandrov, Boian S.
    Stanev, Valentin G.
    Vesselinov, Velimir V.
    Rasmussen, Kim O.
    [J]. STATISTICAL ANALYSIS AND DATA MINING, 2019, 12 (04) : 302 - 310
  • [2] Blind source separation for groundwater pressure analysis based on nonnegative matrix factorization
    Alexandrov, Boian S.
    Vesselinov, Velimir V.
    [J]. WATER RESOURCES RESEARCH, 2014, 50 (09) : 7332 - 7347
  • [3] Alexandrov BS, 2018, US Patent App, Patent No. [15/690,176, 15690176]
  • [4] Signatures of mutational processes in human cancer
    Alexandrov, Ludmil B.
    Nik-Zainal, Serena
    Wedge, David C.
    Aparicio, Samuel A. J. R.
    Behjati, Sam
    Biankin, Andrew V.
    Bignell, Graham R.
    Bolli, Niccolo
    Borg, Ake
    Borresen-Dale, Anne-Lise
    Boyault, Sandrine
    Burkhardt, Birgit
    Butler, Adam P.
    Caldas, Carlos
    Davies, Helen R.
    Desmedt, Christine
    Eils, Roland
    Eyfjord, Jorunn Erla
    Foekens, John A.
    Greaves, Mel
    Hosoda, Fumie
    Hutter, Barbara
    Ilicic, Tomislav
    Imbeaud, Sandrine
    Imielinski, Marcin
    Jaeger, Natalie
    Jones, David T. W.
    Jones, David
    Knappskog, Stian
    Kool, Marcel
    Lakhani, Sunil R.
    Lopez-Otin, Carlos
    Martin, Sancha
    Munshi, Nikhil C.
    Nakamura, Hiromi
    Northcott, Paul A.
    Pajic, Marina
    Papaemmanuil, Elli
    Paradiso, Angelo
    Pearson, John V.
    Puente, Xose S.
    Raine, Keiran
    Ramakrishna, Manasa
    Richardson, Andrea L.
    Richter, Julia
    Rosenstiel, Philip
    Schlesner, Matthias
    Schumacher, Ton N.
    Span, Paul N.
    Teague, Jon W.
    [J]. NATURE, 2013, 500 (7463) : 415 - +
  • [5] Deciphering Signatures of Mutational Processes Operative in Human Cancer
    Alexandrov, Ludmil B.
    Nik-Zainal, Serena
    Wedge, David C.
    Campbell, Peter J.
    Stratton, Michael R.
    [J]. CELL REPORTS, 2013, 3 (01) : 246 - 259
  • [6] Amari S, 1996, ADV NEUR IN, V8, P757
  • [7] [Anonymous], 2009, NONNEGATIVE MATRIX T
  • [8] [Anonymous], P 21 ACM SIGPLAN S P
  • [9] Unsupervised Learning
    Barlow, H. B.
    [J]. NEURAL COMPUTATION, 1989, 1 (03) : 295 - 311
  • [10] Battenberg Eric, 2009, ISMIR, P501