Binary matrix factorization for analyzing gene expression data

Authors
Zhong-Yuan Zhang
Tao Li
Chris Ding
Xian-Wen Ren
Xiang-Sun Zhang
Affiliations
[1] Central University of Finance and Economics, School of Statistics
[2] Florida International University, School of Computing and Information Sciences
[3] University of Texas, Department of Computer Science and Engineering
[4] Chinese Academy of Sciences, Academy of Mathematics and Systems Science
Source
Data Mining and Knowledge Discovery | 2010 / Vol. 20, No. 1
Keywords
Biclustering; Non-negative matrix factorization; Boundedness property of NMF; Binary matrix
DOI
Not available
Abstract
The advent of microarray technology enables us to monitor an entire genome on a single chip using a systematic approach. Clustering, a widely used data mining approach, has been applied to discover phenotypes from raw expression data. However, traditional clustering algorithms are limited because they cannot identify the substructures of samples and features hidden in the data. Unlike clustering, biclustering is a newer methodology for discovering genes that are highly related to a subset of samples. Several biclustering models and methods have been proposed and applied to clinical tumor diagnosis and pathological research. In this paper, we present a new biclustering model based on Binary Matrix Factorization (BMF), a new variant rooted in non-negative matrix factorization (NMF). We begin by proving a new boundedness property of NMF. We then present two algorithms that implement the model and compare them. We show that the microarray data biclustering problem can be formulated as a BMF problem and can be solved effectively using our proposed algorithms. Unlike greedy strategy-based algorithms, our proposed algorithms for BMF are more likely to find the global optimum. Experimental results on synthetic and real datasets demonstrate the advantages of BMF over existing biclustering methods. Besides its attractive clustering performance, BMF can generate sparse results (i.e., the number of genes/features involved in each bicluster is very small relative to the total number of genes/features), which is in accordance with common practice in molecular biology.
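The abstract states that biclustering can be cast as a BMF problem: approximate a data matrix X (genes by samples) as X ≈ WH with both W and H binary, so that each component pairs a gene subset with a sample subset. The following is a minimal Python sketch of that formulation only, assuming a simple route of plain NMF (Lee-Seung multiplicative updates), a per-component rescaling that leaves WH unchanged, and a fixed 0.5 threshold. It is not either of the two algorithms proposed in the paper; the helper names (`nmf`, `binarize`, `biclusters`, `bmf_error`), the threshold, and the toy matrix are hypothetical illustrations.

```python
import random


def matmul(A, B):
    """Multiply two dense matrices stored as lists of lists."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def nmf(X, k, iters=500, seed=0, eps=1e-9):
    """Plain NMF via Lee-Seung multiplicative updates for min ||X - WH||_F^2, W, H >= 0.

    Hypothetical helper: a generic NMF, not the paper's algorithm.
    """
    rng = random.Random(seed)
    n, m = len(X), len(X[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T X) / (W^T W H), elementwise
        Wt = transpose(W)
        num, den = matmul(Wt, X), matmul(matmul(Wt, W), H)
        H = [[H[r][c] * num[r][c] / (den[r][c] + eps) for c in range(m)] for r in range(k)]
        # W <- W * (X H^T) / (W H H^T), elementwise
        Ht = transpose(H)
        num, den = matmul(X, Ht), matmul(W, matmul(H, Ht))
        W = [[W[r][c] * num[r][c] / (den[r][c] + eps) for c in range(k)] for r in range(n)]
    return W, H


def binarize(W, H, threshold=0.5):
    """Hypothetical step: rescale each component so max(W[:, c]) == max(H[c, :]),
    which leaves the product WH unchanged, then threshold both factors to {0, 1}."""
    W = [row[:] for row in W]
    H = [row[:] for row in H]
    for c in range(len(H)):
        a, b = max(row[c] for row in W), max(H[c])
        if a > 0 and b > 0:
            s = (b / a) ** 0.5
            for row in W:
                row[c] *= s
            H[c] = [v / s for v in H[c]]
    Wb = [[1 if v >= threshold else 0 for v in row] for row in W]
    Hb = [[1 if v >= threshold else 0 for v in row] for row in H]
    return Wb, Hb


def biclusters(Wb, Hb):
    """Component c -> (genes i with Wb[i][c] == 1, samples j with Hb[c][j] == 1)."""
    return [
        ([i for i, row in enumerate(Wb) if row[c]], [j for j, v in enumerate(Hb[c]) if v])
        for c in range(len(Hb))
    ]


def bmf_error(X, Wb, Hb):
    """Squared Frobenius error ||X - Wb Hb||_F^2 of the binary factorization."""
    WH = matmul(Wb, Hb)
    return sum((x - y) ** 2 for xr, yr in zip(X, WH) for x, y in zip(xr, yr))


# Hypothetical toy 6 x 6 binary matrix (rows = genes, columns = samples)
# with two planted, non-overlapping blocks.
X = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
W, H = nmf(X, k=2)
Wb, Hb = binarize(W, H)
print(biclusters(Wb, Hb), bmf_error(X, Wb, Hb))
```

Each column of `Wb`, paired with the matching row of `Hb`, is read as one bicluster. Because multiplicative updates from a random start only reach a local optimum, this sketch does not guarantee that the planted blocks are recovered; it only illustrates how a binary factorization maps onto gene and sample subsets.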
Pages: 28-52
Page count: 25