Binary Matrix Factorization Discretization

Times cited: 0
Authors
Spyrides, Georges [1]
Poggi, Marcus [1]
Lopes, Helio [1]
Affiliations
[1] Pontificia Univ Catolica Rio de Janeiro, Rio De Janeiro, Brazil
Source
ARTIFICIAL INTELLIGENCE AND SOFT COMPUTING, ICAISC 2023, PT II | 2023 / Vol. 14126
Keywords
Binary matrix factorization; Gene Expression; Algorithms; GENE-EXPRESSION; ASSOCIATION; PSORIASIS; PATHWAYS; SURVIVAL; CANCER;
DOI
10.1007/978-3-031-42508-0_35
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Subject classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Binary Matrix Factorization can be used at the core of many data analysis pipelines. It is used for clustering items, for modeling categorical characteristics of observations, and in recommendation systems for users interacting with itemsets. The most common algorithms approximate the factorization through gradient descent. However, the results are only approximately binary. When thresholded, the reconstruction error is so high that the matrices are no longer representative of the original. Therefore, the analyst must always choose between precision and explainability. We achieved theoretical results that greatly improve solving the exact subproblem of this factorization. These results enable a backtracking approach that can solve the linearized formulation of the subproblem in large binary matrices, taking advantage of their sparsity in real settings. Finally, we test this new approach by post-processing matrices yielded by gradient descent algorithms, using the new backtracking to obtain truly binary factorized matrices with a diminished reconstruction error, close to the level of what gradient descent is capable of finding. We tested our algorithm using gene expression datasets and found an error rate comparable to that of the relaxed continuous problem before discretization. The discretized matrices allow domain experts to question biclusters of gene expressions and samples taken.
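The trade-off described in the abstract, between relaxed factors and their thresholded versions, can be illustrated with a minimal sketch. This is not the authors' backtracking method; it only shows naive thresholding and how the reconstruction error might be measured. The 0.5 cutoff, the arithmetic product W·H, the squared-error measure, the toy matrices, and the function names `threshold`, `matmul`, and `squared_error` are all assumptions made for illustration.

```python
def threshold(matrix, cutoff=0.5):
    """Round an approximately binary matrix to {0, 1} (cutoff is an assumption)."""
    return [[1 if value >= cutoff else 0 for value in row] for row in matrix]


def matmul(W, H):
    """Ordinary arithmetic product of W (n x k) and H (k x m), as nested lists."""
    rank, cols = len(H), len(H[0])
    return [[sum(W[i][k] * H[k][j] for k in range(rank)) for j in range(cols)]
            for i in range(len(W))]


def squared_error(X, W, H):
    """Sum of squared differences between X and the product W . H."""
    approx = matmul(W, H)
    return sum((x - a) ** 2
               for x_row, a_row in zip(X, approx)
               for x, a in zip(x_row, a_row))


# Toy binary data and hypothetical relaxed factors, such as a gradient
# descent method might return (values invented for illustration only).
X = [[1, 1, 0],
     [0, 1, 1],
     [1, 1, 1]]
W_relaxed = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4]]
H_relaxed = [[0.9, 0.8, 0.1], [0.1, 0.7, 0.9]]

# Naive discretization: threshold each factor independently.
W_binary = threshold(W_relaxed)
H_binary = threshold(H_relaxed)

relaxed_error = squared_error(X, W_relaxed, H_relaxed)
binary_error = squared_error(X, W_binary, H_binary)
print("relaxed error:", relaxed_error)
print("thresholded error:", binary_error)
```

Comparing the two printed errors on real data is one way to see how much representativeness is lost by thresholding, which is the gap the post-processing step in the abstract aims to reduce.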
Pages: 388-401
Number of pages: 14