Diffusion on PCA-UMAP Manifold: The Impact of Data Structure Preservation to Denoise High-Dimensional Single-Cell RNA Sequencing Data

Cited by: 4
Authors
Padron-Manrique, Cristian [1, 2]
Vazquez-Jimenez, Aaron [1]
Esquivel-Hernandez, Diego Armando [1]
Martinez-Lopez, Yoscelina Estrella [1, 3]
Neri-Rosario, Daniel [1, 4]
Giron-Villalobos, David [1, 4]
Mixcoha, Edgar [1, 5]
Sanchez-Castaneda, Jean Paul [1, 4]
Resendis-Antonio, Osbaldo [1, 6, 7]
Affiliations
[1] Inst Nacl Med Genomica INMEGEN, Human Syst Biol Lab, Arenal Tepepan, Perifer 4809, Mexico City 14610, Mexico
[2] Univ Nacl Autonoma Mexico, Programa Doctorado Ciencias Biomed, Coyoacan Unidad Posgrad, Edificio primer Piso B,Ciudad Univ, Mexico City 04510, Mexico
[3] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Doctorado Ciencias Med, Ciudad Univ,Edificio A,1er Piso, Mexico City 04510, Mexico
[4] Univ Nacl Autonoma Mexico, Unidad Posgrad, Programa Maestria Ciencias Bioquim, Ciudad Univ,Edificio B,1er Piso, Mexico City 04510, Mexico
[5] CONAHCYT INMEGEN, Periferico Sur 4809, Mexico City 14610, Mexico
[6] Inst Nacl Ciencias Med & Nutr Salvador Zubiran, Coordinac Invest Cient Red Apoyo Invest, Belisario Dominguez Seccion 16, Mexico City 14080, Mexico
[7] Univ Nacl Autonoma Mexico UNAM, Ctr Ciencias Complejidad, Circuito Ctr Cultural, Mexico City 04510, Mexico
Source
BIOLOGY-BASEL | 2024 / Vol. 13 / Issue 07
Keywords
manifold learning; UMAP; diffusion maps; scRNA-seq; imputation; denoising; high-dimensional data; ADHESION; HYPOXIA; DEATH;
DOI
10.3390/biology13070512
Chinese Library Classification number
Q [Biological Sciences];
Discipline classification code
07 ; 0710 ; 09 ;
Abstract
Simple Summary: In scRNA-seq analysis, diffusion-based approaches help identify the connections between cells, allowing us to observe the progression of individual cells as they change phenotypes within a mathematical space known as a manifold. Recently, these approaches have been used as a reference for imputation, a technique that addresses missing data, a common challenge in scRNA-seq analysis. For example, MAGIC is a popular diffusion-based imputation method that has succeeded in uncovering gene-gene interactions related to phenotypic transitions that could not be detected without imputation. However, previous evaluations have not adequately compared the impact of different parameter settings on MAGIC, particularly with respect to over-smoothing. To address this, we developed sc-PHENIX, which uses a diffusion approach similar to MAGIC's but incorporates a PCA-UMAP initialization step, whereas MAGIC uses only PCA. We compared sc-PHENIX and MAGIC in terms of imputation accuracy, visualization, biological insights, and preservation of data structure. Our findings show that sc-PHENIX outperforms MAGIC across various common parameters such as "diffusion time" (t), the number of nearest neighbors (knn), and the number of PCA dimensions. It effectively captures and preserves the global, local, and continuous data structures, leading to more reliable imputation and potentially uncovering new biological insights in diverse datasets.

Abstract: Single-cell transcriptomics (scRNA-seq) is revolutionizing biological research, yet it faces challenges such as inefficient transcript capture and noise. To address these challenges, methods like neighbor averaging or graph diffusion are used. These methods often rely on k-nearest neighbor graphs from low-dimensional manifolds. However, scRNA-seq data suffer from the 'curse of dimensionality', leading to the over-smoothing of data when using imputation methods. To overcome this, sc-PHENIX employs a PCA-UMAP diffusion method, which enhances the preservation of data structures and allows for a refined use of PCA dimensions and diffusion parameters (e.g., k-nearest neighbors, exponentiation of the Markov matrix) to minimize noise introduction. This approach enables a more accurate construction of the exponentiated Markov matrix (cell neighborhood graph), surpassing methods like MAGIC. sc-PHENIX significantly mitigates over-smoothing, as validated through various scRNA-seq datasets, demonstrating improved cell phenotype representation. Applied to a multicellular tumor spheroid dataset, sc-PHENIX identified known extreme phenotype states, showcasing its effectiveness. sc-PHENIX is open-source and available for use and modification.
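The diffusion step described in the abstract (a k-nearest-neighbor graph on a low-dimensional embedding, a Markov matrix, and its exponentiation by the diffusion time t) can be illustrated with a minimal NumPy sketch. This is not the sc-PHENIX or MAGIC implementation. The function names (`pca_embed`, `markov_matrix`, `diffuse`), the adaptive Gaussian kernel, and the default parameter values are assumptions made for illustration. The embedding `Z` passed to `diffuse` is plain PCA here; in a PCA-UMAP variant it would presumably be a UMAP embedding computed from the PCA coordinates.

```python
import numpy as np


def pca_embed(X, n_components):
    """Project cells (rows of X) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def markov_matrix(Z, knn):
    """Row-stochastic transition matrix from a kNN Gaussian kernel on Z.

    Assumption: adaptive bandwidth equal to the distance to the knn-th
    neighbor; requires knn < number of cells.
    """
    sq = np.sum(Z ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (Z @ Z.T), 0.0)
    np.fill_diagonal(d2, 0.0)
    d = np.sqrt(d2)
    # Column 0 of the sorted distances is the cell itself (distance 0).
    sigma = np.maximum(np.sort(d, axis=1)[:, knn], 1e-12)
    K = np.exp(-((d / sigma[:, None]) ** 2))
    # Keep only each cell's knn nearest neighbors (plus itself).
    idx = np.argsort(d, axis=1)[:, : knn + 1]
    mask = np.zeros(K.shape, dtype=bool)
    np.put_along_axis(mask, idx, True, axis=1)
    K = np.where(mask, K, 0.0)
    K = K + K.T  # symmetrize the affinity matrix
    return K / K.sum(axis=1, keepdims=True)


def diffuse(X, Z, knn=5, t=3):
    """Smooth expression X by t steps of diffusion on the graph built from Z."""
    M = markov_matrix(Z, knn)
    return np.linalg.matrix_power(M, t) @ X


# Toy data standing in for a cells-by-genes matrix (hypothetical values).
rng = np.random.default_rng(0)
X = rng.poisson(2.0, size=(60, 20)).astype(float)
Z = pca_embed(X, n_components=5)  # a UMAP embedding of Z could be used instead
X_imputed = diffuse(X, Z, knn=5, t=3)
```

Larger values of `t` or `knn` average each cell over a wider neighborhood, which is the over-smoothing trade-off the abstract refers to.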
Pages: 43