Single cell clustering based on cell-pair differentiability correlation and variance analysis

Cited by: 70
Authors
Jiang, Hao [1]
Sohn, Lydia L. [2]
Huang, Haiyan [3]
Chen, Luonan [4, 5]
Affiliations
[1] Renmin Univ China, Sch Informat, Dept Math, Beijing 100872, Peoples R China
[2] Univ Calif Berkeley, Dept Mech Engn, Berkeley, CA 94720 USA
[3] Univ Calif Berkeley, Dept Stat, Berkeley, CA 94720 USA
[4] Chinese Acad Sci, CAS Ctr Excellence Mol Cell Sci, Inst Biochem & Cell Biol, Key Lab Syst Biol, Shanghai Inst Biol Sci, Shanghai 200031, Peoples R China
[5] Chinese Acad Sci, CAS Ctr Excellence Anim Evolut & Genet, Kunming 650223, Yunnan, Peoples R China
Funding
National Natural Science Foundation of China; National Key Research and Development Program of China
Keywords
EMBRYONIC STEM-CELLS; RNA-SEQ; MICROENVIRONMENT; CHALLENGES; GENES;
DOI
10.1093/bioinformatics/bty390
Chinese Library Classification (CLC) number
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
Motivation: The rapid advancement of single-cell technologies has shed new light on the complex mechanisms of cellular heterogeneity. Identifying intercellular transcriptomic heterogeneity is one of the most critical tasks in single-cell RNA-sequencing studies. Results: We propose a new cell similarity measure based on cell-pair differentiability correlation, which is derived from gene differential patterns among all cell pairs. By plugging this new measure into the framework of hierarchical clustering, we further develop a variance-analysis-based clustering algorithm, 'Corr', that determines the cluster number automatically and identifies cell types accurately. The robustness and superiority of the proposed algorithm are assessed against representative algorithms, namely shared nearest neighbor (SNN)-Cliq and several other state-of-the-art clustering methods, on many benchmark or real single-cell RNA-sequencing datasets in terms of both internal criteria (cluster number and accuracy) and external criteria (purity, adjusted Rand index, F1-measure). Moreover, the differentiability vector with our new measure provides a new means of identifying potential biomarkers from cancer-related single-cell datasets, even with strong noise. Prognosis analyses on independent cancer datasets confirmed the effectiveness of our 'Corr' method.
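As an illustration of two of the external criteria named in the abstract, the following is a minimal sketch of purity and the adjusted Rand index using their standard textbook definitions. The function names `purity` and `adjusted_rand_index` are hypothetical and are not taken from the paper; the code says nothing about how the 'Corr' algorithm itself is implemented, and it assumes at least two cells.

```python
from collections import Counter
from math import comb


def purity(true_labels, pred_labels):
    """Fraction of cells that belong to the majority true type of their cluster."""
    clusters = {}
    for t, p in zip(true_labels, pred_labels):
        clusters.setdefault(p, Counter())[t] += 1
    return sum(max(c.values()) for c in clusters.values()) / len(true_labels)


def adjusted_rand_index(true_labels, pred_labels):
    """Adjusted Rand index (standard pair-counting definition); assumes n >= 2."""
    n = len(true_labels)
    # Pairs of cells that share both the true type and the predicted cluster.
    sum_cells = sum(comb(v, 2) for v in Counter(zip(true_labels, pred_labels)).values())
    # Pairs that share the true type, and pairs that share the predicted cluster.
    sum_true = sum(comb(v, 2) for v in Counter(true_labels).values())
    sum_pred = sum(comb(v, 2) for v in Counter(pred_labels).values())
    expected = sum_true * sum_pred / comb(n, 2)  # agreement expected by chance
    max_index = (sum_true + sum_pred) / 2
    if max_index == expected:
        # Degenerate case (e.g. both partitions are a single cluster).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


if __name__ == "__main__":
    # Toy example: six cells, two true types, one cell misassigned.
    true_types = [0, 0, 0, 1, 1, 1]
    clusters = [0, 0, 1, 1, 1, 1]
    print(purity(true_types, clusters))
    print(adjusted_rand_index(true_types, clusters))
```

Both scores are invariant to how cluster labels are numbered, so a clustering that matches the true cell types up to relabeling scores 1.0 on each.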
Pages: 3684-3694
Page count: 11
Related papers
32 in total
  • [1] SurvExpress: An Online Biomarker Validation Tool and Database for Cancer Gene Expression Data Using Survival Analysis
    Aguirre-Gamboa, Raul
    Gomez-Rueda, Hugo
    Martinez-Ledesma, Emmanuel
    Martinez-Torteya, Antonio
    Chacolla-Huaringa, Rafael
    Rodriguez-Barrientos, Alberto
    Tamez-Pena, Jose G.
    Trevino, Victor
    [J]. PLOS ONE, 2013, 8 (09):
  • [2] [Anonymous], SIAM INT C DATA MINI
  • [3] Beyer K, 1999, LECT NOTES COMPUT SC, V1540, P217
  • [4] Engineering cellular microenvironments to improve cell-based drug testing
    Bhadriraju, K
    Chen, CS
    [J]. DRUG DISCOVERY TODAY, 2002, 7 (11) : 612 - 620
  • [5] Cell fate inclination within 2-cell and 4-cell mouse embryos revealed by single-cell RNA sequencing
    Biase, Fernando H.
    Cao, Xiaoyi
    Zhong, Sheng
    [J]. GENOME RESEARCH, 2014, 24 (11) : 1787 - 1796
  • [6] Brennecke P, 2013, NAT METHODS, V10, P1093, DOI 10.1038/nmeth.2645
  • [7] Calinski T., 1974, Commun. Statist.-Theory Methods, V3, P1, DOI 10.1080/03610927408827101
  • [8] CLUSTER SEPARATION MEASURE
    DAVIES, DL
    BOULDIN, DW
    [J]. IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, 1979, 1 (02) : 224 - 227
  • [9] The promise of single-cell sequencing
    Eberwine, James
    Sul, Jai-Yoon
    Bartfai, Tamas
    Kim, Junhyong
    [J]. NATURE METHODS, 2014, 11 (01) : 25 - 27
  • [10] Human housekeeping genes, revisited
    Eisenberg, Eli
    Levanon, Erez Y.
    [J]. TRENDS IN GENETICS, 2013, 29 (10) : 569 - 574