scHFC: a hybrid fuzzy clustering method for single-cell RNA-seq data optimized by natural computation

Cited by: 10
Authors
Wang, Jing [2]
Xia, Junfeng [3, 4]
Tan, Dayu [3, 4]
Lin, Rongxin [2]
Su, Yansen [1]
Zheng, Chun-Hou [1]
Affiliations
[1] Anhui Univ, Sch Artificial Intelligence, Hefei 230601, Anhui, Peoples R China
[2] Anhui Univ, Sch Comp Sci & Technol, Hefei, Peoples R China
[3] Anhui Univ, Inst Phys Sci, Hefei, Peoples R China
[4] Anhui Univ, Inst Informat Technol, Hefei, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
scRNA-seq; cell clustering; fuzzy C Mean; Gath-Geva; natural computation; IMAGE SEGMENTATION; MIXTURE MODEL; EXPRESSION; ALGORITHM; HETEROGENEITY; INFORMATION;
DOI
10.1093/bib/bbab588
Chinese Library Classification
Q5 [Biochemistry]
Discipline classification codes
071010; 081704
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technology has allowed researchers to explore biological phenomena at the cellular scale. Clustering is a crucial step in studying cellular heterogeneity. Although many clustering methods have been proposed, massive dropout events and the curse of dimensionality still make scRNA-seq data difficult to analyze: they reduce the accuracy of clustering methods and lead to misidentification of cell types. In this work, we propose scHFC, a hybrid fuzzy clustering method optimized by natural computation and based on the Fuzzy C-Means (FCM) and Gath-Geva (GG) algorithms. Specifically, after the scRNA-seq data are preprocessed, principal component analysis is used to reduce their dimensionality. Then, the FCM algorithm, optimized by a simulated annealing algorithm and a genetic algorithm, clusters the data and outputs a membership matrix. This matrix represents the initial clustering result and serves as the input to the GG algorithm, which produces the final clustering results. We also develop a cluster number estimation method called multi-index comprehensive estimation, which estimates the number of clusters well by combining four clustering effectiveness indexes. The performance of scHFC is evaluated on 17 scRNA-seq datasets and compared with six state-of-the-art methods. Experimental results show that scHFC performs better in terms of clustering accuracy and algorithm stability. In short, scHFC is an effective method for clustering cells in scRNA-seq data, and it shows great potential for downstream analysis of scRNA-seq data. The source code is available at https://github.com/WJ319/scHFC.
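The first two stages described in the abstract (dimensionality reduction with PCA, then FCM producing a membership matrix) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation, which is available at the repository above. It uses plain FCM with random initialization and omits the simulated annealing and genetic algorithm optimization, the GG refinement, and the cluster number estimation. The function names `pca_reduce` and `fuzzy_c_means`, the fuzzifier `m=2.0`, and the synthetic toy data are assumptions made for illustration only.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project the rows of X onto the top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def fuzzy_c_means(X, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Plain FCM (no SA/GA optimization). Returns (memberships, centers)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], n_clusters))
    U /= U.sum(axis=1, keepdims=True)            # each row sums to 1
    for _ in range(n_iter):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        # Euclidean distance from every point to every center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)                 # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return U, centers

# Toy data (assumption): 60 synthetic "cells", 50 "genes", two separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 1.0, (30, 50)),
               rng.normal(5.0, 1.0, (30, 50))])

Z = pca_reduce(X, n_components=5)
U, centers = fuzzy_c_means(Z, n_clusters=2)
labels = U.argmax(axis=1)                        # hard labels from memberships
```

In the pipeline the abstract describes, the membership matrix `U` is not converted directly to hard labels as in the last line here; it is passed to the GG algorithm as its initial input.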
Pages: 13