Exploring high-throughput biomolecular data with multiobjective robust continuous clustering

Cited by: 3
Authors
Wang, Yunhe [1]
Wong, Ka-Chun [3]
Li, Xiangtao [2]
Affiliations
[1] Hebei Univ Technol, Sch Artificial Intelligence, Tianjin 300401, Peoples R China
[2] Jilin Univ, Sch Artificial Intelligence, Changchun 130012, Jilin, Peoples R China
[3] City Univ Hong Kong, Dept Comp Sci, Hong Kong, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Robust continuous clustering; Evolutionary clustering; Multiobjective optimization; Single-cell RNA-seq data; Many-objective optimization; Spatial weights matrix; RNA-seq data; Feature selection; Algorithm
DOI
10.1016/j.ins.2021.11.030
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
Clustering cell types from a large number of high-dimensional, heterogeneous cells is a vital step in analyzing single-cell RNA-seq data. Although several computational methods have been proposed to analyze such data, most are limited by high noise levels, high dimensionality, and poor generalization. To address these challenges, a multiobjective robust continuous clustering algorithm (MORCC) is presented to discriminate the different cell types in a single-cell RNA-seq dataset. First, a dimensionality reduction method is applied to map the high-dimensional heterogeneous cells into a desired low-dimensional space while preserving the features of the original space. Then, to overcome the instability of trial-and-error connectivity weights in robust continuous clustering, MORCC applies evolutionary operators to optimize the connectivity weights dynamically and selects suitable parameters using two cluster validity indices. To demonstrate the effectiveness of MORCC, we compare it to several state-of-the-art methods on six single-cell RNA-seq datasets, revealing its superior clustering ability from several perspectives. In addition, we carry out a parameter analysis, a case study, and visualization and biological interpretability analyses to validate MORCC's cell identification capability on single-cell RNA-seq data. © 2021 Elsevier Inc. All rights reserved.
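The abstract says parameters are selected with two cluster validity indices, which implies a multiobjective (Pareto) comparison of candidate settings. The sketch below illustrates only that general idea and is not MORCC itself. The record does not name the indices or the evolutionary operators, so the score pairs, the assumption that both indices are maximized, and the function names `dominates` and `pareto_front` are all hypothetical.

```python
from typing import List, Sequence, Tuple


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one (all objectives are maximized)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Tuple[float, float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical (index_1, index_2) scores for four candidate parameter settings.
# These numbers are made up for illustration; they are not results from the paper.
candidate_scores = [(0.62, 0.40), (0.55, 0.58), (0.50, 0.35), (0.70, 0.20)]

front = pareto_front(candidate_scores)
print(front)
```

Here the third candidate is dropped because the first one scores at least as well on both indices. The remaining candidates trade one index against the other, so a multiobjective method would typically keep all of them as the nondominated set.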
Pages: 239-265
Page count: 27