Realistic Cell Type Annotation and Discovery for Single-cell RNA-seq Data

Cited by: 0
Authors
Zhai, Yuyao [1]
Chen, Liang [4]
Deng, Minghua [1, 2, 3]
Affiliations
[1] Peking Univ, Sch Math Sci, Beijing, Peoples R China
[2] Peking Univ, Ctr Stat Sci, Beijing, Peoples R China
[3] Peking Univ, Ctr Quantitat Biol, Beijing, Peoples R China
[4] Huawei Technol Co Ltd, Shenzhen, Peoples R China
Source
PROCEEDINGS OF THE THIRTY-SECOND INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE, IJCAI 2023 | 2023
Keywords
DOI
Not available
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
The rapid development of single-cell RNA sequencing (scRNA-seq) technologies allows us to explore tissue heterogeneity at the cellular level. Cell annotation plays an essential role in much of the downstream analysis of scRNA-seq data. Existing methods usually classify the novel cells in target data as a single "unassigned" group and rarely discover the fine-grained cell type structure among them. In addition, these methods carry risks, such as susceptibility to batch effects between reference and target data, which further compromise the inherent discriminability of the target data. Considering these limitations, we propose a new and practical task called realistic cell type annotation and discovery for scRNA-seq data. In this task, cells from seen cell types are given class labels, while cells from novel cell types are given cluster labels. To tackle this problem, we propose an end-to-end algorithm called scPOT from the perspective of optimal transport (OT). Specifically, we first design an OT-based prototypical representation learning paradigm that encourages both global discrimination of clusters and local consistency of cells, in order to uncover the intrinsic structure of the target data. We then propose an unbalanced OT-based partial alignment strategy with statistical filling to detect the cells from seen cell types across reference and target data. Notably, scPOT also introduces an easy yet effective solution for automatically estimating the total number of cell types in the target data. Extensive results on our carefully designed evaluation benchmarks demonstrate the superiority of scPOT over various state-of-the-art clustering and annotation methods.
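The abstract does not spell out the OT computations, so the following is only a minimal sketch of the general idea behind assigning cells to prototypes with entropy-regularized optimal transport. It uses the standard balanced Sinkhorn iteration, not scPOT's actual implementation, and it does not cover the unbalanced partial alignment or the statistical filling step. The function name `sinkhorn_plan`, the regularization strength `epsilon`, the iteration count, and the random toy similarity scores are all assumptions made for illustration.

```python
import math
import random


def sinkhorn_plan(scores, epsilon=0.1, n_iters=100):
    """Entropic OT plan between cells (rows) and prototypes (columns).

    Both marginals are uniform (an assumption of this sketch), so every
    prototype ends up receiving the same total mass.
    """
    n, k = len(scores), len(scores[0])
    top = max(max(row) for row in scores)
    # Gibbs kernel; subtracting the maximum keeps exp() from overflowing.
    kernel = [[math.exp((s - top) / epsilon) for s in row] for row in scores]
    u = [1.0] * n
    v = [1.0] * k
    for _ in range(n_iters):
        # Rescale rows towards mass 1/n, then columns towards mass 1/k.
        u = [(1.0 / n) / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [(1.0 / k) / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]


# Toy data (hypothetical): similarity of 12 cells to 3 prototypes.
random.seed(0)
scores = [[random.uniform(-1.0, 1.0) for _ in range(3)] for _ in range(12)]

plan = sinkhorn_plan(scores)
# Total mass transported to each prototype.
col_mass = [sum(row[j] for row in plan) for j in range(3)]
# Hard cluster label per cell: the prototype receiving most of its mass.
labels = [max(range(3), key=lambda j: row[j]) for row in plan]
```

The uniform column marginal is what discourages all cells from collapsing onto a single prototype. An unbalanced formulation would relax these marginal constraints, which is the property that makes partial alignment between reference and target data possible.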
Pages: 4967-4974
Page count: 8