GLOBE: a contrastive learning-based framework for integrating single-cell transcriptome datasets

Cited by: 12
Authors
Yan, Xuhua [1]
Zheng, Ruiqing [1]
Li, Min [1]
Affiliations
[1] Central South University, School of Computer Science and Engineering, Changsha 410083, People's Republic of China
Keywords
scRNA-seq; datasets integration; batch effect; contrastive learning
DOI
10.1093/bib/bbac311
Chinese Library Classification number
Q5 [Biochemistry]
Discipline classification code
071010; 081704
Abstract
Integration of single-cell transcriptome datasets from multiple sources plays an important role in investigating complex biological systems. The key to integrating transcriptome datasets is batch effect removal. Recent methods apply a contrastive learning strategy to correct batch effects. Despite their encouraging performance, the optimal contrastive learning framework for batch effect removal is still under exploration. We develop an improved contrastive learning-based batch correction framework, GLOBE. GLOBE defines adaptive translation transformations for each cell to guarantee the stability of approximating batch effects. To enhance the consistency of representation alignment, GLOBE uses a loss function that is both hardness-aware and consistency-aware to learn batch effect-invariant representations. Moreover, GLOBE computes a batch-corrected gene matrix in a transparent manner to support diverse downstream analyses. Benchmarking results on a wide spectrum of datasets show that GLOBE outperforms other state-of-the-art methods in terms of robust batch mixing and superior conservation of biological signals. We further apply GLOBE to integrate two developing mouse neocortex datasets and show that GLOBE succeeds in removing batch effects while preserving the contiguous structure of cells in the raw data. Finally, a comprehensive study is conducted to validate the effectiveness of GLOBE.
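
The abstract does not give GLOBE's loss function, so the following is only a minimal sketch of the general idea behind contrastive batch correction, using the generic InfoNCE loss rather than GLOBE's hardness-aware and consistency-aware loss. The function name `info_nce_loss`, the `temperature` value, and the toy data (a constant shift standing in for a batch effect) are all assumptions made for illustration.

```python
import numpy as np

def info_nce_loss(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss (NOT GLOBE's loss; illustrative assumption).

    Row i of `positives` is treated as the positive view of row i of
    `anchors`; every other row acts as a negative.
    """
    # L2-normalize so that dot products are cosine similarities.
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = a @ p.T / temperature  # shape (n, n)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matching pair sits on the diagonal.
    return float(-np.mean(np.diag(log_prob)))

# Toy data: 8 hypothetical cell embeddings and a copy shifted by a small
# constant translation, standing in for the same cells seen in another batch.
rng = np.random.default_rng(0)
cells = rng.normal(size=(8, 16))
batch_shift = 0.1 * rng.normal(size=(1, 16))

aligned = info_nce_loss(cells, cells + batch_shift)
shuffled = info_nce_loss(cells, (cells + batch_shift)[::-1])
print(aligned < shuffled)
```

Correctly paired views give a lower loss than mismatched ones, which is the signal an encoder would be trained on to pull the same cell's views together regardless of batch.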
Pages: 11