Embedding to reference t-SNE space addresses batch effects in single-cell classification

Authors
Pavlin G. Poličar
Martin Stražar
Blaž Zupan
Affiliations
[1] University of Ljubljana, Faculty of Computer and Information Science
[2] Baylor College of Medicine
Source
Machine Learning | 2023 / Vol. 112
Keywords
Batch effects; Embedding; t-SNE; Visualization; Single-cell transcriptomics; Data integration; Domain adaptation
DOI
Not available
Abstract
Dimensionality reduction techniques, such as t-SNE, can construct informative visualizations of high-dimensional data. When multiple data sets are visualized jointly, however, a straightforward application of these methods often fails: instead of revealing the underlying classes, the resulting visualizations expose dataset-specific clusters. To circumvent these batch effects, we propose an embedding procedure that uses a t-SNE visualization constructed on a reference data set as a scaffold for embedding new data points. Each instance from a new, unseen, secondary data set is embedded independently and does not change the reference embedding. This prevents any interactions between instances in the secondary data and implicitly mitigates batch effects. We demonstrate the utility of this approach by analyzing six recently published single-cell gene expression data sets with up to tens of thousands of cells and thousands of genes. The batch effects in our studies are particularly strong, as the data come from different institutions that used different experimental protocols. The visualizations constructed by our proposed approach are free of batch effects, and the cells from secondary data sets correctly co-cluster with cells of the same type from the primary data. We also show that the predictive power of our simple, visual classification approach in t-SNE space matches the accuracy of specialized machine learning techniques that consider the entire compendium of features that profile single cells.
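The embedding idea summarized above can be illustrated with a small sketch. This is not the authors' implementation; it assumes standard t-SNE ingredients: Gaussian input affinities whose bandwidth is calibrated to a perplexity, and a Student-t kernel in the embedding. Each new point is initialized at the median embedding position of its nearest reference neighbors (an assumption for this sketch), and only that point's 2-D coordinates are optimized by plain gradient descent on KL(P || Q), with the reference embedding held fixed. The helper names (`affinities_to_reference`, `embed_point`, `knn_classify`) and the toy data are hypothetical. A k-nearest-neighbor vote in the 2-D embedding stands in for the "visual classification" mentioned in the abstract.

```python
import numpy as np


def affinities_to_reference(x, X_ref, perplexity=30.0, tol=1e-5, max_iter=100):
    """Gaussian affinities p_j between ONE new point and every reference point.

    The precision beta = 1 / (2 sigma^2) is found by binary search so that the
    entropy of p equals log(perplexity), as in standard t-SNE.
    """
    d2 = np.sum((X_ref - x) ** 2, axis=1)
    d2 = d2 - d2.min()  # shift for numerical stability; normalized p is unchanged
    target = np.log(perplexity)
    lo, hi, beta = 0.0, np.inf, 1.0
    for _ in range(max_iter):
        p = np.exp(-beta * d2)
        p /= p.sum()
        entropy = -np.sum(p * np.log(p + 1e-12))
        if abs(entropy - target) < tol:
            break
        if entropy > target:  # kernel too wide: increase beta
            lo = beta
            beta = beta * 2 if hi == np.inf else (lo + hi) / 2
        else:  # kernel too narrow: decrease beta
            hi = beta
            beta = (lo + hi) / 2
    return p


def embed_point(x, X_ref, Y_ref, perplexity=30.0, k_init=10, n_iter=250, lr=0.5):
    """Place one new point into a FIXED reference embedding Y_ref.

    Minimizes KL(P || Q) over the new point's position y only, where
    q_j = w_j / sum_k w_k and w_j = 1 / (1 + ||y - y_j||^2).
    Gradient: dKL/dy = 2 * sum_j (p_j - q_j) * w_j * (y - y_j).
    """
    p = affinities_to_reference(x, X_ref, perplexity)
    # Assumed initialization: median 2-D position of the k nearest reference points
    nearest = np.argsort(np.sum((X_ref - x) ** 2, axis=1))[:k_init]
    y = np.median(Y_ref[nearest], axis=0)
    for _ in range(n_iter):
        diff = y - Y_ref                              # shape (n_ref, 2)
        w = 1.0 / (1.0 + np.sum(diff ** 2, axis=1))   # Student-t kernel
        q = w / w.sum()
        grad = 2.0 * np.sum(((p - q) * w)[:, None] * diff, axis=0)
        y = y - lr * grad                             # Y_ref is never updated
    return y


def knn_classify(y, Y_ref, labels_ref, k=10):
    """Majority vote among the k nearest reference points in the 2-D embedding."""
    nearest = np.argsort(np.sum((Y_ref - y) ** 2, axis=1))[:k]
    values, counts = np.unique(labels_ref[nearest], return_counts=True)
    return values[np.argmax(counts)]


# Hypothetical toy data: two "cell types" in 50-D with an assumed precomputed 2-D layout
rng = np.random.default_rng(0)
X_ref = np.vstack([rng.normal(0, 1, (30, 50)), rng.normal(5, 1, (30, 50))])
Y_ref = np.vstack([rng.normal([-10, 0], 1, (30, 2)), rng.normal([10, 0], 1, (30, 2))])
labels_ref = np.array(["A"] * 30 + ["B"] * 30)

x_new = rng.normal(5, 1, 50)  # a "secondary" cell resembling type B
y_new = embed_point(x_new, X_ref, Y_ref, perplexity=10)
print(y_new, knn_classify(y_new, Y_ref, labels_ref))
```

Because `Y_ref` is held fixed and each call sees only a single new point, secondary cells cannot attract one another into a dataset-specific cluster; this is the property the abstract credits with mitigating batch effects. Embedding many new points is then just a loop (or a vectorized batch) of independent `embed_point` calls.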
Pages: 721-740
Number of pages: 19