Revised Conditional t-SNE: Looking Beyond the Nearest Neighbors

Cited by: 2
Authors
Heiter, Edith [1]
Kang, Bo [1]
Seurinck, Ruth [1,2]
Lijffijt, Jefrey [1]
Affiliations
[1] Univ Ghent, Ghent, Belgium
[2] VIB Ctr Inflammat Res, Ghent, Belgium
Source
ADVANCES IN INTELLIGENT DATA ANALYSIS XXI, IDA 2023 | 2023 / Vol. 13876
Funding
European Union Horizon 2020; European Research Council
DOI
10.1007/978-3-031-30047-9_14
Chinese Library Classification
TP18 [Artificial Intelligence Theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Conditional t-SNE (ct-SNE) is a recent extension to t-SNE that allows removal of known cluster information from the embedding, to obtain a visualization revealing structure beyond label information. This is useful, for example, when one wants to factor out unwanted differences between a set of classes. We show that ct-SNE fails in many realistic settings, namely if the data is well clustered over the labels in the original high-dimensional space. We introduce a revised method by conditioning the high-dimensional similarities instead of the low-dimensional similarities and storing within- and across-label nearest neighbors separately. This also enables the use of recently proposed speedups for t-SNE, improving the scalability. From experiments on synthetic data, we find that our proposed method resolves the considered problems and improves the embedding quality. On real data containing batch effects, the expected improvement is not always there. We argue revised ct-SNE is preferable overall, given its improved scalability. The results also highlight new open questions, such as how to handle distance variations between clusters.
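As a rough illustration of the "storing within- and across-label nearest neighbors separately" step mentioned in the abstract, the sketch below splits each point's neighbors by label using brute-force distances in NumPy. It is not the authors' implementation, which this record does not describe: the function name `split_neighbors`, the parameter `k`, and the brute-force search are all assumptions made for illustration, and the conditioning of the similarities themselves is not shown.

```python
import numpy as np


def split_neighbors(X, labels, k):
    """Return (within, across): for each point, the indices of its k nearest
    same-label neighbors and its k nearest different-label neighbors.

    Hypothetical helper for illustration only; brute force, O(n^2) memory.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    n = X.shape[0]

    # True where two points share a label (the diagonal is True as well).
    same = labels[:, None] == labels[None, :]
    counts = same.sum(axis=1)  # same-label points per row, including self
    if (counts - 1).min() < k or (n - counts).min() < k:
        raise ValueError("every point needs at least k neighbors of each kind")

    # Pairwise squared Euclidean distances.
    diff = X[:, None, :] - X[None, :, :]
    d2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(d2, np.inf)  # a point is never its own neighbor

    # Mask out the other group with +inf, then keep the k smallest per row.
    within = np.argsort(np.where(same, d2, np.inf), axis=1)[:, :k]
    across = np.argsort(np.where(~same, d2, np.inf), axis=1)[:, :k]
    return within, across
```

Keeping the two neighbor lists apart means a point retains some across-label neighbors even when the labels are well separated in the original space, which is the setting the abstract identifies as problematic. In practice an approximate nearest-neighbor index would replace the brute-force distance matrix.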
Pages: 169-181
Page count: 13