Getting over High-Dimensionality: How Multidimensional Projection Methods Can Assist Data Science

Cited by: 7
Authors
Ortigossa, Evandro S. [1]
Dias, Fabio Felix [1]
Carvalho do Nascimento, Diego [2]
Affiliations
[1] Univ Sao Paulo, Inst Math & Comp Sci, BR-13566590 Sao Carlos, Brazil
[2] Univ Atacama, Fac Ingn, Dept Matemat, Copiapo 1530000, Chile
Source
APPLIED SCIENCES-BASEL | 2022 / Volume 12 / Issue 13
Keywords
high-dimensional data; dimensionality reduction; multidimensional scaling; artificial intelligence; information visualization; DATA VISUALIZATION; PERCEPTION; REDUCTION; SELECTION;
DOI
10.3390/app12136799
Chinese Library Classification number
O6 [Chemistry];
Discipline classification code
0703;
Abstract
The exploration and analysis of multidimensional data can be complex tasks, requiring sophisticated tools able to transform large amounts of data bearing multiple parameters into helpful information. Multidimensional projection techniques are powerful tools for transforming multidimensional data into visual information according to similarity features. Integrating this class of methods into a framework devoted to data science can contribute to generating more expressive means of visual analytics. Although Principal Component Analysis (PCA) is a well-known method in this context, it is not the only one, and its abilities and limitations are sometimes not adequately discussed or taken into consideration by users. Therefore, in-depth knowledge of multidimensional projection techniques, their strengths, and the possible distortions they can create is of significant importance for researchers developing knowledge-discovery systems. This research presents a comprehensive overview of current state-of-the-art multidimensional projection techniques and shows example code in the Python and R languages, all available on the internet. The survey segment discusses the different types of techniques applied to multidimensional projection tasks in terms of their background, application processes, capabilities, and limitations, opening up the internal processes of the methods and demystifying their concepts. We also illustrate two problems, one from a genetic experiment (supervised) and one from text mining (unsupervised), presenting solutions through the application of multidimensional projection. Finally, we present elements that reflect the competitiveness of multidimensional projection techniques for high-dimensional data visualization, commonly needed in data science solutions.
Pages: 36