LoCH: A neighborhood-based multidimensional projection technique for high-dimensional sparse spaces

被引:18
作者
Fadel, Samuel G. [1 ]
Fatore, Francisco M. [1 ]
Duarte, Felipe S. L. G. [1 ]
Paulovich, Fernando V. [1 ]
机构
[1] Univ Sao Paulo, Inst Ciencias Matemat & Comp, Sao Carlos, SP, Brazil
基金
巴西圣保罗研究基金会;
关键词
Local multidimensional projection; High-dimensional sparse space; Visual data mining; EXPLORATION; ALGORITHM;
D O I
10.1016/j.neucom.2014.07.071
中图分类号
TP18 [人工智能理论];
学科分类号
081104 ; 0812 ; 0835 ; 1405 ;
摘要
On the last few years multidimensional projection techniques have advanced towards defining faster and user-centered approaches. However, most of existing methods are designed as generic tools without considering particular features of the data under processing, such as the distance distribution when the data is embedded into a certain metric space. In this paper we split the projection techniques into two groups, global and local techniques, conduct an analysis of them, and present a novel local technique specially designed for projecting heavy tail distance distributions, such as the one produced by high-dimensional sparse spaces. This novel approach, called Local Convex Hull (LoCH), relies on an iterative process that seeks to place each point close to the convex hull of its nearest neighbors. The accuracy, in terms of neighborhood preservation, is confirmed by a set of comparisons and tests, showing that LoCH is capable of successfully segregating groups of similar instances embedded in high-dimensional sparse spaces and of defining the borders between them, significantly better than most projection techniques. (C) 2014 Elsevier B.V. All rights reserved.
引用
收藏
页码:546 / 556
页数:11
相关论文
共 48 条
[1]   Database-friendly random projections: Johnson-Lindenstrauss with binary coins [J].
Achlioptas, D .
JOURNAL OF COMPUTER AND SYSTEM SCIENCES, 2003, 66 (04) :671-687
[2]   Near-optimal hashing algorithms for approximate nearest neighbor in high dimensions [J].
Andoni, Alexandr ;
Indyk, Piotr .
COMMUNICATIONS OF THE ACM, 2008, 51 (01) :117-122
[3]  
Andrews K., 2002, Information Visualization, V1, P166, DOI 10.1057/palgrave.ivs.9500023
[4]  
[Anonymous], 2004, Technical Report
[5]  
Asuncion Arthur, 2007, UCI machine learning repository
[6]  
Beygelzimer A., 2006, ICML, DOI 10.1145/1143844.1143857
[7]   Large-Scale Machine Learning with Stochastic Gradient Descent [J].
Bottou, Leon .
COMPSTAT'2010: 19TH INTERNATIONAL CONFERENCE ON COMPUTATIONAL STATISTICS, 2010, :177-186
[8]  
Brandes U, 2007, LECT NOTES COMPUT SC, V4372, P42
[9]   A linear iteration time layout algorithm for visualising high-dimensional data [J].
Chalmers, M .
VISUALIZATION '96, PROCEEDINGS, 1996, :127-+
[10]   Searching in metric spaces [J].
Chávez, E ;
Navarro, G ;
BaezaYates, R ;
Marroquín, JL .
ACM COMPUTING SURVEYS, 2001, 33 (03) :273-321