Detecting Spatio-Temporal Outliers with Kernels and Statistical Testing

Cited by: 0
Authors
Rogers, James P. [1 ]
Barbara, Daniel [2 ]
Domeniconi, Carlotta [2 ]
Affiliations
[1] USA, Engn Res & Dev Ctr, 7701 Telegraph Rd, Alexandria, VA 22315 USA
[2] George Mason Univ, Dept CS, MSN 4A5, Fairfax, VA 22030 USA
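Source
2009 17TH INTERNATIONAL CONFERENCE ON GEOINFORMATICS, VOLS 1 AND 2 | 2009
Keywords
DOI
Not available
Chinese Library Classification
TP [Automation technology, computer technology];
Discipline classification code
0812 ;
Abstract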
Outlier detection is the discovery of points that are exceptional when compared with a set of observations that are considered normal. Such points are important since they often lead to the discovery of exceptional events. In spatio-temporal data, observations are vectors of feature values, tagged with a geographical location and a timestamp. A spatio-temporal outlier is an observation whose attribute values are significantly different from those of other spatially and temporally referenced objects in a spatio-temporal neighborhood. It represents an object that is significantly different from its neighbors, even though it may not be significantly different from the entire population. The discovery of outliers in spatio-temporal data is therefore complicated by the fact that one needs to focus the search on appropriate spatio-temporal neighborhoods of points. The work in this paper leverages an algorithm, StrOUD (Strangeness-based Outlier Detection algorithm), that has been developed and used by the authors to detect outliers in various scenarios (including vector spaces and non-vectorial data). StrOUD uses a measure of strangeness to categorize an observation, and compares the strangeness of a point with the distribution of strangeness of a set of baseline observations (which are assumed to be mostly from normal points). Using statistical testing, StrOUD determines whether the point is an outlier. The technique described in this paper defines strangeness as the sum of distances to nearest neighbors, where the distance between two observations is computed as a weighted combination of the distance between their vectors of features, their geographical distance, and their temporal distance. Using this multi-modal distance measure (hereafter called a kernel), our technique is able to diagnose outliers with respect to spatio-temporal neighborhoods.
We show how our approach is capable of determining outliers in real-life data, including crime data, and a set of observations collected by buoys in the Gulf of Mexico during the 2005 hurricane season. We show that the use of different weightings on the kernel distances allows the user to adapt the size of spatio-temporal neighborhoods.
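The ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract only states that strangeness is a sum of distances to nearest neighbors, that the distance is a weighted combination of feature, geographical, and temporal distances, and that a statistical test against baseline strangeness values decides outlier status. Everything else here is an assumption made for illustration: the function names, the Euclidean feature and planar geographic distances, the number of neighbors `k`, the leave-one-out baseline, the rank-based p-value, and the threshold `alpha`.

```python
import math

def combined_distance(a, b, w_feat=1.0, w_geo=1.0, w_time=1.0):
    """Weighted combination of feature, geographic, and temporal distance.

    Each observation is a tuple (features, (x, y), t). Euclidean and
    planar distances are assumptions; the abstract does not fix them.
    """
    feat_a, loc_a, t_a = a
    feat_b, loc_b, t_b = b
    d_feat = math.dist(feat_a, feat_b)
    d_geo = math.dist(loc_a, loc_b)
    d_time = abs(t_a - t_b)
    return w_feat * d_feat + w_geo * d_geo + w_time * d_time

def strangeness(point, others, k, **weights):
    """Sum of the combined distances from point to its k nearest neighbors."""
    dists = sorted(combined_distance(point, o, **weights) for o in others)
    return sum(dists[:k])

def p_value(point, baseline, k=3, **weights):
    """Rank-based p-value of the point's strangeness (assumed form of the test).

    Baseline strangeness is computed leave-one-out, so a baseline point
    is never counted as its own neighbor.
    """
    base = [
        strangeness(b, baseline[:i] + baseline[i + 1:], k, **weights)
        for i, b in enumerate(baseline)
    ]
    s = strangeness(point, baseline, k, **weights)
    at_least_as_strange = sum(1 for v in base if v >= s)
    return (at_least_as_strange + 1) / (len(base) + 1)

def is_outlier(point, baseline, k=3, alpha=0.05, **weights):
    """Flag the point when its p-value falls below the chosen threshold."""
    return p_value(point, baseline, k, **weights) < alpha
```

In this sketch, raising `w_geo` or `w_time` relative to `w_feat` makes distant or old observations count as farther away, which restricts the effective neighborhood to observations that are close in space or time; lowering them lets the feature values dominate the comparison.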
Pages: 639 / +
Page count: 2