Statistically-Robust Clustering Techniques for Mapping Spatial Hotspots: A Survey

Cited by: 20
Authors
Xie, Yiqun [1]
Shekhar, Shashi [2]
Li, Yan [2]
Affiliations
[1] Univ Maryland, Ctr Geospatial Informat Sci, 7251 Preinkert Dr, College Pk, MD 20742 USA
[2] Univ Minnesota, Dept Comp Sci, 200 Union St SE, Minneapolis, MN 55455 USA
Funding
National Science Foundation (USA);
Keywords
Hotspot; mapping; clustering; statistical rigor; scan statistics; FAST SUBSET SCAN; EVENT DETECTION; HOT-SPOTS; ALGORITHM; INFERENCE; APPROXIMATIONS; BIODIVERSITY; AGRICULTURE; FRAMEWORK; NETWORKS;
DOI
10.1145/3487893
Chinese Library Classification number
TP301 [Theory, Methods];
Discipline classification code
081202;
Abstract
Mapping of spatial hotspots, i.e., regions with significantly higher rates of generating cases of certain events (e.g., disease or crime cases), is an important task in diverse societal domains, including public health, public safety, transportation, agriculture, and environmental science. Clustering techniques required by these domains differ from traditional clustering methods due to the high economic and social costs of spurious results (e.g., false alarms of crime clusters). As a result, statistical rigor is needed to explicitly control the rate of spurious detections. To address this challenge, techniques for statistically-robust clustering (e.g., scan statistics) have been extensively studied by the data mining and statistics communities. In this survey, we present an up-to-date and detailed review of the models and algorithms developed in this field. We first present a general taxonomy for statistically-robust clustering, covering the key steps of data and statistical modeling, region enumeration and maximization, and significance testing. We further discuss different paradigms and methods within each of the key steps. Finally, we highlight research gaps and potential future directions, which may serve as a stepping stone for generating new ideas in this growing field and beyond.
Pages: 38