Weighted Machine Learning for Spatial-Temporal Data

Cited by: 13

Authors:

Hashemi, Mahdi [1]

Karimi, Hassan A. [2]

Affiliations:

[1] George Mason Univ, Dept Informat Sci & Technol, Fairfax, VA 22030 USA

[2] Univ Pittsburgh, Sch Comp & Informat, Pittsburgh, PA 15213 USA

Source:

IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING | 2020 / Vol. 13

Keywords:

Machine learning; Training; Correlation; Support vector machines; Kernel; Spatial databases; Bandwidth; Analytical learning; autocorrelation; inductive learning; machine learning; spatial data; temporal data; ALGORITHMS; CLASSIFICATION; MODEL; PATTERNS; SENSOR; TIME;

DOI:

10.1109/JSTARS.2020.2995834

Chinese Library Classification (CLC) number:

TM [Electrical engineering]; TN [Electronics and communication technology];

Discipline classification codes:

0808 ; 0809 ;

Abstract:

Applying machine learning techniques to spatial-temporal data raises the question of how the recorded location and time of training samples should contribute to the training and testing process. Prior knowledge of how spatial-temporal phenomena are autocorrelated cannot be properly captured by machine learning techniques, which either ignore location and time altogether or treat them as input features. Moreover, the latter approach increases the sparseness of data in the feature space and adds free parameters to the predictor, and thus demands larger training datasets. We use prior knowledge about spatial-temporal autocorrelation to determine how relevant each training sample is, given its spatial and temporal distances to the irresponsive (unlabeled) sample. Weighted machine learning techniques use this prior knowledge by taking the relevance of each training sample to the irresponsive sample as that training sample's weight. The proposed approach overcomes the aforementioned issues by enriching the training process with prior knowledge about spatial-temporal autocorrelation. Because the spatial-temporal weight of a training sample depends on the irresponsive sample's location and time, the machine needs to be trained separately for each irresponsive sample. However, we show that in practice, using only a small subset of training samples with the largest spatial-temporal weights not only reduces the training time but also yields the best accuracy in most cases.
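
The weighting idea in the abstract can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth parameters `h_space` and `h_time`, the subset size `k`, the weighted least-squares predictor, and the function names `st_weights` and `predict_one` are all hypothetical choices made for this example. The abstract only states that weights derive from spatial and temporal distances to the unlabeled sample, and that a separate model is trained for each unlabeled sample on the highest-weighted subset.

```python
import numpy as np

def st_weights(locs, times, q_loc, q_time, h_space, h_time):
    """Relevance of each training sample to one query (unlabeled) sample.

    Assumption: a product of Gaussian kernels over spatial and temporal
    distance; the record does not specify the kernel actually used.
    """
    d_space = np.linalg.norm(locs - q_loc, axis=1)  # spatial distance
    d_time = np.abs(times - q_time)                 # temporal distance
    return np.exp(-0.5 * (d_space / h_space) ** 2) * np.exp(-0.5 * (d_time / h_time) ** 2)

def predict_one(X, y, locs, times, x_q, q_loc, q_time,
                h_space=1.0, h_time=1.0, k=50):
    """Train a weighted linear model for a single query and predict its label."""
    w = st_weights(locs, times, q_loc, q_time, h_space, h_time)
    k = min(k, len(w))
    top = np.argsort(w)[-k:]                        # k samples with largest weights
    Xk = np.column_stack([np.ones(k), X[top]])      # add intercept column
    sw = np.sqrt(w[top])                            # weighted least squares via row scaling
    beta, *_ = np.linalg.lstsq(Xk * sw[:, None], y[top] * sw, rcond=None)
    return float(np.concatenate(([1.0], x_q)) @ beta)

# Synthetic usage example: features, labels, locations, and timestamps are made up.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2.0 * X[:, 0] + 1.0
locs = rng.uniform(0, 10, size=(200, 2))
times = rng.uniform(0, 10, size=200)
pred = predict_one(X, y, locs, times,
                   x_q=np.array([3.0]), q_loc=np.array([5.0, 5.0]), q_time=5.0,
                   h_space=2.0, h_time=2.0, k=20)
```

Because the weights depend on the query's location and time, `predict_one` refits the model on every call; restricting the fit to the `k` highest-weighted samples is what keeps that per-query cost manageable.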


Pages: 3066-3082

Page count: 17

