Inferring Ongoing Human Activities Based on Recurrent Self-Organizing Map Trajectory

Cited by: 1
Authors
Sun, Qianru [1]
Liu, Hong [2]
Affiliations
[1] Peking Univ, Shenzhen Grad Sch, Engn Lab Intelligent Percept Internet Things ELIP, Beijing, Peoples R China
[2] Peking Univ, Key Laboratory of Machine Percept, Beijing, Peoples R China
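Source
PROCEEDINGS OF THE BRITISH MACHINE VISION CONFERENCE 2013 | 2013
Funding
National Natural Science Foundation of China; National High Technology Research and Development Program of China (863 Program);
Keywords
RECOGNITION;
DOI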
10.5244/C.27.11
Chinese Library Classification number
TP18 [Artificial intelligence theory];
Discipline classification codes
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Automatically inferring ongoing activities enables the early recognition of unfinished activities, which is valuable for applications such as online human-machine interaction and security monitoring. State-of-the-art methods use spatio-temporal interest point (STIP) based features as the low-level video description to handle complex scenes. However, the typical bag-of-visual-words (BoVW) model focuses on the statistical distribution of features and ignores the inherent contexts in activity sequences, which results in low discrimination when only limited observations are available. To solve this problem, this paper adopts the Recurrent Self-Organizing Map (RSOM), which was designed to process sequential data, for the high-level representation of ongoing human activities. The novelty is that the currently observed features and their spatio-temporal contexts are encoded in a trajectory of pre-trained RSOM units. Additionally, a combination of Dynamic Time Warping (DTW) distance and Edit distance, named DTW-E, is proposed to measure the structural dissimilarity between RSOM trajectories. Two real-world datasets with markedly different characteristics, complex scenes and inter-class ambiguities, are used for evaluation. Experimental results based on kNN classifiers confirm that the approach can infer ongoing human activities with high accuracy.
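The pipeline described in the abstract (encode observed features as a trajectory of RSOM units, then compare trajectories with a DTW and Edit distance combination) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the standard RSOM leaky difference vector for choosing the best-matching unit, uses the distance between unit prototypes as the DTW local cost, and combines the two distances with a plain weighted sum, because the abstract does not say how DTW-E combines them. The function names and the parameters `alpha` and `lam` are hypothetical.

```python
import numpy as np


def rsom_trajectory(features, weights, alpha=0.5):
    """Map a feature sequence to a sequence of best-matching unit indices.

    Assumes the standard RSOM update of the leaky difference vector:
    y_i(t) = (1 - alpha) * y_i(t - 1) + alpha * (x(t) - w_i).
    """
    weights = np.asarray(weights, dtype=float)
    y = np.zeros_like(weights)
    trajectory = []
    for x in np.asarray(features, dtype=float):
        y = (1.0 - alpha) * y + alpha * (x - weights)
        # The best-matching unit has the smallest difference-vector norm.
        trajectory.append(int(np.argmin(np.linalg.norm(y, axis=1))))
    return trajectory


def dtw_distance(a, b, weights):
    """Classic DTW between two unit-index sequences.

    Local cost (an assumption): Euclidean distance between unit prototypes.
    """
    weights = np.asarray(weights, dtype=float)
    n, m = len(a), len(b)
    d = np.full((n + 1, m + 1), np.inf)
    d[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(weights[a[i - 1]] - weights[b[j - 1]])
            d[i, j] = cost + min(d[i - 1, j], d[i, j - 1], d[i - 1, j - 1])
    return float(d[n, m])


def edit_distance(a, b):
    """Levenshtein distance between two unit-index sequences."""
    prev = list(range(len(b) + 1))
    for i, ua in enumerate(a, start=1):
        curr = [i]
        for j, ub in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ua != ub)))  # substitution
        prev = curr
    return prev[-1]


def dtw_e(a, b, weights, lam=0.5):
    """Hypothetical combination: weighted sum of DTW and Edit distance."""
    return lam * dtw_distance(a, b, weights) + (1.0 - lam) * edit_distance(a, b)


# Toy example: a 2-unit map and a partially observed 3-frame sequence.
toy_weights = [[0.0, 0.0], [1.0, 1.0]]
observed = rsom_trajectory([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0]],
                           toy_weights, alpha=1.0)
print(observed, dtw_e(observed, [0, 1], toy_weights))
```

A kNN classifier over such trajectories would label a partial observation with the classes of the training trajectories that have the smallest `dtw_e` value. In practice the prototypes would come from a trained RSOM, not from the toy values used here.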
Pages: 11