Modeling Temporal Visual Salience for Human Action Recognition Enabled Visual Anonymity Preservation

Cited by: 4
Authors
Al-Obaidi, Salah [1 ]
Al-Khafaji, Hiba [1 ]
Abhayaratne, Charith [1 ]
Affiliations
[1] Univ Sheffield, Dept Elect & Elect Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
Visual anonymization; human action recognition; histogram of gradients in salience (HOG-S); temporal visual salience estimation; privacy; video-based monitoring; assisted living; HISTOGRAMS; PRIVACY; ROBUST; LSTM;
DOI
10.1109/ACCESS.2020.3039740
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline code
0812;
Abstract
This paper proposes a novel approach for visually anonymizing video clips while retaining the ability to perform machine-based analysis on them, such as human action recognition. Visual anonymization is achieved by a novel method that generates the anonymization silhouette through frame-wise modeling of temporal visual salience. These temporal salience-based silhouettes are then analyzed by extracting the proposed histograms of gradients in salience (HOG-S) to learn the action representation in the visually anonymized domain. Since the anonymization maps are based on gray-scale temporal salience maps, only the moving body parts involved in the action are represented with larger gray values, forming highly anonymized silhouettes. This yields the highest mean anonymity score (MAS), the fewest identifiable visual appearance attributes and high human-perceived utility in action recognition. In terms of machine-based human action recognition, the proposed HOG-S features achieve the highest accuracy in the anonymized domain compared with existing anonymization methods. Overall, the proposed holistic human action recognition method, i.e., temporal salience modeling followed by HOG-S feature extraction, achieves the best action recognition accuracy on the DHA, KTH, UIUC1, UCF Sports and HMDB51 datasets, with improvements of 3%, 1.6%, 0.8%, 1.3% and 16.7%, respectively, outperforming both feature-based and deep learning-based existing approaches.
Pages: 213806 - 213824
Page count: 19
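To make the pipeline summarized in the abstract concrete, below is a minimal, hypothetical Python sketch: simple frame differencing stands in for the paper's temporal visual salience model (which is not reproduced here), and a plain gradient-orientation histogram computed on the salience map illustrates the HOG-S idea. All function names, parameters (cell size, bin count) and the synthetic clip are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: frame differencing approximates temporal
# salience; a gradient-orientation histogram over the salience map
# approximates HOG-S. Not the authors' published implementation.
import numpy as np

def temporal_salience(frames):
    """Gray-scale salience maps from absolute frame differences.

    frames: (T, H, W) float array of gray-scale frames in [0, 1].
    Returns (T-1, H, W) maps in which moving body parts take larger
    gray values, forming anonymized silhouettes.
    """
    diff = np.abs(np.diff(frames, axis=0))
    # Normalize each map to [0, 1] so static background stays dark.
    peak = diff.max(axis=(1, 2), keepdims=True)
    return diff / np.maximum(peak, 1e-8)

def hog_s(salience, cell=8, bins=9):
    """HOG-S-like descriptor: orientation histogram on a salience map.

    Gradients are taken on the salience map itself, so only the
    motion-related silhouette contributes to the descriptor.
    """
    gy, gx = np.gradient(salience)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    h, w = salience.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi),
                                   weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-8))
    return np.concatenate(feats)

# Usage: one descriptor per salience map, pooled over the clip and
# suitable as input to any downstream action classifier.
clip = np.random.rand(16, 64, 64)  # stand-in 16-frame video clip
maps = temporal_salience(clip)
descriptor = np.stack([hog_s(m) for m in maps]).mean(axis=0)
```

Because the gradients are computed on the salience map rather than on the raw pixels, appearance details such as faces and clothing never enter the descriptor, which is consistent with the abstract's claim that anonymity is preserved while motion structure remains usable for recognition.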