Modeling Temporal Visual Salience for Human Action Recognition Enabled Visual Anonymity Preservation

Cited by: 4
Authors
Al-Obaidi, Salah [1 ]
Al-Khafaji, Hiba [1 ]
Abhayaratne, Charith [1 ]
Affiliations
[1] Univ Sheffield, Dept Elect & Elect Engn, Sheffield S1 3JD, S Yorkshire, England
Keywords
Visual anonymization; human action recognition; histogram of gradients in salience (HOG-S); temporal visual salience estimation; privacy; video-based monitoring; assisted living; HISTOGRAMS; PRIVACY; ROBUST; LSTM;
DOI
10.1109/ACCESS.2020.3039740
Chinese Library Classification (CLC)
TP [Automation technology; computer technology];
Discipline code
0812;
Abstract
This paper proposes a novel approach for visually anonymizing video clips while retaining the ability to perform machine-based analysis on them, such as human action recognition. Visual anonymization is achieved by a novel method that generates the anonymization silhouette through frame-wise modeling of temporal visual salience. These temporal salience-based silhouettes are then analyzed by extracting the proposed histograms of gradients in salience (HOG-S) to learn the action representation in the visually anonymized domain. Since the anonymization maps are based on gray-scale temporal salience maps, only the moving body parts involved in the action are represented with larger gray values, forming highly anonymized silhouettes. This yields the highest mean anonymity score (MAS), the fewest identifiable visual appearance attributes and high human-perceived utility in action recognition. In terms of machine-based human action recognition, the proposed HOG-S features achieve the highest accuracy in the anonymized domain compared with existing anonymization methods. Overall, the proposed holistic human action recognition method, i.e., temporal salience modeling followed by HOG-S feature extraction, achieves the best action recognition accuracy on the DHA, KTH, UIUC1, UCF Sports and HMDB51 datasets, with improvements of 3%, 1.6%, 0.8%, 1.3% and 16.7%, respectively, outperforming both feature-based and deep learning-based existing approaches.
Pages: 213806 - 213824
Page count: 19
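To make the pipeline summarized in the abstract concrete, below is a minimal, hypothetical Python sketch: simple frame differencing stands in for the paper's temporal visual salience model (which is not reproduced here), and a plain gradient-orientation histogram computed on the salience map illustrates the HOG-S idea. All function names, parameters (cell size, bin count) and the synthetic clip are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch only: frame differencing approximates temporal
# salience; a gradient-orientation histogram over the salience map
# approximates HOG-S. Not the authors' published implementation.
import numpy as np

def temporal_salience(frames):
    """Gray-scale salience maps from absolute frame differences.

    frames: (T, H, W) float array of gray-scale frames in [0, 1].
    Returns (T-1, H, W) maps in which moving body parts take larger
    gray values, forming anonymized silhouettes.
    """
    diff = np.abs(np.diff(frames, axis=0))
    # Normalize each map to [0, 1] so static background stays dark.
    peak = diff.max(axis=(1, 2), keepdims=True)
    return diff / np.maximum(peak, 1e-8)

def hog_s(salience, cell=8, bins=9):
    """HOG-S-like descriptor: orientation histogram on a salience map.

    Gradients are taken on the salience map itself, so only the
    motion-related silhouette contributes to the descriptor.
    """
    gy, gx = np.gradient(salience)
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)  # unsigned orientation
    h, w = salience.shape
    feats = []
    for y in range(0, h - cell + 1, cell):
        for x in range(0, w - cell + 1, cell):
            a = ang[y:y + cell, x:x + cell].ravel()
            m = mag[y:y + cell, x:x + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, np.pi),
                                   weights=m)
            feats.append(hist / (np.linalg.norm(hist) + 1e-8))
    return np.concatenate(feats)

# Usage: one descriptor per salience map, pooled over the clip and
# suitable as input to any downstream action classifier.
clip = np.random.rand(16, 64, 64)  # stand-in 16-frame video clip
maps = temporal_salience(clip)
descriptor = np.stack([hog_s(m) for m in maps]).mean(axis=0)
```

Because the gradients are computed on the salience map rather than on the raw pixels, appearance details such as faces and clothing never enter the descriptor, which is consistent with the abstract's claim that anonymity is preserved while motion structure remains usable for recognition.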