Deep metric learning for open-set human action recognition in videos

Cited by: 0

Authors:

Matheus Gutoski

André Eugênio Lazzaretti

Heitor Silvério Lopes

Affiliations:

[1] Federal University of Technology – Paraná, CPGEI

Source:

Neural Computing and Applications | 2021 / Vol. 33

Keywords:

Human action recognition; Open-set recognition; Metric learning; Extreme value machine

DOI:

Not available

Abstract:

Human action recognition (HAR) is a widely studied topic in computer vision and pattern recognition. Despite the success of recent models, most of them approach HAR from a closed-set perspective, which assumes that all classes are known a priori and appear in both the training and test phases. Unlike most previous work, we approach HAR from the open-set perspective, that is, classes unknown during training are taken into account by the model. Feature extraction for open-set HAR also remains underexplored in the recent literature, and it is demanding because known classes must be represented with low intra-class variance in order to reject unknown examples. To this end, we propose a deep metric learning model named triplet inflated 3D convolutional neural network (TI3D), which builds upon the well-known I3D model. TI3D is a representation learning model that takes video sequences as input and outputs 256-dimensional representations. We perform extensive experiments and statistical comparisons on the UCF-101 dataset using a 30-fold cross-validation procedure in 25 different scenarios with varying degrees of openness and varying numbers of training and test classes. Results show that the proposed TI3D achieves better performance than non-metric learning models in terms of $F_1$ score and Youden's index, indicating a promising approach for open-set video action recognition.
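As a rough illustration of the quantities named in the abstract, the sketch below computes a generic triplet loss on embedding vectors together with the $F_1$ score and Youden's index from confusion counts. It is not taken from the paper: the function names, the Euclidean distance, and the margin value of 0.2 are assumptions chosen for the example, and the exact loss formulation used by TI3D may differ.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet loss (assumed form, margin is a placeholder value).

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive, which encourages low intra-class variance.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)

def f1_score(tp, fp, fn):
    """F1 score from true positives, false positives, and false negatives."""
    return 2 * tp / (2 * tp + fp + fn)

def youden_index(tp, fn, tn, fp):
    """Youden's index J = sensitivity + specificity - 1."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1.0

if __name__ == "__main__":
    # Toy 2-D embeddings; the model in the abstract outputs 256 dimensions.
    print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))
    print(f1_score(tp=8, fp=1, fn=2))
    print(youden_index(tp=8, fn=2, tn=9, fp=1))
```

In an open-set evaluation, the negative counts would typically include unknown-class examples that the model should reject, which is one reason an index combining sensitivity and specificity is informative alongside $F_1$.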

Pages: 1207-1220

Page count: 13
