SMTCNN - A global spatio-temporal texture convolutional neural network for 3D dynamic texture recognition

Cited by: 0
Authors
Wang, Liangliang [1,2]
Zhou, Lei [1]
Liang, Peidong [3,4]
Wang, Ke [5]
Ge, Lianzheng [5]
Affiliations
[1] Wuhan Univ Technol, Sch Comp Sci & Artificial Intelligence, Wuhan 430070, Peoples R China
[2] Wuhan Univ Technol, Chongqing Res Inst, Chongqing 401120, Peoples R China
[3] Fujian Quanzhou Inst Adv Mfg Technol, Quanzhou 362008, Peoples R China
[4] Fujian Key Lab Intelligent Operat & Maintenance Ro, Quanzhou 362008, Peoples R China
[5] Harbin Inst Technol, State Key Lab Robot & Syst, Harbin 150001, Peoples R China
Keywords
Dynamic texture recognition; SMTCNN; Spatio-temporal semantic representation; Deep neural networks; PATTERNS;
DOI
10.1016/j.imavis.2024.105145
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory];
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Dynamic textures (DTs) are typically 3D videos of physical processes that show statistical regularity but have indeterminate spatial and temporal extent. Existing DT recognition methods usually neglect the global spatio-temporal relationships of DTs that reflect these statistical regularities. In this paper, a spatio-temporal texture convolutional neural network (SMTCNN) is proposed for global semantic DT representation. Specifically, SMTCNN describes DT features by learning the temporal motion of DTs as well as the sources of that motion and the scenarios in which it occurs; accordingly, a motion net and a source net are formulated. In particular, a novel module consisting of expansion and concatenation operations on deep features is presented, taking an arbitrary 2D backbone as input, followed by a new 1D CNN comprising 4 convolutional, 2 pooling and 2 fully-connected layers that represents the 2D tensors in space-time, transforming DT descriptors from discrete "words" into global "textures". Comparative experiments on three DT datasets - UCLA, DynTex and DynTex++ - demonstrate the effectiveness of our approach.
Pages: 9
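
The abstract specifies the temporal head only at a high level: a 1D CNN with 4 convolutional, 2 pooling and 2 fully-connected layers, applied to deep features expanded and concatenated from an arbitrary 2D backbone. The PyTorch sketch below illustrates one plausible reading of that description; the class name TemporalTextureHead, the channel widths, kernel sizes, pooling placement and the global temporal averaging are assumptions for illustration, not details taken from the paper.

import torch
import torch.nn as nn

class TemporalTextureHead(nn.Module):
    # Hypothetical 1D CNN head matching the abstract's layer count:
    # 4 convolutional, 2 pooling and 2 fully-connected layers, run along
    # the temporal axis of per-frame backbone features.
    def __init__(self, feat_dim=2048, num_classes=50):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
            nn.Conv1d(512, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool1d(2),
        )
        self.fc = nn.Sequential(
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        # x: (batch, feat_dim, num_frames) -- per-frame features from a 2D
        # backbone, stacked (concatenated) along the temporal axis.
        h = self.convs(x)      # (batch, 256, num_frames // 4)
        h = h.mean(dim=-1)     # global temporal pooling to a fixed-size vector
        return self.fc(h)      # class logits

# Usage sketch: stack per-frame features from any 2D backbone (e.g. a ResNet)
# into a (batch, feat_dim, T) tensor and classify the whole sequence.
feats = torch.randn(2, 2048, 16)       # 2 clips, 16 frames each
logits = TemporalTextureHead()(feats)  # (2, 50)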