SiSe: Simultaneous and Sequential Transformers for multi-label activity recognition

Cited by: 0

Authors:

Chen, Zhao-Min [1]

Jin, Xin [2]

Chan, Sixian [3]

Affiliations:

[1] Wenzhou Univ, Key Lab Intelligent Informat Safety & Emergency Zh, Wenzhou 325035, Peoples R China

[2] Samsung Elect China R&D Ctr, Samsung Elect, Nanjing 210012, Peoples R China

[3] Zhejiang Univ Technol, Coll Comp Sci & Technol, Hangzhou 310023, Peoples R China

Source:

PATTERN RECOGNITION | 2024 / Vol. 156

Funding:

National Natural Science Foundation of China;

Keywords:

Multi-label; Activity recognition; Sequential transformer; Hierarchical structure;

DOI:

10.1016/j.patcog.2024.110844

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Subject classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Multi-label activity recognition is extremely challenging because multiple activities may appear simultaneously or sequentially in a video. While previous works have recognized the temporal co-occurrence of activities, the sequential order of activities has been largely overlooked. We argue that the sequential order of activities should also be preserved in correlation modeling, because shuffling the order might not form a semantically meaningful video. In this work, we present plug-and-play Simultaneous and Sequential Transformer (SiSe) modules for multi-label activity recognition. Operating on the frame features of all time steps, SiSe enhances spatiotemporal feature learning for multi-label activity recognition by capturing both simultaneous and sequential activity correlations. Specifically, we employ a Simultaneous Transformer module to relate the multiple activities that are likely to appear in each frame, and a hierarchical Sequential Transformer module to efficiently capture the sequential activity correlations in an order-preserving manner. Despite its straightforward and class-agnostic design, SiSe outperforms state-of-the-art approaches on three multi-label activity recognition benchmarks. In particular, we verify the significance of preserving the sequential order of activities with our Sequential Transformer in correlation modeling. We also conduct ablation studies and visual analysis for a better understanding of SiSe.
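
The abstract's point about order can be illustrated with a small, self-contained sketch. It is not the SiSe architecture, whose details this record does not give. It only shows, under toy assumptions, that plain self-attention followed by mean pooling yields the same result when frames are shuffled, while attaching position information makes the result depend on frame order. The function names, the identity projections, and the scalar position feature are all hypothetical choices made for illustration.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head dot-product self-attention with identity projections (toy assumption)."""
    dim = len(tokens[0])
    out = []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in tokens]
        weights = softmax(scores)
        out.append([sum(w * v[d] for w, v in zip(weights, tokens)) for d in range(dim)])
    return out

def mean_pool(tokens):
    n = len(tokens)
    return [sum(t[d] for t in tokens) / n for d in range(len(tokens[0]))]

def add_positions(tokens):
    """Append a hypothetical scalar position feature t / (T - 1) to each frame."""
    last = max(len(tokens) - 1, 1)
    return [tok + [t / last] for t, tok in enumerate(tokens)]

def close(u, v, tol=1e-9):
    return all(abs(a - b) <= tol for a, b in zip(u, v))

frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # toy per-frame features
shuffled = [frames[2], frames[0], frames[1]]    # same frames, different order

plain_a = mean_pool(self_attention(frames))
plain_b = mean_pool(self_attention(shuffled))
pos_a = mean_pool(self_attention(add_positions(frames)))
pos_b = mean_pool(self_attention(add_positions(shuffled)))

print(close(plain_a, plain_b))  # True: without positions, frame order is invisible
print(close(pos_a, pos_b))      # False: with positions, frame order changes the result
```

In this toy setting, any correlation model built only on unordered attention cannot distinguish a video from its shuffled version, which is the gap an order-preserving module is meant to close.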

Pages: 10