Occlusion-robust workflow recognition with context-aware compositional ConvNet

Cited by: 1

Authors:

Zhang, Min [1,2]

Hu, Haiyang [2]

Li, Zhongjin [2]

Chen, Jie [2]

Affiliations:

[1] Zhejiang Ind Polytech Coll, Dept Design & Art, Shaoxing, Peoples R China

[2] Hangzhou Dianzi Univ, Sch Comp Sci & Technol, Hangzhou, Peoples R China

Source:

SOFT COMPUTING | 2024 / Vol. 28 / No. 06

Funding:

National Natural Science Foundation of China;

Keywords:

Workflow recognition; Occlusion detection; Deep learning; Bounding box voting;

DOI:

10.1007/s00500-023-09225-2

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Workflow recognition based on deep convolutional neural networks has achieved promising performance. Although results on standard industrial workflows are impressive, performance on heavily occluded workflows remains far from satisfactory. In this paper, we present a context-aware compositional ConvNet (CA-CompNet) for occluded workflow detection, with the following contributions. First, we combine a compositional model with the original ConvNet to build a unified deep architecture for occluded workflow detection that shows innate robustness in object classification under occlusion. Second, to cope with variable occlusion, bounding box annotations are used to segment the context from the target workflow instance during training. These segmentations are then used to train the proposed CA-CompNet, which enables the network to disentangle the feature representation of a workflow instance from its context. Third, a robust voting mechanism for candidate bounding boxes is introduced to improve detection accuracy, helping the model precisely localize the bounding box of a specific workflow instance. Comprehensive experiments demonstrate that the proposed context-aware network robustly detects workflow instances under occlusion in industrial environments, improving detection performance on the MS COCO dataset by 4.6 percentage points (from 45.1% to 49.7%) in absolute terms compared with the advanced CenterNet.


Pages: 5125-5135

Page count: 11
