FASSST: Fast Attention Based Single-Stage Segmentation Net for Real-Time Instance Segmentation

Cited by: 1
Authors
Cheng, Yuan [1, 2]
Lin, Rui [2 ]
Zhen, Peining [1 ]
Hou, Tianshu [1 ]
Ng, Chiu Wa [2 ]
Chen, Hai-Bao [1 ]
Yu, Hao [3 ]
Wong, Ngai [2 ]
Affiliations
[1] Shanghai Jiao Tong Univ, Shanghai, Peoples R China
[2] Univ Hong Kong, Hong Kong, Peoples R China
[3] Southern Univ Sci & Technol, Shenzhen, Peoples R China
Source
2022 IEEE WINTER CONFERENCE ON APPLICATIONS OF COMPUTER VISION (WACV 2022) | 2022
DOI
10.1109/WACV51458.2022.00277
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory];
Discipline classification code
081104 ; 0812 ; 0835 ; 1405 ;
Abstract
Real-time instance segmentation is crucial in various AI applications. This work designs a network named Fast Attention based Single-Stage Segmentation NeT (FASSST) that performs instance segmentation at video-grade speed. Using an instance attention module (IAM), FASSST quickly locates target instances and segments them with region of interest (ROI) feature fusion (RFF), which aggregates ROI features from pyramid mask layers. The module employs an efficient single-stage feature regression, straight from features to instance coordinates and class probabilities. Experiments on the COCO and CityScapes datasets show that FASSST achieves state-of-the-art performance under competitive accuracy: real-time inference at 47.5 FPS on a GTX1080Ti GPU and 5.3 FPS on a Jetson Xavier NX board, with only 71.6 GFLOPs.
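As a rough reading aid for the throughput figures in the abstract, the sketch below converts frames per second into per-frame latency. The helper name `latency_ms` is hypothetical and not from the paper, and the 30 FPS threshold used for "video-grade" speed is an assumption, not a definition given by the authors.

```python
def latency_ms(fps: float) -> float:
    """Per-frame latency in milliseconds for a given throughput in FPS."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


# Throughput figures quoted in the abstract.
REPORTED_FPS = {"GTX1080Ti": 47.5, "Jetson Xavier": 5.3}

# Assumed threshold for "video-grade" speed; not stated in the source.
VIDEO_FPS_THRESHOLD = 30.0

for device, fps in REPORTED_FPS.items():
    print(
        f"{device}: {latency_ms(fps):.1f} ms/frame, "
        f"video-grade={fps >= VIDEO_FPS_THRESHOLD}"
    )
```

Under these assumptions, the GPU figure corresponds to roughly 21 ms per frame and the embedded-board figure to roughly 189 ms per frame, so only the former clears the assumed 30 FPS bar.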
Pages: 2714-2722
Page count: 9