A semantic segmentation algorithm for fashion images based on modified mask RCNN

Cited by: 3
Authors
He, Wentao [1]
Wang, Jing'an [1]
Wang, Lei [1]
Pan, Ruru [1]
Gao, Weidong [1]
Affiliation
[1] Jiangnan Univ, Key Lab Ecotext, Minist Educ, 1800 Lihu Ave, Wuxi 214122, Jiangsu, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Clothes; Semantic segmentation; Mask RCNN; Feature pyramid network; Detection accuracy;
DOI
10.1007/s11042-023-14958-1
Chinese Library Classification number
TP [Automation technology, computer technology]
Discipline classification code
0812
Abstract
The semantic segmentation of human body images has great application potential in many fields, such as autonomous driving, artificial intelligence (AI) face swapping, and virtual try-on. Many researchers currently use additional human pose information to generate multi-level human parsing images. However, existing methods have limitations when faced with multiple poses and overlapping targets. This paper proposes a novel algorithm based on Mask RCNN, which provides pixel-level accuracy. In the feature extraction process, a multi-scale feature fusion module applying dilated convolution is proposed to obtain richer semantic information from different receptive fields. A small residual module is added to the original residual unit structure to enlarge the receptive field of each layer and capture both details and global characteristics. Three convolution kernels with different ratios are designed to obtain receptive fields of different scales. The experimental results show that the proposed method performs better when both object localization and target classification are considered.
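The idea of fusing dilated convolutions at several scales can be illustrated with a minimal one-dimensional sketch. It is not the authors' implementation: the function names, the dilation rates (1, 2, 3), the all-ones kernel, and fusion by summation are all assumptions chosen for illustration. The only fact relied on is the standard relation that a kernel of size k with dilation d covers k + (k - 1)(d - 1) input positions.

```python
def effective_kernel_size(kernel_size, dilation):
    """Input positions spanned by one dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel_size + (kernel_size - 1) * (dilation - 1)


def dilated_conv1d(signal, kernel, dilation):
    """'Valid' 1D dilated convolution (cross-correlation, no padding)."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    return [
        sum(w * signal[start + i * dilation] for i, w in enumerate(kernel))
        for start in range(len(signal) - span)
    ]


def multi_scale_fusion(signal, kernel, dilations):
    """Run one branch per dilation rate and sum the branches.

    Assumes an odd kernel size so each output has a well-defined center.
    Branches are cropped so that outputs with the same center are added.
    """
    half = (len(kernel) - 1) // 2
    max_offset = half * max(dilations)       # center offset of the widest branch
    n_out = len(signal) - 2 * max_offset     # outputs common to all branches
    fused = [0] * n_out
    for d in dilations:
        branch = dilated_conv1d(signal, kernel, d)
        shift = max_offset - half * d        # crop narrower branches to align centers
        for j in range(n_out):
            fused[j] += branch[j + shift]
    return fused


# Hypothetical dilation rates; the abstract does not state the actual values.
rates = (1, 2, 3)
sizes = [effective_kernel_size(3, d) for d in rates]
fused = multi_scale_fusion(list(range(10)), [1, 1, 1], rates)
```

With a size-3 kernel, the three branches in this sketch span 3, 5, and 7 input positions, which is the sense in which different dilation settings yield receptive fields of different scales without adding weights.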
Pages: 28427-28444
Page count: 18