MASPP and MWASP: multi-head self-attention based modules for UNet network in melon spot segmentation

Cited by: 1
Authors
Khoa-Dang Tran [1]
Trang-Thi Ho [2]
Yennun Huang [1]
Nguyen Quoc Khanh Le [3]
Le Quoc Tuan [4]
Van Lam Ho [5]
Affiliations
[1] Acad Sinica, Res Ctr Informat Technol Innovat, Taipei 10607, Taiwan
[2] Tamkang Univ, Dept Comp Sci & Informat Engn, New Taipei 251301, Taiwan
[3] Taipei Med Univ, Coll Med, Profess Master Program Artificial Intelligence Me, Taipei 106, Taiwan
[4] Yuan Ze Univ, Coll Management, Taoyuan 32003, Taiwan
[5] Quy Nhon Univ, Fac Informat Technol, Quy Nhon, Vietnam
Keywords
Atrous spatial pyramid pooling; Multi-head self-attention; UNet; Semantic segmentation; Waterfall atrous spatial pooling;
DOI
10.1007/s11694-024-02466-1
Chinese Library Classification number
TS2 [Food industry];
Discipline classification code
0832;
Abstract
Sweet melon, and spotted melon in particular, is one of the most profitable fruit crops for farmers in the international market. Because the spot ratio affects a melon's visual appeal, it plays a significant role in shaping consumers' first impressions and in influencing their decision to purchase a spotted melon. However, accurately determining the spot area on a melon's skin is challenging because the spots vary widely in size and color across melon types. In this study, novel networks based on the UNet model are proposed to accurately determine the spot area on melon skins after harvesting. First, a Mask R-CNN model is employed to isolate the melons from unwanted objects and backgrounds. Then, novel variants of Atrous Spatial Pyramid Pooling (ASPP) and Waterfall Atrous Spatial Pooling (WASP) are developed using multi-head self-attention (MHSA) to enhance the original structures. Finally, the proposed modules are integrated into a VGG16-UNet network to segment the spots on the melon skin. The experimental results show that the proposed methods achieve a mean IoU of 89.86% and an accuracy of 99.45% across all classes, and that they outperform other existing models.
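The two figures reported in the abstract, mean IoU and accuracy, are standard segmentation metrics. The sketch below shows one common way to compute them from a per-pixel confusion matrix. It is illustrative only: this record does not give the paper's exact metric definitions, so the formulas are the usual textbook ones, and the function names and the two-class labels (0 = background, 1 = spot) are assumptions made for the example.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels by (true class, predicted class)."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of pixels whose predicted class matches the true class."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                          # true c, predicted other
        fp = sum(cm[r][c] for r in range(n)) - tp     # predicted c, true other
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious)


# Hypothetical flattened masks: 0 = background, 1 = spot (assumed labels).
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]
cm = confusion_matrix(y_true, y_pred, num_classes=2)
print(pixel_accuracy(cm), mean_iou(cm))
```

Accuracy can be much higher than mean IoU when one class (typically background) dominates the image, which is consistent with the gap between the two numbers quoted above.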
Pages: 3935-3949
Number of pages: 15