Two-stream cross-attention vision Transformer based on RGB-D images for pig weight estimation

Cited by: 11

Authors:

He, Wei [1,2,3]

Mi, Yang [1,2,3]

Ding, Xiangdong [4,5]

Liu, Gang [1,2]

Li, Tao [6]

Affiliations:

[1] China Agr Univ, Coll Informat & Elect Engn, Beijing 100083, Peoples R China

[2] China Agr Univ, Minist Agr & Rural Affairs, Lab Agr Informat Acquisit Technol, Beijing 100083, Peoples R China

[3] Minist Agr & Rural Affairs, Key Lab Agr Machinery Monitoring & Big Data Applic, Beijing 100083, Peoples R China

[4] China Agr Univ, Anim Sci & Technol, Beijing 100193, Peoples R China

[5] Minist Agr & Rural Affairs, Engn Lab Anim Breeding, Lab Anim Genet Breeding & Reprod, Beijing 100193, Peoples R China

[6] Henan Fengyuan Hepu Agr & Anim Husb Co Ltd, Zhumadian 463900, Peoples R China

Source:

COMPUTERS AND ELECTRONICS IN AGRICULTURE | 2023 / Volume 212

Keywords:

Pig-weight estimation; Cross-attention; Vision Transformer;

DOI:

10.1016/j.compag.2023.107986

Chinese Library Classification (CLC) number:

S [Agricultural Sciences];

Discipline classification code:

09;

Abstract:

Automatic non-contact estimation of pig weight can avoid porcine stress and prevent the spread of swine fever. Many recent works employ convolutional neural networks to extract deeply learned features for regressing pig weight from a single modality, either RGB images or depth images. However, a single modality may not be sufficient for pig-weight estimation, since the two modalities are complementary in representing the spatial body information of pigs. In this paper, we propose a two-stream cross-attention vision Transformer for regressing pig weight from both RGB and depth images. Specifically, we employ two separate Swin Transformers to extract texture appearance information and spatial structure information from the RGB and depth images, respectively. Meanwhile, we design cross-attention blocks to learn mutual-modal representations from both modalities. Finally, we construct a feature fusion layer that combines the features from both streams for regressing pig weight. For the experiments, we collect a new dataset of paired RGB-D pig images, which contains 10,263 RGB-D pairs for training and 5,203 RGB-D pairs for testing. Comprehensive comparative experiments show that the proposed method yields the best performance on this dataset, with a mean absolute error of 3.237.
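The data flow described in the abstract (two token streams, cross-attention between them, fusion, regression, and evaluation by mean absolute error) can be illustrated with a small dependency-free sketch. This is not the authors' implementation: the Swin Transformer backbones are replaced by ready-made token lists, the attention is single-head with the learned query/key/value projections omitted, fusion is assumed to be mean pooling followed by concatenation, and every function name and the linear head parameters are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over one list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head cross-attention (assumption: identity Q/K/V projections).

    Queries come from one modality; keys and values come from the other.
    Returns one attended vector per query token.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        # Scaled dot-product score of this query against every context token.
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        # Weighted sum of the context tokens (used here as the values).
        attended.append(
            [sum(w * v[j] for w, v in zip(weights, context)) for j in range(dim)]
        )
    return attended


def mean_pool(tokens):
    """Average a list of token vectors into a single vector."""
    return [sum(column) / len(tokens) for column in zip(*tokens)]


def estimate_weight(rgb_tokens, depth_tokens, head_weights, head_bias):
    """Hypothetical fusion and regression head: pool each stream, concatenate, apply a linear layer."""
    rgb_attended = cross_attention(rgb_tokens, depth_tokens)    # RGB queries depth
    depth_attended = cross_attention(depth_tokens, rgb_tokens)  # depth queries RGB
    fused = mean_pool(rgb_attended) + mean_pool(depth_attended)  # list concatenation
    return sum(w * f for w, f in zip(head_weights, fused)) + head_bias


def mean_absolute_error(predictions, targets):
    """MAE, the evaluation metric reported in the abstract."""
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / len(targets)
```

In this sketch each stream attends to the other, so the RGB-side output is a mixture of depth tokens and vice versa. A trained model would learn the projections and the regression head; here they are fixed only so that the data flow can be followed end to end.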


Pages: 10
