Video object segmentation for automatic image annotation of ethernet connectors with environment mapping and 3D projection

Cited by: 0

Authors:

Danta, Marrone [1]

Dreyer, Pedro [1]

Bezerra, Daniel [1]

Reis, Gabriel [1]

Souza, Ricardo [2]

Lins, Silvia [2]

Kelner, Judith [1]

Sadok, Djamel [1]

Affiliations:

[1] Univ Fed Pernambuco, Ctr Informat, Grp Pesquisa Redes & Telecomunicacao, Recife, PE, Brazil

[2] Ericsson Res, Indaiatuba, SP, Brazil

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2022 / Vol. 81 / Issue 28

Keywords:

RJ45; Automatic annotation; Object tracking; 3D projection; Video object segmentation

DOI:

10.1007/s11042-022-13128-z

Chinese Library Classification (CLC) number:

TP [Automation technology, computer technology]

Discipline classification code:

0812

Abstract:

The creation of a dataset is time-consuming and sometimes discourages researchers from pursuing their goals. To overcome this problem, we present and discuss two solutions adopted for the automation of this process. Both save valuable user time and resources, and both use video object segmentation with object tracking and 3D projection. In our scenario, we acquire images from a moving robotic arm and, for each approach, generate a distinct annotated dataset. We evaluated the precision of the annotations by comparing them with a manually annotated dataset. As a complementary test to assess the quality of the generated datasets and to generalize our contribution, we tested them on detection and classification problems. In both tests, we rely on deep learning solutions based on convolutional neural networks. For detection, we used YOLO and obtained, for the projection dataset, F1-Score, accuracy, and mAP values of 0.846, 0.924, and 0.875, respectively. For the tracking dataset, we achieved an F1-Score of 0.861 and an accuracy of 0.932, whereas mAP reached 0.894. For classification, we adopted two metrics, accuracy and F1-Score, and used the well-known networks VGG, DenseNet, MobileNet, Inception, and ResNet. The VGG architecture outperformed the others on both the projection and tracking datasets. On the projection dataset, it reached an accuracy and F1-Score of 0.997 and 0.993, respectively. Similarly, on the tracking dataset, it achieved an accuracy of 0.991 and an F1-Score of 0.981.
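The abstract does not say how automatic and manual annotations were compared, so the following is only a minimal sketch under stated assumptions: annotations are axis-aligned bounding boxes, agreement is scored with intersection-over-union (IoU), and F1 is computed from true-positive, false-positive, and false-negative counts using its standard definition. The names `Box`, `iou`, and `f1_score` are hypothetical and do not come from the paper.

```python
from typing import Tuple

# Assumed box layout (not from the paper): (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def box_area(box: Box) -> float:
    """Area of an axis-aligned box; degenerate boxes count as zero."""
    return max(0.0, box[2] - box[0]) * max(0.0, box[3] - box[1])


def iou(auto_box: Box, manual_box: Box) -> float:
    """Intersection-over-union between an automatic and a manual annotation."""
    # Corners of the overlapping rectangle, if any.
    ix_min = max(auto_box[0], manual_box[0])
    iy_min = max(auto_box[1], manual_box[1])
    ix_max = min(auto_box[2], manual_box[2])
    iy_max = min(auto_box[3], manual_box[3])

    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    union = box_area(auto_box) + box_area(manual_box) - inter
    return inter / union if union > 0 else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Standard F1: harmonic mean of precision and recall, 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

For instance, `iou((0, 0, 2, 2), (1, 1, 3, 3))` gives an overlap of 1 over a union of 7. An automatic annotation could then be counted as correct when its IoU with the manual box exceeds a chosen threshold; that threshold is likewise an assumption rather than something the abstract specifies.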


Pages: 39891-39913

Page count: 23
