Effects of annotation quality on model performance

Cited by: 9

Authors:

Alhazmi, Khaled [1]

Alsumari, Walaa [1]

Seppo, Indrek [2]

Podkuiko, Lara [2]

Simon, Martin [2]

Affiliations:

[1] King Abdulaziz City Sci & Technol KACST, Natl Ctr Robot & IoT Technol, Riyadh, Saudi Arabia

[2] Marduk Technol OU, Tallinn, Estonia

Source:

3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021

Keywords:

Machine learning; supervised learning data; training data; computer vision; custom dataset; object detection; annotation quality;

DOI:

10.1109/ICAIIC51459.2021.9415271

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Supervised machine learning generally requires pre-labelled data. Although several open-access, pre-annotated datasets are available for training machine learning algorithms, most contain a limited number of object classes, which may not be suitable for specific tasks. Because existing pre-annotated data is usually not sufficient for custom models, most real-world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity: time and resources can be allocated either to ensuring superior data quality or to increasing the quantity of annotated data. We test the degree of the detrimental effect caused by annotation errors. We conclude that while the results deteriorate if annotations are erroneous, the effect is limited, at least when using relatively homogeneous sequential video data. The benefits from the increased annotated dataset size (created by using imperfect auto-annotation methods) outweigh the deterioration caused by erroneous annotations.

Citation

Pages: 63-67

Page count: 5

References: 7 in total

[1] Bochkovskiy A., 2020, YOLOV4 OPTIMAL SPEED, ARXIV, DOI 10.48550/ARXIV.2004.10934
[2] R Core Team, 2019, R LANG ENV STAT COMP
[3] Rozantsev, Artem; Lepetit, Vincent; Fua, Pascal. On rendering synthetic images for training an object detector [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2015, 137: 24-37
[4] Sheng VS, 2019, AAAI CONF ARTIF INTE, P9837
[5] Solawetz J., 2020, TACKLING SMALL OBJEC
[6] Sun, Chen; Shrivastava, Abhinav; Singh, Saurabh; Gupta, Abhinav. Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 843-852
[7] Wickham H., 2016, GGPLOT2 ELEGANT GRAP
