Effects of annotation quality on model performance

Cited by: 9
Authors
Alhazmi, Khaled [1]
Alsumari, Walaa [1]
Seppo, Indrek [2]
Podkuiko, Lara [2]
Simon, Martin [2]
Affiliations
[1] King Abdulaziz City Sci & Technol KACST, Natl Ctr Robot & IoT Technol, Riyadh, Saudi Arabia
[2] Marduk Technol OU, Tallinn, Estonia
Source
3RD INTERNATIONAL CONFERENCE ON ARTIFICIAL INTELLIGENCE IN INFORMATION AND COMMUNICATION (IEEE ICAIIC 2021) | 2021
Keywords
Machine learning; supervised learning data; training data; computer vision; custom dataset; object detection; annotation quality
DOI
10.1109/ICAIIC51459.2021.9415271
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Supervised machine learning generally requires pre-labelled data. Although several open-access, pre-annotated datasets are available for training machine learning algorithms, most contain a limited number of object classes, which may not be suitable for specific tasks. Because existing pre-annotated data is usually not sufficient for custom models, most real-world applications require collecting and preparing training data. There is an obvious trade-off between annotation quality and quantity: time and resources can be allocated either to ensuring superior data quality or to increasing the quantity of annotated data. We test the degree of the detrimental effect caused by annotation errors. We conclude that while results deteriorate if annotations are erroneous, the effect (at least when using relatively homogeneous sequential video data) is limited. The benefits from the increased size of the annotated data set (created by using imperfect auto-annotation methods) outweigh the deterioration caused by imperfectly annotated data.
Pages: 63-67
Page count: 5
Related papers
7 in total
  • [1] Bochkovskiy A., 2020, YOLOV4 OPTIMAL SPEED, DOI 10.48550/ARXIV.2004.10934, ARXIV
  • [2] R Core Team, 2019, R LANG ENV STAT COMP
  • [3] On rendering synthetic images for training an object detector
    Rozantsev, Artem
    Lepetit, Vincent
    Fua, Pascal
    [J]. COMPUTER VISION AND IMAGE UNDERSTANDING, 2015, 137: 24-37
  • [4] Sheng VS, 2019, AAAI CONF ARTIF INTE, P9837
  • [5] Solawetz J., 2020, TACKLING SMALL OBJEC
  • [6] Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
    Sun, Chen
    Shrivastava, Abhinav
    Singh, Saurabh
    Gupta, Abhinav
    [J]. 2017 IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV), 2017: 843-852
  • [7] Wickham H., 2016, GGPLOT2 ELEGANT GRAP