Scene Text Visual Question Answering

Cited by: 145
Authors
Biten, Ali Furkan [1]
Tito, Ruben [1]
Mafla, Andres [1]
Gomez, Lluis [1]
Rusinol, Marcal [1]
Valveny, Ernest [1]
Jawahar, C. V. [2]
Karatzas, Dimosthenis [1]
Affiliations
[1] UAB, Comp Vis Ctr, Barcelona, Spain
[2] IIIT Hyderabad, CVIT, Hyderabad, India
Source
2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV 2019) | 2019
DOI
10.1109/ICCV.2019.00439
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
Current visual question answering datasets do not consider the rich semantic information conveyed by text within an image. In this work, we present a new dataset, ST-VQA, that aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the Visual Question Answering process. We use this dataset to define a series of tasks of increasing difficulty for which reading the scene text in the context provided by the visual information is necessary to reason and generate an appropriate answer. We propose a new evaluation metric for these tasks that accounts for both reasoning errors and shortcomings of the text recognition module. In addition, we put forward a series of baseline methods, which provide further insight into the newly released dataset and set the scene for further research.
Pages: 4290-4300
Number of pages: 11