Visual explainable artificial intelligence for graph-based visual question answering and scene graph curation

Cited by: 0
Authors
Sebastian Künzel [1]
Tanja Munz-Körner [1]
Pascal Tilli [2]
Noel Schäfer [1]
Sandeep Vidyapu [1]
Ngoc Thang Vu [2]
Daniel Weiskopf [1]
Affiliations
[1] VISUS, University of Stuttgart, Stuttgart
[2] IMS, University of Stuttgart, Stuttgart
Keywords
Explainable artificial intelligence; Scene graphs; Visual analytics; Visual question answering
DOI
10.1186/s42492-025-00185-y
Abstract
This study presents a novel visualization approach to explainable artificial intelligence for graph-based visual question answering (VQA) systems. The method focuses on identifying false answer predictions by the model and lets users correct mistakes directly in the input space, thus facilitating dataset curation. The decision-making process of the model is exposed by highlighting certain internal states of a graph neural network (GNN). The proposed system is built on top of a GraphVQA framework that implements various GNN-based models for VQA trained on the GQA dataset. The authors evaluated their tool through demonstrations of identified use cases, quantitative measures, and a user study conducted with experts from the machine learning, visualization, and natural language processing domains. The findings highlight the value of the implemented features in helping users identify incorrect predictions and the issues underlying them. Additionally, the approach is easily extendable to similar models for graph-based question answering. © The Author(s) 2025.