More Diverse Training, Better Compositionality! Evidence from Multimodal Language Learning

Cited by: 0
Authors
Volquardsen, Caspar [1 ]
Lee, Jae Hee [1 ]
Weber, Cornelius [1 ]
Wermter, Stefan [1 ]
Affiliations
[1] Univ Hamburg, Dept Informat, Knowledge Technol, Hamburg, Germany
Keywords
Compositional generalization; Computer vision; Multimodality; Sequence-to-sequence; Robotics;
DOI
10.1007/978-3-031-15934-3_35
CLC (Chinese Library Classification)
TP18 [Artificial Intelligence Theory];
Discipline codes
081104; 0812; 0835; 1405;
Abstract
Artificial neural networks still fall short of human-level generalization and require a very large number of training examples to succeed. Model architectures that further improve generalization capabilities are therefore still an open research question. We created a multimodal dataset from simulation for measuring the compositional generalization of neural networks in multimodal language learning. The dataset consists of sequences showing a robot arm interacting with objects on a table in a simple 3D environment, with the goal of describing the interaction. Compositional object features, multiple actions, and distracting objects pose challenges to the model. We show that an LSTM encoder-decoder architecture trained jointly with a vision encoder surpasses previous performance and handles multiple visible objects. Visualization of important input dimensions shows that a model trained with multiple objects, but not a model trained on just one object, has learnt to ignore irrelevant objects. Furthermore, we show that additional modalities in the input improve the overall performance. We conclude that the underlying training data has a significant influence on the model's capability to generalize compositionally.
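The abstract describes an LSTM encoder-decoder trained jointly with a vision encoder on multimodal input (video frames plus additional modalities). The sketch below illustrates one plausible shape of such a model; it is an assumption-laden illustration, not the authors' implementation: the framework (PyTorch), the class names (VisionEncoder, Seq2SeqDescriber), and the choice of proprioceptive joint features as the extra modality are all hypothetical. Training would minimize cross-entropy between the decoder's logits and the ground-truth description tokens.

    # Minimal sketch (assumption: PyTorch; module and variable names are
    # illustrative, not taken from the paper). A small CNN vision encoder is
    # trained jointly with an LSTM encoder over the frame sequence and an
    # LSTM decoder that emits the textual description of the interaction.
    import torch
    import torch.nn as nn

    class VisionEncoder(nn.Module):
        """Maps each RGB frame to a feature vector (hypothetical small CNN)."""
        def __init__(self, feat_dim=128):
            super().__init__()
            self.conv = nn.Sequential(
                nn.Conv2d(3, 32, 5, stride=2), nn.ReLU(),
                nn.Conv2d(32, 64, 5, stride=2), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.fc = nn.Linear(64, feat_dim)

        def forward(self, frames):                            # frames: (B, T, 3, H, W)
            b, t = frames.shape[:2]
            x = self.conv(frames.flatten(0, 1)).flatten(1)    # (B*T, 64)
            return self.fc(x).view(b, t, -1)                  # (B, T, feat_dim)

    class Seq2SeqDescriber(nn.Module):
        """LSTM encoder over multimodal frame features, LSTM decoder over words."""
        def __init__(self, vocab_size, feat_dim=128, joint_dim=16, hid=256):
            super().__init__()
            self.vision = VisionEncoder(feat_dim)
            self.encoder = nn.LSTM(feat_dim + joint_dim, hid, batch_first=True)
            self.embed = nn.Embedding(vocab_size, hid)
            self.decoder = nn.LSTM(hid, hid, batch_first=True)
            self.out = nn.Linear(hid, vocab_size)

        def forward(self, frames, joints, tokens):
            # joints: per-frame arm state (B, T, joint_dim); tokens: target
            # description tokens, shifted right for teacher forcing (B, L).
            feats = torch.cat([self.vision(frames), joints], dim=-1)
            _, state = self.encoder(feats)            # final state summarizes the sequence
            dec_out, _ = self.decoder(self.embed(tokens), state)
            return self.out(dec_out)                  # (B, L, vocab_size) logits

Because the vision encoder's parameters receive gradients through the description loss, visual features and language generation are learned jointly, matching the joint-training setup the abstract refers to.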
Pages: 417-428 (12 pages)