Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Cited by: 3
Authors
Benotti, Luciana [1, 2]
Lau, Tessa [3]
Villalba, Martin [1, 4]
Affiliations
[1] Univ Nacl Cordoba, Cordoba, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
[3] Savioke Inc, Sunnyvale, CA USA
[4] Univ Potsdam, D-14476 Potsdam, Germany
Keywords
Design; Algorithms; Performance; Natural language interpretation; multimodal understanding; action recognition; visual feedback; situated virtual agent; unsupervised learning
DOI
10.1145/2629632
Chinese Library Classification (CLC) number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. The approach is based on the intuition that interpreting a situated instruction can be cast as a classification problem: choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
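The classification framing in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the toy Naive Bayes scorer, the class name `ActionClassifier`, the action labels such as `press(red_button)`, and the training pairs are all hypothetical, chosen only to show an instruction being classified among the actions that are possible in the current situation.

```python
import math
from collections import Counter, defaultdict


class ActionClassifier:
    """Toy multinomial Naive Bayes over instruction words (illustrative only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # action -> word -> count
        self.action_counts = Counter()           # action -> number of examples
        self.vocab = set()

    def train(self, examples):
        # examples: (instruction, action) pairs; in the abstract's setting these
        # would come from the automatic annotation phase (hypothetical data here).
        for instruction, action in examples:
            words = instruction.lower().split()
            self.word_counts[action].update(words)
            self.action_counts[action] += 1
            self.vocab.update(words)

    def score(self, instruction, action):
        # log P(action) + sum of log P(word | action), with add-one smoothing.
        total_examples = sum(self.action_counts.values())
        total_words = sum(self.word_counts[action].values())
        log_p = math.log((self.action_counts[action] + 1)
                         / (total_examples + len(self.action_counts)))
        for word in instruction.lower().split():
            log_p += math.log((self.word_counts[action][word] + 1)
                              / (total_words + len(self.vocab) + 1))
        return log_p

    def interpret(self, instruction, possible_actions):
        # Choose only among the actions that are possible in the situation.
        return max(possible_actions, key=lambda a: self.score(instruction, a))


# Hypothetical training pairs standing in for automatically annotated data.
EXAMPLES = [
    ("press the red button", "press(red_button)"),
    ("push the red button", "press(red_button)"),
    ("press the blue button", "press(blue_button)"),
    ("go through the door", "move(door)"),
    ("walk to the door", "move(door)"),
]

clf = ActionClassifier()
clf.train(EXAMPLES)
print(clf.interpret("hit the red one", ["press(red_button)", "press(blue_button)"]))
```

Restricting the candidates passed to `interpret` is what makes the prediction situated: the same instruction can map to a different action when the set of possible actions changes.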
Pages: 22