Interpreting Natural Language Instructions Using Language, Vision, and Behavior

Cited by: 3
Authors
Benotti, Luciana [1, 2]
Lau, Tessa [3]
Villalba, Martin [1, 4]
Affiliations
[1] Univ Nacl Cordoba, Cordoba, Argentina
[2] Consejo Nacl Invest Cient & Tecn, Buenos Aires, DF, Argentina
[3] Savioke Inc, Sunnyvale, CA USA
[4] Univ Potsdam, D-14476 Potsdam, Germany
Keywords
Design; Algorithms; Performance; Natural language interpretation; multimodal understanding; action recognition; visual feedback; situated virtual agent; unsupervised learning
DOI
10.1145/2629632
Chinese Library Classification number
TP18 [Artificial intelligence theory]
Discipline classification codes
081104; 0812; 0835; 1405
Abstract
We define the problem of automatic instruction interpretation as follows. Given a natural language instruction, can we automatically predict what an instruction follower, such as a robot, should do in the environment to follow that instruction? Previous approaches to automatic instruction interpretation have required either extensive domain-dependent rule writing or extensive manually annotated corpora. This article presents a novel approach that leverages a large amount of unannotated, easy-to-collect data from humans interacting in a game-like environment. Our approach uses an automatic annotation phase based on artificial intelligence planning, for which two different annotation strategies are compared: one based on behavioral information and the other based on visibility information. The resulting annotations are used as training data for different automatic classifiers. This algorithm is based on the intuition that the problem of interpreting a situated instruction can be cast as a classification problem of choosing among the actions that are possible in the situation. Classification is done by combining language, vision, and behavior information. Our empirical analysis shows that machine learning classifiers achieve 77% accuracy on this task on available English corpora and 74% on similar German corpora. Finally, the inclusion of human feedback in the interpretation process is shown to boost performance to 92% for the English corpus and 90% for the German corpus.
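The classification framing in the abstract can be illustrated with a minimal sketch. This is a toy example, not the authors' system: the word-count scorer, the action labels such as `press(red_button)`, and the function names `train` and `interpret` are all assumptions made for illustration. It shows only the core idea that interpretation means choosing among the actions possible in the current situation, using instruction/action pairs as training data.

```python
from collections import Counter, defaultdict


def train(pairs):
    """Count how often each word co-occurs with each action label.

    `pairs` stands in for automatically annotated data:
    (instruction text, action the follower performed).
    """
    counts = defaultdict(Counter)
    for instruction, action in pairs:
        for word in instruction.lower().split():
            counts[action][word] += 1
    return counts


def interpret(counts, instruction, possible_actions):
    """Choose, among the actions possible in the situation, the one whose
    training vocabulary best matches the instruction (toy scoring rule)."""
    words = instruction.lower().split()

    def score(action):
        total = sum(counts[action].values()) or 1  # avoid division by zero
        return sum(counts[action][w] for w in words) / total

    # Sorting makes tie-breaking deterministic.
    return max(sorted(possible_actions), key=score)


# Hypothetical training pairs, invented for this sketch.
training_pairs = [
    ("press the red button", "press(red_button)"),
    ("push the red one", "press(red_button)"),
    ("go through the door", "move(door)"),
    ("walk to the door on the left", "move(door)"),
    ("press the blue button", "press(blue_button)"),
]

model = train(training_pairs)
chosen = interpret(model, "hit the red button",
                   ["press(red_button)", "move(door)"])
print(chosen)
```

Restricting `possible_actions` to what the situation allows is what separates this framing from open-ended prediction: an action that is not currently possible can never be chosen, however well its vocabulary matches.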
Pages: 22