OmniActions: Predicting Digital Actions in Response to Real-World Multimodal Sensory Inputs with LLMs

Cited by: 2
Authors
Li, Jiahao Nick [1,2]
Xu, Yan [3]
Grossman, Tovi [4]
Santosa, Stephanie [1]
Li, Michelle [1]
Affiliations
[1] Meta, Reality Labs Research, Toronto, ON, Canada
[2] UCLA, Los Angeles, CA, USA
[3] Meta, Reality Labs Research, Redmond, WA, USA
[4] University of Toronto, Toronto, ON, Canada
Source
PROCEEDINGS OF THE 2024 CHI CONFERENCE ON HUMAN FACTORS IN COMPUTING SYSTEMS, CHI 2024 | 2024
Keywords
digital follow-up actions; predictive interface; large language models; dataset; pervasive augmented reality; diary study; INFORMATION; BEHAVIOR
DOI
10.1145/3613904.3642068
Chinese Library Classification (CLC)
TP18 [Artificial Intelligence Theory]
Subject Classification Codes
081104; 0812; 0835; 1405
Abstract
The progression toward "Pervasive Augmented Reality" envisions continuous, easy access to multimodal information. However, in many everyday scenarios, users are occupied physically, cognitively, or socially, which increases the friction of acting on the multimodal information they encounter in the world. To reduce this friction, future interactive interfaces should intelligently provide quick access to digital actions based on users' context. To explore the range of possible digital actions, we conducted a diary study in which participants captured and shared the media they intended to act on (e.g., images or audio), along with their desired actions and other contextual information. Using this data, we generated a holistic design space of digital follow-up actions that could be performed in response to different types of multimodal sensory inputs. We then designed OmniActions, a pipeline powered by large language models (LLMs) that processes multimodal sensory inputs and predicts follow-up actions on the target information, grounded in the derived design space. Using the empirical data collected in the diary study, we performed quantitative evaluations of three variations of LLM techniques (intent classification, in-context learning, and fine-tuning) and identified the most effective technique for our task. Additionally, as an instantiation of the pipeline, we developed an interactive prototype and report preliminary user feedback on how people perceive and react to the action predictions and their errors.
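The record does not reproduce the pipeline itself. As a rough illustration of the in-context-learning variant the abstract mentions, the Python sketch below builds a few-shot prompt that maps a multimodal observation plus context to a follow-up action. The action labels, example data, and the `llm_complete` stub are illustrative assumptions, not the authors' code or design space.

```python
# Hedged sketch of an in-context-learning approach in the spirit of the
# abstract. ACTION_SPACE, the few-shot examples, and llm_complete() are
# illustrative assumptions -- none are taken from the OmniActions paper.

# A small label set standing in for the paper's design space of digital
# follow-up actions (the real space is derived from the diary study).
ACTION_SPACE = ["save", "share", "search", "remind", "translate", "navigate"]

# Hypothetical few-shot examples: a multimodal observation with context,
# paired with the user's desired follow-up action.
FEW_SHOT_EXAMPLES = [
    {"input": "Photo of a concert poster; user is walking; alone",
     "action": "remind"},
    {"input": "Audio of a song playing in a cafe; user is seated; hands busy",
     "action": "search"},
]

def build_prompt(observation: str) -> str:
    """Assemble a few-shot prompt asking an LLM to pick a follow-up action."""
    lines = [
        "Predict the digital follow-up action for the sensory input.",
        f"Choose one of: {', '.join(ACTION_SPACE)}.",
        "",
    ]
    for ex in FEW_SHOT_EXAMPLES:
        lines.append(f"Input: {ex['input']}")
        lines.append(f"Action: {ex['action']}")
        lines.append("")
    lines.append(f"Input: {observation}")
    lines.append("Action:")
    return "\n".join(lines)

def llm_complete(prompt: str) -> str:
    """Placeholder for a call to whatever LLM completion API is available."""
    raise NotImplementedError("wire up an LLM provider here")

if __name__ == "__main__":
    # Print the assembled prompt for a hypothetical observation.
    print(build_prompt("Image of a restaurant menu in French; user is in a meeting"))
```

In this framing, intent classification would replace the free-text completion with a fixed label head, and fine-tuning would bake the diary-study examples into the model weights instead of the prompt.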
Pages: 22