Understanding Gesture and Speech Multimodal Interactions for Manipulation Tasks in Augmented Reality Using Unconstrained Elicitation

Cited by: 21
Authors
Williams A.S. [1 ]
Ortega F.R. [1 ]
Affiliations
[1] Colorado State University, Fort Collins, CO
Funding
U.S. National Science Foundation;
Keywords
augmented reality; elicitation; gesture and speech interaction; interaction; multimodal;
DOI
10.1145/3427330
Abstract
This research establishes a better understanding of the syntax choices in speech interactions and of how speech, gesture, and multimodal gesture-and-speech interactions are produced by users in unconstrained object-manipulation environments in augmented reality. The work presents a multimodal elicitation study conducted with 24 participants. The canonical referents for translation, rotation, and scale were used along with some abstract referents (create, destroy, and select). In this study, time windows for gesture-and-speech multimodal interactions are developed using the start and stop times of gestures and speech as well as the stroke times of gestures. While gestures commonly precede speech by 81 ms, we find that the stroke of the gesture commonly falls within 10 ms of the start of speech, indicating that the information content of a gesture and its co-occurring speech are well aligned with each other. Lastly, the trends across the most common proposals for each modality are examined, showing that disagreement between proposals is often caused by variation in hand posture or syntax. This allows us to present aliasing recommendations that increase the percentage of users' natural interactions captured by future multimodal interactive systems. © 2020 ACM.
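As a minimal sketch of how the reported temporal measures could be derived from annotated interaction logs: the record structure, the field names (gesture_start, gesture_stroke, speech_start), and the use of the median as the aggregate are assumptions for illustration only, not the authors' actual analysis pipeline.

```python
from dataclasses import dataclass
from statistics import median

@dataclass
class Trial:
    # All timestamps in milliseconds; field names are hypothetical.
    gesture_start: float   # onset of the gesture
    gesture_stroke: float  # information-carrying (stroke) phase of the gesture
    speech_start: float    # onset of the co-occurring speech

def temporal_offsets(trials: list[Trial]) -> dict[str, float]:
    """Median offsets of gesture onset and gesture stroke relative to speech onset.
    Negative values mean the gesture event precedes speech."""
    onset_offsets = [t.gesture_start - t.speech_start for t in trials]
    stroke_offsets = [t.gesture_stroke - t.speech_start for t in trials]
    return {
        "gesture_onset_vs_speech_ms": median(onset_offsets),
        "gesture_stroke_vs_speech_ms": median(stroke_offsets),
    }

# Example: gesture onset precedes speech, while the stroke nearly coincides
# with speech onset, mirroring the pattern described in the abstract.
trials = [
    Trial(gesture_start=0, gesture_stroke=85, speech_start=81),
    Trial(gesture_start=10, gesture_stroke=95, speech_start=92),
]
print(temporal_offsets(trials))
```

Such offsets, pooled across participants and referents, are what make it possible to define time windows within which a gesture and an utterance are treated as a single multimodal proposal.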