Investigation of Small Group Social Interactions Using Deep Visual Activity-Based Nonverbal Features

Cited by: 16
Authors
Beyan, Cigdem [1 ]
Shahid, Muhammad [1 ,2 ]
Murino, Vittorio [1 ,3 ]
Affiliations
[1] Istituto Italiano di Tecnologia, Pattern Analysis & Computer Vision (PAVIS), Genoa, Italy
[2] University of Genoa, Department of Electrical, Electronics and Telecommunications Engineering and Naval Architecture, Genoa, Italy
[3] University of Verona, Department of Computer Science, Verona, Italy
Source
PROCEEDINGS OF THE 2018 ACM MULTIMEDIA CONFERENCE (MM'18) | 2018
Keywords
social interactions; small groups; meetings; visual activity; nonverbal behavior; deep neural network; feature encoding; AUTOMATIC-ANALYSIS; RECOGNITION
DOI
10.1145/3240508.3240685
Chinese Library Classification (CLC)
TP301 [Theory and Methods]
Discipline Code
081202
Abstract
Understanding small group face-to-face interactions is a prominent research problem in social psychology, and its automatic analysis has recently become popular in social computing. It is mainly investigated in terms of nonverbal behaviors, as these are one of the main facets of communication. Among the many multimodal nonverbal cues, visual activity is an important one, and its reliable performance can be crucial, for instance, when audio sensors are missing. The existing visual activity-based nonverbal features, which are all hand-crafted, perform well enough for some applications but poorly for others. Given these observations, we claim that there is a need for more robust feature representations that can be learned from the data itself. To realize this, we propose a novel method composed of optical flow computation, deep neural network-based feature learning, feature encoding, and classification. Additionally, a comprehensive comparison of different feature encoding techniques is presented. The proposed method is tested on three research topics that can be perceived during small group interactions, i.e., meetings: i) emergent leader detection, ii) emergent leadership style prediction, and iii) high/low extraversion classification. The proposed method shows (significantly) better results not only compared to the state-of-the-art visual activity-based nonverbal features, but also when those features are combined with other audio-based and video-based nonverbal features.
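The abstract describes a four-stage pipeline: optical flow computation, deep neural network-based feature learning, feature encoding, and classification. The sketch below illustrates one way such a pipeline can be wired together; the specific choices here (Farneback optical flow, an ImageNet-pretrained ResNet-18 as the frame-level feature extractor, k-means bag-of-words encoding, a linear SVM) are illustrative assumptions and not taken from the paper, which learns its features from the task data and compares several encoding techniques.

```python
# A minimal sketch of the pipeline named in the abstract: optical flow ->
# deep feature extraction -> feature encoding -> classification.
# Assumptions (not from the paper): Farneback flow, ImageNet ResNet-18
# features, k-means bag-of-words encoding, linear SVM.
import cv2
import numpy as np
import torch
import torchvision.models as models
from sklearn.cluster import KMeans
from sklearn.svm import LinearSVC

def optical_flow_frames(video_path):
    """Yield dense Farneback optical flow fields for consecutive frame pairs."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise ValueError(f"cannot read {video_path}")
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        prev_gray = gray
        yield flow  # (H, W, 2): per-pixel horizontal/vertical displacement

def flow_to_tensor(flow):
    """Pack a flow field into a 3-channel tensor (dx, dy, magnitude) for the CNN."""
    mag = np.linalg.norm(flow, axis=2, keepdims=True)
    img = np.concatenate([flow, mag], axis=2).astype(np.float32)
    img = cv2.resize(img, (224, 224))
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)

# Frame-level deep features. The paper learns these from data; here we
# simply reuse an ImageNet-pretrained backbone as a stand-in extractor.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()  # keep the 512-d penultimate features
backbone.eval()

def clip_features(video_path):
    feats = []
    with torch.no_grad():
        for flow in optical_flow_frames(video_path):
            feats.append(backbone(flow_to_tensor(flow)).squeeze(0).numpy())
    return np.stack(feats)  # (num_frames - 1, 512)

def encode_and_classify(train_clips, train_labels, test_clips, k=64):
    """Bag-of-words encoding over frame features, then a linear SVM."""
    codebook = KMeans(n_clusters=k, n_init=10).fit(np.vstack(train_clips))
    def encode(clip):
        hist = np.bincount(codebook.predict(clip), minlength=k)
        return hist / max(hist.sum(), 1)
    X_train = np.stack([encode(c) for c in train_clips])
    clf = LinearSVC().fit(X_train, train_labels)
    return clf.predict(np.stack([encode(c) for c in test_clips]))
```

The encoding stage is where the abstract's "comprehensive analysis between different feature encoding techniques" applies; the k-means bag-of-words above is just one common option that could be swapped for another aggregation scheme without changing the rest of the pipeline.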
Pages: 311-319
Page count: 9