Automated Scoring of Asynchronous Interview Videos Based on Multi-Modal Window-Consistency Fusion

Cited by: 1

Authors:

Lv, Jianming [1]

Chen, Chujie [1]

Liang, Zequan [1]

Affiliations:

[1] South China Univ Technol, Sch Comp Sci & Engn, Guangzhou 510006, Peoples R China

Source:

IEEE TRANSACTIONS ON AFFECTIVE COMPUTING | 2024 / Vol. 15 / Issue 03

Keywords:

Interviews; Videos; Artificial intelligence; Task analysis; Computational modeling; Recruitment; Predictive models; Keyword attention; multi-modal interaction; natural language processing; social computing; time window; user modeling; EMOTION RECOGNITION; PERSONALITY; SPEECH; CUES;

DOI:

10.1109/TAFFC.2023.3294335

Chinese Library Classification (CLC) number:

TP18 [Artificial intelligence theory];

Discipline classification codes:

081104 ; 0812 ; 0835 ; 1405 ;

Abstract:

Soft skills, such as personality characteristics, communication skills, and leadership, greatly affect career performance. Predicting the soft skills of interviewees can therefore give interviewers a strong reference for hiring decisions. As asynchronous video interviews have become a popular interview format, automatic evaluation of soft skills has attracted widespread attention from researchers. However, existing automatic evaluation methods have two significant drawbacks. First, most of them model the problem as multi-modal fusion of long-term sequences and ignore the consistency of multi-modal expression within short time windows, which is a key attribute of the interview scene. Second, because they do not embed professional knowledge from the interview field, the interpretability of the models is relatively weak. To address these problems, we propose a novel Multi-modal Window-Consistency Fusion network, MWCF, which captures the expression consistency of different modalities within a short time window and re-weights the language signals to emphasize the important portions of verbal cues. To enhance the interpretability of the evaluation model, we introduce the professional knowledge of interviewers by proposing a topic generation module based on question attention and embedding the most representative keywords for each soft skill into the model. Furthermore, we build a real-world interview dataset by developing an asynchronous interview platform and conduct extensive experiments that show the superior performance of the proposed model.

Pages: 799-814

Page count: 16

