Video visualization via face and speaker clustering

Cited by: 0

Authors:

Mojiborrahman, Dehvari [1]

Yang, Chuan-Kai [1]

Affiliations:

[1] National Taiwan University of Science and Technology, Department of Information Management, 43, Sec. 4, Keelung Rd., Taipei 106, Taiwan

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023, Vol. 82, No. 17

Keywords:

Face tracking; Scene change detection; Face clustering; Speaker clustering; DIARIZATION; RECOGNITION

DOI:

10.1007/s11042-023-14552-5

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology]

Discipline classification code:

0812

Abstract:

When watching a video, viewers often find it difficult to tell characters apart because their faces are unfamiliar, especially when there are many actors or when the actors come from different countries or cultures. In other circumstances, such as for deaf viewers or for people who cannot hear the audio in noisy places (e.g., streets), a diarization method combined with subtitles can be a more effective way to follow the script. To address this, we propose a video visualization system based on face and speaker clustering. Given an input video, our system first separates the audio from the video and then extracts facial and voice features for face clustering and speaker clustering. Finally, the system finds the correspondence between the face and speaker clustering results, so that viewers can easily see when a character appears and who is speaking in the video.

Citation

Pages: 25865-25881

Page count: 17

