Video visualization via face and speaker clustering

Cited by: 0

Authors:

Mojiborrahman, Dehvari [1]

Yang, Chuan-Kai [1]

Affiliations:

[1] Natl Taiwan Univ Sci & Technol, Dept Informat Management, 43, Sec 4, Keelung Rd, Taipei 106, Taiwan

Source:

MULTIMEDIA TOOLS AND APPLICATIONS | 2023 / Vol. 82 / Issue 17

Keywords:

Face tracking; Scene change detection; Face clustering; Speaker clustering; DIARIZATION; RECOGNITION;

DOI:

10.1007/s11042-023-14552-5

Chinese Library Classification (CLC) number:

TP [Automation Technology, Computer Technology];

Discipline classification code:

0812;

Abstract:

When watching a video, viewers often find it difficult to tell characters apart because their faces are unfamiliar, especially when there are numerous actors/actresses or when they come from different countries or cultures. In other circumstances, such as for deaf viewers or for people who cannot hear the audio in noisy places (e.g., streets), a diarization method combined with subtitles can be a more effective way to follow the script. To address this, we propose a video visualization system based on face and speaker clustering. Given an input video, our system first separates the audio track from the video and then extracts facial and voice features for face clustering and speaker clustering. Finally, the system finds the correspondence between the face and speaker clustering results, so that viewers can easily see when a character appears and who is speaking in the video.

Citation

Pages: 25865-25881

Page count: 17
