Video visualization via face and speaker clustering

Times cited: 0
Authors
Dehvari, Mojiborrahman [1]
Yang, Chuan-Kai [1]
Affiliation
[1] Natl Taiwan Univ Sci & Technol, Dept Informat Management, 43, Sec 4, Keelung Rd, Taipei 106, Taiwan
Keywords
Face tracking; Scene change detection; Face clustering; Speaker clustering; Diarization; Recognition
DOI
10.1007/s11042-023-14552-5
CLC number (Chinese Library Classification)
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
When watching a video, viewers may find it difficult to tell characters apart if they are unfamiliar with the actors' faces, especially when there are many actors or the actors come from other countries or cultures. In other situations, such as for deaf viewers or when the audio cannot be heard in noisy places (e.g., on the street), diarization combined with subtitles can be a more effective way to follow the dialogue. To address this, we propose a video visualization system based on face and speaker clustering. Given an input video, our system first separates the audio track from the video and then extracts facial and voice features for face clustering and speaker clustering, respectively. Finally, the system finds the correspondence between the face clustering and speaker clustering results, so that viewers can easily see when each character appears and who is speaking at any point in the video.
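The abstract does not say how face clusters and speaker clusters are linked. One plausible approach, shown here as a hypothetical sketch and not as the authors' method, is to pair each speaker cluster with the face cluster it co-occurs with for the longest total time. The segment format `(start_seconds, end_seconds, cluster_label)` and the function names `overlap`, `cooccurrence`, and `match_speakers_to_faces` are illustrative assumptions, and a greedy one-to-one assignment stands in for an optimal matching.

```python
from collections import defaultdict

# Hypothetical segment format: (start_seconds, end_seconds, cluster_label).
Segment = tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Seconds shared by the intervals [a_start, a_end) and [b_start, b_end)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def cooccurrence(speakers: list[Segment], faces: list[Segment]) -> dict[tuple[str, str], float]:
    """Total seconds each (speaker cluster, face cluster) pair is active at the same time."""
    totals: dict[tuple[str, str], float] = defaultdict(float)
    for s_start, s_end, speaker in speakers:
        for f_start, f_end, face in faces:
            shared = overlap(s_start, s_end, f_start, f_end)
            if shared > 0:
                totals[(speaker, face)] += shared
    return dict(totals)


def match_speakers_to_faces(speakers: list[Segment], faces: list[Segment]) -> dict[str, str]:
    """Greedy one-to-one assignment: take the largest co-occurrence first.

    Assumption: a speaker is most likely the face that is on screen while
    they talk. Each speaker and each face is used at most once.
    """
    ranked = sorted(cooccurrence(speakers, faces).items(), key=lambda kv: kv[1], reverse=True)
    mapping: dict[str, str] = {}
    used_faces: set[str] = set()
    for (speaker, face), _seconds in ranked:
        if speaker not in mapping and face not in used_faces:
            mapping[speaker] = face
            used_faces.add(face)
    return mapping


if __name__ == "__main__":
    faces = [(0.0, 10.0, "face_A"), (10.0, 20.0, "face_B"), (20.0, 30.0, "face_A")]
    speakers = [(1.0, 9.0, "spk_1"), (11.0, 19.0, "spk_2"), (21.0, 28.0, "spk_1")]
    print(match_speakers_to_faces(speakers, faces))
```

In this toy input, `spk_1` overlaps `face_A` for 15 s in total and `spk_2` overlaps `face_B` for 8 s, so the speakers map to `face_A` and `face_B` respectively. Greedy matching is simple but can be suboptimal when several pairs have similar co-occurrence; running `scipy.optimize.linear_sum_assignment` on the negated co-occurrence matrix would give an optimal one-to-one assignment instead. Plain co-occurrence can also be misled by reaction shots, where a listener's face is on screen while someone else speaks.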
Pages: 25865-25881
Page count: 17