Video visualization via face and speaker clustering

Cited by: 0
Authors
Mojiborrahman, Dehvari [1]
Yang, Chuan-Kai [1]
Affiliation
[1] Natl Taiwan Univ Sci & Technol, Dept Informat Management, 43, Sec 4, Keelung Rd, Taipei 106, Taiwan
Keywords
Face tracking; Scene change detection; Face clustering; Speaker clustering; Diarization; Recognition
DOI
10.1007/s11042-023-14552-5
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology]
Subject classification code
0812
Abstract
When watching a video, viewers may find it difficult to tell characters apart if they are unfamiliar with the actors' faces, especially when there are many actors and actresses or when they come from different countries or cultures. In other circumstances, such as for deaf viewers or for people who cannot hear the audio in noisy places (e.g., streets), a diarization method combined with subtitles can be a more effective way to follow the script. To address these problems, we propose a video visualization system based on face and speaker clustering. Given an input video, our system first separates the voice track from the video and then extracts facial and voice features for face clustering and speaker clustering, respectively. Finally, the system finds the correspondence between the face and speaker clustering results. As a result, with our proposed video visualization system, viewers can easily see when a character appears and who is speaking in a video.
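The abstract does not state how face clusters are matched to speaker clusters. As a rough illustration only, the sketch below assumes both clusterings have already been reduced to one label per time step (`None` when no face is visible or nobody is speaking), counts how often each face cluster is on screen while each speaker cluster talks, and picks the one-to-one mapping with the largest total overlap by brute force. The co-occurrence heuristic and the helper names `cooccurrence` and `match_faces_to_speakers` are hypothetical and are not the authors' method.

```python
from collections import Counter
from itertools import permutations
from typing import Dict, Hashable, List, Optional

Label = Optional[Hashable]


def cooccurrence(face_labels: List[Label], speaker_labels: List[Label]) -> Counter:
    """Hypothetical helper: count time steps where face cluster f is visible
    while speaker cluster s is talking. Steps with a missing label are skipped."""
    counts: Counter = Counter()
    for f, s in zip(face_labels, speaker_labels):
        if f is not None and s is not None:
            counts[(f, s)] += 1
    return counts


def match_faces_to_speakers(face_labels: List[Label],
                            speaker_labels: List[Label]) -> Dict[Hashable, Hashable]:
    """Hypothetical helper: one-to-one face->speaker mapping that maximizes
    total co-occurrence. Brute force, so only suitable for a few clusters."""
    counts = cooccurrence(face_labels, speaker_labels)
    faces = sorted({f for f in face_labels if f is not None})
    speakers = sorted({s for s in speaker_labels if s is not None})

    best_score, best_map = -1, {}
    if len(faces) <= len(speakers):
        # Give every face cluster a distinct speaker cluster.
        for perm in permutations(speakers, len(faces)):
            score = sum(counts[(f, s)] for f, s in zip(faces, perm))
            if score > best_score:
                best_score, best_map = score, dict(zip(faces, perm))
    else:
        # More faces than speakers: some face clusters stay unmatched.
        for perm in permutations(faces, len(speakers)):
            score = sum(counts[(f, s)] for f, s in zip(perm, speakers))
            if score > best_score:
                best_score, best_map = score, dict(zip(perm, speakers))
    return best_map


if __name__ == "__main__":
    faces = ["A", "A", "A", "B", "B", None, "B"]
    speakers = [0, 0, 1, 1, 1, 1, None]
    print(match_faces_to_speakers(faces, speakers))
```

For the toy labels above, face cluster A overlaps speaker 0 twice and face cluster B overlaps speaker 1 twice, so the sketch maps A to 0 and B to 1. Brute force is only practical for a handful of main characters; for larger cluster counts, an assignment solver such as SciPy's `linear_sum_assignment` applied to the co-occurrence matrix (with `maximize=True`) solves the same matching in polynomial time.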
Pages: 25865-25881
Page count: 17