Changepoint detection-assisted nonparametric clustering for unsupervised temporal sign segmentation

Cited by: 0
Authors
Sim, Hohyun [1]
Cho, Hyeonjoong [1]
Lee, Hankyu [2]
Affiliations
[1] Korea Univ, Dept Comp Convergence Software, 2511 Sejong Ro, Sejong 30019, South Korea
[2] Elect & Telecommun Res Inst, 218 Gajeong Ro, Daejeon 34129, South Korea
Keywords
Temporal sign segmentation; Changepoint detection; Computer vision; Unsupervised learning; Temporal action segmentation; RECOGNITION;
DOI
10.1016/j.engappai.2023.107323
Chinese Library Classification (CLC) number
TP [Automation technology, computer technology];
Subject classification code
0812;
Abstract
Temporal sign segmentation aims to temporally divide continuous sign language into category-agnostic sign segments. One of the challenges in temporal sign segmentation is the paucity of frame-level annotations for sign language videos, which restricts the applicability of supervised and semi-supervised approaches. To address this challenge, we consider temporal sign segmentation as a clustering problem defined as the grouping of semantically similar frames of a given video, which allows us to choose unsupervised clustering techniques as promising alternatives. However, most unsupervised clustering techniques are parametric. In the context of temporal sign segmentation, they assume that the number of sign segments in a given continuous sign video is predefined, which is implausible. The primary contributions of this study are as follows: (1) We propose the first nonparametric clustering algorithm for unsupervised temporal sign segmentation. The main concept is to make the hierarchical graph-based clustering algorithm nonparametric by adopting the cost and penalty functions of a changepoint detection algorithm to determine the optimal number of sign segments. Experimental results show that the performance of the proposed unsupervised method is comparable to that of the latest semi-supervised sign segmentation method in terms of several metrics. Moreover, the execution time of the proposed clustering method was less than 1 s, which supports its practical applicability. (2) We identify that the conventional metrics for temporal sign segmentation do not sufficiently address over-segmentation. To overcome this difficulty, we propose a new main metric to evaluate the performance of temporal sign segmentation, called the adjusted MF1B.
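The idea of choosing the number of segments with a changepoint-style cost and penalty can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the cost, the penalty, or the clustering details, so the sketch assumes an L2 cost (within-segment sum of squared deviations), a linear penalty `beta` per segment, and a greedy merge of adjacent segments as a stand-in for the hierarchical graph-based clustering. All function names and the parameter `beta` are hypothetical.

```python
def segment_cost(values):
    """Assumed L2 cost: sum of squared deviations from the segment mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def merge_hierarchy(signal):
    """Yield segmentations, as lists of (start, end) pairs, from fine to coarse.

    Stand-in for a hierarchical clustering: at each level, merge the adjacent
    pair of segments whose merge increases the L2 cost the least.
    """
    segments = [(i, i + 1) for i in range(len(signal))]
    yield list(segments)
    while len(segments) > 1:
        best_i, best_increase = 0, float("inf")
        for i in range(len(segments) - 1):
            (s0, e0), (s1, e1) = segments[i], segments[i + 1]
            increase = (segment_cost(signal[s0:e1])
                        - segment_cost(signal[s0:e0])
                        - segment_cost(signal[s1:e1]))
            if increase < best_increase:
                best_i, best_increase = i, increase
        start = segments[best_i][0]
        end = segments[best_i + 1][1]
        segments[best_i:best_i + 2] = [(start, end)]
        yield list(segments)


def select_segmentation(signal, beta):
    """Pick the hierarchy level minimizing total cost + beta * (number of segments)."""
    def objective(segments):
        cost = sum(segment_cost(signal[s:e]) for s, e in segments)
        return cost + beta * len(segments)
    return min(merge_hierarchy(signal), key=objective)


# Toy 1-D "frame features" with three plateaus; real inputs would be per-frame embeddings.
signal = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9, 10.0, 10.1, 9.9]
segments = select_segmentation(signal, beta=1.0)
print(segments)
```

In this sketch the number of segments is never supplied by the caller; it falls out of the trade-off between fit (cost) and complexity (penalty). A larger `beta` yields fewer, longer segments, and `beta = 0` keeps every frame as its own segment.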
Pages: 10