Multimodal Video Summarization based on Fuzzy Similarity Features

Cited by: 5
Authors
Psallidas, Theodoros [1, 3]
Vasilakakis, Michael D. [2]
Spyrou, Evaggelos [1, 3]
Iakovidis, Dimitris K. [2]
Affiliations
[1] Univ Thessaly, Dept Comp Sci & Telecommun, Lamia, Greece
[2] Univ Thessaly, Dept Comp Sci & Biomed Informat, Lamia, Greece
[3] Natl Ctr Sci Res Demokritos, Inst Informat & Telecommun, Athens, Greece
Source
2022 IEEE 14th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP) | 2022
DOI
10.1109/IVMSP54334.2022.9816266
Chinese Library Classification (CLC) number
TP31 [Computer software]
Discipline classification code
081202; 0835
Abstract
The continuously growing number of user-generated videos has increased the need for efficient browsing of content collections and repositories, which in turn requires descriptive yet compact representations. To this end, a popular approach is to create a visual summary, which is far more expressive than alternatives such as textual descriptions. In this work, we present a video summarization approach based on the extraction and fusion of audio and visual features, in order to produce dynamic video summaries, i.e., summaries comprising the most important segments of the original video while preserving their temporal order. Based on the extracted features, each segment is classified as "interesting" or "uninteresting" and is accordingly included in or excluded from the final summary. The novelty of our approach is that, prior to classification, the fused features are fuzzified, which makes them more intuitive and more robust to uncertainty. We evaluate our approach on a large dataset of user-generated videos and demonstrate that fuzzy features boost classification performance, yielding more concrete video summaries.
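The abstract describes a pipeline of audio-visual feature fusion, fuzzification, per-segment classification, and order-preserving selection. The Python sketch below illustrates that general flow only; it is not the authors' implementation and does not reproduce their fuzzy similarity features. Fusion by concatenation, min-max normalization, the three triangular fuzzy sets ("low", "medium", "high"), and the `is_interesting` classifier callable are all assumptions, and every function name here is hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: three triangular fuzzy sets on the normalized range [0, 1],
# each given as (left foot, peak, right foot). Memberships sum to 1 on [0, 1].
FUZZY_SETS: Tuple[Tuple[float, float, float], ...] = (
    (-0.5, 0.0, 0.5),  # low
    (0.0, 0.5, 1.0),   # medium
    (0.5, 1.0, 1.5),   # high
)


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Degree of membership of x in a triangular fuzzy set (requires a < b < c)."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def fuse(audio: Sequence[float], visual: Sequence[float]) -> List[float]:
    """Hypothetical early fusion: concatenate one segment's audio and visual features."""
    return list(audio) + list(visual)


def min_max_normalize(rows: List[List[float]]) -> List[List[float]]:
    """Scale every feature dimension to [0, 1] across all segments."""
    cols = list(zip(*rows))
    lows = [min(col) for col in cols]
    highs = [max(col) for col in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def fuzzify(row: Sequence[float]) -> List[float]:
    """Replace each normalized feature by its low/medium/high memberships."""
    return [triangular(v, *fs) for v in row for fs in FUZZY_SETS]


def summarize(
    segments: Sequence[Tuple[Sequence[float], Sequence[float]]],
    is_interesting: Callable[[List[float]], bool],
) -> List[int]:
    """Return indices of segments kept in the summary, in temporal order."""
    fused = [fuse(audio, visual) for audio, visual in segments]
    fuzzy = [fuzzify(row) for row in min_max_normalize(fused)]
    # Iterating in the original order preserves the segments' temporal order.
    return [i for i, feats in enumerate(fuzzy) if is_interesting(feats)]


# Toy input: one audio and one visual feature per segment (made-up values).
toy_segments = [([0.1], [0.2]), ([0.9], [0.8]), ([0.5], [0.1]), ([1.0], [0.9])]


def toy_rule(feats: List[float]) -> bool:
    """Hypothetical stand-in for a trained classifier: audio feature mostly 'high'."""
    return feats[2] > 0.5


print(summarize(toy_segments, toy_rule))  # [1, 3]
```

Passing the classifier as a callable keeps the sketch agnostic to the model: a trained binary classifier could be plugged in wherever `toy_rule` is used, while the selection step stays responsible only for keeping positive segments in their original order.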
Pages: 5