STFF-SM: Steganalysis Model Based on Spatial and Temporal Feature Fusion for Speech Streams

Cited by: 4
Authors
Tian, Hui [1, 2]
Qiu, Yiqin [1, 2]
Mazurczyk, Wojciech [3]
Li, Haizhou [4, 5]
Qian, Zhenxing [6]
Affiliations
[1] Natl Huaqiao Univ, Coll Comp Sci & Technol, Xiamen 361021, Peoples R China
[2] Xiamen Key Lab Data Secur & Blockchain Technol, Xiamen 361021, Peoples R China
[3] Warsaw Univ Technol, Fac Elect & Informat Technol, Inst Comp Sci, PL-00665 Warsaw, Poland
[4] Chinese Univ Hong Kong, Sch Data Sci, Shenzhen 518172, Peoples R China
[5] Natl Univ Singapore, Dept Elect & Comp Engn, Singapore 119077, Singapore
[6] Fudan Univ, Sch Comp Sci, Shanghai 200433, Peoples R China
Funding
National Natural Science Foundation of China
Keywords
Delays; Feature extraction; Steganography; Quantization (signal); Distortion; Speech coding; Resistance; Steganalysis; steganography; voice over Internet protocol; speech streams; deep neural networks; pitch delays; STEGANOGRAPHY; SCHEME; VOICE;
DOI
10.1109/TASLP.2022.3224295
Chinese Library Classification (CLC) number
O42 [Acoustics]
Discipline classification codes
070206; 082403
Abstract
The real-time detection of speech steganography in Voice-over-Internet-Protocol (VoIP) scenarios remains an open problem, because it requires steganalysis methods that perform well on low-intensity embeddings and short input samples while also delivering detection results quickly. To address these challenges, this paper presents a novel steganalysis model based on spatial and temporal feature fusion (STFF-SM). Unlike existing methods, we take both the integer and the fractional pitch delays as input and design a subframe-stitch module that integrates subframe-wise integer delays with frame-wise fractional pitch delays. Further, we design a spatial fusion module based on pre-activation residual convolution that extracts pitch spatial features and gradually increases their dimensionality to expose finer steganographic distortions and thereby improve detection; a Group-Squeeze-Weighting block is introduced to alleviate the information loss that occurs as the feature dimension increases. In addition, we design a temporal fusion module that extracts pitch temporal features using stacked LSTMs, where a Gated Feed-Forward Network is introduced to learn the interactions between different feature maps while suppressing features that do not contribute to detection. We evaluated the performance of STFF-SM through comprehensive experiments and comparisons with state-of-the-art solutions. The experimental results demonstrate that STFF-SM meets the requirements of real-time detection of speech steganography in VoIP streams and outperforms existing methods in detection performance, especially at low embedding strengths and short window sizes.
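The abstract does not state how the subframe-stitch module lays out its input, so the following is only a minimal sketch of the general idea under stated assumptions: each frame is assumed to carry a list of subframe-wise integer delays plus one frame-wise fractional delay, and the stitched rows are assumed to be stacked into a (frames x features) matrix that is then cut into short, non-overlapping detection windows. The names `stitch_frame`, `stitch_frames`, and `split_windows`, the row layout, and the windowing policy are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

# Hypothetical per-frame record: (subframe-wise integer delays, frame-wise fractional delay).
Frame = Tuple[Sequence[int], int]


def stitch_frame(int_delays: Sequence[int], frac_delay: int) -> List[int]:
    """Append the frame's fractional delay to its subframe-wise integer delays."""
    return list(int_delays) + [frac_delay]


def stitch_frames(frames: Sequence[Frame]) -> List[List[int]]:
    """Stitch every frame into one row; the rows form a (frames x features) matrix."""
    rows = [stitch_frame(ints, frac) for ints, frac in frames]
    if rows and any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("every frame must have the same number of subframes")
    return rows


def split_windows(rows: List[List[int]], size: int) -> List[List[List[int]]]:
    """Cut the stitched stream into non-overlapping detection windows of `size` frames.

    A trailing partial window is dropped (an assumption made for this sketch).
    """
    if size <= 0:
        raise ValueError("window size must be positive")
    return [rows[i:i + size] for i in range(0, len(rows) - size + 1, size)]
```

For example, with made-up delay values, `stitch_frames([([40, 41], 1), ([39, 40], 2), ([41, 41], 0)])` yields `[[40, 41, 1], [39, 40, 2], [41, 41, 0]]`, and `split_windows(..., 2)` on that result keeps only the first complete two-frame window. Keeping each frame's delays in one row means that a later convolution can see the integer and fractional components of the same frame side by side.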
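The abstract describes the Gated Feed-Forward Network only by its role: learning interactions between feature maps while suppressing features that are not useful for detection. One common way to obtain that behaviour is a GLU-style gate, in which one projection produces candidate features and a parallel sigmoid projection scales each of them toward zero or one. The sketch below uses that generic technique as a stand-in. It is an assumption, not the paper's exact block: there are no biases, no normalization, and no residual connection, and the function names and weight shapes are hypothetical. It is applied independently at each time step of an LSTM output sequence.

```python
import math
from typing import List, Sequence

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Sequence[float]) -> Vector:
    """Dense layer without bias: returns w @ x."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def sigmoid(v: float) -> float:
    """Numerically stable logistic function."""
    if v >= 0:
        return 1.0 / (1.0 + math.exp(-v))
    e = math.exp(v)
    return e / (1.0 + e)


def gated_ffn(x: Sequence[float], w_value: Matrix, w_gate: Matrix, w_out: Matrix) -> Vector:
    """GLU-style gated feed-forward step for one time step's feature vector."""
    value = matvec(w_value, x)                      # candidate features (mix all inputs)
    gate = [sigmoid(g) for g in matvec(w_gate, x)]  # per-feature gates in (0, 1)
    hidden = [v * g for v, g in zip(value, gate)]   # gates near 0 suppress a feature
    return matvec(w_out, hidden)                    # project back to the model width


def gated_ffn_sequence(
    seq: Sequence[Sequence[float]], w_value: Matrix, w_gate: Matrix, w_out: Matrix
) -> List[Vector]:
    """Apply the same gated FFN independently to every step of an LSTM output sequence."""
    return [gated_ffn(x, w_value, w_gate, w_out) for x in seq]
```

With identity value and output weights and an all-zero gate matrix, every gate equals 0.5, so `gated_ffn([1.0, -2.0], ...)` returns `[0.5, -1.0]`. Driving a gate's pre-activation strongly negative sends that feature's output to roughly zero, which is the "suppression" behaviour the abstract describes. Because each dense projection mixes all input features, the block can also model interactions between them.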
Pages: 277–289
Number of pages: 13